Search Results (3,453)

Search Parameters:
Keywords = RGB

17 pages, 17602 KiB  
Article
Enhancing Detection of Pedestrians in Low-Light Conditions by Accentuating Gaussian–Sobel Edge Features from Depth Maps
by Minyoung Jung and Jeongho Cho
Appl. Sci. 2024, 14(18), 8326; https://doi.org/10.3390/app14188326 - 15 Sep 2024
Abstract
Owing to the low detection accuracy of camera-based object detection models, various fusion techniques with Light Detection and Ranging (LiDAR) have been attempted. This has resulted in improved detection of objects that are difficult to detect due to partial occlusion by obstacles or unclear silhouettes. However, the detection performance remains limited in low-light environments where small pedestrians are located far from the sensor or pedestrians have difficult-to-estimate shapes. This study proposes an object detection model that employs a Gaussian–Sobel filter. This filter combines Gaussian blurring, which suppresses the effects of noise, and a Sobel mask, which accentuates object features, to effectively utilize depth maps generated by LiDAR for object detection. The model performs independent pedestrian detection using the real-time object detection model You Only Look Once v4, based on RGB images obtained using a camera and depth maps preprocessed by the Gaussian–Sobel filter, and estimates the optimal pedestrian location using non-maximum suppression. This enables accurate pedestrian detection while maintaining a high detection accuracy even in low-light or external-noise environments, where object features and contours are not well defined. The test evaluation results demonstrated that the proposed method achieved at least 1–7% higher average precision than the state-of-the-art models under various environments. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
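
The Gaussian–Sobel preprocessing step described in this abstract can be illustrated with standard OpenCV calls. The sketch below is a minimal approximation under assumed kernel sizes and sigma; it is not the authors' exact configuration.

```python
# Minimal sketch of Gaussian-Sobel edge accentuation on a LiDAR depth map.
# Kernel sizes and sigma are illustrative assumptions, not the paper's settings.
import cv2
import numpy as np

def gaussian_sobel(depth_map: np.ndarray, ksize: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Suppress noise with Gaussian blurring, then accentuate edges with a Sobel mask."""
    blurred = cv2.GaussianBlur(depth_map, (ksize, ksize), sigma)
    grad_x = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)   # horizontal gradient
    grad_y = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)   # vertical gradient
    magnitude = cv2.magnitude(grad_x, grad_y)                # edge strength
    return cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# depth = cv2.imread("depth_map.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
# edges = gaussian_sobel(depth)
```

The blur suppresses noise in the projected depth map before the Sobel mask accentuates object contours, which is the stated motivation for combining the two filters.
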
Figures:
Figure 1: Block diagram of the proposed multi-sensor-based detection model.
Figure 2: Process for generating a depth map for image registration: (a) RGB image; (b) PCD projected on RGB image; (c) depth map.
Figure 3: Preprocessing of depth maps using the Gaussian–Sobel filter: (a) depth map; (b) depth map after Gaussian filtering; (c) depth map after Gaussian–Sobel filtering; (d) depth map after Canny edge filtering.
Figure 4: Flowchart for non-maximum suppression (NMS).
Figure 5: Comparison of pedestrian detection performance of the proposed model and similar models at 100% brightness: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
Figure 6: Comparison of pedestrian detection performance of the proposed model and similar models at 40% brightness: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
Figure 7: Comparison of the pedestrian detection performance of the proposed model and similar models at 40% brightness and 0.5% noise level: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
15 pages, 12764 KiB  
Article
Learning Unsupervised Cross-Domain Model for TIR Target Tracking
by Xiu Shu, Feng Huang, Zhaobing Qiu, Xinming Zhang and Di Yuan
Mathematics 2024, 12(18), 2882; https://doi.org/10.3390/math12182882 - 15 Sep 2024
Abstract
The limited availability of thermal infrared (TIR) training samples leads to suboptimal target representation by convolutional feature extraction networks, which adversely impacts the accuracy of TIR target tracking methods. To address this issue, we propose an unsupervised cross-domain model (UCDT) for TIR tracking. Our approach leverages labeled training samples from the RGB domain (source domain) to train a general feature extraction network. We then employ a cross-domain model to adapt this network for effective target feature extraction in the TIR domain (target domain). This cross-domain strategy addresses the challenge of limited TIR training samples effectively. Additionally, we utilize an unsupervised learning technique to generate pseudo-labels for unlabeled training samples in the source domain, which helps overcome the limitations imposed by the scarcity of annotated training data. Extensive experiments demonstrate that our UCDT tracking method outperforms existing tracking approaches on the PTB-TIR and LSOTB-TIR benchmarks. Full article
18 pages, 5572 KiB  
Article
Visual-Inertial RGB-D SLAM with Encoder Integration of ORB Triangulation and Depth Measurement Uncertainties
by Zhan-Wu Ma and Wan-Sheng Cheng
Sensors 2024, 24(18), 5964; https://doi.org/10.3390/s24185964 - 14 Sep 2024
Viewed by 217
Abstract
In recent years, the accuracy of visual SLAM (Simultaneous Localization and Mapping) technology has seen significant improvements, making it a prominent area of research. However, within the current RGB-D SLAM systems, the estimation of 3D positions of feature points primarily relies on direct measurements from RGB-D depth cameras, which inherently contain measurement errors. Moreover, the potential of triangulation-based estimation for ORB (Oriented FAST and Rotated BRIEF) feature points remains underutilized. To address the singularity of measurement data, this paper proposes the integration of the ORB features, triangulation uncertainty estimation and depth measurements uncertainty estimation, for 3D positions of feature points. This integration is achieved using a CI (Covariance Intersection) filter, referred to as the CI-TEDM (Triangulation Estimates and Depth Measurements) method. Vision-based SLAM systems face significant challenges, particularly in environments, such as long straight corridors, weakly textured scenes, or during rapid motion, where tracking failures are common. To enhance the stability of visual SLAM, this paper introduces an improved CI-TEDM method by incorporating wheel encoder data. The mathematical model of the encoder is proposed, and detailed derivations of the encoder pre-integration model and error model are provided. Building on these improvements, we propose a novel tightly coupled visual-inertial RGB-D SLAM with encoder integration of ORB triangulation and depth measurement uncertainties. Validation on open-source datasets and real-world environments demonstrates that the proposed improvements significantly enhance the robustness of real-time state estimation and localization accuracy for intelligent vehicles in challenging environments. Full article
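
The Covariance Intersection (CI) update that the CI-TEDM method builds on fuses two estimates whose cross-correlation is unknown. The sketch below shows only the generic CI formula, with a common trace-minimizing choice of the weight; it is not the paper's implementation.

```python
# Generic Covariance Intersection (CI) fusion of two 3D point estimates,
# e.g., a triangulated position and a depth-camera measurement.
# This is a textbook CI update, not the CI-TEDM code from the paper.
import numpy as np

def ci_fuse(x1, P1, x2, P2, omega: float):
    """Fuse estimates (x1, P1) and (x2, P2) with weight omega in [0, 1]."""
    P1_inv, P2_inv = np.linalg.inv(P1), np.linalg.inv(P2)
    P_fused = np.linalg.inv(omega * P1_inv + (1.0 - omega) * P2_inv)
    x_fused = P_fused @ (omega * P1_inv @ x1 + (1.0 - omega) * P2_inv @ x2)
    return x_fused, P_fused

def ci_fuse_min_trace(x1, P1, x2, P2, steps: int = 100):
    """Pick omega by minimizing the trace of the fused covariance (a common heuristic)."""
    omegas = np.linspace(1e-3, 1 - 1e-3, steps)
    best = min(omegas, key=lambda w: np.trace(ci_fuse(x1, P1, x2, P2, w)[1]))
    return ci_fuse(x1, P1, x2, P2, best)
```
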
Figures:
Figure 1: The system framework diagram, consisting of three main modules: input, function, and output.
Figure 2: An example diagram of reprojection error. Feature matching indicates that points p1 and p2 are projections of the same spatial point P, but the camera pose is initially unknown. Initially, there is a certain distance between the projected point p̂2 of P and the actual point p2; the camera pose is then adjusted to minimize this distance.
Figure 3: The motion model of the wheeled robot using wheel encoders, illustrating the robot's trajectory in the 2D plane between its positions at times t_k and t_(k+1).
Figure 4: The process of running datasets in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene. Blue frames represent keyframes, red frames initial keyframes, and green frames current frames.
Figure 5: The process of tracking datasets in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene. Green boxes represent key feature points detected by the VEOS3-TEDM algorithm.
Figure 6: Comparison between estimated and true trajectories in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene.
Figure 7: Comparison between true and estimated trajectories in the x, y, and z directions using the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene.
Figure 8: 3D point cloud maps: (a) corridor scene and (b) laboratory scene.
Figure 9: Images of the experimental platform: (a) front view and (b) left view.
Figure 10: The location of various components on the mobile robot: (a) bottom level and (b) upper level.
Figure 11: The process of tracking real-world environments in the VEOS3-TEDM algorithm: (a1,a2) laboratory, (b1,b2) hall, (c1,c2) weak texture scene, (d1,d2) long straight corridor. Green boxes represent key feature points detected by the VEOS3-TEDM algorithm.
Figure 12: A comparison of estimated and true trajectories in real-world environments using the VEOS3-TEDM algorithm.
23 pages, 11793 KiB  
Article
Detecting Canopy Gaps in Uneven-Aged Mixed Forests through the Combined Use of Unmanned Aerial Vehicle Imagery and Deep Learning
by Nyo Me Htun, Toshiaki Owari, Satoshi Tsuyuki and Takuya Hiroshima
Drones 2024, 8(9), 484; https://doi.org/10.3390/drones8090484 - 13 Sep 2024
Viewed by 365
Abstract
Canopy gaps and their associated processes play an important role in shaping forest structure and dynamics. Understanding the information about canopy gaps allows forest managers to assess the potential for regeneration and plan interventions to enhance regeneration success. Traditional field surveys for canopy gaps are time consuming and often inaccurate. In this study, canopy gaps were detected using unmanned aerial vehicle (UAV) imagery of two sub-compartments of an uneven-aged mixed forest in northern Japan. We compared the performance of U-Net and ResU-Net (U-Net combined with ResNet101) deep learning models using RGB, canopy height model (CHM), and fused RGB-CHM data from UAV imagery. Our results showed that the ResU-Net model, particularly when pre-trained on ImageNet (ResU-Net_2), achieved the highest F1-scores—0.77 in Sub-compartment 42B and 0.79 in Sub-compartment 16AB—outperforming the U-Net model (0.52 and 0.63) and the non-pre-trained ResU-Net model (ResU-Net_1) (0.70 and 0.72). ResU-Net_2 also achieved superior overall accuracy values of 0.96 and 0.97, outperforming previous methods that used UAV datasets with varying methodologies for canopy gap detection. These findings underscore the effectiveness of the ResU-Net_2 model in detecting canopy gaps in uneven-aged mixed forests. Furthermore, when these trained models were applied as transfer models to detect gaps specifically caused by selection harvesting using pre- and post-UAV imagery, they showed considerable potential, achieving moderate F1-scores of 0.54 and 0.56, even with a limited training dataset. Overall, our study demonstrates that combining UAV imagery with deep learning techniques, particularly pre-trained models, significantly improves canopy gap detection accuracy and provides valuable insights for forest management and future research. Full article
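
A ResU-Net-style model (U-Net with a ResNet101 encoder, optionally ImageNet-pretrained) on fused RGB + CHM input can be sketched with the segmentation_models_pytorch library. The library choice, patch size, and channel layout below are assumptions for illustration, not the authors' code.

```python
# Sketch of a ResU-Net-style model (U-Net with a ResNet101 encoder) on fused
# RGB + CHM input, using segmentation_models_pytorch as one possible implementation;
# the authors' exact architecture and training setup may differ.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet101",        # ResNet101 backbone
    encoder_weights="imagenet",      # ImageNet pre-training (cf. ResU-Net_2)
    in_channels=4,                   # 3 RGB bands + 1 CHM band, stacked
    classes=1,                       # binary canopy-gap mask
)

rgb_chm = torch.randn(2, 4, 256, 256)    # dummy fused RGB-CHM patches
gap_logits = model(rgb_chm)              # (2, 1, 256, 256) canopy-gap logits
```
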
Figures:
Figure 1: (a) Location map of the University of Tokyo Hokkaido Forest (UTHF); (b) location map of the two selected sub-compartments in the UTHF; orthomosaics of Sub-compartments (c) 42B and (d) 16AB.
Figure 2: Canopy height model of Sub-compartments (a) 42B and (b) 16AB.
Figure 3: Aerial orthophotos of Sub-compartment 42B (a) pre-selection harvesting and (b) post-selection harvesting.
Figure 4: Workflow for detecting canopy gaps in uneven-aged mixed forests using UAV imagery and deep learning models.
Figure 5: ResU-Net (U-Net with ResNet101 backbone) classification algorithm.
Figure 6: Visualization of the canopy gap distribution in Sub-compartment 42B predicted by the ResU-Net_2 model.
Figure 7: Visualization of the canopy gap distribution in Sub-compartment 16AB predicted by the ResU-Net_2 model.
Figure 8: Confusion matrices of canopy gap detection by selection harvesting using transfer models: (a) transfer model using the Sub-compartment 42B dataset (before extended training); (b) transfer model using the Sub-compartment 16AB dataset (before extended training); (c) transfer model using the Sub-compartment 42B dataset (after extended training); (d) transfer model using the Sub-compartment 16AB dataset (after extended training); and (e) the ResU-Net model.
Figure 9: Visualization of predicted canopy gaps using transfer models: (a) testing RGB image (post-selection harvesting); (b) labeled mask; (c,d) prediction results and misclassified regions for the transfer model on the Sub-compartment 42B dataset (before extended training); (e,f) the Sub-compartment 16AB dataset (before extended training); (g,h) the Sub-compartment 42B dataset (after extended training); (i,j) the Sub-compartment 16AB dataset (after extended training); (k,l) prediction results and misclassified regions using the ResU-Net model.
Figure 10: Training and validation accuracies and losses of (a,b) the transfer model using the Sub-compartment 42B dataset (before extended training); (c,d) the transfer model using the Sub-compartment 16AB dataset (before extended training); (e,f) the transfer model using the Sub-compartment 42B dataset (after extended training); (g,h) the transfer model using the Sub-compartment 16AB dataset (after extended training); and (i,j) the ResU-Net model.
Figure A1: Visualization of the canopy gap distribution in Sub-compartment 42B predicted by the (a) U-Net model using the RGB dataset, (b) U-Net model using the CHM dataset, (c) U-Net model using the fused RGB and CHM dataset, (d) ResU-Net_1 model using the RGB dataset, (e) ResU-Net_1 model using the CHM dataset, and (f) ResU-Net_1 model using the fused RGB and CHM dataset.
Figure A2: Visualization of the canopy gap distribution in Sub-compartment 16AB predicted by the same six model and dataset combinations as in Figure A1.
17 pages, 17092 KiB  
Article
Detection and Assessment of White Flowering Nectar Source Trees and Location of Bee Colonies in Rural and Suburban Environments Using Deep Learning
by Atanas Z. Atanasov, Boris I. Evstatiev, Asparuh I. Atanasov and Ivaylo S. Hristakov
Diversity 2024, 16(9), 578; https://doi.org/10.3390/d16090578 - 13 Sep 2024
Viewed by 193
Abstract
Environmental pollution with pesticides as a result of intensive agriculture harms the development of bee colonies. Bees are one of the most important pollinating insects on our planet. One of the ways to protect them is to relocate and build apiaries in populated areas. An important condition for the development of bee colonies is the rich species diversity of flowering plants and the size of the areas occupied by them. In this study, a methodology for detecting and distinguishing white flowering nectar source trees and counting bee colonies is developed and demonstrated, applicable in populated environments. It is based on UAV-obtained RGB imagery and two convolutional neural networks—a pixel-based one for identification of flowering areas and an object-based one for beehive identification, which achieved accuracies of 93.4% and 95.2%, respectively. Based on an experimental study near the village of Yuper (Bulgaria), the productive potential of black locust (Robinia pseudoacacia) areas in rural and suburban environments was determined. The obtained results showed that the identified blooming area corresponds to 3.654 m2, out of 89.725 m2 that were scanned with the drone, and the number of identified beehives was 149. The proposed methodology will facilitate beekeepers in choosing places for the placement of new apiaries and planning activities of an organizational nature. Full article
(This article belongs to the Special Issue Ecology and Diversity of Bees in Urban Environments)
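
Turning a predicted blooming mask into an area estimate is a matter of counting pixels and scaling by the squared ground sampling distance (GSD) of the orthomosaic; the GSD and mask below are placeholders, not values from the study.

```python
# Converting a pixel-based blooming mask into an area estimate via the
# ground sampling distance (GSD) of the orthomosaic. The GSD value below is a
# placeholder; it is not taken from the paper.
import numpy as np

def blooming_area_m2(mask: np.ndarray, gsd_m: float) -> float:
    """mask: binary array (1 = blooming pixel); gsd_m: pixel size in metres."""
    return float(mask.sum()) * gsd_m ** 2

mask = np.zeros((1000, 1000), dtype=np.uint8)
mask[200:400, 300:500] = 1                  # 40,000 blooming pixels
print(blooming_area_m2(mask, gsd_m=0.05))   # 100.0 m2 at an assumed 5 cm GSD
```
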
Figures:
Figure 1: Location of the experimental plot: (a) the village of Yuper; (b) the geographic location of the experimental area in the north-eastern part of Bulgaria.
Figure 2: Summary of the proposed methodology for analysis of the honey production potential.
Figure 3: Summary of the geo-referenced image in the Yuper region. The flight range of the bees is marked with yellow circles; the Robinia pseudoacacia areas are marked in green.
Figure 4: The merged images selected as reference data for recognizing blooming trees (marked in yellow).
Figure 5: Training and validation loss of the DeepLabV3 CNN model for blooming area identification.
Figure 6: Image used as reference data for training the beehive recognition model (a) and close-up image of an area with the beehives (b). All beehives are marked with yellow rectangles.
Figure 7: Training and validation loss of the Mask R-CNN model for beehive counting.
Figure 8: HQ map of the investigated area, generated using the UAV images. The yellow squares represent the locations of the UAV-obtained images.
Figure 9: Examples of false positives during beehive identification and counting. The red marks represent artificial objects incorrectly identified as beehives.
Figure 10: Identified beehives (marked in pink and green) in (a) area 1 and (b) area 2.
Figure 11: Graphical results from the pixel-based identification of blooming trees (marked in blue).
Figure 12: Control hive for monitoring the weight.
19 pages, 18432 KiB  
Article
Low-Cost Lettuce Height Measurement Based on Depth Vision and Lightweight Instance Segmentation Model
by Yiqiu Zhao, Xiaodong Zhang, Jingjing Sun, Tingting Yu, Zongyao Cai, Zhi Zhang and Hanping Mao
Agriculture 2024, 14(9), 1596; https://doi.org/10.3390/agriculture14091596 - 13 Sep 2024
Viewed by 196
Abstract
Plant height is a crucial indicator of crop growth. Rapid measurement of crop height facilitates the implementation and management of planting strategies, ensuring optimal crop production quality and yield. This paper presents a low-cost method for the rapid measurement of multiple lettuce heights, developed using an improved YOLOv8n-seg model and the stacking characteristics of planes in depth images. First, we designed a lightweight instance segmentation model based on YOLOv8n-seg by enhancing the model architecture and reconstructing the channel dimension distribution. This model was trained on a small-sample dataset augmented through random transformations. Secondly, we proposed a method to detect and segment the horizontal plane. This method leverages the stacking characteristics of the plane, as identified in the depth image histogram from an overhead perspective, allowing for the identification of planes parallel to the camera’s imaging plane. Subsequently, we evaluated the distance between each plane and the centers of the lettuce contours to select the cultivation substrate plane as the reference for lettuce bottom height. Finally, the height of multiple lettuce plants was determined by calculating the height difference between the top and bottom of each plant. The experimental results demonstrated that the improved model achieved a 25.56% increase in processing speed, along with a 2.4% enhancement in mean average precision compared to the original YOLOv8n-seg model. The average accuracy of the plant height measurement algorithm reached 94.339% in hydroponics and 91.22% in pot cultivation scenarios, with absolute errors of 7.39 mm and 9.23 mm, similar to the sensor’s depth direction error. With images downsampled by a factor of 1/8, the highest processing speed recorded was 6.99 frames per second (fps), enabling the system to process an average of 174 lettuce targets per second. The experimental results confirmed that the proposed method exhibits promising accuracy, efficiency, and robustness. Full article
(This article belongs to the Special Issue Smart Agriculture Sensors and Monitoring Systems for Field Detection)
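
The plane-stacking idea behind the height measurement can be sketched as follows: planes parallel to the camera appear as peaks in the depth-image histogram, and the selected substrate plane serves as the bottom reference for each lettuce mask. The bin width and peak threshold below are illustrative assumptions, not the paper's parameters.

```python
# Simplified sketch of the plane-stacking idea: from an overhead depth image,
# planes parallel to the camera show up as peaks in the depth histogram; the
# substrate plane then serves as the height reference for each lettuce mask.
import numpy as np

def plane_depths(depth_mm: np.ndarray, bin_mm: int = 5, min_pixels: int = 5000):
    """Return candidate plane depths (mm) as histogram peaks of the depth image."""
    bins = np.arange(depth_mm.min(), depth_mm.max() + bin_mm, bin_mm)
    counts, edges = np.histogram(depth_mm[depth_mm > 0], bins=bins)
    peaks = edges[:-1][counts > min_pixels]
    return peaks + bin_mm / 2.0

def lettuce_height_mm(depth_mm: np.ndarray, lettuce_mask: np.ndarray, substrate_depth: float):
    """Height = substrate depth minus the depth of the plant's highest point."""
    top_depth = depth_mm[lettuce_mask & (depth_mm > 0)].min()
    return substrate_depth - top_depth
```
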
Figures:
Figure 1: Lettuce growing environment.
Figure 2: Plant height measurement tool.
Figure 3: Examples of random transformations.
Figure 4: YOLOv8n-seg structure.
Figure 5: Structure of YOLOv8-seg with FasterNet as the backbone.
Figure 6: Hydroponics scenario: (A) distribution of depth image pixels along the depth axis, (B) histogram of the depth image. Potting scenario: (C) distribution of depth image pixels along the depth axis, (D) histogram of the depth image.
Figure 7: (A,C) Results of plane detection based on pixel stacking. (B,D) Image region division based on crop centers.
Figure 8: Algorithm flow diagram.
Figure 9: Model channel dimension comparisons (before multiplying by the width coefficient of the model).
Figure 10: mAP changes of the seven models during model training.
Figure 11: Segmentation performance comparison of the seven models with target confidence scores.
Figure 12: Heat maps of the last layer of different backbones.
Figure 13: Lettuce height measurement outputs (mm).
Figure 14: Plant height measurement results in the hydroponics scenario.
Figure 15: Plant height measurement results for potted lettuce.
Figure 16: Segmentation comparison between the vegetation index method and Model 5.
Figure 17: Comparison of different plane detection algorithms.
15 pages, 10244 KiB  
Article
Identification of Floating Green Tide in High-Turbidity Water from Sentinel-2 MSI Images Employing NDVI and CIE Hue Angle Thresholds
by Lin Wang, Qinghui Meng, Xiang Wang, Yanlong Chen, Xinxin Wang, Jie Han and Bingqiang Wang
J. Mar. Sci. Eng. 2024, 12(9), 1640; https://doi.org/10.3390/jmse12091640 - 13 Sep 2024
Viewed by 180
Abstract
Remote sensing technology is widely used to obtain information on floating green tides, and thresholding methods based on indices such as the normalized difference vegetation index (NDVI) and the floating algae index (FAI) play an important role in such studies. However, as the methods are influenced by many factors, the threshold values vary greatly; in particular, the error of data extraction clearly increases in situations of high-turbidity water (HTW) (NDVI > 0). In this study, high spatial resolution, multispectral images from the Sentinel-2 MSI mission were used as the data source. It was found that the International Commission on Illumination (CIE) hue angle calculated using remotely sensed equivalent multispectral reflectance data and the RGB method is extremely effective in distinguishing floating green tides from areas of HTW. Statistical analysis of Sentinel-2 MSI images showed that the threshold value of the hue angle that can effectively eliminate the effect of HTW is 218.94°. A test demonstration of the method for identifying the floating green tide in HTW in a Sentinel-2 MSI image was carried out using the identified threshold values of NDVI > 0 and CIE hue angle < 218.94°. The demonstration showed that the method effectively eliminates misidentification caused by HTW pixels (NDVI > 0), resulting in better consistency of the identification of the floating green tide and its distribution in the true color image. The method enables rapid and accurate extraction of information on floating green tide in HTW, and offers a new solution for the monitoring and tracking of green tides in coastal areas. Full article
(This article belongs to the Section Marine Environmental Science)
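
A rough sketch of the two thresholds is given below: NDVI from the Sentinel-2 B8 (NIR) and B4 (red) bands, and a CIE hue angle computed from chromaticity coordinates relative to the D65 white point. The RGB-to-XYZ matrix and hue-angle convention are one common choice and may not match the paper's exact RGB-based formulation; the threshold values themselves are those reported in the abstract.

```python
# Sketch of the combined NDVI / CIE hue angle thresholding described above.
# The sRGB-to-XYZ matrix and hue-angle convention are assumptions (one common
# definition); the thresholds NDVI > 0 and hue < 218.94 deg come from the abstract.
import numpy as np

RGB2XYZ = np.array([[0.4124, 0.3576, 0.1805],
                    [0.2126, 0.7152, 0.0722],
                    [0.0193, 0.1192, 0.9505]])
WHITE_XY = (0.3127, 0.3290)  # D65 white point

def ndvi(b8, b4):
    return (b8 - b4) / (b8 + b4 + 1e-12)

def hue_angle_deg(r, g, b):
    X, Y, Z = np.tensordot(RGB2XYZ, np.stack([r, g, b]), axes=1)
    s = X + Y + Z + 1e-12
    x, y = X / s, Y / s
    ang = np.degrees(np.arctan2(y - WHITE_XY[1], x - WHITE_XY[0]))
    return np.mod(ang, 360.0)

def green_tide_mask(b8, b4, r, g, b):
    """NDVI > 0 and hue angle < 218.94 deg, per the thresholds reported above."""
    return (ndvi(b8, b4) > 0) & (hue_angle_deg(r, g, b) < 218.94)
```
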
Figures:
Figure 1: The spatial distribution of the in situ optical observation stations and the spatial coverage of the satellite data used in this study.
Figure 2: The in situ measured hyperspectral reflectance of typical water bodies and floating green tides of different coverages.
Figure 3: Scatterplots showing the relationship between the NDVI value and the hue angle calculated using (a) the in situ measured hyperspectral reflectance, and the corresponding Sentinel-2 MSI multispectral reflectance in (b) the five visible bands and (c) the three RGB bands.
Figure 4: Sentinel-2 MSI images (23 May 2023) and the corresponding pixel identification results (red areas) based on NDVI > 0 for floating green tides in HTW.
Figure 5: The distribution of pixel counts at different hue angles when NDVI > 0.
Figure 6: (a,d,g) Sentinel-2 MSI true-color images obtained on 7 June 2022, 23 May 2023, and 1 June 2024; (b,e,h) identification results obtained using the traditional NDVI thresholding method (NDVI > 0); and (c,f,i) identification results obtained using the method proposed in this study (NDVI > 0 and hue angle < 218.94°).
Figure 7: The distribution of pixel counts within the variation interval of sensitivity factors, including (a) hue angle, (b–h) reflectance values in B2–B8, and (i) the B4/B3 reflectance ratio.
24 pages, 17247 KiB  
Article
Efficient Lossy Compression of Video Sequences of Automotive High-Dynamic Range Image Sensors for Advanced Driver-Assistance Systems and Autonomous Vehicles
by Paweł Pawłowski and Karol Piniarski
Electronics 2024, 13(18), 3651; https://doi.org/10.3390/electronics13183651 - 13 Sep 2024
Viewed by 299
Abstract
In this paper, we introduce an efficient lossy coding procedure specifically tailored for handling video sequences of automotive high-dynamic range (HDR) image sensors in advanced driver-assistance systems (ADASs) for autonomous vehicles. Nowadays, mainly for security reasons, lossless compression is used in the automotive industry. However, it offers very low compression rates. To obtain higher compression rates, we suggest using lossy codecs, especially when testing image processing algorithms in software in-the-loop (SiL) or hardware-in-the-loop (HiL) conditions. Our approach leverages the high-quality VP9 codec, operating in two distinct modes: grayscale image compression for automatic image analysis and color (in RGB format) image compression for manual analysis. In both modes, images are acquired from the automotive-specific RCCC (red, clear, clear, clear) image sensor. The codec is designed to achieve a controlled image quality and state-of-the-art compression ratios while maintaining real-time feasibility. In automotive applications, the inherent data loss poses challenges associated with lossy codecs, particularly in rapidly changing scenes with intricate details. To address this, we propose configuring the lossy codecs in variable bitrate (VBR) mode with a constrained quality (CQ) parameter. By adjusting the quantization parameter, users can tailor the codec behavior to their specific application requirements. In this context, a detailed analysis of the quality of lossy compressed images in terms of the structural similarity index metric (SSIM) and the peak signal-to-noise ratio (PSNR) metrics is presented. With this analysis, we extracted some codec parameters, which have an important impact on preservation of video quality and compression ratio. The proposed compression settings are very efficient: the compression ratios vary from 51 to 7765 for grayscale image mode and from 4.51 to 602.6 for RGB image mode, depending on the specified output image quality settings. We reached 129 frames per second (fps) for compression and 315 fps for decompression in grayscale mode and 102 fps for compression and 121 fps for decompression in the RGB mode. These make it possible to achieve a much higher compression ratio compared to lossless compression while maintaining control over image quality. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
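
The SSIM and PSNR analysis described above can be reproduced on decoded frames with scikit-image; the VP9 VBR/CQ configuration itself happens in the encoder and is not shown here, and the frame data are assumed to be grayscale arrays of equal size.

```python
# Scoring a decompressed frame against its original with the SSIM and PSNR metrics
# discussed above, using scikit-image; inputs are placeholders.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(original: np.ndarray, decoded: np.ndarray):
    """Return (PSNR in dB, SSIM) for two 8-bit grayscale frames of equal size."""
    psnr = peak_signal_noise_ratio(original, decoded, data_range=255)
    ssim = structural_similarity(original, decoded, data_range=255)
    return psnr, ssim

def compression_ratio(raw_bytes: int, encoded_bytes: int) -> float:
    """Compression ratio as the quotient of raw and encoded stream sizes."""
    return raw_bytes / encoded_bytes
```
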
Figures:
Figure 1: Most popular color filter arrays (CFAs) in automotive sensors: (a) monochrome (CCCC), (b) RCCC, (c) RCCB, (d) RGCB, (e) RYYc, (f) RGGB (C—clear; R—red; B—blue; G—grey; Y—yellow; c—cyan).
Figure 2: An illustrative example of the variability of the output stream (bitrate) and image quality across Q, CQ, CBR, and VBR modes for the VP9 codec [25].
Figure 3: Lossy compression process scheme used for RCCC images with conversion to monochrome or RGB images.
Figure 4: PSNR and SSIM within sequence 3 for various GOP sizes and quality settings (CRF = 15 for high quality and CRF = 55 for reduced quality).
Figure 5: One image from sequence 7 presented in RGB color space.
21 pages, 5815 KiB  
Article
Enhancing the Image Pre-Processing for Large Fleets Based on a Fuzzy Approach to Handle Multiple Resolutions
by Ching-Yun Mu and Pin Kung
Appl. Sci. 2024, 14(18), 8254; https://doi.org/10.3390/app14188254 - 13 Sep 2024
Viewed by 236
Abstract
Image pre-processing is crucial for large fleet management. Many traffic videos are collected by closed-circuit television (CCTV), which monitors a fixed area for image analysis; this paper instead adopts the front camera installed in large vehicles to obtain moving traffic images, which CCTV cannot provide. In practice, fleets often install cameras with different resolutions due to cost considerations, and the cameras evaluate front images containing traffic lights. This paper therefore proposes fuzzy enhancement with RGB and CIELAB conversions to handle multiple resolutions, and provides image pre-processing adjustment comparisons that enable further model training and analysis. The fuzzy enhancement and fuzzy enhancement with brightness adjustment produced images with lower MSE and higher PSNR for the front-view images, and fuzzy enhancement can also be used to improve traffic light image adjustments. Moreover, this study employed You Only Look Once Version 9 (YOLOv9) for model training, and YOLOv9 with fuzzy enhancement obtained better detection performance. The proposed fuzzy enhancement allows more flexible adjustments for pre-processing tasks and provides guidance for fleet managers to perform consistent image-enhancement adjustments when handling multiple resolutions. Full article
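
As a rough illustration of fuzzy enhancement on a CIELAB-converted frame, the sketch below applies the classic fuzzy intensification (INT) operator to the L channel; the paper's actual membership functions and rules are not reproduced here, and the input file name is hypothetical.

```python
# Generic sketch of fuzzy contrast intensification applied to the L channel after
# an RGB-to-CIELAB conversion. The membership mapping below is the classic
# intensification (INT) operator and is only an illustration of the idea.
import cv2
import numpy as np

def fuzzy_enhance(bgr: np.ndarray) -> np.ndarray:
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    L = lab[:, :, 0].astype(np.float32) / 255.0                  # fuzzify to [0, 1]
    mu = np.where(L <= 0.5, 2 * L ** 2, 1 - 2 * (1 - L) ** 2)    # intensification
    lab[:, :, 0] = np.clip(mu * 255.0, 0, 255).astype(np.uint8)  # defuzzify
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

# enhanced = fuzzy_enhance(cv2.imread("front_camera_frame.jpg"))  # hypothetical frame
```
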
Figures:
Figure 1: Flowchart for image enhancement [40].
Figure 2: Image counts by resolution category.
Figure 3: Labeled traffic light data.
Figure 4: Using RGB for image conversion.
Figure 5: Input function.
Figure 6: A comparison of three different image enhancements.
Figure 7: Display of 33 MSE performance values for fuzzy enhancement and HE (Unit: a.u.).
Figure 8: Display of 33 PSNR performance values for fuzzy enhancement and HE (Unit: a.u.).
Figure 9: Comparison results of precision (Unit: a.u.).
Figure 10: Comparison results of recall (Unit: a.u.).
Figure 11: Comparison results of mAP@0.5 (Unit: a.u.).
Figure 12: Comparison of loss curves.
Figure 13: Confusion matrix of YOLOv9 results.
17 pages, 5434 KiB  
Article
HyperKon: A Self-Supervised Contrastive Network for Hyperspectral Image Analysis
by Daniel La’ah Ayuba, Jean-Yves Guillemaut, Belen Marti-Cardona and Oscar Mendez
Remote Sens. 2024, 16(18), 3399; https://doi.org/10.3390/rs16183399 - 12 Sep 2024
Viewed by 373
Abstract
The use of a pretrained image classification model (trained on cats and dogs, for example) as a perceptual loss function for hyperspectral super-resolution and pansharpening tasks is surprisingly effective. However, RGB-based networks do not take full advantage of the spectral information in hyperspectral data. This inspired the creation of HyperKon, a dedicated hyperspectral Convolutional Neural Network backbone built with self-supervised contrastive representation learning. HyperKon uniquely leverages the high spectral continuity, range, and resolution of hyperspectral data through a spectral attention mechanism. We also perform a thorough ablation study on different kinds of layers, showing their performance in understanding hyperspectral layers. Notably, HyperKon achieves a remarkable 98% Top-1 retrieval accuracy and surpasses traditional RGB-trained backbones in both pansharpening and image classification tasks. These results highlight the potential of hyperspectral-native backbones and herald a paradigm shift in hyperspectral image analysis. Full article
(This article belongs to the Special Issue Advances in Hyperspectral Remote Sensing Image Processing)
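
The spectral attention mechanism can be pictured as a squeeze-and-excitation block acting over spectral bands. The sketch below is a generic SE block with assumed layer sizes and band count, not the HyperKon source code.

```python
# A squeeze-and-excitation block applied over spectral channels, as one way to
# realize the spectral attention (SEB) idea described above.
import torch
import torch.nn as nn

class SpectralSE(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-band attention weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, bands, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                       # excite: reweight spectral bands

# attn = SpectralSE(channels=224)          # e.g., 224 hyperspectral bands (assumed)
# out = attn(torch.randn(2, 224, 64, 64))
```
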
Figures:
Figure 1: General overview of the HyperKon system architecture.
Figure 2: Illustration of HSI contrastive sampling.
Figure 3: Conceptual comparison of HSI vs. RGB perceptual loss.
Figure 4: Top-1 HSI retrieval accuracy achieved by various versions of the HyperKon model during the pretraining phase. The performance of each version is presented as a bar in the chart, illustrating how the integration of different components, such as 3D convolutions, DSC, the CBAM, and the SEB, affected the model's accuracy. The chart underscores the superior performance of the SEB.
Figure 5: Top-1 HSI retrieval accuracy for the 3D Conv, SEB, and CBAM models following dimensionality reduction using PCA. The graph indicates a superior performance by the 3D convolution model.
Figure 6: Top-1 HSI retrieval accuracy for the 3D Conv, SEB, and CBAM models when manual band selection was employed. It shows an initial advantage for 3D Conv, but over time the SEB and CBAM models catch up to similar levels of performance.
Figure 7: Visual results generated by different pansharpening algorithms (HyperPNN [49], DARN [45], GPPNN [50], HyperTransformer [51], HyperKon (ours), and ground truth) for the Pavia Center [42], Botswana [43], Chikusei [44], and EnMAP [26] datasets. MAE denotes the (normalized) mean absolute error across all spectral bands.
Figure 8: HyperKon image classification visualization for the Indian Pines, Pavia University, and Salinas datasets: (a) predicted classification map generated by HyperKon; (b) predicted classification map with masked regions (showing only labeled areas); (c) predicted accuracy map: green for correct predictions, red for incorrect predictions, and black for unlabeled areas; (d) ground-truth classification map; and (e) original RGB image.
Figure 9: Zoomed-out predicted accuracy map for Indian Pines: green for correct predictions, red for incorrect predictions, and black for unlabeled areas.
29 pages, 9403 KiB  
Article
DIO-SLAM: A Dynamic RGB-D SLAM Method Combining Instance Segmentation and Optical Flow
by Lang He, Shiyun Li, Junting Qiu and Chenhaomin Zhang
Sensors 2024, 24(18), 5929; https://doi.org/10.3390/s24185929 - 12 Sep 2024
Viewed by 316
Abstract
Feature points from moving objects can negatively impact the accuracy of Visual Simultaneous Localization and Mapping (VSLAM) algorithms, while detection or semantic segmentation-based VSLAM approaches often fail to accurately determine the true motion state of objects. To address this challenge, this paper introduces DIO-SLAM: Dynamic Instance Optical Flow SLAM, a VSLAM system specifically designed for dynamic environments. Initially, the detection thread employs YOLACT (You Only Look At CoefficienTs) to distinguish between rigid and non-rigid objects within the scene. Subsequently, the optical flow thread estimates optical flow and introduces a novel approach to capture the optical flow of moving objects by leveraging optical flow residuals. Following this, an optical flow consistency method is implemented to assess the dynamic nature of rigid object mask regions, classifying them as either moving or stationary rigid objects. To mitigate errors caused by missed detections or motion blur, a motion frame propagation method is employed. Lastly, a dense mapping thread is incorporated to filter out non-rigid objects using semantic information, track the point clouds of rigid objects, reconstruct the static background, and store the resulting map in an octree format. Experimental results demonstrate that the proposed method surpasses current mainstream dynamic VSLAM techniques in both localization accuracy and real-time performance. Full article
(This article belongs to the Special Issue Sensors and Algorithms for 3D Visual Analysis and SLAM)
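
A simplified approximation of the optical-flow-residual test is sketched below: estimate dense flow, compensate camera self-motion with the median background flow, and flag a rigid-object mask as moving if its residual magnitude is large. The paper uses an iterative residual scheme; the median-based compensation and the threshold here are assumptions for illustration only.

```python
# Simplified approximation of the optical-flow-residual idea described above.
import cv2
import numpy as np

def rigid_mask_is_moving(prev_gray, curr_gray, mask, thresh_px: float = 1.5) -> bool:
    """mask: boolean region of a rigid object detected by instance segmentation."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    background = ~mask
    cam_flow = np.median(flow[background], axis=0)   # crude camera self-motion estimate
    residual = flow - cam_flow                       # object-induced flow
    mag = np.linalg.norm(residual[mask], axis=1)
    return float(np.median(mag)) > thresh_px         # moving vs. stationary rigid object
```
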
Figures:
Figure 1: Performance of the traditional ORB-SLAM3 algorithm in highly dynamic environments: (a) image of the highly dynamic scene; (b) feature point extraction in the highly dynamic scene, where yellow boxes indicate moving objects and dynamic feature points are marked in red; (c) comparison between the estimated and ground-truth camera poses; (d) reconstruction results of dense mapping.
Figure 2: The overall system framework of DIO-SLAM. Key innovations are highlighted in red font, while the original ORB-SLAM3 framework is represented by unfilled boxes: (a) detection thread (green boxes); (b) optical flow thread (blue boxes); (c) dynamic feature point filtering module, composed of both the detection and optical flow threads; (d) independent dense mapping thread.
Figure 3: Instance segmentation results and non-rigid object mask extraction: (a) RGB frame used for segmentation; (b) instance segmentation output.
Figure 4: Separation of non-rigid and rigid object masks based on semantic information.
Figure 5: Optical flow network inputs and output: (a) frame n−1; (b) frame n; (c) dense optical flow.
Figure 6: Optical flow changes between adjacent frames.
Figure 7: Iterative removal of camera self-motion flow using optical flow residuals: (a) original dense optical flow; (b) 5 iterations; (c) 7 iterations; (d) 9 iterations.
Figure 8: Optical flow consistency for determining the moving rigid object region.
Figure 9: Motion frame propagation.
Figure 10: Effect of dynamic feature point removal. The colored areas depict the optical flow of moving rigid objects, while the green areas indicate the final extracted feature points. The feature points of non-rigid objects, such as the human body, are removed in all scenes. (a,b) A chair is being dragged, with its feature points removed. (c,d) Hitting a balloon, where the feature points on the balloon are removed. (e) The box is stationary, and feature points are extracted normally. (f) The box is being moved, with its feature points removed. (g,h) The box is put down, and its feature points are restored.
Figure 11: Absolute trajectory error and relative pose error of fr3_walking_xyz.
Figure 12: Absolute trajectory error and relative pose error of fr3_walking_static.
Figure 13: Absolute trajectory error and relative pose error of fr3_walking_rpy.
Figure 14: Absolute trajectory error and relative pose error of fr3_walking_halfsphere.
Figure 15: Absolute trajectory error and relative pose error of fr3_sitting_static.
Figure 16: Dense point cloud reconstruction: (a) RGB frame, dense point cloud, and octree map of the fr3_walking_xyz sequence; (b) RGB frame, dense point cloud, and octree map of the moving_nonobstructing_box sequence.
Figure 17: Point cloud error heatmaps: (a) kt0 sequence; (b) kt1 sequence; (c) kt2 sequence; (d) kt3 sequence.
Figure 18: Real-world scenario test results: (a) color images; (b) depth images; (c) optical flow of moving objects; (d) moving rigid object masks; (e) feature points in traditional ORB-SLAM3; (f) feature points in DIO-SLAM.
18 pages, 8682 KiB  
Article
Analysis of Factors Influencing the Precision of Body Tracking Outcomes in Industrial Gesture Control
by Aleksej Weber, Markus Wilhelm and Jan Schmitt
Sensors 2024, 24(18), 5919; https://doi.org/10.3390/s24185919 - 12 Sep 2024
Viewed by 192
Abstract
The body tracking systems on the current market offer a wide range of options for tracking the movements of objects, people, or extremities. The precision of this technology is often limited and determines its field of application. This work aimed to identify relevant technical and environmental factors that influence the performance of body tracking in industrial environments. The influence of light intensity, range of motion, speed of movement and direction of hand movement was analyzed individually and in combination. The hand movement of a test person was recorded with an Azure Kinect at a distance of 1.3 m. The joints in the center of the hand showed the highest accuracy compared to other joints. The best results were achieved at a luminous intensity of 500 lx, and movements in the x-axis direction were more precise than in the other directions. The greatest inaccuracy was found in the z-axis direction. A larger range of motion resulted in higher inaccuracy, with the lowest data scatter at a 100 mm range of motion. No significant difference was found at hand velocity of 370 mm/s, 670 mm/s and 1140 mm/s. This study emphasizes the potential of RGB-D camera technology for gesture control of industrial robots in industrial environments to increase efficiency and ease of use. Full article
(This article belongs to the Section Industrial Sensors)
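
The scatter measure suggested by the figures (a quadratic regression over the recorded joint positions, followed by the mean absolute error of the residuals) can be sketched as below; the data are synthetic and the axis choice is illustrative.

```python
# One way to quantify tracking scatter: fit a quadratic regression to the recorded
# joint positions along a movement axis and take the MAE of the residuals.
import numpy as np

def quadratic_mae(t: np.ndarray, positions_mm: np.ndarray) -> float:
    """MAE (mm) of joint positions around a fitted quadratic trend."""
    coeffs = np.polyfit(t, positions_mm, deg=2)          # quadratic regression
    fitted = np.polyval(coeffs, t)
    return float(np.mean(np.abs(positions_mm - fitted)))

t = np.linspace(0, 2, 200)                               # dummy 2 s recording
x_mm = 50 * t ** 2 + np.random.normal(0, 3, t.size)      # synthetic joint track
print(f"MAE: {quadratic_mae(t, x_mm):.2f} mm")
```
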
Figures:
Figure 1: Experimental setups in the x-direction (top left), y-direction (top right), and z-direction (bottom).
Figure 2: Exemplary distribution of the measured positions of joint (15) on the x- and y-axes without a regression function.
Figure 3: Exemplary distribution of the measured positions of joint (15) on the x- and y-axes with a quadratic regression function (red curve).
Figure 4: Distribution of the MAEs per direction of movement for all data sets of joint (15) to joint (17).
Figure 5: Distribution of MAE_1 as a function of hand movement direction and light intensity [lx].
Figure 6: Distribution of MAE_2 as a function of hand movement direction and light intensity [lx].
Figure 7: Distribution of MAE_1 as a function of hand movement direction and range of hand movement [mm].
Figure 8: Distribution of MAE_2 as a function of hand movement direction and range of hand movement [mm].
Figure 9: Distribution of MAE_1 as a function of hand movement direction and velocity of hand movement [mm/s].
Figure 10: Distribution of MAE_2 as a function of hand movement direction and velocity of hand movement [mm/s].
21 pages, 13059 KiB  
Article
Change Detection for Forest Ecosystems Using Remote Sensing Images with Siamese Attention U-Net
by Ashen Iranga Hewarathna, Luke Hamlin, Joseph Charles, Palanisamy Vigneshwaran, Romiyal George, Selvarajah Thuseethan, Chathrie Wimalasooriya and Bharanidharan Shanmugam
Technologies 2024, 12(9), 160; https://doi.org/10.3390/technologies12090160 - 12 Sep 2024
Viewed by 462
Abstract
Forest ecosystems are critical components of Earth’s biodiversity and play vital roles in climate regulation and carbon sequestration. They face increasing threats from deforestation, wildfires, and other anthropogenic activities. Timely detection and monitoring of changes in forest landscapes pose significant challenges for government agencies. To address these challenges, we propose a novel pipeline by refining the U-Net design, including employing two different schemata of early fusion networks and a Siam network architecture capable of processing RGB images specifically designed to identify high-risk areas in forest ecosystems through change detection across different time frames in the same location. It annotates ground truth change maps in such time frames using an encoder–decoder approach with the help of an enhanced feature learning and attention mechanism. Our proposed pipeline, integrated with ResNeSt blocks and SE attention techniques, achieved impressive results in our newly created forest cover change dataset. The evaluation metrics reveal a Dice score of 39.03%, a kappa score of 35.13%, an F1-score of 42.84%, and an overall accuracy of 94.37%. Notably, our approach significantly outperformed multitasking model approaches in the ONERA dataset, boasting a precision of 53.32%, a Dice score of 59.97%, and an overall accuracy of 97.82%. Furthermore, it surpassed multitasking models in the HRSCD dataset, even without utilizing land cover maps, achieving a Dice score of 44.62%, a kappa score of 11.97%, and an overall accuracy of 98.44%. Although the proposed model had a lower F1-score than other methods, other performance metrics highlight its effectiveness in timely detection and forest landscape monitoring, advancing deep learning techniques in this field. Full article
(This article belongs to the Section Environmental Technology)
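To make the Siamese change-detection idea above more concrete, the following PyTorch sketch shows one plausible structure: a single shared encoder processes the two co-registered RGB frames, the features are fused by absolute difference, and a small decoder produces a change map. This is a minimal illustration under assumptions (the `SiameseChangeNet` class, the layer widths, and the difference-based fusion are hypothetical), not the authors' implementation, which additionally uses ResNeSt blocks and SE attention.

```python
# Minimal Siamese change-detection sketch in PyTorch (illustrative only).
# Two co-registered images share one encoder; features are fused by absolute
# difference and decoded into a single-channel change map.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in a basic U-Net stage.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class SiameseChangeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)  # per-pixel change logit

    def encode(self, x):
        f1 = self.enc1(x)
        f2 = self.enc2(self.pool(f1))
        return f1, f2

    def forward(self, img_t0, img_t1):
        a1, a2 = self.encode(img_t0)
        b1, b2 = self.encode(img_t1)             # shared weights (Siamese branch)
        d2 = torch.abs(a2 - b2)                  # fuse deep features by |difference|
        d1 = torch.abs(a1 - b1)
        x = self.up(d2)
        x = self.dec(torch.cat([x, d1], dim=1))  # skip connection on fused features
        return self.head(x)                      # change-map logits


if __name__ == "__main__":
    net = SiameseChangeNet()
    t0 = torch.randn(1, 3, 128, 128)
    t1 = torch.randn(1, 3, 128, 128)
    print(net(t0, t1).shape)  # torch.Size([1, 1, 128, 128])
```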
Figure 1: The proposed pipeline for change detection in high-threat zones in forests.
Figure 2: General U-Net architecture used in this study with the addition of feature learning modules.
Figure 3: Change detection process using an encoder–decoder approach with enhanced feature learning and an attention mechanism.
Figure 4: High-level overview of using AGs in the second strategy of applying attention.
Figure 5: Sample cropped patches from the original high-resolution satellite image.
Figure 6: Change annotation in a particular region (trios); t0 and t1 show two different instances of the same location in two different time periods.
Figure 7: Method of annotating changes between two different timestamps (t1, t2). (a) Extracted image from time frame T0; (b) extracted image from time frame T1.
Figure 8: Systematic patches obtained from dynamic cropping. (a) Patches extracted using stride and patch-size values from annotated image trios; (b) patches extracted randomly from annotated image trios.
Figure 9: Resultant images from color changes: (a) normal image; (b) increased brightness; (c) increased saturation; (d) randomly increased brightness and saturation.
Figure 10: Resultant segmented image after applying the change detection algorithm.
Figure 11: Attention blocks in the fully convolutional early fusion (FCEF) architecture compared with FCEF models without attention.
Figure 12: Attention blocks in the Siamese architecture compared with Siamese models without attention.
Figure 13: Validation loss graphs for (a) FCEF ResNeSt, (b) ResNeSt (Siam), (c) ResNeXt additive (Siam), and (d) ResNeSt SE (Siam).
Figure 14: Illustration of how a sample image is segmented, cropped, and normalized, ready for training at different time points. (a) Our study dataset; (b) HRSCD dataset; (c) ONERA dataset; ΔT denotes the time difference between consecutive frames.
19 pages, 20386 KiB  
Article
YOD-SLAM: An Indoor Dynamic VSLAM Algorithm Based on the YOLOv8 Model and Depth Information
by Yiming Li, Yize Wang, Liuwei Lu and Qi An
Electronics 2024, 13(18), 3633; https://doi.org/10.3390/electronics13183633 - 12 Sep 2024
Viewed by 249
Abstract
To address the low positioning accuracy and poor mapping quality of visual SLAM systems caused by low-quality dynamic object masks in indoor dynamic environments, an indoor dynamic VSLAM algorithm based on the YOLOv8 model and depth information (YOD-SLAM) is proposed on top of the ORB-SLAM3 system. Firstly, the YOLOv8 model obtains the original masks of a priori dynamic objects, and depth information is used to refine these masks. Secondly, the depth information and center point of each mask are used to determine a priori whether a dynamic object has been missed and whether its mask needs to be redrawn. Then, the mask edge distance and depth information are used to judge the movement state of non-prior dynamic objects. Finally, all dynamic object information is removed, and the remaining static objects are used for pose estimation and dense point cloud mapping. The accuracy of camera positioning and the quality of the dense point cloud maps are verified using the TUM RGB-D dataset and real-environment data. The results show that YOD-SLAM achieves higher positioning accuracy and better dense point cloud mapping in dynamic scenes than other advanced SLAM systems such as DS-SLAM and DynaSLAM. Full article
(This article belongs to the Section Computer Science & Engineering)
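As a rough sketch of the depth-based mask refinement step described in the abstract (an assumed interpretation, not the paper's actual algorithm), the NumPy snippet below keeps only the mask pixels whose depth lies near the mask's median depth, trimming background that the detector mask spills onto; the `refine_mask_with_depth` function and its tolerance are hypothetical.

```python
# Illustrative depth-based mask refinement (assumed behaviour, not the authors'
# implementation): keep only mask pixels whose depth lies close to the mask's
# median depth, discarding background leakage around the object.
import numpy as np


def refine_mask_with_depth(mask, depth, rel_tol=0.15):
    """mask: HxW bool array from the detector; depth: HxW depth map in metres."""
    refined = mask.copy()
    valid = mask & (depth > 0)            # ignore pixels with no depth reading
    if not valid.any():
        return refined
    ref_depth = np.median(depth[valid])   # robust estimate of the object's depth
    # Drop pixels whose depth deviates from the object depth by more than rel_tol.
    refined &= np.abs(depth - ref_depth) <= rel_tol * ref_depth
    return refined


if __name__ == "__main__":
    h, w = 120, 160
    depth = np.full((h, w), 4.0)          # background roughly 4 m away
    depth[40:90, 60:110] = 1.5            # a person roughly 1.5 m away
    mask = np.zeros((h, w), dtype=bool)
    mask[35:95, 55:115] = True            # detector mask over-covers the background
    refined = refine_mask_with_depth(mask, depth)
    print(mask.sum(), refined.sum())      # refined mask keeps only the person pixels
```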
Figure 1: Overview of YOD-SLAM.
Figure 2: The process of modifying prior dynamic object masks using depth information. (a) The depth image corresponding to the current frame. Using the proposed algorithm, the background area that is excessively covered in (b) is removed in (c), and the expanded edges of the human body achieve better coverage in (d).
Figure 3: The process of redrawing the masks of previously missed dynamic objects. (a) The depth image corresponding to the current frame. In (b), people in the distance are not covered by the original mask, resulting in missed detections; the mask in (c) is obtained by filling it in with the depth information at that location in (a).
Figure 4: The process of excluding prior static objects in motion.
Figure 5: Comparison of the estimated and real trajectories of the four systems.
Figure 6: Results of mask modification on dynamic objects. The three images in each column come from the same time in their respective datasets: the first row is the depth image corresponding to the current frame, the second is the original mask obtained by YOLOv8, and the third is the final mask after our modification.
Figure 7: Comparison of point cloud maps between ORB-SLAM3 and YOD-SLAM in two sets of highly dynamic sequences.
Figure 8: Comparison of point cloud maps between ORB-SLAM3 and YOD-SLAM in low-dynamic and static sequences, where fr2/desk/p is a low-dynamic scene and fr2/rpy is a static scene.
Figure 9: Intel RealSense Depth Camera D455.
Figure 10: Mask processing and ORB feature point extraction in a real laboratory environment. Several non-English exhibition boards lean against the wall to simulate a typical indoor environment. The facial features of the people have been anonymized.
Figure 11: Comparison of dense point cloud mapping between ORB-SLAM3 and YOD-SLAM in a real laboratory environment. Map areas affected by dynamic objects are marked with red circles.
17 pages, 8334 KiB  
Article
PAIBoard: A Neuromorphic Computing Platform for Hybrid Neural Networks in Robot Dog Application
by Guang Chen, Jian Cao, Chenglong Zou, Shuo Feng, Yi Zhong, Xing Zhang and Yuan Wang
Electronics 2024, 13(18), 3619; https://doi.org/10.3390/electronics13183619 - 12 Sep 2024
Viewed by 282
Abstract
Hybrid neural networks (HNNs), which integrate the strengths of artificial neural networks (ANNs) and spiking neural networks (SNNs), provide a promising path towards general artificial intelligence. There is a prevailing trend towards designing unified SNN-ANN paradigm neuromorphic computing chips to support HNNs, but developing platforms that advance neuromorphic computing systems is equally essential. This paper presents the PAIBoard platform, which is designed to facilitate the implementation of HNNs. The platform comprises three main components: the upper computer, the communication module, and the neuromorphic computing chip. Both hardware and software performance measurements indicate that the platform achieves low power consumption, high energy efficiency, and comparable task accuracy. Furthermore, PAIBoard is applied in a robot dog tracking and obstacle avoidance system: the tracking module combines data from ultra-wideband (UWB) transceivers and vision, while the obstacle avoidance module utilizes depth information from an RGB-D camera, further underscoring the platform's potential to tackle challenging tasks in real-world applications. Full article
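For readers unfamiliar with the two neuron types that HNNs combine, the short Python sketch below contrasts a conventional ANN neuron (weighted sum plus ReLU) with a generic leaky integrate-and-fire (LIF) SNN neuron. This is textbook LIF dynamics for illustration only; it is not the neuron model implemented on the PAIBoard chip.

```python
# Sketch contrasting an ANN-style neuron with a leaky integrate-and-fire (LIF)
# SNN neuron, the two building blocks that hybrid neural networks combine.
# Generic textbook models, not the neurons implemented on the neuromorphic chip.
import numpy as np


def ann_neuron(x, w, b):
    # Conventional artificial neuron: weighted sum followed by ReLU.
    return max(0.0, float(np.dot(w, x) + b))


def lif_neuron(input_current, v_threshold=1.0, leak=0.9, v_reset=0.0):
    # Leaky integrate-and-fire: the membrane potential decays each step,
    # accumulates input, and emits a binary spike when it crosses threshold.
    v = 0.0
    spikes = []
    for i_t in input_current:
        v = leak * v + i_t
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    x = np.array([0.2, 0.5, 0.1])
    w = np.array([0.4, 0.3, 0.8])
    print("ANN activation:", ann_neuron(x, w, b=0.05))         # 0.36
    print("SNN spike train:", lif_neuron([0.4] * 6))            # [0, 0, 1, 0, 0, 1]
```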
Figure 1: (a) ANN neuron; (b) SNN neuron; (c) hybrid neural networks (HNNs).
Figure 2: (a) Neuromorphic chip; (b) overall architecture of the neuromorphic chip; (c) the neuromorphic chip routing network.
Figure 3: Neuromorphic computing platform and nervous system.
Figure 4: System architecture of the proposed PAIBoard platform.
Figure 5: (a) Flowchart of H2C/C2H; (b) workflow of the upper computer.
Figure 6: Topology diagram of the proposed prototype board.
Figure 7: (a) 3D printer; (b) acrylic plate; (c) the prototype board; (d) the prototype board with an acrylic plate.
Figure 8: Architecture of the HNN with group convolution for a classification task on the CIFAR-10 dataset.
Figure 9: Pipeline of the robot dog tracking and obstacle avoidance system.
Figure 10: Workflow of the robot dog.
Figure 11: Robot dog carrying the UWB module and the prototype board.
Figure 12: (a) Three-layer fully connected SNN; (b) network architecture of YOLOv5n.
Figure 13: Image samples from the self-built dataset.
Figure 14: Results on (a) tracking and (b) obstacle avoidance.
Figure 15: Demonstration of the tracking and obstacle avoidance system.