Search Results (3,453)

Search Parameters:
Keywords = RGB

17 pages, 17602 KiB  
Article
Enhancing Detection of Pedestrians in Low-Light Conditions by Accentuating Gaussian–Sobel Edge Features from Depth Maps
by Minyoung Jung and Jeongho Cho
Appl. Sci. 2024, 14(18), 8326; https://doi.org/10.3390/app14188326 - 15 Sep 2024
Abstract
Owing to the low detection accuracy of camera-based object detection models, various fusion techniques with Light Detection and Ranging (LiDAR) have been attempted. This has resulted in improved detection of objects that are difficult to detect due to partial occlusion by obstacles or unclear silhouettes. However, the detection performance remains limited in low-light environments where small pedestrians are located far from the sensor or pedestrians have difficult-to-estimate shapes. This study proposes an object detection model that employs a Gaussian–Sobel filter. This filter combines Gaussian blurring, which suppresses the effects of noise, and a Sobel mask, which accentuates object features, to effectively utilize depth maps generated by LiDAR for object detection. The model performs independent pedestrian detection using the real-time object detection model You Only Look Once v4, based on RGB images obtained using a camera and depth maps preprocessed by the Gaussian–Sobel filter, and estimates the optimal pedestrian location using non-maximum suppression. This enables accurate pedestrian detection while maintaining a high detection accuracy even in low-light or external-noise environments, where object features and contours are not well defined. The test evaluation results demonstrated that the proposed method achieved at least 1–7% higher average precision than the state-of-the-art models under various environments. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)
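
The Gaussian–Sobel preprocessing step described in this abstract can be illustrated with standard OpenCV calls. The sketch below is a minimal approximation under assumed kernel sizes and sigma; it is not the authors' exact configuration.

```python
# Minimal sketch of Gaussian-Sobel edge accentuation on a LiDAR depth map.
# Kernel sizes and sigma are illustrative assumptions, not the paper's settings.
import cv2
import numpy as np

def gaussian_sobel(depth_map: np.ndarray, ksize: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Suppress noise with Gaussian blurring, then accentuate edges with a Sobel mask."""
    blurred = cv2.GaussianBlur(depth_map, (ksize, ksize), sigma)
    grad_x = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)   # horizontal gradient
    grad_y = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)   # vertical gradient
    magnitude = cv2.magnitude(grad_x, grad_y)                # edge strength
    return cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# depth = cv2.imread("depth_map.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
# edges = gaussian_sobel(depth)
```

The blur suppresses noise in the projected depth map before the Sobel mask accentuates object contours, which is the stated motivation for combining the two filters.
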
Figures:
Figure 1: Block diagram of the proposed multi-sensor-based detection model.
Figure 2: Process for generating a depth map for image registration: (a) RGB image; (b) PCD projected on RGB image; (c) depth map.
Figure 3: Preprocessing of depth maps using the Gaussian–Sobel filter: (a) depth map; (b) depth map after Gaussian filtering; (c) depth map after Gaussian–Sobel filtering; (d) depth map after Canny edge filtering.
Figure 4: Flowchart for non-maximum suppression (NMS).
Figure 5: Comparison of pedestrian detection performance of the proposed model and similar models at 100% brightness: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
Figure 6: Comparison of pedestrian detection performance of the proposed model and similar models at 40% brightness: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
Figure 7: Comparison of the pedestrian detection performance of the proposed model and similar models at 40% brightness and 0.5% noise level: (a) depth map; (b) RGB + depth map; (c) Maragos and Pessoa [12]; (d) Deng [13]; (e) Ali and Clausi [14]; (f) proposed model.
15 pages, 12764 KiB  
Article
Learning Unsupervised Cross-Domain Model for TIR Target Tracking
by Xiu Shu, Feng Huang, Zhaobing Qiu, Xinming Zhang and Di Yuan
Mathematics 2024, 12(18), 2882; https://doi.org/10.3390/math12182882 - 15 Sep 2024
Abstract
The limited availability of thermal infrared (TIR) training samples leads to suboptimal target representation by convolutional feature extraction networks, which adversely impacts the accuracy of TIR target tracking methods. To address this issue, we propose an unsupervised cross-domain model (UCDT) for TIR tracking. Our approach leverages labeled training samples from the RGB domain (source domain) to train a general feature extraction network. We then employ a cross-domain model to adapt this network for effective target feature extraction in the TIR domain (target domain). This cross-domain strategy addresses the challenge of limited TIR training samples effectively. Additionally, we utilize an unsupervised learning technique to generate pseudo-labels for unlabeled training samples in the source domain, which helps overcome the limitations imposed by the scarcity of annotated training data. Extensive experiments demonstrate that our UCDT tracking method outperforms existing tracking approaches on the PTB-TIR and LSOTB-TIR benchmarks. Full article
18 pages, 5572 KiB  
Article
Visual-Inertial RGB-D SLAM with Encoder Integration of ORB Triangulation and Depth Measurement Uncertainties
by Zhan-Wu Ma and Wan-Sheng Cheng
Sensors 2024, 24(18), 5964; https://doi.org/10.3390/s24185964 - 14 Sep 2024
Viewed by 217
Abstract
In recent years, the accuracy of visual SLAM (Simultaneous Localization and Mapping) technology has seen significant improvements, making it a prominent area of research. However, within the current RGB-D SLAM systems, the estimation of 3D positions of feature points primarily relies on direct measurements from RGB-D depth cameras, which inherently contain measurement errors. Moreover, the potential of triangulation-based estimation for ORB (Oriented FAST and Rotated BRIEF) feature points remains underutilized. To address the singularity of measurement data, this paper proposes the integration of the ORB features, triangulation uncertainty estimation and depth measurements uncertainty estimation, for 3D positions of feature points. This integration is achieved using a CI (Covariance Intersection) filter, referred to as the CI-TEDM (Triangulation Estimates and Depth Measurements) method. Vision-based SLAM systems face significant challenges, particularly in environments, such as long straight corridors, weakly textured scenes, or during rapid motion, where tracking failures are common. To enhance the stability of visual SLAM, this paper introduces an improved CI-TEDM method by incorporating wheel encoder data. The mathematical model of the encoder is proposed, and detailed derivations of the encoder pre-integration model and error model are provided. Building on these improvements, we propose a novel tightly coupled visual-inertial RGB-D SLAM with encoder integration of ORB triangulation and depth measurement uncertainties. Validation on open-source datasets and real-world environments demonstrates that the proposed improvements significantly enhance the robustness of real-time state estimation and localization accuracy for intelligent vehicles in challenging environments. Full article
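
The Covariance Intersection (CI) update that the CI-TEDM method builds on fuses two estimates whose cross-correlation is unknown. The sketch below shows only the generic CI formula, with a common trace-minimizing choice of the weight; it is not the paper's implementation.

```python
# Generic Covariance Intersection (CI) fusion of two 3D point estimates,
# e.g., a triangulated position and a depth-camera measurement.
# This is a textbook CI update, not the CI-TEDM code from the paper.
import numpy as np

def ci_fuse(x1, P1, x2, P2, omega: float):
    """Fuse estimates (x1, P1) and (x2, P2) with weight omega in [0, 1]."""
    P1_inv, P2_inv = np.linalg.inv(P1), np.linalg.inv(P2)
    P_fused = np.linalg.inv(omega * P1_inv + (1.0 - omega) * P2_inv)
    x_fused = P_fused @ (omega * P1_inv @ x1 + (1.0 - omega) * P2_inv @ x2)
    return x_fused, P_fused

def ci_fuse_min_trace(x1, P1, x2, P2, steps: int = 100):
    """Pick omega by minimizing the trace of the fused covariance (a common heuristic)."""
    omegas = np.linspace(1e-3, 1 - 1e-3, steps)
    best = min(omegas, key=lambda w: np.trace(ci_fuse(x1, P1, x2, P2, w)[1]))
    return ci_fuse(x1, P1, x2, P2, best)
```
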
Figures:
Figure 1: The system framework diagram, consisting of three main modules: input, function, and output.
Figure 2: An example diagram of reprojection error. Feature matching indicates that points p1 and p2 are projections of the same spatial point P, but the camera pose is initially unknown. Initially, there is a certain distance between the projected point p̂2 of P and the actual point p2; the camera pose is then adjusted to minimize this distance.
Figure 3: The motion model of the wheeled robot using wheel encoders, illustrating the robot's trajectory in the 2D plane between its positions at times t_k and t_(k+1).
Figure 4: The process of running datasets in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene. Blue frames represent keyframes, red frames initial keyframes, and green frames current frames.
Figure 5: The process of tracking datasets in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene. Green boxes represent key feature points detected by the VEOS3-TEDM algorithm.
Figure 6: Comparison between estimated and true trajectories in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene.
Figure 7: Comparison between true and estimated trajectories in the x, y, and z directions using the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene.
Figure 8: 3D point cloud maps: (a) corridor scene and (b) laboratory scene.
Figure 9: Images of the experimental platform: (a) front view and (b) left view.
Figure 10: The location of various components on the mobile robot: (a) bottom level and (b) upper level.
Figure 11: The process of tracking real-world environments in the VEOS3-TEDM algorithm: (a1,a2) laboratory, (b1,b2) hall, (c1,c2) weak texture scene, (d1,d2) long straight corridor. Green boxes represent key feature points detected by the VEOS3-TEDM algorithm.
Figure 12: A comparison of estimated and true trajectories in real-world environments using the VEOS3-TEDM algorithm.
23 pages, 11793 KiB  
Article
Detecting Canopy Gaps in Uneven-Aged Mixed Forests through the Combined Use of Unmanned Aerial Vehicle Imagery and Deep Learning
by Nyo Me Htun, Toshiaki Owari, Satoshi Tsuyuki and Takuya Hiroshima
Drones 2024, 8(9), 484; https://doi.org/10.3390/drones8090484 - 13 Sep 2024
Viewed by 365
Abstract
Canopy gaps and their associated processes play an important role in shaping forest structure and dynamics. Understanding the information about canopy gaps allows forest managers to assess the potential for regeneration and plan interventions to enhance regeneration success. Traditional field surveys for canopy gaps are time consuming and often inaccurate. In this study, canopy gaps were detected using unmanned aerial vehicle (UAV) imagery of two sub-compartments of an uneven-aged mixed forest in northern Japan. We compared the performance of U-Net and ResU-Net (U-Net combined with ResNet101) deep learning models using RGB, canopy height model (CHM), and fused RGB-CHM data from UAV imagery. Our results showed that the ResU-Net model, particularly when pre-trained on ImageNet (ResU-Net_2), achieved the highest F1-scores—0.77 in Sub-compartment 42B and 0.79 in Sub-compartment 16AB—outperforming the U-Net model (0.52 and 0.63) and the non-pre-trained ResU-Net model (ResU-Net_1) (0.70 and 0.72). ResU-Net_2 also achieved superior overall accuracy values of 0.96 and 0.97, outperforming previous methods that used UAV datasets with varying methodologies for canopy gap detection. These findings underscore the effectiveness of the ResU-Net_2 model in detecting canopy gaps in uneven-aged mixed forests. Furthermore, when these trained models were applied as transfer models to detect gaps specifically caused by selection harvesting using pre- and post-UAV imagery, they showed considerable potential, achieving moderate F1-scores of 0.54 and 0.56, even with a limited training dataset. Overall, our study demonstrates that combining UAV imagery with deep learning techniques, particularly pre-trained models, significantly improves canopy gap detection accuracy and provides valuable insights for forest management and future research. Full article
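
A ResU-Net-style model (U-Net with a ResNet101 encoder, optionally ImageNet-pretrained) on fused RGB + CHM input can be sketched with the segmentation_models_pytorch library. The library choice, patch size, and channel layout below are assumptions for illustration, not the authors' code.

```python
# Sketch of a ResU-Net-style model (U-Net with a ResNet101 encoder) on fused
# RGB + CHM input, using segmentation_models_pytorch as one possible implementation;
# the authors' exact architecture and training setup may differ.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet101",        # ResNet101 backbone
    encoder_weights="imagenet",      # ImageNet pre-training (cf. ResU-Net_2)
    in_channels=4,                   # 3 RGB bands + 1 CHM band, stacked
    classes=1,                       # binary canopy-gap mask
)

rgb_chm = torch.randn(2, 4, 256, 256)    # dummy fused RGB-CHM patches
gap_logits = model(rgb_chm)              # (2, 1, 256, 256) canopy-gap logits
```
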
Figures:
Figure 1: (a) Location map of the University of Tokyo Hokkaido Forest (UTHF); (b) location map of the two selected sub-compartments in the UTHF; orthomosaics of Sub-compartments (c) 42B and (d) 16AB.
Figure 2: Canopy height model of Sub-compartments (a) 42B and (b) 16AB.
Figure 3: Aerial orthophotos of Sub-compartment 42B (a) pre-selection harvesting and (b) post-selection harvesting.
Figure 4: Workflow for detecting canopy gaps in uneven-aged mixed forests using UAV imagery and deep learning models.
Figure 5: ResU-Net (U-Net with ResNet101 backbone) classification algorithm.
Figure 6: Visualization of the canopy gap distribution in Sub-compartment 42B predicted by the ResU-Net_2 model.
Figure 7: Visualization of the canopy gap distribution in Sub-compartment 16AB predicted by the ResU-Net_2 model.
Figure 8: Confusion matrices of canopy gap detection by selection harvesting using transfer models: (a) transfer model using the Sub-compartment 42B dataset (before extended training); (b) transfer model using the Sub-compartment 16AB dataset (before extended training); (c) transfer model using the Sub-compartment 42B dataset (after extended training); (d) transfer model using the Sub-compartment 16AB dataset (after extended training); and (e) the ResU-Net model.
Figure 9: Visualization of predicted canopy gaps using transfer models: (a) testing RGB image (post-selection harvesting); (b) labeled mask; (c,d) prediction results and misclassified regions for the transfer model on the Sub-compartment 42B dataset (before extended training); (e,f) the Sub-compartment 16AB dataset (before extended training); (g,h) the Sub-compartment 42B dataset (after extended training); (i,j) the Sub-compartment 16AB dataset (after extended training); (k,l) prediction results and misclassified regions using the ResU-Net model.
Figure 10: Training and validation accuracies and losses of (a,b) the transfer model using the Sub-compartment 42B dataset (before extended training); (c,d) the transfer model using the Sub-compartment 16AB dataset (before extended training); (e,f) the transfer model using the Sub-compartment 42B dataset (after extended training); (g,h) the transfer model using the Sub-compartment 16AB dataset (after extended training); and (i,j) the ResU-Net model.
Figure A1: Visualization of the canopy gap distribution in Sub-compartment 42B predicted by the (a) U-Net model using the RGB dataset, (b) U-Net model using the CHM dataset, (c) U-Net model using the fused RGB and CHM dataset, (d) ResU-Net_1 model using the RGB dataset, (e) ResU-Net_1 model using the CHM dataset, and (f) ResU-Net_1 model using the fused RGB and CHM dataset.
Figure A2: Visualization of the canopy gap distribution in Sub-compartment 16AB predicted by the same six model and dataset combinations as in Figure A1.
17 pages, 17092 KiB  
Article
Detection and Assessment of White Flowering Nectar Source Trees and Location of Bee Colonies in Rural and Suburban Environments Using Deep Learning
by Atanas Z. Atanasov, Boris I. Evstatiev, Asparuh I. Atanasov and Ivaylo S. Hristakov
Diversity 2024, 16(9), 578; https://doi.org/10.3390/d16090578 - 13 Sep 2024
Viewed by 193
Abstract
Environmental pollution with pesticides as a result of intensive agriculture harms the development of bee colonies. Bees are one of the most important pollinating insects on our planet. One of the ways to protect them is to relocate and build apiaries in populated areas. An important condition for the development of bee colonies is the rich species diversity of flowering plants and the size of the areas occupied by them. In this study, a methodology for detecting and distinguishing white flowering nectar source trees and counting bee colonies is developed and demonstrated, applicable in populated environments. It is based on UAV-obtained RGB imagery and two convolutional neural networks—a pixel-based one for identification of flowering areas and an object-based one for beehive identification, which achieved accuracies of 93.4% and 95.2%, respectively. Based on an experimental study near the village of Yuper (Bulgaria), the productive potential of black locust (Robinia pseudoacacia) areas in rural and suburban environments was determined. The obtained results showed that the identified blooming area corresponds to 3.654 m2, out of 89.725 m2 that were scanned with the drone, and the number of identified beehives was 149. The proposed methodology will facilitate beekeepers in choosing places for the placement of new apiaries and planning activities of an organizational nature. Full article
(This article belongs to the Special Issue Ecology and Diversity of Bees in Urban Environments)
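
Turning a predicted blooming mask into an area estimate is a matter of counting pixels and scaling by the squared ground sampling distance (GSD) of the orthomosaic; the GSD and mask below are placeholders, not values from the study.

```python
# Converting a pixel-based blooming mask into an area estimate via the
# ground sampling distance (GSD) of the orthomosaic. The GSD value below is a
# placeholder; it is not taken from the paper.
import numpy as np

def blooming_area_m2(mask: np.ndarray, gsd_m: float) -> float:
    """mask: binary array (1 = blooming pixel); gsd_m: pixel size in metres."""
    return float(mask.sum()) * gsd_m ** 2

mask = np.zeros((1000, 1000), dtype=np.uint8)
mask[200:400, 300:500] = 1                  # 40,000 blooming pixels
print(blooming_area_m2(mask, gsd_m=0.05))   # 100.0 m2 at an assumed 5 cm GSD
```
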
Figures:
Figure 1: Location of the experimental plot: (a) the village of Yuper; (b) the geographic location of the experimental area in the north-eastern part of Bulgaria.
Figure 2: Summary of the proposed methodology for analysis of the honey production potential.
Figure 3: Summary of the geo-referenced image in the Yuper region. The flight range of the bees is marked with yellow circles; the Robinia pseudoacacia areas are marked in green.
Figure 4: The merged images selected as reference data for recognizing blooming trees (marked in yellow).
Figure 5: Training and validation loss of the DeepLabV3 CNN model for blooming area identification.
Figure 6: Image used as reference data for training the beehive recognition model (a) and close-up image of an area with the beehives (b). All beehives are marked with yellow rectangles.
Figure 7: Training and validation loss of the Mask R-CNN model for beehive counting.
Figure 8: HQ map of the investigated area, generated using the UAV images. The yellow squares represent the locations of the UAV-obtained images.
Figure 9: Examples of false positives during beehive identification and counting. The red marks represent artificial objects incorrectly identified as beehives.
Figure 10: Identified beehives (marked in pink and green) in (a) area 1 and (b) area 2.
Figure 11: Graphical results from the pixel-based identification of blooming trees (marked in blue).
Figure 12: Control hive for monitoring the weight.
19 pages, 18432 KiB  
Article
Low-Cost Lettuce Height Measurement Based on Depth Vision and Lightweight Instance Segmentation Model
by Yiqiu Zhao, Xiaodong Zhang, Jingjing Sun, Tingting Yu, Zongyao Cai, Zhi Zhang and Hanping Mao
Agriculture 2024, 14(9), 1596; https://doi.org/10.3390/agriculture14091596 - 13 Sep 2024
Viewed by 196
Abstract
Plant height is a crucial indicator of crop growth. Rapid measurement of crop height facilitates the implementation and management of planting strategies, ensuring optimal crop production quality and yield. This paper presents a low-cost method for the rapid measurement of multiple lettuce heights, developed using an improved YOLOv8n-seg model and the stacking characteristics of planes in depth images. First, we designed a lightweight instance segmentation model based on YOLOv8n-seg by enhancing the model architecture and reconstructing the channel dimension distribution. This model was trained on a small-sample dataset augmented through random transformations. Secondly, we proposed a method to detect and segment the horizontal plane. This method leverages the stacking characteristics of the plane, as identified in the depth image histogram from an overhead perspective, allowing for the identification of planes parallel to the camera’s imaging plane. Subsequently, we evaluated the distance between each plane and the centers of the lettuce contours to select the cultivation substrate plane as the reference for lettuce bottom height. Finally, the height of multiple lettuce plants was determined by calculating the height difference between the top and bottom of each plant. The experimental results demonstrated that the improved model achieved a 25.56% increase in processing speed, along with a 2.4% enhancement in mean average precision compared to the original YOLOv8n-seg model. The average accuracy of the plant height measurement algorithm reached 94.339% in hydroponics and 91.22% in pot cultivation scenarios, with absolute errors of 7.39 mm and 9.23 mm, similar to the sensor’s depth direction error. With images downsampled by a factor of 1/8, the highest processing speed recorded was 6.99 frames per second (fps), enabling the system to process an average of 174 lettuce targets per second. The experimental results confirmed that the proposed method exhibits promising accuracy, efficiency, and robustness. Full article
(This article belongs to the Special Issue Smart Agriculture Sensors and Monitoring Systems for Field Detection)
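
The plane-stacking idea behind the height measurement can be sketched as follows: planes parallel to the camera appear as peaks in the depth-image histogram, and the selected substrate plane serves as the bottom reference for each lettuce mask. The bin width and peak threshold below are illustrative assumptions, not the paper's parameters.

```python
# Simplified sketch of the plane-stacking idea: from an overhead depth image,
# planes parallel to the camera show up as peaks in the depth histogram; the
# substrate plane then serves as the height reference for each lettuce mask.
import numpy as np

def plane_depths(depth_mm: np.ndarray, bin_mm: int = 5, min_pixels: int = 5000):
    """Return candidate plane depths (mm) as histogram peaks of the depth image."""
    bins = np.arange(depth_mm.min(), depth_mm.max() + bin_mm, bin_mm)
    counts, edges = np.histogram(depth_mm[depth_mm > 0], bins=bins)
    peaks = edges[:-1][counts > min_pixels]
    return peaks + bin_mm / 2.0

def lettuce_height_mm(depth_mm: np.ndarray, lettuce_mask: np.ndarray, substrate_depth: float):
    """Height = substrate depth minus the depth of the plant's highest point."""
    top_depth = depth_mm[lettuce_mask & (depth_mm > 0)].min()
    return substrate_depth - top_depth
```
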
Figures:
Figure 1: Lettuce growing environment.
Figure 2: Plant height measurement tool.
Figure 3: Examples of random transformations.
Figure 4: YOLOv8n-seg structure.
Figure 5: Structure of YOLOv8-seg with FasterNet as the backbone.
Figure 6: Hydroponics scenario: (A) distribution of depth image pixels along the depth axis, (B) histogram of the depth image. Potting scenario: (C) distribution of depth image pixels along the depth axis, (D) histogram of the depth image.
Figure 7: (A,C) Results of plane detection based on pixel stacking. (B,D) Image region division based on crop centers.
Figure 8: Algorithm flow diagram.
Figure 9: Model channel dimension comparisons (before multiplying by the width coefficient of the model).
Figure 10: mAP changes of the seven models during model training.
Figure 11: Segmentation performance comparison of the seven models with target confidence scores.
Figure 12: Heat maps of the last layer of different backbones.
Figure 13: Lettuce height measurement outputs (mm).
Figure 14: Plant height measurement results in the hydroponics scenario.
Figure 15: Plant height measurement results for potted lettuce.
Figure 16: Segmentation comparison between the vegetation index method and Model 5.
Figure 17: Comparison of different plane detection algorithms.
15 pages, 10244 KiB  
Article
Identification of Floating Green Tide in High-Turbidity Water from Sentinel-2 MSI Images Employing NDVI and CIE Hue Angle Thresholds
by Lin Wang, Qinghui Meng, Xiang Wang, Yanlong Chen, Xinxin Wang, Jie Han and Bingqiang Wang
J. Mar. Sci. Eng. 2024, 12(9), 1640; https://doi.org/10.3390/jmse12091640 - 13 Sep 2024
Viewed by 180
Abstract
Remote sensing technology is widely used to obtain information on floating green tides, and thresholding methods based on indices such as the normalized difference vegetation index (NDVI) and the floating algae index (FAI) play an important role in such studies. However, as the methods are influenced by many factors, the threshold values vary greatly; in particular, the error of data extraction clearly increases in situations of high-turbidity water (HTW) (NDVI > 0). In this study, high spatial resolution, multispectral images from the Sentinel-2 MSI mission were used as the data source. It was found that the International Commission on Illumination (CIE) hue angle calculated using remotely sensed equivalent multispectral reflectance data and the RGB method is extremely effective in distinguishing floating green tides from areas of HTW. Statistical analysis of Sentinel-2 MSI images showed that the threshold value of the hue angle that can effectively eliminate the effect of HTW is 218.94°. A test demonstration of the method for identifying the floating green tide in HTW in a Sentinel-2 MSI image was carried out using the identified threshold values of NDVI > 0 and CIE hue angle < 218.94°. The demonstration showed that the method effectively eliminates misidentification caused by HTW pixels (NDVI > 0), resulting in better consistency of the identification of the floating green tide and its distribution in the true color image. The method enables rapid and accurate extraction of information on floating green tide in HTW, and offers a new solution for the monitoring and tracking of green tides in coastal areas. Full article
(This article belongs to the Section Marine Environmental Science)
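
A rough sketch of the two thresholds is given below: NDVI from the Sentinel-2 B8 (NIR) and B4 (red) bands, and a CIE hue angle computed from chromaticity coordinates relative to the D65 white point. The RGB-to-XYZ matrix and hue-angle convention are one common choice and may not match the paper's exact RGB-based formulation; the threshold values themselves are those reported in the abstract.

```python
# Sketch of the combined NDVI / CIE hue angle thresholding described above.
# The sRGB-to-XYZ matrix and hue-angle convention are assumptions (one common
# definition); the thresholds NDVI > 0 and hue < 218.94 deg come from the abstract.
import numpy as np

RGB2XYZ = np.array([[0.4124, 0.3576, 0.1805],
                    [0.2126, 0.7152, 0.0722],
                    [0.0193, 0.1192, 0.9505]])
WHITE_XY = (0.3127, 0.3290)  # D65 white point

def ndvi(b8, b4):
    return (b8 - b4) / (b8 + b4 + 1e-12)

def hue_angle_deg(r, g, b):
    X, Y, Z = np.tensordot(RGB2XYZ, np.stack([r, g, b]), axes=1)
    s = X + Y + Z + 1e-12
    x, y = X / s, Y / s
    ang = np.degrees(np.arctan2(y - WHITE_XY[1], x - WHITE_XY[0]))
    return np.mod(ang, 360.0)

def green_tide_mask(b8, b4, r, g, b):
    """NDVI > 0 and hue angle < 218.94 deg, per the thresholds reported above."""
    return (ndvi(b8, b4) > 0) & (hue_angle_deg(r, g, b) < 218.94)
```
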
Figures:
Figure 1: The spatial distribution of the in situ optical observation stations and the spatial coverage of the satellite data used in this study.
Figure 2: The in situ measured hyperspectral reflectance of typical water bodies and floating green tides of different coverages.
Figure 3: Scatterplots showing the relationship between the NDVI value and the hue angle calculated using (a) the in situ measured hyperspectral reflectance, and the corresponding Sentinel-2 MSI multispectral reflectance in (b) the five visible bands and (c) the three RGB bands.
Figure 4: Sentinel-2 MSI images (23 May 2023) and the corresponding pixel identification results (red areas) based on NDVI > 0 for floating green tides in HTW.
Figure 5: The distribution of pixel counts at different hue angles when NDVI > 0.
Figure 6: (a,d,g) Sentinel-2 MSI true-color images obtained on 7 June 2022, 23 May 2023, and 1 June 2024; (b,e,h) identification results obtained using the traditional NDVI thresholding method (NDVI > 0); and (c,f,i) identification results obtained using the method proposed in this study (NDVI > 0 and hue angle < 218.94°).
Figure 7: The distribution of pixel counts within the variation interval of sensitivity factors, including (a) hue angle, (b–h) reflectance values in B2–B8, and (i) the B4/B3 reflectance ratio.
24 pages, 17247 KiB  
Article
Efficient Lossy Compression of Video Sequences of Automotive High-Dynamic Range Image Sensors for Advanced Driver-Assistance Systems and Autonomous Vehicles
by Paweł Pawłowski and Karol Piniarski
Electronics 2024, 13(18), 3651; https://doi.org/10.3390/electronics13183651 - 13 Sep 2024
Viewed by 299
Abstract
In this paper, we introduce an efficient lossy coding procedure specifically tailored for handling video sequences of automotive high-dynamic range (HDR) image sensors in advanced driver-assistance systems (ADASs) for autonomous vehicles. Nowadays, mainly for security reasons, lossless compression is used in the automotive industry. However, it offers very low compression rates. To obtain higher compression rates, we suggest using lossy codecs, especially when testing image processing algorithms in software in-the-loop (SiL) or hardware-in-the-loop (HiL) conditions. Our approach leverages the high-quality VP9 codec, operating in two distinct modes: grayscale image compression for automatic image analysis and color (in RGB format) image compression for manual analysis. In both modes, images are acquired from the automotive-specific RCCC (red, clear, clear, clear) image sensor. The codec is designed to achieve a controlled image quality and state-of-the-art compression ratios while maintaining real-time feasibility. In automotive applications, the inherent data loss poses challenges associated with lossy codecs, particularly in rapidly changing scenes with intricate details. To address this, we propose configuring the lossy codecs in variable bitrate (VBR) mode with a constrained quality (CQ) parameter. By adjusting the quantization parameter, users can tailor the codec behavior to their specific application requirements. In this context, a detailed analysis of the quality of lossy compressed images in terms of the structural similarity index metric (SSIM) and the peak signal-to-noise ratio (PSNR) metrics is presented. With this analysis, we extracted some codec parameters, which have an important impact on preservation of video quality and compression ratio. The proposed compression settings are very efficient: the compression ratios vary from 51 to 7765 for grayscale image mode and from 4.51 to 602.6 for RGB image mode, depending on the specified output image quality settings. We reached 129 frames per second (fps) for compression and 315 fps for decompression in grayscale mode and 102 fps for compression and 121 fps for decompression in the RGB mode. These make it possible to achieve a much higher compression ratio compared to lossless compression while maintaining control over image quality. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
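
The SSIM and PSNR analysis described above can be reproduced on decoded frames with scikit-image; the VP9 VBR/CQ configuration itself happens in the encoder and is not shown here, and the frame data are assumed to be grayscale arrays of equal size.

```python
# Scoring a decompressed frame against its original with the SSIM and PSNR metrics
# discussed above, using scikit-image; inputs are placeholders.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(original: np.ndarray, decoded: np.ndarray):
    """Return (PSNR in dB, SSIM) for two 8-bit grayscale frames of equal size."""
    psnr = peak_signal_noise_ratio(original, decoded, data_range=255)
    ssim = structural_similarity(original, decoded, data_range=255)
    return psnr, ssim

def compression_ratio(raw_bytes: int, encoded_bytes: int) -> float:
    """Compression ratio as the quotient of raw and encoded stream sizes."""
    return raw_bytes / encoded_bytes
```
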
Figures:
Figure 1: Most popular color filter arrays (CFAs) in automotive sensors: (a) monochrome (CCCC), (b) RCCC, (c) RCCB, (d) RGCB, (e) RYYc, (f) RGGB (C—clear; R—red; B—blue; G—grey; Y—yellow; c—cyan).
Figure 2: An illustrative example of the variability of the output stream (bitrate) and image quality across Q, CQ, CBR, and VBR modes for the VP9 codec [25].
Figure 3: Lossy compression process scheme used for RCCC images with conversion to monochrome or RGB images.
Figure 4: PSNR and SSIM within sequence 3 for various GOP sizes and quality settings (CRF = 15 for high quality and CRF = 55 for reduced quality).
Figure 5: One image from sequence 7 presented in RGB color space.
21 pages, 5815 KiB  
Article
Enhancing the Image Pre-Processing for Large Fleets Based on a Fuzzy Approach to Handle Multiple Resolutions
by Ching-Yun Mu and Pin Kung
Appl. Sci. 2024, 14(18), 8254; https://doi.org/10.3390/app14188254 - 13 Sep 2024
Viewed by 236
Abstract
Image pre-processing is crucial for large fleet management. Many traffic videos are collected by closed-circuit television (CCTV), which monitors a fixed area for image analysis; this paper instead adopts the front camera installed in large vehicles to obtain moving traffic images, which CCTV cannot provide. In practice, fleets often install cameras with different resolutions due to cost considerations, and the cameras evaluate front images containing traffic lights. This paper therefore proposes fuzzy enhancement with RGB and CIELAB conversions to handle multiple resolutions, and provides image pre-processing adjustment comparisons that enable further model training and analysis. The fuzzy enhancement and fuzzy enhancement with brightness adjustment produced images with lower MSE and higher PSNR for the front-view images, and fuzzy enhancement can also be used to improve traffic light image adjustments. Moreover, this study employed You Only Look Once Version 9 (YOLOv9) for model training, and YOLOv9 with fuzzy enhancement obtained better detection performance. The proposed fuzzy enhancement allows more flexible adjustments for pre-processing tasks and provides guidance for fleet managers to perform consistent image-enhancement adjustments when handling multiple resolutions. Full article
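
As a rough illustration of fuzzy enhancement on a CIELAB-converted frame, the sketch below applies the classic fuzzy intensification (INT) operator to the L channel; the paper's actual membership functions and rules are not reproduced here, and the input file name is hypothetical.

```python
# Generic sketch of fuzzy contrast intensification applied to the L channel after
# an RGB-to-CIELAB conversion. The membership mapping below is the classic
# intensification (INT) operator and is only an illustration of the idea.
import cv2
import numpy as np

def fuzzy_enhance(bgr: np.ndarray) -> np.ndarray:
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    L = lab[:, :, 0].astype(np.float32) / 255.0                  # fuzzify to [0, 1]
    mu = np.where(L <= 0.5, 2 * L ** 2, 1 - 2 * (1 - L) ** 2)    # intensification
    lab[:, :, 0] = np.clip(mu * 255.0, 0, 255).astype(np.uint8)  # defuzzify
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

# enhanced = fuzzy_enhance(cv2.imread("front_camera_frame.jpg"))  # hypothetical frame
```
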
Figures:
Figure 1: Flowchart for image enhancement [40].
Figure 2: Image counts by resolution category.
Figure 3: Labeled traffic light data.
Figure 4: Using RGB for image conversion.
Figure 5: Input function.
Figure 6: A comparison of three different image enhancements.
Figure 7: Display of 33 MSE performance values for fuzzy enhancement and HE (Unit: a.u.).
Figure 8: Display of 33 PSNR performance values for fuzzy enhancement and HE (Unit: a.u.).
Figure 9: Comparison results of precision (Unit: a.u.).
Figure 10: Comparison results of recall (Unit: a.u.).
Figure 11: Comparison results of mAP@0.5 (Unit: a.u.).
Figure 12: Comparison of loss curves.
Figure 13: Confusion matrix of YOLOv9 results.
17 pages, 5434 KiB  
Article
HyperKon: A Self-Supervised Contrastive Network for Hyperspectral Image Analysis
by Daniel La’ah Ayuba, Jean-Yves Guillemaut, Belen Marti-Cardona and Oscar Mendez
Remote Sens. 2024, 16(18), 3399; https://doi.org/10.3390/rs16183399 - 12 Sep 2024
Viewed by 373
Abstract
The use of a pretrained image classification model (trained on cats and dogs, for example) as a perceptual loss function for hyperspectral super-resolution and pansharpening tasks is surprisingly effective. However, RGB-based networks do not take full advantage of the spectral information in hyperspectral data. This inspired the creation of HyperKon, a dedicated hyperspectral Convolutional Neural Network backbone built with self-supervised contrastive representation learning. HyperKon uniquely leverages the high spectral continuity, range, and resolution of hyperspectral data through a spectral attention mechanism. We also perform a thorough ablation study on different kinds of layers, showing their performance in understanding hyperspectral layers. Notably, HyperKon achieves a remarkable 98% Top-1 retrieval accuracy and surpasses traditional RGB-trained backbones in both pansharpening and image classification tasks. These results highlight the potential of hyperspectral-native backbones and herald a paradigm shift in hyperspectral image analysis. Full article
(This article belongs to the Special Issue Advances in Hyperspectral Remote Sensing Image Processing)
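
The spectral attention mechanism can be pictured as a squeeze-and-excitation block acting over spectral bands. The sketch below is a generic SE block with assumed layer sizes and band count, not the HyperKon source code.

```python
# A squeeze-and-excitation block applied over spectral channels, as one way to
# realize the spectral attention (SEB) idea described above.
import torch
import torch.nn as nn

class SpectralSE(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-band attention weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, bands, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                       # excite: reweight spectral bands

# attn = SpectralSE(channels=224)          # e.g., 224 hyperspectral bands (assumed)
# out = attn(torch.randn(2, 224, 64, 64))
```
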
Figures:
Figure 1: General overview of the HyperKon system architecture.
Figure 2: Illustration of HSI contrastive sampling.
Figure 3: Conceptual comparison of HSI vs. RGB perceptual loss.
Figure 4: Top-1 HSI retrieval accuracy achieved by various versions of the HyperKon model during the pretraining phase. The performance of each version is presented as a bar in the chart, illustrating how the integration of different components, such as 3D convolutions, DSC, the CBAM, and the SEB, affected the model's accuracy. The chart underscores the superior performance of the SEB.
Figure 5: Top-1 HSI retrieval accuracy for the 3D Conv, SEB, and CBAM models following dimensionality reduction using PCA. The graph indicates a superior performance by the 3D convolution model.
Figure 6: Top-1 HSI retrieval accuracy for the 3D Conv, SEB, and CBAM models when manual band selection was employed. It shows an initial advantage for 3D Conv, but over time the SEB and CBAM models catch up to similar levels of performance.
Figure 7: Visual results generated by different pansharpening algorithms (HyperPNN [49], DARN [45], GPPNN [50], HyperTransformer [51], HyperKon (ours), and ground truth) for the Pavia Center [42], Botswana [43], Chikusei [44], and EnMAP [26] datasets. MAE denotes the (normalized) mean absolute error across all spectral bands.
Figure 8: HyperKon image classification visualization for the Indian Pines, Pavia University, and Salinas datasets: (a) predicted classification map generated by HyperKon; (b) predicted classification map with masked regions (showing only labeled areas); (c) predicted accuracy map: green for correct predictions, red for incorrect predictions, and black for unlabeled areas; (d) ground-truth classification map; and (e) original RGB image.
Figure 9: Zoomed-out predicted accuracy map for Indian Pines: green for correct predictions, red for incorrect predictions, and black for unlabeled areas.
29 pages, 9403 KiB  
Article
DIO-SLAM: A Dynamic RGB-D SLAM Method Combining Instance Segmentation and Optical Flow
by Lang He, Shiyun Li, Junting Qiu and Chenhaomin Zhang
Sensors 2024, 24(18), 5929; https://doi.org/10.3390/s24185929 - 12 Sep 2024
Viewed by 316
Abstract
Feature points from moving objects can negatively impact the accuracy of Visual Simultaneous Localization and Mapping (VSLAM) algorithms, while detection or semantic segmentation-based VSLAM approaches often fail to accurately determine the true motion state of objects. To address this challenge, this paper introduces DIO-SLAM: Dynamic Instance Optical Flow SLAM, a VSLAM system specifically designed for dynamic environments. Initially, the detection thread employs YOLACT (You Only Look At CoefficienTs) to distinguish between rigid and non-rigid objects within the scene. Subsequently, the optical flow thread estimates optical flow and introduces a novel approach to capture the optical flow of moving objects by leveraging optical flow residuals. Following this, an optical flow consistency method is implemented to assess the dynamic nature of rigid object mask regions, classifying them as either moving or stationary rigid objects. To mitigate errors caused by missed detections or motion blur, a motion frame propagation method is employed. Lastly, a dense mapping thread is incorporated to filter out non-rigid objects using semantic information, track the point clouds of rigid objects, reconstruct the static background, and store the resulting map in an octree format. Experimental results demonstrate that the proposed method surpasses current mainstream dynamic VSLAM techniques in both localization accuracy and real-time performance. Full article
(This article belongs to the Special Issue Sensors and Algorithms for 3D Visual Analysis and SLAM)
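
A simplified approximation of the optical-flow-residual test is sketched below: estimate dense flow, compensate camera self-motion with the median background flow, and flag a rigid-object mask as moving if its residual magnitude is large. The paper uses an iterative residual scheme; the median-based compensation and the threshold here are assumptions for illustration only.

```python
# Simplified approximation of the optical-flow-residual idea described above.
import cv2
import numpy as np

def rigid_mask_is_moving(prev_gray, curr_gray, mask, thresh_px: float = 1.5) -> bool:
    """mask: boolean region of a rigid object detected by instance segmentation."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    background = ~mask
    cam_flow = np.median(flow[background], axis=0)   # crude camera self-motion estimate
    residual = flow - cam_flow                       # object-induced flow
    mag = np.linalg.norm(residual[mask], axis=1)
    return float(np.median(mag)) > thresh_px         # moving vs. stationary rigid object
```
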
Figures:
Figure 1: Performance of the traditional ORB-SLAM3 algorithm in highly dynamic environments: (a) image of the highly dynamic scene; (b) feature point extraction in the highly dynamic scene, where yellow boxes indicate moving objects and dynamic feature points are marked in red; (c) comparison between the estimated and ground-truth camera poses; (d) reconstruction results of dense mapping.
Figure 2: The overall system framework of DIO-SLAM. Key innovations are highlighted in red font, while the original ORB-SLAM3 framework is represented by unfilled boxes: (a) detection thread (green boxes); (b) optical flow thread (blue boxes); (c) dynamic feature point filtering module, composed of both the detection and optical flow threads; (d) independent dense mapping thread.
Figure 3: Instance segmentation results and non-rigid object mask extraction: (a) RGB frame used for segmentation; (b) instance segmentation output.
Figure 4: Separation of non-rigid and rigid object masks based on semantic information.
Figure 5: Optical flow network inputs and output: (a) frame n−1; (b) frame n; (c) dense optical flow.
Figure 6: Optical flow changes between adjacent frames.
Figure 7: Iterative removal of camera self-motion flow using optical flow residuals: (a) original dense optical flow; (b) 5 iterations; (c) 7 iterations; (d) 9 iterations.
Figure 8: Optical flow consistency for determining the moving rigid object region.
Figure 9: Motion frame propagation.
Figure 10: Effect of dynamic feature point removal. The colored areas depict the optical flow of moving rigid objects, while the green areas indicate the final extracted feature points. The feature points of non-rigid objects, such as the human body, are removed in all scenes. (a,b) A chair is being dragged, with its feature points removed. (c,d) Hitting a balloon, where the feature points on the balloon are removed. (e) The box is stationary, and feature points are extracted normally. (f) The box is being moved, with its feature points removed. (g,h) The box is put down, and its feature points are restored.
Figure 11: Absolute trajectory error and relative pose error of fr3_walking_xyz.
Figure 12: Absolute trajectory error and relative pose error of fr3_walking_static.
Figure 13: Absolute trajectory error and relative pose error of fr3_walking_rpy.
Figure 14: Absolute trajectory error and relative pose error of fr3_walking_halfsphere.
Figure 15: Absolute trajectory error and relative pose error of fr3_sitting_static.
Figure 16: Dense point cloud reconstruction: (a) RGB frame, dense point cloud, and octree map of the fr3_walking_xyz sequence; (b) RGB frame, dense point cloud, and octree map of the moving_nonobstructing_box sequence.
Figure 17: Point cloud error heatmaps: (a) kt0 sequence; (b) kt1 sequence; (c) kt2 sequence; (d) kt3 sequence.
Figure 18: Real-world scenario test results: (a) color images; (b) depth images; (c) optical flow of moving objects; (d) moving rigid object masks; (e) feature points in traditional ORB-SLAM3; (f) feature points in DIO-SLAM.
18 pages, 8682 KiB  
Article
Analysis of Factors Influencing the Precision of Body Tracking Outcomes in Industrial Gesture Control
by Aleksej Weber, Markus Wilhelm and Jan Schmitt
Sensors 2024, 24(18), 5919; https://doi.org/10.3390/s24185919 - 12 Sep 2024
Viewed by 192
Abstract
The body tracking systems on the current market offer a wide range of options for tracking the movements of objects, people, or extremities. The precision of this technology is often limited and determines its field of application. This work aimed to identify relevant technical and environmental factors that influence the performance of body tracking in industrial environments. The influence of light intensity, range of motion, speed of movement and direction of hand movement was analyzed individually and in combination. The hand movement of a test person was recorded with an Azure Kinect at a distance of 1.3 m. The joints in the center of the hand showed the highest accuracy compared to other joints. The best results were achieved at a luminous intensity of 500 lx, and movements in the x-axis direction were more precise than in the other directions. The greatest inaccuracy was found in the z-axis direction. A larger range of motion resulted in higher inaccuracy, with the lowest data scatter at a 100 mm range of motion. No significant difference was found at hand velocity of 370 mm/s, 670 mm/s and 1140 mm/s. This study emphasizes the potential of RGB-D camera technology for gesture control of industrial robots in industrial environments to increase efficiency and ease of use. Full article
(This article belongs to the Section Industrial Sensors)
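
The scatter measure suggested by the figures (a quadratic regression over the recorded joint positions, followed by the mean absolute error of the residuals) can be sketched as below; the data are synthetic and the axis choice is illustrative.

```python
# One way to quantify tracking scatter: fit a quadratic regression to the recorded
# joint positions along a movement axis and take the MAE of the residuals.
import numpy as np

def quadratic_mae(t: np.ndarray, positions_mm: np.ndarray) -> float:
    """MAE (mm) of joint positions around a fitted quadratic trend."""
    coeffs = np.polyfit(t, positions_mm, deg=2)          # quadratic regression
    fitted = np.polyval(coeffs, t)
    return float(np.mean(np.abs(positions_mm - fitted)))

t = np.linspace(0, 2, 200)                               # dummy 2 s recording
x_mm = 50 * t ** 2 + np.random.normal(0, 3, t.size)      # synthetic joint track
print(f"MAE: {quadratic_mae(t, x_mm):.2f} mm")
```
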
Figures:
Figure 1: Experimental setups in the x-direction (top left), y-direction (top right), and z-direction (bottom).
Figure 2: Exemplary distribution of the measured positions of joint (15) on the x- and y-axes without a regression function.
Figure 3: Exemplary distribution of the measured positions of joint (15) on the x- and y-axes with a quadratic regression function (red curve).
Figure 4: Distribution of the MAEs per direction of movement for all data sets of joint (15) to joint (17).
Figure 5: Distribution of MAE_1 as a function of hand movement direction and light intensity [lx].
Figure 6: Distribution of MAE_2 as a function of hand movement direction and light intensity [lx].
Figure 7: Distribution of MAE_1 as a function of hand movement direction and range of hand movement [mm].
Figure 8: Distribution of MAE_2 as a function of hand movement direction and range of hand movement [mm].
Figure 9: Distribution of MAE_1 as a function of hand movement direction and velocity of hand movement [mm/s].
Figure 10: Distribution of MAE_2 as a function of hand movement direction and velocity of hand movement [mm/s].
21 pages, 13059 KiB  
Article
Change Detection for Forest Ecosystems Using Remote Sensing Images with Siamese Attention U-Net
by Ashen Iranga Hewarathna, Luke Hamlin, Joseph Charles, Palanisamy Vigneshwaran, Romiyal George, Selvarajah Thuseethan, Chathrie Wimalasooriya and Bharanidharan Shanmugam
Technologies 2024, 12(9), 160; https://doi.org/10.3390/technologies12090160 - 12 Sep 2024
Viewed by 462
Abstract
Forest ecosystems are critical components of Earth’s biodiversity and play vital roles in climate regulation and carbon sequestration. They face increasing threats from deforestation, wildfires, and other anthropogenic activities. Timely detection and monitoring of changes in forest landscapes pose significant challenges for government agencies. To address these challenges, we propose a novel pipeline by refining the U-Net design, including employing two different schemata of early fusion networks and a Siam network architecture capable of processing RGB images specifically designed to identify high-risk areas in forest ecosystems through change detection across different time frames in the same location. It annotates ground truth change maps in such time frames using an encoder–decoder approach with the help of an enhanced feature learning and attention mechanism. Our proposed pipeline, integrated with ResNeSt blocks and SE attention techniques, achieved impressive results in our newly created forest cover change dataset. The evaluation metrics reveal a Dice score of 39.03%, a kappa score of 35.13%, an F1-score of 42.84%, and an overall accuracy of 94.37%. Notably, our approach significantly outperformed multitasking model approaches in the ONERA dataset, boasting a precision of 53.32%, a Dice score of 59.97%, and an overall accuracy of 97.82%. Furthermore, it surpassed multitasking models in the HRSCD dataset, even without utilizing land cover maps, achieving a Dice score of 44.62%, a kappa score of 11.97%, and an overall accuracy of 98.44%. Although the proposed model had a lower F1-score than other methods, other performance metrics highlight its effectiveness in timely detection and forest landscape monitoring, advancing deep learning techniques in this field. Full article
(This article belongs to the Section Environmental Technology)
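To make the Siamese change-detection idea above more concrete, the following PyTorch sketch shows one plausible structure: a single shared encoder processes the two co-registered RGB frames, the features are fused by absolute difference, and a small decoder produces a change map. This is a minimal illustration under assumptions (the `SiameseChangeNet` class, the layer widths, and the difference-based fusion are hypothetical), not the authors' implementation, which additionally uses ResNeSt blocks and SE attention.

```python
# Minimal Siamese change-detection sketch in PyTorch (illustrative only).
# Two co-registered images share one encoder; features are fused by absolute
# difference and decoded into a single-channel change map.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in a basic U-Net stage.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class SiameseChangeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)  # per-pixel change logit

    def encode(self, x):
        f1 = self.enc1(x)
        f2 = self.enc2(self.pool(f1))
        return f1, f2

    def forward(self, img_t0, img_t1):
        a1, a2 = self.encode(img_t0)
        b1, b2 = self.encode(img_t1)             # shared weights (Siamese branch)
        d2 = torch.abs(a2 - b2)                  # fuse deep features by |difference|
        d1 = torch.abs(a1 - b1)
        x = self.up(d2)
        x = self.dec(torch.cat([x, d1], dim=1))  # skip connection on fused features
        return self.head(x)                      # change-map logits


if __name__ == "__main__":
    net = SiameseChangeNet()
    t0 = torch.randn(1, 3, 128, 128)
    t1 = torch.randn(1, 3, 128, 128)
    print(net(t0, t1).shape)  # torch.Size([1, 1, 128, 128])
```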
Figure 1: The proposed pipeline for change detection in high-threat zones in forests.
Figure 2: General U-Net architecture used in this study with the addition of feature learning modules.
Figure 3: Change detection process using an encoder–decoder approach with enhanced feature learning and an attention mechanism.
Figure 4: High-level overview of using AGs in the second strategy of applying attention.
Figure 5: Sample cropped patches from the original high-resolution satellite image.
Figure 6: Change annotation in a particular region (trios); t0 and t1 show two different instances of the same location in two different time periods.
Figure 7: Method of annotating changes between two different timestamps (t1, t2). (a) Extracted image from time frame T0; (b) extracted image from time frame T1.
Figure 8: Systematic patches obtained from dynamic cropping. (a) Patches extracted using stride and patch-size values from annotated image trios; (b) patches extracted randomly from annotated image trios.
Figure 9: Resultant images from color changes: (a) normal image; (b) increased brightness; (c) increased saturation; (d) randomly increased brightness and saturation.
Figure 10: Resultant segmented image after applying the change detection algorithm.
Figure 11: Attention blocks in the fully convolutional early fusion (FCEF) architecture compared with FCEF models without attention.
Figure 12: Attention blocks in the Siamese architecture compared with Siamese models without attention.
Figure 13: Validation loss graphs for (a) FCEF ResNeSt, (b) ResNeSt (Siam), (c) ResNeXt additive (Siam), and (d) ResNeSt SE (Siam).
Figure 14: Illustration of how a sample image is segmented, cropped, and normalized, ready for training at different time points. (a) Our study dataset; (b) HRSCD dataset; (c) ONERA dataset; ΔT denotes the time difference between consecutive frames.
19 pages, 20386 KiB  
Article
YOD-SLAM: An Indoor Dynamic VSLAM Algorithm Based on the YOLOv8 Model and Depth Information
by Yiming Li, Yize Wang, Liuwei Lu and Qi An
Electronics 2024, 13(18), 3633; https://doi.org/10.3390/electronics13183633 - 12 Sep 2024
Viewed by 249
Abstract
To address the low positioning accuracy and poor mapping quality of visual SLAM systems caused by low-quality dynamic object masks in indoor dynamic environments, an indoor dynamic VSLAM algorithm based on the YOLOv8 model and depth information (YOD-SLAM) is proposed on top of the ORB-SLAM3 system. Firstly, the YOLOv8 model obtains the original masks of a priori dynamic objects, and depth information is used to refine these masks. Secondly, the depth information and center point of each mask are used to determine a priori whether a dynamic object has been missed and whether its mask needs to be redrawn. Then, the mask edge distance and depth information are used to judge the movement state of non-prior dynamic objects. Finally, all dynamic object information is removed, and the remaining static objects are used for pose estimation and dense point cloud mapping. The accuracy of camera positioning and the quality of the dense point cloud maps are verified using the TUM RGB-D dataset and real-environment data. The results show that YOD-SLAM achieves higher positioning accuracy and better dense point cloud mapping in dynamic scenes than other advanced SLAM systems such as DS-SLAM and DynaSLAM. Full article
(This article belongs to the Section Computer Science & Engineering)
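As a rough sketch of the depth-based mask refinement step described in the abstract (an assumed interpretation, not the paper's actual algorithm), the NumPy snippet below keeps only the mask pixels whose depth lies near the mask's median depth, trimming background that the detector mask spills onto; the `refine_mask_with_depth` function and its tolerance are hypothetical.

```python
# Illustrative depth-based mask refinement (assumed behaviour, not the authors'
# implementation): keep only mask pixels whose depth lies close to the mask's
# median depth, discarding background leakage around the object.
import numpy as np


def refine_mask_with_depth(mask, depth, rel_tol=0.15):
    """mask: HxW bool array from the detector; depth: HxW depth map in metres."""
    refined = mask.copy()
    valid = mask & (depth > 0)            # ignore pixels with no depth reading
    if not valid.any():
        return refined
    ref_depth = np.median(depth[valid])   # robust estimate of the object's depth
    # Drop pixels whose depth deviates from the object depth by more than rel_tol.
    refined &= np.abs(depth - ref_depth) <= rel_tol * ref_depth
    return refined


if __name__ == "__main__":
    h, w = 120, 160
    depth = np.full((h, w), 4.0)          # background roughly 4 m away
    depth[40:90, 60:110] = 1.5            # a person roughly 1.5 m away
    mask = np.zeros((h, w), dtype=bool)
    mask[35:95, 55:115] = True            # detector mask over-covers the background
    refined = refine_mask_with_depth(mask, depth)
    print(mask.sum(), refined.sum())      # refined mask keeps only the person pixels
```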
Figure 1: Overview of YOD-SLAM.
Figure 2: The process of modifying prior dynamic object masks using depth information. (a) The depth image corresponding to the current frame. Using the proposed algorithm, the background area that is excessively covered in (b) is removed in (c), and the expanded edges of the human body achieve better coverage in (d).
Figure 3: The process of redrawing the masks of previously missed dynamic objects. (a) The depth image corresponding to the current frame. In (b), people in the distance are not covered by the original mask, resulting in missed detections; the mask in (c) is obtained by filling it in with the depth information at that location in (a).
Figure 4: The process of excluding prior static objects in motion.
Figure 5: Comparison of the estimated and real trajectories of the four systems.
Figure 6: Results of mask modification on dynamic objects. The three images in each column come from the same time in their respective datasets: the first row is the depth image corresponding to the current frame, the second is the original mask obtained by YOLOv8, and the third is the final mask after our modification.
Figure 7: Comparison of point cloud maps between ORB-SLAM3 and YOD-SLAM in two sets of highly dynamic sequences.
Figure 8: Comparison of point cloud maps between ORB-SLAM3 and YOD-SLAM in low-dynamic and static sequences, where fr2/desk/p is a low-dynamic scene and fr2/rpy is a static scene.
Figure 9: Intel RealSense Depth Camera D455.
Figure 10: Mask processing and ORB feature point extraction in a real laboratory environment. Several non-English exhibition boards lean against the wall to simulate a typical indoor environment. The facial features of the people have been anonymized.
Figure 11: Comparison of dense point cloud mapping between ORB-SLAM3 and YOD-SLAM in a real laboratory environment. Map areas affected by dynamic objects are marked with red circles.
17 pages, 8334 KiB  
Article
PAIBoard: A Neuromorphic Computing Platform for Hybrid Neural Networks in Robot Dog Application
by Guang Chen, Jian Cao, Chenglong Zou, Shuo Feng, Yi Zhong, Xing Zhang and Yuan Wang
Electronics 2024, 13(18), 3619; https://doi.org/10.3390/electronics13183619 - 12 Sep 2024
Viewed by 282
Abstract
Hybrid neural networks (HNNs), which integrate the strengths of artificial neural networks (ANNs) and spiking neural networks (SNNs), provide a promising path towards general artificial intelligence. There is a prevailing trend towards designing unified SNN-ANN paradigm neuromorphic computing chips to support HNNs, but developing platforms that advance neuromorphic computing systems is equally essential. This paper presents the PAIBoard platform, which is designed to facilitate the implementation of HNNs. The platform comprises three main components: the upper computer, the communication module, and the neuromorphic computing chip. Both hardware and software performance measurements indicate that the platform achieves low power consumption, high energy efficiency, and comparable task accuracy. Furthermore, PAIBoard is applied in a robot dog tracking and obstacle avoidance system: the tracking module combines data from ultra-wideband (UWB) transceivers and vision, while the obstacle avoidance module utilizes depth information from an RGB-D camera, further underscoring the platform's potential to tackle challenging tasks in real-world applications. Full article
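For readers unfamiliar with the two neuron types that HNNs combine, the short Python sketch below contrasts a conventional ANN neuron (weighted sum plus ReLU) with a generic leaky integrate-and-fire (LIF) SNN neuron. This is textbook LIF dynamics for illustration only; it is not the neuron model implemented on the PAIBoard chip.

```python
# Sketch contrasting an ANN-style neuron with a leaky integrate-and-fire (LIF)
# SNN neuron, the two building blocks that hybrid neural networks combine.
# Generic textbook models, not the neurons implemented on the neuromorphic chip.
import numpy as np


def ann_neuron(x, w, b):
    # Conventional artificial neuron: weighted sum followed by ReLU.
    return max(0.0, float(np.dot(w, x) + b))


def lif_neuron(input_current, v_threshold=1.0, leak=0.9, v_reset=0.0):
    # Leaky integrate-and-fire: the membrane potential decays each step,
    # accumulates input, and emits a binary spike when it crosses threshold.
    v = 0.0
    spikes = []
    for i_t in input_current:
        v = leak * v + i_t
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    x = np.array([0.2, 0.5, 0.1])
    w = np.array([0.4, 0.3, 0.8])
    print("ANN activation:", ann_neuron(x, w, b=0.05))         # 0.36
    print("SNN spike train:", lif_neuron([0.4] * 6))            # [0, 0, 1, 0, 0, 1]
```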
Figure 1: (a) ANN neuron; (b) SNN neuron; (c) hybrid neural networks (HNNs).
Figure 2: (a) Neuromorphic chip; (b) overall architecture of the neuromorphic chip; (c) the neuromorphic chip routing network.
Figure 3: Neuromorphic computing platform and nervous system.
Figure 4: System architecture of the proposed PAIBoard platform.
Figure 5: (a) Flowchart of H2C/C2H; (b) workflow of the upper computer.
Figure 6: Topology diagram of the proposed prototype board.
Figure 7: (a) 3D printer; (b) acrylic plate; (c) the prototype board; (d) the prototype board with an acrylic plate.
Figure 8: Architecture of the HNN with group convolution for a classification task on the CIFAR-10 dataset.
Figure 9: Pipeline of the robot dog tracking and obstacle avoidance system.
Figure 10: Workflow of the robot dog.
Figure 11: Robot dog carrying the UWB module and the prototype board.
Figure 12: (a) Three-layer fully connected SNN; (b) network architecture of YOLOv5n.
Figure 13: Image samples from the self-built dataset.
Figure 14: Results on (a) tracking and (b) obstacle avoidance.
Figure 15: Demonstration of the tracking and obstacle avoidance system.