
Search Results (6,132)

Search Parameters:
Keywords = pixel-based

19 pages, 5728 KiB  
Article
Monocular-Vision-Based Method for Locating the Center of Anchor Holes on Steel Belts in Coal Mine Roadways
by Mengyu Lei, Xuhui Zhang and Xin Chen
Appl. Sci. 2024, 14(16), 7080; https://doi.org/10.3390/app14167080 (registering DOI) - 12 Aug 2024
Abstract
The precise positioning of anchoring-hole centers on the steel belts used for anchor support in coal mines is essential for improving the automation and efficiency of roadway support. To address the issues of poor positioning accuracy and the low support efficiency caused by the manual determination of anchoring-hole-center positions, this paper proposes a monocular-vision-based method for locating anchoring-hole centers. Firstly, a laser pointer and an industrial camera are used to build an anchoring-hole positioning device, and its visual positioning model is constructed to achieve the automatic and precise localization of the anchoring-hole center. Secondly, to overcome the difficulty of obtaining high-precision spot centers using edge and grayscale information-based spot extraction methods, a spot center extraction method based on two-dimensional arctangent function fitting is proposed, achieving high precision and the stable acquisition of spot pixel coordinates. The experimental results show that the average measurement errors of the anchoring-hole centers in the camera’s coordinate system along the X-axis, Y-axis, and Z-axis are 3.36 mm, 3.30 mm, and 5.75 mm, respectively, with maximum errors of 4.23 mm, 4.39 mm, and 6.63 mm. The average measurement errors of the steel belt’s pitch, yaw, and roll angles in the camera’s coordinate system are 0.16°, 0.16°, and 0.08°, respectively, with maximum errors of 0.21°, 0.27°, and 0.13°. The proposed method can achieve the precise localization of anchoring holes, improve the efficiency of roadway support, and provide new insights for the automation and intelligentization of roadway anchor support. Full article
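The spot-center step lends itself to a compact illustration. Below is a minimal sketch, not the authors' exact formulation, of fitting a radially symmetric two-dimensional arctangent intensity model to a spot image with SciPy; the model form `arctan_spot`, its parameters, and the synthetic image are all assumptions made for demonstration, with the fitted (x0, y0) serving as the sub-pixel spot center.

```python
# Illustrative sketch (not the authors' exact formulation): locating a laser-spot
# center by fitting a radially symmetric 2D arctangent intensity model with SciPy.
import numpy as np
from scipy.optimize import curve_fit

def arctan_spot(coords, x0, y0, amp, k, r0, offset):
    """Plateau-like spot model: bright core that falls off through an arctangent edge."""
    x, y = coords
    r = np.sqrt((x - x0) ** 2 + (y - y0) ** 2)
    return offset + amp * (0.5 - np.arctan(k * (r - r0)) / np.pi)

# Synthetic spot image for demonstration.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
rng = np.random.default_rng(0)
true = arctan_spot((xx, yy), 30.3, 25.7, 200.0, 1.5, 8.0, 20.0)
img = true + rng.normal(0, 2.0, true.shape)

# Fit the model to all pixels; the optimized (x0, y0) is the sub-pixel spot center.
p0 = (w / 2, h / 2, img.max() - img.min(), 1.0, 5.0, img.min())
popt, _ = curve_fit(arctan_spot, (xx.ravel(), yy.ravel()), img.ravel(), p0=p0)
print("estimated spot center (x, y):", popt[0], popt[1])
```

An arctangent edge profile keeps a flat plateau inside the spot, which is one reason such a model can describe flat-topped or saturated spots better than a Gaussian.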
Figures:
Figure 1: Composition of the visual pose measurement system for anchor holes on the steel belt: (a) components of the system; (b) pose measurement unit for anchor holes.
Figure 2: Principle of the visual pose measurement system for anchor holes on the steel belt.
Figure 3: Laser spot segmentation results: (a) original image; (b) HSV segmentation; (c) contour detection; (d) result.
Figure 4: Analysis of the grayscale situation of the light spot: (a) spot image; (b) grayscale distribution; (c) 2D Gaussian function; (d) 2D arctangent function.
Figure 5: Extraction results of spot center using three methods.
Figure 6: Extraction results of steel belt anchor hole center: (a) original image; (b) image after binarization; (c) image after edge extraction; (d) results of feature extraction.
Figure 7: Experimental platform used for steel belt anchor hole pose measurements.
Figure 8: Schematic diagram of the parameter calibration method of the steel belt anchor hole pose visual measurement unit.
Figure 9: Experimental feature extraction results: (a) Position 1; (b) Position 4; (c) Position 2; (d) Position 5; (e) Position 3; (f) Position 6.
15 pages, 5568 KiB  
Communication
Research on the Image-Motion Compensation Technology of the Aerial Camera Based on the Multi-Dimensional Motion of the Secondary Mirror
by Hongwei Zhang, Rui Qu, Weining Chen and Huinan Guo
Appl. Sci. 2024, 14(16), 7079; https://doi.org/10.3390/app14167079 (registering DOI) - 12 Aug 2024
Abstract
Targeting the dynamic image-motion problem of aerial cameras in the process of swing imaging, the image-motion compensation technology of aerial cameras based on the multi-dimensional motion of the secondary mirror was adopted. The secondary mirror was used as the image-motion compensation element, and the comprehensive image-motion compensation of the aerial camera was realized through the multi-dimensional motion of the secondary mirror. However, in the process of compensating for the image motion, the secondary mirror would be eccentric and inclined, which would cause the secondary mirror to be off-axis and affect the image quality. Therefore, a misalignment optical system model was established to study the relationship between the deviation vector and the misalignment of the secondary mirror, and the influence of the secondary mirror’s motion on the distribution of the aberration was analyzed. In order to verify the image-motion compensation ability of the multi-dimensional motion of the secondary mirror, an experimental platform was built to conduct a laboratory imaging experiment and flight experiment on the aerial camera. The experimental results showed that the dynamic resolution of the aerial camera using the image-motion compensation technology could reach 74 lp/mm, and the image-motion compensation accuracy was better than 0.5 pixels, which met the design expectation. In conclusion, the image-motion compensation technology is expected to be applied to various high-precision optical imaging as well as optical detection systems. Full article
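As a rough, illustrative calculation (the focal length, angular rate, exposure time, pixel pitch, and compensation ratio below are assumed values, not the paper's design parameters), the relation v = f·ω turns a swing rate into focal-plane image motion, and the residual smear in pixels shows what a compensation accuracy "better than 0.5 pixels" implies:

```python
# Back-of-the-envelope sketch (illustrative values, not the authors' optical design):
# image motion on the focal plane during swing imaging and the residual smear left
# after compensation, expressed in pixels.
focal_length_mm = 600.0        # assumed focal length
swing_rate_rad_s = 0.05        # assumed line-of-sight angular rate during the swing
exposure_s = 2e-3              # assumed exposure time
pixel_pitch_um = 5.0           # assumed detector pixel pitch
compensation_ratio = 0.98      # assumed fraction of the motion removed by the secondary mirror

image_motion_um_s = focal_length_mm * 1e3 * swing_rate_rad_s   # v = f * omega
smear_px = image_motion_um_s * exposure_s / pixel_pitch_um      # uncompensated smear
residual_px = smear_px * (1.0 - compensation_ratio)             # smear left after compensation
print(f"uncompensated smear: {smear_px:.1f} px, residual: {residual_px:.2f} px")
```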
(This article belongs to the Special Issue Hybrid Intelligence in Aerospace Science and Engineering)
20 pages, 1296 KiB  
Article
Hyperspectral Spatial Frequency Domain Imaging Technique for Soluble Solids Content and Firmness Assessment of Pears
by Yang Yang, Xiaping Fu and Ying Zhou
Horticulturae 2024, 10(8), 853; https://doi.org/10.3390/horticulturae10080853 (registering DOI) - 12 Aug 2024
Abstract
Hyperspectral spatial frequency domain imaging (HSFDI) combines hyperspectral imaging and spatial frequency domain imaging techniques, offering advantages such as a wide spectral range, non-contact operation, and differentiated imaging depth, making it well-suited for measuring the optical properties of agricultural products. The diffuse reflectance spectra of the samples at spatial frequencies of 0 mm−1 (Rd0) and 0.2 mm−1 (Rd1) were obtained using the three-phase demodulation algorithm. Pixel-by-pixel inversion was performed to obtain the absorption coefficient (μa) spectra and the reduced scattering coefficient (μ′s) spectra of the pears. For predicting the SSC and firmness of the pears, these optical properties and their specific combinations were used as inputs for partial least squares regression (PLSR) modeling, combined with the competitive adaptive reweighted sampling (CARS) wavelength selection algorithm. The results showed that μa had a stronger correlation with SSC, whereas μ′s exhibited a stronger correlation with firmness. Taking the plane diffuse reflectance Rd0 as the comparison object, the prediction results of SSC based on both μa and the combination of diffuse reflectance at the two spatial frequencies (Rd0 and Rd1) were superior (the best Rp2 of 0.90 and RMSEp of 0.41%). Similarly, in the prediction of firmness, the results of μ′s, μa × μ′s and Rd1 were better than that of Rd0 (the best Rp2 of 0.80 and RMSEp of 3.25%). The findings of this research indicate that the optical properties represented by HSFDI technology and their combinations can accurately predict the internal quality of pears, providing a novel technical approach for the non-destructive internal quality evaluation of agricultural products. Full article
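Two of the steps named above are easy to sketch in code. The snippet below shows the standard three-phase demodulation used in spatial frequency domain imaging and a PLSR fit with scikit-learn; the image sizes, feature matrix, and SSC values are synthetic stand-ins, and the CARS wavelength selection step is omitted.

```python
# Minimal sketch of two pieces the abstract relies on, under assumed data shapes:
# (1) three-phase demodulation of sinusoidally patterned images into DC/AC amplitudes,
# (2) a PLSR model relating optical features to soluble solids content (SSC).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

def three_phase_demodulate(i1, i2, i3):
    """Standard SFDI demodulation of three images shifted by 0, 2*pi/3 and 4*pi/3."""
    m_dc = (i1 + i2 + i3) / 3.0
    m_ac = (np.sqrt(2.0) / 3.0) * np.sqrt((i1 - i2) ** 2 + (i2 - i3) ** 2 + (i3 - i1) ** 2)
    return m_dc, m_ac

rng = np.random.default_rng(0)
i1, i2, i3 = (rng.random((64, 64)) for _ in range(3))      # toy "captured" phase images
m_dc, m_ac = three_phase_demodulate(i1, i2, i3)

X = rng.random((120, 30))                                  # stand-in spectral features (e.g., mu_a spectra)
y = X @ rng.random(30) + rng.normal(0, 0.1, 120)           # stand-in SSC values
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pls = PLSRegression(n_components=8).fit(X_tr, y_tr)
print("R^2 on held-out samples:", pls.score(X_te, y_te))
```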
(This article belongs to the Section Postharvest Biology, Quality, Safety, and Technology)
17 pages, 4015 KiB  
Article
Gastric Cancer Detection with Ensemble Learning on Digital Pathology: Use Case of Gastric Cancer on GasHisSDB Dataset
by Govind Rajesh Mudavadkar, Mo Deng, Salah Mohammed Awad Al-Heejawi, Isha Hemant Arora, Anne Breggia, Bilal Ahmad, Robert Christman, Stephen T. Ryan and Saeed Amal
Diagnostics 2024, 14(16), 1746; https://doi.org/10.3390/diagnostics14161746 - 12 Aug 2024
Viewed by 40
Abstract
Gastric cancer has become a serious worldwide health concern, emphasizing the crucial importance of early diagnosis measures to improve patient outcomes. While traditional histological image analysis is regarded as the clinical gold standard, it is labour intensive and manual. In recognition of this problem, there has been a rise in interest in the use of computer-aided diagnostic tools to help pathologists with their diagnostic efforts. In particular, deep learning (DL) has emerged as a promising solution in this sector. However, current DL models are still restricted in their ability to extract extensive visual characteristics for correct categorization. To address this limitation, this study proposes the use of ensemble models, which incorporate the capabilities of several deep-learning architectures and use aggregate knowledge of many models to improve classification performance, allowing for more accurate and efficient gastric cancer detection. To determine how well these proposed models performed, this study compared them with other works, all of which were based on the Gastric Histopathology Sub-Size Images Database, a publicly available dataset for gastric cancer. This research demonstrates that the ensemble models achieved a high detection accuracy across all sub-databases, with an average accuracy exceeding 99%. Specifically, ResNet50, VGGNet, and ResNet34 performed better than EfficientNet and VitNet. For the 80 × 80-pixel sub-database, ResNet34 exhibited an accuracy of approximately 93%, VGGNet achieved 94%, and the ensemble model excelled with 99%. In the 120 × 120-pixel sub-database, the ensemble model showed 99% accuracy, VGGNet 97%, and ResNet50 approximately 97%. For the 160 × 160-pixel sub-database, the ensemble model again achieved 99% accuracy, VGGNet 98%, ResNet50 98%, and EfficientNet 92%, highlighting the ensemble model’s superior performance across all resolutions. Overall, the ensemble model consistently provided an accuracy of 99% across the three sub-pixel categories. These findings show that ensemble models may successfully detect critical characteristics from smaller patches and achieve high performance. The findings will help pathologists diagnose gastric cancer using histopathological images, leading to earlier identification and higher patient survival rates. Full article
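A hedged sketch of the ensembling idea follows: several torchvision backbones with their classification heads resized to two classes, combined by soft voting (averaging softmax probabilities). The class count, input size, and untrained weights are placeholders, not the study's trained models.

```python
# Hedged sketch of the ensembling idea: average the softmax outputs of several CNN
# backbones (torchvision ResNet34/ResNet50/VGG16 with two-class heads) and take the argmax.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # e.g., normal vs. abnormal patches

def build_models():
    r34 = models.resnet34(weights=None)
    r34.fc = nn.Linear(r34.fc.in_features, NUM_CLASSES)
    r50 = models.resnet50(weights=None)
    r50.fc = nn.Linear(r50.fc.in_features, NUM_CLASSES)
    vgg = models.vgg16(weights=None)
    vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, NUM_CLASSES)
    return [r34, r50, vgg]

@torch.no_grad()
def ensemble_predict(models_list, batch):
    """Soft voting: mean of per-model class probabilities, then argmax."""
    probs = torch.stack([torch.softmax(m(batch), dim=1) for m in models_list]).mean(dim=0)
    return probs.argmax(dim=1)

nets = [m.eval() for m in build_models()]
dummy_patch_batch = torch.randn(4, 3, 80, 80)   # e.g., patches from the 80 x 80 sub-database
print(ensemble_predict(nets, dummy_patch_batch))
```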
Figures:
Figure 1: Process of extraction of the histopathology image.
Figure 2: Examples of histopathological gastric images. (a) The pre-processing step and (b–d) examples of different image sizes.
Figure 3: Summaries of the total number of images for every subclass in an experiment setup.
Figure 4: Summarises the total number of the empty and non-empty patch images in the dataset.
Figure 5: ResNet34 model architecture.
Figure 6: ResNet50 model architecture.
Figure 7: VGGNet16 model architecture.
Figure 8: EfficientNet model architecture.
Figure 9: VITNet model architecture.
Figure 10: Ensemble model detailed architecture.
22 pages, 21022 KiB  
Article
Forest Fire Detection Based on Spatial Characteristics of Surface Temperature
by Houzhi Yao, Zhigao Yang, Gui Zhang and Feng Liu
Remote Sens. 2024, 16(16), 2945; https://doi.org/10.3390/rs16162945 - 12 Aug 2024
Viewed by 150
Abstract
Amidst the escalating threat of global warming, which manifests in more frequent forest fires, the prompt and accurate detection of forest fires has ascended to paramount importance. The current surveillance algorithms employed for forest fire monitoring—including, but not limited to, fixed threshold algorithms, multi-channel threshold algorithms, and contextual algorithms—rely primarily upon the degree of deviation between the pixel temperature and the background temperature to discern pyric events. Notwithstanding, these algorithms typically fail to account for the spatial heterogeneity of the background temperature, precipitating the consequential oversight of low-temperature fire point pixels, thus impeding the expedited detection of fires in their initial stages. For the amelioration of this deficiency, the present study introduces a spatial feature-based (STF) method for forest fire detection, leveraging Himawari-8/9 imagery as the main data source, complemented by the Shuttle Radar Topography Mission (SRTM) DEM data inputs. Our proposed modality reconstructs the surface temperature information via selecting the optimally designated machine learning model, subsequently identifying the fire point through utilizing the difference between the reconstructed surface temperatures and empirical observations, in tandem with the spatial contextual algorithm. The results confirm that the random forest model demonstrates superior efficacy in the reconstruction of the surface temperature. Benchmarking the STF method against both the fire point datasets disseminated by the China Forest and Grassland Fire Prevention and Suppression Network (CFGFPN) and the Wild Land Fire (WLF) fire point product validation datasets from Himawari-8/9 yielded a zero rate of omission errors and a comprehensive evaluative index, predominantly surpassing 0.74. These findings show that the STF method proposed herein significantly augments the identification of lower-temperature fire point pixels, thereby amplifying the sensitivity of forest surveillance. Full article
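The reconstruct-then-difference idea can be sketched with scikit-learn. In the snippet below, a random forest trained on fire-free reference observations predicts the expected background LST, and pixels whose observed LST exceeds the prediction by a margin are flagged; the predictor variables, the 3 K margin, and all data are synthetic assumptions rather than the study's Himawari-8/9 setup.

```python
# Sketch of the core idea on synthetic data: reconstruct the fire-free background land
# surface temperature (LST) with a random forest, then flag pixels whose observed LST
# exceeds the reconstruction by a margin.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 5000
# Assumed per-pixel predictors: elevation, an NDVI-like index, a solar-time feature.
X = np.column_stack([rng.uniform(0, 2000, n), rng.uniform(0, 1, n), rng.uniform(0, 24, n)])
background_lst = 300 - 0.0065 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.5, n)

observed = background_lst.copy()
fire_idx = rng.choice(n, 20, replace=False)
observed[fire_idx] += rng.uniform(5, 25, 20)          # inject synthetic fire anomalies

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, background_lst)                             # trained on fire-free reference observations
reconstructed = rf.predict(X)                         # expected fire-free LST for the scene
candidates = np.where(observed - reconstructed > 3.0)[0]   # assumed 3 K detection margin
print("flagged pixels:", len(candidates), "true fires:", len(fire_idx))
```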
Figures:
Figure 1: Overview map of the study area.
Figure 2: Vegetation area and DEM in Hunan Province.
Figure 3: Histogram of the frequency distribution of the surface temperatures in vegetation areas in Hunan Province on different dates.
Figure 4: Flowchart of fire point detection algorithm.
Figure 5: Feature correlation heatmap at different moments during the daytime.
Figure 6: Scatter density plot of validation data for RF at different moments of the day.
Figure 7: Scatter density plot of reconstructed LST versus original LST.
Figure 8: LST of original vs. reconstructed vegetation area during daytime.
Figure 9: LST of original vs. reconstructed area at nighttime.
Figure 10: The result of fire point identification at 15:30 on 18 October 2022.
Figure 11: The result of fire point identification at 10:30 on 19 October 2022.
Figure 12: The result of fire point identification at 15:20 on 23 October 2022.
Figure 13: The results of fire detection.
Figure 14: Identification results of fire point image elements in Xintian County, Hunan Province, at four moments on 18 and 19 October 2022. (a) Mid-infrared 7th band of Himawari-9 image and its bright temperature. (b) Identification results of the algorithm of this study. (c) Results of WLF fire point product.
27 pages, 11040 KiB  
Article
PolyDexFrame: Deep Reinforcement Learning-Based Pick-and-Place of Objects in Clutter
by Muhammad Babar Imtiaz, Yuansong Qiao and Brian Lee
Machines 2024, 12(8), 547; https://doi.org/10.3390/machines12080547 - 11 Aug 2024
Viewed by 338
Abstract
This research study represents a polydexterous deep reinforcement learning-based pick-and-place framework for industrial clutter scenarios. In the proposed framework, the agent tends to learn the pick-and-place of regularly and irregularly shaped objects in clutter by using the sequential combination of prehensile and non-prehensile robotic manipulations involving different robotic grippers in a completely self-supervised manner. The problem was tackled as a reinforcement learning problem; after the Markov decision process (MDP) was designed, the off-policy model-free Q-learning algorithm was deployed using deep Q-networks as a Q-function approximator. Four distinct robotic manipulations, i.e., grasp from the prehensile manipulation category and inward slide, outward slide, and suction grip from the non-prehensile manipulation category were considered as actions. The Q-function comprised four fully convolutional networks (FCN) corresponding to each action based on memory-efficient DenseNet-121 variants outputting pixel-wise maps of action-values jointly trained via the pixel-wise parametrization technique. Rewards were awarded according to the status of the action performed, and backpropagation was conducted accordingly for the FCN generating the maximum Q-value. The results showed that the agent learned the sequential combination of the polydexterous prehensile and non-prehensile manipulations, where the non-prehensile manipulations increased the possibility of prehensile manipulations. We achieved promising results in comparison to the baselines, differently designed variants, and density-based testing clutter. Full article
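The pixel-wise Q-map parametrization can be illustrated with a much smaller network than the paper's DenseNet-121 variants. In the sketch below (architecture, input size, and action ordering are assumptions), each action has a fully convolutional head that outputs one Q-value per heightmap pixel, and the greedy primitive is the (action, pixel) pair with the largest value.

```python
# Minimal sketch (assumed shapes, not the paper's DenseNet-121 networks): one fully
# convolutional Q-network per action outputs a pixel-wise map of Q-values over the
# RGB-D heightmap; the greedy primitive is the (action, pixel) with the highest value.
import torch
import torch.nn as nn

class PixelQNet(nn.Module):
    def __init__(self, in_ch=4):               # RGB-D heightmap has 4 channels
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                # one Q-value per pixel
        )

    def forward(self, x):
        return self.net(x)

actions = ["grasp", "inward_slide", "outward_slide", "suction"]
q_nets = {a: PixelQNet() for a in actions}

heightmap = torch.randn(1, 4, 224, 224)
with torch.no_grad():
    q_maps = torch.cat([q_nets[a](heightmap) for a in actions], dim=1)  # (1, 4, H, W)
best = torch.argmax(q_maps.view(-1))
a_idx, pix = divmod(best.item(), 224 * 224)
print("greedy primitive:", actions[a_idx], "at pixel", divmod(pix, 224))
```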
Figures:
Figure 1: The difference between prehensile and non-prehensile manipulations.
Figure 2: The flow diagram of a deep neural network.
Figure 3: The working of a reinforcement-learning-based agent.
Figure 4: The block diagram represents the working of a deep reinforcement learning-based agent.
Figure 5: Orthographic and perspective vision sensors’ field of view.
Figure 6: Generation of RGB-D heightmap from the RGB and depth components.
Figure 7: Examples of robotic arm performing grasping, suction-gripping, inward-slide, and outward-slide.
Figure 8: The block diagram of the multimodal extended DenseNet-121.
Figure 9: RGB-D heightmap rotations by 22.5°.
Figure 10: The flowchart diagram of the proposed approach.
Figure 11: Visualization of 16 pixelwise Q-values maps through heatmap representation.
Figure 12: Regular- and irregular-shaped 3D objects.
Figure 13: Jaw gripper and suction cup.
Figure 14: RG2 gripper and suction-cup installation.
Figure 15: Simulation testbed design.
Figure 16: Performance comparison of the proposed approach and the baselines.
Figure 17: Performance comparison of the proposed approach and the ResNet101-based variant.
Figure 18: Performance comparison of the proposed approach and the no sliding-suction rewards variant.
Figure 19: Performance comparison of the proposed approach and the no pretrained weights and no depth channel variants.
Figure 20: Clutter size categorization [11].
29 pages, 26119 KiB  
Article
Semi-Supervised Building Extraction with Optical Flow Correction Based on Satellite Video Data in a Tsunami-Induced Disaster Scene
by Huijiao Qiao, Weiqi Qian, Haifeng Hu, Xingbo Huang and Jiequn Li
Sensors 2024, 24(16), 5205; https://doi.org/10.3390/s24165205 - 11 Aug 2024
Viewed by 288
Abstract
Data and reports indicate an increasing frequency and intensity of natural disasters worldwide. Buildings play a crucial role in disaster responses and damage assessments, aiding in planning rescue efforts and evaluating losses. Despite advances in applying deep learning to building extraction, challenges remain in handling complex natural disaster scenes and reducing reliance on labeled datasets. Recent advances in satellite video are opening a new avenue for efficient and accurate building extraction research. By thoroughly mining the characteristics of disaster video data, this work provides a new semantic segmentation model for accurate and efficient building extraction based on a limited number of training data, which consists of two parts: the prediction module and the automatic correction module. The prediction module, based on a base encoder–decoder structure, initially extracts buildings using a limited amount of training data that are obtained instantly. Then, the automatic correction module takes the output of the prediction module as input, constructs a criterion for identifying pixels with erroneous semantic information, and uses optical flow values to extract the accurate corresponding semantic information on the corrected frame. The experimental results demonstrate that the proposed method outperforms other methods in accuracy and computational complexity in complicated natural disaster scenes. Full article
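The optical-flow correction step can be sketched with OpenCV. In the snippet below, dense Farneback flow computed from the current frame back to the previous one is used to warp the previous frame's building mask into the current frame's geometry, which is the kind of reference a correction module can consult when overwriting erroneous labels; the frames, mask, and flow parameters are synthetic placeholders, not the authors' data or exact criterion.

```python
# Sketch of the correction idea with OpenCV on synthetic frames: warp the previous
# frame's building mask onto the current frame using dense optical flow.
import cv2
import numpy as np

rng = np.random.default_rng(7)
prev = cv2.GaussianBlur(rng.integers(0, 255, (240, 320), dtype=np.uint8), (21, 21), 0)
curr = np.roll(prev, shift=3, axis=1)                     # fake 3-pixel horizontal motion
prev_mask = np.zeros((240, 320), dtype=np.uint8)
prev_mask[100:150, 100:160] = 1                           # "building" pixels in the previous frame

# Flow from the current frame back to the previous one: curr(y, x) ~ prev(y + fy, x + fx).
flow = cv2.calcOpticalFlowFarneback(curr, prev, None, 0.5, 3, 15, 3, 5, 1.2, 0)

h, w = prev.shape
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
warped_mask = cv2.remap(prev_mask, map_x, map_y, interpolation=cv2.INTER_NEAREST)
print("building pixels before/after warping:", int(prev_mask.sum()), int(warped_mask.sum()))
```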
Figures:
Figure 1: The framework of the semi-supervised building extraction.
Figure 2: The architecture of SegNet.
Figure 3: Summary of error types in the area-decreasing category: (a) change curve of the building area; (b–e) initial building extraction results of four adjacent frames.
Figure 4: Summary of error types in the area-increasing category: (a) change curve of the building area; (b–e) initial building extraction results of four adjacent frames.
Figure 5: Criteria for evaluating types of decrease.
Figure 6: Criteria for evaluating types of increase.
Figure 7: Illustration of the displacement computation.
Figure 8: The flowchart of the proposed automatic correction algorithm.
Figure 9: Representative schematic of the preprocessed key frames.
Figure 10: Representative images and supervised classification results across various phases of the disaster: (a) early phase; (b) mid phase; (c) late phase. The red and green channels in the supervised classification results denote buildings and others, respectively.
Figure 11: Example images of data enhancement in the training dataset.
Figure 12: Initial building extraction results for the three frames: (a) original image; (b) initial building extraction results; (c) ground truth.
Figure 13: Identification results of erroneous parts in the semantic segmentation results of diverse frames: (a) early phase: frame 179; (b) mid phase: frame 339; (c) late phase: frame 439.
Figure 14: Correction process for highlighted area in rose from Figure 13.
Figure 15: Comparison of accuracy before and after calibration of example area.
Figure 16: Comparison of accuracy before and after calibration of example frames.
Figure 17: Visualization of initial and corrected semantic segmentation results on representative key frames.
Figure 18: Building pixel counts in initial semantic segmentation results across 21 frames.
Figure 19: Building pixel counts in corrected semantic segmentation results across 21 frames.
Figure 20: Examples of cracked images and the corresponding reconstructed images: (a) cracked images; (b) reconstructed images.
40 pages, 19828 KiB  
Article
Pyramid Cascaded Convolutional Neural Network with Graph Convolution for Hyperspectral Image Classification
by Haizhu Pan, Hui Yan, Haimiao Ge, Liguo Wang and Cuiping Shi
Remote Sens. 2024, 16(16), 2942; https://doi.org/10.3390/rs16162942 - 11 Aug 2024
Viewed by 207
Abstract
Convolutional neural networks (CNNs) and graph convolutional networks (GCNs) have made considerable advances in hyperspectral image (HSI) classification. However, most CNN-based methods learn features at a single scale in HSI data, which may be insufficient for multi-scale feature extraction in complex data scenes. To learn the relations among samples in non-grid data, GCNs are employed and combined with CNNs to process HSIs. Nevertheless, most methods based on CNN-GCN may overlook the integration of pixel-wise spectral signatures. In this paper, we propose a pyramid cascaded convolutional neural network with graph convolution (PCCGC) for hyperspectral image classification. It mainly comprises CNN-based and GCN-based subnetworks. Specifically, in the CNN-based subnetwork, a pyramid residual cascaded module and a pyramid convolution cascaded module are employed to extract multiscale spectral and spatial features separately, which can enhance the robustness of the proposed model. Furthermore, an adaptive feature-weighted fusion strategy is utilized to adaptively fuse multiscale spectral and spatial features. In the GCN-based subnetwork, a band selection network (BSNet) is used to learn the spectral signatures in the HSI using nonlinear inter-band dependencies. Then, the spectral-enhanced GCN module is utilized to extract and enhance the important features in the spectral matrix. Subsequently, a mutual-cooperative attention mechanism is constructed to align the spectral signatures between the BSNet-based matrix and the spectral-enhanced GCN-based matrix for spectral signature integration. Extensive experiments performed on four widely used real HSI datasets show that our model achieves higher classification accuracy than the fourteen other comparative methods, demonstrating the superior classification performance of PCCGC over state-of-the-art methods. Full article
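For readers unfamiliar with the GCN side, the snippet below shows a single Kipf-style graph-convolution step, H' = ReLU(D^(-1/2)(A + I)D^(-1/2) H W), on a toy graph; the adjacency, feature dimensions, and weights are synthetic, and the sketch does not reproduce the paper's spectral-enhanced GCN module.

```python
# Minimal sketch of one graph-convolution propagation step of the kind a GCN subnetwork
# builds on: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), with synthetic shapes.
import torch

num_nodes, in_dim, out_dim = 6, 8, 4
H = torch.randn(num_nodes, in_dim)                 # node features (e.g., spectral signatures)
A = (torch.rand(num_nodes, num_nodes) > 0.6).float()
A = ((A + A.t()) > 0).float()                      # symmetric adjacency
A_hat = A + torch.eye(num_nodes)                   # add self-loops
D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
W = torch.randn(in_dim, out_dim)

H_next = torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)
print(H_next.shape)   # torch.Size([6, 4])
```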
26 pages, 14290 KiB  
Article
Exploratory Analysis Using Deep Learning for Water-Body Segmentation of Peru’s High-Mountain Remote Sensing Images
by William Isaac Perez-Torres, Diego Armando Uman-Flores, Andres Benjamin Quispe-Quispe, Facundo Palomino-Quispe, Emili Bezerra, Quefren Leher, Thuanne Paixão and Ana Beatriz Alvarez
Sensors 2024, 24(16), 5177; https://doi.org/10.3390/s24165177 - 10 Aug 2024
Viewed by 388
Abstract
High-mountain water bodies represent critical components of their ecosystems, serving as vital freshwater reservoirs, environmental regulators, and sentinels of climate change. To understand the environmental dynamics of these regions, comprehensive analyses of lakes across spatial and temporal scales are necessary. While remote sensing offers a powerful tool for lake monitoring, applications in high-mountain terrain present unique challenges. The Ancash and Cuzco regions of the Peruvian Andes exemplify these challenges. These regions harbor numerous high-mountain lakes, which are crucial for fresh water supply and environmental regulation. This paper presents an exploratory examination of remote sensing techniques for lake monitoring in the Ancash and Cuzco regions of the Peruvian Andes. The study compares three deep learning models for lake segmentation: the well-established DeepWaterMapV2 and WatNet models and the adapted WaterSegDiff model, which is based on a combination of diffusion and transformation mechanisms specifically conditioned for lake segmentation. In addition, the Normalized Difference Water Index (NDWI) with Otsu thresholding is used for comparison purposes. To capture lakes across these regions, a new dataset was created with Landsat-8 multispectral imagery (bands 2–7) from 2013 to 2023. Quantitative and qualitative analyses were performed using metrics such as Mean Intersection over Union (MIoU), Pixel Accuracy (PA), and F1 Score. The results achieved indicate equivalent performance of DeepWaterMapV2 and WatNet encoder–decoder architectures, achieving adequate lake segmentation despite the challenging geographical and atmospheric conditions inherent in high-mountain environments. In the qualitative analysis, the behavior of the WaterSegDiff model was considered promising for the proposed application. Considering that WatNet is less computationally complex, with 3.4 million parameters, this architecture becomes the most pertinent to implement. Additionally, a detailed temporal analysis of Lake Singrenacocha in the Vilcanota Mountains was conducted, pointing out the more significant behavior of the WatNet model. Full article
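The NDWI-with-Otsu baseline mentioned above is simple enough to show directly. The snippet below computes NDWI from green and near-infrared reflectance and thresholds it with scikit-image's Otsu implementation; the band arrays are synthetic placeholders for Landsat-8 bands 3 and 5.

```python
# Baseline sketch used for comparison in the study: NDWI = (Green - NIR) / (Green + NIR)
# followed by Otsu thresholding. Band arrays are synthetic stand-ins for Landsat-8 data.
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(2)
green = rng.uniform(0.02, 0.3, (256, 256))
nir = rng.uniform(0.02, 0.4, (256, 256))
nir[80:160, 80:160] = 0.01          # synthetic lake: water absorbs strongly in the NIR

ndwi = (green - nir) / (green + nir + 1e-8)
water_mask = ndwi > threshold_otsu(ndwi)
print("water fraction:", water_mask.mean())
```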
(This article belongs to the Special Issue Application of Satellite Remote Sensing in Geospatial Monitoring)
Figures:
Figure 1: Location of the study area.
Figure 2: Landsat-8 scenes selected for study.
Figure 3: Combining process from B2 to B7 into a single 6-channel image.
Figure 4: From left to right: (θ, ρ) parameter space, deskewed image, cropped image, and division of the image into 256 × 256 pixel patches.
Figure 5: Mask creation process.
Figure 6: WatNet model architecture.
Figure 7: DeepWaterMapV2 model architecture based on 3 primary blocks.
Figure 8: General architecture of WaterSegDiff based on a conditioning model and a diffusion model that integrate their information through two conditioning mechanisms, U-SA and SS-Former.
Figure 9: SS-Former internal architecture consisting of two symmetrical cross-attention modules.
Figure 10: Qualitative analysis of 5 selected samples that represent large lakes with compact structures. Showing the RGB image, ground truth, NDWI, WatNet, DeepWaterMapV2, and WaterSegDiff results. (a) Large and irregular lake, (b) two lakes with compact structure, (c) scene with river crossing, (d) large lake in mountainous region, (e) lake surrounded by dense vegetation.
Figure 11: Qualitative analysis of 5 selected samples that represent small and dispersed lakes. Showing the RGB image, ground truth, NDWI, WatNet, DeepWaterMapV2, and WaterSegDiff results. (a,b) Snowy scene with shadows with presence of clear and turbid lakes, (c) completely snowy scene, (d,e) partially snowy area with scattered lakes.
Figure 12: The edges extracted from Lake Singrenacocha based on NDWI, WatNet, DeepWaterMapV2, and WaterSegDiff. Highlights in yellow, green, blue, and red for the years 2014, 2016, 2018, and 2020, respectively.
Figure 13: Graphical representation of the segmentation performance of Lake Singrenacocha during the years 2014, 2016, 2018, and 2020.
22 pages, 7057 KiB  
Article
Extraction of Crop Row Navigation Lines for Soybean Seedlings Based on Calculation of Average Pixel Point Coordinates
by Bo Zhang, Dehao Zhao, Changhai Chen, Jinyang Li, Wei Zhang, Liqiang Qi and Siru Wang
Agronomy 2024, 14(8), 1749; https://doi.org/10.3390/agronomy14081749 - 9 Aug 2024
Viewed by 188
Abstract
The extraction of navigation lines is a crucial aspect in the field autopilot system for intelligent agricultural equipment. Given that soybean seedlings are small, and straw can be found in certain Northeast China soybean fields, accurately obtaining feature points and extracting navigation lines during the soybean seedling stage poses numerous challenges. To solve the above problems, this paper proposes a method of extracting navigation lines based on the average coordinate feature points of pixel points in the bean seedling belt according to the calculation of the average coordinate. In this study, the soybean seedling was chosen as the research subject, and the Hue, Saturation, Value (HSV) colour model was employed in conjunction with the maximum interclass variance (OTSU) method for RGB image segmentation. To extract soybean seedling bands, a novel approach of framing binarised image contours by drawing external rectangles and calculating average coordinates of white pixel points as feature points was proposed. The feature points were normalised, and then the improved adaptive DBSCAN clustering method was used to cluster the feature points. The least squares method was used to fit the centre line of the crops and the navigation line, and the results showed that the average distance deviation and the average angle deviation of the proposed algorithm were 7.38 and 0.32. The fitted navigation line achieved an accuracy of 96.77%, meeting the requirements for extracting navigation lines in intelligent agricultural machinery equipment for soybean inter-row cultivation. This provides a theoretical foundation for realising automatic driving of intelligent agricultural machinery in the field. Full article
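The pipeline described above compresses into a short script, sketched below with assumed parameter values (the HSV green range, the 20-pixel contour size cut-off, and the DBSCAN eps/min_samples are illustrative, not the paper's adaptive values): HSV segmentation, bounding rectangles around seedling contours, mean white-pixel coordinates as feature points, DBSCAN clustering into rows, and a least-squares line fit per row.

```python
# Compressed sketch of the described pipeline on a synthetic field image.
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic field image: brownish soil with three rows of green "seedlings".
img = np.full((480, 640, 3), (120, 160, 190), dtype=np.uint8)   # BGR soil color
rng = np.random.default_rng(3)
for row_x in (160, 320, 480):
    for y in range(40, 460, 40):
        cv2.circle(img, (row_x + int(rng.normal(0, 4)), y), 6, (60, 200, 60), -1)

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (35, 43, 46), (77, 255, 255))            # assumed green range

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
points = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    ys, xs = np.nonzero(mask[y:y + h, x:x + w])                  # white pixels inside the box
    if len(xs) > 20:
        points.append([x + xs.mean(), y + ys.mean()])            # average-coordinate feature point
points = np.array(points)

labels = DBSCAN(eps=60, min_samples=3).fit_predict(points)       # assumed eps / min_samples
for row_id in sorted(set(labels) - {-1}):
    pts = points[labels == row_id]
    k, b = np.polyfit(pts[:, 1], pts[:, 0], 1)                   # fit x = k*y + b per crop row
    print(f"row {row_id}: x = {k:.3f} * y + {b:.1f}")
```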
(This article belongs to the Section Precision and Digital Agriculture)
Figures:
Figure 1: Trial site and planting pattern.
Figure 2: Image sensor placement and shooting angle.
Figure 3: Soybean seedling navigation line extraction process.
Figure 4: HSV colour space green image extraction: (a) angle of 45° to the ground; (b) angle of 0° to the ground.
Figure 5: External rectangle drawing of binarised image: (a) angle of 45° to the ground; (b) angle of 0° to the ground.
Figure 6: Feature point extraction process.
Figure 7: Adaptive DBSCAN clustering process: (a) general flow chart; (b) eps and minPts calculation process.
Figure 8: Adaptive DBSCAN clustering results.
Figure 9: Visible light images of soybeans collected under multiple conditions: (a) sunny day, 45° angle to the ground raw image; (b) sunny day, 0° angle to the ground raw image; (c) cloudy, 0° angle to the ground raw image, with straw; (d) sunny day, 90° angle to the ground, with straw raw image.
Figure 10: Image pre-processing results: (a) grayscale image under sunny weather; (b) binarised image under sunny weather; (c) grayscale image under cloudy weather; (d) binarised image under cloudy weather.
Figure 11: Comparison of the effect of feature point extraction: (a) sunny day, this paper’s extraction method; (b) sunny day, Harris corner detection; (c) sunny day, Fast corner detection; (d) sunny day, Sift corner detection; (e) multi-cloud, this paper’s extraction method; (f) multi-cloud, Harris corner detection; (g) multi-cloud, Fast corner detection; (h) multi-cloud, Sift corner detection.
Figure 12: Characteristic point clustering of soybean seedling bands: (a) sunny day, 45° angle to the ground raw image, feature point extraction results; (b) sunny day, 45° angle to the ground raw image, clustering results; (c) sunny day, 0° angle to the ground raw image, feature point extraction results; (d) sunny day, 0° angle to the ground raw image, clustering results; (e) cloudy, 0° angle to the ground raw image, with straw, feature point extraction results; (f) cloudy, 0° angle to the ground raw image, with straw, clustering results; (g) sunny day, 90° angle to the ground, with straw raw image, feature point extraction results; (h) sunny day, 90° angle to the ground, with straw raw image, clustering results.
Figure 13: Crop row centreline fitting and navigation line extraction: (a) crop row centreline fitting results; (b) navigation line extraction results.
Figure 14: Schematic diagram of the accuracy verification method.
Figure 15: Results of four sets of standard lines versus the fitted straight line: (a) sunny 45° image validation results; (b) cloudy 45° tilted image validation results; (c) sunny 45° multi-straw image validation results; and (d) cloudy 45° tilted, seedling-deficient image validation results.
Figure 16: Timeliness verification test results.
17 pages, 6141 KiB  
Article
A New GIS-Based Framework to Detect Urban Heat Islands and Its Application on the City of Naples (Italy)
by Rosa Cafaro, Barbara Cardone, Valeria D’Ambrosio, Ferdinando Di Martino and Vittorio Miraglia
Land 2024, 13(8), 1253; https://doi.org/10.3390/land13081253 - 9 Aug 2024
Viewed by 236
Abstract
This research presents a GIS-based framework used to detect urban heat islands and determine which urban settlement elements are most critical when heatwave risks exist. The proposed method uses the Iterative Self-Organizing Data Analysis (ISODATA) clustering algorithm applied to the satellite land surface temperature distribution recorded during heatwaves for the detection of urban heat islands. A pixel classification confidence level maximization approach, obtained by running a maximum likelihood classification algorithm, is performed to determine the optimal number of clusters. The areas labeled as hotspots constitute the detected urban heat islands (UHIs). This method was tested on an urban settlement set up by the municipality of Naples (Italy). Comparison tests were performed with other urban heat island detection methods such as standard deviation thresholding and Getis-Ord Gi* hotspot detection; indices measuring the density of buildings, the percentage of permeable open spaces, and vegetation cover are taken into consideration to evaluate the accuracy of the urban heat islands detected. These tests highlight that the proposed method provides the most accurate results. It could be an effective tool to support the decision maker in evaluating which urban areas are the most critical during heatwave scenarios. Full article
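scikit-learn does not ship an ISODATA implementation, so the sketch below uses k-means as a simplified stand-in to partition a synthetic LST raster into classes and label the hottest cluster as the candidate UHI area; the number of classes and the data are assumptions, and the confidence-maximization step for choosing the cluster count is omitted.

```python
# Hedged sketch of the clustering step, with k-means as a simplified stand-in for ISODATA:
# partition a land surface temperature (LST) raster into classes and label the hottest
# cluster as the candidate urban heat island.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
lst = rng.normal(33.0, 1.5, (200, 200))       # synthetic summer LST scene, in deg C
lst[60:100, 120:170] += 6.0                   # synthetic hot urban core

k = 4                                          # assumed number of classes
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(lst.reshape(-1, 1))
labels = labels.reshape(lst.shape)

cluster_means = [lst[labels == i].mean() for i in range(k)]
uhi_mask = labels == int(np.argmax(cluster_means))
print("UHI pixel fraction:", uhi_mask.mean())
```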
Figures:
Figure 1: Flow diagram of the proposed method.
Figure 2: Study area of the city of Naples, Italy.
Figure 3: LST in the city of Naples, Italy, obtained on 15 July 2023.
Figure 4: LST classification obtained with the proposed model.
Figure 5: Map of the UHIs detected using the standard deviation method.
Figure 6: Map of the UHIs detected using Getis-Ord Gi*.
Figure 7: Map of the UHIs detected using the proposed method.
Figure 8: Building density obtained with the standard deviation method (a); building density obtained with the Getis-Ord Gi* method (b); building density obtained with the proposed method (c). The red line represents the value of the building density obtained when considering the entire city.
Figure 9: Waterproof ratio obtained with the standard deviation method (a); waterproof ratio obtained with the Getis-Ord Gi* method (b); waterproof ratio obtained with the proposed method (c). The red line represents the value of the waterproof ratio obtained when considering the entire city.
Figure 10: Greenery percentage obtained with the standard deviation method (a); greenery percentage obtained with the Getis-Ord Gi* method (b); greenery percentage obtained with the proposed method (c). The red line represents the value of the greenery percentage obtained when considering the entire city.
41 pages, 4974 KiB  
Review
An Application-Driven Survey on Event-Based Neuromorphic Computer Vision
by Dario Cazzato and Flavio Bono
Information 2024, 15(8), 472; https://doi.org/10.3390/info15080472 - 9 Aug 2024
Viewed by 348
Abstract
Traditional frame-based cameras, despite their effectiveness and usage in computer vision, exhibit limitations such as high latency, low dynamic range, high power consumption, and motion blur. For two decades, researchers have explored neuromorphic cameras, which operate differently from traditional frame-based types, mimicking biological vision systems for enhanced data acquisition and spatio-temporal resolution. Each pixel asynchronously captures intensity changes in the scene above certain user-defined thresholds, and streams of events are captured. However, the distinct characteristics of these sensors mean that traditional computer vision methods are not directly applicable, necessitating the investigation of new approaches before being applied in real applications. This work aims to fill existing gaps in the literature by providing a survey and a discussion centered on the different application domains, differentiating between computer vision problems and whether solutions are better suited for or have been applied to a specific field. Moreover, an extensive discussion highlights the major achievements and challenges, in addition to the unique characteristics, of each application field. Full article
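The event-stream data model is easy to make concrete. The snippet below generates a random stream of (x, y, timestamp, polarity) tuples at a DAVIS346-like resolution and accumulates a 10 ms window into a signed frame, the kind of rendering used for the rotating-fan illustration; all values are synthetic.

```python
# Illustrative sketch: turning an asynchronous event stream (x, y, timestamp, polarity)
# into an accumulation image over a fixed time window. The event array is randomly generated.
import numpy as np

width, height = 346, 260                      # DAVIS346-like resolution
rng = np.random.default_rng(5)
n_events = 50_000
events = np.column_stack([
    rng.integers(0, width, n_events),         # x
    rng.integers(0, height, n_events),        # y
    np.sort(rng.uniform(0.0, 1.0, n_events)), # timestamp in seconds
    rng.choice([-1, 1], n_events),            # polarity (OFF / ON)
])

def accumulate(events, t0, t1):
    """Sum signed polarities per pixel for events falling inside [t0, t1)."""
    frame = np.zeros((height, width), dtype=np.int32)
    sel = events[(events[:, 2] >= t0) & (events[:, 2] < t1)]
    np.add.at(frame, (sel[:, 1].astype(int), sel[:, 0].astype(int)), sel[:, 3].astype(int))
    return frame

frame = accumulate(events, 0.0, 0.01)         # 10 ms accumulation window
print("ON-dominant pixels:", (frame > 0).sum(), "OFF-dominant pixels:", (frame < 0).sum())
```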
(This article belongs to the Special Issue Neuromorphic Engineering and Machine Learning)
Figures:
Figure 1: Two commercial examples of neuromorphic sensors. (a) The Prophesee EVK4 that uses the Sony IMX636 CMOS. (b) The Inivation DAVIS346. Courtesy of Prof. Maria Martini, Kingston University London, UK.
Figure 2: A summary of the paper organization. Critical analysis and discussions are highlighted with red text.
Figure 3: Analogies between the human visual system (top) and the neuromorphic vision sensor (bottom). “Neuron” (https://skfb.ly/oyUVY, accessed on 4 August 2024) by mmarynguyen is licensed under Creative Commons Attribution-NonCommercial. “Human Head” (https://skfb.ly/ouFsp, accessed on 4 August 2024) by VistaPrime is licensed under Creative Commons Attribution. Car and lens generated with Adobe Firefly©.
Figure 4: Output from a neuromorphic sensor (left) and a frame-based camera (right) while recording a rotating PC fan in the xyt-plane. ON and OFF events are rendered respectively as blue and black pixels on a white background.
Figure 5: (a) An example accumulation image from an event camera while moving in an indoor environment. ON and OFF events are rendered respectively as blue and black pixels on a white background. (b) The same scene taken from an RGB camera.
Figure 6: Two frames with respective color and event information from the CED: Color Event Dataset in [16].
Figure 7: A scheme of the manuscript selection process. We report the documents added after filtering the output from the Scopus search as “injection” in the diagram.
Figure 8: The three-level hierarchical organization to classify computer vision tasks. Usually, the amount of data tends to be lower with higher-level presentations.
Figure 9: The association between computer vision tasks and application domains for the works analyzed in Section 6.
19 pages, 4407 KiB  
Article
Superpixels with Content-Awareness via a Two-Stage Generation Framework
by Cheng Li, Nannan Liao, Zhe Huang, He Bian, Zhe Zhang and Long Ren
Symmetry 2024, 16(8), 1011; https://doi.org/10.3390/sym16081011 - 8 Aug 2024
Viewed by 323
Abstract
The superpixel usually serves as a region-level feature in various image processing tasks, and is known for segmentation accuracy, spatial compactness and running efficiency. However, since these properties are intrinsically incompatible, there is still a compromise within the overall performance of existing superpixel algorithms. In this work, the property constraint in superpixels is relaxed by in-depth understanding of the image content, and a novel two-stage superpixel generation framework is proposed to produce content-aware superpixels. In the global processing stage, a diffusion-based online average clustering framework is introduced to efficiently aggregate image pixels into multiple superpixel candidates according to color and spatial information. During this process, a centroid relocation strategy is established to dynamically guide the region updating. According to the area feature in manifold space, several superpixel centroids are then split or merged to optimize the regional representation of image content. Subsequently, local updating is adopted on pixels in those superpixel regions to further improve the performance. As a result, the dynamic centroid relocating strategy offers online averaging clustering the property of content awareness through coarse-to-fine label updating. Extensive experiments verify that the produced superpixels achieve desirable and comprehensive performance in boundary adherence, visual quality and time consumption. The quantitative results are on par with existing state-of-the-art algorithms in terms of several common property metrics. Full article
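For context rather than as the proposed CATS algorithm, the snippet below runs scikit-image's SLIC on a sample image; varying `compactness` exposes the boundary-adherence versus regularity trade-off that content-aware methods aim to relax.

```python
# Context sketch, not the proposed CATS algorithm: scikit-image's SLIC gives a quick
# feel for the compactness / boundary-adherence trade-off in superpixel generation.
import numpy as np
from skimage.data import astronaut
from skimage.segmentation import slic, mark_boundaries

image = astronaut()
# Lower `compactness` follows color boundaries more closely; higher values give
# more regular, grid-like superpixels.
segments = slic(image, n_segments=200, compactness=10, start_label=1)
print("superpixels produced:", len(np.unique(segments)))
overlay = mark_boundaries(image, segments)     # float image with superpixel boundaries drawn
```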
(This article belongs to the Special Issue Image Processing and Symmetry: Topics and Applications)
Figures:
Figure 1: Workflow of the proposed CATS superpixel generation framework. (a) Input image; (b) Grid-level seeding initialization, each white dot indicates an initial cluster centroid; (c) Superpixel candidates produced by conventional SNIC method, the white outlines represent the boundaries of different superpixel regions; (d–g) Centroid relocation and local updating, note that the series of processes are performed iteratively until all superpixel candidates maintain a moderate content density. (d) Centroid distribution of (c), the green dots are relocated centroids and the region marked in red has the most dense content; (e) Zoom-in performance of (d), the blue dots are split centroids based on centroid relocation strategy in CATS; (f) Local updating results of (e); (g) Overall performance of (f); (h) Result of CATS superpixels; (i) Ground truth of (a) covered by CATS superpixels of (h).
Figure 2: Image transformation from 2-dimensional plane to 2-dimensional manifold.
Figure 3: Schematic diagram of a pixel area on the 2-dimensional manifold space.
Figure 4: Four images and the corresponding ground-truths in the BSDS500 dataset.
Figure 5: Visual comparison with approximately 200 superpixels. Alternating rows show each segmented image followed by the zoom-in performance. (a) MSLIC. (b) SNIC. (c) CATS. (d) USEQ. (e) BASS. (f) IBIS. (g) DBSCAN.
Figure 6: Visual results with approximate 100 CATS superpixels. Alternating rows show the zoom-in performance of ground truth covered by CATS superpixels.
Figure 7: Quantitative evaluation of eight algorithms on the test subset of BSDS500. A higher metrical value corresponds to better outcomes with the exception of USE. (a) BR. (b) USE. (c) ASA. (d) COM.
19 pages, 16632 KiB  
Article
Rational-Function-Model-Based Rigorous Bundle Adjustment for Improving the Relative Geometric Positioning Accuracy of Multiple Korea Multi-Purpose Satellite-3A Images
by Seunghwan Ban and Taejung Kim
Remote Sens. 2024, 16(16), 2890; https://doi.org/10.3390/rs16162890 - 7 Aug 2024
Viewed by 434
Abstract
Recent advancements in satellite technology have significantly increased the availability of high-resolution imagery for Earth observation, enabling nearly all regions to be captured frequently throughout the year. These images have become a vast source of big data and hold immense potential for various applications, including environmental monitoring, urban planning, and disaster management. However, obtaining ground control points (GCPs) and performing geometric correction is a time-consuming and costly process, often limiting the efficient use of these images. To address this challenge, this study introduces a Rational Function Model (RFM)-based rigorous bundle adjustment method to enhance the relative geometric positioning accuracy of multiple KOMPSAT-3A images without the need for GCPs. The proposed method was tested using KOMPSAT-3A images. The results showed a significant improvement in geometric accuracy, with mean positional errors reduced from 30.02 pixels to 2.21 pixels. This enhancement ensured that the corrected images derived from the proposed method were reliable and accurate, making it highly valuable for various geospatial applications. Full article
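A greatly simplified sketch of the underlying idea, not the paper's rigorous bundle adjustment, is an image-space affine bias compensation estimated from tie-point residuals by linear least squares; in the snippet below the residuals, drift coefficients, and noise are synthetic.

```python
# Greatly simplified sketch (not the paper's rigorous bundle adjustment): estimate an
# affine bias correction in image space from tie-point residuals via least squares.
import numpy as np

rng = np.random.default_rng(6)
n = 40
cols = rng.uniform(0, 24000, n)               # tie-point image coordinates (pixels)
rows = rng.uniform(0, 24000, n)

# Synthetic systematic error: a shift plus a small drift across the scene, plus noise.
true_bias = np.array([12.0, 1.5e-4, -8.0e-5, -9.0, 6.0e-5, 1.0e-4])  # a0, a1, a2, b0, b1, b2
d_col = true_bias[0] + true_bias[1] * cols + true_bias[2] * rows + rng.normal(0, 0.3, n)
d_row = true_bias[3] + true_bias[4] * cols + true_bias[5] * rows + rng.normal(0, 0.3, n)

A = np.column_stack([np.ones(n), cols, rows])
coef_col, *_ = np.linalg.lstsq(A, d_col, rcond=None)
coef_row, *_ = np.linalg.lstsq(A, d_row, rcond=None)

residual_col = d_col - A @ coef_col
residual_row = d_row - A @ coef_row
rmse_before = np.sqrt(np.mean(d_col ** 2 + d_row ** 2))
rmse_after = np.sqrt(np.mean(residual_col ** 2 + residual_row ** 2))
print(f"tie-point RMSE before: {rmse_before:.2f} px, after: {rmse_after:.2f} px")
```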
(This article belongs to the Section Remote Sensing Image Processing)
Figures:
Figure 1: Test area and enlarged images.
Figure 2: Tie point extraction algorithm in proposed method.
Figure 3: Visual representation of RFM with correction factors.
Figure 4: Bundle adjustment concept of multiple satellite images.
Figure 5: Method for generating result images with vDEM as a virtual projection model. The colored dots represent sampling points after ground coordinates adjustment.
Figure 6: Height value estimation based on ray tracing.
Figure 7: Extraction result of tie points (black dots) and check points (red dots).
Figure 8: Scatter plot for tie points used for bundle adjustment of Dataset A.
Figure 9: Scatter plot for tie points used for bundle adjustment of Dataset B.
Figure 10: Scatter plot for tie points used for bundle adjustment of Dataset C.
Figure 11: Scatter plot for tie points used for bundle adjustment of Dataset D.
Figure 12: Concept of transform of modeling points and result image. The red dots represent initial tie points, while the blue dots represent tie points after adjustment.
Figure 13: Enlarged result images with our proposed method (Dataset A).
Figure 14: Enlarged result images with our proposed method (Dataset B).
Figure 15: Enlarged result images with our proposed method (Dataset C).
Figure 16: Enlarged result images with our proposed method (Dataset D).
19 pages, 4475 KiB  
Article
A Multi-Level Cross-Attention Image Registration Method for Visible and Infrared Small Unmanned Aerial Vehicle Targets via Image Style Transfer
by Wen Jiang, Hanxin Pan, Yanping Wang, Yang Li, Yun Lin and Fukun Bi
Remote Sens. 2024, 16(16), 2880; https://doi.org/10.3390/rs16162880 - 7 Aug 2024
Viewed by 318
Abstract
Small UAV target detection and tracking based on cross-modality image fusion have gained widespread attention. Due to the limited feature information available from small UAVs in images, where they occupy a minimal number of pixels, the precision required for detection and tracking algorithms is particularly high in complex backgrounds. Image fusion techniques can enrich the detailed information for small UAVs, showing significant advantages under extreme lighting conditions. Image registration is a fundamental step preceding image fusion. It is essential to achieve accurate image alignment before proceeding with image fusion to prevent severe ghosting and artifacts. This paper specifically focused on the alignment of small UAV targets within infrared and visible light imagery. To address this issue, this paper proposed a cross-modality image registration network based on deep learning, which includes a structure preservation and style transformation network (SPSTN) and a multi-level cross-attention residual registration network (MCARN). Firstly, the SPSTN is employed for modality transformation, transferring the cross-modality task into a single-modality task to reduce the information discrepancy between modalities. Then, the MCARN is utilized for single-modality image registration, capable of deeply extracting and fusing features from pseudo infrared and visible images to achieve efficient registration. To validate the effectiveness of the proposed method, comprehensive experimental evaluations were conducted on the Anti-UAV dataset. The extensive evaluation results validate the superiority and universality of the cross-modality image registration framework proposed in this paper, which plays a crucial role in subsequent image fusion tasks for more effective target detection. Full article
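The geometric half of the task can be sketched with PyTorch's differentiable resampling: given 2×3 affine parameters (which a registration network such as the paper's MCARN would predict; here they are fixed by hand), `affine_grid` and `grid_sample` warp one modality onto the other. The image and transform values below are placeholders.

```python
# Minimal sketch of differentiable affine warping for image registration.
import torch
import torch.nn.functional as F

visible = torch.rand(1, 1, 256, 256)                 # stand-in for one image of the pair
theta = torch.tensor([[[1.0, 0.0, 0.05],             # small horizontal shift (normalized coords)
                       [0.0, 1.0, -0.02]]])          # small vertical shift
grid = F.affine_grid(theta, size=visible.shape, align_corners=False)
registered = F.grid_sample(visible, grid, mode="bilinear", align_corners=False)
print(registered.shape)                              # torch.Size([1, 1, 256, 256])
```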
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
Figures:
Figure 1: Overview of small UAVs flying in various backgrounds.
Figure 2: Comparison of cross-modality images and registration and unregistration images.
Figure 3: The workflow for generating pseudo infrared images by CycleGAN.
Figure 4: The workflow of the cross-modality image registration network for small UAV targets.
Figure 5: The algorithm framework of a resnet block.
Figure 6: Demonstration of the cross-attention mechanism.
Figure 7: Anti-UAV datasets.
Figure 8: Visualization results of the comparison in various scenarios.
Figure 9: Visualization results of the comparison between registered and unregistered image pairs.
Figure 10: Visualization results of the comparison between SPSTN and STN.
Figure 11: Visualization results of the comparison between MCARN and MRN.