
Review

Historic Built Environment Assessment and Management by Deep Learning Techniques: A Scoping Review

Department of Civil, Environmental, Land, Construction and Chemistry (DICATECh), Polytechnic University of Bari, 70125 Bari, Italy
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7116; https://doi.org/10.3390/app14167116
Submission received: 12 July 2024 / Revised: 5 August 2024 / Accepted: 12 August 2024 / Published: 13 August 2024
(This article belongs to the Special Issue Advanced Technologies in Cultural Heritage)

Abstract

Recent advancements in digital technologies and automated analysis techniques applied to the Historic Built Environment (HBE) demonstrate significant advantages in efficiently collecting and interpreting data for building conservation activities. Integrating digital image processing through Artificial Intelligence approaches further streamlines data analysis for diagnostic assessments. In this context, this paper presents a scoping review based on the Scopus and Web of Science databases, following the PRISMA protocol, focusing on the application of Deep Learning (DL) architectures for image-based classification of decay phenomena in the HBE and aiming to explore potential implementations in decision support systems. From the literature screening process, 29 selected articles were analyzed according to methods for identifying buildings’ surface deterioration, cracks, and post-disaster damage at a district scale, with a particular focus on the innovative DL architectures developed, the accuracy of results obtained, and the classification methods adopted to understand limitations and strengths. The results highlight current research trends and the potential of DL approaches for diagnostic purposes in the built heritage conservation field, evaluating methods and tools for data acquisition and real-time monitoring, and emphasizing the advantages of implementing the adopted techniques in interoperable environments for information sharing among stakeholders. Future challenges involve implementing DL models in mobile apps, using sensors and IoT systems for on-site defect detection and long-term monitoring, integrating multimodal data from non-destructive inspection techniques, and establishing direct connections between data, intervention strategies, timing, and costs, thereby improving heritage diagnosis and management practices.

1. Introduction

The Historic Built Environment (HBE) field has advanced significantly in recent years, particularly in integrating digital technologies for preservation efforts. There is evidence of a substantial shift toward the use of digital methods for data collection, processing, and interpretation in building conservation activities. Technologies such as aerial and close-range photogrammetry and laser scanning are increasingly used for accurate data collection, enabling the efficient assessment of building conditions and structural elements [1]. In addition, readily available tools are being introduced for the reconstruction and virtual representation of architectural environments to simplify management processes in the field of historic preservation and improve data reliability [2]. Traditional survey methods are being replaced by automated model analysis techniques in order to reduce labor-intensive manual tasks and optimize the process in terms of time and costs [3]. The integration of digital image processing through Artificial Intelligence (AI) approaches further simplifies data analysis processes. Among them, automatic image segmentation techniques, Machine Learning (ML), and Deep Learning (DL) are being employed to classify survey products in order to evaluate models from technical-geometric [4], structural monitoring [5], and conservation status [2] perspectives.
Given the increasing focus on developing advanced methods for assessing the condition of the HBE, it is useful to identify recent reviews conducted in this field.
El Masri and Rakha [6] provided a comprehensive overview of existing Non-Destructive Testing (NDT) techniques in building audits for identifying characteristics of building envelope components and evaluated the effectiveness of each technique for different audit categories and their compatibility in workflows for informed decision making. The existing literature was organized into workflows specifically for building façade audits to create a framework for conducting large-scale assessments and retrofits of building envelopes. Within the same context, Tejedor et al. [7] examined NDT’s methodologies, instruments, and measurement devices used in Cultural Heritage buildings’ diagnosis, highlighting, on the one hand, the importance of photogrammetric and laser scanning surveys as established techniques for historic preservation, often integrated with Historical Building Information Modeling (HBIM) and Deep Learning techniques; and on the other hand, the existing gap in the use of quantitative data from NDT in HBIM models due to challenges such as the heterogeneity of historic masonry and the complexity of diagnostic procedures. De Fino et al. [8] highlighted the ease of use and accessibility of digital photogrammetry, making it suitable for multiple phases and activities within the condition assessment process by offering a comprehensive understanding of the state of conservation of heritage buildings for decision-makers. The work focuses on using photogrammetric models and methods in the diagnostic process, from non-destructive diagnostic investigations to structural modeling, monitoring, and mapping degradation for the comprehensive documentation of buildings useful for the recovery process. Zhao et al. [9] focused on point cloud segmentation for architectural cultural heritage, highlighting how this method effectively identifies deteriorating conditions and cracks, automatically generating thematic maps. 
With specific reference to the methodology to be adopted, machine learning and deep learning approaches are evaluated in relation to the scene’s complexity and specific objectives. Argyrou and Agapiou [10] highlighted the importance of the use of Remote Sensing (RS) integrated with AI-based techniques for cultural heritage, analyzing applications for archaeological sites. Implementing automated object detection approaches using ML and DL algorithms supports the interpretation and analysis phase of the data obtained from surveying and monitoring remote sensing. Fiorucci et al. [11] investigated the use of Machine Learning, highlighting the effectiveness of Supervised Deep Neural Networks (DNNs) in the investigation of Cultural Heritage (CH), showing excellent results in digital work analysis and archaeological remote sensing, as well as for visual work analysis and prediction of painting styles, even based on small labeled datasets. Mishra [12] highlighted how ML-based techniques, such as soft computing and Artificial Neural Networks (ANNs), can determine mechanical parameters in historic buildings as well as facilitate Structural Health Monitoring (SHM) and restoration. Using images and laser scanning, this method offers a non-invasive approach to identifying structural damage in heritage constructions, contrasting with conventional survey techniques. Rossi and Bournas [13] also examined the benefits of AI in structural damage and alterations surveys, proposing an approach for CH documenting and monitoring using Computer Vision (CV) techniques. For this purpose, photogrammetry and infrared thermography are valuable tools whose data, combined with other technologies such as the Internet of Things (IoT), machine learning, and Building Information Modeling (BIM), optimize SHM processes. 
The review by Mishra and Lourenço [14] consolidated the various damage assessment techniques employing image processing, emphasizing DL techniques in the context of masonry structure conservation. Several case studies on cultural heritage sites where AI-aided visual inspection has been explored were presented. Likewise, Latifi et al. [15] focused on masonry structures, delving into the cracking patterns typically affecting structural elements, with reference also to recent research results on automatic crack detection using ML and DL algorithms. According to a wider-scale view, Li et al. [16] provided insights into exploiting digital technologies for the efficient preservation of architectural heritage within disaster cycles, proposing an integrated research framework for future efforts to be directed toward the prediction of multiple disasters, automated early warning systems for building damage, and intelligent monitoring.
Based on insights gained from analyzing past experiences, it is possible to outline the main research directions currently concerning the application of digital techniques in the field of historic preservation and conservation. The support provided by non-invasive digital surveying techniques, from both geometric and visual points of view, is a key element in assessing the conservation status of artifacts.
Indeed, by exploiting the geometric characteristics and intrinsic metadata obtained from survey data such as point clouds and images, numerous studies have applied semantic segmentation techniques using ML and DL algorithms to automatically recognize architectural styles [17], structural elements [18], and construction features [19], also in support of automated three-dimensional reconstruction.
For diagnostic purposes, the application of AI-based algorithms for the automatic detection of degradation affecting the built heritage is still at an earlier stage of development than in other fields, such as medicine and the industrial sector. Studies conducted so far have shown interest in the analysis of thermographic images [20], while the analysis of colorimetric and radiometric characteristics related to point clouds [21], orthophotos, and UV maps [22,23] finds broader applications. Another area of investigation is the automatic recognition of cracks in different structural typologies, such as infrastructure and reinforced concrete buildings [24] and masonry buildings [25], as well as other material typologies, such as metallic [26], asphalted [27], and earthen surfaces [28].
Regarding these topics, this paper aims to systematize studies focusing on the most advanced techniques of image segmentation and automatic classification, using Deep Learning algorithms, to automatically detect the degradation conditions of the HBE. The purpose of this review is to analyze the literature on the innovative methodologies adopted, the accuracy of the results obtained, and the potential of these techniques in the decision-making process downstream of the knowledge acquisition phase, while also identifying their limitations and strengths as the basis for future analyses.
For this purpose, the review work is structured according to the following scheme: Section 2 outlines the research questions and the adopted methodology; Section 3 provides a comprehensive description of the selected articles; Section 4 discusses the main findings in response to the research questions; and Section 5 presents the final reflections on the results.

2. Research Methodology

This research was conducted as a scoping review following the guidelines outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [29], utilizing its established method and flowchart. The workflow followed is shown in Figure 1.
This research aims to review the existing literature concerning automatic image classification techniques applied to the HBE in order to detect and analyze forms of damage and degradation in historical contexts. The review examines how DL techniques are utilized across relevant disciplines and explores potential methods for integrating these techniques into decision support systems. To achieve this goal, the review analysis was conducted based on the formulation of the following research questions:
Q1: What are the current research trends and challenges in the field of HBEs?
Q2: How are data acquisition methods and tools useful in HBE conservation?
Q3: What is the potential of the DL approach for diagnosis purposes?
Q4: How can DL techniques be implemented in interoperable environments to share information between stakeholders?
Q5: What are the future opportunities and directions for the decisional support of preservation and conservation interventions?

2.1. Database Selection and Search Strategy

The collection of relevant publications for the literature review was performed through consultation of the Scopus and Web of Science (WoS) research databases, last accessed in February 2024. The search strategy was conducted using a combination of keywords related to the technique/approach applied, the area of interest analyzed, and the investigation objective, according to the following search criteria:
(“deep” AND “learning”) AND (“historic” OR “historical” OR “cultural” OR “archeological”) AND (“buildings” OR “heritage”) AND (“decay” OR “deterioration” OR “degradation” OR “damage” OR “monitoring”)
The results were further limited to the English language and publication years between 2019 and 2024, considering the significant increase in publications on the topic. Excluding duplicates, a total of 168 relevant papers was obtained.
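As an illustration only, the Boolean structure of the search string and the two filters (language and publication year) can be read as a conjunction of keyword groups. The sketch below is a simplified assumption about how such a query behaves: it matches whole words in a title or abstract and does not reproduce the actual query syntax, field restrictions, or stemming rules of Scopus or WoS. The names `GROUPS` and `matches` are hypothetical.

```python
import re

# Each inner list is an OR group; all groups must be satisfied (AND).
GROUPS = [
    ["deep"],
    ["learning"],
    ["historic", "historical", "cultural", "archeological"],
    ["buildings", "heritage"],
    ["decay", "deterioration", "degradation", "damage", "monitoring"],
]

def matches(text, year, language="English"):
    """Return True if a record satisfies the keyword groups and the filters."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    in_scope = language == "English" and 2019 <= year <= 2024
    return in_scope and all(
        any(term in words for term in group) for group in GROUPS
    )

# Hypothetical records, not taken from the reviewed corpus
hit = matches("Deep learning for damage detection in historic buildings", 2021)
miss = matches("Deep learning for crack detection in bridges", 2021)
```

Records that pass such a filter still require the manual title and abstract screening described in the next subsection.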

2.2. Screening of Research Contributions

The first selection of publications judged relevant to the study was carried out through direct analysis of the titles and abstracts. Therefore, exclusion criteria related to the lack of correlation with the topic were defined. Firstly, papers related to building monitoring from an energy perspective using sensors were excluded.
To refine the analysis, the search was limited to papers related to the use of automatic semantic image segmentation techniques in the field of building heritage diagnostic surveys. Therefore, the focus was on digital images derived from photographic and photogrammetric surveys (via cameras, smartphones, or drones); papers focused on the analysis of different types of images, such as thermographic and photoacoustic images, were excluded. Regarding the analysis of products obtained from photogrammetric surveying, papers related to the segmentation and classification of point clouds and/or polygonal meshes were excluded; instead, studies conducted on orthoimages, orthophotos, and UV maps derived from them were considered.
Furthermore, among the papers reporting experiments with deep learning algorithms applied to images, those with classification objectives other than the recognition of building conservation status were excluded, e.g., classifications based on geometric features for automatic recognition of structural elements and architectural styles, land cover and building footprints, or art paintings.
At the end of the screening phase, 29 papers were identified as relevant for full-text reading and analysis.

2.3. Bibliometric Analysis

A bibliometric analysis supported the evaluation of statistical data related to the selected papers. Specifically, it highlighted a steady increase in scientific production from 2019 to February 2024, indicative of growing interest in, and use of, these techniques for historical conservation and suggesting potential for further development (Figure 2).
Further, the distribution of publications in the academic world suggests particular interest in China, likely due to the presence of large and important traditional architectural sites and the related protection priorities; significant interest also emerges in Turkey, followed by India, Italy, Republic of Korea, and the United States. A small number of publications come from other countries (Figure 3).
Figure 4 shows the multidisciplinary nature of the investigated research field. The prevalent thematic areas fall within Computer Science and Engineering, aligning with the theme of developing innovative strategies and their application in diagnostic and decision-making phases of the construction process. On the other hand, it is evident how AI-aided methodologies can support analyses and evaluations concerning specific aspects of heritage and its context, referring to structural characteristics and material properties, environmental impacts, and social and economic considerations. This underscores the importance of understanding the nature and extent of damage in the HBE, which are linked to developing appropriate intervention strategies.

2.4. Science Mapping and Topic Identification

The research aims to investigate methods and tools used in the literature for the automatic detection of building decay forms in the HBE in order to simplify decay condition surveys and address recovery and maintenance interventions aimed at specific types of damage detected.
From an initial analysis of the selected papers, it was possible to identify three main areas of investigative interest: 23 papers focused on analyses at the building scale, delving mainly into two distinct aspects: automatic crack detection and automatic segmentation and/or the classification of surface degradation forms detectable on masonry and facades; furthermore, 6 papers assessed damage conditions at the district scale, specifically following natural disasters. These three analysis goals correspond to 3 out of the 4 clusters obtained from the science mapping results performed by VOSviewer (v. 1.6.19), based on the correlation of keywords from the selected papers (Figure 5, Table 1).
After the critical full-text reading, which delved into the methodological approaches adopted according to the specific objectives, the review presentation is organized into sub-sections inherent to the classification method used.

3. Results

3.1. Building Damage Classification

The first significant aspect emerging from the literature analysis is the application of DL techniques for assessing historic buildings’ conservation status, focusing on the identification and classification of surface deterioration of facades and masonries. In this context, the relevant features extracted by the algorithms and the different classification methodologies adopted to perform their respective tasks are highlighted (Table 2).

3.1.1. Binary Classification

One initial image classification methodology involves binary classification, whereby an image, or each of its regions or pixels, is assigned to one of two classes. The task is to determine whether a given image or its parts belong to a specific class or not. Typically, one class represents the presence of a particular element or feature in the image, while the other represents its absence.
According to this approach, Bruno et al. [30] proposed a methodology based on a Mask R-CNN model for the detection of decay morphologies on built heritage to automatically recognize and assess multiple alterations in images, such as moist areas and biological colonization. Mask R-CNN segments specific regions of the image through masks and associates a class with each of them. In this case, the two degradation classes were evaluated individually, with a value of 0 associated with the image background and a value of 1 associated with the degradation type. Regarding the adopted architecture, the Residual Network and the Feature Pyramid Network (ResNet101 + FPN) were chosen as the convolutional backbone because of their efficiency and speed. FPN was applied after ResNet101 and exploited a pyramidal structure to extract features at multiple scales. Moreover, a Region Proposal Network (RPN), a neural network that uses a sliding-window technique to propose regions likely to contain objects of interest, was applied to the input feature maps. Regarding moist areas, the model provides highly accurate outcomes (mean average precision in the testing phase varying from 74% to 80%), whereas the predictions for biological colonization are less accurate. Similarly, Samhouri et al. [31] proposed an automatic damage detection method for historic structures, employing a CNN-VGG16 model for image classification and feature extraction. The study focused on detecting five damage typologies in historic stone structures (i.e., material loss, erosion, color change, natural, and sabotage), which were used in the training phase to develop a binary model able to accurately predict the presence or absence of any damage form, with an average accuracy of 98.92%.
On the other hand, other works have focused on different types of materials. For instance, Lee and Yu [32] showed interest in surface damage on wooden architectural cultural heritage, addressing the lack of research in this area. Using a CNN-based model, the system enabled real-time on-site monitoring and automatic damage typology detection based on images taken directly from cultural heritage sites in South Korea. Considering the most prevalent pathologies of wooden structures, the model was trained with images related to forms of exfoliation, deterioration, and cracks. The binary classification methodology adopted allowed the model to output predictions regarding the presence or absence of any defect it was trained on. Four models were used and compared (VGG16, InceptionNetV3, ResNet50, and EfficientNetB0), among which the last achieved the best accuracy. Unlike other studies, Lee and Cho [33] introduced a deep learning framework for detecting the slight roof tilts of a traditional wooden structure typical of South Korea. The proposed method involved binary normal–abnormal classification using images derived from video data acquired from CCTV cameras, with six environmental datasets for context condition generalization. Four pre-trained Deep Learning models were employed: EfficientNetB0, EfficientNetB2, ShuffleNet_v2, and AlexNet. Among these, the EfficientNetB0 and EfficientNetB2 models showed high prediction accuracy of 99.61% and 99.55%, respectively. Meanwhile, Wang N. et al. [34] proposed a novel strategy for the detection, segmentation, and measurement of damage on glazed tiles from the roof of the Palace Museum in China.
The proposed architecture had two levels: the first employed Faster R-CNN with ResNet101 to automatically detect and crop the glazed tile photographs with high accuracy; these cropped images formed a dataset for training the Mask R-CNN algorithm with the ResNet101+ FPN (+RPN) model, which was used in the second level to segment and measure the damage detected on the tiles, with an average accuracy of 97.5%. Regarding the quantitative measurement of the damaged area, since the output mask is binarized, where the damage region is 1 and the background is 0, the damaged area can be measured by summing the pixels of the damaged mask.
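The pixel-summing measurement described above can be illustrated with a minimal sketch. The mask values and the ground sampling distance `mm_per_pixel` are hypothetical and are not taken from [34]; the snippet only shows how a binarized mask (1 = damage, 0 = background) translates into a pixel count and, under the assumption of a known and uniform pixel size, into a metric area.

```python
def damaged_area(mask, mm_per_pixel):
    """Return (pixel_count, area_mm2) for a binary mask (1 = damage, 0 = background)."""
    # Summing a 0/1 mask counts the damaged pixels.
    pixel_count = sum(sum(row) for row in mask)
    # Each pixel covers mm_per_pixel x mm_per_pixel on the surface (assumed uniform).
    area_mm2 = pixel_count * mm_per_pixel ** 2
    return pixel_count, area_mm2

# Hypothetical 4x5 binarized output mask
mask = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
pixels, area = damaged_area(mask, mm_per_pixel=0.5)
print(pixels, area)
```

The conversion to square millimetres is only meaningful when the image scale is known, for example from a rectified or orthorectified photograph.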
Innovative applications also involve the implementation of the developed models within web platforms directly accessible from mobile devices. In this regard, Kumar J.N.V.R.S. et al. [35] proposed an advanced model using a CNN deep learning algorithm to automatically detect two types of defects in historical concrete buildings (cracks and spalls), processing images acquired with different resolutions and sizes from mobile devices, digital cameras, and the web. Following a binary classification scheme, the model classified images into two categories, cracks (0) and spalls (1), achieving high accuracy (98.25% for training, 97.73% for validation) with training on diverse concrete surface images. This approach was proven to be robust against noise interference and holds potential for widespread application in defect detection. The architecture is implemented in a cloud web application designed to improve automatic real-time defect identification using mobile devices.

3.1.2. Multi-Class Classification

The second methodology of image classification involves a multi-class approach, wherein the model is trained to classify each input instance into one of several predefined classes; therefore, the predictive model indicates the probability that a particular image belongs to a single class among those defined during training. In this specific case, multiple degradation classes are labeled for the training phase, enabling the classifier to automatically assign each new image in the dataset to one of the different degradation classes labeled by the operator.
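The single-class decision rule described above can be sketched as follows. This is a generic illustration, not the implementation of any reviewed study: it assumes the network ends with one raw score (logit) per class, converts the scores to probabilities with a softmax, and returns the most probable class. The class names and score values are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits, class_names):
    """Return the single most probable class and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs

# Hypothetical degradation classes and network outputs for one image
CLASSES = ["crack", "loss", "detachment", "biological colonization"]
label, probs = predict_class([0.2, 2.1, -0.5, 0.9], CLASSES)
print(label, [round(p, 3) for p in probs])
```

Because the probabilities compete with one another, an image showing two decay forms at once is still assigned to only one of them, which is the limitation that multi-label methods address.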
According to this approach, Samhouri et al. [31], in a previously reviewed study, provided a second level of classification, where the five categories of degradation present in the stone masonry were labeled for the training phase. The CNN-based detection, localization, and classification model leveraged transfer learning from the VGG16 model for feature extraction and the classification of each degradation class. Instead, Hatir et al. [36] proposed the development of two models based on CNNs and ANNs to recognize different decay classes affecting historical stone facades in the Konya region. Each training image was labeled with one of eight degradation types (cracks, flaking, contour scaling, crust, efflorescence, biological colonization, erosion, and graffiti), and the average accuracy values for CNN and ANN were 99.4% and 93.95%, respectively, showing higher reliability for the first one. In the same way, Kwon and Yu [37] focused on the automatic identification of degradation forms affecting stone masonry, classifying them into four types: crack, loss, detachment, and biological colonization, evaluated separately. In this specific case, the model was trained using transfer learning on a pre-trained Faster R-CNN model that incorporated an RPN to improve its performance. The proposed method achieved an average confidence score of 94.6%, though lower values were observed when the edges of the damaged areas were not clearly defined due to overlapping with other decay forms.
Differently, Cardellicchio et al. [38] aimed to automate defect recognition in existing historic reinforced concrete (RC) bridges. The study focused on six types of defects affecting RC surfaces and structures, including cracks, deteriorated concrete, honeycombs, and moisture spots. They collected a database of images, each manually labeled for the predominant defect, and eight different CNN networks were applied via the transfer learning approach for automatic defect identification. The MobileNetV3Small network showed the best overall performance, with a validation mean Average Precision (mAP) index of 69.97%, and can be easily deployed on mobile devices, drones, and surveillance cameras to support rapid monitoring in the field. The innovative aspect is the use of eXplainable Artificial Intelligence (XAI) techniques, especially Class Activation Maps (CAMs), for interpreting results and providing a quasi-quantitative assessment of CNN reliability. To automate the recognition and classification of defects in historic buildings, Rodrigues et al. [39] used a CNN based on regions, selecting the Residual Network (ResNet) as the main backbone model for the classification task. Based on a dataset consisting of images of different anomalies from different buildings, individual and combined classes of anomalies were created for the training dataset, covering 14 classes. Different types of ResNet (18, 34, 50, and 101) were analyzed, with accuracy values (F1-score) all greater than 50%, except for ResNet101, which showed an accuracy level lower than 50%. In addition, the study aimed to integrate the results of automatic classification within a BIM environment, modeled using a scan-to-BIM approach. For this purpose, an add-in was developed to integrate the DL model into Revit 2022 software, linked to a mobile application through which users can upload new images and obtain their classification directly in the BIM environment in real time.
A different aspect was evaluated by Mehta et al. [40], who focused on the automatic assessment of five damage severity levels in historic buildings. Using a dataset of 4500 images acquired from different types of devices (smartphones, digital cameras, and drones), each assigned to a class associated with a different degree of damage severity, CNN and Support Vector Machine (SVM) models were trained to predict the appropriate class for a given input image of a historic building. Overall, both models performed well in classifying damage severity, with the CNN achieving higher accuracy (85.6% vs. 80.2%), and the study highlighted the factors that impact the performance of ML methods.

3.1.3. Multi-Label Classification

From the perspective of detecting different typologies of pathologies present simultaneously, advancements in research adopting a multi-label approach are particularly noteworthy. Multi-label classification involves assigning multiple labels or categories to a single image. Unlike previous methods, this approach allows for the possibility that an image may simultaneously contain various distinct classes or multiple instances of the same class. Multi-label classification aims to train a model to accurately predict the presence or absence of each label within a given image, enabling a more detailed understanding of the content present in the image.
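The difference from the multi-class case can be sketched as follows. This generic illustration assumes one independent sigmoid output per label and a fixed decision threshold of 0.5; the label names, scores, and threshold are hypothetical and do not come from any reviewed study.

```python
import math

def sigmoid(x):
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose probability reaches the threshold."""
    return [
        name
        for name, score in zip(label_names, logits)
        if sigmoid(score) >= threshold
    ]

# Hypothetical deterioration labels and network outputs for one image
LABELS = ["efflorescence", "spalls", "cracks", "mold"]
present = predict_labels([1.3, -2.0, 0.4, -0.1], LABELS)
print(present)
```

Since each label is decided independently, the result can contain several labels or none at all, unlike the softmax-based rule used for multi-class classification.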
According to this approach, Hatir et al. [41] carried out a further study on the classification of deterioration types on stone surfaces of historical monuments, including biological colonization, contour scaling, crack, higher plant, impact damage, microkarst, and missing parts, by identifying all forms of deterioration present in the same image. The authors used the Mask R-CNN algorithm, which integrates CNNs with region proposals, is trained end to end, and applies masks. The ResNet101 model was employed to reduce training time, enhanced by FPN to detect small objects, while RPN further streamlined processing. The models achieved high accuracy, with a mAP of 97.91% for the training phase and AP values ranging from 89.624% to 100% for the testing phase on orthophoto images. Similarly, Meklati et al. [42] presented an automatic classification approach to surface damage on masonry using crowd-sourced images. Starting with photos acquired from users’ smartphones, the labeling phase associated the identified deterioration types (efflorescence, spalls, cracks, and mold) with each image, considering their simultaneous presence. Unlike previous methods, multi-label classification was performed locally, directly on smartphones, utilizing a MobileNetV2-based CNN integrated into a mobile application for public use. In addition, an interesting aspect is the potential for larger-scale monitoring by tracking the detected damage on a geographical map, providing a global view of its distribution.
From this perspective, several studies have developed DL-based techniques employing a multi-label approach, specifically aimed at facilitating remote and real-time monitoring of the progression of degradation states over time. In the work conducted by Mishra et al. [43], defect detection on a historic monument was developed using a ResNet101-based Faster R-CNN architecture and YOLOv5, a single-stage object detection model, to identify and analyze four classes of defects: cracks, discoloration, exposed brick, and spalling. A real-time monitoring model was developed, with the image dataset also acquired from video captured by UAVs and mobile cameras. The Faster R-CNN model achieved an accuracy of 85.04%, while YOLOv5 achieved a maximum mAP of 93.7%, demonstrating its superiority in terms of speed and false detection rates. The work developed by Wang N. et al. [44] also employed an automatic damage detection method for historic brick masonry structures, using a Faster R-CNN with RPN model based on ResNet101 to identify two classes of degradation: efflorescence and spalling. The model was trained on images obtained from two orthophotos, achieving a mAP of 95% during testing. In addition, the model demonstrated good performance when tested on different-sized images and under different lighting conditions. Another interesting aspect of the work is the development of a real-time damage detection system: the first experiment utilizes an IP Webcam-based system, allowing videos collected from smartphones to be uploaded to a computer for real-time damage detection via a WLAN network; the second experiment involves a mobile monitoring system, where detection is performed in real time directly on the smartphone.
Another interesting aspect proposed by several studies is the possibility of utilizing both two-dimensional and three-dimensional data from various sources. Liu et al. [45] proposed a combination of semantic image segmentation and photogrammetry to monitor changes in the HBE, particularly focusing on the segmentation of vegetation on stone masonry. The study compares a multi-label classification model that segments seven classes within the image (including plant, masonry, window, and hole) with a second binary segmentation model to further refine the results of the first one for the specific class related to vegetation presence. By comparing different CNN-based architectures, the DeepLabV3+ network (with the backbone ResNet101) showed the best performance, with an overall Intersection over Union (IoU) of 66.9% for seven classes (54.6% for the multi-label model and 56.2% for the binary model). A further notable aspect is the three-dimensional restitution of the case study using crowdsourced images acquired on-site, where the output of the segmentation is reprojected to allow for the quantitative measurement of the area affected by vegetation. Pathak et al. [46] also proposed a novel pipeline for detecting and localizing degradation forms present on the surfaces of complex heritage structures using 2D and 3D data. The database consisted of images acquired from different angles and distances, as well as rendered images obtained from point clouds. The pipeline involved semantic segmentation for pre-processing these images, with manual labeling related to two types of damage, cracks and spalling, followed by the object detection phase using Faster R-CNN(+RPN) with various feature extractors (ResNet101, Inception V2, Inception-ResNet, ResNet-FPN, and NAS).
The test phase demonstrated the model's domain transferability by detecting damage on rendered images from 3D models not involved in the training process; the highest performance was achieved by the network with ResNet101 integrated with FPN, achieving a mAP of 58.19%. An interesting aspect of this work is the possibility of mapping the damage detected in the images to its three-dimensional location in the structure. Unlike previous works, Idjiaton et al. [47] developed a novel architecture to automatically detect stone degradation from images acquired for the 3D modeling of a building with limestone masonry. The most common damage in facades is spalling, which was manually labeled on orthomosaics obtained from photogrammetric surveys carried out by cameras on ground stations and drones. To augment the training dataset, the labeling was transferred to the images acquired during the survey, benefiting from the high-resolution photos and redundancy due to image overlap. For automatic degradation class detection, a neural network based on the YOLOv5 architecture was developed, with an improved version incorporating transformers to enhance the identification of small areas. The proposed method was compared with other architectures, achieving the best performance with an F1-score of 85% and a mAP of 81%.

3.1.4. Object Segmentation

Taking advantage of image segmentation, Gong et al. [48] introduced a novel method for the quantitative damage assessment of large historic buildings in remote areas using drone imagery. The approach first involved reconstructing a 3D mesh model from high-resolution oblique images acquired by a drone. The second step involved 2D object segmentation, with the results then reprojected onto the 3D mesh models. For more accurate object segmentation, the authors proposed the Mask R-CNN algorithm augmented with an edge-enhancement method, which exploits a region-based CNN and a gradient enhancement strategy. Based on the reprojected segmentation results on the 3D model and the symmetry characteristics of the objects, the damage condition of 3D objects was estimated by evaluating volume reduction, defining three levels of damage: no damage or mild damage (less than 30% reduction), moderate damage (30–60%), and severe damage (more than 60%). Regarding 2D segmentation accuracy, the experimental results showed a mAP value of 93.23%; however, the limitations include the reliance on volume reduction as the damage indicator and applicability only to symmetrical structures.
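The three damage levels defined by Gong et al. [48] reduce to a simple threshold rule on relative volume reduction. The sketch below assumes that the undamaged reference volume is already available (in the study it is inferred from the symmetry of the object) and that the boundary values of 30% and 60% both fall in the moderate class; the function name and the input values are hypothetical.

```python
def damage_level(reference_volume: float, current_volume: float) -> str:
    """Classify damage from the relative volume reduction of a 3D object."""
    if reference_volume <= 0:
        raise ValueError("reference volume must be positive")
    if not 0 <= current_volume <= reference_volume:
        raise ValueError("current volume must lie between 0 and the reference")
    reduction = (reference_volume - current_volume) / reference_volume
    if reduction < 0.30:
        return "no or mild damage"
    if reduction <= 0.60:  # assumption: both boundaries belong to 'moderate'
        return "moderate damage"
    return "severe damage"


# Hypothetical volumes in cubic metres (assumed values).
print(damage_level(100.0, 90.0))
print(damage_level(100.0, 55.0))
print(damage_level(100.0, 20.0))
```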

3.2. Buildings Crack Detection

In the field of the diagnostic assessment of buildings in the HBE, some studies specifically address the issue of automatic crack detection via DL architectures. The developed DL algorithms and the adopted classification methodologies are highlighted (Table 3).

3.2.1. Binary Classification

Given the specificity of the problem, the binary classification approach is well-suited for crack identification, distinguishing between two classes: cracks and no cracks. For this purpose, Elhariri et al. [49] introduced an automated deep crack segmentation approach using variants of the U-Net deep learning model to achieve pixel-level crack segmentation in images of historic buildings. U-Net is a convolutional encoder–decoder network designed for end-to-end semantic segmentation tasks with limited amounts of data. Three U-Net architectures, including Deep ResU-Net, ResU-Net++, and U2-Net, were explored. U2-Net demonstrated superior performance, with a mean Intersection over Union (mIoU) value of 78.38% due to its ability to effectively capture small and superficial cracks, including those located on image edges or in images with varying lighting and focus conditions. With the aim of accurately detecting cracks in historic masonry structures, Haciefendioglu et al. [50] developed an innovative segmentation model called CAM-K-SEG that integrates Grad-CAM visualization and K-Means clustering with pre-trained convolutional neural network models. By comparing the performance metrics of several pre-trained CNN models, including VGG16, VGG19, Inception-V3, Xception, ResNet50, and a customized CNN model, the ResNet50 model was selected. A dataset of high-resolution photographs acquired from smartphones was used for training, and the performance of the proposed method was evaluated through a comparative analysis with the U-Net segmentation model. Evaluating the results according to the IoU metric, with an average value of 70%, the CAM-K-SEG model demonstrated superior ability in object identification and crack localization, whereas the U-Net model was more proficient in crack area segmentation. By contrast, Reis and Khoshelham [51] concentrated on automatic crack detection in historic concrete buildings.
They developed a novel DL architecture called ReCRNet (Residual CRack Detection Network), which was applied to close-range images acquired from drones. The performance was compared with traditional ML-based classification methods (PCA + Linear SVM, PCA + DecisionTree) and DL-based classification methods (AlexNet, VGG19), showing superior performance in terms of accuracy, precision, recall, and F1-score. In addition, the proposed model leverages residual learning, resulting in a lightweight architecture designed for fast and high-precision classification and suitable for personal computers and small datasets. Unlike standard residual networks, such as ResNet, which are too large for binary classification tasks, this model was optimized for efficiency and performance on limited training samples.
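The IoU and mIoU values quoted for the crack segmentation models in [49,50] compare a predicted binary mask with a manually labeled one, pixel by pixel. The following sketch computes that metric on small hypothetical masks; averaging over images is an assumption made here, since studies may instead average over classes (crack and background).

```python
# A mask is a grid of 0/1 values, where 1 marks a crack pixel.
Mask = list[list[int]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Pixel-level Intersection over Union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so define their IoU as 1.
    return inter / union if union else 1.0


def mean_iou(pairs: list[tuple[Mask, Mask]]) -> float:
    """Average the per-image IoU over a set of (prediction, truth) pairs."""
    return sum(mask_iou(p, t) for p, t in pairs) / len(pairs)


# Hypothetical 3x3 masks: a vertical crack and a slightly wrong prediction.
predicted_mask = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
labeled_mask = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(mask_iou(predicted_mask, labeled_mask))
```

Because cracks occupy few pixels, a small number of misplaced pixels changes the IoU considerably, which is one reason thin-crack segmentation scores are typically lower than those of larger surface defects.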

3.2.2. Edge Extraction

Unlike previous studies, Bakirman et al. [52] employed innovative DL architectures to extract stone contours and cracks in masonry using two semantic segmentation algorithms for contour extraction: DexiNed with an Xception-based architecture and RCF with a VGG16-based architecture. The two models were trained on an image dataset acquired from the Internet, digital cameras, and drones. After the testing phase, a binarization approach was applied to the predictive images to optimize the visualization of the results. The results showed that the background and vegetation can influence extraction performance, as well as the structural or material characteristics of the stones. Indeed, the highest F1-score values were obtained for objects with homogeneous edges, reaching values of 61.38% for the first model and 61.50% for the second. The best solution was tested on an independent orthoimage derived from low-cost UAVs.

3.3. Post-Disaster Damage at District Scale

The review investigation highlights a further research topic concerning the application of AI techniques for assessing and monitoring the infrastructure and historic building conditions at the district scale. Specifically, studies focus on the potential of DL techniques as tools for the timely identification of damage caused by natural disasters and informed management of emergencies. The relevant works are presented according to the adopted classification methodology (Table 4).

3.3.1. Binary Classification

According to the binary classification method, Kumar P. et al. [53] presented a method for the early detection of HBE damage caused by disasters, specifically earthquakes, using data from social media, a timely and large-scale source. Images depicting historic built sites affected by post-disaster damage were collected and annotated according to two classes: Damage/No Damage. Several convolutional neural networks (VGG16, ResNet50, DenseNet121, InceptionResNetV2, Xception, and NASNetLarge), in combination with a variety of classification algorithms (Logistic Regression, Support Vector Machines, Random Forests, and AdaBoost), were tested and compared. For this specific problem, the model trained using the logistic regression algorithm on DenseNet121 features performed the best, achieving classification accuracy values of 81% for the “Damage” class and 94% for the “No damage” class. This disparity is likely related to the imbalanced distribution of damage and no-damage images in the dataset.
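The gap between the two class-wise accuracies can be made concrete with a small sketch that computes accuracy separately for each class. The label lists below are invented for illustration and do not reproduce the dataset of Kumar P. et al. [53]; they only show how a majority class can dominate the overall score.

```python
from collections import Counter


def per_class_accuracy(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Fraction of correctly classified samples within each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


def overall_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of correctly classified samples over the whole set."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced test set: 10 damaged and 40 undamaged images.
y_true = ["Damage"] * 10 + ["No damage"] * 40
y_pred = (["Damage"] * 8 + ["No damage"] * 2      # 8 of 10 damaged found
          + ["No damage"] * 38 + ["Damage"] * 2)  # 38 of 40 undamaged found

scores = per_class_accuracy(y_true, y_pred)
print(scores, overall_accuracy(y_true, y_pred))
```

In this invented case the overall accuracy sits close to the majority-class value, so reporting the two classes separately, as the study does, gives a clearer picture of how often damaged sites are missed.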

3.3.2. Multi-Class Classification

An additional analysis conducted on images related to post-disaster damage situations focused on distinguishing the severity of the detected damage. Following a multi-class approach, DL-based models can assign new images to their respective severity classes. Moreover, some studies have sought methods to overcome the lack of post-disaster data by classifying new data using training conducted on historical data related to events and areas with different characteristics.
Among these, Lin D. et al. [54] explored a different approach to the early detection of post-disaster building damage. In their study, a novel Domain Adaptation (DA) framework was developed. This method aimed to adapt a model learned from a dataset related to a specific context (source domain) to perform in a different context (target domain), where data may have different characteristics. To validate the effectiveness of the proposed method, two damage detection tasks were studied using post-earthquake and post-hurricane datasets. Satellite imagery related to the Haiti disaster constituted the source domain, while the target domain included imagery from the Yushu earthquake. The method was further validated on datasets from post-hurricane Sandy (source domain) and Irma (target domain). In both cases, the samples were classified into three categories: undamaged, damaged, and other. The model aimed to achieve category-level feature fitting for complex image classification tasks by combining the variational autoencoder (VAE) and the Gaussian mixture model (GMM). Specifically, the GMM was used to characterize the distribution of the source domain, while the VAE estimated the distribution of the target domain, aligning each category feature with the corresponding category of the source domain by minimizing the KL divergence between the two domains. The experimental results revealed good performance of the model, reaching an accuracy value of 78.2% for post-earthquake images and 94.3% for post-hurricane images. Similarly, Lin Q. et al. [55] addressed the challenge of transferring models trained on historical datasets to novel data related to different contexts through transfer learning approaches. 
For this purpose, based on post-earthquake building damage data (Ludian and Yushu earthquakes), remote sensing images (aerial and UAV images) were labeled according to four damage classes (No observable damage, Light damage, Heavy damage, and Collapse) by examining building characteristics, including contour, geometry, texture, and relationship with the surrounding area. To assess the seismic damage class of buildings, the proposed DL model architecture consisted of a CNN feature extractor, the VGG network, and an ordered regression classifier. The VGG-OR network was employed to develop a model trained on the Ludian dataset. Subsequently, four schemes were designed and compared to evaluate the performance of different transfer learning programs. Among them, a novel data transfer algorithm was proposed to identify potential historical data samples that are advantageous for the new dataset. This algorithm was used in conjunction with the training samples of the Yushu dataset. The results showed that when the training set of the new task reached about 10% of the historical data, the model achieved an overall accuracy of about 74%, improving the accuracy of seismic building damage assessment in data-limited contexts and making it applicable to new disaster scenarios.
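The four damage classes used by Lin Q. et al. [55] are ordered by severity, which is what an ordered regression classifier exploits. The review does not detail the formulation adopted in the VGG-OR network, so the sketch below shows only one common encoding of ordered classes, as cumulative binary questions of the form "is the damage worse than level j?"; the function names and the probabilities are assumptions.

```python
# Damage classes from the text, listed from least to most severe.
LEVELS = ["No observable damage", "Light damage", "Heavy damage", "Collapse"]


def encode_ordinal(level_index: int) -> list[int]:
    """Encode level k as K-1 binary targets: 1 if the damage exceeds level j."""
    return [1 if level_index > j else 0 for j in range(len(LEVELS) - 1)]


def decode_ordinal(probabilities: list[float], threshold: float = 0.5) -> str:
    """Count consecutive 'worse than level j' answers that pass the threshold."""
    rank = 0
    for p in probabilities:
        if p < threshold:
            break
        rank += 1
    return LEVELS[rank]


# Target encoding for 'Heavy damage' (index 2).
print(encode_ordinal(2))
# Hypothetical classifier outputs for one building (assumed values).
print(decode_ordinal([0.9, 0.7, 0.2]))
```

With this encoding, confusing "Light damage" with "Collapse" violates more binary targets than confusing it with "Heavy damage", so errors between distant severity levels are penalized more heavily than errors between adjacent ones.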
By contrast, the research conducted by Presa-Reyes and Chen [56] addressed the need for accurate analysis of the impact of natural disasters on infrastructure, including historic buildings, using advanced technologies such as CNNs and data fusion. The aim was to compare aerial images acquired pre- and post-disaster, specifically Hurricane Irma. A two-stream CNN architecture was proposed, which preprocessed the data to focus on the central buildings in the image, taking into account the surrounding context, and performed deep feature fusion from pre- and post-disaster image pairs, as opposed to single-image analysis. The proposed CNN streams were based on a pre-trained ResNet50 architecture, which was trained on images taken after the disaster event and labeled according to four damage levels (No Damage, Minor Damage, Major Damage, and Destroyed). The experimental results demonstrated the effectiveness of CNNs in differentiating building damage levels when incorporating contextual information and feature fusion techniques, achieving an average F1-score of 94%. However, the model demonstrated lower performance when classifying the most severe damage levels due to the limited number of available samples and the overlap of different damage levels between adjacent buildings in the same image.

3.3.3. Multi-Label Classification

Presa-Reyes and Chen conducted further work [57] in order to identify damaged buildings from remote sensing images and predict the building’s damage level after a disaster event using a multi-label approach. Unlike the previous method, the proposed model removed the dependence on available building footprint geometries. It assumed that only an estimate of the location and damage level of the damaged building is available to train the model, assuming a scenario where data are limited. In this work, an end-to-end CNN based on ResNet50 architecture was developed to learn deep feature matching from pre- and post-disaster images. Given a pair of images, the goal was to generate a two-dimensional predictive patch in a regression-based approach, where each cell in the patch contains a value representing the predicted level of damage. Tests conducted on large-scale satellite imagery and aerial photographs (Hurricane Irma dataset) demonstrated the model’s effectiveness in identifying damaged buildings and predicting damage levels (No damage/affected/minor damage/major damage/destroyed), with lower Mean Square Error (MSE) and Mean Absolute Error (MAE) values than other configurations. Instead, Wang Y. et al. [58] presented a novel two-step solution for automatic building damage detection using satellite imagery, addressing the challenges of imbalanced data distribution. Two separately trained prediction models were developed: one for building location, which identifies building features from the image background through binary classification, and one for damage level classification (No Damage, Minor Damage, and Major Damage). By incorporating the disaster context, an additional feature (e.g., distance between buildings) was introduced as a model input function when concatenated with the 2D images for the classification model. 
To address the highly imbalanced problem in damage classification, an incremental learning strategy was proposed, which exploited a large number of training data subsets with imposed normality in their data distribution to continuously train and update the classification model. An experimental study tested the proposed approach on an open-source satellite dataset on three catastrophic events, including earthquakes, floods, and tsunamis. The location model achieved a test accuracy of 97.29% and an IoU of 53.78%. In contrast, the proposed incremental learning strategy for highly imbalanced class distributions, tested on buildings extracted from satellite images of the “Mexico-earthquake” disaster event, achieved an accuracy score of 99.55% and a weighted average F1-score of 99.53%.
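Wang Y. et al. [58] report both accuracy and a weighted average F1-score for the damage-level classifier. A minimal sketch of the weighted F1 computation is given below, in which each class contributes its own F1-score in proportion to its number of true samples; the label lists are invented for illustration and are unrelated to the satellite dataset used in the study.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Average of per-class F1-scores, weighted by each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n / total
    return score


# Hypothetical imbalanced sample: 8 undamaged and 2 lightly damaged buildings.
y_true = ["No Damage"] * 8 + ["Minor Damage"] * 2
y_pred = ["No Damage"] * 8 + ["No Damage", "Minor Damage"]
print(weighted_f1(y_true, y_pred))
```

Because the weights follow class frequency, the weighted F1 stays high when the dominant class is classified well, so it is usefully read alongside per-class scores when the distribution is as imbalanced as the one described above.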

4. Discussion

The analysis of the scientific contributions considered in this scoping review highlights the current research directions on which the scientific community is focusing in the context of using innovative techniques to support activities related to the early diagnosis of damage affecting the HBE and timely recovery and maintenance interventions while leaving new challenges open in the field (Q1§2).
The studies reviewed present a variety of image segmentation and classification approaches, showcasing the development of increasingly advanced DL algorithms. Notably, the analysis reveals a growing emphasis on the integration of different technologies using multiple data sources, including 2D and 3D data, in damage detection and monitoring processes, along with the adoption of multi-level approaches that combine different DL techniques for decay segmentation, classification, and localization. Several studies also focused on the practical applicability of the proposed models by considering the integration of automated solutions into conservation and maintenance processes, and into real-world monitoring systems for historic heritage. The goal is to develop effective and efficient tools to support experts in HBE management and preservation. Challenges in this area include optimizing neural network architectures to ensure accurate and fast performance even on small datasets and limited processing platforms, as well as adapting to the different types of damage and materials found at cultural heritage sites.
At the district scale, studies have focused on the early detection of damage caused by natural disasters, leveraging timely and large-scale data, including data from social media and satellite imagery. Given the difficulty of obtaining images immediately after a disaster, some studies have addressed the issue of adapting models to contexts with different characteristics. In addition, comparative pre- and post-disaster analyses are conducted to assess the impact of disasters on HBE. Challenges in this area include ensuring sufficient training data in a post-disaster context and the potential generalization of DL-based models trained on different contexts, as well as accuracy in differentiating the identified damage levels in order to promptly address interventions according to a priority scale.
The main challenge in this field lies in making these techniques accessible and practical for a wide range of historic built sites, simplifying their implementation and use throughout the various stages of conservation status detection and diagnosis, while also reducing associated costs (Q2§2). In this context, the advent of modern technologies supporting monitoring activities and the assessment of damage and degradation that historical structures face over time also impacts survey techniques and tools used for data acquisition. Focusing on the analysis of the methodologies proposed in the scientific literature, it is evident that technological innovation allows for the use of a wide range of advanced tools, which are easy to use and widely available, enabling the implementation of rapid, efficient, and highly repeatable survey systems over time.
In the context of surface degradation analysis and crack pattern mapping at the building level, there is a well-established trend towards using images acquired from common digital cameras, which are widely available on the market at relatively low costs. These cameras are capable of providing high-resolution images, enhancing the accuracy of the analyses conducted on them, as well as offering extensive flexibility and versatility during the data acquisition phase. Similarly, there is research on the use of images and videos acquired by smartphone cameras, which offer the significant advantage of utilizing tools that are not only accessible to everyone but are also suitable for even faster surveys. These tools can quickly transfer the captured images to storage devices or online platforms for data processing and sharing, even for real-time monitoring activities. For building-scale monitoring, early experiments with DL architectures are being conducted using images and videos acquired from surveillance cameras installed on-site. These instruments allow continuous data acquisition and archiving, even in areas with limited access, thus enabling the creation of a historical database of the building’s conditions that is useful for comparative analysis over time.
The potential of individual tools is further enhanced by the implementation of various outputs according to a multi-source approach, which explores the use of multiple data from different devices to increase the accuracy of models and the effectiveness of the assessments conducted. From this perspective, the use of 3D photogrammetric data, acquired through cameras, drones, and laser scanners, not only allows the three-dimensional representation of artifacts, serving as a basis for geometric and reality-based surveys, but also the assessment of a building’s state of conservation using the generated orthophotos, which are useful both as a basis for manual labeling and as a test dataset for DL-based models aimed at locating pathologies in space.
Finally, the recent experimentation of DL architectures on images from the web and social media related to historic built sites is noteworthy. Due to the wide availability of data, variations in acquisition conditions, and diverse camera perspectives, it is possible to create datasets of representative and diversified images that can be used in the training phase of DL algorithms for better generalization of prediction models. Furthermore, on a larger scale, this type of imagery could provide an important contribution to the acquisition of a substantial amount of data in emergency situations, following natural disasters, where the availability of information is often limited and delayed, thus offering the possibility for the timely identification and monitoring of critical issues affecting buildings and/or historic districts.
The advantages emerging from the multi-level approaches adopted are essentially reflected in two substantial aspects that highlight the potential of DL techniques in supporting decision-making phases in the diagnostic and maintenance process of buildings (Q3§2).
Firstly, although still limited in the current context, the use of DL-based techniques applied to photogrammetric data presents strong prospects and potential for diagnostic and maintenance applications. The existing literature often references applications on random images to expand datasets and test the effectiveness of algorithms, without considering the specificity of the context or other relevant information for such purposes. In contrast, the automated mapping of degradation pathologies applied to photogrammetric data offers a triple benefit for diagnosis purposes: (i) a better understanding and interpretation of results at a qualitative level, thanks to three-dimensional and reality-based visualization, including information on materials, construction techniques, and context, as well as the spatial localization of identified damage; (ii) a quantitative assessment of the identified damaged areas, enabling concrete planning of intervention measures, also considering time and cost factors; and (iii) classification based on damage severity levels, derived from the qualitative and quantitative information obtained, to guide decision-makers in the recovery process when defining a priority scale for interventions.
Secondly, the various methodological approaches for automated pathology detection applied through the use of different technologies highlight the potential of DL models in conducting real-time monitoring of HBE, even in inaccessible or dangerous environments, enabling the remote inspection of buildings and the planning of rapid and targeted interventions.
The development of lightweight DL architectures capable of real-time performance even on devices with limited processing power, such as mobile devices, makes the diagnostic process more efficient and effective, thanks to continuous and large-scale monitoring in terms of detection, localization, and classification of structural defects, either through connection to a PC workstation for data transfer and real-time monitoring or through instant real-time detection directly on smartphones. On a broader scale, the integration of DL-based models with crowd-sensing technology provides a global damage monitoring method that enables distributed image collection by users via smartphones. This decentralized approach makes the diagnostic process more accessible and allows for geographic mapping of damage, providing a global view of the distribution of critical issues and their evolution over time, thereby facilitating an accurate assessment of causes and contributing factors that need to be addressed.
The practical implementation of the developed models in real monitoring and maintenance systems of HBE becomes essential to maximize the contribution that these technologies can offer in the planning and scheduling of conservation actions. Practical applications of research results in collaborative and interoperable environments and tools for shared information management among different stakeholders actively involved in the decision-making system are still limited (Q4§2). Among these applications, a cloud-based web platform development for the real-time automated uploading and processing of defects in the dataset acquired from mobile devices stands out. Additionally, the development of a DL architecture integrated into a mobile application allows citizens to independently recognize building defects without specific knowledge in the field of HBE conservation, while also contributing to the constant acquisition of updated data for urban-scale monitoring activities by professionals.
The integration of a specific DL model within the BIM environment, aimed at creating an innovative workflow that integrates AI-aided automated solutions for recognizing and classifying degradation pathologies, is also still underexplored. This could support the generation of a comprehensive, measurable, and sharable representation of information, aiding the implementation of the most appropriate FM/AM strategies. The implementation of innovative AI-aided technologies in the field of built heritage makes these underexplored areas fertile ground for future developments and research, opening up new perspectives for the systematization of data related to the various phases of the process within environments and/or platforms that serve as organized and interoperable containers for the different stakeholders involved.
The application of DL techniques in the analyzed context reveals the limitations of methodologies that are still in the experimental and development stages, leaving open several opportunities for improving the effectiveness and generalization of damage detection models in historic structures (Q5§2).
Firstly, there is an inherent limitation specific to the field of HBE, related to the availability of limited training databases and the resulting need for a substantial amount of labeled data. This necessitates significant effort from experienced professionals in the process of manual image identification and mapping, which are fundamental requirements for adequate algorithm training and satisfactory accuracy values in predictive models. Secondly, the experiments tend to focus on specific categories of damage, highlighting the need to extend these experiments to a broader range of recurring pathologies in historic heritage.
To overcome these limitations, future research could focus on increasing the training dataset to ensure greater representativeness of all defect categories, thereby improving the accuracy and reliability of the model. Additionally, exploring advanced transfer learning and model adaptation techniques could help leverage larger datasets with different characteristics from the context under investigation. In this regard, making shared data accessible and open to professionals could be advantageous, increasing the training dataset based on the specific peculiarities of the studied assets and the historical context in which they are located.
Future work could also focus on more refined hyperparameter settings and lighter architectures capable of real-time responsiveness, even on devices with limited processing power. This would allow the implementation of models in mobile systems, such as apps and websites, which are easy to use even for non-experts, enabling them to contribute to the aforementioned databases by providing useful information about the conservation state of buildings via their smartphones. At the same time, similar technologies could be useful to experts for collecting information through rapid inspections and real-time on-site monitoring of the progress of deterioration.
From a monitoring perspective, new research directions could explore on-site damage detection techniques and long-term monitoring, such as the use of sensors and IoT systems directly linked to external databases that can be integrated into digital models, where real-time data can converge and be combined with other types of multimodal data from non-destructive inspection techniques such as thermal imaging and ultrasonic testing, or with remote sensing information such as LiDAR data.
Finally, it would be valuable to integrate, in this manner, various diagnostic data and image-based mapping results into broader cultural heritage management systems, such as HBIM models and geospatial maps, to facilitate a clearer interpretation of results at both building and district scales, following an approach that facilitates data sharing and collaborative work among specialists. Moreover, these digital models could also become useful tools for planned maintenance and conservation, integrating information on intervention actions, corrective and preventive maintenance to be implemented, or emergency control and response actions in the event of building damage induced by natural disasters.

5. Conclusions

The proposed scoping review aimed to provide an overview of the applications of DL architectures for classifying the decay phenomena of the HBE, focusing on the significant implications for decision-making in built heritage conservation and management. To achieve this, documents that reflected the keywords related to the topic were selected, highlighting the innovative nature of the field under examination. The analysis shows a strong growth trend beginning in 2020, with significant development prospects in the coming years.
The main topics addressed in the literature focus on the automated identification of surface degradation pathologies and cracks affecting buildings within the HBE, as well as post-disaster damage assessments at a broader scale. The analysis of the reviewed studies was structured to highlight the different segmentation and classification methodologies employed, the architectures developed for specific objectives, and the results obtained in terms of accuracy and precision. This structure brings out the potential and limitations of each experiment, providing guidance for future research and developments based on the specific characteristics of the cases under examination.
The literature reveals significant progress in the efficiency of DL-based models applied to images, offering increasingly accurate assessments of damage affecting heritage at both the building and district scales, with the potential for monitoring changes over time. The integration of DL models with increasingly advanced surveying tools enables remote observation and decay mapping activities, reducing the need for direct on-site presence, even in inaccessible or hazardous locations. Furthermore, the advantages of applying automated degradation identification systems to data acquired through photogrammetric surveys are evident. These systems convert image data into valuable qualitative and quantitative information, providing a comprehensive and detailed understanding of the spatial localization and extent of pathologies. Another significant benefit is the large amount of two-dimensional data that this approach generates, which are essential for image labeling and model training phases.
The potential for implementing DL-based systems directly on mobile devices, capable of real-time monitoring of building conditions, is also noteworthy, as is the use of crowdsourced images that leverage the contributions of multiple users to enrich datasets. These contactless, fast, and cost-effective approaches facilitate the early detection of structural and surface building issues and the evaluation of repair and recovery interventions, ensuring informed decision-making for appropriate maintenance and conservation strategies.
In this context, the potential to integrate DL models within post-disaster heritage management practices is underscored, offering significant improvements in damage assessment operations and contributing to reducing response times in emergencies.
In conclusion, several challenges and development directions remain open in this field. To address issues related to the limited availability of data concerning the diverse typological characteristics typical of the HBE, the creation of open-source databases organized by construction type could significantly accelerate the training and prediction phases for specific case studies, enabling the accurate localization and measurement of degradation. This approach would be fully aligned with the primary goal of the research: minimizing survey, diagnosis, and monitoring times, while ensuring precise measurement data to define appropriate intervention measures. Furthermore, the implementation of DL models in interoperable environments remains limited. Such environments could merge qualitative and quantitative results with data from other types of integrable diagnostic analyses and sensor systems. A future challenge lies in directly linking these data to corresponding intervention actions, timelines, and costs in order to provide professionals with a comprehensive and shareable tool for heritage diagnosis and management practices.

Author Contributions

Conceptualization, F.F. and V.G.; methodology, V.G.; resources, V.G.; data curation, V.G.; writing—original draft preparation, V.G.; writing—review and editing, V.G.; visualization, F.F.; supervision, F.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Valero, E.; Forster, A.; Bosché, F.; Hyslop, E.; Wilson, L.; Turmel, A. Automated defect detection and classification in ashlar masonry walls using machine learning. Autom. Constr. 2019, 106, 102846.
  2. Galantucci, R.A.; Lasorella, M.; De Fino, M. A Rapid pipeline for periodic inspection and maintenance of architectural surfaces. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch. 2023, 48, 621–628.
  3. Pavoni, G.; Giuliani, F.; De Falco, A.; Corsini, M.; Ponchio, F.; Callieri, M.; Cignoni, P. On Assisting and Automatizing the Semantic Segmentation of Masonry Walls. J. Comput. Cult. Herit. 2022, 15, 1–17.
  4. Llamas, J.; Lerones, P.M.; Medina, R.; Zalama, E.; Gómez-García-Bermejo, J. Classification of architectural heritage images using deep learning techniques. Appl. Sci. 2017, 7, 992.
  5. Ceravolo, R.; Invernizzi, S.; Lenticchia, E.; Matteini, I.; Patrucco, G.; Spanò, A. Integrated 3D Mapping and Diagnosis for the Structural Assessment of Architectural Heritage: Morano’s Parabolic Arch. Sensors 2023, 23, 6532.
  6. El Masri, Y.; Rakha, T. A scoping review of non-destructive testing (NDT) techniques in building performance diagnostic inspections. Constr. Build. Mater. 2020, 265, 120542.
  7. Tejedor, B.; Lucchi, E.; Bienvenido-Huertas, D.; Nardi, I. Non-destructive techniques (NDT) for the diagnosis of heritage buildings: Traditional procedures and futures perspectives. Energy Build. 2022, 263, 112029.
  8. De Fino, M.; Galantucci, R.A.; Fatiguso, F. Condition Assessment of Heritage Buildings via Photogrammetry: A Scoping Review from the Perspective of Decision Makers. Heritage 2023, 6, 7031–7066.
  9. Zhao, J.; Hua, X.; Yang, J.; Yin, L.; Liu, Z.; Wang, X. A Review of Point Cloud Segmentation of Architectural Cultural Heritage. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 10, 247–254.
  10. Argyrou, A.; Agapiou, A. A Review of Artificial Intelligence and Remote Sensing for Archaeological Research. Remote Sens. 2022, 14, 6000.
  11. Fiorucci, M.; Khoroshiltseva, M.; Pontil, M.; Traviglia, A.; Del Bue, A.; James, S. Machine Learning for Cultural Heritage: A Survey. Pattern Recognit. Lett. 2020, 133, 102–108.
  12. Mishra, M. Machine learning techniques for structural health monitoring of heritage buildings: A state-of-the-art review and case studies. J. Cult. Herit. 2021, 47, 227–245.
  13. Rossi, M.; Bournas, D. Structural Health Monitoring and Management of Cultural Heritage Structures: A State-of-the-Art Review. Appl. Sci. 2023, 13, 6450.
  14. Mishra, M.; Lourenço, P.B. Artificial intelligence-assisted visual inspection for cultural heritage: State-of-the-art review. J. Cult. Herit. 2024, 66, 536–550.
  15. Latifi, R.; Hadzima-Nyarko, M.; Radu, D.; Rouhi, R. A Brief Overview on Crack Patterns, Repair and Strengthening of Historical Masonry Structures. Materials 2023, 16, 1882.
  16. Li, Y.; Du, Y.; Yang, M.; Liang, J.; Bai, H.; Li, R.; Law, A. A review of the tools and techniques used in the digital preservation of architectural heritage within disaster cycles. Herit. Sci. 2023, 11, 199.
  17. Siountri, K.; Anagnostopoulos, C.N. The Classification of Cultural Heritage Buildings in Athens Using Deep Learning Techniques. Heritage 2023, 6, 3673–3705.
  18. Grilli, E.; Özdemir, E.; Remondino, F. Application of machine and deep learning strategies for the classification of heritage point clouds. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch. 2019, 42, 447–454.
  19. Vandenabeele, L.; Loverdos, D.; Pfister, M.; Sarhosis, V. Deep Learning for the Segmentation of Large-Scale Surveys of Historic Masonry: A New Tool for Building Archaeology Applied at the Basilica of St Anthony in Padua. Int. J. Archit. Herit. 2023, 1–13.
  20. Garrido, I.; Erazo-Aux, J.; Lagüela, S.; Sfarra, S.; Ibarra-Castanedo, C.; Pivarčiová, E.; Gargiulo, G.; Maldague, X.; Arias, P. Introduction of Deep Learning in Thermographic Monitoring of Cultural Heritage and Improvement by Automatic Thermogram Pre-Processing Algorithms. Sensors 2021, 21, 750.
  21. Musicco, A.; Galantucci, R.A.; Bruno, S.; Verdoscia, C.; Fatiguso, F. Automatic point cloud segmentation for the detection of alterations on historical buildings through an unsupervised and clustering-based machine learning approach. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2021, 5, 129–136.
  22. Galantucci, R.A.; Musicco, A.; Verdoscia, C.; Fatiguso, F. Machine Learning for the Semi-Automatic 3D Decay Segmentation and Mapping of Heritage Assets. Int. J. Archit. Herit. 2023, 1–19.
  23. Grilli, E.; Remondino, F. Classification of 3D digital heritage. Remote Sens. 2019, 11, 847.
  24. Iraniparast, M.; Ranjbar, S.; Rahai, M.; Moghadas Nejad, F. Surface concrete cracks detection and segmentation using transfer learning and multi-resolution image processing. Structures 2023, 54, 386–398.
  25. Dais, D.; Bal, İ.E.; Smyrou, E.; Sarhosis, V. Automatic crack classification and segmentation on masonry surfaces using convolutional neural networks and transfer learning. Autom. Constr. 2021, 125, 103606.
  26. Stoean, R.; Bacanin, N.; Stoean, C.; Ionescu, L.; Atencia, M.; Joya, G. Computational framework for the evaluation of the composition and degradation state of metal heritage assets by deep learning. J. Cult. Herit. 2023, 64, 198–206.
  27. Liu, J.; Lau, S.; Wang, X.; Luo, S.; Lee, V.C.S.; Ding, L. Automated pavement crack detection and segmentation based on two-step convolutional neural network. Comput. Civ. Infrastruct. Eng. 2020, 35, 1291–1305.
  28. Zhang, Y.; Zhang, Z.; Zhao, W.; Li, Q. Crack Segmentation on Earthen Heritage Site Surfaces. Appl. Sci. 2022, 12, 12830.
  29. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. Syst. Rev. 2021, 10, 89.
  30. Bruno, S.; Galantucci, R.A.; Musicco, A. Decay detection in historic buildings through image-based deep learning. VITRUVIO 2023, 8, 6–17.
  31. Samhouri, M.; Al-Arabiat, L.; Al-Atrash, F. Prediction and measurement of damage to architectural heritages facades using convolutional neural networks. Neural Comput. Appl. 2022, 34, 18125–18141.
  32. Lee, J.; Yu, J.M. Automatic Surface Damage Classification Developed Based on Deep Learning for Wooden Architectural Heritage. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 10, 151–157.
  33. Lee, S.Y.; Cho, H.H. Damage Detection and Safety Diagnosis for Immovable Cultural Assets Using Deep Learning Framework. In Proceedings of the 25th International Conference on Advanced Communication Technology (ICACT), Pyeongchang, Republic of Korea, 19–22 February 2023; pp. 3310–3313.
  34. Wang, N.; Zhao, X.; Zou, Z.; Zhao, P.; Qi, F. Autonomous damage segmentation and measurement of glazed tiles in historic buildings via deep learning. Comput. Civ. Infrastruct. Eng. 2020, 35, 277–291.
  35. Kumar, J.N.V.R.S.; Indira, D.N.V.S.L.S.; Veerendra, G.T.N. A Cloud Application for Detecting Building Defects using CNN. In Proceedings of the International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 28–29 January 2022; pp. 1–8.
  36. Hatir, M.E.; Barstuğan, M.; İnce, İ. Deep learning-based weathering type recognition in historical stone monuments. J. Cult. Herit. 2020, 45, 193–203.
  37. Kwon, D.; Yu, J. Automatic damage detection of stone cultural property based on deep learning algorithm. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch. 2019, 42, 639–643.
  38. Cardellicchio, A.; Ruggieri, S.; Nettis, A.; Renò, V.; Uva, G. Physical interpretation of machine learning-based recognition of defects for the risk management of existing bridge heritage. Eng. Fail. Anal. 2023, 149, 107237.
  39. Rodrigues, F.; Cotella, V.; Rodrigues, H.; Rocha, E.; Freitas, F.; Matos, R. Application of Deep Learning Approach for the Classification of Buildings’ Degradation State in a BIM Methodology. Appl. Sci. 2022, 12, 7403.
  40. Mehta, S.; Kukreja, V.; Gupta, A. Exploring the Efficacy of CNN and SVM Models for Automated Damage Severity Classification in Heritage Buildings. In Proceedings of the 2nd International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India, 23–25 August 2023.
  41. Hatir, M.E.; Korkanç, M.; Schachner, A.; Ince, I. The deep learning method applied to the detection and mapping of stone deterioration in open-air sanctuaries of the Hittite period in Anatolia. J. Cult. Herit. 2021, 51, 37–49.
  42. Meklati, S.; Boussora, K.; Abdi, M.E.H.; Berrani, S.A. Surface Damage Identification for Heritage Site Protection: A Mobile Crowd-sensing Solution Based on Deep Learning. J. Comput. Cult. Herit. 2023, 16, 25.
  43. Mishra, M.; Barman, T.; Ramana, G.V. Artificial intelligence-based visual inspection system for structural health monitoring of cultural heritage. J. Civ. Struct. Heal. Monit. 2024, 14, 103–120.
  44. Wang, N.; Zhao, X.; Zhao, P.; Zhang, Y.; Zou, Z.; Ou, J. Automatic damage detection of historic masonry buildings based on mobile deep learning. Autom. Constr. 2019, 103, 53–66.
  45. Liu, Z.; Brigham, R.; Long, E.R.; Wilson, L.; Frost, A.; Orr, S.A.; Grau-Bové, J. Semantic segmentation and photogrammetry of crowdsourced images to monitor historic facades. Herit. Sci. 2022, 10, 27.
  46. Pathak, R.; Saini, A.; Wadhwa, A.; Sharma, H.; Sangwan, D. An object detection approach for detecting damages in heritage sites using 3-D point clouds and 2-D visual data. J. Cult. Herit. 2021, 48, 74–82.
  47. Idjaton, K.; Janvier, R.; Balawi, M.; Desquesnes, X.; Brunetaud, X.; Treuillet, S. Detection of limestone spalling in 3D survey images using deep learning. Autom. Constr. 2023, 152, 104919.
  48. Gong, Y.; Zhang, F.; Jia, X.; Huang, X.; Li, D.; Mao, Z. Deep neural networks for quantitative damage evaluation of building losses using aerial oblique images: Case study on the great wall (China). Remote Sens. 2021, 13, 1321.
  49. Elhariri, E.; El-Bendary, N.; Taie, S.A. Automated Pixel-Level Deep Crack Segmentation on Historical Surfaces Using U-Net Models. Algorithms 2022, 15, 281.
  50. Hacıefendioğlu, K.; Altunışık, A.C.; Abdioğlu, T. Deep Learning-Based Automated Detection of Cracks in Historical Masonry Structures. Buildings 2023, 13, 3113.
  51. Reis, H.C.; Khoshelham, K. ReCRNet: A deep residual network for crack detection in historical buildings. Arab. J. Geosci. 2021, 14, 13.
  52. Bakirman, T.; Kulavuz, B.; Bayram, B. Use of Artificial Intelligence Toward Climate-Neutral Cultural Heritage. Photogramm. Eng. Remote Sens. 2023, 89, 163–171.
  53. Kumar, P.; Ofli, F.; Imran, M.; Castillo, C. Detection of disaster-affected cultural heritage sites from social media images using deep learning techniques. J. Comput. Cult. Herit. 2020, 13, 23.
  54. Lin, D.; Wang, J.; Li, Y. Unsupervised building damage identification using post-event optical imagery and variational autoencoder. IEICE Trans. Inf. Syst. 2021, E104D, 1770–1774.
  55. Lin, Q.; Ci, T.; Wang, L.; Mondal, S.K.; Yin, H.; Wang, Y. Transfer Learning for Improving Seismic Building Damage Assessment. Remote Sens. 2022, 14, 201.
  56. Presa-Reyes, M.; Chen, S.C. Assessing Building Damage by Learning the Deep Feature Correspondence of before and after Aerial Images. In Proceedings of the 3rd International Conference on Multimedia Information Processing and Retrieval (MIPR), Shenzhen, China, 6–8 August 2020.
  57. Presa-Reyes, M.; Chen, S.C. Weakly-Supervised Damaged Building Localization and Assessment with Noise Regularization. In Proceedings of the 4th International Conference on Multimedia Information Processing and Retrieval (MIPR), Tokyo, Japan, 8–10 September 2021.
  58. Wang, Y.; Chew, A.W.Z.; Zhang, L. Building damage detection from satellite images after natural disasters on extremely imbalanced datasets. Autom. Constr. 2022, 140, 104328.
Figure 1. Methodological workflow of the research process.
Figure 2. Number of publications per year during 2019–2024.
Figure 3. Distribution of publications by country.
Figure 4. Percentage of publications by subject area.
Figure 5. Visualization of co-occurrence of author keywords and indexed keywords, with 32 items (4 clusters). VOSviewer.
Table 1. Cluster classification of author keywords and indexed keywords. VOSviewer.

| Clusters | Keywords |
|---|---|
| Cluster 1 | Convolutional Neural Network; Learning algorithms; Buildings; Damage detection; Deterioration; Historic preservation |
| Cluster 2 | Convolutional Neural Network; Building damage; Damage assessment; Disasters; Natural disasters; Remote sensing; Antennas |
| Cluster 3 | Architecture; Learning models; Crack detection; Object detection; Image classification; Image segmentation; Structural health monitoring |
| Cluster 4 | Deep learning; Machine learning; Classification; Learning systems; Cultural heritage; Heritage structures |
Table 2. Review of image-based DL models applied for building damage classification on HBE, with accuracy values obtained.
Metric columns in the source: Accuracy/mPA, Precision/mAP, Recall, F1-Score, IoU/mIoU, all in %. A value carries its metric name where the column assignment is unambiguous.

| Reference | DL Algorithm (Architecture) | Reported values [%] | Damage Classes |
|---|---|---|---|
| BINARY CLASSIFICATION | | | |
| [30] | Mask R-CNN (ResNet101 + FPN + RPN) | 74–80 (@0.5) *; 17–26 (@0.5) * | moist area, biological colonization |
| [31] | CNN (VGG16) | Accuracy/mPA 98.92; Precision/mAP 94.50; Recall 88.70; F1-Score 91.40 | erosion, material loss, color change, natural, sabotage |
| [32] | CNN with Grad-CAM (EfficientNetB0) | Accuracy/mPA 96.50; Precision/mAP 96.73; Recall 96.25; F1-Score 96.49 | cracks, exfoliation, deterioration |
| [33] | CNN (EfficientNetB0) | Accuracy/mPA 99.61; 99.97 | roof tilts inclinations |
| [34] | Mask R-CNN (ResNet101 + FPN + RPN) | 97.50 (@0.5) * | tiles damage |
| [35] | CNN | Accuracy/mPA 97.73 | cracks, spall |
| MULTI-CLASS CLASSIFICATION | | | |
| [31] | CNN (VGG16) | Accuracy/mPA 96.26; Precision/mAP 94.80; Recall 94; F1-Score 76 | erosion, material loss, color change, natural, sabotage |
| [36] | CNN | Accuracy/mPA 99.40; Precision/mAP 96.40–100 | cracks, flaking, contour scaling, crust, efflorescence, biological colonization, erosion, graffiti |
| [37] | Faster R-CNN + RPN (InceptionV2) | 94.60 | crack, loss, detachment, biological colonization |
| [38] | CNN with Grad-CAM (MobileNetV3Small) | 63.46; 69.97; 55.42 | cracks, corroded steel reinforcements, deteriorated concrete, honeycombs, moisture spots, shrinkage cracks |
| [39] | CNN (ResNet18) | 59.06 | crust, detachment, cracks, oxidation, humidity, efflorescence |
| [40] | CNN | 85.60 | damage levels |
| MULTI-LABEL CLASSIFICATION | | | |
| [41] | Mask R-CNN (ResNet101 + FPN + RPN) | 53–100 | biological colonization, contour scaling, crack, higher plant, impact damage, microkarst, missing part |
| [42] | CNN (MobileNetV2) | 86.80; 84 | efflorescence, spall, crack, mold |
| [43] | YOLOv5 | 93.70 (@0.5) *; 91.80 | crack, discoloration, exposed bricks, spalling |
| [44] | Faster R-CNN + RPN (ResNet101) | 95 | efflorescence, spalling |
| [45] | CNN (DeepLabV3+, backbone ResNet101) | Accuracy/mPA 84.40; F1-Score 74.30; IoU/mIoU 66.90 | vegetation |
| [46] | Faster R-CNN + RPN (ResNet101 + FPN) | 58.19 (@0.5) * | cracks, spalling |
| [47] | YOLOv5x | 81 (@0.5) *; 85 | spalling |
| OBJECT SEGMENTATION | | | |
| [48] | edge-enhanced Mask R-CNN (ResNet101) | 93.23 (@0.5) *; IoU/mIoU 84.21 | missing parts |

* Value obtained at IoU = 0.5.
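
The footnote "value obtained at IoU = 0.5" refers to the overlap threshold at which a predicted region is counted as matching an annotated one. As a minimal sketch of that matching step only, assuming axis-aligned bounding boxes in `(x_min, y_min, x_max, y_max)` form, the check can be written as below. The function names and the example boxes are illustrative assumptions, not taken from the reviewed studies, and a full mAP computation additionally averages precision over recall levels and classes.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred_box, truth_box, threshold=0.5):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return box_iou(pred_box, truth_box) >= threshold


# Hypothetical annotated damage region and two hypothetical predictions.
truth = (0, 0, 10, 10)
well_placed = (2, 0, 12, 10)   # shifted slightly to the right
misplaced = (5, 5, 15, 15)     # overlaps only one corner

print(round(box_iou(well_placed, truth), 3), matches_at_threshold(well_placed, truth))
print(round(box_iou(misplaced, truth), 3), matches_at_threshold(misplaced, truth))
```

With these example boxes, the first prediction has an IoU of about 0.667 and counts as a match, while the second has an IoU of about 0.143 and does not. For instance segmentation models the same ratio is typically computed on pixel masks rather than on boxes.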
Table 3. Review of image-based DL models applied for buildings crack detection on HBE, with accuracy values obtained.

| Reference | DL Algorithm (Architecture) | Accuracy/mPA [%] | Precision/mAP [%] | Recall [%] | F1-Score [%] | IoU/mIoU [%] |
|---|---|---|---|---|---|---|
| BINARY CLASSIFICATION | | | | | | |
| [49] | CNN (U2-Net) | 98.32 | 67.73 | 83.03 | | 78.38 |
| [50] | CNN CAM-K-SEG (ResNet50) | 95.44 | 97 | 100 | | 70 |
| [51] | CNN (ReCRNet) | 97.10 | 100 | 93.30 | 96.60 | |
| EDGES EXTRACTION | | | | | | |
| [52] | DexiNed (Xception) | 88.83–90.62 | 47.14–63.39 | 49.58–60.82 | 50.15–61.38 | 33.47–44.28 |
| [52] | RCF (VGG16) | 88.84–90.53 | 47.77–63.27 | 48.39–60.42 | 49.41–61.50 | 32.81–44.41 |
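
The metrics reported in Table 3 can all be derived from pixel-wise counts of true and false positives and negatives. The following is a minimal sketch, assuming binary crack masks flattened into lists of 0/1 values; the function name and the toy masks are illustrative assumptions and do not reproduce any of the reviewed experiments.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise accuracy, precision, recall, F1 and IoU for binary masks (lists of 0/1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(truth),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
    }


# Toy example: 20 pixels, 4 of which belong to a crack.
truth = [1, 1, 1, 1] + [0] * 16
pred = [1, 1, 1, 0, 1] + [0] * 15   # one crack pixel missed, one false alarm

for name, value in segmentation_metrics(pred, truth).items():
    print(f"{name}: {value:.2f}")
```

In this toy case accuracy is 0.90 while IoU is 0.60, because the many correctly classified background pixels raise accuracy but do not enter the IoU. This may help when reading tables in which accuracy is much higher than precision or IoU for thin structures such as cracks.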
Table 4. Review of image-based DL models applied for post-disaster damage assessment at district scale, with accuracy values obtained.
Metric columns in the source: Accuracy/mPA, Precision/mAP, Recall, F1-Score, all in %. A value carries its metric name where the column assignment is unambiguous; values separated by "/" follow the order of the damage levels.

| Reference | DL Algorithm (Architecture) | Reported values [%] | Post-Disaster Damage Levels |
|---|---|---|---|
| BINARY CLASSIFICATION | | | |
| [53] | Logistic Regression (DenseNet121) | Precision/mAP 81/94; Recall 87/92; F1-Score 84/93 | damage/no damage |
| MULTI-CLASS CLASSIFICATION | | | |
| [54] | VAE + GMM (ResNet50) | 78.20–94.30; F1-Score 76.80–94.20 | damage/no damage/other |
| [55] | CNN (VGG-OR) | Accuracy/mPA 74 | no damage/light damage/heavy damage/collapse |
| [56] | CNN (ResNet50) | Precision/mAP 73/87/90/96; Recall 69/79/90/98; F1-Score 71/83/90/97 | no damage/minor damage/major damage/destroyed |
| MULTI-LABEL CLASSIFICATION | | | |
| [57] | CNN (ResNet50) | MSE = 1.1282–1.3893 *; MAE = 0.9266–1.1235 * | no damage/affected/minor damage/major damage/destroyed |
| [58] | CNN with incremental learning | Accuracy/mPA 99.55; F1-Score 99.53 | no damage/minor damage/major damage |

* MSE = Mean Square Error; MAE = Mean Absolute Error.
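
MSE and MAE measure how far predicted damage levels lie from the annotated ones, rather than only whether they coincide. The sketch below shows one way these two errors can be computed, under the assumption that the damage levels are encoded as consecutive integers from 0 (no damage) to 4 (destroyed). That encoding, the helper names, and the example labels are illustrative assumptions and are not stated in the source for [57].

```python
# Assumed ordinal encoding of the damage levels (index = severity).
LEVELS = ["no damage", "affected", "minor damage", "major damage", "destroyed"]


def encode(labels):
    """Map damage level names to their integer severity."""
    return [LEVELS.index(label) for label in labels]


def mean_squared_error(pred, truth):
    """Average of squared differences; large misjudgements weigh more."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def mean_absolute_error(pred, truth):
    """Average of absolute differences, in units of damage levels."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


# Hypothetical annotations and predictions for five buildings.
truth = encode(["no damage", "affected", "minor damage", "major damage", "destroyed"])
pred = encode(["no damage", "minor damage", "minor damage", "affected", "destroyed"])

print("MSE:", mean_squared_error(pred, truth))
print("MAE:", mean_absolute_error(pred, truth))
```

In this example two of the five buildings are misjudged, one by a single level and one by two levels, giving MSE = 1.0 and MAE = 0.6. The two-level error contributes four times as much to the MSE as the one-level error, which is why the MSE is the more sensitive of the two to severe misclassifications.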
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Giannuzzi, V.; Fatiguso, F. Historic Built Environment Assessment and Management by Deep Learning Techniques: A Scoping Review. Appl. Sci. 2024, 14, 7116. https://doi.org/10.3390/app14167116
