
A New Extensible Feature Matching Model for Corrosion Defects Based on Consecutive In-Line Inspections and Data Clustering

Mohamad Shatnawi^1,* and Péter Földesi

Doctoral School of Multidisciplinary Engineering Sciences, Széchenyi István University, 9026 Győr, Hungary

Department of Logistics, Széchenyi István University, 9026 Győr, Hungary

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(6), 2943; https://doi.org/10.3390/app15062943 (registering DOI)

Submission received: 13 February 2025 / Revised: 3 March 2025 / Accepted: 6 March 2025 / Published: 8 March 2025


Featured Application

The proposed framework introduces a new feature matching approach for corroded pipelines based on in-line inspections and data clustering, contributing to the broader field of pipeline integrity management. The effectiveness of this framework suggests potential for application in other domains that benefit from spatial feature matching.

Abstract

Corrosion is considered a leading cause of failure in pipeline systems. Therefore, frequent inspection and monitoring are essential to maintain structural integrity. Feature matching based on in-line inspections (ILIs) aligns corrosion data across inspections, which makes it possible to observe corrosion progression. Nonetheless, uncertainties arising from inspection tools and corrosion processes are present in ILI data and affect the accuracy of feature matching. This study proposes a new extensible feature matching model based on consecutive ILIs and data clustering. The framework dynamically segments the data into spatially localized clusters. This enables the matching of isolated pairs and merging defects and supports more precise localized transformations. Moreover, a new clustering technique, directional epsilon neighborhood clustering (DENC), is proposed. DENC uses spatial graph structures and directional proximity thresholds to address the directional variability in ILI data while effectively identifying outliers. The model is evaluated on six pipeline segments with varying ILI data complexities and achieves a recall of 91.5% and a precision of 98.0%. Compared with models based exclusively on point matching, this work demonstrates significant improvements in accuracy and stability. It also better manages the spatial variability and interactions of adjacent defects. These advancements establish a new framework for automated feature matching and contribute to enhanced pipeline integrity management.

Keywords:

pipeline integrity; corrosion management; directional epsilon neighborhood clustering (DENC); affine transformation; linear optimization; linear programming

1. Introduction

Corrosion is a major global issue, costing an estimated 3% to 4% of the global GDP, approximately USD 2.5 trillion [1]. It is considered a primary cause of failures and operational disruptions in pipeline systems [2,3]. Thus, to ensure pipeline integrity and safe operation, in-line inspection (ILI) is conducted every two to six years using tools known as “intelligent pigs”, which traverse the pipeline and identify corrosion defects utilizing technologies such as ultrasonic testing (UT) and magnetic flux leakage (MFL) [4,5,6]. If not adequately monitored and controlled, corrosion can progress, causing defects to expand and interact, thereby leading to additional material loss and wall thinning. This escalation increases the pipeline’s vulnerability to high-pressure conditions, potentially compromising its structural integrity and elevating the likelihood of leaks or ruptures. Such failures can result in substantial financial losses, alongside severe environmental and safety repercussions. The economic impact extends beyond direct repair costs, affecting operational efficiency and necessitating unplanned downtime, which can disrupt supply chains and regulatory compliance [3,7,8]. Figure 1 shows various internal corrosion defects on fragments of a pipe wall [9], demonstrating examples of material loss and wall thinning.

Regardless of the technology used, ILI tools face inherent limitations due to detection and measurement uncertainties. These tools detect and measure corrosion defects larger than a defined detection threshold, commonly 10% of the wall thickness for MFL-based tools. Additionally, a reporting threshold may be applied so that only defects exceeding it are reported. This reduces the false positives often associated with smaller corrosion defects. Despite these thresholds, detection errors may still occur, even for defects above the detection threshold, because ILI tools cannot guarantee complete accuracy. This is quantified by the probability of detection (POD), which describes the likelihood that a defect exceeding the detection threshold will be detected. Conversely, false call errors can arise when the tool identifies a defect that does not exist, with the probability of false alarm (POA) describing the likelihood of such occurrences. In addition, ILI tools are subject to directional location uncertainties due to limitations in their sensors and coordinate systems. These location uncertainties can be seen as a calibration uncertainty of the defect’s position ($\delta$) [10,11,12,13,14,15].

The raw data captured by ILI tools, which represent corrosion defects in the form of signals, are processed into a feature tally. Corrosion defects are reported in the tally as features with lengths, widths, and depths determined by the signal’s peak in each dimension, while their location is reported in terms of axial position and circumferential (12 h clock orientation) or angular position. Although the raw signals provide a more detailed and accurate representation of corrosion, the feature tally, being independent of the inspection technology, is more accessible for interpretation and manipulation. Therefore, end users are typically presented with corrosion measurements in terms of feature tallies [11,12,13,14].

Feature matching is performed by aligning the tallies from consecutive ILIs, enabling the direct observation of the progression and emergence of corrosion defects [16,17]. Ideally, all defects in a pipeline can be perfectly matched. However, uncertainties in ILI data, such as detection, measurement, and location uncertainties, limit perfect feature matching. Recently, point matching based on affine transformations has shown promising results in addressing the feature matching problem by relying on the spatial characteristics of the ILI data informed by these uncertainties [10,13].

1.1. Point Matching Problem: Identifying Matching Corrosion Features Based on Consecutive ILIs

In feature matching, the task closely resembles point pattern matching, because ILI tools report relatively accurate defect positions. Hence, the problem can be described as matching two sets of points: a moving set $P = \{p_i \in W,\ i = 1, 2, \dots, n\}$ and a reference set $Q = \{q_j \in W,\ j = 1, 2, \dots, m\}$, representing corrosion features from the older and newer ILIs, respectively. Each $p_i := (x_{p_i}, y_{p_i})$ and $q_j := (x_{q_j}, y_{q_j})$ indicates the location of a feature on the pipe. The plane $W = A \times \pi D \subset \mathbb{R}^2$ is defined by the abscissa $A$ and the perimeter $\pi D$, where $D$ is the diameter of the pipeline (internal or external). The problem is then formulated as finding an affine transformation $T$ applied to each $p_i$, as shown in Equation (1). Here, $S$ is the scaling factor, $\theta$ is the rotation angle, and $t_x$ and $t_y$ are the translations along the axial and circumferential directions, respectively [18]. This transformation maps the moving set $P$ to a transformed moving set $P'$, thereby facilitating accurate correspondence with set $Q$, as illustrated in Figure 2.

$$T\begin{pmatrix} x_{p_i} \\ y_{p_i} \end{pmatrix} = \begin{pmatrix} t_x \\ t_y \end{pmatrix} + S \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_{p_i} \\ y_{p_i} \end{pmatrix} \tag{1}$$
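Equation (1) combines a uniform scaling, a rotation and a translation. The following minimal NumPy sketch applies it to all features at once. The function name `affine_transform` is our own, as is the convention of storing features as rows of an (n, 2) array of (axial, circumferential) coordinates. Neither is taken from the cited models.

```python
import numpy as np


def affine_transform(points, s, theta, tx, ty):
    """Apply Equation (1), T(p) = t + S * R(theta) p, to every row of `points`.

    `points` is an (n, 2) array whose columns are the axial (x) and
    circumferential (y) positions of the features.
    """
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    translation = np.array([tx, ty])
    # For row vectors p, (R p)^T = p^T R^T, so the whole set is one matrix product.
    return translation + s * points @ rotation.T
```

With `s=1.0`, `theta=0.0` and zero translations, the function returns its input unchanged, which is the identity transformation.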

Feature matching between ILIs is usually performed manually, which makes it both time-consuming and error-prone. Automating this process is therefore essential to improve accuracy and scalability. For instance, Dann and Dann [13] proposed an automated framework for matching corrosion defects across multiple in-line inspections. This framework simplifies the feature matching problem by decomposing it into a series of manageable matching problems in five steps:

1. Reducing the multiple ILI matching problem to a series of two consecutive ILI matching problems.
2. Splitting the pipeline into segments between consecutive girth welds, with an overlap of 0.3 m with the adjacent joints at both ends, as shown in Figure 3.
3. Mapping feature locations to a two-dimensional plane by unrolling the pipeline axially. The moving set $P$ is also double unrolled to account for features near the unrolling reference point (ideally the 12 o’clock position), as shown in Figure 4. Double unrolling is performed by extending the pipeline halves at the opposite ends, duplicating only the moving set in the process. This results in a new extended moving set $\tilde{P} = \{\tilde{p}_i \in W,\ i = 1, 2, \dots, 2n\}$ that stacks the moving set on top of its double unrolled copy.
4. Matching corrosion features using annealing and the soft-assign mixed point matching approach [19,20,21], derived from the thin plate spline robust point matching (TPS-RPM) method introduced by Chui and Rangarajan [22]. This approach seeks an optimal affine transformation close to the identity transformation. It leverages the accuracy of ILI tools and prevents the transformation from drifting excessively.
5. Transforming the results back to the original feature matching problem.
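A minimal sketch of the double unrolling in step 3 follows. It assumes that circumferential positions are measured from the unrolling reference point in the range $[0, \pi D)$. It also assumes that the lower half of the pipe is copied one perimeter up and the upper half one perimeter down. The function name `double_unroll` is hypothetical, and the exact split used by Dann and Dann [13] may differ.

```python
import numpy as np


def double_unroll(P, diameter):
    """Return the extended moving set P~ (2n rows): P stacked over its shifted copy.

    Assumes y is measured from the unrolling reference point in [0, pi * D).
    Features in the lower half are copied one perimeter up and features in the
    upper half one perimeter down, so features near the seam also appear on the
    far side of it.
    """
    perimeter = np.pi * diameter
    copies = np.array(P, dtype=float)  # np.array copies the input
    lower_half = copies[:, 1] < perimeter / 2
    copies[lower_half, 1] += perimeter
    copies[~lower_half, 1] -= perimeter
    return np.vstack([np.asarray(P, dtype=float), copies])
```

Row $i$ and row $i + n$ of the result refer to the same physical feature $p_i$.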

The proposed point matching approach (step 4) performs best when an identifiable displacement pattern (affine transformation) exists between the features of the two sets. However, its accuracy declines when displacements are highly irregular [10]. This is primarily due to the approach’s strict correspondence criterion, which accepts as matches only features with a correspondence probability $\geq 0.9$. Throughout this study, this model is referred to as the “annealing model”.

To address these limitations, Amaya-Gómez et al. [10] proposed an alternative model using the iterative closest point (ICP) algorithm, initially introduced by Besl and McKay [23]. This model uses Voronoi tessellations to identify mixed nearest neighbors between $\tilde{P}$ (or $P$ in the absence of features near the unrolling reference point) and $Q$, as shown in Figure 5. Mixed nearest neighbors are pairs of features $\tilde{p}_i$ and $q_j$ that meet two conditions. They lie within each other’s Voronoi cells, and they are separated by less than the defect position uncertainty threshold ($\delta$), that is, $d(q_j, \tilde{p}_i) < \delta$. Here, $\delta = \sqrt{\delta_x^2 + \delta_y^2}$, or the vector $(\delta_x, \delta_y)$, where $\delta_x$ and $\delta_y$ denote the axial and circumferential (angular) location uncertainties, respectively.
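A point lies in the Voronoi cell of its nearest site. Two features therefore lie in each other's Voronoi cells exactly when each is the other's nearest neighbor in the opposite set. The sketch below exploits this equivalence. It finds mixed nearest neighbors with k-d tree queries (`scipy.spatial.cKDTree`) instead of building the tessellation explicitly. It uses the scalar form of $\delta$. The function name is hypothetical, and ties between equidistant features are ignored.

```python
import numpy as np
from scipy.spatial import cKDTree


def mixed_nearest_neighbors(P_ext, Q, delta):
    """Return index pairs (i, j) of mutual nearest neighbors with distance < delta.

    p_i lies in the Voronoi cell of q_j (cells generated by Q) exactly when q_j is
    the nearest reference feature to p_i. The converse holds for q_j in the cell of
    p_i (cells generated by P_ext), so no explicit tessellation is needed.
    """
    dist_to_q, nearest_q = cKDTree(Q).query(P_ext)  # nearest q_j for each p_i
    _, nearest_p = cKDTree(P_ext).query(Q)          # nearest p_i for each q_j
    pairs = []
    for i, j in enumerate(nearest_q):
        if nearest_p[j] == i and dist_to_q[i] < delta:
            pairs.append((i, int(j)))
    return pairs
```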

This nearest neighbor selection strategy is then applied iteratively to determine the optimal affine transformation $T'$. In each iteration, a new affine transformation $T$ is estimated from the mixed nearest neighbors, as described by Chang et al. [24] and formulated in Equation (2). This transformation is applied to $\tilde{P}$, and the resulting transformed moving set is used in the next iteration to determine the mixed nearest neighbors. The iterative process converges when the change in the mean square error (MSE) of the matchings between two consecutive iterations falls below a predefined convergence threshold ($\tau$). A value of $\tau = 0.001$ was found to provide a suitable balance between accuracy and convergence.
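The iteration can be sketched as follows, under several assumptions. First, at every step the transformation is re-estimated from the original coordinates of the matched features in $\tilde{P}$, so the returned parameters are the cumulative $T'$. Second, convergence is tested on the change in MSE between consecutive iterations. Third, the least-squares fit uses `numpy.linalg.lstsq` on the linear system in $(t_x, t_y, S\cos\theta, S\sin\theta)$ rather than the closed form of Equation (2); both solve the same least-squares problem. All function names and the iteration cap `max_iter` are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def apply_similarity(params, A):
    """Apply (tx, ty, S cos θ, S sin θ) to the rows of A, as in Equation (1)."""
    tx, ty, a, b = params
    return np.column_stack([tx + a * A[:, 0] - b * A[:, 1],
                            ty + b * A[:, 0] + a * A[:, 1]])


def fit_similarity(A, B):
    """Least-squares (tx, ty, S cos θ, S sin θ) mapping rows of A onto rows of B."""
    x, y = A[:, 0], A[:, 1]
    one, zero = np.ones_like(x), np.zeros_like(x)
    # The model is linear in the four unknowns: one row per x- and y-equation.
    M = np.vstack([np.column_stack([one, zero, x, -y]),
                   np.column_stack([zero, one, y, x])])
    rhs = np.concatenate([B[:, 0], B[:, 1]])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]


def icp_voronoi(P_ext, Q, delta, tau=1e-3, max_iter=50):
    """Iterate mixed-nearest-neighbor selection and similarity fitting."""
    params = np.array([0.0, 0.0, 1.0, 0.0])  # start from the identity
    q_tree = cKDTree(Q)
    prev_mse = np.inf
    for _ in range(max_iter):
        moved = apply_similarity(params, P_ext)
        dist_to_q, nearest_q = q_tree.query(moved)
        _, nearest_p = cKDTree(moved).query(Q)
        # Mixed (mutual) nearest neighbors closer than delta
        idx = [i for i, j in enumerate(nearest_q)
               if nearest_p[j] == i and dist_to_q[i] < delta]
        if len(idx) < 2:  # at least two pairs are needed for a unique fit
            break
        src, dst = P_ext[idx], Q[nearest_q[idx]]
        params = fit_similarity(src, dst)  # cumulative T' from original coordinates
        mse = np.mean(np.sum((apply_similarity(params, src) - dst) ** 2, axis=1))
        if abs(prev_mse - mse) < tau:  # change in MSE below the threshold
            break
        prev_mse = mse
    return params
```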

According to Chang et al. [24], consider $\ell \geq 2$ pairs of potentially matching points, denoted as $a_i := (x_{a_i}, y_{a_i}) \leftrightarrow b_i := (x_{b_i}, y_{b_i})$ for $i = 1, \dots, \ell$. The affine transformation $T$ that minimizes the sum of squared residuals, $\sum_{i=1}^{\ell} \lVert T(a_i) - b_i \rVert^2$, can be expressed as $\bar{r} = (t_x, t_y, S\cos\theta, S\sin\theta)^{\mathsf{T}}$. The optimal least-squares solution for $\bar{r}$ is given by

$$\bar{r} = \frac{1}{\det} \begin{bmatrix} l_A & 0 & -\mu_{X_A} & \mu_{Y_A} \\ 0 & l_A & -\mu_{Y_A} & -\mu_{X_A} \\ -\mu_{X_A} & -\mu_{Y_A} & \ell & 0 \\ \mu_{Y_A} & -\mu_{X_A} & 0 & \ell \end{bmatrix} \begin{bmatrix} \mu_{X_B} \\ \mu_{Y_B} \\ l_{A+B} \\ l_{A-B} \end{bmatrix} \tag{2}$$

where:

$$\mu_{X_A} = \sum_{i=1}^{\ell} x_{a_i},\quad \mu_{Y_A} = \sum_{i=1}^{\ell} y_{a_i},\quad \mu_{X_B} = \sum_{i=1}^{\ell} x_{b_i},\quad \mu_{Y_B} = \sum_{i=1}^{\ell} y_{b_i},$$

$$l_{A+B} = \sum_{i=1}^{\ell} \left(x_{a_i} x_{b_i} + y_{a_i} y_{b_i}\right),\quad l_{A-B} = \sum_{i=1}^{\ell} \left(x_{a_i} y_{b_i} - y_{a_i} x_{b_i}\right),$$

$$l_A = \sum_{i=1}^{\ell} \left(x_{a_i}^2 + y_{a_i}^2\right),\quad \det = \ell\, l_A - \mu_{X_A}^2 - \mu_{Y_A}^2$$
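Equation (2) can be checked numerically. The sketch below evaluates it directly; in the test it recovers a known scaling, rotation and translation. It uses $l_{A-B} = \sum_i (x_{a_i} y_{b_i} - y_{a_i} x_{b_i})$, the sign for which Equation (2) solves the normal equations of the least-squares problem. The function name is our own.

```python
import numpy as np


def chang_similarity(A, B):
    """Closed form of Equation (2): returns r = (tx, ty, S cos θ, S sin θ).

    A and B are (l, 2) arrays of paired points a_i <-> b_i with l >= 2.
    """
    xa, ya, xb, yb = A[:, 0], A[:, 1], B[:, 0], B[:, 1]
    ell = len(A)
    mu_xa, mu_ya = xa.sum(), ya.sum()  # sums, as defined in the text
    mu_xb, mu_yb = xb.sum(), yb.sum()
    l_a = np.sum(xa ** 2 + ya ** 2)
    l_a_plus_b = np.sum(xa * xb + ya * yb)
    l_a_minus_b = np.sum(xa * yb - ya * xb)
    det = ell * l_a - mu_xa ** 2 - mu_ya ** 2
    # det times the inverse of the 4 x 4 normal-equation matrix
    scaled_inverse = np.array([[l_a, 0.0, -mu_xa, mu_ya],
                               [0.0, l_a, -mu_ya, -mu_xa],
                               [-mu_xa, -mu_ya, ell, 0.0],
                               [mu_ya, -mu_xa, 0.0, ell]])
    rhs = np.array([mu_xb, mu_yb, l_a_plus_b, l_a_minus_b])
    return scaled_inverse @ rhs / det
```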

Finally, the correspondence matrix $C = [c_{ij}]$, $i = 1, \dots, 2n$, $j = 1, \dots, m$, is obtained by solving a linear optimization problem. The same problem also yields the outlier vectors of the moving set, $r = (r_1, r_2, \dots, r_{2n})$, and of the reference set, $s = (s_1, s_2, \dots, s_m)^{\mathsf{T}}$. Its objective and constraints are derived from the revised TPS-RPM of Dann and Dann [13] and outlined in Equation (3). With $T'$ obtained from the previous step, the linear optimization problem becomes a linear programming problem with linear constraints whose solutions are limited to binary correspondences and outliers [25]. Here, $c_{ij}$ is a binary correspondence between moving feature $\tilde{p}_i$ and reference feature $q_j$. The binary values $r_i$ and $s_j$ indicate whether moving feature $\tilde{p}_i$ and reference feature $q_j$, respectively, are classified as outliers. The parameter $\alpha$, which controls outliers and mismatches, can be linked to the proportion of outliers in both sets. Higher $\alpha$ values allow the optimization to accept more distant neighbors as matches. Lower values enforce stricter proximity, at the risk of misclassifying actual correspondences as outliers.

$$\begin{aligned} \operatorname*{arg\,min}_{C,\, r,\, s} \quad & \sum_{i=1}^{2n} \sum_{j=1}^{m} c_{ij}\,(w_{ij} - \alpha) \\ \text{subject to} \quad & 0 \leq c_{ij}, r_i, s_j \leq 1, && \forall i = 1, \dots, 2n,\ \forall j = 1, \dots, m \\ & \sum_{j=1}^{m} c_{ij} + r_i = 1, && \forall i = 1, \dots, 2n \\ & \sum_{i=1}^{2n} c_{ij} + s_j = 1, && \forall j = 1, \dots, m \end{aligned} \tag{3}$$

where $w_{ij} = \lVert q_j - T'(\tilde{p}_i) \rVert^2$.
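Equation (3) can be passed directly to a general LP solver. The sketch below uses `scipy.optimize.linprog` with the HiGHS dual simplex method and stacks all $c_{ij}$, $r_i$ and $s_j$ into one variable vector. As in Equation (3), $r$ and $s$ carry zero cost. The equality constraints form a bipartite incidence structure plus identity slack columns, which is totally unimodular. An optimal vertex is therefore binary, and rounding only removes floating-point noise. The function name and variable layout are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def solve_correspondence(w, alpha):
    """Solve Equation (3) for C (2n x m), r (2n) and s (m) given w_ij = ||q_j - T'(p_i)||^2.

    Variable vector: [c_11, ..., c_1m, c_21, ..., c_(2n)m, r_1, ..., r_2n, s_1, ..., s_m].
    """
    n2, m = w.shape
    n_c = n2 * m
    cost = np.concatenate([(w - alpha).ravel(), np.zeros(n2 + m)])  # r, s cost nothing
    a_eq = np.zeros((n2 + m, n_c + n2 + m))
    for i in range(n2):  # sum_j c_ij + r_i = 1
        a_eq[i, i * m:(i + 1) * m] = 1.0
        a_eq[i, n_c + i] = 1.0
    for j in range(m):  # sum_i c_ij + s_j = 1
        a_eq[n2 + j, j:n_c:m] = 1.0
        a_eq[n2 + j, n_c + n2 + j] = 1.0
    res = linprog(cost, A_eq=a_eq, b_eq=np.ones(n2 + m), bounds=(0, 1),
                  method="highs-ds")
    # Dual simplex returns a vertex; with a totally unimodular A_eq it is binary.
    x = np.round(res.x).astype(int)
    return x[:n_c].reshape(n2, m), x[n_c:n_c + n2], x[n_c + n2:]
```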

This model was validated on a 45 km pipeline, achieving a true matching ratio (recall) of 84%. Additionally, the model was compared to the annealing model on 50 synthetic datasets, yielding better results for all datasets. Throughout this research, this model will be referred to as the “Voronoi model”.

1.2. Limitations of Feature Matching That Relies on Affine Transformations

The uncertainties in ILI data are influenced by multiple factors beyond the accuracy of ILI tools. For instance, the defect’s position uncertainty ($\delta$) reflects a physical limitation of ILI tools. However, it does not account for the influence of variable corrosion growth on the defect’s position. Furthermore, point matching using affine transformations seeks the best transformation of the overall point pattern. Consequently, it often fails to address scenarios where adjacent corrosion defects interact and merge. Figure 6 illustrates how these two phenomena challenge feature matching and influence defect positioning across inspections.

The annealing model recommends splitting the pipeline into segments between consecutive girth welds, with a 0.3 m overlap with adjacent segments, as shown in Figure 3. This segmentation simplifies the optimization process and makes it more computationally efficient than optimizing the entire pipeline at once. Additionally, it allows transformations to be computed on subsets of $P$ and $Q$, potentially reducing distortions in feature displacements between the sets and improving accuracy. However, this approach may require additional manual validation, because some features in the overlapping area may be matched in both segments. In contrast, the Voronoi model does not recommend segmentation and instead processes the entire sets at once. This eliminates the need for manual post-processing, but it may lead to less accurate transformations than those computed on smaller pipeline segments. Dynamic segmentation of the pipeline into spatially localized regions could balance the trade-offs of both methodologies. Unlike physical segmentation, this approach avoids issues with defects near the segmentation boundaries (e.g., near girth welds). Furthermore, transformations are computed on smaller, localized problems, yielding more accurate correspondences that better capture variable displacements. In addition, these regions, when appropriately identified, can represent distinct localized behaviors such as merging defects or the presence of outliers.

1.3. Resolution of the Feature Matching Problem Using Clustering

Clustering methods are widely used to address spatial variability in data [26,27], particularly those based on graph structures and dense regions, such as spectral and density-based clustering. For instance, spatial spectral clustering techniques inspired by proximity graphs, such as the epsilon neighborhood graph, define linkages (edges) between data points based on proximity thresholds. The resulting graphs capture the underlying spatial relationships. Similarly, density-based clustering methods, particularly density-based spatial clustering of applications with noise (DBSCAN), use such proximity criteria to form dense regions. They do this by evaluating the number of neighboring points ($\mathit{minPts}$) within a specific distance, determined by the proximity threshold epsilon ($\varepsilon$). This enables DBSCAN to classify data points as core, border, or noise points while effectively identifying cluster boundaries [28]. Both spectral and density-based clustering methods often rely on graph traversal techniques, such as depth-first search (DFS), to analyze spatial relationships efficiently. By traversing graphs, these techniques group spatially continuous data points into subgraphs or clusters [29].

In the context of feature matching, clustering techniques have demonstrated significant potential in resolving complex spatial relationships, often outperforming traditional transformation-based approaches. For instance, Jiang et al. [30] proposed a spatial clustering-based approach called robust feature matching using spatial clustering with heavy outliers (RFM-SCAN) that redefines the feature matching problem in images as a spatial clustering problem with outliers, modifying the DBSCAN algorithm to better handle the unique challenges in applications like remote sensing image registration. Their approach introduces adaptive clustering methods to identify motion-consistent groups while isolating mismatches, effectively managing cases where predefined transformation models, such as affine or homographic transformations, may fail due to complex and unknown geometric relationships. Moreover, Ren et al. [31] applied the RFM-SCAN algorithm to aerial image registration with large view differences, addressing issues of data redundancy and sensitivity to spatial location by introducing dimensionality reduction using principal component analysis (PCA) and motion consistency constraints to recall isolated positive matches. These enhancements significantly improved RFM-SCAN performance, particularly in scenarios with high outlier rates and complex spatial variability.

In certain problems, particularly those involving directional variability, clustering based on isotropic metrics, as in DBSCAN or RFM-SCAN, may not fully capture this variability and can misclassify clusters. Studies of anisotropic clustering methods, such as anisotropic density-based clustering with noise (ADCN) and quick unsupervised anisotropic clustering (QUAC) [32,33], show that traditional isotropic methods often deliver less accurate results for such problems. Anisotropic methods, by accounting for directional variability in the data, better capture coherent and meaningful clusters.

1.4. Novelty and Purpose of the Study

This study proposes a novel extensible feature matching model based on data clustering. It addresses the spatial variability and corrosion interaction complexities present in ILI data, two phenomena that models based exclusively on point matching fail to describe comprehensively. The model’s framework aims to capture these complexities by integrating data clustering and classification with point matching. First, data clustering is used to segment the data into spatially localized clusters. These clusters are then classified based on their density. Subsequently, the feature matching problem is addressed according to the characteristics of each cluster. As illustrated in Figure 7, this approach should enable the following:

1. Isolated correspondence matching: Clustering identifies isolated corresponding defects along the pipeline, which are easily matched due to their strong correspondence and minimal interference from surrounding defects.
2. Merging defect matching: Clustering facilitates the development of tailored matching strategies based on cluster characteristics, such as defect merging, a critical aspect often overlooked or oversimplified by traditional feature matching models.
3. Localized transformations: Clustering segments the data into localized regions with minimized distortion and outliers, enabling more precise affine transformations that capture feature displacements.

In addition, a novel clustering technique, directional epsilon neighborhood clustering (DENC), is developed. DENC employs spatial graph structures and directional proximity thresholds, enabling the following:

4. Aligning with the directional variability in ILI data: Directional thresholds effectively capture axial and circumferential variability, providing a more realistic and accurate representation of spatial relationships.

The remainder of this paper is organized as follows. Section 2 provides a detailed description of the model’s structure, accompanied by a walk-through example of the matching problem presented in Figure 4 through Figure 6. Section 3 highlights the key findings of a detailed performance analysis of the proposed model for a case study consisting of six pipeline segments with varying ILI data complexities. Section 4 discusses the results and outlines potential future work, while Section 5 presents the final conclusions.

2. Research Methodology

The proposed model, as illustrated in Figure 8, addresses the feature matching problem by analyzing two sets of consecutive ILIs: an older inspection (moving set $P$) and a newer inspection (reference set $Q$). To simplify the correspondence and proximity problem, the pipeline is axially unrolled into a two-dimensional plane, where the $x$ and $y$ coordinates denote the abscissa and perimeter, respectively. On this plane, features are located by their axial position along the $x$-axis and their circumferential position along the $y$-axis.

As shown in Figure 8, the model performs feature matching in two consecutive steps. First, the ILI data are clustered to identify spatially localized relationships, as detailed in Section 2.1. Second, the obtained clusters are classified into distinct categories based on their density. The feature matching problem is then addressed using strategies tailored to the characteristics of each category, as detailed in Section 2.2.

2.1. Data Clustering

Solving the feature matching problem using clustering is the core aspect of this study, as it aims to improve accuracy in the presence of uncertainties introduced by corrosion processes and ILI tools. This study proposes a new clustering technique, directional epsilon neighborhood clustering (DENC), specifically designed to address the directional variability in ILI data, as detailed in Section 2.1.1. Additionally, Section 2.1.2 outlines an alternative implementation of the proposed model that uses DBSCAN as the clustering technique. This alternative serves as a benchmark for evaluating DENC. It also provides insights into how different clustering techniques and thresholds influence the feature matching outcomes.

2.1.1. ILI Data Clustering Using DENC

DENC employs proximity thresholds, as in epsilon neighborhood graphs or DBSCAN. However, instead of relying on an isotropic proximity threshold ($\varepsilon$), DENC uses the directional components of $\varepsilon$ as proximity thresholds: $\varepsilon_x$ along the $x$-axis and $\varepsilon_y$ along the $y$-axis. The method constructs a directed graph of the ILI data by labeling features as anchors and nodes. Adjacency boundaries are determined by applying the directional thresholds to the anchors, which identifies the adjacent nodes that fall within these boundaries. This adjacency criterion determines the structure of the graph and the boundaries of the clusters, represented in a binary adjacency matrix ($B$).

The first step in DENC involves dividing the feature tallies into anchors and nodes. Features in the reference set $Q$ are designated as anchors, while features in the moving set $P$ are treated as nodes. Adjacency is established exclusively between anchors and nodes, with no adjacency permitted between anchors themselves or between nodes. This controlled adjacency forms the basis for constructing the adjacency relationships.

It is worth noting that the moving set can be designated as anchors and the reference set as nodes without affecting the outcomes. However, the subsequent steps of the proposed approach must then be adjusted consistently to account for such a swap.

The adjacency relationships between anchors and nodes are established using a directional proximity criterion along the axial ($x$) and circumferential ($y$) directions. As illustrated in Figure 9, this criterion implies that for a node $p_i$ to be considered adjacent to anchor $q_j$, it must fall within the rectangular boundary surrounding $q_j$, defined by the thresholds $\varepsilon_x$ and $\varepsilon_y$.

Here, the directional distance matrix $D = [(d_{ji}^{x}, d_{ji}^{y})]$, $j = 1, \dots, m$, $i = 1, \dots, n$, is established. Each element $(d_{ji}^{x}, d_{ji}^{y})$ of matrix $D$ represents the displacement between anchor $q_j$ and node $p_i$ along the $x$ and $y$ directions, respectively. The displacements are computed as $d_{ji}^{x} = |x_{q_j} - x_{p_i}|$ and $d_{ji}^{y} = |y_{q_j} - y_{p_i}|$. The binary adjacency matrix $B = [B_{ji}]$, $j = 1, \dots, m$, $i = 1, \dots, n$, is then constructed element by element. Each $B_{ji}$ equals 1 when the directional distance $(d_{ji}^{x}, d_{ji}^{y})$ lies within the directional thresholds $\varepsilon_x$ and $\varepsilon_y$, and 0 otherwise:

$$B_{ji} = \begin{cases} 1, & d_{ji}^{x} \leq \varepsilon_x \text{ and } d_{ji}^{y} \leq \varepsilon_y \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

In the context of ILI data, $d_{ji}^{y}$ can be slightly modified to account for the pipeline’s circumferential continuity, as given in Equation (5). Here, $d_{ji}^{y}$ is the minimum circumferential displacement between $q_j$ and $p_i$, measured in the downward and upward directions.

$$d_{ji}^{y} = \min\left(|y_{q_j} - y_{p_i}|,\ \pi D - |y_{q_j} - y_{p_i}|\right) \tag{5}$$

where $D$ is the diameter of the pipeline (internal or external).
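Equations (4) and (5) vectorize naturally. The following sketch builds $B$ for all anchor-node pairs at once with NumPy broadcasting. It assumes coordinates are stored as (axial, circumferential) rows in the same length unit as $D$, $\varepsilon_x$ and $\varepsilon_y$. The function name is hypothetical.

```python
import numpy as np


def denc_adjacency(Q, P, eps_x, eps_y, diameter):
    """Binary adjacency matrix B (m x n) of Equations (4) and (5).

    Rows are anchors q_j (reference set Q), columns are nodes p_i (moving set P);
    both are arrays of (axial, circumferential) positions in the units of D.
    """
    d_x = np.abs(Q[:, None, 0] - P[None, :, 0])        # axial displacements
    raw_y = np.abs(Q[:, None, 1] - P[None, :, 1])
    d_y = np.minimum(raw_y, np.pi * diameter - raw_y)  # Equation (5): wrap-around
    return ((d_x <= eps_x) & (d_y <= eps_y)).astype(int)
```

In the test below, the first anchor and node are 0.96 apart in raw circumferential distance but only 0.04 apart across the seam. Only the wrap-around of Equation (5) makes them adjacent.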

For the example problem, with the features of the moving and reference sets sorted in ascending order of axial position, the binary adjacency matrix $B$ is as follows:

$$B = \left[\begin{array}{ccccccccccccc} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]$$

The binary adjacency matrix $B$ can be interpreted as the adjacency matrix of a graph. In this graph, each anchor $q_j$ and node $p_i$ corresponds to a vertex, and a directed edge exists from vertex $q_j$ to vertex $p_i$ if $B_{ji} = 1$, as illustrated in Figure 10.

Clusters within the graph are defined as isolated subgraphs. Depth-first search (DFS) is used to traverse the entire graph and identify these subgraphs. Vertices that do not belong to any subgraph, such as those without edges, are treated as outliers. The final obtained clusters for the given example problem are shown in Figure 11. Other approaches such as breadth-first search (BFS) can also be used to obtain the same clusters. However, analyzing or comparing these approaches is not in the scope of this work.
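The cluster search can be sketched as an iterative DFS over $B$. Edges are directed from anchors to nodes, but here clusters are treated as weakly connected components: traversal follows edges in both directions. This is what groups, for instance, two anchors that share one node. This interpretation and the function name are our assumptions.

```python
import numpy as np


def denc_clusters(B):
    """Find clusters (connected subgraphs) of the anchor-node graph encoded by B.

    Edges point from anchors q_j (rows) to nodes p_i (columns), but traversal
    follows them in both directions so that, e.g., two anchors sharing one node
    end up in the same cluster. Vertices without edges are returned as outliers.
    """
    m, n = B.shape
    seen_q = np.zeros(m, dtype=bool)
    seen_p = np.zeros(n, dtype=bool)
    clusters = []
    for start in range(m):
        if seen_q[start] or not B[start].any():
            continue
        anchors, nodes = [], []
        stack = [("q", start)]
        seen_q[start] = True
        while stack:  # iterative depth-first search
            kind, k = stack.pop()
            if kind == "q":
                anchors.append(k)
                nxt = [("p", int(i)) for i in np.flatnonzero(B[k]) if not seen_p[i]]
                for _, i in nxt:
                    seen_p[i] = True
            else:
                nodes.append(k)
                nxt = [("q", int(j)) for j in np.flatnonzero(B[:, k]) if not seen_q[j]]
                for _, j in nxt:
                    seen_q[j] = True
            stack.extend(nxt)
        clusters.append((sorted(anchors), sorted(nodes)))
    outlier_q = [j for j in range(m) if not B[j].any()]
    outlier_p = [i for i in range(n) if not B[:, i].any()]
    return clusters, outlier_q, outlier_p
```

Applied to the 9 × 13 example matrix $B$ (rows as anchors, columns as nodes), this sketch yields six clusters. They include $\{q_4, q_5, p_5\}$ and $\{q_6, q_7, p_6, p_7, p_8\}$, and the sketch flags $q_9$, $p_{12}$ and $p_{13}$ as outliers.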

2.1.2. ILI Data Clustering Using DBSCAN

While DENC is specifically designed for the feature matching problem, alternative techniques such as DBSCAN can also be employed to capture spatial variability in ILI data. DBSCAN is integrated into the proposed model slightly differently from DENC. The reason is that DBSCAN provides no explicit support for circumferential continuity or for labeling data as anchors and nodes. Instead, the moving set $P$ is double unrolled, and all features in the extended set $\tilde{P}$ and the reference set $Q$ are stacked in one set. Next, this set is clustered using DBSCAN, as initially introduced by Ester et al. [28]. The clustered data are then mapped back to their original sets, $P$ and $Q$. Finally, a post-processing step filters out as outliers the features of clusters formed exclusively from $P$ or exclusively from $Q$.

When configured appropriately, DBSCAN can exhibit clustering behavior similar to that of DENC, but it relies on an isotropic threshold rather than anisotropic (directional) ones. To capture the direct correspondence of isolated features, such as the two pairs within the first 0.75 m of the example problem shown in Figure 9, the parameter minPts is set to 2. In DBSCAN, this configuration allows a dense region to form whenever a feature has at least one neighboring feature within the proximity threshold ε, while still allowing isolated features to be identified as outliers. The clusters obtained for the example using DBSCAN with minPts = 2 and ε = 0.240 m are identical to those obtained by DENC and illustrated in Figure 11. The DBSCAN implementation in the Python 3.0 package ‘sklearn’ is used in this study.

Methodologically, when minPts = 1, DBSCAN cannot identify outliers, as a dense region can be formed by a single feature (the feature itself). However, the post-processing step yields clusters equivalent to those obtained with minPts = 2, because clusters containing a single feature from either P or Q are also filtered out as outliers.
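The DBSCAN-based procedure, including the remark on minPts, can be sketched with scikit-learn as follows. This is a hedged illustration rather than the code used in the study. It assumes that features are given as (axial, circumferential) coordinates in metres, that double unrolling appends a copy of P shifted by +C in the circumferential direction (consistent with the worked example in Section 2.2.3, where 0.250 m becomes 0.850 m for C = 0.600 m), and that minPts maps to scikit-learn's `min_samples`, which counts the point itself. The function name `dbscan_clusters` is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_clusters(P, Q, C, eps, min_pts=2):
    """Hypothetical sketch of the DBSCAN-based clustering step.

    P, Q: sequences of (axial, circumferential) positions in metres for the
    moving and reference sets; C: circumferential shift used for double
    unrolling (assumed +C copy). Returns sorted (q_indices, p_indices) pairs,
    one per retained cluster.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n = len(P)

    # Double unrolling (assumed form): originals followed by copies shifted by +C.
    P_ext = np.vstack([P, P + np.array([0.0, C])])
    # Stack the extended moving set and the reference set into one set.
    X = np.vstack([P_ext, Q])

    # scikit-learn's min_samples counts the point itself, like minPts.
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)

    clusters = []
    for lab in sorted(set(labels.tolist()) - {-1}):   # -1 marks DBSCAN noise
        members = np.flatnonzero(labels == lab).tolist()
        # Map rows back to the original sets; an unrolled copy k refers to P[k % n].
        p_idx = sorted({k % n for k in members if k < 2 * n})
        q_idx = sorted(k - 2 * n for k in members if k >= 2 * n)
        # Post-processing: drop clusters formed exclusively from P or from Q.
        if p_idx and q_idx:
            clusters.append((q_idx, p_idx))
    return sorted(clusters)
```

For the small hypothetical sets P = [(0.10, 0.20), (0.50, 0.30), (1.00, 0.05)] and Q = [(0.12, 0.21), (0.52, 0.31), (1.00, 0.62), (1.50, 0.40)] with C = 0.600 m and ε = 0.100 m, both `min_pts=2` and `min_pts=1` return the same three one-to-one clusters after post-processing; the third cluster is formed through the unrolled copy of the third moving feature.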

2.2. Cluster Classification and Feature Matching

Regardless of the clustering technique employed, the proposed model classifies clusters into four distinct categories: (1) one-to-one, (2) one-to-many, (3) many-to-one, and (4) many-to-many. For a given cluster X, let X_{Q} and X_{P} denote the subsets of features from Q and P within the cluster, respectively; the classification is based on the number of features in X_{Q} and X_{P}, as shown in Figure 12. Category 1 includes the direct correspondence of isolated pairs within their own clusters. Categories 2 and 3 encompass clusters of merging defects and newly detected defects, respectively. The matches and outliers within these clusters are identified using distance-based filtering strategies. Finally, Category 4 represents general clusters in which point matching, such as the Voronoi model, is employed to identify matches and outliers.
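The classification rule reduces to a comparison of the two subset sizes. The following Python sketch assumes that every retained cluster contains at least one feature from each set (clusters formed exclusively from one set having already been discarded as outliers); the names `Category` and `classify_cluster` are hypothetical.

```python
from enum import Enum


class Category(Enum):
    ONE_TO_ONE = 1    # isolated pair: direct match
    ONE_TO_MANY = 2   # merging defects: distance-based filtering
    MANY_TO_ONE = 3   # newly detected defects: closest-feature rule
    MANY_TO_MANY = 4  # general case: point matching (e.g., Voronoi model)


def classify_cluster(n_q, n_p):
    """Classify a cluster from |X_Q| (n_q) and |X_P| (n_p)."""
    if n_q < 1 or n_p < 1:
        raise ValueError("a cluster must contain features from both sets")
    if n_q == 1 and n_p == 1:
        return Category.ONE_TO_ONE
    if n_q == 1:
        return Category.ONE_TO_MANY
    if n_p == 1:
        return Category.MANY_TO_ONE
    return Category.MANY_TO_MANY
```

For example, `classify_cluster(1, 3)` returns `Category.ONE_TO_MANY`, the merging-defect case.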

To account for features positioned near the unrolling reference point, double unrolling of X_{P}, as proposed by Dann and Dann [13] and shown in Figure 4, is performed before any further analysis of clusters in Categories 2 and 3, resulting in a new extended set X_{\tilde{P}} = \{X_{\tilde{p}_{k}} \in W, k = 1, 2, \dots, 2 n_{X}\}, where n_{X} is the size of the moving set in cluster X. For clusters in Category 4, double unrolling is performed according to the requirements of the point matching method used. For example, the annealing model always requires double unrolling, while the Voronoi model requires it only when features are present near the unrolling reference point.
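A minimal sketch of double unrolling follows. It assumes, as in the worked example of Section 2.2.3, that the extended set consists of the original moving features followed by copies shifted by +C in the circumferential direction. The helper names `double_unroll` and `min_distance_to_extended` are hypothetical; the second helper shows why unrolling matters for features on opposite sides of the unrolling reference point.

```python
import math


def double_unroll(points, C):
    """Return the 2n extended set: the original points, then copies shifted by +C.

    points: (axial, circumferential) positions in metres; C: shift (assumed
    to be the unrolled circumference).
    """
    originals = [(x, y) for x, y in points]
    copies = [(x, y + C) for x, y in points]
    return originals + copies


def min_distance_to_extended(q, points, C):
    """Smallest Euclidean distance from q to any point of the extended set."""
    return min(math.dist(q, p) for p in double_unroll(points, C))
```

For instance, with C = 0.600 m, a reference feature at (2.00, 0.58) m and a moving feature at (2.00, 0.01) m are about 0.57 m apart on the unrolled plane, but only about 0.03 m apart once the moving feature's copy at (2.00, 0.61) m is considered.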

2.2.1. Category 1: One-to-One

This category represents the simplest form of feature matching, where a cluster contains exactly one feature in set X_{Q} and one feature in set X_{P}. These two features are considered a direct match: because they are isolated within their own localized region, their correspondence is accepted without any further processing.

2.2.2. Category 2: One-to-Many

This category addresses merging defects, where a cluster contains a single feature in set X_{Q} (X_{q_{0}}) and multiple features in set X_{P}. Each feature X_{\tilde{p}_{k}}, k = 1, 2, \dots, 2 n_{X}, in the extended set X_{\tilde{P}} is considered a match if its distance to X_{q_{0}} is within the merging distance threshold λ, that is, if d(X_{q_{0}}, X_{\tilde{p}_{k}}) \leq λ. Features for which neither copy meets this criterion are classified as outliers.
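The Category 2 rule can be sketched as follows, assuming Euclidean distance on the unrolled plane and the +C double unrolling used in the example of Section 2.2.3. A moving feature is treated as matched if either its original position or its unrolled copy lies within λ of X_{q_0}. The function name `match_one_to_many` is hypothetical.

```python
import math


def match_one_to_many(q0, P_cluster, C, lam):
    """Category 2 sketch: split the moving features of a one-to-many cluster.

    q0: the single reference feature (axial, circumferential) in metres.
    P_cluster: moving features X_P of the cluster; C: unrolling shift; lam: λ.
    Returns (matched, outliers) as lists of indices into P_cluster.
    """
    n = len(P_cluster)
    # Extended set X_P~: originals (k < n) followed by copies shifted by +C.
    extended = [(x, y) for x, y in P_cluster] + [(x, y + C) for x, y in P_cluster]
    matched = set()
    for k, p in enumerate(extended):
        if math.dist(q0, p) <= lam:  # d(X_q0, X_p~k) <= λ
            matched.add(k % n)       # copies k and k + n are the same feature
    outliers = [i for i in range(n) if i not in matched]
    return sorted(matched), outliers
```

For example, with X_{q_0} = (5.00, 0.30) m, λ = 0.250 m, C = 0.600 m, and hypothetical moving features at (4.90, 0.28), (5.12, 0.35), and (5.60, 0.10) m, the first two are matched as merging defects and the third is an outlier.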

2.2.3. Category 3: Many-to-One

This category applies to clusters containing multiple features in set X_{Q} and a single feature in set X_{P} (X_{p_{0}}), representing a single match accompanied by additional newly detected defects. The feature in X_{Q} closest to either X_{\tilde{p}_{0}} or X_{\tilde{p}_{1}} is identified as the match, that is, X_{q^{*}} = \arg\min_{X_{q} \in X_{Q}} \min(d(X_{q}, X_{\tilde{p}_{0}}), d(X_{q}, X_{\tilde{p}_{1}})). All other features in X_{Q} are classified as outliers (newly detected defects).

Note that X_{\tilde{p}_{0}} and X_{\tilde{p}_{1}} are feature X_{p_{0}} before and after double unrolling, respectively. For example, if C = 0.600 m and X_{p_{0}} ≔ (1.120 m, 0.250 m), then X_{\tilde{p}_{0}} = X_{p_{0}} and X_{\tilde{p}_{1}} ≔ (1.120 m, 0.850 m); that is, the circumferential coordinate of the copy is shifted by C.
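Similarly, the Category 3 rule can be sketched as follows, again assuming Euclidean distance and the +C unrolling of the example above. The function name `match_many_to_one` is hypothetical.

```python
import math


def match_many_to_one(Q_cluster, p0, C):
    """Category 3 sketch: pick the reference feature closest to p0 or its copy.

    Q_cluster: reference features X_Q of the cluster; p0: the single moving
    feature X_p0; C: unrolling shift. Returns (match_index, outlier_indices)
    into Q_cluster; outliers are treated as newly detected defects.
    """
    p_tilde = [p0, (p0[0], p0[1] + C)]  # X_p~0 and X_p~1
    best = min(range(len(Q_cluster)),
               key=lambda j: min(math.dist(Q_cluster[j], p) for p in p_tilde))
    outliers = [j for j in range(len(Q_cluster)) if j != best]
    return best, outliers
```

Using X_{p_0} = (1.120, 0.250) m and C = 0.600 m from the example, with hypothetical reference features at (1.00, 0.55), (1.13, 0.84), and (1.30, 0.20) m, the second reference feature is selected because it lies about 0.014 m from the unrolled copy (1.120, 0.850) m; the other two are treated as newly detected defects.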

2.2.4. Category 4: Many-to-Many

This category addresses scenarios where a cluster contains multiple features in set

X_{Q}

and multiple features in set

X_{P}

, presenting a point matching problem. While the proposed model does not enforce a specific point or feature matching technique, it primarily uses the Voronoi model proposed by Amaya-Gómez et al. [10]. As detailed in Section 1.1, this model operates in two optimization steps. First, ICP optimization using Voronoi tessellation is employed to refine affine transformations. Then, linear optimization is employed to determine the correspondence matrix and outlier vectors. Voronoi diagrams are implemented in this study using the Python 3.0 package ‘scipy’. Linear optimization problems are solved using the Python 3.0 package ‘cvxpy’ with the GLPK solver, chosen for its accuracy and flexibility in handling complex problems [34].

2.3. Complete Solution for the Given Example Problem

Once all clusters have been analyzed, the outcome produced by the model for the example problem is shown in Figure 13. As illustrated, the model correctly identifies all matches and outliers, addressing the spatial and interaction challenges outlined in Section 1.2.

When the same example problem is solved using the Voronoi model alone, recall declines. Because the Voronoi model is designed to capture one-to-one correspondences, it fails to capture merging defects and identifies only seven of the ten actual matches, as shown in Figure 14.

3. Experiment and Analysis

This study examines six internally corroded API 5L X52 shallow offshore water injection pipeline segments, each approximately 12 m in length, with an outer diameter of 219 mm and a wall thickness of 12.7 mm. The feature tally of the six segments was obtained from two consecutive ILIs conducted four years apart using a UT intelligent pig [35]. The inspection setup, illustrated in Figure 15, depicts the direction of both inspections, from the manned wellhead platform to the unmanned wellhead platform. The length of the inspected pipeline is 1228 m. Inspection runs are completed in approximately 3.5 h at an average velocity of 352 m/h.

Table 1 summarizes, for each of the six pipeline segments, the number of features in sets P and Q, the observed matches and new features, and the complexity of the ILI data. Complexity is categorized as low, medium, or high based on the density and distribution of defects along the segment and the severity of interactions between adjacent defects. The two-dimensional (length and width) representation of the features in sets P and Q for each pipeline segment is shown in Figure 16.

This section presents a detailed analysis of the proposed model, focusing on sensitivity evaluations and comparisons. It is structured as follows. First, Section 3.1 presents a sensitivity analysis of the Voronoi model’s parameters to assess their influence on overall performance and establish a baseline for comparison. Section 3.2 then presents the sensitivity analysis of the proposed model’s parameters. The two models are compared in Section 3.3, focusing on their segment-level performance across varying ILI data complexities. Finally, Section 3.4 presents the sensitivity analysis of the parameters of the DBSCAN-based alternative model. Performance is evaluated using three metrics, recall, precision, and F1 score, defined as follows [36]:

Recall assesses the model’s ability to identify actual matches and new features: it is the proportion of true positives among all actual positives (true positives and false negatives). High recall indicates that the model finds most of the actual matches and new features.
Precision assesses the reliability of the model’s positive predictions: it is the proportion of correctly identified matches and new features (true positives) among all predicted matches and new features (true positives and false positives). High precision indicates fewer false predictions.
F1 score is the harmonic mean of precision and recall, providing a balanced measure that is particularly useful when both metrics need to be equally emphasized. It is defined as follows:

F_{1} = 2 \frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Precision} + \mathrm{Recall}}    (6)
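The three metrics can be computed from counts of true positives (TP), false positives (FP), and false negatives (FN) as in the following Python sketch; the function names are hypothetical. As a consistency check, a recall of 91.5% and a precision of 98.0% give an F1 score of about 94.6%, matching the value reported for the proposed model in Section 3.2.3.

```python
def recall(tp, fn):
    """Share of actual matches/new features that the model recovers."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predicted matches/new features that are correct."""
    return tp / (tp + fp)


def f1_score(r, p):
    """Equation (6): harmonic mean of recall r and precision p."""
    return 2 * r * p / (p + r)


print(round(f1_score(0.915, 0.980), 3))  # prints 0.946
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two metrics, which is why it is used in Section 3 to arbitrate between configurations that maximize recall and those that maximize precision.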

3.1. Sensitivity Analysis of the Voronoi Model’s Parameters

According to the tool specifications, the axial uncertainty is δ_{x} = 0.1 m and the circumferential uncertainty is δ_{y} \geq 10°. The combined position uncertainty is δ = \sqrt{δ_{x}^{2} + δ_{y}^{2}}, where δ_{x} and δ_{y} must be expressed in the same unit; therefore, the angular uncertainty is converted to an arc length, δ_{y} \geq π D / 36 (i.e., 10° out of the 360° of the circumference πD). Hence, conservatively, δ = 0.110 m.
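The arithmetic behind the conservative choice of δ can be reproduced as follows. This sketch assumes that D is the outer diameter given in Section 3 (219 mm) and that the angular uncertainty is converted to an arc length on that surface; under these assumptions it yields δ_y ≈ 0.019 m and δ ≈ 0.102 m, consistent with the conservative value of 0.110 m adopted in the study.

```python
import math

D = 0.219         # outer diameter in metres (Section 3); assumed to be the D in πD/36
delta_x = 0.1     # axial uncertainty in metres (tool specification)
angle_deg = 10.0  # circumferential uncertainty in degrees (lower bound)

# Convert the angle to an arc length: (10 / 360) * π * D = π * D / 36.
delta_y = angle_deg / 360.0 * math.pi * D
# Combined position uncertainty δ = sqrt(δx² + δy²).
delta = math.hypot(delta_x, delta_y)

print(f"delta_y = {delta_y:.4f} m, delta = {delta:.4f} m")
# prints: delta_y = 0.0191 m, delta = 0.1018 m
```

The circumferential term contributes less than 2% to δ here, so the result is dominated by the axial uncertainty for this pipe size.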

3.1.1. Sensitivity Analysis of the Outlier Proportion Parameter

The sensitivity of the Voronoi model to outlier proportion parameter

α

is evaluated using

δ = 0.110

m and

τ = 0.001

. As shown in Figure 17, the analysis indicates that the Voronoi model achieves its highest recall of 84.4% when

α

values range between 0.067 and 0.092, with a median of 0.080. Beyond this range, model performance steadily declines, reaching a recall of 70.7% when α = 1.0.

3.1.2. Sensitivity Analysis of the Defect’s Position Uncertainty Threshold

The impact of the defect’s position uncertainty threshold

δ

on the Voronoi model is analyzed using

α = 0.080

and

τ = 0.001

. As shown in Figure 18, the highest recall of 84.4% is observed for

δ

values of 0.020, 0.060, 0.070, 0.080, 0.100, and 0.110 m, consistent with the estimate derived from the tool specifications.

When

δ = 0.0

m, the model achieves a recall of 81.5%. Under this configuration, no mixed nearest neighbors are identified, so the model bypasses ICP optimization and relies solely on the linear program in Equation (2), with T^{'}(\tilde{p}_{i}) = \tilde{p}_{i} (the identity transformation). These findings suggest that the linear optimization step alone can provide moderately accurate results without prior transformations, making it suitable for scenarios where preprocessing is impractical or unnecessary.

3.2. Sensitivity Analysis of the Proposed Model’s Parameters

3.2.1. Sensitivity Analysis of the Directional Proximity Thresholds

The sensitivity of the proposed model to DENC’s directional proximity thresholds ε_{x} and ε_{y} is evaluated using a uniformly scaled combination of these thresholds (ε_{x} = 2ε_{y} in all reported configurations), with λ = 0.250 m, δ = 0.110 m, α = 0.080 as estimated in Section 3.1.1, and τ = 0.001. Figure 19 shows that the model achieves recall exceeding 85% when ε_{x} is between 0.140 and 0.490 m and ε_{y} is between 0.070 and 0.245 m. The highest recall of 90.7% is achieved when ε_{x} is between 0.220 and 0.250 m and ε_{y} is between 0.110 and 0.125 m, while the highest precision of 97.2% is obtained when ε_{x} = 0.220 m and ε_{y} = 0.110 m.

When ε_{x} and ε_{y} are small, the model performs poorly, as DENC classifies many actual matches as outliers. Once ε_{x} reaches 0.080 m and ε_{y} reaches 0.040 m, the model achieves a recall above 70%. The model maintains a recall of at least 81.1% for ε_{x} \geq 0.110 m and ε_{y} \geq 0.055 m, and it stabilizes at 84.4%, the same recall achieved by the Voronoi model alone (Section 3.1), when ε_{x} \geq 0.770 m and ε_{y} \geq 0.385 m.

3.2.2. Sensitivity Analysis of the Outlier Proportion Parameter

The impact of the outlier proportion parameter

α

on the proposed model is analyzed using ε_{x} = 0.220 m, ε_{y} = 0.110 m, λ = 0.250 m, δ = 0.110 m, and τ = 0.001

. As shown in Figure 20, the model achieves a recall of 80.7% when

α = 0.0

. Recall increases rapidly, reaching 90.0% at

α = 0.014

, and peaks at 91.5% for

α

values between 0.023 and 0.062, with a median of 0.043. Beyond this range, recall slightly declines until

α = 0.430

, where it stabilizes at 90.4%.

3.2.3. Sensitivity Analysis of the Merging Distance Threshold

The sensitivity of the proposed model to the merging distance threshold λ is evaluated using ε_{x} = 0.220 m, ε_{y} = 0.110 m, δ = 0.110 m, α = 0.043, and τ = 0.001. As shown in Figure 21, the model achieves its highest recall of 91.5%, with a precision of 98.0%, when λ \geq 0.220 m. The highest precision of 99.1% is observed for all λ \leq 0.120 m, with recall ranging between 84.1% and 87.4%. Because recall and precision peak at different values of λ, the F1 score provides a more balanced evaluation; its highest value of 94.6% is observed when λ \geq 0.220 m.

3.3. Segment-Level Analysis of the Proposed Model and the Voronoi Model

The segment-level comparison of the proposed model and the Voronoi model across the six pipeline segments, summarized in Table 2, indicates that the proposed model delivers substantial improvements over the Voronoi model in segments S1 and S5, which have high ILI data complexity, where defect density, distribution, and interaction challenge the feature matching process. For segments S3 and S6, with moderate complexity, the proposed model delivers marginal improvements, while for the low-complexity segments (S2 and S4), both models achieve 100% on all metrics.

3.4. Sensitivity Analysis of the Proposed DBSCAN-Based Alternative Model’s Parameters

3.4.1. Sensitivity Analysis of DBSCAN’s Proximity Threshold

The sensitivity of the proposed DBSCAN-based alternative model to the proximity threshold

ε

is evaluated using λ = 0.250 m, δ = 0.110 m, α = 0.080, and τ = 0.001

, aligning with the configurations used in the proposed model’s sensitivity analysis in Section 3.2. As shown in Figure 22, the model achieves the highest recall of 88.9% when

ε = 0.250

m. Although this configuration does not yield the highest precision, it results in the highest F1 score of 92.1%.

When

ε

is small, the model delivers low performance, as DBSCAN classifies many actual matches as outliers. Once

ε \geq 0.081

m, the model achieves a recall above 70%. Recall continues to improve until ε reaches its best-performing value of 0.250 m. Beyond this value, performance fluctuates slightly until ε reaches 0.740 m, where the model stabilizes at a recall of 84.4%.

3.4.2. Sensitivity Analysis of the Outlier Proportion Parameter

The impact of the outlier proportion parameter

α

on the proposed DBSCAN-based alternative model is analyzed using ε = 0.250 m, λ = 0.250 m, δ = 0.110 m, and τ = 0.001. As shown in Figure 23, the model achieves a recall of 63.0% when α = 0.0. Recall increases rapidly to 81.1% at α = 0.001, and the highest value of 89.3% is observed for α in the ranges 0.025 to 0.026, 0.033 to 0.035, and 0.051 to 0.063, with a median of 0.055. Beyond these ranges, performance declines slightly, stabilizing at 87.4% from α = 0.430.

3.4.3. Sensitivity Analysis of the Merging Distance Threshold

The sensitivity of the proposed DBSCAN-based alternative model to the merging distance threshold

λ

is evaluated using ε = 0.250 m, δ = 0.110 m, α = 0.055, and τ = 0.001. As shown in Figure 24, the model achieves its highest recall of 89.6% when λ \geq 0.350 m. Although this configuration does not yield the highest precision, it results in the highest F1 score of 92.7%.

4. Discussion

The analysis results detailed in Section 3 strongly suggest that the proposed model, including its DBSCAN-based alternative, provides an effective framework for matching features based on consecutive ILIs, addressing challenges in spatial variability and interactions between adjacent defects. This section discusses the key findings of the analysis in light of the study’s purpose. First, Section 4.1 discusses the improvements in feature matching brought about by clustering, highlighting the key differences in model performance when using DENC (an anisotropic clustering technique) and DBSCAN (an isotropic one). Section 4.2 then discusses the effectiveness and extensibility of the proposed cluster classification process. Next, Section 4.3 discusses the sensitivity of the model’s performance to its parameters, including the DBSCAN-based alternative. Sections 4.4 and 4.5 address the model’s applicability to external defects and its runtime performance, respectively. Finally, Section 4.6 highlights considerations for detection limitations and pipeline materials.

4.1. Improvements Brought to Feature Matching Using Clustering and DENC

The analysis in Sections 3.2, 3.3, and 3.4 highlights the improvement in feature matching accuracy achieved through clustering. Both the proposed model and its DBSCAN-based alternative outperform the Voronoi model, with relative improvements in recall of 8.4% and 6.2% and in precision of 2.7% and 0.6%, respectively. By dynamically segmenting the feature matching problem into spatially localized clusters, the proposed model demonstrates clear advantages over models based exclusively on point matching, such as the Voronoi model.
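The percentages above are relative improvements with respect to the Voronoi baseline, not percentage-point differences. For recall, they can be reproduced from the values reported in this study (91.5% for the proposed model, 89.6% for the DBSCAN-based alternative, and 84.4% for the Voronoi model) with the short sketch below; the helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(new, baseline):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


voronoi_recall = 84.4  # recall of the Voronoi model (%)
for name, value in [("DENC-based", 91.5), ("DBSCAN-based", 89.6)]:
    print(f"{name}: {relative_improvement(value, voronoi_recall):.1f}%")
# prints:
# DENC-based: 8.4%
# DBSCAN-based: 6.2%
```

In percentage points, the corresponding recall gains are 7.1 and 5.2, respectively.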

DENC’s use of directional thresholds ensures that clusters reflect the directional variability in ILI data. DBSCAN, with its isotropic threshold, still captures isolated features and merging defects and improves local transformations, but it lacks the directional precision offered by DENC. Figure 25 illustrates feature clustering for segment S5 using both DBSCAN and DENC. As shown, DENC is more effective in capturing spatially localized clusters because of its directional proximity thresholds.

4.2. Effectiveness of the Proposed Classification of Clusters

The classification of clusters into four categories (one-to-one, one-to-many, many-to-one, and many-to-many), based on the number of features they contain from each set, provides a structured framework for managing various cluster configurations. These categories cover a range of scenarios, from direct correspondence to merging and outlier patterns. However, these predefined categories may not fully encompass all possible spatial or geometric patterns, such as highly irregular clusters or features with specific axial or circumferential alignments (e.g., single-path colonies [37]). Extending the classification to include additional scenarios or subcategories could further enhance the model’s ability to accommodate unique feature configurations.

4.3. Influence of the Proposed Model’s Parameters on Balancing Performance and Clustering Efficiency

4.3.1. Proximity Thresholds

DENC and DBSCAN exhibit similar patterns in how their proximity thresholds influence the model’s performance. Smaller thresholds result in tighter clusters, enabling the capture of fine-grained correspondences in densely packed ILI sets. However, overly small thresholds in either approach may lead to actual matches being classified as outliers, reducing recall and lowering overall performance. Conversely, larger thresholds allow for clustering features with greater spatial variability. This relaxes the clustering criteria, improving recall by including features within broader spatial boundaries. However, excessive threshold values risk reducing cluster diversity and often result in most clusters being categorized as point matching problems (Category 4).

4.3.2. Outlier Proportion Parameter

The outlier proportion parameter

α

balances the accuracy of the model’s linear optimization against the proportion of outliers and mismatches present in the two sets. Lower

α

values ensure that matches are identified only for features that correspond with high accuracy, making this approach particularly effective for problems with minimal displacements. However, for problems involving larger displacements or complex interactions, lower

α

values may lead to missed matches, resulting in both false positives and false negatives, as defects are classified as newly detected rather than matched to their actual correspondents. Conversely, higher

α

values cause the linear optimization to favor farther neighbors, as the optimal solution then shifts toward these neighbors at the expense of the true, closer correspondents, which likewise increases false positives and false negatives and reduces overall performance.

Dann and Dann [13] proposed a feasible range of

α \in [0, 0.1]

for the annealing model. This range is consistent with the behavior of the Voronoi model, whose performance declines significantly beyond these values, as detailed in Section 3.1.1. On the other hand, α has a notably weaker influence on the proposed model, as detailed in Sections 3.2.2 and 3.4.2. Although performance does decline beyond this range, the model consistently maintains a recall of at least 90.4% with DENC and 87.4% with the DBSCAN-based alternative. This is primarily because many features are matched either directly (Category 1) or through distance-based filtering (Categories 2 and 3), which reduces the problem size, distortions, and outliers, thereby enhancing transformation precision and stabilizing the correspondence optimization against changes in, or extended ranges of, α.

4.3.3. Merging Distance Threshold

The merging distance threshold

λ

governs the model’s ability to address defect merging (Category 2). Smaller

λ

values effectively minimize false positives by restricting matches to closely spaced features. However, this strict matching criterion may fail to capture actual matches in the case of significant interactions between adjacent defects, leading to reduced recall. Conversely, larger

λ

values enhance recall for these scenarios, but they may introduce false matches in segments with fewer interactions. These false positives become particularly problematic when the clustering setup is unsuitable, causing inappropriately formed clusters to be classified as defect merging problems (Category 2). In such cases, the merging distance threshold may inadvertently associate unrelated features, undermining the model’s precision.

4.4. Applicability of the Proposed Model to Match Internal and External Features

The proposed model is primarily designed for high-density internal corrosion, where features are often in close proximity. In contrast, external corrosion typically occurs in isolated locations, resulting in reduced complexity and less manual effort for feature matching. Nevertheless, the proposed model is well-suited for both internal and external feature matching. Like the annealing and Voronoi models, the proposed model also addresses internal and external feature matching independently as two separate problems. Mixing internal and external features during data clustering is not methodologically appropriate, as it could result in mixed clusters and/or matching problems, ultimately compromising the model’s feasibility.

4.5. Insights into the Proposed Model’s Runtime Performance

The proposed model, including its DBSCAN-based alternative, demonstrated strong computational performance, completing feature matching for the six pipeline segments in an average of 0.22 s, approximately 170% faster than the Voronoi model. This efficiency is achieved by reducing the number of features requiring computationally intensive affine transformation and linear optimization: many features are identified as outliers through clustering, matched directly, or processed using fast distance-based filtering. These measurements were averaged over 100 runs using the optimal configurations obtained from the analysis presented in Section 3. The experiments were conducted on a personal computer running Windows 10 with 16 GB of RAM and an Intel i7 CPU. By comparison, other methods, such as the annealing model, require significantly longer runtimes of 3.1 s per segment [13]. This runtime advantage underscores the model’s suitability for large-scale applications, where both speed and accuracy are critical.

While machine differences undoubtedly impact these metrics, variations can also arise from the model’s configurations. For instance, higher proximity thresholds reduce cluster diversity, leading to more point matching problems and consequently longer runtimes. Additionally, runtime is significantly influenced by the choice of clustering and point matching techniques.

4.6. Considerations for Detection Limitations and Pipeline Materials

The proposed model operates on the data reported by in-line inspection (ILI) tools. If corrosion defects are covered by rust, biofilms, or scale, these surface layers may reduce inspection accuracy by interfering with the tool’s ability to detect, locate, and measure defects. Additionally, if a defect is too small to be detected by the tool, it is inherently excluded from the analysis. Once a defect is detected and reported, both variants of the proposed model (DENC-based and DBSCAN-based) process it as a discrete feature represented as a point on a planar surface, emphasizing its spatial position rather than its physical dimensions. Consequently, the model’s effectiveness depends not on defect size but on the consistency of defect locations across inspections. However, since the model’s framework is extensible to other clustering and feature matching techniques, its dependence on defect size and/or location may vary.

Moreover, different pipeline materials exhibit varying responses to corrosive environments. While these responses indeed affect corrosion type, density, distribution, and severity, the model is designed with a parametric and extensible structure, potentially allowing it to accommodate these complexities. The process governing clustering, classification, and feature matching can be adjusted to account for material-dependent corrosion behaviors. However, it is important to acknowledge that the case study presented in this research is based on an API 5L X52 steel pipeline. While the model is adaptable, its performance and behavior may vary across different materials and corrosive environments.

5. Conclusions and Future Work

In-line inspection (ILI) data are subject to uncertainties due to the limitations of inspection tools and the complex behavior of corrosion. These uncertainties challenge feature matching and negatively influence its accuracy. This study proposed an extensible feature matching model based on consecutive ILIs and data clustering to effectively capture spatial variability and interactions between adjacent defects. Matching strategies, such as point matching and distance-based filtering, are then applied following a classification approach to solve the matching problem independently within each cluster. This framework successfully demonstrated its ability to address key challenges in feature matching. For instance, isolated corresponding features and merging defects were efficiently identified. Localized transformations within clusters minimized the influence of distortions and outliers, enabling more precise feature alignment. Additionally, a new clustering technique, DENC, was developed to capture directional variability and identify outliers in ILI data by utilizing spatial graph structures and directional proximity thresholds, further enhancing the model’s performance.

The proposed model, including its DBSCAN-based alternative, was evaluated on six internally corroded pipeline segments with varying ILI data complexities. The model achieved a recall of 91.5%, outperforming the DBSCAN-based alternative, which achieved 89.6%, reflecting DENC’s ability to align with the directional variability in ILI data. Nevertheless, both alternative models demonstrated noticeable performance and stability improvements over exclusively point matching models such as the Voronoi model by Amaya-Gómez et al. [10], which achieved a recall of 84.4%.

Future research could explore adaptive parameter tuning strategies to optimize the model for varying conditions along the pipeline. Parameters such as ε_{x}, ε_{y}, α, and λ could be dynamically adjusted based on feature density, position variability, alignment patterns, or interaction complexities across different pipeline sections. Furthermore, future research could address the limitations of spatial clustering techniques such as DENC and DBSCAN, which cannot guarantee an accurate representation of all defect merging configurations: because these methods rely on spatial proximity, merged defects may be misclassified if they lie close to unrelated defects.

Author Contributions

Conceptualization, M.S.; methodology, M.S.; software, M.S.; validation, M.S. and P.F.; formal analysis, M.S.; investigation, M.S.; resources, M.S. and P.F.; data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, M.S. and P.F.; visualization, M.S.; supervision, P.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ILI	In-line inspection
DENC	Directional epsilon neighborhood clustering
UT	Ultrasonic testing
MFL	Magnetic flux leakage
POD	Probability of detection
POA	Probability of false alarm
TPS-RPM	Thin plate spline robust point matching
ICP	Iterative closest point
MSE	Mean square error
DBSCAN	Density-based spatial clustering of applications with noise
DFS	Depth-first search
BFS	Breadth-first search
RFM-SCAN	Robust feature matching using spatial clustering with heavy outliers
PCA	Principal component analysis
ADCN	Anisotropic density-based clustering with noise
QUAC	Quick unsupervised anisotropic clustering

References

1. Iannuzzi, M.; Frankel, G. The carbon footprint of steel corrosion. NPJ Mater. Degrad. 2022, 6, 101.
2. Al-Sabaeei, A.M.; Alhussian, H.; Abdulkadir, S.J.; Jagadeesh, A. Prediction of oil and gas pipeline failures through machine learning approaches: A systematic review. Energy Rep. 2023, 10, 1313–1338.
3. DNV GL. DNV-RP-F116 Integrity Management of Submarine Pipeline Systems; DNV GL: Høvik, Norway, 2021.
4. May, Z.; Alam, M.K.; Nayan, N.A. Recent advances in nondestructive method and assessment of corrosion undercoating in carbon-steel pipelines. Sensors 2022, 22, 6654.
5. Chen, P.; Li, R.; Jia, G.; Lan, H.; Fu, K.; Liu, X. A decade review of the art of inspection and monitoring technologies for long-distance oil and gas pipelines in permafrost areas. Energies 2023, 16, 1751.
6. Parlak, B.; Yavasoglu, H. A comprehensive analysis of in-line inspection tools and technologies for steel oil and gas pipelines. Sustainability 2023, 15, 2783.
7. DNV GL. DNV-RP-F101 Corroded Pipelines; Amended 2021; DNV GL: Høvik, Norway, 2019.
8. ASME. B31G: Manual for Determining the Remaining Strength of Corroded Pipelines; ASME: New York, NY, USA, 2023.
9. Beben, D.; Steliga, T. Monitoring and preventing failures of transmission pipelines at oil and natural gas plants. Energies 2023, 16, 6640.
10. Amaya-Gómez, R.; Schoefs, F.; Sánchez-Silva, M.; Muñoz, F.; Bastidas-Arteaga, E. Matching of corroded defects in onshore pipelines based on in-line inspections and Voronoi partitions. Reliab. Eng. Syst. Saf. 2022, 223, 108520.
11. API. API Standard 1163: In-Line Inspection Systems Qualification Standard; American Petroleum Institute: Washington, DC, USA, 2021.
12. POF. Specifications and Requirements for In-Line Inspection of Pipelines; Pipeline Operators Forum: Amsterdam, The Netherlands, 2021.
13. Dann, M.; Dann, C. Automated matching of pipeline corrosion features from in-line inspection data. Reliab. Eng. Syst. Saf. 2017, 162, 40–50.
14. NACE Standard. SP0102-2017 Standard Recommended Practice In-Line Inspection of Pipelines; NACE International: Houston, TX, USA, 2017.
15. Pakrashi, V.; Schoefs, F.; Memet, J.B.; O’Connor, A. ROC dependent event isolation method for image processing based assessment of corroded harbour structures. Struct. Infrastruct. Eng. 2008, 6, 365–378.
16. Liu, H.; Liu, Z.; Taylor, B.; Dong, H. Matching pipeline in-line inspection data for corrosion characterization. NDT E Int. 2019, 101, 44–52.
17. Ismail, M.F.; May, Z.; Asirvadam, V.S.; Nayan, N.A. Machine-learning-based classification for pipeline corrosion with Monte Carlo probabilistic analysis. Energies 2023, 16, 3589.
18. Van Wamelen, P.B.; Li, Z.; Iyengar, S.S. A fast expected time algorithm for the 2-D point pattern matching problem. Pattern Recognit. 2003, 37, 1699–1711.
19. Chui, H.; Rambo, J.; Duncan, J.; Schultz, R.; Rangarajan, A. Registration of cortical anatomical structures via robust 3D point matching. In Proceedings of the Information Processing in Medical Imaging (IPMI), Visegrad, Hungary, 28 June–2 July 1999; pp. 114–121.
20. Gold, S.; Rangarajan, A.; Lu, C.P.; Pappu, S.; Mjolsness, E. New algorithms for 2-D and 3-D point matching: Pose estimation and correspondence. Pattern Recognit. 1998, 31, 1019–1031.
21. Rangarajan, A.; Chui, H.; Bookstein, F.L. The softassign Procrustes matching algorithm. In Proceedings of the Information Processing in Medical Imaging (IPMI), Poultney, VT, USA, 9–13 June 1997; pp. 29–42.
22. Chui, H.; Rangarajan, A. A new point matching algorithm for non-rigid registration. Comput. Vis. Image Underst. 2003, 89, 114–141.
23. Besl, P.; McKay, N. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256.
24. Chang, S.; Cheng, F.; Hsu, W.; Wu, G. Fast algorithm for point pattern matching: Invariant to translations, rotations and scale changes. Pattern Recognit. 1997, 30, 311–320.
25. Yang, J. The thin plate spline robust point matching (TPS-RPM) algorithm: A revisit. Pattern Recognit. Lett. 2011, 32, 910–918.
26. Von Luxburg, U. A tutorial on spectral clustering. arXiv 2007, arXiv:0711.0189.
27. Bhattacharjee, P.; Mitra, P. A survey of density based clustering algorithms. Front. Comput. Sci. 2021, 15, 151308.
28. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
29. Tarjan, R. Depth-first search and linear graph algorithms. In Proceedings of the 12th Annual Symposium on Switching and Automata Theory (SWAT 1971), East Lansing, MI, USA, 13–15 October 1971; pp. 114–121.
30. Jiang, X.; Ma, J.; Jiang, J.; Guo, X. Robust feature matching using spatial clustering with heavy outliers. IEEE Trans. Image Process. 2020, 29, 736–746.
31. Ren, K.; Ye, Y.; Gu, G.; Chen, Q. Feature matching based on spatial clustering for aerial image registration with large view differences. Optik 2022, 259, 169033.
32. Mai, G.; Janowicz, K.; Hu, Y.; Gao, S. ADCN: An anisotropic density-based clustering algorithm for discovering spatial point patterns with noise. Trans. GIS 2018, 22, 348–369.
33. Hanwell, D.; Mirmehdi, M. QUAC: Quick unsupervised anisotropic clustering. Pattern Recognit. 2014, 47, 427–440.
34. Dai, W.; Zhang, J.; Sun, X. On solving multi-commodity flow problems: An experimental evaluation. Chin. J. Aeronaut. 2017, 30, 1481–1492.
35. Rosen Group. Tethered Ultrasonic In-line Inspection Solution. Available online: https://contenthub.rosen-group.com/api/public/content/02c9d6b9fa6b4f219f14f7b38bc23f0f (accessed on 29 December 2024).
36. Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061.
37. Benjamin, A.C.; Freire, J.L.F.; Vieira, R.D.; Cunha, D.J.S. Interaction of corrosion defects in pipelines, Part 1: Fundamentals. Int. J. Press. Vessel. Pip. 2016, 144, 56–62.

Figure 1. Fragments of an internally corroded pipe, illustrating metal loss and wall thinning caused by corrosion. These images are sourced from the research conducted by Beben and Steliga [9].

Figure 2. Illustration of affine transformation on a two-dimensional plane, demonstrating how translation, scaling, and rotation facilitate correspondence between moving and reference sets.
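Figure 2 shows the transformation only pictorially. As a minimal sketch, assuming two-dimensional feature coordinates stored as rows of a NumPy array, the code below applies rotation, uniform scaling, and translation (a similarity transform, i.e., a subset of the affine family without shear) to a moving set. It then scores the fit against a reference set with a mean square error, here assumed to be the mean squared Euclidean distance between corresponding points. The function names are hypothetical, and this is not the registration procedure used in the article.

```python
import numpy as np


def similarity_transform(points, scale, theta, translation):
    """Map each moving-set point x to scale * R(theta) @ x + translation.

    Rotation, uniform scaling and translation form a subset of the affine
    transformations illustrated in Figure 2 (shear is omitted here).
    """
    pts = np.asarray(points, dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    # Points are stored as rows, so R is applied through its transpose.
    return scale * pts @ rotation.T + np.asarray(translation, dtype=float)


def mean_square_error(moved, reference):
    """Mean of squared Euclidean distances between corresponding points (assumed definition)."""
    diff = np.asarray(moved, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.sum(diff**2, axis=1)))


moving = np.array([[1.0, 0.0], [0.0, 1.0]])
reference = np.array([[1.0, 3.0], [-1.0, 1.0]])
moved = similarity_transform(moving, scale=2.0, theta=np.pi / 2, translation=[1.0, 1.0])
print(moved.round(6), mean_square_error(moved, reference))
```

In practice, the harder step is estimating the scale, angle, and translation from uncertain correspondences (for example, with an ICP-style iteration [23]), which this sketch does not attempt.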

Figure 3. Pipeline segmentation as proposed by Dann and Dann [13].

Figure 4. Pipeline unrolling and double unrolling of the moving set as proposed by Dann and Dann [13] (example problem).

Figure 5. Identification of mixed nearest neighbors using Voronoi tessellations as proposed by Amaya-Gómez et al. [10], with δ = 0.11 m (example problem).

Figure 6. Two-dimensional representation of features (length and width), illustrating how the interaction between adjacent defects and variable corrosion growth challenge feature matching and influence defect positioning across inspections (example problem).

Figure 7. Illustration of the matching problems the proposed framework aims to solve, demonstrating how clustering should facilitate isolated correspondence matching, merging defect matching, and localized transformation problems across ILIs.

Figure 8. Illustration of the proposed model’s workflow and extensibility, highlighting its parameters, data clustering using DENC and DBSCAN, cluster classification into categories, distance-based filtering, point matching using the Voronoi model, and the process of identifying matching and outlier features.

Figure 9. Establishing adjacency relationships in DENC based on boundaries defined by the directional proximity thresholds ε_x = 0.300 m and ε_y = 0.150 m (example problem).

Figure 10. Graphical representation of the directed edges and binary adjacency matrix in DENC using ε_x = 0.300 m and ε_y = 0.150 m (example problem).

Figure 11. Clusters (represented by distinct colors) and outliers obtained using DENC with ε_x = 0.300 m and ε_y = 0.150 m (example problem).
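Figures 9 to 11 describe DENC only at caption level: directional thresholds define adjacency, adjacency is stored as a binary matrix, connected groups become clusters, and unconnected features become outliers. The sketch below is a hypothetical approximation under stated assumptions, not the authors' algorithm. It assumes that the proximity boundary is an axis-aligned ε_x by ε_y rectangle (the article's boundary shape may differ), treats edges as symmetric even though the article describes directed edges, finds clusters as connected components with breadth-first search (BFS), and labels any feature without a neighbor as an outlier. All function names are illustrative.

```python
from collections import deque

import numpy as np


def directional_adjacency(points: np.ndarray, eps_x: float, eps_y: float) -> np.ndarray:
    """Binary adjacency matrix: 1 where two features lie within eps_x along x
    AND within eps_y along y (assumed rectangular boundary)."""
    dx = np.abs(points[:, None, 0] - points[None, :, 0])
    dy = np.abs(points[:, None, 1] - points[None, :, 1])
    adj = (dx <= eps_x) & (dy <= eps_y)
    np.fill_diagonal(adj, False)  # a feature is not its own neighbor
    return adj.astype(np.uint8)


def connected_clusters(adj: np.ndarray) -> list[list[int]]:
    """Group feature indices into connected components using BFS."""
    n = adj.shape[0]
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in np.flatnonzero(adj[i]):
                if not seen[j]:
                    seen[j] = True
                    queue.append(int(j))
        components.append(sorted(component))
    return components


def denc_like(points, eps_x: float, eps_y: float):
    """Return (clusters, outliers); singleton components are treated as outliers."""
    adj = directional_adjacency(np.asarray(points, dtype=float), eps_x, eps_y)
    components = connected_clusters(adj)
    clusters = [c for c in components if len(c) > 1]
    outliers = [c[0] for c in components if len(c) == 1]
    return clusters, outliers


# Hypothetical (x, y) feature positions in metres.
points = [(0.0, 0.0), (0.2, 0.1), (0.4, 0.05), (2.0, 0.0), (2.1, 0.5)]
clusters, outliers = denc_like(points, eps_x=0.300, eps_y=0.150)
print(clusters, outliers)
```

With the thresholds from the example problem, the first three features chain into one cluster even though the first and third are farther apart than ε_x, while the fifth feature stays separate from the fourth because it is close along x but 0.5 m away along y. The rectangular window treats the two directions differently, which a single radius such as DBSCAN's ε does not.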

Figure 12. Outlier and cluster classification, illustrating the four density-based categories: (1) one-to-one, (2) one-to-many, (3) many-to-one, and (4) many-to-many (example problem).
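Figure 12 names four categories, but the caption does not state the counting rule. As a hypothetical reading, assume that a cluster's category follows from how many of its features come from each inspection (labelled "Q" and "P", as in Table 1) and that a cluster with features from only one inspection is an outlier. Under those assumptions, a classifier could look like the sketch below. Which inspection counts as "one" in "one-to-many" is an assumed convention.

```python
from collections import Counter


def classify_cluster(sources: list[str]) -> str:
    """Classify a cluster by how many features it holds from each inspection.

    `sources` lists the inspection label ("Q" or "P") of every feature in the
    cluster. Reading "one-to-many" as one Q feature and several P features is
    an assumed convention, not taken from the article.
    """
    counts = Counter(sources)
    n_q, n_p = counts["Q"], counts["P"]  # Counter returns 0 for missing labels
    if n_q == 0 or n_p == 0:
        return "outlier"  # no counterpart in the other inspection
    if n_q == 1 and n_p == 1:
        return "one-to-one"
    if n_q == 1:
        return "one-to-many"
    if n_p == 1:
        return "many-to-one"
    return "many-to-many"


print(classify_cluster(["Q", "P", "P"]))
```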

Figure 13. Feature matching results obtained by the proposed model using ε_x = 0.300 m, ε_y = 0.150 m, λ = 0.250 m, δ = 0.110 m, α = 0.010, and τ = 0.001 (example problem).

Figure 14. Feature matching results obtained by the Voronoi model [10] using δ = 0.110 m, α = 0.020, and τ = 0.001 (example problem).

Figure 15. Illustration of the pipeline inspection setup from the manned wellhead platform to the unmanned wellhead platform.

Figure 16. Two-dimensional representation (length and width) of all features across the six pipeline segments, S1 to S6 (case study).

Figure 17. Sensitivity of the Voronoi model [10] to the outlier proportion parameter α, using δ = 0.110 m and τ = 0.001 (case study).

Figure 18. Sensitivity of the Voronoi model [10] to the defect’s position uncertainty threshold δ, using α = 0.080 and τ = 0.001 (case study).

Figure 19. Sensitivity of the proposed model to DENC’s directional proximity thresholds ε_x and ε_y, using λ = 0.250 m, δ = 0.110 m, α = 0.080, and τ = 0.001 (case study).

Figure 20. Sensitivity of the proposed model to the outlier proportion parameter α, using ε_x = 0.220 m, ε_y = 0.110 m, λ = 0.250 m, δ = 0.110 m, and τ = 0.001 (case study).

Figure 21. Sensitivity of the proposed model to the merging distance threshold λ, using ε_x = 0.220 m, ε_y = 0.110 m, δ = 0.110 m, α = 0.043, and τ = 0.001 (case study).

Figure 22. Sensitivity of the proposed DBSCAN-based alternative model to the proximity threshold ε, using λ = 0.250 m, δ = 0.110 m, α = 0.080, and τ = 0.001 (case study).

Figure 23. Sensitivity of the proposed DBSCAN-based alternative model to the outlier proportion parameter α, using ε = 0.250 m, λ = 0.250 m, δ = 0.110 m, and τ = 0.001 (case study).

Figure 24. Sensitivity of the proposed DBSCAN-based alternative model to the merging distance threshold λ, using ε = 0.250 m, δ = 0.110 m, α = 0.055, and τ = 0.001 (case study).

Figure 25. Feature clustering (represented by distinct colors) using DBSCAN (top) with ε = 0.250 m and DENC (bottom) with ε_x = 0.220 m and ε_y = 0.110 m (case study, segment S5).

Table 1. Number of features, observed matches, new features, and complexity of ILI data across the six pipeline segments (case study).

Segment	Features Q	Features P	Observed Matches	New Features	Complexity
S1	45	62	54	7	High
S2	38	18	18	20	Low
S3	45	33	33	14	Moderate
S4	29	15	15	14	Low
S5	31	39	39	4	High
S6	51	57	52	0	Moderate
Total	239	224	211	59	-

Table 2. Segment-level performance comparison between the proposed model and the Voronoi model [10] using ε_x = 0.220 m, ε_y = 0.110 m, λ = 0.220 m, δ = 0.110 m, α = 0.043 (proposed model), α = 0.080 (Voronoi model), and τ = 0.001 (case study).

Segment	Complexity	Proposed Model Recall %	Proposed Model Precision %	Proposed Model F1 Score %	Voronoi Model (Point Matching) Recall %	Voronoi Model (Point Matching) Precision %	Voronoi Model (Point Matching) F1 Score %
S1	High	77.0	92.2	83.9	60.7	82.2	69.8
S2	Low	100.0	100.0	100.0	100.0	100.0	100.0
S3	Moderate	95.7	100.0	97.8	91.5	95.6	93.5
S4	Low	100.0	100.0	100.0	100.0	100.0	100.0
S5	High	86.0	97.4	91.4	69.8	96.8	81.1
S6	Moderate	98.1	100.0	99.0	98.1	100.0	99.0
Overall		91.5	98.0	94.6	84.4	95.4	89.6
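The F1 score columns in Table 2 are the harmonic mean of precision P and recall R, F1 = 2PR/(P + R). The sketch below recomputes F1 from the rounded recall and precision values in the table for selected rows. The recomputed values agree with the reported ones to within 0.1 percentage point. Small gaps, such as segment S5 for the proposed model (91.3 recomputed from the rounded inputs versus 91.4 reported), are consistent with the reported F1 having been computed from unrounded recall and precision. The dictionary layout is an illustrative choice.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision, both given in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %, reported F1 %) taken from Table 2
table2 = {
    ("S1", "proposed"): (77.0, 92.2, 83.9),
    ("S1", "Voronoi"): (60.7, 82.2, 69.8),
    ("S3", "proposed"): (95.7, 100.0, 97.8),
    ("S3", "Voronoi"): (91.5, 95.6, 93.5),
    ("S5", "proposed"): (86.0, 97.4, 91.4),
    ("S5", "Voronoi"): (69.8, 96.8, 81.1),
    ("Overall", "proposed"): (91.5, 98.0, 94.6),
    ("Overall", "Voronoi"): (84.4, 95.4, 89.6),
}

for (segment, model), (recall, precision, reported) in table2.items():
    computed = f1_score(recall, precision)
    print(f"{segment:8} {model:9} computed {computed:5.1f}  reported {reported:5.1f}")
```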

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shatnawi, M.; Földesi, P. A New Extensible Feature Matching Model for Corrosion Defects Based on Consecutive In-Line Inspections and Data Clustering. Appl. Sci. 2025, 15, 2943. https://doi.org/10.3390/app15062943

AMA Style

Shatnawi M, Földesi P. A New Extensible Feature Matching Model for Corrosion Defects Based on Consecutive In-Line Inspections and Data Clustering. Applied Sciences. 2025; 15(6):2943. https://doi.org/10.3390/app15062943

Chicago/Turabian Style

Shatnawi, Mohamad, and Péter Földesi. 2025. "A New Extensible Feature Matching Model for Corrosion Defects Based on Consecutive In-Line Inspections and Data Clustering" Applied Sciences 15, no. 6: 2943. https://doi.org/10.3390/app15062943

APA Style

Shatnawi, M., & Földesi, P. (2025). A New Extensible Feature Matching Model for Corrosion Defects Based on Consecutive In-Line Inspections and Data Clustering. Applied Sciences, 15(6), 2943. https://doi.org/10.3390/app15062943

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
