U-Shape Network For Chip Surface Defect Detection

The document presents a novel U-shape network architecture for chip surface defect detection, addressing challenges in distinguishing small defects from complex backgrounds. It highlights the limitations of existing deep learning methods in industrial defect detection and proposes enhancements to improve segmentation accuracy through multi-scale feature fusion and attention mechanisms. The research aims to provide a viable solution for effective defect detection in semiconductor capacitor components, which are critical for electronic device reliability.


2024 IEEE International Conference on e-Business Engineering (ICEBE)

U-shape network for chip surface defect detection


Rui Ruan, School of Software and Engineering, South China University of Technology, GuangZhou, China, 331382125@qq.com
Wei Zou*, Guangzhou Top-Leading Intelligent Co.,Ltd, GuangZhou, China, zou.w@top-leading.com
Zhenwei Li, Guangzhou Top-Leading Intelligent Co.,Ltd, GuangZhou, China, li.zw@top-leading.com
Zhuorui Qin, School of Software and Engineering, South China University of Technology, GuangZhou, China, 202121046957@mail.scut.edu.cn
Qingyao Wu, School of Software and Engineering, South China University of Technology, GuangZhou, China, qyw@scut.edu.cn
Shenghui Huang*, School of Software and Engineering, South China University of Technology, GuangZhou, China, huangshenghui68@163.com

2024 IEEE International Conference on e-Business Engineering (ICEBE) | 979-8-3503-6585-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICEBE62490.2024.00053

Abstract—Currently, deep learning methods and models have achieved significant progress in image segmentation and have been successfully applied in various practical applications. However, common methods and models are usually only suitable for situations where the segmentation target occupies a large portion of the input image and has a very obvious difference with the background. Additionally, they require high-quality datasets with balanced categories and large quantities. In the field of industrial defect detection, defects often occupy a small area of the image and are difficult to distinguish from the background. The edges of defects are often blurry, and the differences in shape and color between defect categories are relatively small. Moreover, in industrial settings, acquiring a large amount of defect data and labeling it is very expensive. Therefore, the available training data in practical applications is often small and imbalanced. These characteristics make it difficult to train common image detection models on industrial defect datasets. To address these issues, we propose a novel network architecture based on the Unet [25] segmentation model. This architecture effectively analyzes and utilizes image features extracted by commonly used backbone networks. As a result, the model can detect defects that closely resemble the background and occupy small regions. We also introduce a practical method for training segmentation models on low-quality datasets. Compared to existing segmentation models commonly used in image segmentation, our proposed architecture demonstrates better performance across various evaluation metrics. This research aims to provide a viable solution for industrial defect detection.

Index Terms—Image Segmentation, Defect Detection, Camouflaged Object Detection, Small Object Detection

I. INTRODUCTION

With the development and expansion of China’s industry, the demand for electronic components in various sectors has also been increasing. As one of the most commonly used electronic components, capacitors are now being produced on a large and mass scale by major manufacturers to meet the growing market demand. However, due to the complex manufacturing process of capacitor components, their quality directly affects the performance and reliability of electronic devices. Therefore, the appearance inspection, electrical performance testing, and sorting and packaging of capacitors after production are critical steps to ensure product quality. Accurate and efficient detection of semiconductor capacitor components has become an important market demand.

In recent years, deep learning has been shown to be robust to background, illumination, color, shape, size, and intensity, which is particularly ideal for detecting complex surface defects in industrial environments. As surface defects can take many forms, the surface to be detected may have multiple types of these defects simultaneously. To address the defect detection, several methods have emerged. Some supervised learning methods use datasets including DAGM2007, road crack datasets [1], railway datasets [2], fabric datasets [3], silicon steel strip datasets [4], and railway defect datasets [5]. The work of supervised methods mainly focuses on how to obtain efficient feature representations in images, and many network modules have been proposed for this purpose. For example, [6] uses an autoencoder network to learn the representations of these local anomalies and find common features between different defects. Another method [7] proposes a multi-modal Gaussian pyramid scheme using patches, combined with convolutional denoising autoencoder (CDAE) networks at each level, for defect detection on textured surfaces. It learns the pattern distribution in the reference image. Method [8] detects insulators in traction power systems connected to electrified railways through segmentation methods. In addition to designing new network modules with CNN [36], the use of attention mechanisms has also been proven useful in many works. For example, [9] adds attention mechanisms in FPN [26] to

enhance the obtained feature representation. In addition to adding small modules to traditional detection model structures to enhance feature extraction capabilities, some works focus on learning the differences between defect and normal images to obtain the ability to distinguish defects. For example, [10] combines autoencoders and GAN [37] networks to form a special image restoration network. It inputs images with defects and outputs restored images without defects. Then, it separately calculates the LBP feature values of the restored image and the defect image, and finds the difference between the two.

However, the task addressed in this paper involves a chip defect detection dataset that differs significantly from common segmentation datasets. First, there is considerable variability between defect categories: some categories can occupy nearly half of the input image’s pixels, while others occupy only a small portion. The positions of the defects are not fixed and do not consistently appear in the center of the image. This characteristic contrasts with standard datasets, where deep learning networks can simply and quickly locate the recognition area in the image’s center. This variability complicates the learning process for small defect categories. Moreover, the differences between small and large defect targets are substantial, but their colors are very similar to the background, making them difficult to distinguish without specialized knowledge.

Although the methods above have their strengths, none of them can be used for chip defect segmentation, because they lack the ability to distinguish between defects. Our contributions can be summarized as follows:
• A chip defect detection method combining camouflaged target detection techniques is proposed. By further integrating semantic and spatial information, and employing both classification and segmentation losses, the model is compelled to be more suitable for surface defect detection than a conventional multi-class segmentation model.
• Extensive experiments conducted on the proposed dataset validate the merits of our approach against common multi-class image segmentation and defect detection methods.

II. RELATED WORK

A. Image segmentation

In 2015, Long et al. from the University of California, Berkeley, proposed the Fully Convolution Network (FCN) [11], which performs pixel-level classification on input images to solve the problem of semantic segmentation, producing a prediction for each pixel while retaining the spatial information of the original input image. Pixel-wise classification is then performed on the upsampled feature map, with a softmax loss computed for each pixel, treating each pixel as a training sample.

U-Net [12], proposed in 2015, was designed to solve medical image segmentation problems. It features a U-shaped network structure that captures both contextual and spatial information. Overall, U-Net has an encoder-decoder structure. The entire structure utilizes multi-scale features from the encoder module and gradually restores the image resolution through the decoder module, predicting pixel classifications.

B. Camouflaged Object Segmentation (COS)

Camouflaged Object Segmentation (COS) has a long and rich research history in many fields such as biology and art [20], and it has been significantly influenced by two notable studies [13], [21]. Early works related to camouflage focus on distinguishing the foreground from the background based on handcrafted low-level features such as texture [19], 3D convexity [17], and motion [16]. These methods are effective in a few simple cases but often fail in complex scenes. Recently, Le et al. [15] proposed an end-to-end network for camouflaged object segmentation by integrating classification information into pixel-level segmentation. Yan et al. [18] further introduced an adversarial attack to enhance segmentation accuracy. Fan et al. [14] developed a simple yet effective framework, termed SINet, and constructed the current largest COS dataset, COD10K, to advance COS in the deep learning era.

PFNet [32] possesses both precise recognition capabilities for camouflaged targets and fast inference abilities. It proposes two novel modules to enhance the segmentation model’s ability to verify the difference between background and target. For the input image I, feature extraction is first performed using the backbone, resulting in four intermediate feature outputs $\{f_k\}_{k=1}^{4}$. For the feature with the richest semantic information, $f_4$, a channel attention mechanism is applied first, followed by a spatial attention mechanism to obtain the enhanced feature $f_4'$. The detailed process for this operation is as follows: the input feature $F$ consists of three dimensions $C$, $H$, $W$, where $C$ is the number of channels, and $H$, $W$ are the height and width, respectively. First, the shape of $F$ is adjusted to obtain three feature matrices $Q$, $K$, $V$, each with a shape of $C \times (H \times W)$. Then, matrices $Q$ and $K$ are multiplied, and the channel attention matrix $X$ with a shape of $C \times C$ is obtained through softmax.

$$x_{i,j} = \frac{\exp(Q_i K_j)}{\sum_{j=1}^{C} \exp(Q_i K_j)} \tag{1}$$

$x_{i,j}$ represents the influence of the j-th channel on the i-th channel. Then, $X$ is multiplied by $V$ to obtain the channel attention-enhanced feature $F'$.

$$F'_i = \sum_{j=1}^{C} \left( x_{i,j} V_j \right) + F_i \tag{2}$$

Thus, the semantic information that is useful for identifying defect targets is selected.

Camouflaged objects typically share a similar appearance with the background, leading to false positive and false negative predictions in the initial segmentation results. The Focus Module (FM) is designed to identify and eliminate these erroneous predictions. Humans, after careful analysis, can distinguish distracting areas through contextual reasoning

Fig. 1. The overall architecture of our model extracts features from each layer of the input image through the backbone. As the features progress from higher
layers to lower layers, the semantic information gradually strengthens while the spatial information gradually weakens. These features are then input into the
proposed MS fusion module to enhance the semantic and spatial information of each layer. The enhanced features are subsequently fed into the PFNet [32]
decoder, which outputs the final result.

by comparing patterns, such as texture and semantics, of ambiguous regions with confident regions to make the final decision. This paper proposes using the focus module in PFNet [32] to mimic this behavior to identify chip defects that are highly similar to the background.

First, for the next layer's output feature $F_h$, a convolution operation reduces the number of channels from 512 to 1 to obtain a rough estimate $P_{i+1}$ of the defect target’s position. Then, in the focus module (FM), the current layer feature $F_c$ is multiplied by the inverted map $1 - P_{i+1}$ of this rough estimate from the next layer (see Eq. (4)), followed by a spatial convolution pyramid (CE) block. The spatial convolution pyramid uses 1x1, 3x3, 5x5, and 7x7 convolutions to achieve different receptive fields for the same input feature. The outputs of these four convolution operations are concatenated and then scaled back to the original channel size through a 1x1 convolution to obtain the feature $F_{fnd}$, enhancing the feature’s ability to perceive contextual information of non-defective target areas.

Next, $F_c$ is multiplied by the single-channel feature map $P_{i+1}$ of the rough estimate of the defect target’s position from the next layer, followed by a CE block to enhance the feature’s ability to capture contextual information of the defect target area, resulting in the feature $F_{fpd}$. The next layer’s feature output $F_h$ is then upsampled by interpolation to the current layer’s feature size. $F_{fpd}$ is element-wise subtracted from the upsampled $F_h$, and $F_{fnd}$ is then element-wise added. This enhances both the false positive and false negative identification capabilities and yields the current layer’s enhanced feature $F_i$, which can serve as the input for the focus module of the previous layer. The input for the second layer’s focus module is the output feature of the bottom layer attention mechanism module.

For $F_{fpd}$, $F_{fnd}$, and the output feature $F_i$:

$$F_{fpd} = CE(F_c * Up(P_{i+1})) \tag{3}$$

$$F_{fnd} = CE(F_c * Up(1 - P_{i+1})) \tag{4}$$

$$F_i = BR(BR(Up(CBR(F_h)) - F_{fpd}) + F_{fnd}) \tag{5}$$

III. METHODOLOGY

Our motivation is to leverage camouflaged object detection methods to accomplish the task of chip surface defect detection. Since PFNet [32] can achieve both high detection accuracy and fast inference speed, we have chosen to modify the PFNet architecture to suit our task. This task has three main challenges: significant variability among segmentation target categories, the non-fixed positions of segmentation targets, and the extreme similarity between targets and the background. To address them, we have added a multi-scale feature fusion (MS fusion) module to the PFNet architecture. This module enhances the semantic and spatial information

integration of the features input to the decoder. Additionally, we have introduced a category loss and a position loss to force the model to have a clearer understanding of the edges between defect categories and their locations relative to the background, thereby improving segmentation accuracy.

For an input image I, we first perform feature extraction using PVT [35] (Pyramid Vision Transformer), resulting in four intermediate feature outputs $\{f_k\}_{k=1}^{4}$ with feature sizes of 512, 320, 128, and 64, respectively. To reduce computational load, these four features $\{f_k\}_{k=1}^{4}$ are initially channel-reduced to 512, 256, 128, and 64, respectively, before being input into the multi-scale feature fusion module.

A. Multi-scale feature fusion

For the task of camouflaged object detection, it is crucial to enhance shallow, information-rich features with deep, semantically rich features. This feature combination allows the network to leverage semantic information for accurate classification of ambiguous targets while maintaining clear boundaries between targets and the background. This approach is also effective for detecting small defects in chip defect detection datasets. According to research by Guang Chen et al. [22], the fundamental challenge in small object detection is the limited coverage area, which restricts the detector’s ability to capture sufficient information. Small objects often exist in specific contexts or alongside other objects, so utilizing contextual information can supplement the limited features provided by small objects. Contextual information methods improve small object detection accuracy by exploiting the relationships between small objects and other objects or the background. Based on this insight, we propose a multi-scale feature fusion module to integrate the features $\{f_k\}_{k=1}^{4}$ and enhance the contextual and semantic information of each feature. For the output features $f_k$ and $f_{k+1}$ from adjacent layers, we first resize $f_{k+1}$ through interpolation to match the size of $f_k$. Then, each feature undergoes a 3x3 convolution-batch normalization-ReLU (CBR) operation, and their output features are concatenated along the channel dimension. Finally, another 3x3 CBR operation adjusts the number of channels to match the size of $f_k$, producing the output feature $f'_{k+1}$.

The output features $f'_{k+1}$ obtained through the multi-scale feature fusion module can be calculated using the following formula:

$$f'_{k+1} = CBR(concat(CBR(f_k), CBR(interpolate(f_{k+1})))) \tag{6}$$

Through the above operations, $f'_{k+1}$ can capture the semantic information and underlying features of the input image better than $f_k$. This enhances the model’s understanding of the boundaries between the segmentation target and the background. Additionally, the larger-sized $f_k$ can provide more contextual information to the smaller-sized but more semantically rich $f_{k+1}$, thereby laying a better foundation for identifying small-area chip defects.

Humans, after careful analysis, can effectively distinguish distraction regions by performing contextual reasoning. This involves comparing patterns such as texture and semantics between ambiguous regions and confident regions to make a final decision. This paper proposes using the decoder from PFNet [32] to mimic this behavior in order to identify chip defects that are highly similar to the background and acquire the enhanced features $\{f''_k\}_{k=1}^{4}$.

Unlike the single-classification task required by camouflaged object detection, which only needs to identify the background and camouflaged objects in the image, the chip defect dataset in this paper further requires the classification of camouflaged objects. For the image segmentation task, although it only requires classifying each pixel of the input image, the relationships between different pixels also play a very important role in classifying each pixel. Therefore, a method [23] has been proposed to divide semantic segmentation into two tasks: pixel prediction and pixel grouping. After the network successfully predicts the segmented image, the segmentation results are further processed to explore the relationships between pixels of the same category. This approach aims to further enhance the segmentation capability.

The feature output of each layer from the feature extraction network, $\{f''_k\}_{k=1}^{4}$, is processed through the SA [38] (Spatial Attention) module. Assuming the input to the module is $X_{in}$ and the output is $X_{out}$, it can be calculated using the following formula:

$$X_{attn} = Up(CBR(APool(X_{in}))) \tag{7}$$

$$X_{out} = X_{attn} * CBR(X_{in}) + X_{attn} \tag{8}$$

To improve the model’s ability to learn input features, this paper proposes a multi-scale supervised training method to train the proposed defect detection network. Let the output features of the PFNet [32] decoder be $\{F_i\}_{i=1}^{4}$. These features are passed through a 3x3 convolution to compress the number of feature channels to the number of chip defect classes C, resulting in the predicted features for each pixel $\{F_{i=1}^{4}\}_{pre}$. Then, $\{F_{i=1}^{4}\}_{pre}$ is input into the SA module [38] for pixel aggregation, obtaining the network output features $\{F_{i=1}^{4}\}_{MergePre}$.

For each output feature $\{F_i\}_{MergePre}$, global pooling is applied to predict the categories contained in the input image, resulting in $\{F_i\}_{CategoryPre}$. Additionally, $\{F_i\}_{MergePre}$ is passed through a 3x3 convolution to compress the channels to 1, resulting in the prediction of all defect target locations in the input image, denoted as $\{F_i\}_{PosPre}$.

Therefore, to train the network, we require twelve outputs from the network: $\{F_{i=1}^{4}\}_{MergePre}$, $\{F_{i=1}^{4}\}_{PosPre}$, and $\{F_{i=1}^{4}\}_{CategoryPre}$. The overall loss calculation is as follows:

$$Loss_{overall} = Loss_{predict} + Loss_{pos} + Loss_{category} \tag{9}$$

$$Loss^{i}_{predict} = OHEMCrossEntropy(F^{i}_{MergePre}, GT) + Lovaszloss(F^{i}_{MergePre}, GT) \tag{10}$$

$$Loss^{i}_{category} = CrossEntropy(F^{i}_{CategoryPre}, GT_{category}) \tag{11}$$

$$Loss^{i}_{pos} = OHEMCrossEntropy(F^{i}_{PosPre}, GT) + Lovaszloss(F^{i}_{PosPre}, GT) \tag{12}$$
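The loss terms in Eqs. (9) to (12) can be illustrated with a small pure-Python sketch. It is not the authors' implementation and rests on several assumptions: the per-scale terms are simply summed over the four decoder outputs (the paper does not state how they are combined), OHEM is simplified to averaging the cross-entropy over the hardest fraction of pixels through a hypothetical `keep_ratio` parameter, and the Lovász term is omitted for brevity.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one pixel from raw class scores (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def ohem_cross_entropy(pixel_logits, pixel_targets, keep_ratio=0.5):
    """Mean cross-entropy over the hardest pixels only.

    keep_ratio is a hypothetical parameter: the fraction of pixels,
    ranked by loss, that contribute to the average.
    """
    losses = sorted(
        (cross_entropy(l, t) for l, t in zip(pixel_logits, pixel_targets)),
        reverse=True,
    )
    k = max(1, int(len(losses) * keep_ratio))
    return sum(losses[:k]) / k

def overall_loss(per_scale_terms):
    """Eq. (9), with the per-scale terms summed over scales (an assumption)."""
    return sum(t["predict"] + t["pos"] + t["category"] for t in per_scale_terms)

# Two pixels, two classes: the first pixel is ambiguous, the second is easy.
logits = [[0.0, 0.0], [10.0, 0.0]]
targets = [0, 0]
hard_only = ohem_cross_entropy(logits, targets, keep_ratio=0.5)
print(hard_only)  # only the ambiguous pixel contributes to the average
```

With `keep_ratio=0.5` only the ambiguous pixel is kept, which is the point of hard example mining: easy pixels (here, the large background-like majority) do not dilute the gradient signal from the few difficult ones.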
IV. EXPERIMENT

A. Dataset

The chip defect dataset in this paper contains seven categories: missing gold, corrosion, delamination, burrs, foreign matter contamination, surface contamination, and scratches. The dataset consists of a total of 405 images, with 86 images used as the test set and the remaining 319 images used as the training set. The number of images for each category is 391, 108, 85, 54, 341, 129, and 116, respectively, as shown in the figure below. The category numbers 1 to 7 correspond to the seven categories in the order listed above.

Fig. 2. Number of defects for each class. Classes are numbered from 1 to 7, each number representing one defect type: 1 represents missing gold, 2 represents corrosion, and so on.

This dataset was obtained through the general six-sided appearance inspection system developed by Guangzhou Nuoding Intelligent Technology Co., Ltd. on the chip production line. The unannotated images containing defects were captured by the optical camera of the same chip production machine. The defect annotations were then provided by relevant experts who labeled the defects in the chip images.

To increase data diversity and improve the model’s generalization ability, this paper applies various augmentations to the training images in the dataset, including vertical and horizontal flipping, mirroring, and random rotation by certain angles. To reduce memory requirements, the images are resized to 512x512 using area interpolation. However, for the validation and test sets, no augmentations are applied, and the images are only resized to 512x512 using area interpolation.

Fig. 3. Chip Defect Dataset Overview

B. Metrics

This paper selects Dice, MIOU (Mean Intersection over Union), precision, recall, and F1-score as the metrics for evaluating the effectiveness of chip defect detection at the pixel level. These metrics are calculated based on the pixels, with the final results being the average of the metrics across all pixels for each category, rather than calculating the metrics for each input image and then averaging the values across the images.

C. Implementation Details

All experiments were conducted using PyTorch on a single RTX 2080 Ti GPU with 11GB of memory. For training, all input images were resized to a resolution of 512x512 using area interpolation, and the ground truth segmentation masks were resized to 512x512 using nearest-neighbor interpolation. Random flipping was added as a data augmentation technique. For the feature extraction network, we used pvt_v2_b2 [35] pre-trained on ImageNet-1k [34]. The AdamW optimizer [24] was employed with a weight decay set to 5e-4. The learning rate was set to 1e-4, with a step decay strategy that reduced the learning rate to 0.9 of its previous value each step. The batch size was set to 8, and the model was trained for 180 epochs.

For testing, the input images were resized to 512x512 using area interpolation, and the ground truth segmentation masks were resized to 512x512 using nearest-neighbor interpolation. The training set was divided into 80% training and 20% validation for each category, resulting in 344 training images and 61 validation images.

D. Comparison Result

Common defect detection deep learning networks are often only suitable for anomaly detection tasks: they can only determine whether there is a defect in the input image, or can only segment single-target or single-category input images. We therefore compared the proposed model with common multi-category segmentation networks to demonstrate its effectiveness. This helps reduce performance degradation of

TABLE I
COMPARISON RESULT

Method            Dice      MIOU      precision  recall    F1-score
FPN [26]          0.727705  0.604304  0.770386   0.696281  0.727705
Unet++ [25]       0.737629  0.613795  0.797900   0.699819  0.737629
DeepLabV3 [27]    0.715826  0.596548  0.783986   0.675586  0.715826
Manet [28]        0.676748  0.571769  0.696545   0.659348  0.676748
Linknet [29]      0.684202  0.577937  0.711107   0.666541  0.684202
PSPNet [30]       0.694371  0.563569  0.772056   0.645644  0.694371
PAN [31]          0.725007  0.598910  0.785082   0.684269  0.725007
Our method        0.772890  0.646458  0.813034   0.742789  0.772890
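The columns of the table follow the protocol described under Metrics: pixel counts are accumulated over the whole dataset for each category, the metric is computed per category, and the per-category values are averaged. The sketch below shows one way this could be done in plain Python. The function names, the flattened label-map input format, and the choice to average over every class passed in (including any background class) are assumptions for illustration, not details taken from the paper.

```python
def per_class_counts(preds, gts, num_classes):
    """Accumulate TP/FP/FN per class over every pixel of every image."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for pred, gt in zip(preds, gts):      # one flattened label map per image
        for p, g in zip(pred, gt):
            if p == g:
                tp[p] += 1
            else:
                fp[p] += 1                # predicted class p, but it was g
                fn[g] += 1                # class g was missed
    return tp, fp, fn

def safe_div(a, b):
    """Division that returns 0.0 for an empty denominator."""
    return a / b if b else 0.0

def mean_metrics(tp, fp, fn):
    """Per-class metrics averaged over classes (not over images)."""
    rows = []
    for t, f, m in zip(tp, fp, fn):
        precision = safe_div(t, t + f)
        recall = safe_div(t, t + m)
        rows.append({
            "dice": safe_div(2 * t, 2 * t + f + m),
            "miou": safe_div(t, t + f + m),
            "precision": precision,
            "recall": recall,
            "f1": safe_div(2 * precision * recall, precision + recall),
        })
    return {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}

# Toy example: one 4-pixel image, two classes.
tp, fp, fn = per_class_counts([[0, 1, 1, 0]], [[0, 1, 0, 0]], num_classes=2)
metrics = mean_metrics(tp, fp, fn)
print(metrics)
```

Computed per class in this way, F1 reduces algebraically to 2TP / (2TP + FP + FN), which is the Dice coefficient. That would be consistent with the identical Dice and F1-score columns in the table.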

TABLE II
ABLATION RESULT

Method                                            Dice      MIOU      precision  recall    F1-score
base (a)                                          0.741789  0.610547  0.786640   0.708361  0.741789
base + category loss + position loss              0.745001  0.619087  0.770293   0.725101  0.745001
base + MS fusion + category loss + position loss  0.772890  0.646458  0.813034   0.742789  0.772890

(a) base represents multi-class PFNet [32].
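Differences between MIOU values in Tables I and II can be expressed either as a relative gain or as an absolute gap in percentage points, and the two readings differ noticeably at this scale. The short sketch below computes both from the values printed in the tables. The helper name `relative_gain` and the dictionary keys are illustrative choices, not names from the paper.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# MIOU values copied from Table I (Unet++) and Table II (ablation rows).
miou = {
    "Unet++": 0.613795,
    "base": 0.610547,
    "base + losses": 0.619087,
    "ours": 0.646458,
}

for name in ("Unet++", "base", "base + losses"):
    gain = relative_gain(miou["ours"], miou[name])
    points = (miou["ours"] - miou[name]) * 100.0
    print(f"ours vs {name}: +{gain:.3f}% relative, +{points:.2f} points")
```

Read this way, the gain over Unet++ is about 5.32% in relative terms, which corresponds to roughly 3.27 percentage points of MIOU.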

Fig. 4. The predicted output results of all models in this experiment.

comparison models due to human intervention. The models compared include FPN [26], Unet++ [25], DeepLabv3+ [27], Manet [28], Linknet [29], PSPNet [30], and PAN [31]. We also compared our method with the model currently used by Nuoding Intelligent Technology Co., Ltd. Due to confidentiality of the platform where this model is deployed, we could not obtain the implementation and training parameters. Thus, the comparison is limited to MIOU, Precision, and Recall.

All compared networks, except the company’s model, were trained on the same RTX 2080 Ti GPU with 11GB of memory. The AdamW optimizer was uniformly used. Since the comparison models were not specifically designed for chip defect detection, their original training methods, such as loss functions, are not suitable for this task. To showcase the feature extraction capability of the proposed model for chip defects, the loss functions used in training the comparison models, except for the proposed method and the company’s model, were Online Hard Example Mining [33] (OHEM) Cross-Entropy Loss and LovaszLossSoftmax. This aimed to maximize the adaptability of the comparison models to the chip defect

dataset.

For fairness, the implementation codes of the multi-category segmentation networks were all taken from publicly available sources. Only the training codes were modified, while other implementation codes were left unchanged. All networks were tested on the same chip defect detection dataset, and the test codes and metric processing codes were identical. The results are shown in Table I.

From the results shown in Table I, the proposed method demonstrates superior performance across various comparison metrics on the chip defect dataset compared to common multi-class segmentation networks. Compared to the best-performing Unet++ [25], our method achieves a 5.321% relative improvement in the MIOU metric (from 0.613795 to 0.646458). Fig. 4 illustrates some prediction results of common models alongside the predictions of our method.

For larger defect targets, it is evident that our method can achieve segmentation results that are more similar to the actual defect targets and have clearer segmentation boundaries compared to other methods. Common segmentation methods often struggle to accurately capture the relationship between defect targets and the background. This indicates that our method effectively leverages the features of the input image to accurately locate defects, distinguishing defect targets that closely resemble the background in terms of shape and color.

Regarding smaller defect targets, while other comparison models often struggle to identify such small defects, our method can predict the approximate location of these targets. Moreover, there is a high overlap between our predictions and the actual segmentation results. This demonstrates that our method effectively integrates features across different scales and can analyze contextual information near defect targets. It identifies differences between small defect targets and background features, thereby achieving accurate segmentation predictions for small defect targets.

E. Ablation Analysis

To demonstrate the differences and effectiveness of the method proposed in this paper compared to the original method PFNet, we conducted corresponding ablation experiments. Since our network is an improved version based on the camouflaged object detection method PFNet [32], we used PFNet as the baseline model for ablation experiments. However, the camouflaged object detection task is a binary image segmentation task and cannot be directly transferred to the task addressed in this paper. Therefore, we set the baseline to a multi-class PFNet [32]. Apart from the added convolutional layer and the training loss, optimizer, learning rate, and feature extraction backbone, other implementations remain consistent with the original PFNet [32] implementation. The results are shown in Table II.

Effectiveness of the Category Loss and Position Loss: By reducing the update distance from the predicted map to the position prediction, the similarity between the position prediction and the actual segmentation position can be effectively improved. This also effectively guides the focus module in identifying false positives and false negatives, ultimately enhancing the final prediction accuracy. Additionally, by aggregating the features of each predicted pixel through the SA [38] module, the relationship between pixels of the same defect category is strengthened. This further enhances the prediction output features and improves prediction accuracy. Class loss can suppress the model’s random predictions, allowing it to focus more on overall class predictions and consider the relationships between pixels of the same defect category. Incorporating these methods results in an improvement of roughly 1% in the mIoU metric (from 0.610547 to 0.619087 in Table II). The performance of the self-modified multi-class PFNet [32] is close to that of the best-performing comparison model, Unet++ [25]. This can be attributed to PFNet’s U-shaped network architecture, which can be seen as an improvement over Unet. However, it lacks steps for the fusion and enhancement of features across different scales. Unet++, on the other hand, has such modifications compared to Unet, with many lateral connections during down-sampling and up-sampling, allowing for better feature fusion across different scales. However, Unet++ is not directly suitable for the chip defect detection task as it cannot effectively distinguish between defect targets and the background, whereas PFNet can. Therefore, both models have their advantages and disadvantages for the chip defect detection task, resulting in similar performance.

Effectiveness of the Multi-Scale Feature Fusion Module: Combining features across different scales can enhance the contextual information contained in the features, while also enhancing semantic information and low-level features. This improves the model’s ability to use contextual information to distinguish large defect targets from the background. Additionally, it enhances the combination of semantic information and low-level features across different scales, allowing the model to effectively segment defect target positions while correctly classifying defect targets. More precise intermediate features also enable the model to more accurately determine defect positions during up-sampling, facilitating better learning. This
model as a self-modified multi-class PFNet trained with the makes the designed class loss and position loss more effective
training method proposed in this paper. in assisting the model’s training.
The specific modifications include adding a 7x7 convolu-
tional layer after each focus module to reduce the number of V. C ONCLUSION
channels from the focus module’s output to the number of de- In order to address the challenge of accurately segmenting
fect classes in the dataset. Additionally, the loss function uses chip defect objects, this paper proposes a chip defect detection
Online Hard Example Mining [33](OHEM) Cross-Entropy and segmentation method based on a U-shaped network ar-
loss and LovaszSoftmax loss to supervise outputs at all scales. chitecture. This method combines camouflage target detection
The backbone, like the other ablation experiment comparison and small target detection techniques, as well as training on
objects, is pvt v2 b2 [35] pre-trained on ImageNet-1k [34]. imbalanced sample datasets. Specifically, by integrating the

camouflage target detection method PFNet [32], the proposed approach enhances the ability to identify defect targets that are highly similar to the background. A multi-scale feature fusion module is introduced to merge features of different scales, mitigating PFNet’s difficulty in detecting small targets and improving its ability to detect camouflaged defect targets. Additionally, a multi-scale output supervision training method that includes class loss, position loss, and actual prediction loss is proposed to enhance the model’s ability to learn defect features. Comparative experiments and ablation studies demonstrate the superiority of the proposed method over common multi-class image segmentation models, as well as the effectiveness of the proposed multi-scale feature fusion module and training methods.

REFERENCES

[1] Gan J, Li Q, Wang J, et al. A hierarchical extractor-based visual rail surface inspection system[J]. IEEE Sensors Journal, 2017, 17(23): 7935-7944.
[2] Silvestre-Blanes J, Albero-Albero T, Miralles I, et al. A public fabric database for defect detection methods and results[J]. Autex Res J, 2019, 19(4): 363-374.
[3] Song K, Yan Y. Micro surface defect detection method for silicon steel strip based on saliency convex active contour model[J]. Mathematical Problems in Engineering, 2013, 2013.
[4] Faghih-Roohi S, Hajizadeh S, Núñez A, et al. Deep convolutional neural networks for detection of rail surface defects[C]//2016 International joint conference on neural networks (IJCNN). IEEE, 2016: 2584-2589.
[5] Tao X, Zhang D, Ma W, et al. Automatic metallic surface defect detection and recognition with convolutional neural networks[J]. Applied Sciences, 2018, 8(9): 1575.
[6] Mei S, Yang H, Yin Z. An unsupervised-learning-based approach for automated defect inspection on textured surfaces[J]. IEEE transactions on instrumentation and measurement, 2018, 67(6): 1266-1277.
[7] Kang G, Gao S, Yu L, et al. Deep architecture for high-speed railway insulator surface defect detection: Denoising autoencoder with multitask learning[J]. IEEE Transactions on Instrumentation and Measurement, 2018, 68(8): 2679-2690.
[8] Liu Q, Liu M, Wang C, et al. An efficient CNN-based detector for photovoltaic module cells defect detection in electroluminescence images[J]. Solar Energy, 2024, 267: 112245.
[9] Lu Q, Lin J, Luo L, et al. A supervised approach for automated surface defect detection in ceramic tile quality control[J]. Advanced Engineering Informatics, 2022, 53: 101692.
[10] Zavrtanik V, Kristan M, Skočaj D. Reconstruction by inpainting for visual anomaly detection[J]. Pattern Recognition, 2021, 112: 107706.
[11] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3431-3440.
[12] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]//Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer International Publishing, 2015: 234-241.
[13] Hugh Bamford Cott. Adaptive coloration in animals. Methuen & Co. Ltd, 1940.
[14] Deng-Ping Fan, Ge-Peng Ji, Guolei Sun, Ming-Ming Cheng, Jianbing Shen, and Ling Shao. Camouflaged object detection. In CVPR, 2020.
[15] Trung-Nghia Le, Tam V Nguyen, Zhongliang Nie, Minh-Triet Tran, and Akihiro Sugimoto. Anabranch network for camouflaged object segmentation. CVIU, 2019.
[16] Jianqin Yin, Yanbin Han, Wendi Hou, and Jinping Li. Detection of the mobile object with camouflage color under dynamic background based on optical flow. Procedia Engineering, 2011.
[17] Yuxin Pan, Yiwang Chen, Qiang Fu, Ping Zhang, and Xin Xu. Study on the camouflaged target detection method based on 3d convexity. Modern Applied Science, 2011.
[18] Jinnan Yan, Trung-Nghia Le, Khanh-Duy Nguyen, Minh-Triet Tran, Thanh-Toan Do, and Tam V. Nguyen. Mirror net: Bio-inspired adversarial attack for camouflaged object segmentation. arXiv:2007.12881, 2020.
[19] P. Sengottuvelan, A. Wahi, and A. Shanmugam. Performance of decamouflaging through exploratory image analysis. In ETET, 2008.
[20] Martin Stevens and Sami Merilaita. Animal camouflage: current issues and new perspectives. Philosophical Transactions of the Royal Society B, 2009.
[21] Gerald Handerson Thayer and Abbott Handerson Thayer. Concealing-coloration in the animal kingdom: an exposition of the laws of disguise through color and pattern; being a summary of Abbott H. Thayer’s discoveries. New York, the Macmillan Co., 1909.
[22] Chen G, Wang H, Chen K, et al. A survey of the four pillars for small object detection: Multiscale representation, contextual information, super-resolution, and region proposal[J]. IEEE Transactions on systems, man, and cybernetics: systems, 2020, 52(2): 936-953.
[23] Zhong Z, Lin Z Q, Bidart R, et al. Squeeze-and-attention networks for semantic segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 13065-13074.
[24] Loshchilov I, Hutter F. Decoupled weight decay regularization[J]. arXiv preprint arXiv:1711.05101, 2017.
[25] Zhou Z, Siddiquee M M R, Tajbakhsh N, et al. Unet++: Redesigning skip connections to exploit multiscale features in image segmentation[J]. IEEE transactions on medical imaging, 2019, 39(6): 1856-1867.
[26] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2117-2125.
[27] Firdaus-Nawi M, Noraini O, Sabri M Y, et al. DeepLabv3+ encoder-decoder with Atrous separable convolution for semantic image segmentation[J]. Pertanika J. Trop. Agric. Sci, 2011, 34(1): 137-143.
[28] He P, Jiao L, Shang R, et al. MANet: Multi-scale aware-relation network for semantic segmentation in aerial scenes[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-15.
[29] Chaurasia A, Culurciello E. Linknet: Exploiting encoder representations for efficient semantic segmentation[C]//2017 IEEE visual communications and image processing (VCIP). IEEE, 2017: 1-4.
[30] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2881-2890.
[31] Liu S, Qi L, Qin H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 8759-8768.
[32] Mei H, Ji G P, Wei Z, et al. Camouflaged object segmentation with distraction mining[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 8772-8781.
[33] Loshchilov I, Hutter F. Decoupled weight decay regularization[J]. arXiv preprint arXiv:1711.05101, 2017.
[34] Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database[C]//2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009: 248-255.
[35] Wang W, Xie E, Li X, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 568-578.
[36] Chen Y. Convolutional neural network for sentence classification[D]. University of Waterloo, 2015.
[37] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[J]. Advances in neural information processing systems, 2014, 27.
[38] Zhong Z, Lin Z Q, Bidart R, et al. Squeeze-and-attention networks for semantic segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 13065-13074.
