Automatic Fabric Defect Detection Using Cascaded Mixed Feature Pyramid with Guided Localization
Figure 1. Imbalanced distribution of instances for FBDF. (a) Area and number distribution of instances per class. (b) Size-ratio distribution of bounding boxes per class.
Figure 2. End-to-end fabric defect detection architecture.
Figure 3. Mixed convolutional module for feature extraction.
Figure 4. Feature pyramid network frameworks.
Figure 5. Composite interpolating feature pyramid (CI-FPN).
Figure 6. Cascaded Guided-Region Proposal Network (CG-RPN) with semantics-guided refinement.
Figure 7. Effective receptive field from deformable extraction in guided localization.
Figure 8. From positive anchors to proposals and finally to bounding boxes.
Figure 9. Metrics for performance of imbalanced detection.
Figure 10. Model sizes with Average Precision (AP) for FBDF of different high-efficiency backbones on Faster R-CNN and FPN.
Figure 11. Performance of different configurations of the CI-FPN and CG-RPN modules.
Figure 12. Sample RPN proposals from Faster R-CNN and CG-RPN.
Figure 13. Visualization of the best bounding boxes of defects from MC-Net along with CI-FPN.
Abstract
1. Introduction
2. Data Space
2.1. Defect Class Selection
2.2. Characteristics of FBDF Dataset
- Large scale. FBDF consists of 2k optical fabric images and 4k defect instances, each manually labeled with an axis-aligned bounding box. All images are 2446 × 1000 pixels, with a spatial resolution down to 0.5 mm. FBDF was collected from Ali Cloud by experts in the domain of textile engineering.
- Instance size and number variations. Spatial size variation reflects the actual character of fabric defects in industrial scenes. It stems not only from the spatial resolution of the sensors, but also from between-class size variation (e.g., “Knots” vs. “Indentation Marks”) and within-class size variation (e.g., “Rough Warp” vs. “Loose Warp”). Defect instances in the proposed FBDF dataset span a large range of sizes, as shown in Figure 1. For each fabric class, the area, height-width ratio, and number of instances vary widely. Number variation tests a detector’s few-shot recognition ability, while area and height-width ratio variation tests its multi-scale recognition ability.
- Image variations. A highly desired characteristic of any defect detection system is robustness to image variations in textile type, cloth pattern, back-light intensity, imaging conditions, etc. The textiles are mainly denim, muslin, satin, and so on. Back-light is controlled to guarantee image sharpness. Translation is less important than illumination, background, and defect appearance for each defect class, so it is simplified in FBDF.
- Inter-class similarity and intra-class diversity. Inter-class similarity leads to False Positives (FP) and intra-class diversity leads to False Negatives (FN) in the classification module of detectors. To obtain the former, comparable defect images from different classes are collected without salient modification. To increase the latter, different defect colors, shapes, and scales are taken into account when selecting images. “Spandex” instances present distinctive shapes, while “Jumps” and “Star-jumps” instances are the opposite.
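The per-class statistics behind these characteristics (instance count, box area, height-width ratio) can be computed directly from the bounding-box annotations. The snippet below is a minimal sketch; the annotation tuples and the `class_statistics` helper are illustrative assumptions, not the dataset's actual annotation format.

```python
from collections import defaultdict

# Hypothetical annotations: (class_name, x_min, y_min, x_max, y_max) per instance.
annotations = [
    ("Knots", 120, 40, 180, 95),
    ("Knots", 300, 500, 340, 560),
    ("Rough Warp", 10, 0, 60, 980),
]

def class_statistics(instances):
    """Group instances by class and report count, mean box area, and mean
    height-width ratio -- the three axes of variation described above."""
    per_class = defaultdict(list)
    for cls, x0, y0, x1, y1 in instances:
        w, h = x1 - x0, y1 - y0
        per_class[cls].append((w * h, h / w))
    report = {}
    for cls, boxes in per_class.items():
        areas = [a for a, _ in boxes]
        ratios = [r for _, r in boxes]
        report[cls] = {
            "count": len(boxes),
            "mean_area": sum(areas) / len(areas),
            "mean_hw_ratio": sum(ratios) / len(ratios),
        }
    return report
```

Plotted per class, these quantities give distributions of the kind shown in Figure 1.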
3. Methodology
3.1. Backbone for Feature Extraction
3.2. Neck for Feature Integrating and Refining
- (1) Compress the feature extraction maps. Defect images share a common dark background and the pixel values of object instances differ little, so there is no need to enlarge the number of kernels in a layer.
- (2) Add cross-scale fusion without extra computation. Nodes with only one input edge should be removed because of their low-level semantic representation, and aggregating the input and output of the same level makes the defect region more visually distinct.
- (3) Repeat the bidirectional (top-down and bottom-up) block. Unlike PANet, which has only one bidirectional block, the network proposed in this paper cascades these modules to enable more high-level feature fusion.
- (4) Keep lateral sizes distinct within a repeated block. Unlike EfficientDet (stacked Bi-FPN), which keeps the lateral sizes of up-sampling and down-sampling the same, this paper applies an interpolating layer for an approximately continuous extension of scale, as shown in Figure 5. In this way, feature loss is reduced with the least amount of extra latency.
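Design rules (2)-(4) can be sketched in a few lines. The following is a simplified NumPy illustration, not the paper's implementation: feature maps are 1-D arrays ordered fine-to-coarse, nearest-neighbour interpolation stands in for the interpolating layer, and the function names (`bidirectional_block`, `cascaded_ci_fpn`) are assumptions.

```python
import numpy as np

def resize(x, size):
    """Nearest-neighbour interpolation to an arbitrary target length,
    standing in for the interpolating layer of rule (4)."""
    idx = np.floor(np.linspace(0, len(x) - 1e-9, size)).astype(int)
    return x[idx]

def bidirectional_block(feats):
    """One top-down + bottom-up fusion block over feature maps ordered
    fine-to-coarse; the same-level input skip follows rule (2)."""
    # Top-down: propagate coarse semantics into finer levels.
    td = [f.copy() for f in feats]
    for i in range(len(td) - 2, -1, -1):
        td[i] = td[i] + resize(td[i + 1], len(td[i]))
    # Bottom-up: propagate fine localization back, plus the input skip.
    out = [f.copy() for f in td]
    for i in range(1, len(out)):
        out[i] = out[i] + resize(out[i - 1], len(out[i])) + feats[i]
    return out

def cascaded_ci_fpn(feats, repeats=2):
    """Cascade the bidirectional block, per rule (3)."""
    for _ in range(repeats):
        feats = bidirectional_block(feats)
    return feats
```

In the actual network each level is a 2-D feature map and fusion nodes carry learned convolutions; the sketch only shows the wiring of the cascade.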
3.3. Anchor Sampling and Refining
3.3.1. Stage for Proposal Generation
3.3.2. Stage for Bounding Box Generation
3.4. Evaluation Metrics for Imbalanced Detection for Defects
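The AP, AP50, and AP75 columns in the result tables below follow the standard ranked-list definition: a detection counts as a true positive when its IoU with a ground-truth box exceeds the threshold, and AP is the mean precision over the true-positive hits. A minimal sketch (COCO-style evaluation additionally averages AP over IoU thresholds from 0.50 to 0.95):

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def average_precision(detections, n_gt):
    """detections: (confidence, is_true_positive) pairs; n_gt: number of
    ground-truth boxes. Rank by confidence and average the precision
    observed at each true-positive hit."""
    detections = sorted(detections, key=lambda d: -d[0])
    tp, precisions = 0, []
    for rank, (_, is_tp) in enumerate(detections, start=1):
        if is_tp:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / n_gt if n_gt else 0.0
```

APS and APL restrict the same computation to small and large instances respectively, which is what exposes the imbalance discussed in this section.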
4. Experiments and Discussion
4.1. Experimental Settings
4.2. Main Result
4.3. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Ahmed, J.; Gao, B.; Woo, W.L. Wavelet Integrated Alternating Sparse Dictionary Matrix Decomposition in Thermal Imaging CFRP Defect Detection. IEEE Trans. Ind. Inform. 2019, 5, 4033–4043. [Google Scholar] [CrossRef]
- Gao, B.; Lu, P.; Woo, W.L.; Tian, G.Y.; Zhu, Y.; Johnston, M. Variational Bayesian Sub-group Adaptive Sparse Component Extraction for Diagnostic Imaging System. IEEE Trans. Ind. Electron. 2019, 5, 4033–4043. [Google Scholar]
- Wang, Y.; Gao, B.; Woo, W.L.; Tian, G.; Maldague, X.; Zheng, L.; Guo, Z.; Zhu, Y. Thermal Pattern Contrast Diagnostic of Microcracks With Induction Thermography for Aircraft Braking Components. IEEE Trans. Ind. Inform. 2018, 14, 5563–5574. [Google Scholar] [CrossRef] [Green Version]
- Hamdi, A.A.; Sayed, M.S.; Fouad, M.M.; Hadhoud, M.M. Unsupervised patterned fabric defect detection using texture filtering and K-means clustering. In Proceedings of the International Conference on Innovative Trends in Computer Engineering, Aswan, Egypt, 19–21 February 2018; pp. 130–144. [Google Scholar]
- Mei, S.; Wang, Y.; Wen, G.J. Automatic Fabric Defect Detection with a Multi-Scale Convolutional Denoising Autoencoder Network Model. Sensors 2018, 18, 1064. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Seker, A.; Peker, K.A.; Yuksek, A.G.; Delibaş, E. Fabric defect detection using deep learning. In Proceedings of the 24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Turkey, 16–19 May 2016. [Google Scholar]
- Li, Z.; Peng, C.; Yu, G.; Zhang, X.; Deng, Y.; Sun, J. Detnet: A backbone network for object detection. arXiv 2018, arXiv:1804.06215. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
- Liu, Y.; Li, H.; Yan, J.; Wei, F.; Wang, X.; Tang, X. Recurrent Scale Approximation for Object Detection in CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 571–579. [Google Scholar]
- Singh, B.; Davis, L.S. An analysis of scale invariance in object detection snip. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3578–3587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Jiang, B.; Luo, R.; Mao, J.; Xiao, T.; Jiang, Y. Acquisition of localization confidence for accurate object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–799. [Google Scholar]
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2018, 159, 296–307. [Google Scholar] [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
- Tan, M.; Le, Q.V. Mixconv: Mixed depthwise convolutional kernels. arXiv 2019, arXiv:1907.09595. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic Convolution: Attention over Convolution Kernels. arXiv 2019, arXiv:1912.03458. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Borji, A.; Cheng, M.M.; Jiang, H.; Li, J. Salient object detection: A benchmark. IEEE Trans. Image Process. 2015, 24, 5706–5722. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Borji, A.; Itti, L. State-of-the-art in visual attention modeling. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 185–207. [Google Scholar] [CrossRef] [PubMed]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Liu, Y.; Wang, Y.; Wang, S.; Liang, T.; Zhao, Q.; Tang, Z.; Ling, H. CBNet: A Novel Composite Backbone Network Architecture for Object Detection. arXiv 2019, arXiv:1909.03625. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045. [Google Scholar]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. arXiv 2019, arXiv:1911.09070. [Google Scholar]
- Wang, J.; Chen, K.; Yang, S.; Loy, C.C.; Lin, D. Region proposal by guided anchoring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2965–2974. [Google Scholar]
- Zhu, Y.; Zhao, C.; Wang, J.; Zhao, X.; Wu, Y.; Lu, H. CoupleNet: Coupling global structure with local parts for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4126–4134. [Google Scholar]
- Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
- Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750. [Google Scholar]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6569–6578. [Google Scholar]
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. arXiv 2019, arXiv:1904.01355. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Le, Q.V. Searching for mobilenetv3. arXiv 2019, arXiv:1905.02244. [Google Scholar]
Defect classes in FBDF: Feet, Particles, Knots, Spandex, Rg-Warp, Stains.
| Input | Operator | EXP Size | AF | SE | FPN |
|---|---|---|---|---|---|
| 2446 × 1000 × 3 | 3 × 3 | - | RE | - | - |
| 896 × 448 × 16 | 3 × 3 | {40, 72} | RE | - | - |
| 448 × 224 × 24 | 3 × 3, 5 × 5 | {72, 72} | HS | √ | √ |
| 224 × 112 × 40 | 3 × 3 | {72, 120, 240} | HS | √ | - |
| 112 × 56 × 80 | 3 × 3, 5 × 5, 7 × 7 | {200, 240} | HS | √ | √ |
| 56 × 28 × 112 | 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11 | {240, 480} | RE | √ | √ |
| 28 × 14 × 160 | 3 × 3, 5 × 5 | {480, 672} | HS | √ | √ |
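The Operator column above lists several kernel sizes per stage, in the spirit of the MixConv reference: channels are split into groups and each group is convolved with its own kernel size, so one layer sees several receptive fields at once. The following is a toy 1-D NumPy illustration; the function name `mixed_depthwise_conv` is an assumption, and uniform averaging kernels stand in for learned depthwise weights.

```python
import numpy as np

def mixed_depthwise_conv(x, kernel_sizes):
    """Mixed depthwise convolution sketch.
    x: (channels, length) 1-D feature map.
    kernel_sizes: one odd kernel size per channel group."""
    groups = np.array_split(np.arange(x.shape[0]), len(kernel_sizes))
    out = np.empty_like(x, dtype=float)
    for idx, k in zip(groups, kernel_sizes):
        kernel = np.ones(k) / k  # placeholder for learned depthwise weights
        for c in idx:
            # Depthwise: each channel convolved independently, 'same' padding.
            out[c] = np.convolve(x[c], kernel, mode="same")
    return out
```

In the real module the convolutions are 2-D, learned, and fused with the SE attention indicated in the table.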
| Baseline | Backbone | AP | AP50 | AP75 |
|---|---|---|---|---|
| FPN | ResNet-50 | 34.29 | 52.01 | 36.68 |
| NAS-FPN | AmoebaNet | 36.16 | 55.74 | 40.77 |
| PANet | VGG-16 | 39.51 | 60.17 | 42.14 |
| EfficientDet | EfficientNet-B4 | 46.39 | 65.68 | 50.27 |
| CBNet | ResNet-50 | 44.22 | 60.38 | 46.96 |
| Method | Backbone | AP | AP50 | AP75 | APS | APL | VP |
|---|---|---|---|---|---|---|---|
| Faster R-CNN | VGG-16 | 42.6 | 57.7 | 45.8 | 22.4 | 53.6 | 14.2 |
| Faster + FPN | ResNet-50 | 53.3 | 69.0 | 57.7 | 39.3 | 64.7 | 13.8 |
| RetinaNet + FPN | ResNet-50 | 55.7 | 73.3 | 60.9 | 42.7 | 66.5 | 12.6 |
| YOLOv3 [37] | DarkNet-53 | 35.5 | 52.5 | 36.2 | 19.4 | 50.5 | 10.8 |
| SSD-513 [38] | ResNet-101 | 32.9 | 54.9 | 34.1 | 12.2 | 48.4 | 12.9 |
| Cascaded [39] + FPN | ResNet-50 | 60.5 | 75.2 | 66.3 | 47.1 | 71.0 | 11.5 |
| CornerNet [40] | Hourglass-52 | 36.4 | 53.0 | 39.8 | 19.9 | 51.2 | 13.0 |
| CenterNet [41] | Hourglass-52 | 39.5 | 57.5 | 40.6 | 22.7 | 54.3 | 12.6 |
| Libra-FPN-RetinaNet [42] | ResNet-50 | 56.9 | 71.4 | 60.2 | 38.5 | 68.9 | 11.2 |
| FCOS [43] | ResNet-50 | 33.0 | 49.8 | 34.4 | 19.2 | 44.3 | 10.3 |
| MC-Net + CI-FPN | ResNet-50 | 65.9 | 79.5 | 68.0 | 48.3 | 77.3 | 10.7 |
| MC-Net + CI-FPN | Mixed-16 | 72.6 | 86.3 | 73.6 | 50.9 | 80.4 | 9.7 |
| Method | Backbone | AP | AP50 | AP75 | APS | APL |
|---|---|---|---|---|---|---|
| Faster + FPN | Mixed-16 | 59.2 (5.9) | 74.2 | 63.6 | 43.0 | 70.1 |
| RetinaNet + FPN | Mixed-16 | 61.9 (6.2) | 79.8 | 66.7 | 46.5 | 73.0 |
| Cascaded R-CNN | Mixed-16 | 63.6 (4.5) | 78.4 | 80.1 | 47.5 | 72.8 |
| Cascaded + FPN | Mixed-16 | 66.5 (6.0) | 83.5 | 72.5 | 59.8 | 77.9 |
| Tuple | Mean IoU (Post) | AP | APS |
|---|---|---|---|
| (0.5) | 46.5 | 71.4 | 39.1 |
| (0.5, 0.7) | 59.7 | 75.8 | 44.8 |
| (0.3, 0.5) | 53.6 | 72.9 | 41.3 |
| (0.3, 0.5, 0.7) | 56.8 | 76.1 | 46.5 |
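The tuples in this table are sequences of IoU thresholds applied stage by stage. One way to read them: each cascade stage keeps only the proposals whose overlap with the ground truth exceeds that stage's threshold, so a rising tuple progressively tightens the sample quality. The sketch below illustrates only this filtering idea; the name `cascaded_resampling` is hypothetical, and in the actual CG-RPN each stage also refines the surviving proposals rather than merely filtering them.

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def cascaded_resampling(proposals, gt_box, thresholds=(0.3, 0.5, 0.7)):
    """Filter proposals stage by stage with an increasing IoU-threshold
    tuple, as compared in the table above."""
    surviving = list(proposals)
    for t in thresholds:
        surviving = [p for p in surviving if iou(p, gt_box) > t]
    return surviving
```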
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, Y.; Zhang, X.; Fang, F. Automatic Fabric Defect Detection Using Cascaded Mixed Feature Pyramid with Guided Localization. Sensors 2020, 20, 871. https://doi.org/10.3390/s20030871