Explainable Image Similarity: Integrating Siamese Networks and Grad-CAM
Figure 1. Architecture of the proposed framework.
Figure 2. Application of the proposed framework on the flowers dataset. (a) Original input image 1, (b) factual explanations on image 1, (c) counterfactual explanations on image 1, (d) original input image 2, (e) factual explanations on image 2, (f) counterfactual explanations on image 2.
Figure 3. Application of the proposed framework on the skin cancer dataset. (a) Original input image 1, (b) factual explanations on image 1, (c) counterfactual explanations on image 1, (d) original input image 2, (e) factual explanations on image 2, (f) counterfactual explanations on image 2.
Figure 4. Application of the proposed framework on the AirBnB dataset. (a) Original input image 1, (b) factual explanations on image 1, (c) counterfactual explanations on image 1, (d) original input image 2, (e) factual explanations on image 2, (f) counterfactual explanations on image 2.
Figure 5. (a) Original image, (b) bounding box, (c) cropped image.
Figure 6. (a) Pair of images for the same class obtained from the original dataset, (b) corresponding cropped images.
Abstract
1. Introduction
- We propose the concept of “explainable image similarity”, highlighting the need to provide human-understandable explanations for image similarity tasks.
- We propose a new conceptual framework for explainable image similarity, which integrates Siamese networks with the Grad-CAM technique and is able to provide reliable, transparent and interpretable decisions on image similarity tasks.
- The proposed framework produces factual and counterfactual explanations, which provide valuable insights and can be used to make useful recommendations.
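The core mechanism behind these contributions, namely a shared-weight encoder whose embedding distance serves as the similarity score, trained with a contrastive loss, can be sketched in a few lines. The toy linear encoder `embed` and the tensor shapes below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def embed(x, W):
    # Shared-weight encoder: both branches of the Siamese network apply
    # the SAME weights W (here a single linear layer + ReLU, for brevity).
    return np.maximum(W @ x, 0.0)

def contrastive_loss(d, y, margin=1.0):
    # y = 1 for a similar pair, y = 0 for a dissimilar pair.
    # Similar pairs are pulled together (d^2 term); dissimilar pairs are
    # pushed apart until their distance exceeds the margin.
    return y * d**2 + (1 - y) * max(margin - d, 0.0)**2

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))          # shared encoder weights
x1, x2 = rng.standard_normal(8), rng.standard_normal(8)

# Similarity score = Euclidean distance between the two embeddings.
d = np.linalg.norm(embed(x1, W) - embed(x2, W))
loss_similar = contrastive_loss(d, y=1)
loss_dissimilar = contrastive_loss(d, y=0)
```

Because the weights are shared, the learned notion of similarity is symmetric by construction: swapping the two input images leaves the distance unchanged.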
2. Related Work
3. Explainable Image Similarity
3.1. Background
3.2. Proposed Framework
- Counterfactual explanations: Identifying the regions that would make the network change its prediction can highlight concepts that confuse the model. By removing those concepts, the model’s decisions may become more accurate or more confident.
- Bias evaluation of the model’s decisions: If the Siamese model performs well on both training and testing data (unbiased model), Grad-CAM heatmaps can be used to visualize the features that significantly impact the model’s decisions. In contrast, if the Siamese model performs well on the training data but fails to generalize (biased model), Grad-CAM heatmaps can be used to identify unwanted features on which the model focuses.
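Both kinds of explanation come from the same Grad-CAM computation: the last convolutional layer's activation maps are weighted by their globally pooled gradients and passed through a ReLU, and the counterfactual variant simply negates the gradients first, surfacing regions that would push the prediction the other way. A minimal sketch, with random tensors standing in for real activations and gradients:

```python
import numpy as np

def grad_cam(activations, gradients, counterfactual=False):
    """activations, gradients: arrays of shape (K, H, W) taken from the
    last conv layer. The counterfactual map negates the gradients,
    highlighting regions that would change the model's decision."""
    if counterfactual:
        gradients = -gradients
    weights = gradients.mean(axis=(1, 2))       # global-average-pool per channel
    cam = (weights[:, None, None] * activations).sum(axis=0)
    cam = np.maximum(cam, 0.0)                  # ReLU keeps positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                        # normalize to [0, 1] for overlaying
    return cam

rng = np.random.default_rng(1)
acts = rng.random((8, 7, 7))                    # stand-in conv activations
grads = rng.standard_normal((8, 7, 7))          # stand-in gradients
factual = grad_cam(acts, grads)
counterf = grad_cam(acts, grads, counterfactual=True)
```

In practice the activations and gradients would be captured with forward/backward hooks on the Siamese backbone rather than sampled randomly.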
4. Application of Proposed Framework and Use Case Scenarios
- Flowers. This dataset contains 4242 images (320 × 240) of flowers, which were categorized into five classes: “chamomile”, “tulip”, “rose”, “sunflower” and “dandelion”.
- Skin cancer. This dataset contains images (224 × 224) of skin lesions: 1400 malignant and 1400 benign cases.
- AirBnB. This few-shot dataset is composed of 864 interior and exterior house pictures (600 × 400) scraped from AirBnB across three cities, which were classified into 13 classes: “backyard”, “basement”, “bathroom”, “bedroom”, “decor”, “dining-room”, “entrance”, “house-exterior”, “kitchen”, “living-room”, “outdoor”, “staircase” and “TV room”.
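All three datasets are class-labeled, so Siamese training pairs must be generated from them: same-class pairs labeled similar, cross-class pairs labeled dissimilar. A hypothetical `make_pairs` helper sketches this; the file names and the 50/50 positive/negative balance are illustrative assumptions, not the paper's sampling scheme:

```python
import random

def make_pairs(samples, n_pairs, seed=0):
    """samples: list of (image_path, class_label).
    Returns (path_a, path_b, y) triples with y = 1 for same-class pairs
    and y = 0 for different-class pairs, alternating 50/50."""
    rng = random.Random(seed)
    by_class = {}
    for path, label in samples:
        by_class.setdefault(label, []).append(path)
    labels = list(by_class)
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:
            # Positive pair: two distinct images of the same class.
            label = rng.choice([l for l in labels if len(by_class[l]) >= 2])
            a, b = rng.sample(by_class[label], 2)
            pairs.append((a, b, 1))
        else:
            # Negative pair: one image from each of two different classes.
            la, lb = rng.sample(labels, 2)
            pairs.append((rng.choice(by_class[la]), rng.choice(by_class[lb]), 0))
    return pairs

# Toy two-class dataset with hypothetical file names.
dataset = [(f"rose_{i}.jpg", "rose") for i in range(3)] + \
          [(f"tulip_{i}.jpg", "tulip") for i in range(3)]
pairs = make_pairs(dataset, n_pairs=4)
```

Balancing positives and negatives matters for the contrastive loss: an unbalanced pair set biases the margin term and skews the learned distance.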
4.1. Flowers Dataset
4.2. Skin Cancer Dataset
4.3. Airbnb Dataset
4.4. Improving the Siamese Model’s Performance
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Dataset | Accuracy | AUC | Precision | Recall |
|---|---|---|---|---|
| Original | 87.15% | 0.872 | 0.890 | 0.872 |
| “Cropped” | 88.31% | 0.883 | 0.900 | 0.880 |
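The accuracy, precision, and recall reported above follow directly from the pair-level confusion matrix, with the positive class being “similar”. A self-contained sketch on synthetic predictions (not the paper's data):

```python
import numpy as np

def pair_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary similar/dissimilar
    decisions (positive class = 'similar')."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())  # true positives
    fp = int(((y_true == 0) & (y_pred == 1)).sum())  # false positives
    fn = int(((y_true == 1) & (y_pred == 0)).sum())  # false negatives
    accuracy = float((y_true == y_pred).mean())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

acc, prec, rec = pair_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

AUC, by contrast, is computed from the raw distance scores before thresholding, which is why it can differ from accuracy even when the two track each other closely in the table.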
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Livieris, I.E.; Pintelas, E.; Kiriakidou, N.; Pintelas, P. Explainable Image Similarity: Integrating Siamese Networks and Grad-CAM. J. Imaging 2023, 9, 224. https://doi.org/10.3390/jimaging9100224