Abstract
Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information (e.g., architectural models) to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, it remains unclear what literature on mining architectural information is available. This makes it difficult for practitioners to understand and adopt state-of-the-art research results, such as which approaches should be adopted to mine which architectural information in order to support architecting activities. It also hinders researchers from becoming aware of the challenges and of the remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of the architectural information and sources mined, the architecting activities supported, the approaches and tools used, and the challenges faced. We conducted a Systematic Mapping Study (SMS) of the literature published between January 2006 and December 2022. Across the 104 primary studies finally selected, 7 categories of architectural information have been mined, among which architectural description is the most frequently mined; 11 categories of sources have been leveraged for mining architectural information, among which version control systems (e.g., GitHub) are the most popular; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported; 95 approaches and 56 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified.
This SMS provides researchers with promising future directions and helps practitioners be aware of which approaches and tools can be used to mine which architectural information from which sources to support various architecting activities.










Data Availability Statement
The Supplementary Material of the current study is available in the Zenodo repository at de Dieu et al. (2024).
References
Alon U, Sadaka R, Levy O, Yahav E (2020) Structural language models of code. In: Proceedings of the 37th International conference on machine learning (ICML), pp 245–256
Alves V, Niu N, Alves C, Valença G (2010) Requirements engineering for software product lines: A systematic literature review. Inf Softw Technol 52(8):806–820
Ampatzoglou A, Bibi S, Avgeriou P, Verbeek M, Chatzigeorgiou A (2019) Identifying, categorizing and mitigating threats to validity in software engineering secondary studies. Inf Softw Technol 106:201–230
Bass L, Clements P, Kazman R (2012) Software Architecture in Practice, 3rd edn. Addison-Wesley Professional
Bavota G, Gethers M, Oliveto R, Poshyvanyk D, De Lucia A (2014) Improving software modularization via automated analysis of latent topics and dependencies. ACM Trans Softw Eng Methodol 23(1):1–33
Bedjeti A, Lago P, Lewis GA, De Boer RD, Hilliard R (2017) Modeling context with an architecture viewpoint. In: Proceedings of the 14th IEEE International Conference on Software Architecture (ICSA), pp 117–120
Bengtsson P, Lassing N, Bosch J, van Vliet H (2004) Architecture-level modifiability analysis (ALMA). J Syst Softw 69(1–2):129–147
Bhat M, Shumaiev K, Biesdorf A, Hohenstein U, Matthes F (2017) Automatic extraction of design decisions from issue management systems: a machine learning based approach. In: Proceedings of the 11th European Conference on Software Architecture (ECSA), pp 138–154
Bi T, Liang P, Tang A (2018) Architecture patterns, quality attributes, and design contexts: How developers design with them? In: Proceedings of the 25th Asia-Pacific Software Engineering Conference (APSEC), pp 49–58
Bi T, Liang P, Tang A, Xia X (2021) Mining architecture tactics and quality attributes knowledge in Stack Overflow. J Syst Softw 180:111005
Borrego G, Morán AL, Palacio RR, Vizcaíno A, García FO (2019) Towards a reduction in architectural knowledge vaporization during agile global software development. Inf Softw Technol 112:68–82
Campbell JL, Quincy C, Osserman J, Pedersen OK (2013) Coding in-depth semistructured interviews: Problems of unitization and intercoder reliability and agreement. Sociol Methods Res 42(3):294–320
Canfora G, De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S (2015) Defect prediction as a multiobjective optimization problem. Software Testing, Verification and Reliability 25(4):426–459
Capilla R, Jansen A, Tang A, Avgeriou P, Babar MA (2016) 10 years of software architecture knowledge management: Practice and future. J Syst Softw 116:191–205
Casamayor A, Godoy D, Campo M (2012) Functional grouping of natural language requirements for assistance in architectural software design. Knowledge-Based Systems 30:78–86
Casamayor A, Godoy D, Campo M (2012) Mining textual requirements to assist architectural software design: A state of the art review. Artif Intell Rev 38(3):173–191
Cervantes H, Kazman R (2016) Designing software architectures: a practical approach. Addison-Wesley Professional
Chaabane M, Rodriguez IB, Drira K, Jmaiel M (2017) Mining approach for software architectures’ description discovery. In: Proceedings of the 14th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp 879–886
Chang Y, Wang X, Wang J, Wu Y, Yang L, Zhu K, Chen H, Yi X, Wang C, Wang Y, et al. (2024) A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology
Chen L, Babar MA, Zhang H (2010) Towards an evidence-based understanding of electronic data sources. In: Proceedings of the 14th International Conference on Evaluation and Assessment in Software Engineering (EASE), pp 1–4
Chen Z, Jiang R, Zhang Z, Pei Y, Pan M, Zhang T, Li X (2020) Enhancing example-based code search with functional semantics. J Syst Softw 165:110568
Ciniselli M, Cooper N, Pascarella L, Poshyvanyk D, Di Penta M, Bavota G (2021) An empirical study on the usage of BERT models for code completion. In: Proceedings of the 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), pp 108–119
Clements P, Garlan D, Little R, Nord R, Stafford J (2003) Documenting software architectures: Views and beyond. In: Proceedings of the 25th International Conference on Software Engineering (ICSE), pp 740–741
Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1):37–46
Dąbrowski J, Letier E, Perini A, Susi A (2022) Analysing app reviews for software engineering: a systematic literature review. Empir Softw Eng 27(2):43
de Dieu MJ, Liang P, Shahin M (2022) How do developers search for architectural information? an industrial survey. In: Proceedings of the 19th IEEE International Conference on Software Architecture (ICSA), pp 58–68
de Dieu MJ, Liang P, Shahin M (2024). Supplementary Material for the Paper: Mining Architectural Information: A Systematic Mapping Study. https://doi.org/10.5281/zenodo.10354000
Ding W, Liang P, Tang A, van Vliet H, Shahin M (2014) How do open source communities document software architecture: An exploratory survey. In: Proceedings of the 19th International Conference on Engineering of Complex Computer Systems (ICECCS), pp 136–145
Ding W, Liang P, Tang A, van Vliet H (2015) Understanding the causes of architecture changes using OSS mailing lists. International Journal of Software Engineering and Knowledge Engineering 25(9&10):1633–1651
Dinh T, Zhao J, Tan S, Negrinho R, Lausen L, Zha S, Karypis G (2023) Large language models of code fail at completing code with potential bugs. In: Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS), pp 1–27
do Nascimento Vale L, Maia MdA (2015) Keecle: Mining key architecturally relevant classes using dynamic analysis. In: Proceedings of the 31st IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 566–570
Ducasse S, Pollet D (2009) Software architecture reconstruction: A process-oriented taxonomy. IEEE Trans Softw Eng 35(4):573–591
Garcia J, Mirakhorli M, Xiao L, Zhao Y, Mujhid I, Pham K, Okutan A, Malek S, Kazman R, Cai Y, Medvidovic N (2021) Constructing a shared infrastructure for software architecture analysis and maintenance. In: Proceedings of the 18th IEEE International Conference on Software Architecture (ICSA), pp 150–161
Guha R, McCool R, Miller E (2003) Semantic search. In: Proceedings of the 12th International Conference on World Wide Web (WWW), pp 700–709
Guo J, Fan Y, Pang L, Yang L, Ai Q, Zamani H, Wu C, Croft WB, Cheng X (2020) A deep look into neural ranking models for information retrieval. Inf Process Manag 57(6):102067
Harbo SKR, Voldby EP, Madsen J, Albano M (2024) Acsmt: A plugin for eclipse papyrus to model systems of systems. Science of Computer Programming 231:103008
Hassan AE (2008) The road ahead for mining software repositories. In: Proceedings of the 2008 Frontiers of Software Maintenance (FoSM), pp 48–57
Hofmeister C, Kruchten P, Nord RL, Obbink H, Ran A, America P (2007) A general model of software architecture design derived from five industrial approaches. J Syst Softw 80(1):106–126
Hull E, Jackson K, Dick J (2005) Requirements Engineering in the Solution Domain. Springer
ISO/IEC/IEEE (2011) Systems and Software Engineering - Architecture Description. ISO/IEC/IEEE 42010:2011(E) (Revision of ISO/IEC 42010:2007 and IEEE Std 1471-2000) pp 1–46
Jansen A, Bosch J (2005) Software architecture as a set of architectural design decisions. In: Proceedings of the 5th IEEE/IFIP Working Conference on Software Architecture (WICSA), pp 109–120
Jansen A, Avgeriou P, van der Ven JS (2009) Enriching software architecture documentation. J Syst Softw 82(8):1232–1248
Kazman R, Cai Y, Mo R, Feng Q, Xiao L, Haziyev S, Fedak V, Shapochka A (2015) A case study in locating the architectural roots of technical debt. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE), pp 179–188
Kitchenham B, Charters S, et al. (2007) Guidelines for performing systematic literature reviews in software engineering
Kotsiantis SB, Zaharakis ID, Pintelas PE (2006) Machine learning: A review of classification and combining techniques. Artif Intell Rev 26(3):159–190
Koziolek H, Domis D, Goldschmidt T, Vorst P (2013) Measuring architecture sustainability. IEEE Softw 30(6):54–62
Kruchten P (1995) The 4+1 view model of architecture. IEEE Softw 12(6):42–50
Kruchten P (2004) An ontology of architectural design decisions in software-intensive systems. In: Proceedings of the 2nd Groningen Workshop on Software Variability Management (SVM), pp 54–61
Li M, Yang Y, Shi L, Wang Q, Hu J, Peng X, Liao W, Pi G (2020) Automated extraction of requirement entities by leveraging LSTM-CRF and transfer learning. In: Proceedings of the 36th IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 208–219
Li R, Liang P, Soliman M, Avgeriou P (2022) Understanding software architecture erosion: A systematic mapping study. Journal of Software: Evolution and Process 34(3):e2423
Li Z, Liang P, Avgeriou P (2013) Application of knowledge-based approaches in software architecture: A systematic mapping study. Inf Softw Technol 55(5):777–794
Li Z, Liang P, Avgeriou P (2014) Architectural debt management in value-oriented architecting. In: Economics-Driven Software Architecture, Elsevier, pp 183–204
Li Z, Avgeriou P, Liang P (2015) A systematic mapping study on technical debt and its management. J Syst Softw 101:193–220
Liu F, Li G, Zhao Y, Jin Z (2020) Multi-task learning based pre-trained language model for code completion. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp 473–485
Mahadi A, Tongay K, Ernst NA (2020) Cross-dataset design discussion mining. In: Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp 149–160
Malavolta I, Lago P, Muccini H, Pelliccione P, Tang A (2013) What industry needs from architectural languages: A survey. IEEE Trans Softw Eng 39:869–891
Malavolta I, Lewis GA, Schmerl B, Lago P, Garlan D (2021) Mining guidelines for architecting robotics software. J Syst Softw 178:110969
Mirakhorli M, Carvalho J, Cleland-Huang J, Mäder P (2013) A domain-centric approach for recommending architectural tactics to satisfy quality concerns. In: Proceedings of the 3rd International Workshop on the Twin Peaks of Requirements and Architecture (TwinPeaks), pp 1–8
Nafi KW, Kar TS, Roy B, Roy CK, Schneider KA (2019) CLCDSA: Cross language code clone detection using syntactical features and API documentation. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp 1026–1037
Naghdipour A, Hasheminejad SMH (2023) Implications of semi-supervised learning for design pattern selection. Software Quality Journal 31(3):809–842
Nanda SJ, Panda G (2014) A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm and Evolutionary Computation 16:1–18
Nazar N, Hu Y, Jiang H (2016) Summarizing software artifacts: A literature review. J Comput Sci Technol 31(5):883–909
Nguyen HA, Nguyen TN, Dig D, Nguyen S, Tran H, Hilton M (2019) Graph-based mining of in-the-wild, fine-grained, semantic code change patterns. In: Proceedings of the 41st IEEE/ACM International Conference on Software Engineering (ICSE), pp 819–830
Perry DE, Wolf AL (1992) Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes 17(4):40–52
Petersen K, Vakkalanka S, Kuzniarz L (2015) Guidelines for conducting systematic mapping studies in software engineering: An update. Inf Softw Technol 64:1–18
Rocha L, Andrade R, Britto R, et al. (2017) Preventing erosion in exception handling design using static-architecture conformance checking. In: Proceedings of the 11th European Conference on Software Architecture (ECSA), Canterbury, United Kingdom, pp 67–83
Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681
Schmitt Laser M, Medvidovic N, Le DM, Garcia J (2020) Arcade: an extensible workbench for architecture recovery, change, and decay evaluation. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), pp 1546–1550
Shahbazian A, Lee YK, Le DM, Medvidović N (2018) Recovering architectural design decisions. In: Proceedings of the 15th IEEE International Conference on Software Architecture (ICSA), pp 95–104
Shahin M, Liang P, Li Z (2013) Recovering software architectural knowledge from documentation using conceptual model. In: Proceedings of the 25th International Conference on Software Engineering and Knowledge Engineering (SEKE), pp 556–561
Shahin M, Liang P, Li Z (2014) Do architectural design decisions improve the understanding of software architecture? two controlled experiments. In: Proceedings of the 22nd International Conference on Program Comprehension (ICPC), pp 3–13
Shaw M, Clements P (2006) The golden age of software architecture. IEEE Softw 23(2):31–39
Singhal A (2001) Modern information retrieval: A brief overview. IEEE Data Eng Bull 24(4):35–43
Soliman M, Galster M, Salama AR, Riebisch M (2016) Architectural knowledge for technology decisions in developer communities: An exploratory study with StackOverflow. In: Proceedings of the 13th Working IEEE/IFIP Conference on Software Architecture (WICSA), pp 128–133
Soliman M, Galster M, Riebisch M (2017) Developing an ontology for architecture knowledge from developer communities. In: Proceedings of the 14th IEEE International Conference on Software Architecture (ICSA), pp 89–92
Soliman M, Galster M, Avgeriou P (2021) An exploratory study on architectural knowledge in issue tracking systems. In: Proceedings of the 15th European Conference on Software Architecture (ECSA), pp 117–133
Soliman M, Malavolta I, Mirakhorli M (2021) Preface of the 1st international workshop on mining software repositories for software architecture (MSR4SA’21). In: Proceedings of the 15th European Conference on Software Architecture-Companion (ECSA-C), pp 1–2
Soliman M, Wiese M, Li Y, Riebisch M, Avgeriou P (2021) Exploring web search engines to find architectural knowledge. In: Proceedings of the 18th IEEE International Conference on Software Architecture (ICSA), pp 162–172
Souza E, Moreira A, Goulão M (2019) Deriving architectural models from requirements specifications: A systematic mapping study. Inf Softw Technol 109:26–39
Stevanetic S, Zdun U (2014) Exploring the relationships between the understandability of components in architectural component models and component level metrics. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering (EASE), Gothenburg, Sweden, pp 1–10
Tang A, Avgeriou P, Jansen A, Capilla R, Babar MA (2010) A comparative study of architecture knowledge management tools. J Syst Softw 83(3):352–370
Tavakoli M, Zhao L, Heydari A, Nenadić G (2018) Extracting useful software development information from mobile application reviews: A survey of intelligent mining techniques and tools. Expert Syst Appl 113:186–199
Tzerpos V, Holt RC (2000) ACDC: An algorithm for comprehension-driven clustering. In: Proceedings of the 7th Working Conference on Reverse Engineering (WCRE), pp 258–267
Velasco-Elizondo P, Marín-Piña R, Vazquez-Reyes S, Mora-Soto A, Mejia J (2016) Knowledge representation and information extraction for analysing architectural patterns. Science of Computer Programming 121:176–189
Wang S, Liu T, Nam J, Tan L (2018) Deep semantic feature learning for software defect prediction. IEEE Trans Softw Eng 46(12):1267–1293
Weinreich R, Buchgeher G (2012) Towards supporting the software architecture life cycle. J Syst Softw 85(3):546–561
Williams BJ, Carver JC (2010) Characterizing software architecture changes: A systematic review. Inf Softw Technol 52(1):31–51
Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th international Conference on Evaluation and Assessment in Software Engineering (EASE), pp 1–10
Wohlin C, Höst M, Henningsson K (2003) Empirical research methods in software engineering. In: Empirical Methods and Studies in Software Engineering, pp 7–23
Yang X, Song Z, King I, Xu Z (2023) A survey on deep semi-supervised learning. IEEE Trans Knowl Data Eng 35(9):8934–8954
Yang Y, Xia X, Lo D, Bi T, Grundy J, Yang X (2022) Predictive models in software engineering: Challenges and opportunities. ACM Trans Softw Eng Methodol 31(3):1–72
Yu L, Liu H (2003) Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proceedings of the 20th International Conference on Machine Learning (ICML), pp 856–863
Zhao L, Alhoshan W, Ferrari A, Letsholo KJ, Ajagbe MA, Chioasca EV, Batista-Navarro RT (2021) Natural language processing for requirements engineering: a systematic mapping study. ACM Computing Surveys 54(3):1–41
Zogaan W, Mujhid I, Santos JCS, Gonzalez D, Mirakhorli M (2017) Automated training-set creation for software architecture traceability problem. Empir Softw Eng 22(3):1028–1062
Acknowledgements
This work is partially sponsored by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62172311 and 62176099, the Special Fund of Hubei Luojia Laboratory, the China Scholarship Council, Shenzhen Polytechnic University under Grant No. 6022312043K, and the State Key Laboratory for Novel Software Technology at Nanjing University under Grant No. KFKT2022B37.
Additional information
Communicated by: Neil Ernst.
About this article
Cite this article
Jean de Dieu, M., Liang, P., Shahin, M. et al. Mining architectural information: A systematic mapping study. Empir Software Eng 29, 79 (2024). https://doi.org/10.1007/s10664-024-10480-6
DOI: https://doi.org/10.1007/s10664-024-10480-6