Mining architectural information: A systematic mapping study

Empirical Software Engineering

Abstract

Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information (e.g., architectural models) to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, it remains unclear what literature on mining architectural information is available. This makes it difficult for practitioners to understand and adopt state-of-the-art research results, such as which approaches should be adopted to mine which architectural information in order to support architecting activities. It also hinders researchers from being aware of the challenges and remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. A Systematic Mapping Study (SMS) was conducted on the literature published between January 2006 and December 2022. Across the 104 primary studies finally selected, 7 categories of architectural information have been mined, among which architectural description is the most mined; 11 categories of sources have been leveraged for mining architectural information, among which version control systems (e.g., GitHub) are the most popular source; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported activity; 95 approaches and 56 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified.
This SMS provides researchers with promising future directions and helps practitioners understand which approaches and tools can be used to mine which architectural information from which sources to support various architecting activities.

Data Availability Statement

The Supplementary Material of the current study is available in the Zenodo repository at de Dieu et al. (2024).

Notes

  1. https://www.maxqda.com/

  2. https://essere.disco.unimib.it/wiki/arcan/

  3. https://depfind.sourceforge.io/

References

  • Alon U, Sadaka R, Levy O, Yahav E (2020) Structural language models of code. In: Proceedings of the 37th International conference on machine learning (ICML), pp 245–256

  • Alves V, Niu N, Alves C, Valença G (2010) Requirements engineering for software product lines: A systematic literature review. Information and Software Technology 52(8):806–820

  • Ampatzoglou A, Bibi S, Avgeriou P, Verbeek M, Chatzigeorgiou A (2019) Identifying, categorizing and mitigating threats to validity in software engineering secondary studies. Inf Softw Technol 106:201–230

  • Bass L, Clements P, Kazman R (2012) Software Architecture in Practice, 3rd edn. Addison-Wesley Professional

  • Bavota G, Gethers M, Oliveto R, Poshyvanyk D, De Lucia A (2014) Improving software modularization via automated analysis of latent topics and dependencies. ACM Trans Softw Eng Methodol 23(1):1–33

  • Bedjeti A, Lago P, Lewis GA, De Boer RD, Hilliard R (2017) Modeling context with an architecture viewpoint. In: Proceedings of the 1st IEEE International Conference on Software Architecture (ICSA), pp 117–120

  • Bengtsson P, Lassing N, Bosch J, van Vliet H (2004) Architecture-level modifiability analysis (ALMA). J Syst Softw 69(1–2):129–147

  • Bhat M, Shumaiev K, Biesdorf A, Hohenstein U, Matthes F (2017) Automatic extraction of design decisions from issue management systems: a machine learning based approach. In: Proceedings of the 11th European Conference on Software Architecture (ECSA), pp 138–154

  • Bi T, Liang P, Tang A (2018) Architecture patterns, quality attributes, and design contexts: How developers design with them? In: Proceedings of the 25th Asia-Pacific Software Engineering Conference (APSEC), pp 49–58

  • Bi T, Liang P, Tang A, Xia X (2021) Mining architecture tactics and quality attributes knowledge in stack overflow. J Syst Softw 180:111005

  • Borrego G, Morán AL, Palacio RR, Vizcaíno A, García FO (2019) Towards a reduction in architectural knowledge vaporization during agile global software development. Inf Softw Technol 112:68–82

  • Campbell JL, Quincy C, Osserman J, Pedersen OK (2013) Coding in-depth semistructured interviews: Problems of unitization and intercoder reliability and agreement. Sociol Methods Res 42(3):294–320

  • Canfora G, De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S (2015) Defect prediction as a multiobjective optimization problem. Software Testing, Verification and Reliability 25(4):426–459

  • Capilla R, Jansen A, Tang A, Avgeriou P, Babar MA (2016) 10 years of software architecture knowledge management: Practice and future. J Syst Softw 116:191–205

  • Casamayor A, Godoy D, Campo M (2012) Functional grouping of natural language requirements for assistance in architectural software design. Knowledge-Based Systems 30:78–86

  • Casamayor A, Godoy D, Campo M (2012) Mining textual requirements to assist architectural software design: A state of the art review. Artif Intell Rev 38(3):173–191

  • Cervantes H, Kazman R (2016) Designing software architectures: a practical approach. Addison-Wesley Professional

  • Chaabane M, Rodriguez IB, Drira K, Jmaiel M (2017) Mining approach for software architectures’ description discovery. In: Proceedings of the 14th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp 879–886

  • Chang Y, Wang X, Wang J, Wu Y, Yang L, Zhu K, Chen H, Yi X, Wang C, Wang Y, et al. (2024) A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology

  • Chen L, Babar MA, Zhang H (2010) Towards an evidence-based understanding of electronic data sources. In: Proceedings of the 14th International Conference on Evaluation and Assessment in Software Engineering (EASE), pp 1–4

  • Chen Z, Jiang R, Zhang Z, Pei Y, Pan M, Zhang T, Li X (2020) Enhancing example-based code search with functional semantics. J Syst Softw 165:110568

  • Ciniselli M, Cooper N, Pascarella L, Poshyvanyk D, Di Penta M, Bavota G (2021) An empirical study on the usage of BERT models for code completion. In: Proceedings of the 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), pp 108–119

  • Clements P, Garlan D, Little R, Nord R, Stafford J (2003) Documenting software architectures: Views and beyond. In: Proceedings of the 25th International Conference on Software Engineering (ICSE), pp 740–741

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1):37–46

  • Dąbrowski J, Letier E, Perini A, Susi A (2022) Analysing app reviews for software engineering: a systematic literature review. Empir Softw Eng 27(2):43

  • de Dieu MJ, Liang P, Shahin M (2022) How do developers search for architectural information? an industrial survey. In: Proceedings of the 19th IEEE International Conference on Software Architecture (ICSA), pp 58–68

  • de Dieu MJ, Liang P, Shahin M (2024) Supplementary Material for the Paper: Mining Architectural Information: A Systematic Mapping Study. https://doi.org/10.5281/zenodo.10354000

  • Ding W, Liang P, Tang A, van Vliet H, Shahin M (2014) How do open source communities document software architecture: An exploratory survey. In: Proceedings of the 19th International Conference on Engineering of Complex Computer Systems (ICECCS), pp 136–145

  • Ding W, Liang P, Tang A, van Vliet H (2015) Understanding the causes of architecture changes using OSS mailing lists. International Journal of Software Engineering and Knowledge Engineering 25(9–10):1633–1651

  • Dinh T, Zhao J, Tan S, Negrinho R, Lausen L, Zha S, Karypis G (2023) Large language models of code fail at completing code with potential bugs. In: Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS), pp 1–27

  • do Nascimento Vale L, Maia MdA (2015) Keecle: Mining key architecturally relevant classes using dynamic analysis. In: Proceedings of the 31st IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 566–570

  • Ducasse S, Pollet D (2009) Software architecture reconstruction: A process-oriented taxonomy. IEEE Trans Softw Eng 35(4):573–591

  • Garcia J, Mirakhorli M, Xiao L, Zhao Y, Mujhid I, Pham K, Okutan A, Malek S, Kazman R, Cai Y, Medvidovic N (2021) Constructing a shared infrastructure for software architecture analysis and maintenance. In: Proceedings of the 18th IEEE International Conference on Software Architecture (ICSA), pp 150–161

  • Guha R, McCool R, Miller E (2003) Semantic search. In: Proceedings of the 12th International Conference on World Wide Web (WWW), pp 700–709

  • Guo J, Fan Y, Pang L, Yang L, Ai Q, Zamani H, Wu C, Croft WB, Cheng X (2020) A deep look into neural ranking models for information retrieval. Inf Process Manag 57(6):102067

  • Harbo SKR, Voldby EP, Madsen J, Albano M (2024) Acsmt: A plugin for eclipse papyrus to model systems of systems. Science of Computer Programming 231:103008

  • Hassan AE (2008) The road ahead for mining software repositories. In: Proceedings of the 2008 Frontiers of Software Maintenance (FoSM), pp 48–57

  • Hofmeister C, Kruchten P, Nord RL, Obbink H, Ran A, America P (2007) A general model of software architecture design derived from five industrial approaches. J Syst Softw 80(1):106–126

  • Hull E, Jackson K, Dick J (2005) Requirements Engineering in the Solution Domain. Springer

  • ISO/IEC/IEEE (2011) Systems and Software Engineering - Architecture Description. ISO/IEC/IEEE 42010:2011(E) (Revision of ISO/IEC 42010:2007 and IEEE Std 1471-2000) pp 1–46

  • Jansen A, Bosch J (2005) Software architecture as a set of architectural design decisions. In: Proceedings of the 5th IEEE/IFIP Working Conference on Software Architecture (WICSA), pp 109–120

  • Jansen A, Avgeriou P, van der Ven JS (2009) Enriching software architecture documentation. J Syst Softw 82(8):1232–1248

  • Kazman R, Cai Y, Mo R, Feng Q, Xiao L, Haziyev S, Fedak V, Shapochka A (2015) A case study in locating the architectural roots of technical debt. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE), pp 179–188

  • Kitchenham B, Charters S, et al. (2007) Guidelines for performing systematic literature reviews in software engineering

  • Kotsiantis SB, Zaharakis ID, Pintelas PE (2006) Machine learning: A review of classification and combining techniques. Artif Intell Rev 26(3):159–190

  • Koziolek H, Domis D, Goldschmidt T, Vorst P (2013) Measuring architecture sustainability. IEEE Softw 30(6):54–62

  • Kruchten P (1995) The 4+1 view model of architecture. IEEE Softw 12(6):42–50

  • Kruchten P (2004) An ontology of architectural design decisions in software-intensive systems. In: Proceedings of the 2nd Groningen Workshop on Software Variability Management (SVM), pp 54–61

  • Li M, Yang Y, Shi L, Wang Q, Hu J, Peng X, Liao W, Pi G (2020) Automated extraction of requirement entities by leveraging LSTM-CRF and transfer learning. In: Proceedings of the 36th IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 208–219

  • Li R, Liang P, Soliman M, Avgeriou P (2022) Understanding software architecture erosion: A systematic mapping study. Journal of Software: Evolution and Process 34(3):e2423

  • Li Z, Liang P, Avgeriou P (2013) Application of knowledge-based approaches in software architecture: A systematic mapping study. Inf Softw Technol 55(5):777–794

  • Li Z, Liang P, Avgeriou P (2014) Architectural debt management in value-oriented architecting. In: Economics-Driven Software Architecture, Elsevier, pp 183–204

  • Li Z, Avgeriou P, Liang P (2015) A systematic mapping study on technical debt and its management. J Syst Softw 101:193–220

  • Liu F, Li G, Zhao Y, Jin Z (2020) Multi-task learning based pre-trained language model for code completion. In: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp 473–485

  • Mahadi A, Tongay K, Ernst NA (2020) Cross-dataset design discussion mining. In: Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp 149–160

  • Malavolta I, Lago P, Muccini H, Pelliccione P, Tang A (2013) What industry needs from architectural languages: A survey. IEEE Trans Softw Eng 39:869–891

  • Malavolta I, Lewis GA, Schmerl B, Lago P, Garlan D (2021) Mining guidelines for architecting robotics software. J Syst Softw 178:110969

  • Mirakhorli M, Carvalho J, Cleland-Huang J, Mäder P (2013) A domain-centric approach for recommending architectural tactics to satisfy quality concerns. In: Proceedings of the 3rd International Workshop on the Twin Peaks of Requirements and Architecture (TwinPeaks), pp 1–8

  • Nafi KW, Kar TS, Roy B, Roy CK, Schneider KA (2019) CLCDSA: cross language code clone detection using syntactical features and API documentation. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp 1026–1037

  • Naghdipour A, Hasheminejad SMH (2023) Implications of semi-supervised learning for design pattern selection. Software Quality Journal 31(3):809–842

  • Nanda SJ, Panda G (2014) A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm and Evolutionary Computation 16:1–18

  • Nazar N, Hu Y, Jiang H (2016) Summarizing software artifacts: A literature review. J Comput Sci Technol 31(5):883–909

  • Nguyen HA, Nguyen TN, Dig D, Nguyen S, Tran H, Hilton M (2019) Graph-based mining of in-the-wild, fine-grained, semantic code change patterns. In: Proceedings of the 41st IEEE/ACM International Conference on Software Engineering (ICSE), pp 819–830

  • Perry DE, Wolf AL (1992) Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes 17(4):40–52

  • Petersen K, Vakkalanka S, Kuzniarz L (2015) Guidelines for conducting systematic mapping studies in software engineering: An update. Inf Softw Technol 64:1–18

  • Rocha L, Andrade R, Britto R, et al. (2017) Preventing erosion in exception handling design using static-architecture conformance checking. In: Proceedings of the 11th European Conference on Software Architecture (ECSA), Canterbury, United Kingdom, pp 67–83

  • Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681

  • Schmitt Laser M, Medvidovic N, Le DM, Garcia J (2020) Arcade: an extensible workbench for architecture recovery, change, and decay evaluation. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), pp 1546–1550

  • Shahbazian A, Lee YK, Le DM, Medvidović N (2018) Recovering architectural design decisions. In: Proceedings of the 15th IEEE International Conference on Software Architecture (ICSA), pp 95–104

  • Shahin M, Liang P, Li Z (2013) Recovering software architectural knowledge from documentation using conceptual model. In: Proceedings of the 25th International Conference on Software Engineering and Knowledge Engineering (SEKE), pp 556–561

  • Shahin M, Liang P, Li Z (2014) Do architectural design decisions improve the understanding of software architecture? two controlled experiments. In: Proceedings of the 22nd International Conference on Program Comprehension (ICPC), pp 3–13

  • Shaw M, Clements P (2006) The golden age of software architecture. IEEE Softw 23(2):31–39

  • Singhal A (2001) Modern information retrieval: A brief overview. IEEE Data Eng Bull 24(4):35–43

  • Soliman M, Galster M, Salama AR, Riebisch M (2016) Architectural knowledge for technology decisions in developer communities: An exploratory study with stackoverflow. In: Proceedings of the 13th Working IEEE/IFIP Conference on Software Architecture (WICSA), pp 128–133

  • Soliman M, Galster M, Riebisch M (2017) Developing an ontology for architecture knowledge from developer communities. In: Proceedings of the 14th IEEE International Conference on Software Architecture (ICSA), pp 89–92

  • Soliman M, Galster M, Avgeriou P (2021) An exploratory study on architectural knowledge in issue tracking systems. In: Proceedings of the 15th European Conference on Software Architecture (ECSA), pp 117–133

  • Soliman M, Malavolta I, Mirakhorli M (2021) Preface of the 1st international workshop on mining software repositories for software architecture (MSR4SA’21). In: Proceedings of the 15th European Conference on Software Architecture-Companion (ECSA-C), pp 1–2

  • Soliman M, Wiese M, Li Y, Riebisch M, Avgeriou P (2021) Exploring web search engines to find architectural knowledge. In: Proceedings of the 18th IEEE International Conference on Software Architecture (ICSA), pp 162–172

  • Souza E, Moreira A, Goulão M (2019) Deriving architectural models from requirements specifications: A systematic mapping study. Inf Softw Technol 109:26–39

  • Stevanetic S, Zdun U (2014) Exploring the relationships between the understandability of components in architectural component models and component level metrics. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering (EASE), Gothenburg, Sweden, pp 1–10

  • Tang A, Avgeriou P, Jansen A, Capilla R, Babar MA (2010) A comparative study of architecture knowledge management tools. J Syst Softw 83(3):352–370

  • Tavakoli M, Zhao L, Heydari A, Nenadić G (2018) Extracting useful software development information from mobile application reviews: A survey of intelligent mining techniques and tools. Expert Syst Appl 113:186–199

  • Tzerpos V, Holt RC (2000) ACDC: An algorithm for comprehension-driven clustering. In: Proceedings of the 7th Working Conference on Reverse Engineering (WCRE), pp 258–267

  • Velasco-Elizondo P, Marín-Piña R, Vazquez-Reyes S, Mora-Soto A, Mejia J (2016) Knowledge representation and information extraction for analysing architectural patterns. Science of Computer Programming 121:176–189

  • Wang S, Liu T, Nam J, Tan L (2018) Deep semantic feature learning for software defect prediction. IEEE Trans Softw Eng 46(12):1267–1293

  • Weinreich R, Buchgeher G (2012) Towards supporting the software architecture life cycle. J Syst Softw 85(3):546–561

  • Williams BJ, Carver JC (2010) Characterizing software architecture changes: A systematic review. Inf Softw Technol 52(1):31–51

  • Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering (EASE), pp 1–10

  • Wohlin C, Höst M, Henningsson K (2003) Empirical research methods in software engineering. In: Empirical Methods and Studies in Software Engineering, pp 7–23

  • Yang X, Song Z, King I, Xu Z (2023) A survey on deep semi-supervised learning. IEEE Trans Knowl Data Eng 35(9):8934–8954

  • Yang Y, Xia X, Lo D, Bi T, Grundy J, Yang X (2022) Predictive models in software engineering: Challenges and opportunities. ACM Transactions on Software Engineering and Methodology 31(3):1–72

  • Yu L, Liu H (2003) Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proceedings of the 20th International Conference on Machine Learning (ICML), pp 856–863

  • Zhao L, Alhoshan W, Ferrari A, Letsholo KJ, Ajagbe MA, Chioasca EV, Batista-Navarro RT (2021) Natural language processing for requirements engineering: a systematic mapping study. ACM Computing Surveys 54(3):1–41

  • Zogaan W, Mujhid I, Santos JCS, Gonzalez D, Mirakhorli M (2017) Automated training-set creation for software architecture traceability problem. Empir Softw Eng 22(3):1028–1062

Acknowledgements

This work is partially sponsored by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62172311 and 62176099, the Special Fund of Hubei Luojia Laboratory, the China Scholarship Council, Shenzhen Polytechnic University under Grant No. 6022312043K, and the State Key Laboratory for Novel Software Technology at Nanjing University under Grant No. KFKT2022B37.

Author information

Corresponding author

Correspondence to Peng Liang.

Additional information

Communicated by: Neil Ernst.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jean de Dieu, M., Liang, P., Shahin, M. et al. Mining architectural information: A systematic mapping study. Empir Software Eng 29, 79 (2024). https://doi.org/10.1007/s10664-024-10480-6

  • DOI: https://doi.org/10.1007/s10664-024-10480-6
