Abstract
The pursuit of efficiency in code review has intensified, prompting a wave of research focused on automating code review comment generation. However, the existing body of research is fragmented, characterized by disparate approaches to task formats, factor selection, and dataset processing. This variability often leads to an emphasis on refining model architectures, overshadowing the critical roles of factor selection and representation. To bridge these gaps, we assembled a comprehensive dataset that includes not only the primary factors identified in previous studies but also additional pertinent data. Using this dataset, we assessed the impact of various factors and their representations on two leading computational paradigms: fine-tuning pre-trained models and prompting large language models. Our investigation also examines the potential benefits and drawbacks of incorporating abstract syntax trees to represent code change structures. Our results reveal that: (1) the impact of factors varies between computational paradigms, and their representations can interact in complex ways; (2) integrating a code structure graph can strengthen a model's grasp of code content, yet may impair the understanding capabilities of language models; and (3) strategically combining factors can elevate basic models to outperform those specifically pre-trained for the task. These insights are pivotal for steering future research in code review automation.
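To make the AST-based representation concrete, the sketch below shows one plausible way to pair a code change with a flattened structural view of its revised code before handing both to a comment-generation model. This is an illustrative assumption rather than the paper's exact pipeline: it uses Python's standard `ast` module, and the separator tokens `<old>`, `<new>`, and `<ast>` are hypothetical.

```python
import ast

def flatten_ast(source: str) -> str:
    """Serialize a snippet's AST as a flat sequence of node-type names."""
    tree = ast.parse(source)
    return " ".join(type(node).__name__ for node in ast.walk(tree))

def build_model_input(old_code: str, new_code: str) -> str:
    # The <old>/<new>/<ast> separators are hypothetical placeholders,
    # not tokens defined by the paper.
    return f"<old> {old_code} <new> {new_code} <ast> {flatten_ast(new_code)}"

print(build_model_input("def f(x): return x", "def f(x): return x + 1"))
```

Flattening the tree into a node-type sequence is one common way to let a sequence model consume structure; graph encoders over the same tree are an alternative, with the trade-offs noted in finding (2).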
Notes
We also used the \(CRCN_{long}\) dataset for additional experiments, which revealed interesting insights about input token length and minor differences in the results for RQ1–3. However, because the text length criterion for the \(CRCN_{short}\) dataset is more consistent with those commonly used in prior research, facilitating a fair comparison, and because the general findings from both datasets were similar, we report only the \(CRCN_{short}\) results here to avoid repetition (an illustrative sketch of such length-based filtering follows these notes). Comprehensive results are available in our open-source repository.
Complete results, including analyses of 29 diverse factor combinations across two datasets and paradigms, are available in the appendix of our open-source repository.
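As a concrete illustration of the length criterion behind the \(CRCN_{short}\)/\(CRCN_{long}\) split, the sketch below partitions (diff, comment) pairs by tokenized input length. The CodeT5 checkpoint and the 512-token cutoff are assumptions for illustration, not the paper's exact filtering rules.

```python
from transformers import AutoTokenizer

# An assumed checkpoint for token counting; the paper's tokenizer may differ.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")

def split_by_length(samples, max_tokens=512):
    """Route each (diff, comment) pair to a short or long subset.

    The 512-token cutoff is an assumed threshold, not the paper's criterion.
    """
    crcn_short, crcn_long = [], []
    for diff, comment in samples:
        n_tokens = len(tokenizer(diff)["input_ids"])
        (crcn_short if n_tokens <= max_tokens else crcn_long).append((diff, comment))
    return crcn_short, crcn_long
```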
Acknowledgments
This work was supported by the National Key Research and Development Program of China (Nos. 2023YFB3307202 and 2021YFC3340204) and the Alliance of International Science Organizations Collaborative Research Program (No. ANSO-CR-KP-2022-03).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, J., Li, Z., Shen, C. et al. Exploring the impact of code review factors on the code review comment generation. Autom Softw Eng 31, 71 (2024). https://doi.org/10.1007/s10515-024-00469-2