DOI: 10.1145/3540250.3549099
research-article
Open access

AUGER: automatically generating review comments with pre-training models

Published: 09 November 2022

Abstract

Code review is one of the best practices for safeguarding software quality. In practice, senior or highly skilled reviewers inspect source code and provide constructive comments, considering what authors may have overlooked, such as special cases. This collaborative validation between contributors yields higher-quality code with fewer bugs. However, because personal knowledge is limited and varies across reviewers, the efficiency and effectiveness of code review practice still leave room for improvement: delivering useful review comments remains a colossal, time-consuming effort. This paper explores a synergy of multiple practical review comments to enhance code review and proposes AUGER (AUtomatically GEnerating Review comments), a review-comment generator built on pre-training models. We first collect empirical review data from 11 notable Java projects and construct a dataset of 10,882 code changes. By leveraging Text-to-Text Transfer Transformer (T5) models, the framework synthesizes valuable knowledge during training and outperforms baselines by 37.38% in ROUGE-L. Judged by the criteria of prior studies, 29% of our automatically generated review comments are considered useful. Inference completes in just 20 seconds, and the model remains open to further training. A detailed case study further confirms these performance improvements.
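Two short sketches may help make the abstract concrete. First, the training signal: AUGER builds on T5, so at its core is a sequence-to-sequence mapping from a code change to a reviewer's comment. The snippet below is only a minimal illustration of that idea using the Hugging Face transformers API; the "review:" prompt prefix, the t5-base checkpoint, and the toy training pair are assumptions for illustration, not the authors' exact pipeline.

    # Minimal T5 fine-tuning/inference sketch for (code change -> review comment).
    # Assumptions: Hugging Face transformers, t5-base, an illustrative prompt format.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base")

    # One illustrative training pair; real training would iterate over the
    # 10,882 collected code changes with an optimizer such as AdamW.
    diff = "public int div(int a, int b) { return a / b; }"
    comment = "Consider handling the case where b is zero."

    inputs = tokenizer("review: " + diff, return_tensors="pt", truncation=True)
    labels = tokenizer(comment, return_tensors="pt", truncation=True).input_ids

    loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()  # an optimizer step would follow in a real training loop

    # Inference: generate a review comment for a code change.
    generated = model.generate(**inputs, max_length=64)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))

Second, the evaluation metric: ROUGE-L scores a generated comment by the longest common subsequence (LCS) it shares with the human reference, so word order matters but the matched words need not be contiguous. A compact F1-style version is sketched below; real evaluations typically use an established package such as rouge-score.

    # ROUGE-L as an LCS-based F1 between candidate and reference token sequences.
    def lcs_len(a, b):
        # classic dynamic-programming longest-common-subsequence length
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
        return dp[-1][-1]

    def rouge_l(candidate, reference):
        c, r = candidate.split(), reference.split()
        lcs = lcs_len(c, r)
        if lcs == 0:
            return 0.0
        precision, recall = lcs / len(c), lcs / len(r)
        return 2 * precision * recall / (precision + recall)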


Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN: 9781450394130
DOI: 10.1145/3540250
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Code Review
  2. Machine Learning
  3. Review Comments
  4. Text Generation

Qualifiers

  • Research-article

Funding Sources

  • Strategy Priority Research Program of Chinese Academy of Sciences
  • National Key R&D Program of China
  • Chinese Academy of Sciences-Dongguan Science and Technology Service Network Plan

Conference

ESEC/FSE '22

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%


Cited By

  • (2024) Analysing Quality Metrics and Automated Scoring of Code Reviews. Software, 3(4), 514–533. https://doi.org/10.3390/software3040025. Online publication date: 29-Nov-2024.
  • (2024) Evaluating Source Code Quality with Large Language Models: a comparative study. In Proceedings of the XXIII Brazilian Symposium on Software Quality, 103–113. https://doi.org/10.1145/3701625.3701650. Online publication date: 5-Nov-2024.
  • (2024) Large Language Models for Software Engineering: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology, 33(8), 1–79. https://doi.org/10.1145/3695988. Online publication date: 20-Sep-2024.
  • (2024) AI-Assisted Assessment of Coding Practices in Modern Code Review. In Proceedings of the 1st ACM International Conference on AI-Powered Software, 85–93. https://doi.org/10.1145/3664646.3665664. Online publication date: 10-Jul-2024.
  • (2024) Towards AI-Assisted Synthesis of Verified Dafny Methods. Proceedings of the ACM on Software Engineering, 1(FSE), 812–835. https://doi.org/10.1145/3643763. Online publication date: 12-Jul-2024.
  • (2024) DivLog: Log Parsing with Prompt Enhanced In-Context Learning. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–12. https://doi.org/10.1145/3597503.3639155. Online publication date: 20-May-2024.
  • (2024) UniLog: Automatic Logging via LLM and In-Context Learning. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–12. https://doi.org/10.1145/3597503.3623326. Online publication date: 20-May-2024.
  • (2024) Towards Efficient Fine-Tuning of Language Models With Organizational Data for Automated Software Review. IEEE Transactions on Software Engineering, 50(9), 2240–2253. https://doi.org/10.1109/TSE.2024.3428324. Online publication date: 15-Jul-2024.
  • (2024) Characterizing the Prevalence, Distribution, and Duration of Stale Reviewer Recommendations. IEEE Transactions on Software Engineering, 50(8), 2096–2109. https://doi.org/10.1109/TSE.2024.3422369. Online publication date: 1-Aug-2024.
  • (2024) Code Review Automation: Strengths and Weaknesses of the State of the Art. IEEE Transactions on Software Engineering, 50(2), 338–353. https://doi.org/10.1109/TSE.2023.3348172. Online publication date: 1-Jan-2024.
