Abstract
Deep learning has made code generation, the automatic translation of natural language into executable code, a central concern of contemporary software engineering. The literature on these methods remains fragmented, however, leaving a knowledge gap that calls for a systematic examination of current methodologies and innovations. The primary objective of this research is to offer a thorough literature review of the current state of deep learning-powered code generation. Following a rigorous systematic review protocol, 28 influential papers were identified from major academic databases and analyzed with a structured methodology to discern trends and draw meaningful conclusions from the data. The study yields insights into code generation with large language models, helping to bridge the prevailing knowledge gap and offering direction for future innovations in the domain.
Acknowledgments
This research was supported in part by the National Science and Technology Council (NSTC), Taiwan, under grants MOST 110-2410-H-305-013-MY2 and NSTC 112-2425-H-305-002-, and by National Taipei University (NTPU), Taiwan, under grants 112-NTPU-ORDA-F-003, 112-NTPU-ORDA-F-004, and NTPU-112A513E01.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Liu, HC., Tsai, CT., Day, MY. (2024). A Pilot Study on AI-Assisted Code Generation with Large Language Models for Software Engineering. In: Lee, CY., Lin, CL., Chang, HT. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2023. Communications in Computer and Information Science, vol 2074. Springer, Singapore. https://doi.org/10.1007/978-981-97-1711-8_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1710-1
Online ISBN: 978-981-97-1711-8
eBook Packages: Computer Science (R0)