Adversarial generation method for smart contract fuzz testing seeds guided by chain-based LLM

Automated Software Engineering

Abstract

With the rapid development of smart contract technology and the continuous expansion of blockchain application scenarios, the security of smart contracts has attracted significant attention. Traditional fuzz testing, however, typically relies on randomly generated initial seed sets. Random generation does not capture the semantics of smart contracts, resulting in insufficient seed coverage. Traditional fuzz testing also often ignores the syntactic and semantic constraints of smart contracts, so it generates seeds that may not conform to a contract's syntactic rules and may even include logic that violates its semantics, which reduces fuzzing efficiency. To address these challenges, we propose an adversarial generation method for smart contract fuzz testing seeds guided by a Chain-Based LLM, leveraging the deep semantic understanding of LLMs to assist seed set generation. First, we use Chain-Based prompts to request fuzz testing seeds from the LLM, breaking the task into multiple steps that gradually guide the LLM toward high-coverage seed sets. Second, by assigning adversarial roles to the LLM, we guide it to autonomously generate and optimize seed sets, producing high-coverage initial seed sets for the program under test. To evaluate the proposed method, we crawled 2308 smart contracts from Etherscan. Results indicate that requesting seed sets with Chain-Based prompts improved instruction coverage by 2.94% compared with single-step requests. Generating seed sets with adversarial roles reduced the time to reach maximum instruction coverage from 60 s to approximately 30 s compared with single-role methods. Additionally, the seed sets generated by the proposed method can directly trigger simple types of vulnerabilities (e.g., timestamp dependency and block number dependency vulnerabilities), with instruction coverage improvements of 3.8% and 4.1%, respectively.
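The two ideas in the abstract, chained prompts and adversarial roles, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not give the prompts, the role definitions, or the seed format, so the step templates, the `LLM` callable type, the function names, and the `fake_llm` stub are all hypothetical. The stub stands in for a real model so that the control flow can be run without network access.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to the model's text reply.
LLM = Callable[[str], str]

# Illustrative step templates (assumed; the paper's actual prompts are not shown).
CHAIN_STEPS = [
    "Step 1: List the public functions and parameter types of this contract:\n{input}",
    "Step 2: From this function list, state the constraints on each argument:\n{input}",
    "Step 3: From these constraints, write one seed call per line:\n{input}",
]


def parse_seeds(reply: str) -> List[str]:
    """Treat each non-empty line of a reply as one seed."""
    return [line.strip() for line in reply.splitlines() if line.strip()]


def run_chain(llm: LLM, contract_source: str) -> List[str]:
    """Feed each step's reply into the next step, then parse the final reply."""
    current = contract_source
    for template in CHAIN_STEPS:
        # str.replace is used instead of str.format because Solidity source contains braces.
        current = llm(template.replace("{input}", current))
    return parse_seeds(current)


def adversarial_refine(generator: LLM, critic: LLM, seeds: List[str], rounds: int = 2) -> List[str]:
    """Alternate a critic role and a generator role, merging new seeds each round."""
    for _ in range(rounds):
        listing = "\n".join(seeds)
        critique = critic("You are a reviewer. Name behaviour these seeds miss:\n" + listing)
        reply = generator(
            "You are a seed generator. Add seeds that answer this critique:\n"
            + critique + "\nCurrent seeds:\n" + listing
        )
        # dict.fromkeys removes duplicates while keeping first-seen order.
        seeds = list(dict.fromkeys(seeds + parse_seeds(reply)))
    return seeds


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, keyed on the prompt prefix."""
    if prompt.startswith("Step 3"):
        return "deposit(1)\nwithdraw(1)"
    if prompt.startswith("You are a reviewer"):
        return "withdraw is never called with a zero amount"
    if prompt.startswith("You are a seed generator"):
        return "withdraw(0)"
    return "analysis placeholder"


if __name__ == "__main__":
    initial = run_chain(fake_llm, "contract Bank { }")
    print(adversarial_refine(fake_llm, fake_llm, initial))
```

In a real setting the `LLM` callable would wrap a model client, and coverage feedback from the fuzzer would presumably decide which seeds survive each round; neither is modelled in this sketch.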

Funding

This work was supported in part by the Key R&D Project of Shaanxi Province (2023-YBGY-030), the Key Industrial Chain Core Technology Research Project of Xi’an (23ZDCYJSGG0028–2022), and the National Natural Science Foundation of China (62272387).

Author information

Corresponding author

Correspondence to Jiaze Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, J., Yin, Z., Zhang, H. et al. Adversarial generation method for smart contract fuzz testing seeds guided by chain-based LLM. Autom Softw Eng 32, 12 (2025). https://doi.org/10.1007/s10515-024-00483-4

  • DOI: https://doi.org/10.1007/s10515-024-00483-4
