Computer Science ›› 2019, Vol. 46 ›› Issue (11): 168-175. doi: 10.11896/jsjkx.191100504C

Modified Neural Language Model and Its Application in Code Suggestion

ZHANG Xian, BEN Ke-rong   

  1. (School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China)
  • Received: 2018-10-16 Online: 2019-11-15 Published: 2019-11-14

Abstract: Language models characterize the occurrence probabilities of text segments. As an important class of models in natural language processing, they have been widely used in software analysis tasks in recent years. To enhance the learning of code features, this paper proposes a modified recurrent neural network language model called CodeNLM. By analyzing source code sequences represented in embedding form, the model captures regularities in code and estimates the joint probability distribution of the sequences. Because existing models learn only from the code data and do not fully use the available information, this paper proposes an additional information guidance strategy, which uses non-code information to improve the model's ability to characterize code regularities. To suit the characteristics of the language modeling task, a layer-by-layer incremental node setting strategy is also proposed, which optimizes the network structure and improves the effectiveness of information transmission. In verification experiments on 9 Java projects with 2.03M lines of code, CodeNLM achieves clearly better perplexity than the n-gram class models and neural language models used for comparison. In the code suggestion task, the average accuracy (MRR index) of the proposed model is 3.4% to 24.4% higher than that of the comparison methods. The experimental results show that, in addition to possessing a strong capability for learning long-distance information, CodeNLM can effectively model programming languages and perform code suggestion well.
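
The two evaluation measures named in the abstract, perplexity and mean reciprocal rank (MRR), can be illustrated with a minimal Python sketch. It follows the standard textbook definitions and is not the authors' implementation; the function names `perplexity` and `mean_reciprocal_rank` and the toy inputs are assumptions made for illustration only.

```python
import math


def perplexity(token_probs):
    """Perplexity of one sequence from the per-token probabilities
    p(t_i | t_1..t_{i-1}) assigned by a language model.
    Lower values mean the model predicts the sequence better."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)


def mean_reciprocal_rank(ranked_suggestions, targets):
    """MRR over a set of suggestion queries.
    ranked_suggestions[i] is the ranked candidate list for query i
    (best first) and targets[i] is the token that actually follows.
    A query whose target is missing from the list contributes 0."""
    total = 0.0
    for suggestions, target in zip(ranked_suggestions, targets):
        if target in suggestions:
            total += 1.0 / (suggestions.index(target) + 1)
    return total / len(targets)


# Hypothetical toy data, not taken from the paper.
probs = [0.25, 0.25, 0.25, 0.25]
ranked = [["a", "b"], ["x", "y", "z"], ["p"]]
actual = ["a", "z", "q"]
ppl = perplexity(probs)
mrr = mean_reciprocal_rank(ranked, actual)
```

In this sketch a uniform per-token probability of 0.25 corresponds to a perplexity of about 4, and the three toy queries contribute reciprocal ranks of 1, 1/3 and 0 to the MRR.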

Key words: Code suggestion, Language model, Natural language processing, Recurrent neural network, Software analysis

  • TP311.5
