Disclosure of Invention
To address the problems of the prior art, the invention provides an intelligent code optimization and refactoring system based on a large model, which significantly improves code quality and performance through an automated refactoring and intelligent feedback mechanism.
As shown in fig. 1, an intelligent code optimization and reconstruction system based on a large model includes:
The code analysis module is used for comprehensively analyzing the input code statically to identify potential problems, performance bottlenecks and code smells, and for generating a detailed code analysis report containing a description and the possible impact of each identified problem;
As shown in fig. 2, the code analysis module is the foundation of the whole scheme: through comprehensive static and dynamic analysis, the input code is subjected to a deep examination. Static analysis scans the code structure and syntax to find potential problems, such as fragments that do not meet programming specifications and unused variables. Dynamic analysis runs the code and monitors its execution in real time to identify performance bottlenecks and code smells (i.e., non-standard practices that affect code quality). The detailed code analysis report generated by this module contains a description of each identified problem and its possible impact on program performance and maintainability. For example, if a piece of code runs for too long, the report will indicate the specific function call and give a performance improvement suggestion.
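By way of a non-limiting illustration, the unused-variable check performed by the static analysis step can be sketched in Python; the language, the standard `ast` module and the function name are illustrative choices, not part of the claimed system:

```python
import ast

def find_unused_variables(source: str) -> list[str]:
    """Report names that are assigned but never read (a simple code smell)."""
    tree = ast.parse(source)
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):    # name is being written
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):   # name is being read
                used.add(node.id)
    return sorted(assigned - used)

# 'y' is assigned but never read, so it would appear in the analysis report
report = find_unused_variables("x = 1\ny = 2\nprint(x)")
```

A production analyzer would additionally track scopes and augmented assignments; the sketch only shows the store/load bookkeeping idea.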
The optimization suggestion generation module uses a large model algorithm to automatically generate, based on the code analysis report, specific optimization suggestions for the identified problems, including code refactoring, performance improvement, resource optimization and security enhancement strategies;
As shown in fig. 3, the optimization suggestion generation module is the core of the system: it uses a large model algorithm to analyze the code analysis report and automatically generate specific optimization suggestions for the identified problems. These suggestions include code refactoring, performance improvement, resource optimization and security enhancement policies. For example, when a loop with low execution efficiency is identified, the system may suggest parallel processing to increase throughput, or provide a more efficient algorithmic alternative for resource-intensive modules. The effect of this module is remarkable: it can markedly reduce code complexity and improve performance.
The automatic code refactoring module automatically or semi-automatically refactors the code according to the optimization suggestions, so as to improve the readability, maintainability and execution efficiency of the code, meet best practice standards, and ensure that the refactored code retains its original functionality;
As shown in fig. 4, in the automatic code refactoring module, the system automatically or semi-automatically refactors the code according to the optimization suggestions, improving the readability, maintainability and execution efficiency of the code. This process ensures that the refactored code meets best practice standards while maintaining the original functionality. For example, if the system recognizes that a piece of code repeats the same logic, the automatic code refactoring module may extract it into a separate function, thereby improving code reusability and readability.
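As an illustrative sketch of the extract-function refactoring mentioned above, the following Python example uses hypothetical pricing functions and a hypothetical discount rule; the point is only that the refactored versions preserve the original behaviour:

```python
# Before: the same discount logic is repeated inline in two functions.
def price_books(total):
    if total > 100:
        return total * 0.9
    return total

def price_toys(total):
    if total > 100:
        return total * 0.9
    return total

# After: the duplicated logic is extracted into one helper, preserving
# behaviour while improving reusability and readability.
def apply_bulk_discount(total):
    return total * 0.9 if total > 100 else total

def price_books_refactored(total):
    return apply_bulk_discount(total)

def price_toys_refactored(total):
    return apply_bulk_discount(total)
```

The refactoring verification module described later would confirm behaviour preservation by running the same inputs through both versions.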
The refactoring verification module is used for carrying out comprehensive functional testing and performance verification on the refactored code, ensuring that the refactored code meets the expected standards, and generating a detailed verification report comprising the test results and quantified data on the improvement effect;
The refactoring verification module is responsible for carrying out comprehensive functional testing and performance verification on the refactored code. Through unit tests, integration tests and system tests, it ensures that the refactored code meets the expected standards, and it generates a detailed verification report. The report contains quantified data on test results and improvement effects, such as a 30% reduction in code execution time or a 20% reduction in memory usage after refactoring. This verification process not only ensures the functional integrity of the code but also provides data support for subsequent optimizations.
As shown in fig. 5, the user feedback mechanism module provides real-time feedback to the user based on the verification report, allowing the user to manually adjust the refactoring process while recording the user feedback to continuously optimize subsequent refactoring policies.
The user feedback mechanism module provides real-time feedback to the user according to the verification report, allowing the user to manually adjust the refactoring process. This mechanism is designed to continuously optimize subsequent refactoring strategies by recording user feedback, making the system more intelligent and personalized. For example, when a user is not satisfied with an automatically generated refactoring suggestion, the system records the feedback and adjusts future optimization suggestions accordingly, thereby improving user experience and satisfaction.
Preferably, the code analysis module further performs a complexity evaluation of the input code to generate a code complexity index reflecting the maintainability and readability of the code. Specifically, the complexity index may combine a variety of metrics, such as cyclomatic complexity, number of lines and nesting depth, which help identify complex structures in the code and thereby give the developer a clear direction for improvement. By quantifying complexity, a developer can quickly determine which code blocks need to be simplified or refactored to improve code quality.
Preferably, the potential problems in the code analysis report include code fragments that do not meet programming specifications and unused variables.
Preferably, the specific optimization suggestions generated by the optimization suggestion generation module include a resource allocation optimization policy based on performance evaluation for reducing the consumption of computing resources. In the code analysis report, the identified potential problems include not only code fragments that do not meet programming specifications and unused variables, but may also involve duplicated code and overly long methods. These problems often lead to maintenance difficulties and performance degradation during development. Through clear reports, the system can guide developers to solve the most urgent problems first, thereby reducing technical debt and improving the overall health of the codebase.
Preferably, the algorithm used by the optimization suggestion generation module comprises a neural network model to improve the accuracy of the optimization suggestions. Taking the resource allocation optimization strategy based on performance evaluation as an example, the optimization suggestion generation module uses the neural network model to generate specific optimization suggestions. The strategy analyzes current resource usage and, on that basis, proposes a more efficient resource allocation scheme, thereby reducing the consumption of computing resources. For example, when it is detected that a certain functional module occupies too much memory when processing a large amount of data, the system may suggest stream processing or batch processing to reduce the memory burden and improve execution efficiency.
Preferably, the automatic code refactoring module includes adapters for different programming languages to support code refactoring in multiple programming languages. These adapters ensure that code in multiple programming languages can be refactored effectively. Their implementation relies on an abstract syntax tree (AST) that converts code into a language-independent representation, so that the refactoring algorithm can be applied across languages. For example, when refactoring Java and Python code, the system parses out the corresponding ASTs and applies the same refactoring policy on that basis, thereby ensuring consistency and accuracy of the refactoring.
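For illustration only, the adapter mechanism can be sketched in Python; the `LanguageAdapter` interface and class names are hypothetical, and only a Python adapter is shown (adapters for Java and other languages would wrap the corresponding parsers behind the same interface):

```python
import ast
from abc import ABC, abstractmethod

class LanguageAdapter(ABC):
    """Hypothetical adapter interface: each language parses into a tree
    on which shared refactoring analyses can be run."""
    @abstractmethod
    def parse(self, source: str):
        ...

class PythonAdapter(LanguageAdapter):
    def parse(self, source: str):
        return ast.parse(source)          # Python's own AST

def function_names(adapter: LanguageAdapter, source: str) -> list[str]:
    """Example of an analysis written against the adapter interface."""
    tree = adapter.parse(source)
    return [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

names = function_names(PythonAdapter(), "def f(): pass\ndef g(): pass")
```

In a full system the analysis layer would operate on a truly language-neutral tree rather than Python's `ast` nodes; the sketch only shows the adapter seam.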
Preferably, the functional testing of the refactoring verification module includes unit tests, integration tests and system tests to ensure the functional integrity of the refactored code. Through these tests, the refactoring verification module ensures that the refactored code retains its functional integrity. The system automatically generates test cases, runs the tests, and records the execution result of each test. This process effectively ensures that the refactored code achieves the expected performance improvement while preserving the original functionality.
Preferably, the user feedback mechanism module records the adjustment history of the user when the user performs manual adjustments, and generates personalized optimization suggestions based on this history. For example, if a user frequently adjusts certain parameters in a particular module, the system will recognize this pattern and prioritize the adjustment of these parameters in future optimization suggestions, thereby enhancing the system's level of intelligence and the user experience.
Preferably, the quantified data on the improvement effect contained in the verification report includes the percentage reduction in code execution time and the percentage reduction in memory usage. These figures give the user a clear basis for assessing the improvement: they show the substantial effect brought by the refactoring and help developers make better-informed decisions in subsequent development and optimization.
Compared with the prior art, the invention has the advantages that:
the invention realizes intelligent code analysis by applying a large model algorithm and neural network technology, improving the accuracy of the optimization suggestions;
the method can comprehensively analyze the static and dynamic characteristics of the code and can generate personalized optimization suggestions for specific problems;
through automatic refactoring, the invention optimizes the readability and execution efficiency of the code while keeping its functionality intact, overcoming the inability of traditional methods to effectively identify complex problems.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
The system of the invention comprises:
The system comprises a data acquisition module, a preprocessing module and a large model training module, wherein:
The data acquisition module is responsible for collecting front-end code from data sources such as front-end code libraries, development communities and online code repositories. Crawler technology is used to collect open-source front-end project code from platforms such as GitHub and GitLab. Official documentation and best practices of front-end frameworks (e.g., React, Angular, Vue.js) are collected, as are discussions and code examples from front-end development communities (e.g., CSDN, Stack Overflow, Reddit).
The preprocessing module is used for cleaning, normalizing and extracting features from the acquired code. Code cleaning removes non-essential content such as useless comments, whitespace and logging statements. Code normalization unifies the code style, for example using Prettier, ESLint or other tools. Feature extraction obtains structural features of the code (such as cyclomatic complexity and code repetition metrics) and unstructured features (such as code fragments).
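As a rough, non-limiting sketch, the code-cleaning step for JavaScript-style front-end source can be written in Python with regular expressions; a production cleaner would use a proper tokenizer so that comment markers inside string literals are not stripped:

```python
import re

def clean_code(source: str) -> str:
    """Remove /* */ block comments, // line comments and blank lines.
    Rough sketch only: regexes will also match comment markers that
    appear inside string literals."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    lines = [ln.rstrip() for ln in source.splitlines()]
    return "\n".join(ln for ln in lines if ln.strip())          # drop blank lines

cleaned = clean_code("let a = 1; // counter\n\n/* temp */\nlet b = 2;")
```

The normalization step mentioned above (Prettier, ESLint) would then be applied to the cleaned output.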
The large model training module trains a deep learning model using the preprocessed data. Pre-trained models suitable for processing natural and programming languages, such as BERT, GPT-3 and CodeBERT, are selected. Fine-tuning is performed on the basis of the pre-trained model using the front-end code dataset, so that the model learns the syntax, semantics and best practices of front-end code. Training objectives include code defect detection, code optimization suggestion, code style unification, and the like.
The code analysis module is used for comprehensively analyzing the input code statically to identify potential problems, performance bottlenecks and code smells, and for generating a detailed code analysis report containing a description and the possible impact of each identified problem;
The optimization suggestion generation module uses a large model algorithm to automatically generate, based on the code analysis report, specific optimization suggestions for the identified problems, including code refactoring, performance improvement, resource optimization and security enhancement strategies;
the automatic code refactoring module automatically or semi-automatically refactors the code according to the optimization suggestions, so as to improve the readability, maintainability and execution efficiency of the code, meet best practice standards, and ensure that the refactored code retains its original functionality;
The refactoring verification module is used for carrying out comprehensive functional testing and performance verification on the refactored code, ensuring that the refactored code meets the expected standards, and generating a detailed verification report comprising the test results and quantified data on the improvement effect;
The user feedback mechanism module is used for providing real-time feedback to the user according to the verification report, allowing the user to manually adjust the refactoring process, while recording the user feedback to continuously optimize subsequent refactoring strategies.
The system of the invention comprises the following execution steps:
step one, system initialization
Environment construction: first, the necessary software environments are installed and configured on the server, including a deep learning framework (e.g., TensorFlow or PyTorch), code parsing tools (e.g., Babel or ESLint), a database system, and the like. An appropriate large model architecture (e.g., Transformer-based models), development frameworks (e.g., React, Vue.js) and tools (e.g., ESLint, Prettier) are selected.
Model loading: a pre-trained large-scale code analysis model is loaded from a storage medium. The model has been trained on a large number of open-source code projects and can recognize characteristics such as code style, performance bottlenecks and design patterns.
Interface configuration: a user interface is configured, comprising a code upload interface, an optimization suggestion display area, a refactoring execution button, and the like, to facilitate interaction between the user and the system.
Step two, code analysis and feature extraction
Code upload: the user uploads the front-end project source code to be optimized through the user interface.
Code parsing: the system uses the code analysis module to parse the uploaded code. The key steps are as follows:
1) Code preprocessing
The data is cleaned: useless comments, blank lines, etc. are removed, and the code style is normalized to ensure data consistency.
2) Constructing a predictive analysis table:
First, the First set of each non-terminal symbol is computed, i.e. the set of terminal symbols that can appear first in any string derived from that non-terminal.
Then, the Follow sets are computed; these are needed in particular for non-terminal symbols from which the empty string ε can be derived, in order to determine which terminal symbols may follow those non-terminals.
The predictive analysis table is populated using the First sets and, where necessary, the Follow sets.
3) And (3) executing an analysis process:
A stack is used to simulate the derivation process; the initial content of the stack is the start symbol.
The input symbol string is scanned from left to right.
At each step, the corresponding action (pushing the symbols of a production's right-hand side, matching a terminal, or reporting an error) is looked up in the predictive analysis table based on the non-terminal at the top of the stack and the next input symbol.
The above steps are repeated until the stack is empty and all input symbols have been successfully processed, generating the intermediate representation of the code, an abstract syntax tree (AST).
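The table-driven parsing process described above can be sketched in Python for a toy grammar of balanced parentheses; the grammar, token set and function name are illustrative only (a real parser would target the full front-end language grammar):

```python
def ll1_parse(tokens: str) -> bool:
    """Table-driven LL(1) parse of balanced parentheses.
    Toy grammar: S -> ( S ) S | epsilon.
    The table is built from First(S) = {'(', eps} and Follow(S) = {')', '$'}."""
    table = {
        ("S", "("): ["(", "S", ")", "S"],  # expand using the production
        ("S", ")"): [],                    # derive epsilon: ')' is in Follow(S)
        ("S", "$"): [],                    # derive epsilon: '$' ends the input
    }
    stack = ["$", "S"]                     # start symbol on top of end marker
    inp = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = inp[pos]
        if top == "S":                     # non-terminal: consult the table
            production = table.get((top, look))
            if production is None:
                return False               # no table entry: syntax error
            stack.extend(reversed(production))
        elif top == look:                  # terminal (or $): must match input
            pos += 1
        else:
            return False
    return pos == len(inp)

# ll1_parse("(())()") -> True, ll1_parse("(()") -> False
```

A production compiler front end would build the AST while matching, rather than only accepting or rejecting the input.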
The feature extraction module extracts key features from the parsed code. These features include function call relationships, variable usage frequency, loop complexity, resource loading patterns, etc. The results of feature extraction are encoded into vector or matrix form for subsequent model processing. Various code metrics are calculated, such as cyclomatic complexity, number of code lines, function length, etc.
1. Cyclomatic complexity calculation algorithm
Cyclomatic complexity is an indicator of code complexity that represents the number of independent paths through the program.
The calculation formula is V(G) = E - N + 2P, where V(G) is the cyclomatic complexity, E is the number of edges in the program control flow graph, N is the number of nodes, and P is the number of connected components.
The derivation steps are as follows:
- First, convert the front-end code into a program control flow graph.
- Count the number of edges E and the number of nodes N in the control flow graph.
- Determine the number of connected components P.
- Substitute the E, N and P values into the formula to calculate the cyclomatic complexity.
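The steps above can be sketched in Python over a control flow graph given as a list of edges; the edge-list representation and node names are illustrative choices:

```python
def cyclomatic_complexity(edges, num_components=1):
    """V(G) = E - N + 2P, computed from a control flow graph given as
    (source, target) edge pairs; N is derived from the node set and
    P defaults to a single connected component."""
    nodes = {n for edge in edges for n in edge}
    E, N, P = len(edges), len(nodes), num_components
    return E - N + 2 * P

# A simple if/else: entry branches to then/else, both rejoin at exit.
# E = 4, N = 4, P = 1, so V(G) = 4 - 4 + 2 = 2 (two independent paths).
cfg = [("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")]
```

The non-trivial step in practice is building the control flow graph from source; the formula itself is the one-liner shown.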
2. Code repetition calculation algorithm
Code repetition measures the proportion of repeated code segments in the code.
The calculation formula is: repetition = length of repeated code segments / total code length.
The derivation steps are as follows:
- Tokenize the front-end code, splitting it into individual code segments.
- Compute a hash value for each code segment using a hash algorithm.
- Count the code segments that share the same hash value; their combined length is the length of the repeated code segments.
- Calculate the total code length.
- Divide the length of the repeated code segments by the total code length to obtain the code repetition.
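These steps can be sketched in Python at line granularity (a simplification: the module as described hashes arbitrary code segments, not just single lines):

```python
import hashlib

def repetition_ratio(lines):
    """Share of total code length occupied by repeated fragments.
    Each line is hashed; characters belonging to any fragment whose
    hash occurs more than once count as repeated."""
    counts = {}
    for line in lines:
        h = hashlib.sha256(line.strip().encode()).hexdigest()
        counts.setdefault(h, []).append(len(line))
    total = sum(len(line) for line in lines)
    repeated = sum(sum(lens) for lens in counts.values() if len(lens) > 1)
    return repeated / total if total else 0.0

# "total += x;" appears twice (2 x 11 chars) out of 29 chars in total
ratio = repetition_ratio(["total += x;", "log(x);", "total += x;"])
```

Real clone detectors hash normalized multi-line windows so that renamed variables and reformatted code are still detected.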
Step three, large model analysis and optimization suggestion generation
Large model input: the extracted code features are fed into the pre-trained large-scale code analysis model.
Training and reasoning of a large model:
In the training stage, the large model is pre-trained on a large-scale code corpus; it generates code based on an autoregressive mechanism and learns different code patterns from it. To improve the model's understanding of optimization strategies, a supervised learning module is further added to guide the model's reasoning process.
By minimizing the loss function, the model can gradually optimize its code reasoning capabilities.
The loss function formulas are as follows.
For the binary classification problem, the cross-entropy loss function is expressed as:
L(θ) = -(1/N) Σ_{i=1}^{N} [y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i)]
For the multi-classification problem, the cross-entropy loss function is expressed as:
L(θ) = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_{i,c}·log(ŷ_{i,c})
Wherein:
L(θ) is the loss function, a function of the model parameters θ.
N is the total number of samples.
C is the total number of classes.
y_i is the true label of the i-th sample: 0 or 1 for the binary classification problem, and a one-hot encoded vector for the multi-classification problem.
ŷ_i is the model's prediction for the i-th sample: a value between 0 and 1 for the binary classification problem, and a probability distribution vector for the multi-classification problem.
log denotes the natural logarithm.
These two expressions are applicable to two-class and multi-class problems, respectively, and by minimizing these loss functions, the model can gradually optimize its code reasoning capabilities.
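The two loss functions can be sketched in plain Python for clarity (deep learning frameworks such as TensorFlow or PyTorch provide equivalent built-in losses used in actual training):

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """L = -(1/N) * sum over i of [y_i*log(p_i) + (1-y_i)*log(1-p_i)]."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

def categorical_cross_entropy(y_true, y_pred):
    """L = -(1/N) * sum over i, c of y[i][c]*log(p[i][c]) with one-hot labels,
    so only the log-probability of each sample's true class contributes."""
    n = len(y_true)
    return -sum(y * math.log(p)
                for row_y, row_p in zip(y_true, y_pred)
                for y, p in zip(row_y, row_p) if y) / n
```

For example, a maximally uncertain prediction of 0.5 on every sample yields a loss of log 2 ≈ 0.693 under both formulas.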
Code optimization derivation:
The optimization objective is set to maximize the execution efficiency of the code, and the optimization process is inferred from a series of heuristic rules. A globally optimal optimization path can be computed with a dynamic programming algorithm. A typical example of dynamic programming is the shortest path problem, such as computing the shortest paths between all pairs of points using the Floyd-Warshall algorithm. The basic formula of the algorithm is as follows:
dist[i][j]=min(dist[i][j],dist[i][k]+dist[k][j])
where dist[i][j] represents the shortest path length from point i to point j, and k is an intermediate point; the algorithm considers updating the shortest path from i to j through every possible intermediate point k.
This formula is an example of a state transition equation in a dynamic programming algorithm that describes how a solution to a larger problem can be constructed from solutions to known sub-problems. By iteratively applying this formula, the algorithm can eventually calculate a globally optimal optimized path.
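The state transition equation above can be written directly as code; the following Python sketch uses an illustrative three-node example graph:

```python
INF = float("inf")

def floyd_warshall(dist):
    """All-pairs shortest paths; dist is an n x n matrix of edge weights
    with dist[i][i] == 0 and INF for missing edges."""
    n = len(dist)
    d = [row[:] for row in dist]           # copy so the input stays untouched
    for k in range(n):                     # allow k as an intermediate point
        for i in range(n):
            for j in range(n):
                # the state transition: route i -> k -> j if it is shorter
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

graph = [
    [0,   3,   INF],
    [INF, 0,   1],
    [7,   INF, 0],
]
shortest = floyd_warshall(graph)   # e.g. 0 -> 2 becomes 3 + 1 = 4 via node 1
```

Each iteration over k extends the set of allowed intermediate points, which is exactly the "solution of a larger problem from known sub-problems" structure described above.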
Optimization suggestion generation: the optimization suggestion generation module generates specific optimization suggestions based on the model's analysis results. These suggestions include code refactoring schemes, performance optimization measures, code style adjustments, and the like. The module also ranks and prioritizes the suggestions so that the user can choose as desired.
Step four, optimizing suggestion display and user confirmation
Suggestion display: the system displays the generated optimization suggestions to the user through the user interface. The displayed content includes detailed descriptions of the suggestions, side-by-side comparisons of code segments, evaluations of the expected effect, and the like.
User confirmation: the user evaluates the displayed optimization suggestions and decides whether to adopt them. Adopted suggestions are confirmed through an operation button on the interface.
Step five, reconstructing execution and effect evaluation
Refactoring execution: after the user confirms the optimization suggestions, the system applies them to the code through the refactoring execution module. The refactoring process comprises adjusting the code structure, renaming variables, splitting and merging functions, and the like. The system may also provide automated or semi-automated refactoring tools to reduce the user's workload.
Effect evaluation: after the refactoring is completed, the system performs a comparative analysis of the code before and after refactoring and evaluates the refactoring effect. The evaluated metrics include the readability, maintainability and execution efficiency of the code. The evaluation result is fed back to the user so that the user knows the actual effect of the refactoring.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.