CN117608630A - Code quality detection method, device, equipment and storage medium - Google Patents
Code quality detection method, device, equipment and storage medium
- Publication number
- CN117608630A (CN202311652090.8A)
- Authority
- CN
- China
- Prior art keywords
- code
- codes
- similarity
- center
- type
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention discloses a code quality detection method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring at least one code to be detected, determining the similarity between the codes to be detected, and obtaining a code similarity detection result; clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set; and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result. The technical scheme of the embodiment of the invention solves the problems of low detection accuracy and detection efficiency in the prior art that a single reference code is adopted to detect the quality of the target code, can cluster the codes to obtain different types of code sets, and then detects the quality of the codes in the code sets based on the reference codes of the corresponding types, thereby improving the accuracy and efficiency of code quality detection.
Description
Technical Field
The embodiment of the invention relates to the technical field of automatic code detection, in particular to a code quality detection method, a device, equipment and a storage medium.
Background
Code quality detection is a key step in software development. Code quality is a set of requirements common to all software projects; properties such as readability, maintainability, modularity, performance and security are all important components of software quality. In the prior art, the quality of a target code is mostly detected against a single standard reference code. However, different types of code often have different code characteristics, so their quality cannot be detected effectively with a single reference code. As a result, code inspection is difficult, code defects and vulnerabilities are hard to find, and inspection efficiency is low.
Disclosure of Invention
The embodiment of the invention provides a code quality detection method, a device, equipment and a storage medium, which can cluster codes to obtain different types of code sets, and then detect the quality of the codes in the code sets based on the corresponding types of reference codes, thereby improving the accuracy and efficiency of code quality detection.
In a first aspect, an embodiment of the present invention provides a method for detecting code quality, where the method includes:
acquiring at least one code to be detected, determining the similarity between the codes to be detected, and obtaining a code similarity detection result;
clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set;
and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result.
In a second aspect, an embodiment of the present invention provides a code quality detection apparatus, including:
the code similarity detection module is used for acquiring at least one code to be detected, determining the similarity between the codes to be detected and obtaining a code similarity detection result;
the code clustering module is used for clustering all codes to be detected based on the code similarity detection result to obtain at least one type code set;
the code quality detection module is used for comparing the codes to be detected in the type code sets with preset reference type codes aiming at each type code set, and determining a code quality detection result according to the comparison result.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the code quality detection method of any of the embodiments.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the code quality detection method according to any of the embodiments.
According to the technical scheme provided by the embodiment of the invention, the code similarity detection result is obtained by acquiring at least one code to be detected and determining the similarity between the codes to be detected; clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set; and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result. The technical scheme of the embodiment of the invention solves the problems of lower detection accuracy and detection efficiency caused by the fact that a single reference code is adopted to detect the quality of the target code when the quality of the code is detected in the prior art, can cluster the codes to obtain different types of code sets, and then detects the quality of the codes in the code sets based on the reference codes of the corresponding types, thereby improving the accuracy and efficiency of code quality detection.
Drawings
FIG. 1 is a flowchart of a code quality detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another code quality detection method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a code quality detecting device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. The data acquisition, storage, use, processing and the like in the technical scheme meet the relevant regulations of national laws and regulations.
Fig. 1 is a flowchart of a code quality detection method provided by an embodiment of the present invention, where the embodiment of the present invention is applicable to a scenario in which quality of an Objective-C code is detected, the method may be performed by a code quality detection device, and the device may be implemented by software and/or hardware.
As shown in fig. 1, the code quality detection method includes the steps of:
S110, obtaining at least one code to be detected, and determining the similarity between the codes to be detected to obtain a code similarity detection result.
The code to be detected may be code that requires quality detection. Specifically, an uploaded code file can be obtained, and the code in that file is used as the code to be detected. The code similarity detection result may be the result of detecting the similarity between the codes to be detected. Specifically, each code to be detected may be converted into a code vector, the Euclidean distance between every two code vectors is then calculated, and the calculated Euclidean distance is used as the measure of similarity between the two corresponding codes to be detected (a smaller distance indicates more similar codes). Determining the code similarity detection result makes the similarity between the codes to be detected known, so that the codes can conveniently be classified according to that similarity.
S120, clustering all codes to be detected based on the code similarity detection result to obtain at least one type code set.
The type code set may be a set of codes having similar code characteristics. Specifically, k codes (the value of k can be set) are selected from the codes to be detected as center codes to create k type code sets, and each of the other codes to be detected is then added to the corresponding type code set based on its similarity to the center codes, so that k type code sets are obtained. Clustering all codes to be detected based on the code similarity detection result ensures that the codes in each finally determined type code set have a high code feature similarity. A more applicable reference code can then be matched to each type code set to detect the quality of the codes in it, which improves the accuracy of code quality detection.
S130, for each type code set, comparing the codes to be detected in the type code set with a preset reference type code, and determining a code quality detection result according to the comparison result.
The preset reference type code may be a reference code preset to detect the quality of one type code. Specifically, the preset reference type codes corresponding to each type code set can be respectively determined based on the set center codes. The code quality detection result may be a quality detection result for all codes to be detected in each type code set. Specifically, the codes to be detected in each type code set can be compared with corresponding preset reference type codes respectively, so that the quality detection results of the codes to be detected in each type code set can be determined, and the code quality detection results of all the codes to be detected can be further determined.
The embodiment of the invention provides an Objective-C code quality detection method based on cluster analysis. First, common code quality problems are summarized and organized based on Objective-C grammar and code specifications, and a code quality feature library is established. Code submitted by developers is then scanned to obtain sample data, and the sample data is clustered using the PAM algorithm. The clustering result is compared with the code quality feature library to obtain an analysis result, which includes each problem in the code, the shortcomings in the coding habits of the corresponding developer, and code quality optimization suggestions. The invention can effectively help Objective-C developers detect code quality and optimize code according to the suggestions, reduce non-standard code and code with hidden defects, optimize the code quality of the application program, and thereby improve the overall quality level of the product.
According to the technical scheme provided by the embodiment of the invention, the code similarity detection result is obtained by acquiring at least one code to be detected and determining the similarity between the codes to be detected; clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set; and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result. The technical scheme of the embodiment of the invention solves the problems of lower detection accuracy and detection efficiency caused by the fact that a single reference code is adopted to detect the quality of the target code when the quality of the code is detected in the prior art, can cluster the codes to obtain different types of code sets, and then detects the quality of the codes in the code sets based on the reference codes of the corresponding types, thereby improving the accuracy and efficiency of code quality detection.
FIG. 2 is a flowchart of another code quality detection method according to an embodiment of the present invention, which is applicable to a scenario in which the quality of Objective-C code is detected. On the basis of the foregoing embodiment, this embodiment further describes how to determine the similarity between the codes to be detected to obtain a code similarity detection result, and how to cluster all codes to be detected based on the code similarity detection result to obtain at least one type code set.
As shown in fig. 2, the code quality detection method includes the steps of:
S210, obtaining at least one code to be detected, and carrying out vector conversion on each code to be detected to obtain a target code vector.
The code to be detected may be code that requires quality detection. Specifically, an uploaded code file can be obtained, and the code in that file is used as the code to be detected. The target code vector may be the vector corresponding to a code to be detected. Specifically, vector conversion can be performed on each code to be detected to obtain the target code vector corresponding to that code. Converting the code into vector form facilitates the subsequent similarity calculation.
Illustratively, the codes in the code library may be scanned and sparsely encoded, that is, encoded in a form such as (0,0,0,1,0,0, ...). The sparse encoding is then converted into a dense encoding, that is, a word vector in a form such as (0.23,0.56,0.36,0.86), for ease of computation. In this way, even two different codes can be converted into comparable vectors, and their similarity can be calculated.
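The sparse-to-dense conversion described above can be sketched as follows. This is only an illustration under stated assumptions: the whitespace tokenizer, the function names, and the randomly initialized embedding table are all hypothetical, and the patent does not specify how the dense vectors are actually produced (a real system would presumably learn the embedding weights).

```python
import random

def build_vocabulary(snippets):
    """Map every distinct whitespace-separated token to an index."""
    tokens = sorted({tok for s in snippets for tok in s.split()})
    return {tok: i for i, tok in enumerate(tokens)}

def one_hot(token, vocab):
    """Sparse encoding: a vector of zeros with a single 1."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

def make_embedding(vocab_size, dim, seed=0):
    """Hypothetical embedding table with random weights (an assumption)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(vocab_size)]

def dense_vector(snippet, vocab, embedding):
    """Average the embedding rows selected by each token's one-hot vector."""
    dim = len(embedding[0])
    tokens = snippet.split()
    total = [0.0] * dim
    for tok in tokens:
        sparse = one_hot(tok, vocab)
        # Multiplying a one-hot vector by the table selects a single row.
        row = [sum(sparse[i] * embedding[i][d] for i in range(len(vocab)))
               for d in range(dim)]
        total = [t + r for t, r in zip(total, row)]
    return [t / len(tokens) for t in total]

snippets = ["NSString * name = nil ;", "NSArray * items = nil ;"]
vocab = build_vocabulary(snippets)
embedding = make_embedding(len(vocab), dim=4)
vectors = [dense_vector(s, vocab, embedding) for s in snippets]
```

Each snippet ends up as a fixed-length dense vector, so two snippets of different length or content can still be compared numerically.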
S220, calculating Euclidean distance between every two target code vectors, and determining a code similarity detection result according to the Euclidean distance.
The code similarity detection result may be the result of detecting the similarity between the codes to be detected. Specifically, the Euclidean distance between every two target code vectors can be calculated, and the calculated Euclidean distance is used as the measure of similarity between the two corresponding codes to be detected (a smaller distance indicates more similar codes). Determining the code similarity detection result makes the similarity between the codes to be detected known, so that the codes can conveniently be classified according to that similarity.
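As a minimal sketch of step S220, the pairwise Euclidean distances can be collected into a symmetric matrix. The function names and the toy vectors are assumptions made for illustration; the patent does not prescribe a data structure for the code similarity detection result.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_distances(vectors):
    """Symmetric matrix whose entry [i][j] is the distance between codes i and j."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = euclidean(vectors[i], vectors[j])
    return dist

# Toy target code vectors (hypothetical values).
vectors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = pairwise_distances(vectors)
```

Here `dist[0][1]` is 5.0 and `dist[0][2]` is 10.0, so code 1 is closer to code 0 than code 2 is.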
S230, selecting at least one code from the codes to be detected as a set center code, and taking the codes except the set center code in the codes to be detected as non-set center codes.
The set center code may be a code that is capable of characterizing the code features in the set. One set center code may represent one type of code. Once the set center codes are determined, each can serve as the midpoint of a set, and the other codes to be detected can be classified toward it. Specifically, when determining the set center codes, K codes to be detected may be randomly selected as the set center codes, where the value of K can be set.
Further, after the set center code is determined, codes other than the set center code in the codes to be detected can be used as non-set center codes. The non-set center codes can be categorized into set center codes in a subsequent process, thereby forming a code set.
S240, based on the code similarity detection result, determining the similarity between each non-set center code and the set center code, and obtaining the distance center similarity.
The distance center similarity may be the similarity between a non-set center code and a set center code. Specifically, since the code similarity detection result includes the similarity between every two codes, the entries for the set center codes and the non-set center codes can be looked up in the code similarity detection result, and the distance center similarity is obtained from the similarity between each non-set center code and each set center code. The non-set center codes can then be classified based on this similarity.
S250, clustering the set center codes and the non-set center codes based on the distance center similarity to obtain a type code set.
Wherein the set of type codes may be a set of codes having similar code characteristics. Specifically, based on the calculated distance center similarity, each non-set center code may be clustered toward a set center code, so as to obtain a type code set.
Specifically, when determining the type code sets, code sets corresponding to the center codes of each set can be determined respectively to obtain at least one initial code set; and comparing the distance center similarity of the non-set center codes and the set center codes for each non-set center code, and adding the non-set center codes into the initial code set according to the comparison result to obtain a type code set.
Wherein the initial code set may be a set to which no non-set center code is added. Specifically, a corresponding initial code set may be established with each set center code as a set center. I.e. how many set center codes there are, there are the same number of initial code sets. Each set center code corresponds to a respective one of the initial code sets.
Further, for each non-set center code, the distance center similarities between that non-set center code and each set center code can be compared, and the non-set center code is added to an initial code set according to the comparison result. Finally, once all non-set center codes have been added to their corresponding sets, the final at least one type code set is obtained.
Specifically, when adding a non-set center code to an initial code set according to the comparison result, the distance center similarities between the non-set center code and each set center code can be compared, and the set center code with the largest distance center similarity (that is, the smallest Euclidean distance) is used as the target center code. The non-set center code is then added to the initial code set corresponding to the target center code, which completes the clustering of that non-set center code. Adding the non-set center codes to the initial code sets based on the distance center similarity groups together the codes that are most similar, and ensures that the codes in each finally determined type code set have a high code feature similarity.
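The assignment step described above can be sketched as follows, assuming the code similarity detection result is available as a precomputed distance matrix and that "largest similarity" means "smallest Euclidean distance". The function name and the toy matrix are hypothetical.

```python
def assign_to_centers(dist, centers):
    """Add every non-set center code to the set of its closest set center code.

    dist    -- symmetric matrix of pairwise distances between codes
    centers -- indices of the codes chosen as set center codes
    """
    # One initial code set per center, containing only the center itself.
    clusters = {c: [c] for c in centers}
    for i in range(len(dist)):
        if i in clusters:
            continue  # set center codes already belong to their own set
        target = min(centers, key=lambda c: dist[i][c])
        clusters[target].append(i)
    return clusters

# Toy distance matrix for four codes (hypothetical values).
dist = [
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 8.5, 7.0],
    [9.0, 8.5, 0.0, 2.0],
    [8.0, 7.0, 2.0, 0.0],
]
clusters = assign_to_centers(dist, centers=[0, 2])
```

With codes 0 and 2 as centers, code 1 joins the set of code 0 and code 3 joins the set of code 2.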
By way of example, clustering may be performed using the PAM (Partitioning Around Medoids) algorithm, which partitions the data around center points. First, k samples are randomly selected from all code samples to be detected as the initial k center points, where k represents the number of code quality problem types to be analyzed. The distance d between each non-center point in the code samples to be detected and each of the k center points is then calculated, and each non-center point is assigned to the center point with the smallest distance, thereby realizing the initial clustering.
The main idea of the PAM algorithm is to select a centrally located object in each cluster and to partition n objects into k clusters. The center point is also called a center object, and the other objects are called non-center objects. The algorithm first randomly selects k objects as center points and then iteratively replaces a center object with a non-center object, trying to find better center points that improve the quality of the clusters. In each iteration, all possible pairs of objects are analyzed, where one object in each pair is a center point and the other is a non-center object; for each such combination, the quality of the clustering result is calculated to judge whether replacing the center object with the non-center object would give a better result. A center object Oi can be replaced by the non-center object that produces the greatest reduction in the square-error value, that is, when the total cost of the replacement is smaller than 0, which indicates that a better clustering result is obtained after the center point is replaced; if the total replacement cost is greater than 0, no replacement is made. All possible replacements are tried in one iteration, and the resulting set of best objects becomes the center points of the next iteration, until the quality of the clusters can no longer be improved.
In order to reduce the potential risk of codes, improve team development efficiency and improve software quality, the invention provides a technical method for code quality detection based on cluster analysis aiming at the problem of Objective-C code quality, which mainly comprises the following steps: classifying common code quality problems by using an Objective-C grammar and a code specification, and establishing a code quality feature library; scanning codes submitted by developers to obtain sample data, and clustering the sample data through a PAM algorithm; comparing the clustering result with a code quality feature library to obtain an analysis result; the results include each problem in the code, the shortfall of code habits of the corresponding developer, and code quality optimization suggestions.
Optionally, to further optimize the type code set, the set center code in the type code set may be adjusted so that the adjusted set center code better characterizes the code features of each code in the type code set. Specifically, after the type code set is obtained, each non-set center code in the type code set is used as a candidate center code; the similarity between the candidate center code and the other non-set center codes in the type code set is determined to obtain a similarity value in the candidate group; and the set center code of the type code set is adjusted according to the similarity value in the candidate group.
The candidate center code may be a non-set center code that could subsequently become the set center code. Specifically, each non-set center code may in turn be used as a candidate center code to determine whether it can actually replace the current set center code. Further, the similarity value in the candidate group may be the sum of the similarities between the candidate center code and the other non-set center codes in the type code set. Specifically, the similarities between the candidate center code and the other non-set center codes can be substituted into a preset formula for the intra-group similarity value to obtain the similarity value in the candidate group. From the similarity value in the candidate group, it can be determined whether the candidate center code can replace the current set center code.
Specifically, when the set center code of the type code set is adjusted, the similarity between the set center code and the non-set center codes in the type code set can be determined to obtain the similarity value in the current group; in the case that the similarity value in the candidate group is larger than the similarity value in the current group, the candidate center code is taken as the set center code of the type code set.
The similarity value in the current group may be the sum of the similarities between the current set center code and the non-set center codes in the type code set. Specifically, the similarities between the current set center code and the non-set center codes can be substituted into the preset formula for the intra-group similarity value to obtain the similarity value in the current group. Further, the similarity value in the candidate group may be compared with the similarity value in the current group, and the candidate center code is used as the set center code of the type code set if the similarity value in the candidate group is larger. Optimizing the set center code of the type code set in this way enables the adjusted set center code to represent the code features of each code in the type code set, which makes it convenient to find the quality judgment standard corresponding to the type code set based on the set center code.
When the similarity value in the candidate group is larger than the similarity value in the current group, the candidate center code is more similar to the codes in the type code set and therefore characterizes the code features of the set better, so it can be used as the set center code of the type code set. Correspondingly, when the similarity value in the candidate group is smaller than the similarity value in the current group, the candidate center code is less similar to the codes in the type code set and characterizes their features less well, so it cannot be used as the set center code of the type code set.
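A minimal sketch of this center adjustment for a single type code set is given below. It assumes the intra-group value is computed from Euclidean distances, so "larger intra-group similarity" is treated as "smaller sum of distances"; the patent's preset formula is not given, and the function names are hypothetical.

```python
def intra_group_cost(dist, center, members):
    """Sum of distances from a (candidate) center to the other codes in the set."""
    return sum(dist[center][m] for m in members if m != center)

def adjust_center(dist, center, members):
    """Return the member that best represents the set.

    A candidate replaces the current center only when its summed distance to
    the other members is strictly smaller, i.e. its similarity is larger.
    """
    best, best_cost = center, intra_group_cost(dist, center, members)
    for candidate in members:
        cost = intra_group_cost(dist, candidate, members)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best

# Three codes lying on a line at positions 0, 1 and 2 (hypothetical values).
dist = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
new_center = adjust_center(dist, center=0, members=[0, 1, 2])
```

Starting from code 0 as the center (summed distance 3.0), the middle code 1 has a summed distance of 2.0 and therefore becomes the new set center code.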
Illustratively, the distance from a non-center point to a center point in the present invention is calculated using the Euclidean distance. The Euclidean distance from the point $(x_1, x_2, \ldots, x_n)$ to the point $(y_1, y_2, \ldots, y_n)$ in n-dimensional space is as follows:
$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2} \tag{6-1}$$
The clustering aims at finding the optimal k center points so that each point is allocated to its nearest center point. This goal is measured by an objective function; the minimum absolute error E is adopted as the objective function, with the following formula:
$$E = \sum_{i=1}^{k} \sum_{x \in P_i} d(x, o_i)$$
Here $o_i$ is the current i-th center point, $P_i$ is the cluster corresponding to the current i-th center point, $x$ is a non-center point in that cluster, and $d(x, o_i)$ is the distance from $x$ to $o_i$ as given by formula (6-1). The k center points that minimize E, together with their corresponding clusters, are the final clustering result.
In each iteration, a non-center point x is selected for each cluster Pi, and a new objective function value E' is calculated on the assumption that x is taken as the center point. If E' is smaller than the current E, the point x is a better center point than the current one, and the current center point is replaced by x; otherwise, if E' is larger, no replacement is made. All samples are traversed, and the selection of non-center points, the calculation of the new objective function and the replacement of center points are repeated until the objective function E no longer decreases or the maximum number of iterations is reached, thereby finding the k center points with the minimum E and their k corresponding clusters. The k clusters that are finally determined represent the k code quality problems that exist in the scanned code.
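The iteration described above can be sketched as a small PAM loop over a precomputed distance matrix. This is an illustrative sketch rather than the patent's implementation: the best-improvement swap strategy, the `max_iter` default, the seeded random initialization and all function names are assumptions.

```python
import random

def total_cost(dist, centers):
    """Objective E: every point contributes its distance to the nearest center."""
    return sum(min(dist[i][c] for c in centers) for i in range(len(dist)))

def pam(dist, k, max_iter=100, seed=0):
    n = len(dist)
    centers = random.Random(seed).sample(range(n), k)  # random initial centers
    cost = total_cost(dist, centers)
    for _ in range(max_iter):
        best_swap, best_cost = None, cost
        # Try replacing each center with each non-center point.
        for pos in range(k):
            for x in range(n):
                if x in centers:
                    continue
                trial = centers[:pos] + [x] + centers[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost:  # E' < E: the swap improves clustering
                    best_swap, best_cost = trial, trial_cost
        if best_swap is None:
            break  # E can no longer be reduced
        centers, cost = best_swap, best_cost
    clusters = {c: [] for c in centers}
    for i in range(n):
        clusters[min(centers, key=lambda c: dist[i][c])].append(i)
    return centers, clusters, cost

# Six one-dimensional toy samples forming two obvious groups (hypothetical data).
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
centers, clusters, cost = pam(dist, k=2)
```

On this toy data the loop settles on the middle sample of each group as a center (indices 1 and 4). Because a center is at distance 0 from itself, summing over all points gives the same value of E as summing over non-center points only.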
The final result of the PAM clustering is compared with the code quality feature library to obtain a detailed analysis of the code quality problems, such as how often each problem occurs, the scenarios in which it occurs and the coding habits of the developer, together with code quality optimization suggestions. This helps developers reduce non-standard code and code with hidden defects and improve their own coding habits. It also helps team leaders manage the development team, improve the efficiency of collaborative development and control the overall quality of the product.
S260, for each type code set, comparing the codes to be detected in the type code set with a preset reference type code, and determining a code quality detection result according to the comparison result.
The preset reference type code may be a reference code preset to detect the quality of one type code. Specifically, the preset reference type codes corresponding to each type code set can be respectively determined based on the set center codes. The code quality detection result may be a quality detection result for all codes to be detected in each type code set. Specifically, the codes to be detected in each type code set can be compared with corresponding preset reference type codes respectively, so that the quality detection results of the codes to be detected in each type code set can be determined, and the code quality detection results of all the codes to be detected can be further determined.
Optionally, before comparing the code to be detected in the type code set with the preset reference type codes, determining the similarity between the set center code and each preset reference code in the preset code quality database to obtain the type similarity; and determining the preset reference type codes corresponding to the type code set from the preset reference codes based on the type similarity.
The preset code quality database may be a preset database storing preset reference type codes. A plurality of preset reference codes are stored in a preset code quality database. The preset reference code may be used to detect the quality of the code. The type similarity may be a similarity between the set center code and a preset reference code. Specifically, the manner of calculating the similarity between the set center code and the preset reference code is not limited herein. For example, the set center code and the preset reference code may be respectively converted into corresponding vector forms, and the similarity between the two vectors is calculated, thereby obtaining the type similarity.
Further, when determining the preset reference type code corresponding to a type code set, the type similarities between the set center code of that type code set and each preset reference code can be compared, and the preset reference type code is then determined according to the comparison result. Specifically, the preset reference code with the maximum type similarity can be selected as the preset reference type code corresponding to the type code set.
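A minimal sketch of this selection step is given below. The text leaves the similarity measure open, so cosine similarity is used here purely as an assumption; the function names, reference names, and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_reference(center_vec, reference_vecs):
    """Return (name, type_similarity) of the reference closest to the set center."""
    scores = {name: cosine_similarity(center_vec, vec)
              for name, vec in reference_vecs.items()}
    best = max(scores, key=scores.get)  # maximum type similarity wins
    return best, scores[best]


# Illustrative stand-in for the preset code quality database.
references = {
    "view_controller": [1.0, 0.0, 0.2],
    "network_layer": [0.0, 1.0, 0.1],
}
best_name, best_score = select_reference([0.9, 0.1, 0.2], references)
```

Here `best_name` identifies the preset reference code that would serve as the preset reference type code for the set.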
According to the technical scheme provided by the embodiment of the invention, at least one code to be detected is obtained, and vector conversion is carried out on each code to be detected to obtain a target code vector; calculating Euclidean distance between every two target code vectors, and determining a code similarity detection result according to the Euclidean distance; selecting at least one code from the codes to be detected as a set center code, and taking the codes except the set center code in the codes to be detected as non-set center codes; based on the code similarity detection result, determining the similarity between each non-set center code and the set center code to obtain the distance center similarity; clustering the set center codes and the non-set center codes based on the similarity of the distance centers to obtain a type code set; and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result. The technical scheme of the embodiment of the invention solves the problems of lower detection accuracy and detection efficiency caused by the fact that a single reference code is adopted to detect the quality of the target code when the quality of the code is detected in the prior art, can cluster the codes to obtain different types of code sets, and then detects the quality of the codes in the code sets based on the reference codes of the corresponding types, thereby improving the accuracy and efficiency of code quality detection.
Fig. 3 is a schematic structural diagram of a code quality detection device provided by an embodiment of the present invention. The embodiment is applicable to a scenario in which the quality of Objective-C code is detected; the device may be implemented in software and/or hardware and integrated into a computer device with an application development function.
As shown in fig. 3, the code quality detecting apparatus includes: a code similarity detection module 310, a code clustering module 320, and a code quality detection module 330.
The code similarity detection module 310 is configured to obtain at least one code to be detected, determine a similarity between the codes to be detected, and obtain a code similarity detection result; the code clustering module 320 is configured to cluster all the codes to be detected based on the code similarity detection result, so as to obtain at least one type code set; the code quality detection module 330 is configured to compare, for each type code set, a code to be detected in the type code set with a preset reference type code, and determine a code quality detection result according to the comparison result.
According to the technical scheme provided by the embodiment of the invention, the code similarity detection result is obtained by acquiring at least one code to be detected and determining the similarity between the codes to be detected; clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set; and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result. The technical scheme of the embodiment of the invention solves the problems of lower detection accuracy and detection efficiency caused by the fact that a single reference code is adopted to detect the quality of the target code when the quality of the code is detected in the prior art, can cluster the codes to obtain different types of code sets, and then detects the quality of the codes in the code sets based on the reference codes of the corresponding types, thereby improving the accuracy and efficiency of code quality detection.
In an alternative embodiment, the code clustering module 320 is specifically configured to: selecting at least one code from the codes to be detected as a set center code, and taking the codes except the set center code in the codes to be detected as non-set center codes; based on the code similarity detection result, determining the similarity between each non-set center code and the set center code to obtain the distance center similarity; and clustering the set center codes and the non-set center codes based on the similarity of the distance centers to obtain the type code set.
In an alternative embodiment, the code clustering module 320 includes: a type code set determining unit configured to: respectively determining the code set corresponding to each set center code to obtain at least one initial code set; and, for each non-set center code, comparing the distance center similarities between that non-set center code and the set center codes, and adding the non-set center code to one of the initial code sets according to the comparison result to obtain the type code set.
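The assignment performed by the type code set determining unit might look like the sketch below. It assumes a precomputed pairwise distance matrix and, as an illustrative choice not stated in the source, converts a distance into a distance center similarity with 1 / (1 + distance). The function name and data are hypothetical.

```python
def cluster_by_centers(distances, center_ids):
    """Assign every non-center code to the set center it is most similar to.

    distances:  square matrix of pairwise distances between codes
    center_ids: indices of the codes chosen as set center codes
    Returns {center_id: [member indices, center first]}.
    """
    # Initial code sets: one per set center code, containing only the center.
    clusters = {c: [c] for c in center_ids}
    for i in range(len(distances)):
        if i in clusters:
            continue  # set center codes are already placed
        # Distance center similarity, assumed here to be 1 / (1 + distance).
        sims = {c: 1.0 / (1.0 + distances[i][c]) for c in center_ids}
        best = max(sims, key=sims.get)
        clusters[best].append(i)
    return clusters


# Illustrative distances between four codes; codes 0 and 2 act as centers.
distances = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 4.5, 5.5],
    [5.0, 4.5, 0.0, 1.2],
    [6.0, 5.5, 1.2, 0.0],
]
clusters = cluster_by_centers(distances, center_ids=[0, 2])
```

Each resulting list corresponds to one type code set.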
In an alternative embodiment, the code similarity detection module 310 is specifically configured to: performing vector conversion on each code to be detected to obtain a target code vector; and calculating the Euclidean distance between every two target code vectors, and determining the code similarity detection result according to the Euclidean distance.
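As a rough illustration of this module, the sketch below converts code snippets to vectors and computes pairwise Euclidean distances. The vector conversion shown (identifier counts over a shared vocabulary) is an assumption chosen for simplicity, not the conversion used by the embodiment; the function names and snippets are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(snippets):
    """Turn each snippet into an identifier-count vector over a shared vocabulary."""
    token_lists = [re.findall(r"[A-Za-z_]\w*", s) for s in snippets]
    vocab = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([counts[tok] for tok in vocab])
    return vectors


def pairwise_euclidean(vectors):
    """Return a symmetric matrix of Euclidean distances between all vector pairs."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Illustrative snippets only.
snippets = [
    "int total = a + b;",
    "int sum = a + b;",
    "NSString *name = [user name];",
]
distances = pairwise_euclidean(vectorize(snippets))
```

A smaller distance indicates more similar code, so the first two snippets end up closer to each other than either is to the third.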
In an alternative embodiment, the code quality detection apparatus further includes: a type code set adjustment module for: taking each non-set center code in the type code set as a candidate center code; determining the similarity between the candidate center code and other non-set center codes in the type code set to obtain a similarity value in a candidate group; and adjusting the set center code of the type code set according to the similarity value in the candidate group.
In an alternative embodiment, the type code set adjustment module includes: a set center code adjustment unit configured to: determining the similarity between the set center codes and the non-set center codes in the type code set to obtain similarity values in the current group; and taking the candidate center code as the set center code of the type code set under the condition that the similarity value in the candidate group is larger than the similarity value in the current group.
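One way to picture the set center adjustment is the sketch below. Several details are assumptions rather than statements from the source: the in-group similarity value is taken as a mean (so that a candidate compared against one fewer code is not penalized), similarity is derived from distance as 1 / (1 + distance), and when several candidates beat the current center the best one is chosen. Names and data are hypothetical.

```python
def mean_similarity(distances, anchor, others):
    """Average similarity (assumed 1 / (1 + distance)) from anchor to others."""
    return sum(1.0 / (1.0 + distances[anchor][o]) for o in others) / len(others)


def adjust_center(distances, center, members):
    """Return the code that should be the set center after one adjustment pass.

    center:  index of the current set center code
    members: indices of the non-set center codes in the same type code set
    """
    if len(members) < 2:
        return center  # a candidate would have no other codes to compare with
    best_center = center
    # Similarity value in the current group: current center vs. all members.
    best_score = mean_similarity(distances, center, members)
    for candidate in members:
        others = [m for m in members if m != candidate]
        # Similarity value in the candidate group: candidate vs. other members.
        score = mean_similarity(distances, candidate, others)
        if score > best_score:
            best_center, best_score = candidate, score
    return best_center


# Illustrative distances: code 0 is the current center but sits far from the rest.
distances = [
    [0.0, 4.0, 4.0, 4.0],
    [4.0, 0.0, 1.0, 1.0],
    [4.0, 1.0, 0.0, 3.0],
    [4.0, 1.0, 3.0, 0.0],
]
new_center = adjust_center(distances, center=0, members=[1, 2, 3])
```

In this data, code 1 is closest to the other members on average, so it replaces code 0 as the set center.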
In an alternative embodiment, the code quality detection apparatus further includes: a reference type code determining module for: before comparing the codes to be detected in the type code set with preset reference type codes, determining the similarity between the set center code and each preset reference code in a preset code quality database to obtain type similarity; and determining a preset reference type code corresponding to the type code set from the preset reference codes based on the type similarity.
The code quality detection device provided by the embodiment of the invention can execute the code quality detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 4 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention. The computer device 12 may be any terminal device with computing power and may be configured in a code quality detection device.
As shown in FIG. 4, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown in fig. 4, the network adapter 20 communicates with other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, to implement a code quality detection method provided by the present embodiment, the method including:
acquiring at least one code to be detected, determining the similarity between the codes to be detected, and obtaining a code similarity detection result;
clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set;
and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the code quality detection method as provided by any embodiment of the present invention, comprising:
acquiring at least one code to be detected, determining the similarity between the codes to be detected, and obtaining a code similarity detection result;
clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set;
and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device, or distributed over a network of computing devices, or they may alternatively be implemented in program code executable by a computer device, such that they are stored in a memory device and executed by the computing device, or they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
Claims (10)
1. A code quality detection method, comprising:
acquiring at least one code to be detected, determining the similarity between the codes to be detected, and obtaining a code similarity detection result;
clustering all codes to be detected based on a code similarity detection result to obtain at least one type code set;
and comparing the codes to be detected in the type code sets with preset reference type codes for each type code set, and determining a code quality detection result according to the comparison result.
2. The method according to claim 1, wherein clustering all codes to be detected based on the code similarity detection result to obtain at least one type code set comprises:
selecting at least one code from the codes to be detected as a set center code, and taking the codes except the set center code in the codes to be detected as non-set center codes;
based on the code similarity detection result, determining the similarity between each non-set center code and the set center code to obtain the distance center similarity;
and clustering the set center codes and the non-set center codes based on the similarity of the distance centers to obtain the type code set.
3. The method of claim 2, wherein clustering the set center codes and the non-set center codes based on the distance center similarity to obtain the type code set comprises:
respectively determining code sets corresponding to the central codes of each set to obtain at least one initial code set;
and, for each non-set center code, comparing the distance center similarities between that non-set center code and the set center codes, and adding the non-set center code to one of the initial code sets according to the comparison result to obtain the type code set.
4. The method according to claim 1, wherein the determining the similarity between the codes to be detected to obtain a code similarity detection result includes:
performing vector conversion on each code to be detected to obtain a target code vector;
and calculating Euclidean distance between every two target code vectors, and determining the code similarity detection result according to the Euclidean distance.
5. The method of claim 2, further comprising, after said deriving said set of type codes:
taking each non-set center code in the type code set as a candidate center code;
determining the similarity between the candidate center code and other non-set center codes in the type code set to obtain a similarity value in a candidate group;
and adjusting the set center code of the type code set according to the similarity value in the candidate group.
6. The method of claim 5, wherein adjusting the set center code of the set of type codes based on the similarity values within the candidate set comprises:
determining the similarity between the set center codes and the non-set center codes in the type code set to obtain similarity values in the current group;
and taking the candidate center code as the set center code of the type code set under the condition that the similarity value in the candidate group is larger than the similarity value in the current group.
7. The method of claim 2, further comprising, prior to said comparing the code to be detected in the set of type codes with a preset reference type code:
determining the similarity between the set center code and each preset reference code in a preset code quality database to obtain type similarity;
and determining a preset reference type code corresponding to the type code set from the preset reference codes based on the type similarity.
8. A code quality detection apparatus, the apparatus comprising:
the code similarity detection module is used for acquiring at least one code to be detected, determining the similarity between the codes to be detected and obtaining a code similarity detection result;
the code clustering module is used for clustering all codes to be detected based on the code similarity detection result to obtain at least one type code set;
the code quality detection module is used for comparing, for each type code set, the codes to be detected in the type code set with a preset reference type code, and determining a code quality detection result according to the comparison result.
9. A computer device, the computer device comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the code quality detection method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the code quality detection method according to any one of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311652090.8A CN117608630A (en) | 2023-12-05 | 2023-12-05 | Code quality detection method, device, equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311652090.8A CN117608630A (en) | 2023-12-05 | 2023-12-05 | Code quality detection method, device, equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117608630A (en) | 2024-02-27 |
Family
ID=89955997
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311652090.8A Pending CN117608630A (en) | 2023-12-05 | 2023-12-05 | Code quality detection method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117608630A (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118427842A (en) * | 2024-07-04 | 2024-08-02 | 北京安普诺信息技术有限公司 | LLM-based SAST vulnerability rapid analysis method, device and equipment |
| CN118468295A (en) * | 2024-07-04 | 2024-08-09 | 北京安普诺信息技术有限公司 | SAST vulnerability detection method, device and electronic equipment based on LLM |
| CN118427842B (en) * | 2024-07-04 | 2024-10-01 | 北京安普诺信息技术有限公司 | LLM-based SAST vulnerability rapid analysis method, device and equipment |
| CN120046154A (en) * | 2024-12-30 | 2025-05-27 | 中国人民解放军61660部队 | Software vulnerability parallel mining method for optimizing machine learning |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN117608630A (en) | Code quality detection method, device, equipment and storage medium | |
| CN111753863B (en) | Image classification method, device, electronic device and storage medium | |
| US20070005556A1 (en) | Probabilistic techniques for detecting duplicate tuples | |
| CN114020916B (en) | Text classification method, device, storage medium and electronic device | |
| CN110728313B (en) | Classification model training method and device for intention classification recognition | |
| US20220114478A1 (en) | System and method for enhancing inference models based on prediction data | |
| CN113723618B (en) | SHAP optimization method, equipment and medium | |
| CN115965410B (en) | Network site selection method and device | |
| CN116610806A (en) | AI-based RPA digital service processing method and computer equipment | |
| CN114186605B (en) | Methods, apparatus, equipment and storage media for processing minority samples | |
| CN116707859A (en) | Feature rule extraction method and device, network intrusion detection method and device | |
| CN120105100A (en) | A method, device and storage medium for optimizing LLMs pre-training data set | |
| Behtash et al. | Universality of layer-level entropy-weighted quantization beyond model architecture and size | |
| CN112801226A (en) | Data screening method and device, computer readable storage medium and electronic equipment | |
| CN113065597A (en) | Clustering method, device, equipment and storage medium | |
| CN110968690B (en) | Clustering division method and device for words, equipment and storage medium | |
| CN115905236B (en) | Data processing method, device, equipment and storage medium | |
| CN117952717A (en) | A method and system for processing air ticket orders based on big data | |
| CN114268625B (en) | Feature selection method, device, equipment and storage medium | |
| CN112380111B (en) | Real-time defect positioning method and system based on new project | |
| CN115695205B (en) | Topology network structure optimization method, device, equipment and storage medium | |
| CN118672792B (en) | Resource fragment treatment method | |
| CN110647519B (en) | Method and device for predicting missing attribute value in test sample | |
| CN120223424B (en) | Website data processing method, device and equipment | |
| CN115297114B (en) | Node allocation method and device, storage medium and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||