
CN117688573A - A static analysis method, system, chip and device for multi-language applications - Google Patents

Info

Publication number
CN117688573A
Authority
CN
China
Prior art keywords
type
language
analysis
intermediate representation
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311706298.3A
Other languages
Chinese (zh)
Inventor
缪思薇
左海峰
汪洋
王智慧
晁竞健
高睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electric Power Research Institute Co Ltd CEPRI filed Critical China Electric Power Research Institute Co Ltd CEPRI
Priority to CN202311706298.3A priority Critical patent/CN117688573A/en
Publication of CN117688573A publication Critical patent/CN117688573A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a static analysis method, system, chip, and device for multi-language applications. Files written in different programming languages in a multi-language project are converted into abstract syntax trees, and type analysis is performed; the abstract syntax trees are converted into unified, language-independent intermediate representation data; the type analysis information is stored in a cached data set and divided into two states, resolved and unresolved; if the type lookup for the resolved and unresolved states succeeds, the function and attribute information related to the calling object is supplemented and added to the intermediate representation data; based on the intermediate representation data, a complete cross-language function call relationship is constructed, and security risks in cross-language calls are discovered by the defect analysis module of the static analysis. The invention provides comprehensive security analysis and vulnerability detection for multi-language applications and improves security and reliability.

Description

Multi-language application-oriented static analysis method, system, chip and device
Technical Field
The invention belongs to the technical field of source code defect detection, and particularly relates to a static analysis method, system, chip, and device for multi-language applications.
Background
As business scale expands and functional requirements change, modern software often implements its functions through modules written in several languages. Some programming languages allow code to call each other directly; for example, Java code may call Kotlin code, and Groovy code may call Java code directly. Code modules in different languages are organized by build tools such as Gradle and Maven and call one another through API interfaces. This complexity places higher demands on software vulnerability analysis programs.
Static analysis of source code is one of the important techniques for ensuring software security. Static analysis discovers software vulnerabilities by analyzing source code without running the program. A static analyzer scans all the source code, converts it into an abstract syntax tree, performs type analysis, and converts the abstract syntax tree into a unified, language-independent intermediate representation. The abstract syntax tree is a structured representation of the source code; type analysis infers the behavior of the code by identifying the types of variables and functions and the relationships among them; and the intermediate representation is language-independent data generated from each language's abstract syntax tree and type information.
Static analysis detection models discover potential software problems based on the unified intermediate representation. More accurate type analysis allows the variables and functions used in the code to be inferred more accurately, which improves the detection accuracy of the static analysis program.
However, in software projects containing multiple programming languages, the static analysis programs in wide use today can only invoke the corresponding analysis engine for each language and analyze the source code of each language separately. This approach cannot perform type analysis on cross-language calls, so the variables or functions involved in a cross-language call cannot be analyzed accurately. When potentially risky data is passed from code written in one language to an interface written in another language, existing static analysis methods cannot analyze the cross-language case effectively, and code problems that may be present go undetected.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a static analysis method, system, chip, and device for multi-language applications. It addresses the problem that existing static analysis software can only analyze the source code of each language in a multi-language project independently and cannot find the interfaces through which the languages call one another. The invention provides comprehensive security analysis and vulnerability detection for multi-language applications and improves security and reliability.
The invention adopts the following technical scheme:
a static analysis method for multi-language applications comprises the following steps:
converting files of different programming languages in the multi-language project into abstract syntax trees, and performing type analysis to obtain type analysis information; converting the abstract syntax trees into unified, language-independent intermediate representation data;
storing the type analysis information in a cached data set;
dividing the type analysis information in the cached data set into two states, resolved and unresolved;
if the type lookup for the resolved and unresolved states succeeds, supplementing the function and attribute information related to the calling object, and adding it to the intermediate representation data;
based on the resulting intermediate representation data, constructing a complete cross-language function call relationship, and discovering security risks in cross-language calls through the defect analysis module of the static analysis.
Preferably, converting files of different programming languages in the multi-language project into abstract syntax trees is specifically:
collecting the files of the different programming languages in the multi-language project according to their file suffixes, and, for the files of each language, using the corresponding compiler to convert the source code into an abstract syntax tree and perform type analysis.
More preferably, the files of different programming languages comprise source code files in Java, Kotlin, and Groovy, and the compilers comprise the JDT compiler, the Kotlin compiler, and the Groovy compiler.
Preferably, storing the type analysis information in the cached data set is specifically:
storing the analysis results of all types, and establishing a mapping between every fully analyzed type and its data in the intermediate representation, wherein each stored entry consists of the file name, the language, the type information, and the representation of the type in the intermediate representation data.
Preferably, the resolved and unresolved states are specifically:
resolved means that type inference is complete and the result can be used directly by the detection model; an unresolved type enters the type inference module, which searches the cached data set for the relevant type according to the brief type information of the calling object in order to supplement the type information.
More preferably, according to the interoperability relationships between the programming languages, the cached data sets of the other languages whose types or methods may be used are obtained.
More preferably, type inference is performed for all types or methods that are not fully resolved, specifically as follows:
acquiring the package name and class name of the type that is not fully resolved; according to the language type, acquiring the cached data sets of the other callable languages, and searching for the corresponding type information according to the acquired package name and class name; acquiring the data of the corresponding type in the intermediate representation.
In a second aspect, an embodiment of the present invention provides a static analysis system for multilingual applications, including:
the conversion module is used for converting files of different programming languages in the multi-language project into an abstract syntax tree and performing type analysis to obtain type analysis information;
the data module is used for converting the abstract syntax tree into universal unified intermediate representation data;
the storage module is used for storing the type analysis information into a cache data set;
the state module is used for dividing the type analysis information in the cached data set into two states, resolved and unresolved;
the supplementing module is used for supplementing the function and attribute information related to the calling object when the type lookup for the resolved and unresolved states obtained by the state module succeeds, and adding it to the intermediate representation data obtained by the data module;
the analysis module is used for constructing a complete cross-language function call relation based on the intermediate representation data obtained by the supplementing module, and finding potential safety hazards existing in cross-language call through the static analysis defect analysis module.
In a third aspect, a chip includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above-described multi-language application-oriented static analysis method when the computer program is executed.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a computer program, where the computer program when executed by the electronic device implements the steps of the above-mentioned static analysis method for multilingual applications.
Compared with the prior art, the invention has at least the following beneficial effects:
a static analysis method for multi-language applications can effectively meet the static analysis requirements of multi-language applications and improve the accuracy, comprehensiveness, and reliability of vulnerability detection, specifically as follows:
1. Acquiring accurate type information: the method converts the source code into a syntax tree and performs type analysis to obtain accurate type information for each object and function, and supplements missing type information through the type inference module. The static analysis detection module therefore receives accurate type information, which improves the accuracy and comprehensiveness of vulnerability detection.
2. Handling security risks in cross-language calls: based on the unified intermediate representation data with the missing types supplemented, the static analysis detection module can not only identify vulnerabilities within a single-language module but also handle security risks in cross-language calls. The method can analyze interface security and the correctness of data transfer in cross-language calls and detect potential security vulnerabilities. With this capability, a static analysis program optimized by the method can analyze the security problems of a multi-language application comprehensively, including the soundness and security of cross-language call interfaces.
It will be appreciated that the advantages of the second to fourth aspects may be found in the relevant description of the first aspect and are not repeated here.
In summary, the method and the device can acquire accurate type information, process potential safety hazards of cross-language calling, and support detection requirements of complex code items; the method can comprehensively improve the static analysis capability of the multilingual application, and enhance the accuracy, the comprehensiveness and the reliability of vulnerability detection, thereby improving the safety and the reliability of software.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a view of a compiler type analysis result storage structure;
FIG. 3 is a schematic diagram of a computer device according to an embodiment of the present invention;
FIG. 4 is a block diagram of a chip according to an embodiment of the present invention;
FIG. 5 illustrates a portion of source code according to an example embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it will be understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In the present invention, the character "/" generally indicates that the front and rear related objects are an or relationship.
It should be understood that although the terms first, second, third, etc. may be used to describe the preset ranges, etc. in the embodiments of the present invention, these preset ranges should not be limited to these terms. These terms are only used to distinguish one preset range from another. For example, a first preset range may also be referred to as a second preset range, and similarly, a second preset range may also be referred to as a first preset range without departing from the scope of embodiments of the present invention.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if determined" or "if (a stated condition or event) is detected" may be interpreted as "when determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)", depending on the context.
Various structural schematic diagrams according to the disclosed embodiments of the present invention are shown in the accompanying drawings. The figures are not drawn to scale, wherein certain details are exaggerated for clarity of presentation and may have been omitted. The shapes of the various regions, layers and their relative sizes, positional relationships shown in the drawings are merely exemplary, may in practice deviate due to manufacturing tolerances or technical limitations, and one skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions as actually required.
The invention provides a static analysis method for multi-language applications. The source code of a multi-language project is converted by compilers into abstract syntax trees and type information; summary information for all types is extracted, collected, and stored in a unified summary information data set; and the syntax trees are abstracted into a language-independent intermediate representation. The type inference module uses the summary information data set to infer the type and method information that was not fully resolved and supplements the data in the intermediate representation, so that the analysis model can discover potential code security issues based on the supplemented intermediate representation. With this optimization, interface information can be supplemented across languages, the whole project can be statically analyzed as a unit, and cross-language call analysis and vulnerability detection become possible. The method has four stages:
Source code conversion: each language's compiler performs abstract syntax tree conversion and type analysis on the source code, and the type analysis results are cached.
Intermediate representation: the abstract syntax tree and the type analysis results are further converted into a unified, language-independent intermediate representation.
Type inference: the type inference module resolves the missing type information according to the inference steps and the compiler type, and supplements it into the intermediate representation.
Static analysis detection: static analysis detection is performed on the supplemented intermediate representation, which improves the accuracy and comprehensiveness of vulnerability detection.
Referring to fig. 1, the static analysis method for multi-language application of the present invention includes the following steps:
S1, collecting the files of different programming languages in the multi-language project according to their file suffixes, and, for the files of each language, using the corresponding compiler to convert the source code into an abstract syntax tree and perform type analysis;
All source code files of the project to be analyzed are collected, and the Java, Kotlin, and Groovy source files are distinguished by file type. The collected Java, Kotlin, and Groovy files are passed to the JDT compiler, the Kotlin compiler, and the Groovy compiler respectively for abstract syntax tree conversion and type analysis.
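The collection part of step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the suffix table and the function name `collect_sources` are assumptions, and the hand-off to the JDT, Kotlin, and Groovy compilers is left out because the source does not describe how they are invoked.

```python
from collections import defaultdict
from pathlib import Path

# Assumed suffix-to-language table; the source only names the three languages.
SUFFIX_TO_LANGUAGE = {".java": "java", ".kt": "kotlin", ".groovy": "groovy"}


def collect_sources(project_root):
    """Group the source files under project_root by language, keyed on file suffix."""
    by_language = defaultdict(list)
    for path in sorted(Path(project_root).rglob("*")):
        language = SUFFIX_TO_LANGUAGE.get(path.suffix)
        if language is not None and path.is_file():
            by_language[language].append(path)
    # Each list would then be handed to the compiler for that language.
    return dict(by_language)
```

Files with other suffixes are simply ignored, so a language that never occurs in the project does not appear in the result.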
S2, converting the abstract syntax tree into a unified, language-independent intermediate representation;
S3, storing the type information obtained from the analysis of each language in a cached data set;
The analysis results of all types are stored, and a mapping is established between every fully analyzed type and its data in the intermediate representation.
Each stored entry has four items: the file name, the language, the type information, and the representation of the type in the intermediate representation data.
The storage structure is shown in FIG. 2. The type information must be mapped to its methods and fields. Member methods and field definitions in Java, Kotlin, and Groovy always sit inside a type definition. In addition, the top-level methods and fields of Kotlin belong to a type definition named after the Kotlin file that contains them with the suffix Kt appended, and the top-level methods and fields of Groovy belong to a type definition named after the Groovy file that contains them.
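A cache entry with the four items above might look like the sketch below. The class `TypeRecord`, its field names, and the helper `top_level_container` are hypothetical names chosen for illustration; the actual layout is the one in FIG. 2, which is not reproduced here. The helper only encodes the naming rule stated in the text for top-level Kotlin and Groovy members.

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class TypeRecord:
    """One entry of the cached data set (hypothetical field names)."""
    file_name: str
    language: str
    type_info: str            # fully qualified type name
    ir_ref: str               # assumed key of the type's data in the intermediate representation
    methods: list = field(default_factory=list)
    fields: list = field(default_factory=list)


def top_level_container(file_name, language, package=""):
    """Name of the type that holds a file's top-level methods and fields."""
    stem = PurePath(file_name).stem
    if language == "kotlin":
        simple = stem + "Kt"      # Kotlin: file name plus the Kt suffix
    elif language == "groovy":
        simple = stem             # Groovy: the file name itself
    else:
        raise ValueError(f"{language} has no top-level members")
    return f"{package}.{simple}" if package else simple
```

For example, top-level functions in `StringUtils.kt` of package `com.example` would be recorded under `com.example.StringUtilsKt`.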
S4, dividing the types and methods analyzed by the compilers into two states, resolved and unresolved. Resolved means that type inference is complete and the result can be used directly by the detection model without further processing. An unresolved type enters the type inference module, which searches the cached data set for the relevant type according to the brief type information of the calling object in order to supplement the type information;
The cached data sets to be searched are selected first. According to the interoperability relationships between the programming languages, the cached data sets of the other languages whose types or methods may be used are obtained. For example, Java may call Kotlin and Groovy code, Kotlin may call Java code, and Groovy may call Java code, and the corresponding cached data sets are selected according to these relationships.
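The selection of cached data sets can be sketched as a small lookup table. The table below encodes only the three example relationships given in the text; the names `CALLABLE_LANGUAGES` and `select_cached_datasets` are assumptions made for this illustration.

```python
# Interoperability relationships taken from the example in the text:
# Java may call Kotlin and Groovy; Kotlin and Groovy may call Java.
CALLABLE_LANGUAGES = {
    "java": ("kotlin", "groovy"),
    "kotlin": ("java",),
    "groovy": ("java",),
}


def select_cached_datasets(caller_language, caches):
    """Return the cached data sets of the languages that caller_language may call.

    caches maps a language name to its cached data set; languages that are
    absent from the project are skipped.
    """
    callable_langs = CALLABLE_LANGUAGES.get(caller_language, ())
    return {lang: caches[lang] for lang in callable_langs if lang in caches}
```

Keeping the relationships in a table means a further language pair could be added without touching the lookup logic.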
Type inference is performed on all incompletely resolved types or methods, specifically as follows:
S401, acquiring the package name and class name of the type that is not fully resolved:
For an incompletely resolved type in Java code, the JDT compiler yields only the type name; the import information must be analyzed through the imports method of the compilation unit to obtain the complete package name and type name.
For an incompletely resolved type, the Kotlin compiler likewise yields only the type name; the complete package name and type name are obtained by analyzing the import information through the importList attribute of the KtFile object.
For an incompletely resolved type, the Groovy compiler yields the complete package name and type name directly.
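Combining a bare type name with a file's import list, as step S401 requires for Java and Kotlin, can be sketched as below. This is a simplified illustration that works on import strings rather than on JDT or Kotlin compiler objects; the function name `candidate_names` and the ordering of candidates (explicit import, then same package, then wildcard imports) are assumptions, and import aliases, static imports, and nested types are ignored.

```python
def candidate_names(simple_name, imports, current_package=""):
    """Return possible fully qualified names for simple_name, most likely first.

    imports is the list of import strings of the file, for example
    ["com.example.kt.Greeter", "java.util.*"].
    """
    explicit, on_demand = [], []
    for imported in imports:
        if imported.endswith(".*"):
            # "a.b.*" may provide "a.b.<simple_name>"
            on_demand.append(imported[:-1] + simple_name)
        elif imported.rsplit(".", 1)[-1] == simple_name:
            explicit.append(imported)
    if current_package:
        same_package = [f"{current_package}.{simple_name}"]
    else:
        same_package = [simple_name]
    return explicit + same_package + on_demand
```

The function returns candidates rather than a single answer because a wildcard import cannot be confirmed until the name is looked up in a cached data set.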
S402, acquiring the cached data sets of the other callable languages according to the language type, and searching for the corresponding type information according to the package name and class name acquired in step S401;
S403, acquiring the data of the corresponding type in the intermediate representation.
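Steps S402 and S403 amount to a keyed lookup followed by a dereference into the intermediate representation. The sketch below assumes, purely for illustration, that each cached data set is a dictionary from fully qualified type name to a record with an `ir_ref` key, and that the intermediate representation is a dictionary keyed by that reference; the source does not specify these layouts.

```python
def lookup_ir_type(candidates, datasets, ir):
    """Find the first candidate name present in any callable language's cache.

    candidates: fully qualified names to try, most likely first (from S401).
    datasets:   {language: {qualified_name: {"ir_ref": key, ...}}} (assumed layout).
    ir:         {key: type data in the intermediate representation} (assumed layout).
    Returns (language, ir_type_data), or None when the lookup fails.
    """
    for name in candidates:
        for language, dataset in datasets.items():
            record = dataset.get(name)
            if record is not None:
                return language, ir.get(record["ir_ref"])
    return None
```

A `None` result corresponds to a failed lookup, in which case the type stays unresolved and nothing is supplemented.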
S5, if the type lookup succeeds, supplementing the function and attribute information related to the calling object, and adding it to the intermediate representation;
The data obtained from type inference is supplemented into the unified intermediate representation data.
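The supplementing step S5 can be illustrated with plain dictionaries. The key names used here (`method`, `target`, `receiver_type`, `attributes`, `resolved`) are hypothetical; the source does not define the schema of the intermediate representation.

```python
def supplement_call(call_site, type_node):
    """Return a copy of an unresolved call site enriched with the callee type's members.

    call_site: {"caller": ..., "method": ..., "resolved": False} (assumed shape).
    type_node: {"name": ..., "methods": [{"name": ...}], "fields": [...]} or None.
    """
    if type_node is None:
        return call_site  # lookup failed: the call site stays unresolved
    method = next(
        (m for m in type_node["methods"] if m["name"] == call_site["method"]),
        None,
    )
    enriched = dict(
        call_site,
        receiver_type=type_node["name"],
        attributes=list(type_node["fields"]),
        resolved=method is not None,
    )
    if method is not None:
        enriched["target"] = f'{type_node["name"]}.{method["name"]}'
    return enriched
```

Returning a copy leaves the original call site untouched, which makes it easy to compare the intermediate representation before and after supplementation.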
S6, based on the intermediate representation data supplemented by the type lookup, constructing a complete cross-language function call relationship, and discovering security risks in cross-language calls through the defect analysis module of the static analysis.
The static analysis engine completes its defect analysis on the intermediate representation data and reports the analysis results.
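Step S6 can be sketched as building a call graph from the supplemented call sites and checking whether a risky entry point reaches a sensitive function. The source does not describe the internals of the defect analysis module, so the breadth-first reachability check below is a stand-in chosen for illustration, and the names `build_call_graph` and `find_path` are assumptions.

```python
from collections import deque


def build_call_graph(call_sites):
    """Map each caller to the set of resolved targets it calls."""
    graph = {}
    for site in call_sites:
        if site.get("target"):  # only call sites whose target was resolved
            graph.setdefault(site["caller"], set()).add(site["target"])
    return graph


def find_path(graph, source, sink):
    """Breadth-first search for a call chain from source to sink, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            return path
        for callee in sorted(graph.get(path[-1], ())):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None
```

Without the supplemented targets from S5, a call from Java into Kotlin would have no edge in the graph, and a chain that crosses the language boundary would not be found.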
In still another embodiment of the present invention, a static analysis system for a multilingual application is provided, which can be used to implement the static analysis method for a multilingual application, and specifically, the static analysis system for a multilingual application includes a conversion module, a data module, a storage module, a status module, a supplement module, and an analysis module.
The conversion module converts files of different programming languages in the multilingual project into an abstract syntax tree and performs type analysis;
the data module is used for converting the abstract syntax tree obtained by the conversion module into universal unified intermediate representation data;
the storage module is used for storing the type analysis information obtained by the conversion module into a cache data set;
the state module is used for dividing the type analysis information stored by the storage module into two states, resolved and unresolved;
the supplementing module is used for supplementing the function and attribute information related to the calling object when the type lookup for the resolved and unresolved states obtained by the state module succeeds, and adding it to the intermediate representation data obtained by the data module;
the analysis module is used for constructing a complete cross-language function call relation based on the intermediate representation data obtained by the supplementing module, and finding potential safety hazards existing in cross-language call through the static analysis defect analysis module.
In yet another embodiment of the present invention, a terminal device is provided. The terminal device includes a processor and a memory; the memory stores a computer program that includes program instructions, and the processor executes the program instructions stored in the computer storage medium. The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. It is the computational core and control core of the terminal and is adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor in this embodiment of the invention can be used to run the static analysis method for multi-language applications, which comprises the following steps:
converting files of different programming languages in the multi-language project into abstract syntax trees, and performing type analysis; converting the abstract syntax trees into unified, language-independent intermediate representation data; storing the type analysis information in a cached data set; dividing the type analysis information into two states, resolved and unresolved; if the type lookup for the resolved and unresolved states succeeds, supplementing the function and attribute information related to the calling object, and adding it to the intermediate representation data; based on the intermediate representation data, constructing a complete cross-language function call relationship, and discovering security risks in cross-language calls through the defect analysis module of the static analysis.
Referring to FIG. 3, the terminal device is a computer device. The computer device 60 of this embodiment includes a processor 61, a memory 62, and a computer program 63 stored in the memory 62 and executable on the processor 61. When executed by the processor 61, the computer program 63 implements the static analysis method for multi-language applications of the embodiment; to avoid repetition, the details are not described again here. Alternatively, when executed by the processor 61, the computer program 63 implements the functions of each module/unit in the static analysis system for multi-language applications of the embodiment; to avoid repetition, the details are not described again here.
The computer device 60 may be a desktop computer, a notebook computer, a palm top computer, a cloud server, or the like. Computer device 60 may include, but is not limited to, a processor 61, a memory 62. It will be appreciated by those skilled in the art that fig. 3 is merely an example of a computer device 60 and is not intended to limit the computer device 60, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., a computer device may also include an input-output device, a network access device, a bus, etc.
The processor 61 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 62 may be an internal storage unit of the computer device 60, such as a hard disk or memory of the computer device 60. The memory 62 may also be an external storage device of the computer device 60, such as a plug-in hard disk provided on the computer device 60, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like.
Further, the memory 62 may also include both internal storage units and external storage devices of the computer device 60. The memory 62 is used to store computer programs and other programs and data required by the computer device. The memory 62 may also be used to temporarily store data that has been output or is to be output.
Referring to fig. 4, the terminal device is a chip, and the chip 600 of this embodiment includes a processor 622, which may be one or more in number, and a memory 632 for storing a computer program executable by the processor 622. The computer program stored in memory 632 may include one or more modules each corresponding to a set of instructions. Further, the processor 622 may be configured to execute the computer program to perform the above-described multi-language application-oriented static analysis method.
In addition, chip 600 may further include a power supply component 626 and a communication component 650, where power supply component 626 may be configured to perform power management of chip 600, and communication component 650 may be configured to enable communication of chip 600, e.g., wired or wireless communication. In addition, the chip 600 may also include an input/output (I/O) interface 658. Chip 600 may operate based on an operating system stored in memory 632.
In a further embodiment of the present invention, the present invention also provides a storage medium, in particular, a computer readable storage medium (Memory), which is a Memory device in a terminal device, for storing programs and data. It will be appreciated that the computer readable storage medium herein may include both a built-in storage medium in the terminal device and an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also stored in the memory space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer readable storage medium may be a high-speed RAM Memory or a Non-Volatile Memory (Non-Volatile Memory), such as at least one magnetic disk Memory.
One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to implement the respective steps of the above-described embodiments with respect to a multi-language application-oriented static analysis method; one or more instructions in a computer-readable storage medium are loaded by a processor and perform the steps of:
converting files of different programming languages in the multi-language project into abstract syntax trees, and performing type analysis; converting the abstract syntax trees into unified intermediate representation data; storing the type analysis information in a cache data set; dividing the type analysis information into two states, resolved and unresolved; if the type lookup for the resolved and unresolved states succeeds, supplementing the function and attribute information related to the calling object, and adding this information to the intermediate representation data; based on the intermediate representation data, constructing a complete cross-language function call relationship, and finding potential security risks in cross-language calls through the defect analysis module of the static analysis tool.
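The caching and state-splitting steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields, the `TypeRef` class, the function names and the `demo.kt.*` type names are all hypothetical, and the sketch assumes that only references in the unresolved state need a lookup in the cache data set.

```python
from dataclasses import dataclass, field


@dataclass
class TypeRecord:
    """One entry of the cache data set (field names are illustrative)."""
    file_name: str
    language: str
    qualified_name: str
    methods: dict = field(default_factory=dict)  # method name -> return type
    ir_ref: str = ""                             # handle of the type in the IR


@dataclass
class TypeRef:
    """A type reference encountered while analyzing one source file."""
    qualified_name: str
    resolved: bool  # True if the language front end already inferred it


def split_by_state(refs):
    """Divide type references into the resolved and unresolved states."""
    resolved = [r for r in refs if r.resolved]
    unresolved = [r for r in refs if not r.resolved]
    return resolved, unresolved


def supplement(unresolved, cache):
    """Look up each unresolved reference in the cache data set.

    Returns (found, missing): found maps a qualified name to the cached
    record whose functions and attributes can be added to the IR; missing
    lists the names for which the lookup failed.
    """
    found, missing = {}, []
    for ref in unresolved:
        record = cache.get(ref.qualified_name)
        if record is None:
            missing.append(ref.qualified_name)
        else:
            found[ref.qualified_name] = record
    return found, missing


# Hypothetical data: one cached Kotlin type and three references from Java.
cache = {
    "demo.kt.Helper": TypeRecord(
        "Helper.kt", "kotlin", "demo.kt.Helper", {"run": "String"}, "ir:Helper"
    ),
}
refs = [
    TypeRef("java.lang.String", True),
    TypeRef("demo.kt.Helper", False),
    TypeRef("demo.kt.Missing", False),
]
resolved, unresolved = split_by_state(refs)
found, missing = supplement(unresolved, cache)
```

Keying the cache by fully qualified name keeps the lookup independent of the language that produced the record, which is what allows a Java reference to be completed from a Kotlin entry.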
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Application instance
This example compares the results that the same static analysis tool produces for a mixed Java and Kotlin code project before and after the technique of the invention is applied.
As shown in fig. 5, the Java code file Test.java calls the createThing method and the doSomething method of the Kotlin code file ThingFactory.kt.
Taint data propagation path output by the static analysis tool before the invention is applied:
request->
param=request.getHeader("vector")->
param=URLDecoder.decode(param,"UTF-8")
Taint data propagation path output by the static analysis tool after the invention is applied:
request->
param=request.getHeader("vector")->
param=URLDecoder.decode(param,"UTF-8")->
bar=thing.doSomething(param)->
sql="INSERT INTO users(username,password)VALUES('foo','"+bar+"')"->
statement.executeUpdate(sql,new String[]{"USERNAME","PASSWORD"});
It can be seen that, after the invention is applied, the static analysis tool uses the complete type information to finish the path analysis of the cross-language function call and finds the SQL injection risk in the example code.
The specific operation steps are as follows:
1. The JDT compiler parses the Test.java file, converts it into an intermediate representation, and stores the type information of Test.
2. The Kotlin compiler parses the ThingInterface.kt, Thing1.kt and ThingFactory.kt files, converts them into their corresponding intermediate representations, and stores the type information of ThingInterface, Thing1 and ThingFactory.
3. The intermediate representation corresponding to the Test file is analyzed, and the unresolved types ThingInterface and ThingFactory and the unresolved methods doSomething and createThing are found.
4. The complete package names and class names are obtained from the import statements of the Test.java file, namely import test.kotlin.ThingFactory and import test.kotlin.ThingInterface; the types test.kotlin.ThingFactory and test.kotlin.ThingInterface are looked up in the cache data set built in advance, and the corresponding type information is supplemented in the intermediate representation of Test.
5. The static analysis tool uses the complete type information to construct the method call graph and perform the data flow analysis.
6. The defect analysis module of the static analysis tool finds the SQL injection risk in the example code.
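Steps 3 and 4 above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the tool's actual code: the regular expression, the function names and the shape of the cache (a plain dictionary keyed by fully qualified name) are hypothetical, and the method return types in the sample cache are assumed for the demonstration. Wildcard and static imports are deliberately ignored.

```python
import re

# Matches single-type imports such as "import test.kotlin.ThingFactory;".
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)


def imported_names(java_source):
    """Map each simple class name to its fully qualified name."""
    mapping = {}
    for qualified in IMPORT_RE.findall(java_source):
        mapping[qualified.rsplit(".", 1)[-1]] = qualified
    return mapping


def resolve(simple_names, java_source, cache):
    """Return {simple name: cached type info} for names found via imports."""
    imports = imported_names(java_source)
    resolved = {}
    for name in simple_names:
        qualified = imports.get(name)
        if qualified in cache:
            resolved[name] = cache[qualified]
    return resolved


java_source = """
package test.java;

import test.kotlin.ThingFactory;
import test.kotlin.ThingInterface;
"""

# Hypothetical cache content produced earlier by the Kotlin front end.
cache = {
    "test.kotlin.ThingFactory": {"createThing": "test.kotlin.ThingInterface"},
    "test.kotlin.ThingInterface": {"doSomething": "java.lang.String"},
}

result = resolve(["ThingFactory", "ThingInterface", "Missing"], java_source, cache)
```

Names that cannot be matched to an import or to a cache entry are simply left out of the result, so the caller can keep them in the unresolved state.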
It can be seen that when Test.java is analyzed by a static analysis tool that does not use the technique of the invention, the data flow analysis is interrupted because the type information of ThingInterface and ThingFactory cannot be obtained, and the SQL injection risk in the example code cannot be found. After the technique is applied, the different languages are converted into unified intermediate representation data in advance and the missing type information is supplemented, so the analysis of cross-language data flow propagation is completed and the SQL injection problem is accurately found.
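The interruption of the data flow can be illustrated with a toy taint tracker. This is a simplified sketch, not the analysis used by the tool: the statement encoding, the summary table and all callee labels are hypothetical, the code is straight-line only, and it assumes that a callee without a summary does not propagate taint.

```python
def taint_path(statements, summaries, source_var, sinks):
    """Follow taint from source_var through straight-line statements.

    statements: list of (target, callee, argument variables)
    summaries:  callee -> True if the return value carries argument taint
    Returns (path, reached_sink), where path lists the callees visited.
    """
    tainted = {source_var}
    path = []
    for target, callee, args in statements:
        if not tainted.intersection(args):
            continue  # no tainted argument reaches this statement
        if callee in sinks:
            path.append(callee)
            return path, True
        if summaries.get(callee):
            tainted.add(target)
            path.append(callee)
        # Unknown callee: no summary, so the taint is not propagated further.
    return path, False


# The statements of the example, in a hypothetical simplified encoding.
statements = [
    ("param", "HttpServletRequest.getHeader", ["request"]),
    ("param", "URLDecoder.decode", ["param"]),
    ("bar", "ThingInterface.doSomething", ["param"]),
    ("sql", "string-concat", ["bar"]),
    (None, "Statement.executeUpdate", ["sql"]),
]
sinks = {"Statement.executeUpdate"}

java_only = {
    "HttpServletRequest.getHeader": True,
    "URLDecoder.decode": True,
    "string-concat": True,
}
# Assumed summary for the Kotlin method once its type information is known.
with_kotlin = dict(java_only, **{"ThingInterface.doSomething": True})

before = taint_path(statements, java_only, "request", sinks)
after = taint_path(statements, with_kotlin, "request", sinks)
```

With only the Java summaries the path stops after `URLDecoder.decode`, which mirrors the shorter path listed above; adding the summary for the Kotlin method lets the taint reach the `executeUpdate` sink.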
In summary, the static analysis method, system, chip and device for multi-language applications provided by the invention use different compilers to convert source code in multiple languages into a unified intermediate representation, cache the type information of the different languages, and use type inference to supplement the type and method information that the intermediate representation lacks for cross-language interface calls. This enables the static analysis program to detect program problems in cross-language interface calls.
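The first stage of this pipeline, choosing a compiler for each source file, can be sketched as a lookup by file suffix. This is only an illustration: the function name and the sample file list are hypothetical, and the compilers are represented by plain labels rather than real compiler invocations.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# File suffix -> front end label; the labels stand in for real compilers.
FRONT_ENDS = {
    ".java": "JDT compiler",
    ".kt": "Kotlin compiler",
    ".groovy": "Groovy compiler",
}


def group_by_front_end(paths):
    """Group source files by the front end that should parse them."""
    groups = defaultdict(list)
    for path in paths:
        front_end = FRONT_ENDS.get(PurePosixPath(path).suffix)
        if front_end is not None:  # files in other languages are skipped
            groups[front_end].append(path)
    return dict(groups)


groups = group_by_front_end(
    ["src/Test.java", "src/ThingFactory.kt", "scripts/build.groovy", "README.md"]
)
```

Each group would then be handed to its front end, which produces the abstract syntax trees and type analysis information that feed the unified intermediate representation.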
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other manners. For example, the apparatus/terminal embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a usb disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier wave signal, a telecommunications signal, a software distribution medium, etc., it should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in jurisdictions, such as in some jurisdictions, according to the legislation and patent practice, the computer readable medium does not include electrical carrier wave signals and telecommunications signals.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. A method of static analysis for multi-lingual applications, comprising the steps of:
converting files of different programming languages in the multi-language project into an abstract syntax tree, and performing type analysis to obtain type analysis information; converting the abstract syntax tree into universal unified intermediate representation data;
storing the type analysis information in a cache data set;
dividing the type analysis information in the cache data set into two states, resolved and unresolved;
if the type lookup for the resolved and unresolved states succeeds, supplementing the function and attribute information related to the calling object, and adding this information to the intermediate representation data;
based on the obtained intermediate representation data, constructing a complete cross-language function call relationship, and finding potential security risks in cross-language calls through the defect analysis module of the static analysis tool.
2. The static analysis method for multi-language applications according to claim 1, wherein converting files of different programming languages in the multi-language item into an abstract syntax tree is as follows:
files of different programming languages in the multilingual project are collected according to the file suffixes, and for each language file, a corresponding compiler is used for converting the source codes into abstract syntax trees and performing type analysis.
3. The static analysis method for multi-language applications according to claim 2, wherein the files of different programming languages comprise source code files in the Java, Kotlin and Groovy languages, and the compilers comprise a JDT compiler, a Kotlin compiler and a Groovy compiler.
4. The static analysis method for multi-language applications according to claim 1, wherein storing the type analysis information in the cache dataset is specifically:
storing the analysis results of all types, and establishing a mapping between every resolved type and its data in the intermediate representation, wherein the stored data structure comprises the file name, the language, the type information and the representation of the type in the intermediate representation data.
5. The static analysis method for multi-language applications according to claim 1, wherein the two states, resolved and unresolved, are specifically:
the resolved state indicates that type inference has been completed, and the type is used directly by the detection model; a type in the unresolved state enters the type inference module, which searches the cache data set for the relevant type according to the brief type information of the calling object in order to supplement the type information.
6. The method of claim 5, wherein, based on the interoperability relationships between programming languages, the cache data sets of the other languages that the type or method may use are obtained.
7. The method for static analysis for multilingual applications according to claim 5, wherein the type inference is performed for all incompletely parsed types or methods, specifically as follows:
acquiring the package name and the class name of the type which is not completely analyzed; according to the language type, acquiring a cache data set of other callable languages, and searching corresponding type information according to the acquired package name and class name of the type; data of the corresponding type in the intermediate representation is acquired.
8. A multi-lingual application oriented static analysis system comprising:
the conversion module is used for converting files of different programming languages in the multi-language project into an abstract syntax tree and performing type analysis to obtain type analysis information;
the data module is used for converting the abstract syntax tree into universal unified intermediate representation data;
the storage module is used for storing the type analysis information into a cache data set;
the state module is used for dividing the type analysis information in the cache data set into two states, resolved and unresolved;
the supplementing module is used for supplementing the function and attribute information related to the calling object when the type lookup for the resolved and unresolved states obtained by the state module succeeds, and for adding this information to the intermediate representation data obtained by the data module;
the analysis module is used for constructing a complete cross-language function call relation based on the intermediate representation data obtained by the supplementing module, and finding potential safety hazards existing in cross-language call through the static analysis defect analysis module.
9. A chip, characterized by comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-7.
10. An electronic device, characterized by comprising the chip as claimed in claim 9.
CN202311706298.3A 2023-12-12 2023-12-12 A static analysis method, system, chip and device for multi-language applications Pending CN117688573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311706298.3A CN117688573A (en) 2023-12-12 2023-12-12 A static analysis method, system, chip and device for multi-language applications

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311706298.3A CN117688573A (en) 2023-12-12 2023-12-12 A static analysis method, system, chip and device for multi-language applications

Publications (1)

Publication Number Publication Date
CN117688573A true CN117688573A (en) 2024-03-12

Family

ID=90126013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311706298.3A Pending CN117688573A (en) 2023-12-12 2023-12-12 A static analysis method, system, chip and device for multi-language applications

Country Status (1)

Country Link
CN (1) CN117688573A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121209838A (en) * 2025-11-28 2025-12-26 北京麟卓信息科技有限公司 Code editor cross-language function positioning method based on layered context driving
CN121209838B (en) * 2025-11-28 2026-01-30 北京麟卓信息科技有限公司 Code editor cross-language function positioning method based on layered context driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination