Disclosure of Invention
To address the shortcomings of the prior art, embodiments of the present application provide a method for identifying and protecting attackable data in a kernel file system.
According to a first aspect of embodiments of the present application, there is provided a method for identifying and protecting attackable data in a kernel file system, comprising:
compiling the source code of a kernel file system into LLVM IR files, and scanning the LLVM IR files to obtain different types of core data, the core data comprising file metadata, file content and file read/write direction;
tracking the data-flow propagation and control-flow propagation of the core data based on the LLVM IR files to obtain potentially attackable non-control data;
dynamically verifying the potentially attackable non-control data and determining whether it is actually attackable data;
and protecting the on-disk data that can ultimately be affected by the attackable data.
Further, the data structures of permission data in the abstract file system layer are used as the file metadata; the data structures representing file content in the page cache layer and the generic block layer are used as the file content; and the read/write events uniformly recorded in the generic block layer when a data exchange is triggered are used as the file read/write direction.
Further, the data-flow propagation and control-flow propagation of the core data are tracked through value-flow analysis based on type-based access paths.
Further, for core data in the form of a flag variable, data-flow propagation comprises direct assignment or assignment after a logical operation, and in control-flow propagation one flag variable affects the value of another flag variable through a branch decision.
Further, for core data in the form of a pointer reference, data-flow propagation comprises direct assignment operations and assignment after arithmetic operations.
Further, dynamically verifying the potentially attackable non-control data and determining whether it is actually attackable data comprises:
according to the non-control data to be verified, a user-mode program performs a legal write operation on a file for which it has write permission, triggering the code in the kernel that uses the non-control data; kernel instrumentation code automatically records the target value of the non-control data and stores it in an array maintained by the kernel;
the user-mode program attempts to write malicious content to a read-only file, triggering the code in the kernel that uses the non-control data; the target value recorded in the array is retrieved and written into the non-control data, so that at this point the kernel locates the data and automatically resets it to the value recorded in the previous step;
and the user-mode program re-reads the read-only file and checks whether its content has been changed to the malicious content written in the previous step; if so, the data is attackable data.
According to a second aspect of embodiments of the present application, there is provided an apparatus for identifying and protecting attackable data in a kernel file system, comprising:
a scanning module, configured to compile the source code of a kernel file system into LLVM IR files and scan the LLVM IR files to obtain different types of core data, including file metadata, file content and file read/write direction;
a tracking module, configured to track the data-flow propagation and control-flow propagation of the core data based on the LLVM IR files to obtain potentially attackable non-control data;
a dynamic verification module, configured to dynamically verify the potentially attackable non-control data and determine whether it is actually attackable data;
and a protection module, configured to protect the on-disk data that can ultimately be affected by the attackable data.
According to a third aspect of embodiments of the present application, there is provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the method according to the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided an electronic device, comprising:
One or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
According to a fifth aspect of embodiments of the present application there is provided a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the method according to the first aspect.
Compared with the prior art, the present application, which classifies non-control data based on semantics, has the following technical innovations. 1. Semantic classification of attackable non-control data: the semantics of attackable non-control data in the file system are modeled and classified into different categories, and core data representative of each category are screened out. 2. Static tracking of potentially attackable data: starting from the core data representing each category, the propagation of the data is tracked statically, and all potentially attackable data are obtained by analysis. 3. Dynamic simulated-attack verification: an attacker's dynamic tampering with data values is simulated, and whether a privilege escalation results is judged automatically, thereby verifying whether the data are attackable and obtaining the attack surface caused by non-control data. 4. Category-specific protection: the file system layer in which the data usable for privilege escalation reside is analyzed, and the data are protected in different ways according to the semantic categories established in the first step. Accordingly, the invention has the following beneficial effects:
1) Automatic, systematic analysis of attackable non-control data: the invention is an automated tool for analyzing the attack surface caused by attackable non-control data in the Linux kernel file system. It automatically identifies potentially attackable data from semantic information, simulates the attack process, verifies whether the data are actually exploitable, and identifies the complete attack surface.
2) Targeted, low-overhead protection: after obtaining the data that are effective for privilege escalation, the invention designs a different protection scheme for each of the three kinds of data and, according to the file system layers through which the data propagate, selects the data that are ultimately affected as the protection target, so as to reduce performance overhead.
3) Compatibility with various file systems and storage hardware drivers: the invention automatically identifies the attackable non-control data present in the implementations of various file systems, such as ext2, ext4 and FAT. It is also compatible with different hardware and can systematically analyze hardware drivers without additional manual effort.
4) Cross-version support: the Linux kernel code is updated very rapidly; the method can be ported to various versions of the Linux kernel without additional manual effort, and because the analysis is based on source code, it is compatible with all mainstream system architectures (x86_64, aarch64, etc.).
Therefore, the invention has good prospects for wide adoption and application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The term "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
FIG. 1 is a flowchart of a method for identifying and protecting attackable data in a kernel file system according to an exemplary embodiment. The method is applied to a terminal and, as shown in FIG. 1, may include the following steps:
Step S1: compiling the source code of the kernel file system (shown in FIG. 2) into LLVM IR files, and scanning the LLVM IR files to obtain different types of core data, the core data comprising file metadata, file content and file read/write direction. FIG. 2 shows the hierarchy of the file system in the Linux kernel and lists, at each layer, some of the dynamically verified data that can be used for a privilege-escalation attack.
Specifically, as shown in FIG. 3, the kernel source code is compiled into LLVM IR files, which are convenient to analyze, and a static analysis tool is then run to scan specific layers of the file system for the non-control data representative of each class, i.e., the core data.
The core data are mainly divided into file metadata, file content and file read/write direction:
(1) File metadata: by tampering with the permission information in file metadata, an attacker can bypass the security checks at the uppermost layer of the file system and thereby obtain root privileges. As shown in FIG. 2, the abstract file system layers (e.g., the virtual file system layer and the generic block layer) hide the different underlying implementations behind an abstract interface, and the module selects the data structures holding the permission data of that layer to represent file metadata.
(2) File content: the user data stored in a file exist in two places, on disk and in memory (the page cache). By modifying non-control data, an attacker can directly write the content of a read-only file to achieve privilege escalation, for example by modifying /etc/passwd to add the attacker as a root user. Therefore, as shown in FIG. 2, the data structures representing file content are selected in the page cache layer and the generic block layer, which are relatively independent of the different file system implementations and disk hardware drivers.
(3) Read/write direction: file metadata and content are stored on disk and loaded into memory when they need to be accessed. Read/write operations traverse the entire file system, passing downward from the topmost layer to the disk at the bottom to realize data exchange. The Linux kernel records the unified data-exchange events between disk and host in the generic block layer. By modifying the read/write flag bit, a read operation can be turned into a write operation, so that a read-only file is written. Therefore, the read/write events uniformly recorded in the generic block layer when a data exchange is triggered are selected as core data.
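The scanning in Step S1 can be illustrated with a minimal sketch that looks for struct-field accesses (`getelementptr` instructions) in textual LLVM IR and groups those on registered core-data types by category. This is not the original tool: the seed table `CORE_TYPES`, its assignment of `struct.inode`, `struct.page` and `struct.bio` to the three categories, and the function name `scan_core_data` are illustrative assumptions. The regular expression only recognizes the common `i32 0, i32 N` field-access form in the opaque-pointer (`ptr`) syntax of recent LLVM releases; a practical scanner would more likely walk the in-memory IR and select specific fields.

```python
import re
from collections import defaultdict

# Hypothetical seed table mapping LLVM struct types to core-data categories.
# The Linux types chosen here are illustrative assumptions only.
CORE_TYPES = {
    "struct.inode": "file_metadata",  # permission data (assumed choice)
    "struct.page": "file_content",    # page cache (assumed choice)
    "struct.bio": "rw_direction",     # generic block layer I/O (assumed choice)
}

# Matches struct-field accesses such as
#   %5 = getelementptr inbounds %struct.inode, ptr %4, i32 0, i32 2
GEP_RE = re.compile(
    r"getelementptr\s+(?:inbounds\s+)?%(struct\.[\w.]+),\s*ptr\s+%[\w.]+,"
    r"\s*i32\s+0,\s*i32\s+(\d+)"
)


def scan_core_data(ir_text):
    """Return {category: {(struct_type, field_index), ...}} found in ir_text."""
    found = defaultdict(set)
    for line in ir_text.splitlines():
        match = GEP_RE.search(line)
        if match is None:
            continue
        struct_type, field = match.group(1), int(match.group(2))
        category = CORE_TYPES.get(struct_type)
        if category is not None:  # ignore types that are not core data
            found[category].add((struct_type, field))
    return dict(found)


if __name__ == "__main__":
    sample_ir = """
  %5 = getelementptr inbounds %struct.inode, ptr %4, i32 0, i32 0
  %9 = getelementptr inbounds %struct.bio, ptr %8, i32 0, i32 3
  %12 = getelementptr inbounds %struct.task_struct, ptr %11, i32 0, i32 7
"""
    print(scan_core_data(sample_ir))
```

Keying the result by (type, field index) rather than by variable name matches the layer-oriented selection described above: the same field is core data wherever it is accessed.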
Step S2: tracking the data-flow propagation and control-flow propagation of the core data based on the LLVM IR files to obtain potentially attackable non-control data.
Specifically, as shown in FIG. 3, the core data are taken as the roots of propagation, and the propagation process is tracked to discover further attackable non-control data. The three types of data propagate in different ways, so the data flow and control flow of the core data are tracked using customized propagation rules, yielding further potentially attackable data. In code, the data structures of the three types of core data take two forms. (1) Flag variables, typically implemented as integers, whose values mark some core state. Each bit of a flag variable represents a different state, so only logical operations (AND, OR) are applied to it, never arithmetic operations (addition, subtraction, multiplication, division). Thus, for data-flow propagation, the rule covers only direct assignments and assignments after logical operations. In control-flow propagation, one flag variable may influence the value of another flag variable through a branch decision. For example, a specific bit of a permission flag variable indicates a legal permission; the permission is checked in a branch, and if it is legal another flag variable is set to legal, otherwise it is marked illegal. (2) Pointer references, which serve as indexes for locating specific regions of memory. For a flag variable, the meaning of each bit is hard-coded in the kernel and does not change, so the values representing different states are readily obtained. The value of a pointer reference, by contrast, changes dynamically, depending on where the target memory is allocated; each pointer value is unique and points to exactly one block of memory. Tracking the data-flow propagation of a pointer value therefore involves arithmetic operations (e.g., aligning a pointer) rather than logical operations.
Thus, the data-flow rules for pointers cover direct assignment operations and assignments after arithmetic operations. Pointer values are also affected by control flow: in a branch, the value assigned to a pointer is decided by the state of a flag variable in the branch condition, and that flag variable is then treated as new data to be propagated. This is because pointer variables do not normally appear in branch conditions; only flag variables appear there, to test the state of the kernel. The newly found flag variable is propagated according to the first rule (flag-variable propagation). The propagation process is applied recursively until no new data are found.
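The two rule sets and the repeat-until-no-new-data loop described above can be sketched over a toy instruction list. This is a hedged illustration, not the original analysis: the tuple encoding (`assign`, `logic`, `arith`, `branch`), the variable-kind map and the name `propagate` are assumptions, and real tracking would run over LLVM IR instructions instead.

```python
# Toy instruction encoding (hypothetical, for illustration only):
#   ("assign", dst, src)  dst = src
#   ("logic",  dst, src)  dst = src & MASK, src | BIT, ...
#   ("arith",  dst, src)  dst = src + offset, src * n, ...
#   ("branch", cond, dst) dst is assigned inside a branch guarded by cond
# kinds maps every variable name to "flag" or "ptr".


def propagate(seeds, instrs, kinds):
    """Return every variable reachable from seeds under the two rule sets."""
    tainted = set(seeds)
    changed = True
    while changed:  # repeat until no new data is discovered
        changed = False
        for op, a, b in instrs:
            new = None
            if op in ("assign", "logic", "arith"):
                dst, src = a, b
                if src in tainted and dst not in tainted:
                    kind = kinds[src]
                    # Flags flow through assignment and logical operations;
                    # pointers flow through assignment and arithmetic.
                    if (op == "assign"
                            or (op == "logic" and kind == "flag")
                            or (op == "arith" and kind == "ptr")):
                        new = dst
            elif op == "branch":
                cond, dst = a, b
                if kinds[dst] == "flag" and cond in tainted and dst not in tainted:
                    # Rule 1: a tainted flag decides another flag's value.
                    new = dst
                elif kinds[dst] == "ptr" and dst in tainted and cond not in tainted:
                    # Rule 2: the flag deciding a tainted pointer becomes new
                    # data and is then propagated under the flag rule.
                    new = cond
            if new is not None:
                tainted.add(new)
                changed = True
    return tainted


if __name__ == "__main__":
    kinds = {"i_mode": "flag", "perm": "flag", "may_write": "flag",
             "can_merge": "flag", "page": "ptr", "page2": "ptr", "x": "ptr"}
    instrs = [
        ("logic", "perm", "i_mode"),       # perm = i_mode & MASK
        ("branch", "perm", "may_write"),   # if (perm) may_write = 1
        ("arith", "page2", "page"),        # page2 = page + offset
        ("branch", "can_merge", "page2"),  # if (can_merge) page2 = ...
        ("logic", "x", "page"),            # logical op on a pointer: ignored
    ]
    print(sorted(propagate({"i_mode", "page"}, instrs, kinds)))
```

Starting from `i_mode` and `page`, the flag rule reaches `perm` and `may_write`, the pointer rule reaches `page2`, and the backward control-flow step turns `can_merge` into new flag data, while the logical operation on a pointer (`x`) is not followed.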
The tracking is likewise performed on the LLVM IR files. By relying on value-flow analysis based on type-based access paths (as in "Practical Program Modularization with Type-Based Dependence Analysis" and "Statically Discovering High-Order Taint Style Vulnerabilities in OS Kernels"), costly pointer analysis is avoided.
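The idea of avoiding pointer analysis through type-based access paths can be shown with a small sketch: a store and a load are connected whenever they use the same (struct type, field) path, with no points-to sets computed. The `("store" | "load", var, path)` encoding, the field names used as path components and the function name `access_path_flows` are hypothetical simplifications, not the method of the cited works.

```python
from collections import defaultdict

# Hypothetical toy encoding: ("store", value, path) or ("load", var, path),
# where path = (struct_type, field) names a type-based access path.


def access_path_flows(instrs):
    """Connect every store to every load that uses the same access path."""
    stores = defaultdict(set)  # access path -> values written through it
    loads = defaultdict(set)   # access path -> variables read through it
    for kind, var, path in instrs:
        if kind == "store":
            stores[path].add(var)
        elif kind == "load":
            loads[path].add(var)
    edges = set()
    for path, written in stores.items():
        for dst in loads.get(path, ()):
            for src in written:
                edges.add((src, dst))
    return edges


if __name__ == "__main__":
    instrs = [
        ("store", "v1", ("struct.pipe_buffer", "flags")),  # in function f()
        ("load", "v2", ("struct.pipe_buffer", "flags")),   # in function g()
        ("load", "v3", ("struct.pipe_buffer", "page")),
    ]
    print(access_path_flows(instrs))
```

Here `v1` flows to `v2` even if the two accesses go through different pointers in different functions. This merges all objects of one type and therefore over-approximates; one plausible reason the imprecision is tolerable in this method is that every candidate is subsequently checked by the dynamic verification of Step S3.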
Step S3: dynamically verifying the potentially attackable non-control data and determining whether each item is actually attackable data.
Specifically, Step S2 yields a set of potentially attackable non-control data. To avoid interference between data items, one item at a time is taken from the set until all items have been verified. As shown in FIG. 4, the kernel must be instrumented before dynamic verification, which consists of three steps in which the LLVM-instrumented kernel and a user-mode program work together. Instrumentation is inserted at the locations of all identified data in the kernel, but only one item is verified at a time, and at run time the kernel records and resets the value of that item only. Selecting the item to verify and performing the record and reset operations are implemented through a newly added system call: the user-mode program invokes the system call, which drives the kernel instrumentation to carry out dynamic verification. Specifically, a custom system call is defined whose parameters identify the item to be verified and the operation to perform (one of two types, record or reset). Because all data items are numbered when the instrumentation is inserted in advance, the system call only needs the number of the item to be verified, and the operation is distinguished by an integer parameter. These are used in steps S31-S32:
S31: for the item under verification, the user-mode program performs a legal write operation on a file for which it has write permission, triggering the code in the kernel that uses the item; the kernel instrumentation code automatically records the item's target value and stores it in an array maintained by the kernel.
S32: the user-mode program attempts to write malicious content to a read-only file, triggering the code in the kernel that uses the item; the instrumentation retrieves the value from the array and writes it into the item, so that at this point the kernel locates the item and automatically resets it to the value recorded in the previous step.
S33: the user-mode program re-reads the read-only file used in S32 and checks whether its content has been changed to the malicious content written in the previous step; if so, the item is attackable data and is added to the attackable data set.
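The kernel-side state behind the custom system call can be modeled in user space to show how record and reset interact. This is a hedged sketch: the integer codes `OP_RECORD`/`OP_RESET`, the class `InstrumentationState` and its `syscall`/`hook` methods are hypothetical names, and in the real system the hook would be LLVM-inserted code at each use site inside the kernel, with the selection arriving through the new system call.

```python
# Hypothetical integer encoding of the two operations passed to the new
# system call; the actual numbering is not specified in the text.
OP_RECORD = 0
OP_RESET = 1


class InstrumentationState:
    """User-space model of the kernel-side state behind the new system call."""

    def __init__(self, num_data):
        self.target_id = None               # number of the datum under test
        self.op = None                      # OP_RECORD or OP_RESET
        self.recorded = [None] * num_data   # kernel-maintained value array

    def syscall(self, data_id, op):
        """Model of the custom system call: select a datum and an operation."""
        if op not in (OP_RECORD, OP_RESET):
            raise ValueError("unknown operation")
        self.target_id, self.op = data_id, op

    def hook(self, data_id, value):
        """Called at every instrumented use site; returns the value to use."""
        if data_id != self.target_id:
            return value                    # other instrumented data: untouched
        if self.op == OP_RECORD:
            self.recorded[data_id] = value  # S31: remember the legal value
            return value
        saved = self.recorded[data_id]
        return value if saved is None else saved  # S32: reset to recorded value


if __name__ == "__main__":
    state = InstrumentationState(num_data=3)
    state.syscall(1, OP_RECORD)
    state.hook(1, 0x3)          # legal write: 0x3 is recorded
    state.syscall(1, OP_RESET)
    print(state.hook(1, 0x1))   # write to the read-only file sees 0x3
```

Because every instrumented site calls the same hook and only the selected number is acted upon, instrumentation can be inserted once for all data while verification still touches one item at a time.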
After steps S31-S33 have been performed for every item in the set of potentially attackable non-control data, all data in the attackable data set are reported. For each item, the report includes the data structure containing it, its offset within that structure, and the fact that setting the value at that offset to the value recorded in step S31 causes a write to the read-only file, thereby achieving a privilege-escalation attack. This yields the attack surface of non-control data in the kernel.
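The outer verification loop of steps S31-S33 can be simulated end to end against a toy write path. Everything here is a hypothetical stand-in: `toy_kernel_write` models a kernel write path with two instrumented use sites, `FMODE_READ`/`FMODE_WRITE` are illustrative permission bits, and the labels in `CANDIDATES` are placeholders for the structure/field entries a real report would contain. The block is self-contained, so it carries its own compact record/reset hook.

```python
FMODE_READ, FMODE_WRITE = 0x1, 0x2   # illustrative permission bits
MALICIOUS = "hacker::0:0::/root:/bin/sh"

# Hypothetical labels for the two instrumented data in this toy kernel.
CANDIDATES = ["file->f_mode", "inode->i_size"]


def make_hook(target_id):
    """Record/reset hook for one datum, mirroring the instrumentation."""
    state = {"op": "record", "saved": None}

    def hook(data_id, value):
        if data_id != target_id:
            return value
        if state["op"] == "record":
            state["saved"] = value
            return value
        return value if state["saved"] is None else state["saved"]

    return hook, state


def toy_kernel_write(files, name, content, hook):
    """Toy write path with two instrumented use sites (data #0 and #1)."""
    f = files[name]
    mode = hook(0, f["mode"])          # use of data #0 (permission check)
    hook(1, len(f["data"]))            # use of data #1 (not a permission)
    if mode & FMODE_WRITE:
        f["data"] = content
        return True
    return False


def make_files():
    return {"rw.txt": {"mode": FMODE_READ | FMODE_WRITE, "data": ""},
            "ro.txt": {"mode": FMODE_READ, "data": "root:x:0:0"}}


def verify_candidates():
    attackable = []
    for data_id, label in enumerate(CANDIDATES):  # one datum at a time
        files = make_files()
        hook, state = make_hook(data_id)
        toy_kernel_write(files, "rw.txt", "ok", hook)       # S31: record
        state["op"] = "reset"
        toy_kernel_write(files, "ro.txt", MALICIOUS, hook)  # S32: reset
        if files["ro.txt"]["data"] == MALICIOUS:            # S33: re-read
            attackable.append(label)
    return attackable


if __name__ == "__main__":
    print(verify_candidates())
```

Only the datum that gates the permission check is reported: resetting it to the value seen during the legal write lets the malicious write reach the read-only file, whereas resetting the unrelated datum changes nothing.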
Step S4: protecting the on-disk data that can ultimately be affected by the attackable data.
As shown in FIG. 5, all the attackable data are analyzed to find the data representing file content located at the bottom layer of the kernel file system, i.e., the disk; these comprise the page cache pages representing file content and the disk block data. Notably, for file metadata and the file read/write direction, the attacker either bypasses the permission check to write the target file content or modifies the read/write direction to write the file content. In both cases the end effect is a write to file content, so protecting the file content alone is sufficient to provide complete protection and to resist attacks on the file system through non-control data.
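The text does not specify the concrete protection mechanism, so the following is only one possible policy, sketched under stated assumptions: a guard at the lowest layer that keeps its own record of the disk blocks backing files that must stay read-only and rejects any write to them, regardless of permission flags or read/write direction set higher up. The class `BlockWriteGuard`, its methods and the dictionary standing in for the disk are hypothetical; the sketch also assumes the guard's record lives in memory the attacker cannot modify, and it omits how legitimate updates and the page cache copy would be handled.

```python
class BlockWriteGuard:
    """Hypothetical bottom-layer guard on writes of file-content blocks."""

    def __init__(self):
        # Block numbers holding content of files that must stay read-only.
        # Assumed to be kept where the attacker cannot tamper with it.
        self._protected = set()

    def protect_file(self, blocks):
        self._protected.update(blocks)

    def submit_write(self, disk, block, data):
        """Write data to disk[block] unless the block is protected."""
        if block in self._protected:
            return False   # rejected regardless of upper-layer flags
        disk[block] = data
        return True


if __name__ == "__main__":
    disk = {10: b"root:x:0:0", 12: b""}
    guard = BlockWriteGuard()
    guard.protect_file({10})   # e.g. the block backing a read-only file
    print(guard.submit_write(disk, 10, b"hacker::0:0"), disk[10])
    print(guard.submit_write(disk, 12, b"ok"), disk[12])
```

Placing the check at the point where content finally reaches the disk reflects the reasoning above: tampered metadata or a flipped read/write direction still has to end in a content write, which is where this guard would intervene.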
The invention provides a systematic analysis of non-control data as attack targets in the kernel: it automatically identifies the data that lead to privilege-escalation attacks, summarizes the patterns of such data, and provides a corresponding protection scheme. The analysis targets the file system in the kernel, because files play an important role in kernel security: once file access protection is broken, an attacker can easily launch a privilege-escalation attack. Moreover, the hierarchy of the file system is clear, which makes it convenient to analyze the semantic information of different layers. The method divides non-control data into several categories representing different semantics based on the multi-layer semantic structure of the file system; screens, according to the characteristics of the data semantics, the core non-control data representing each category; statically tracks the semantic propagation of these data and analyzes them to obtain all potentially attackable data in each category; uses dynamic instrumentation to automatically monitor and simulate an attacker tampering with the attackable data, thereby verifying which data actually lead to privilege escalation; and finally reports all attackable non-control data and applies different protection schemes to the different categories of data. The identification results cover all non-control data exploited by current privilege-escalation attacks, including the Dirty COW vulnerability (vm_fault->flags), the Dirty Pipe vulnerability (pipe_buffer->flags) and the dirty-right vulnerability (file->f_mode).
The present application also provides an embodiment of an apparatus for identifying and protecting attackable data in a kernel file system, corresponding to the foregoing embodiment of the method for identifying and protecting attackable data in a kernel file system.
FIG. 6 is a block diagram of an apparatus for identifying and protecting attackable data in a kernel file system according to an exemplary embodiment. Referring to FIG. 6, the apparatus may include:
a scanning module 21, configured to compile the source code of a kernel file system into LLVM IR files and scan the LLVM IR files to obtain different types of core data, including file metadata, file content and file read/write direction;
a tracking module 22, configured to track the data-flow propagation and control-flow propagation of the core data based on the LLVM IR files to obtain potentially attackable non-control data;
a dynamic verification module 23, configured to dynamically verify the potentially attackable non-control data and determine whether it is actually attackable data;
and a protection module 24, configured to protect the on-disk data that is ultimately affected by the attackable data.
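How the four modules hand results to one another can be sketched as a small pipeline. The `Apparatus` dataclass and its callable fields are hypothetical; the comments map each field to the module numbers in FIG. 6, and the callables would in practice be the scanning, tracking, verification and protection logic described in Steps S1 to S4.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Apparatus:
    scan: Callable[[str], Set[str]]          # scanning module 21
    track: Callable[[Set[str]], Set[str]]    # tracking module 22
    verify: Callable[[str], bool]            # dynamic verification module 23
    protect: Callable[[List[str]], None]     # protection module 24

    def run(self, ir_text: str) -> List[str]:
        core = self.scan(ir_text)                 # core data (Step S1)
        candidates = self.track(core)             # potentially attackable (S2)
        attackable = sorted(d for d in candidates if self.verify(d))  # S3
        self.protect(attackable)                  # protection (Step S4)
        return attackable


if __name__ == "__main__":
    protected: List[str] = []
    app = Apparatus(
        scan=lambda ir: {"a"},
        track=lambda core: core | {"b", "c"},
        verify=lambda d: d != "c",
        protect=protected.extend,
    )
    print(app.run("; toy IR"), protected)
```

Passing the modules in as callables keeps each one replaceable, which is consistent with the note that the modules may be combined or distributed as actual needs require.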
The specific manner in which each module of the apparatus in the above embodiment performs its operations has been described in detail in the method embodiments and is not repeated here.
Since the apparatus embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application, and those of ordinary skill in the art can understand and implement it without undue effort.
Accordingly, the present application also provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the method for identifying and protecting attackable data in a kernel file system described above.
Correspondingly, the present application further provides an electronic device comprising one or more processors, a memory and a control unit, the memory being used to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for identifying and protecting attackable data in a kernel file system described above. FIG. 7 is a hardware structure diagram of any device with data processing capability on which the apparatus for identifying and protecting attackable data in a kernel file system according to an embodiment of the present application resides. In addition to the processor, memory and network interface shown in FIG. 7, such a device generally includes other hardware according to its actual function, which is not described here.
Correspondingly, the present application also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method for identifying and protecting attackable data in a kernel file system described above. The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any device with data processing capability described in any of the previous embodiments. It may also be an external storage device provided on the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a flash card. Further, the computer-readable storage medium may include both the internal storage unit and the external storage device of any device with data processing capability. The computer-readable storage medium is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.