Disclosure of Invention
The embodiment of the application provides a process injection method and related equipment, which are used for solving the problem of process injection of binary files.
The first aspect of the embodiment of the application provides a process injection method, which comprises the following steps:
Obtaining a target thin binary file in a Mach-O format binary file, wherein the target thin binary file comprises a file header, a loading instruction section and a data section;
Analyzing the file header and the loading instruction section of the target thin binary file to obtain a blank area between the last loading instruction in the target thin binary file and the starting position of the data section;
generating a dynamic library dependent instruction according to the dynamic library information on which the Mach-O format binary file depends;
calculating the size of a target space address occupied by the dynamic library dependent instruction;
if the size of the target space address is larger than that of the blank area, acquiring a system architecture of the target thin binary file;
Calculating a target offset of the starting position of the data segment according to the system architecture and the size of the target space address;
Shifting the initial position of the data segment by the target offset to obtain an enlarged blank area;
And inserting the dynamic library dependent instruction into the increased blank area to realize process injection of the Mach-O format binary file.
Optionally, before shifting the start position of the data segment by the target shift amount to obtain the increased blank area, the method further includes:
Removing signature information of a loading instruction section in the target thin binary file;
after inserting the dynamic library dependency instruction into the increased blank region, the method further comprises:
Re-signing the target thin binary file into which the dynamic library dependency instruction is inserted.
Optionally, the calculating the target offset of the start position of the data segment according to the system architecture and the target space address size includes:
Determining the size of an aligned data block of a memory page of the system architecture according to the system architecture;
if the target space address size is an integer multiple of the aligned data block size, determining the target space address size as the target offset;
If the target space address size is not an integer multiple of the aligned data block size, determining the target offset to be the smallest integer multiple of the aligned data block size that is greater than the target space address size.
Optionally, before acquiring the thin binary file in the Mach-O format binary file, the method further includes:
Analyzing a file header of the Mach-O format binary file to determine a file type of the Mach-O format binary file, wherein the file type comprises a fat binary file and a thin binary file, and the fat binary file comprises a plurality of thin binary files;
If the Mach-O format binary file is a fat binary file, respectively acquiring a plurality of thin binary files in the fat binary file;
determining any thin binary file in the plurality of thin binary files as the target thin binary file;
after inserting the dynamic library dependency instruction into the increased blank region, the method further comprises:
Updating, in the target thin binary file after the dynamic library dependent instruction is inserted, the data value representing the size of the target thin binary file, according to the size of the dynamic library dependent instruction and the size of the target thin binary file before the insertion.
Optionally, the method further comprises:
Determining a first target thin binary file in the plurality of thin binary files;
Increasing, according to the ordering of the target thin binary files, the offset of the file header of the Nth target thin binary file after the first target thin binary file by N times the target offset, wherein N is greater than or equal to 1.
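The arithmetic of this correction can be illustrated with a short Python sketch. The function name `shifted_slice_offsets` is hypothetical, and the sketch assumes, as the paragraph above does, that every thin binary grows by the same target offset; index 0 stands for the first target thin binary, whose own header offset does not move.

```python
def shifted_slice_offsets(offsets, target_offset):
    """Return the corrected header offsets of the thin binaries.

    `offsets` lists the original header offsets in file order. The slice at
    index n sits behind n enlarged slices, so it moves by n * target_offset.
    """
    return [off + n * target_offset for n, off in enumerate(offsets)]


# Illustrative values only: three slices, each enlarged by 0x4000 bytes.
new_offsets = shifted_slice_offsets([0x4000, 0xC000, 0x14000], 0x4000)
```

With these illustrative values the first slice keeps its offset, the second moves by one target offset, and the third by two.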
Optionally, after inserting the dynamic library dependency instruction into the increased blank region, the method further comprises:
Increasing the offset of a target load instruction segment contained in the target thin binary file by the target offset, wherein the target load instruction segment is an instruction, within the load instruction segment, that represents an address and an offset.
Optionally, after inserting the dynamic library dependency instruction into the increased blank region, the method further comprises:
And inserting a blank field at the starting position point of the data segment of the target thin binary file, wherein the size of the blank field is the target offset.
Optionally, after inserting the dynamic library dependency instruction into the increased blank region, the method further comprises:
Increasing the offset of the target data segment contained in the target thin binary file by the target offset, wherein the target data segment is a data segment that represents an address or a pointer.
Optionally, the dynamic library dependent instruction includes an instruction name, an instruction size, and information of a dynamic library, wherein the information of the dynamic library includes the path of the dynamic library, or includes the path of the dynamic library together with at least one of a timestamp of the dynamic library and a version number of the dynamic library;
the calculating the target space address size occupied by the dynamic library dependent instruction comprises:
acquiring the instruction size of the dynamic library dependent instruction and the length size of the dynamic library path;
and calculating the size of the target space address according to the instruction size of the dynamic library dependent instruction and the length size of the dynamic library path.
A second aspect of an embodiment of the present application provides a process injection system, including:
An acquiring unit, configured to acquire a target thin binary file in a Mach-O format binary file, wherein the target thin binary file comprises a file header, a loading instruction section and a data section;
The analyzing unit is used for analyzing the file header and the loading instruction section of the target thin binary file so as to obtain a blank area between the last loading instruction in the target thin binary file and the starting position of the data section;
the generation unit is used for generating a dynamic library dependent instruction according to the dynamic library information on which the Mach-O format binary file depends;
the calculating unit is used for calculating the size of the target space address occupied by the dynamic library dependent instruction;
The acquiring unit is further configured to acquire a system architecture of the target thin binary file when the target space address size is greater than the blank area;
The calculating unit is further configured to calculate a target offset of the start position of the data segment according to the system architecture and the target space address size;
An offset unit, configured to offset the start position of the data segment by the target offset amount, so as to obtain an increased blank area;
and the inserting unit is used for inserting the dynamic library dependent instruction into the increased blank area so as to realize process injection of the Mach-O format binary file.
A third aspect of an embodiment of the present application provides a computer apparatus, including:
The device comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a short-term memory or a persistent memory;
The central processor is configured to communicate with the memory and to execute instruction operations in the memory to perform the method of process injection of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of process injection of the first aspect.
A fifth aspect of an embodiment of the application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of process injection of the first aspect.
According to the above technical solution, the process injection method disclosed in the embodiment of the application proceeds as follows. First, a target thin binary file in a Mach-O format binary file is obtained, the target thin binary file comprising a file header, a loading instruction section and a data section. The file header and the loading instruction section of the target thin binary file are then analyzed to obtain the blank area between the last loading instruction and the starting position of the data section. Next, a dynamic library dependent instruction is generated according to the dynamic library information on which the Mach-O format binary file depends, and the size of the target space address occupied by that instruction is calculated. If the size of the target space address is larger than that of the blank area, the system architecture of the target thin binary file is obtained, the target offset of the starting position of the data section is calculated according to the system architecture and the size of the target space address, and the starting position of the data section is shifted by the target offset to obtain an increased blank area. Finally, the dynamic library dependent instruction is inserted into the increased blank area to realize process injection of the Mach-O format binary file.
It can be seen that, in the embodiment of the present application, process injection is implemented by inserting the dynamic library dependency instruction into the blank area between the load instructions and the data segment. This manner of insertion is not limited by the System Integrity Protection (SIP) mechanism, so the stability of process injection and the compatibility with the system are improved compared with the prior art.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted in advance that the present application has been applied to Mac unified endpoint management (UEM) sandbox products, and the application scenario is application encapsulation. When an application or binary executable is imported, the ARM architecture macOS sandbox uses this scheme to insert dependency library instructions into the binary, thereby implementing application encapsulation and process injection. A process into which the dynamic library has been injected is called a secure process and can access data such as files in the workspace. Process injection is the basis of sandbox products and the premise for sandbox functions to take effect. A sandbox is a security mechanism belonging to virtualization technology; it provides an isolated environment for executing programs and, by redirection or similar means, restricts the files that a program accesses and modifies to the isolated environment. Sandboxes are typically used to isolate and protect the operating system, to isolate virtual environments from the real environment, and to isolate virtual environments from one another. A dynamic link library (dynamic library) is a module containing functions and data that can be used by other applications or dynamic libraries. Dynamic linking allows a process to resolve the exported functions of a dependent library at load time or run time. On the macOS platform, a dynamic library is referred to simply as a dylib. Application encapsulation refers to re-signing an application installation package (an installation package for the Android, iOS, or macOS platform) and adding new functional features to the packaged program at run time. For convenience of description, these terms will not be explained again below.
To facilitate description of how the method in the embodiment of the present application implements process injection on the original binary file, fig. 1 shows a Mach-O file structure diagram.
As shown in FIG. 1, Mach-O files can be categorized into fat binaries (Fat binary) and thin binaries (Thin binary). A fat binary contains multiple architectures; on the macOS platform, a fat binary generally contains the x86_64 and arm64 architectures. The fat file header (FAT HEADER) of a fat binary contains summary information about the thin binaries contained in the file (e.g., the data start position and occupied size of each architecture's binary). A thin binary contains a file header (MACH HEADER), a load instruction segment (Load Commands), a data segment (SEGMENT DATA), a code signature (Code Signature), and so on, wherein the Mach header of the thin binary indicates the number of load instructions and the size of the space they occupy. The load instruction segment includes one or more segment commands (Segment Command). The region after the valid content of the load instruction segment and before the valid content of the data segment is the blank area into which instructions can be inserted. The binary of each architecture requires the insertion of dynamic library dependent instructions, because the macOS system on the ARM platform can also run x86-architecture binaries through Rosetta translation. It should be noted that Rosetta is binary translation software: it was originally intended to run software developed for the PowerPC platform on Intel-based Mac computers, and is currently used to run software developed for the Intel platform on ARM-based Mac computers. Load Commands are the part of a macOS executable file that provides the information needed by the dynamic linker and loader. They contain a series of instructions describing various attributes and requirements of the program, such as the entry point of the program, the locations of dynamically linked libraries, the executable code segment, and the data segment. These instructions are parsed by the dynamic linker when the executable file is loaded to determine the environment and resources required for program execution.
Based on the Mach-O file structure described in fig. 1, a method of process injection in the embodiment of the present application is described below. Specifically, a dynamic library dependent instruction is written into the original Mach-O binary file; if the binary file has no room for the write, the binary data after the write position is shifted to leave enough space, and the shifted data is corrected. In the Mach-O format, the section that records the libraries a binary depends on is Load Commands, and the instruction name is LC_LOAD_DYLIB. The application adds a dynamic library dependent instruction LC_LOAD_DYLIB whose content is the path of the dynamic library to be injected, and writes it into the blank area at the tail of the Load Commands of the target binary. For most binaries, memory alignment leaves empty areas without content between sections. If the blank area is large enough to accommodate the dynamic library dependent instruction to be written, the instruction may be written directly. There are exceptions, however: in a small number of binaries the tail of the Load Commands section does not have enough blank space, and forcibly adding an instruction would overwrite valid binary data, causing the binary to run abnormally or not at all. For this situation, the present application proposes a binary data offset method, which shifts the data after the Load Commands area by a specified size and thereby enlarges the usable space of the Load Commands area, so that the tail has enough blank area to accommodate the new instruction. After the data is shifted, the shifted data must be corrected; the main correction is to update the data that represents pointers or addresses so that they do not point to invalid locations. Before the binary data is modified, the binary signature information is removed, and the binary is re-signed after the modification.
For ease of understanding, please refer to fig. 2, which is a flow chart of a process injection method according to an embodiment of the present application. The method includes steps 201 to 208.
201. Obtain a target thin binary file in the Mach-O format binary file.
As can be seen from the description of FIG. 1, the dynamic library dependency instructions are mainly injected into the empty region between the load instruction segment and the data segment in the thin binary file. Thus, it is necessary to acquire a target thin binary file from among the Mach-O format binary files. It will be appreciated that the target thin binary file includes a file header, a load instruction segment, and a data segment, as shown in FIG. 1.
In one specific embodiment, in the binary file in the Mach-O format, if the binary file is a fat binary file, the fat binary file may include one or more thin binary files, where the file formats of the thin binary files are the same, and thus, the target thin binary file may be considered as any thin binary file of all thin binary files. Further, by traversing all the architectures in the Mach-O format binary file, all the target thin binary files in the fat binary file can be determined. It should be understood that any one thin binary corresponds to one system architecture.
In another specific embodiment, if the Mach-O format binary file is a thin binary file, the thin binary file is the target thin binary file in the embodiment of the present application.
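The two cases of step 201 can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `list_thin_slices` is hypothetical, only the 32-bit fat header and 64-bit little-endian thin binaries are handled, and the magic numbers and CPU type constants are the values defined in the public Mach-O headers.

```python
import struct

FAT_MAGIC = 0xCAFEBABE      # fat header magic, stored big-endian
MH_MAGIC_64 = 0xFEEDFACF    # 64-bit thin Mach-O magic (little-endian files)
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def list_thin_slices(data: bytes):
    """Return (arch_name, offset, size) for every thin binary in `data`."""
    (magic_be,) = struct.unpack_from(">I", data, 0)
    if magic_be == FAT_MAGIC:
        (nfat_arch,) = struct.unpack_from(">I", data, 4)
        slices = []
        for i in range(nfat_arch):
            # fat_arch record: cputype, cpusubtype, offset, size, align
            cputype, _sub, offset, size, _align = struct.unpack_from(
                ">5I", data, 8 + 20 * i)
            slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
        return slices
    (magic_le,) = struct.unpack_from("<I", data, 0)
    if magic_le == MH_MAGIC_64:
        # A thin binary is its own (single) target thin binary.
        (cputype,) = struct.unpack_from("<I", data, 4)
        return [(CPU_NAMES.get(cputype, hex(cputype)), 0, len(data))]
    raise ValueError("not a recognized Mach-O file")
```

Each returned tuple identifies one candidate target thin binary, so the later steps can be repeated for every architecture in the file.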
202. Parse the file header and the loading instruction section of the target thin binary file to obtain a blank area between the last loading instruction and the starting position of the data section in the target thin binary file.
Based on step 201, the header and the load instruction segment of the target thin binary file may be parsed, thereby obtaining a blank region between the last load instruction in the target thin binary file and the start position of the data segment.
In this embodiment, parsing the binary file mainly consists of parsing the file header and the load instruction segment. The file header is the starting region of the binary file. For a thin binary file, the file header MACH HEADER is the starting point of the binary data of that architecture and records the number of instructions and the size of the space they occupy for the current architecture. The size of MACH HEADER is also a fixed value, for example, 4k or 2k (the size of MACH HEADER is not limited here), so the number of load instructions and the size of the space they occupy can be determined by parsing the file header of the target thin binary file. Using the parsed number and occupied size, all load instructions in the load instruction segment can then be traversed and parsed. Among them is the load instruction that records the __text data section of the code (__TEXT) segment, from which the starting position of the code region can be obtained. The code region is the first data region of the data segment. Thus, by combining the number of load instructions and the size of the space they occupy with the position of the code region, the specific position of the blank area in the target thin binary file can be determined. It will be appreciated that the blank area after the last instruction of the instruction segment and before the code region is the insertion location of the dynamic library dependent instruction.
Thus, in combination with the above description, the layout of the target thin binary file in the Mach-O format binary file is instruction segment + blank area + code region, and the space between the end of the last instruction and the start of the code region is the blank area. The structures of the file header (also called the architecture header) and of the instruction segment are fixed, and each is described by a defined structure. The header mainly contains the supported CPU architecture, the number of instructions in the instruction segment, the space they occupy, and so on, and is parsed to obtain this information. The instruction segment contains many types of instructions, such as LC_LOAD_DYLIB, which describes a dependency library to load and is frequently referred to herein, LC_SEGMENT_64, which describes a data region (Section), and LC_CODE_SIGNATURE, which describes the signature data region; those relevant to the present application are LC_LOAD_DYLIB and LC_SEGMENT_64.
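The parsing described in step 202 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the parser of the embodiment: the function name `find_blank_area` is hypothetical, only 64-bit little-endian thin binaries are handled, the field positions follow the public `mach_header_64`, `segment_command_64` and `section_64` structure layouts, and the code region is approximated as the section with the smallest non-zero file offset.

```python
import struct

LC_SEGMENT_64 = 0x19
MACH_HEADER_64_SIZE = 32    # size of the mach_header_64 structure
SEGMENT_CMD_64_SIZE = 72    # size of segment_command_64 without its sections
SECTION_64_SIZE = 80        # size of one section_64 record


def find_blank_area(thin: bytes):
    """Return (blank_start, blank_size) for a 64-bit thin binary."""
    magic, _cpu, _sub, _ftype, ncmds, sizeofcmds = struct.unpack_from(
        "<6I", thin, 0)
    if magic != 0xFEEDFACF:
        raise ValueError("expected a 64-bit little-endian Mach-O slice")
    blank_start = MACH_HEADER_64_SIZE + sizeofcmds   # end of last load command
    first_data = None
    pos = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", thin, pos)
        if cmd == LC_SEGMENT_64:
            (nsects,) = struct.unpack_from("<I", thin, pos + 64)
            for s in range(nsects):
                sect = pos + SEGMENT_CMD_64_SIZE + s * SECTION_64_SIZE
                # section_64.offset is the uint32 at byte 48 of the record
                (offset,) = struct.unpack_from("<I", thin, sect + 48)
                if offset and (first_data is None or offset < first_data):
                    first_data = offset
        pos += cmdsize
    if first_data is None:
        raise ValueError("no section with file data found")
    return blank_start, first_data - blank_start
```

The returned size is the number of bytes available for a new instruction before any data would be overwritten.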
203. Generate a dynamic library dependent instruction according to the dynamic library information on which the Mach-O format binary file depends.
For the dynamic library information to be called, a dynamic library dependent instruction is generated according to the dynamic library information on which the Mach-O format binary file depends.
In one specific embodiment, since the dynamic library information cannot be directly inserted into the Mach-O format binary file, an instruction needs to be generated to call the dynamic library information, and the instruction is the dynamic library dependent instruction described in the foregoing description. Specifically, the dynamic library dependency instruction includes at least a call path of the dynamic library, and may include an instruction name, an instruction size, dynamic library information (time stamp or version number), or the like, in addition to the call path. In particular, the contents of the dynamic library dependency instructions are not limited herein.
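As an illustration of what such an instruction can look like in bytes, the following Python sketch packs an LC_LOAD_DYLIB command using the public `dylib_command` layout (command, command size, name offset, timestamp, current version, compatibility version, followed by the path). The function name `build_load_dylib`, the default timestamp and version values, and the example path are assumptions made for this sketch only.

```python
import struct

LC_LOAD_DYLIB = 0xC
DYLIB_COMMAND_SIZE = 24   # six uint32 fields precede the path string


def build_load_dylib(path: str, timestamp: int = 2,
                     current_version: int = 0x10000,
                     compat_version: int = 0x10000) -> bytes:
    """Pack a little-endian LC_LOAD_DYLIB command for a 64-bit binary."""
    name = path.encode("utf-8") + b"\x00"           # NUL-terminated path
    # Load commands in 64-bit files are padded to a multiple of 8 bytes.
    cmdsize = (DYLIB_COMMAND_SIZE + len(name) + 7) & ~7
    fixed = struct.pack("<6I", LC_LOAD_DYLIB, cmdsize, DYLIB_COMMAND_SIZE,
                        timestamp, current_version, compat_version)
    return (fixed + name).ljust(cmdsize, b"\x00")


# Hypothetical library path, used only for illustration.
command = build_load_dylib("@executable_path/libinject.dylib")
```

The instruction name and size occupy the first eight bytes, and the path begins at the name offset stored in the third field.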
204. The size of the target space address occupied by the dynamic library dependent instruction is calculated.
After the dynamic library dependent instruction is generated, the size of the space address occupied by the dynamic library dependent instruction can be calculated.
In one specific embodiment, the size of the space address occupied by the dynamic library dependent instruction may be calculated first, and then the corresponding path length of the dynamic library when the dynamic library is called may be calculated, so that the size of the target space address is determined according to the size of the space address occupied by the path length of the dynamic library and the size of the space address occupied by the dynamic library dependent instruction. That is, in calculating the target space address size, it is necessary to calculate not only the instruction size of the dynamic library dependent instruction but also the size of the dynamic library path length. For ease of understanding and description, please refer to the embodiment shown in fig. 6.
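The calculation can be sketched in Python as the sum of the fixed instruction size and the path length, rounded up to the command alignment. The function name `target_space_size` is hypothetical; the 24-byte fixed part and the 8-byte rounding are taken from the public Mach-O `dylib_command` layout for 64-bit binaries and are assumptions of this sketch rather than values stated in the embodiment.

```python
DYLIB_COMMAND_FIXED = 24   # fixed fields of a dylib load command, in bytes
COMMAND_ALIGN = 8          # 64-bit load commands are multiples of 8 bytes


def target_space_size(dylib_path: str) -> int:
    """Bytes needed by a dynamic library dependent instruction for this path."""
    path_bytes = len(dylib_path.encode("utf-8")) + 1     # plus terminating NUL
    raw = DYLIB_COMMAND_FIXED + path_bytes
    return -(-raw // COMMAND_ALIGN) * COMMAND_ALIGN      # round up


size = target_space_size("/usr/lib/libSystem.B.dylib")
```

For this 26-character path the raw size is 24 + 27 = 51 bytes, which rounds up to 56.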
205. When the size of the target space address is larger than that of the blank area, acquire the system architecture of the target thin binary file.
After the size of the target space address is determined, it can be compared with the size of the blank area. If the size of the target space address is larger than that of the blank area, the system architecture of the target thin binary file is obtained, because the alignment sizes of the data blocks in memory pages differ between system architectures, and the calculated offset therefore differs as well.
In one embodiment, when the target space address size is determined to be greater than the blank area, forcibly inserting the instruction and then re-signing may overwrite the code region of the data segment and cause the binary to run abnormally, so the code region and the data after it need to be shifted. Because the alignment sizes of the data blocks of memory pages differ between system architectures, the alignment size must be taken into account when shifting the data. Therefore, the system architecture of the target thin binary file needs to be obtained at this point. For example, memory pages are 4K-aligned on the x86_64 architecture but 16K-aligned on the arm64 architecture. The process of determining the target offset according to the system architecture is described in detail in the following steps and is not repeated here.
In another specific embodiment, if the target space address size is not greater than the blank area, the dynamic library dependent instruction may be directly inserted into the blank area, so as to implement process injection for the Mach-O format binary file.
206. Calculate the target offset of the starting position of the data segment according to the system architecture and the target space address size.
Specifically, because the alignment of the data blocks in the memory pages is different in different system architectures, for example, the data blocks in the x86_64 architecture memory pages are aligned by 4K, the data blocks in the arm64 architecture memory pages are aligned by 16K, and the alignment of the existing data in the binary file is already performed according to the corresponding data block size during the writing process, the alignment problem of the inserted data is also considered when determining the target offset.
If the system architecture is x86_64, the size of the data block in the memory page is aligned by 4K, and if the calculated target space size is 7K, the determined target offset is 8K at least in consideration of the alignment size problem of the data block of the dynamic library dependent instruction to be inserted, and may be 12K or 16K as long as it is greater than 7K and is an integer multiple of 4K.
For ease of understanding, reference may also be made to the embodiment shown in fig. 4.
207. Shift the starting position of the data segment by the target offset to obtain an increased blank area.
After the target offset is determined, the start position of the data segment may be shifted by the target offset, thereby obtaining an enlarged blank area.
In one specific embodiment, for the first data area of the data segment, i.e., the code area, a blank field needs to be added at the start of the data, where the size of the blank field is the target offset. The fields representing the size of the code region or of the instruction segment are not enlarged here, because the purpose of the offset is to enlarge the blank area rather than any existing region; only the field representing the size of the current architecture is enlarged.
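At the byte level, adding the blank field amounts to inserting zero bytes at the start of the data, as in the following minimal Python sketch. The function name `shift_data_segment` is hypothetical, and the sketch deliberately omits the later correction of the fields that store addresses, offsets or sizes.

```python
def shift_data_segment(thin: bytes, data_start: int, target_offset: int) -> bytes:
    """Insert `target_offset` zero bytes at `data_start`.

    Everything from `data_start` onward moves back by `target_offset`,
    which enlarges the blank area in front of the code region.
    """
    if not 0 <= data_start <= len(thin):
        raise ValueError("data_start lies outside the thin binary")
    return thin[:data_start] + b"\x00" * target_offset + thin[data_start:]


# Toy layout: 7 bytes of header and commands, 3 blank bytes, then the code.
original = b"HDRCMDS" + b"\x00" * 3 + b"CODE"
shifted = shift_data_segment(original, 10, 8)
```

In the toy layout the code bytes keep their content but now start 8 bytes later, and the blank area grows from 3 to 11 bytes.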
208. Insert the dynamic library dependent instruction into the increased blank area to realize process injection of the Mach-O format binary file.
After the increased blank area is obtained, a dynamic library dependent instruction can be inserted into the increased blank area to realize process injection of the Mach-O format binary file.
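The insertion itself can be sketched in Python as writing the instruction bytes directly after the last existing load instruction. The function name `insert_load_command` is hypothetical. The sketch also increments the instruction count and occupied size recorded in the file header; that update follows from the header description given for step 202 and from the public `mach_header_64` layout (`ncmds`, `sizeofcmds`), and is an assumption of this sketch rather than a step spelled out above.

```python
import struct

MACH_HEADER_64_SIZE = 32


def insert_load_command(thin: bytes, command: bytes, data_start: int) -> bytes:
    """Write `command` into the blank area and update the header counters."""
    (magic, cpu, sub, ftype,
     ncmds, sizeofcmds, flags, reserved) = struct.unpack_from("<8I", thin, 0)
    end_of_cmds = MACH_HEADER_64_SIZE + sizeofcmds
    if end_of_cmds + len(command) > data_start:
        raise ValueError("blank area too small; shift the data segment first")
    out = bytearray(thin)
    out[end_of_cmds:end_of_cmds + len(command)] = command   # same length, in place
    struct.pack_into("<8I", out, 0, magic, cpu, sub, ftype,
                     ncmds + 1, sizeofcmds + len(command), flags, reserved)
    return bytes(out)
```

Because the instruction overwrites blank bytes in place, the file length and the position of the data segment are unchanged by this step.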
It can be seen that, in this embodiment, the process injection is implemented by inserting the dynamic library dependency instruction in the blank area between the load instruction and the data segment, and the method of inserting the dynamic library dependency instruction in the blank area between the load instruction and the data segment is not limited by the SIP mechanism, so that the stability of the process injection and the compatibility of the system are improved.
In one possible technical solution, the Mach-O binary file also carries signature information, which must be handled during injection. Referring to fig. 3, fig. 3 is a schematic flow chart of another process injection method disclosed in the embodiment of the present application.
301. Remove signature information of the loading instruction segment in the target thin binary file.
Step 301 in this embodiment may be performed before step 208 based on the description of fig. 1 and 2.
Specifically, if the dynamic library dependency instruction is added without changing the binary signature, verification fails because the hash values of the binary data blocks contained in the signature no longer match the modified binary, and the binary still cannot run. Because the signature information is located at the tail of the load instruction segment, it is inconvenient to remove the old signature information after the instruction has been added; therefore, the binary signature information can be removed before the instruction is added.
In one particular embodiment, the removal of signature information located at the end of a load instruction segment may be accomplished by cleaning or modifying the information of the spatial address to which the signature information corresponds.
After the removal of the signature information is completed, if the dynamic library dependency instruction has been inserted into the increased blank area, execution of step 302 is triggered.
302. The target thin binary file with the inserted dynamic library dependency instructions is re-signed.
Because the dynamic library dependent instruction is inserted into the increased blank area, in order to ensure the correctness of the data, the embodiment of the application needs to re-sign the binary file inserted with the dynamic library dependent instruction so as to ensure the correctness of the later data verification.
Specifically, when signing a binary file, as an optional embodiment, the target thin binary file is generally cut into data blocks of a preset size, the hash value of each data block is calculated with an irreversible algorithm (such as a hash algorithm), and finally a hash value is calculated again over the hash values of all data blocks to obtain the final signature value.
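The two-level hashing described above can be sketched in Python with the standard `hashlib` module. This is a simplified illustration of the idea and not the actual macOS code signature format; the function name `block_hash_digest`, the choice of SHA-256, and the 4096-byte default block size are assumptions of this sketch.

```python
import hashlib


def block_hash_digest(data: bytes, block_size: int = 4096) -> str:
    """Hash each fixed-size block, then hash the concatenated block hashes."""
    block_hashes = [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]
    return hashlib.sha256(b"".join(block_hashes)).hexdigest()


before = block_hash_digest(b"\x00" * 10000)
after = block_hash_digest(b"\x00" * 9999 + b"\x01")   # one byte modified
```

Changing a single byte changes the hash of its block and therefore the final value, which is why a binary modified by instruction insertion has to be re-signed.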
Further, since there may be a plurality of thin binaries in the Mach-O format binary file, it is necessary to re-sign each thin binary file.
Because the embodiment of the application completes the removal of the old signature before the dynamic library dependent instruction is inserted into the target thin binary file, and re-executes the writing of the new signature after the dynamic library dependent instruction is inserted, the way of removing the old signature in advance improves the convenience of writing the new signature, thereby correspondingly improving the convenience of packaging the binary file in the embodiment of the application and also correspondingly avoiding the problem of process starting failure caused by signature verification failure.
Based on step 206, the following details the process of determining the target offset, please refer to fig. 4, fig. 4 is a refinement of step 206:
401. According to the system architecture, the aligned data block size of the memory pages of the system architecture is determined.
According to fig. 1, a fat binary file includes a plurality of target thin binary files, each corresponding to one system architecture. The architectures of Mach-O binary files include at least x86_64 and arm64, and the alignment sizes of the data blocks in memory pages differ between system architectures. The alignment of the data inserted for the dynamic library dependent instruction must therefore be considered, and the system architecture must be taken into account when determining the target offset.
In one embodiment, the data blocks of memory pages on the x86_64 architecture are 4K-aligned, whereas the data blocks of memory pages on the arm64 architecture are 16K-aligned.
402. When the target space address size is an integer multiple of the aligned data block size, then the target space address size is determined as the target offset.
The target space address size calculated in step 204 in the embodiment of fig. 2 is obtained and compared to determine whether the target space address size is an integer multiple of the aligned data block size. Thus, when the target space address size is an integer multiple of the aligned data block size, then the target space address size may be determined as the target offset.
For ease of understanding, assume that the system architecture in the embodiment of the present application is x86_64, so that the aligned data block size of the corresponding memory page is 4K. If the target space address size calculated in step 204 is an integer multiple of 4K, for example 4K, 8K, 16K, ..., 4NK (N is a positive integer), the target space address size is taken as the target offset.
403. When the target space address size is a non-integer multiple of the aligned data block size, then the target offset is determined to be greater than the target space address size and to be a minimum integer multiple of the aligned data block size.
Unlike step 402, in step 403 of this embodiment, when the target space address size is a non-integer multiple of the aligned data block size, then the target offset may be determined to be greater than the target space address size and the smallest integer multiple of the aligned data block size.
In one specific embodiment, for example, for the system architecture of x86_64, the aligned data block size is 4K. When the target space address size is 6K, in order for the offset to be an integer multiple of the aligned data block size (4K), since 4K × 2 = 8K > 6K, the target offset may be determined to be 8K. It will be appreciated that 8K is the smallest integer multiple of the aligned data block size 4K that is greater than the target space address size 6K.
In addition, for the system architecture of arm64, the aligned data block size is 16K. When the target space address size is 42K, in order for the offset to be an integer multiple of the aligned data block size (16K), since 16K × 3 = 48K > 42K, the target offset can be determined to be 48K. It will be appreciated that 48K is the smallest integer multiple of the aligned data block size 16K that is greater than the target space address size 42K.
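Steps 401 to 403 amount to rounding the required space up to the next multiple of the page alignment. A minimal sketch follows, assuming the 4K and 16K alignments stated above; the table `PAGE_ALIGNMENT` and the function name `target_offset` are hypothetical names introduced for illustration.

```python
# Page alignment per architecture, as stated in the text (4K and 16K).
PAGE_ALIGNMENT = {"x86_64": 4 * 1024, "arm64": 16 * 1024}


def target_offset(arch: str, target_space_size: int) -> int:
    """Return the smallest multiple of the page alignment that is
    greater than or equal to target_space_size."""
    if target_space_size <= 0:
        raise ValueError("target_space_size must be positive")
    align = PAGE_ALIGNMENT[arch]
    # Ceiling division, then scale back up to a byte count.
    return -(-target_space_size // align) * align
```

With these assumptions, a 6K request on x86_64 yields 8K, a 42K request on arm64 yields 48K, and a request that is already an exact multiple (for example 8K on x86_64) is returned unchanged, matching steps 402 and 403.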
According to the process injection method disclosed by the embodiment, determining the target offset of the target thin binary file allows the dynamic library dependent instruction to be inserted into the blank area with the smallest possible enlargement (so that the binary file grows as little as possible), allows the subsequent binary data offset and offset data correction to be completed, and improves the feasibility of the scheme.
For the Mach-O binary file, there may be a plurality of thin binary files in the fat binary file, so that after the first target thin binary file is shifted, the subsequent target thin binary files are shifted backward in sequence. The process of correcting the offsets of the target thin binary files after the first one is described below. Please refer to fig. 5, which is a flow chart of another process injection method disclosed in the embodiment of the present application.
501. A first target thin binary file of the plurality of thin binary files is determined.
As shown in fig. 1, when the Mach-O binary file is a fat binary file, since the fat binary file generally includes a plurality of thin binary files, in order to facilitate offset and offset data correction of a subsequent plurality of thin binary data, it is necessary to determine a first target thin binary file of the plurality of thin binary files. It is to be understood that the first target thin binary file may be understood as the first thin binary file located under the file header of the fat binary file.
In one specific embodiment, referring to fig. 1, Thin Binary1 in fig. 1 can be understood as the first thin binary file, i.e., the first target thin binary file described above. Specifically, the Fat header of the fat binary file is parsed; it contains a summary of the thin binaries contained in the Mach-O format binary file. The summary records the data start point and the occupied space size of each architecture binary in the fat binary file, that is, the data start point and the occupied space size of each thin binary file, and the file header of each thin binary file records the corresponding system architecture. The Fat header is also the starting point of the fat binary data. It should be noted that once the data start point and the occupied space of each thin binary file are known, the arrangement order of the thin binary files is also known.
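The parsing of the Fat header can be sketched as follows. The sketch assumes the standard 32-bit fat header layout (a big-endian magic of 0xCAFEBABE and an architecture count, followed by 20-byte entries holding CPU type, CPU subtype, offset, size and alignment); the function name `parse_fat_header` and the dictionary keys are hypothetical.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # 32-bit fat header magic, stored big-endian


def parse_fat_header(data: bytes):
    """Return one record per thin binary, ordered by data start point."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat binary")
    slices = []
    for i in range(nfat_arch):
        # Each entry: cputype, cpusubtype, offset, size, align (20 bytes).
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">iiIII", data, 8 + i * 20
        )
        slices.append(
            {"cputype": cputype, "offset": offset, "size": size, "align": align}
        )
    # Sorting by offset gives the arrangement order of the thin binaries.
    return sorted(slices, key=lambda s: s["offset"])
```

The first element of the returned list then corresponds to the first target thin binary file in the sense of step 501.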
502. For a plurality of target thin binary files following the first target thin binary file, according to the ordering of the plurality of target thin binary files, correspondingly increase the offset of the file header of the Nth target thin binary file following the first target thin binary file by N × target offset.
Based on step 501, for a plurality of target thin binary files following the first target thin binary file, the offset of the file header of the Nth target thin binary file following the first target thin binary file may be correspondingly increased by N × target offset according to the ordering of the plurality of target thin binary files.
In one specific embodiment, after the target offset is determined, if the dynamic library dependency instruction is inserted into the first thin binary file, the offsets of the target thin binary files following the first target thin binary file are corrected in sequence.
For example, the first target thin binary file is offset by the target offset, that is, fills the corresponding blank area, if the size of the filled blank area is 16K, the offset of the second architecture MACH HEADER needs to be increased by 16K, the offset of the third architecture MACH HEADER needs to be increased by 32K, and so on, the offset of the file header of the nth target thin binary file located after the first target thin binary file needs to be increased by n×target offset (N is a positive integer). When the data representing the address/pointer/offset and the like in each architecture is corrected, the data in each architecture needs to be increased by 16K uniformly, because the data in each architecture is offset calculated relative to the position of the architecture MACH HEADER. See in particular the embodiment shown in fig. 7.
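The N × target offset rule can be sketched as below. The sketch assumes, as in the example above, that every thin binary is enlarged by the same target offset; the function name `corrected_header_offsets` is hypothetical.

```python
def corrected_header_offsets(offsets, target_offset):
    """Shift the header offset of the Nth thin binary after the first
    one by N * target_offset; the first thin binary (N = 0) stays put.

    Assumes every thin binary grows by the same target_offset.
    """
    ordered = sorted(offsets)
    return [off + n * target_offset for n, off in enumerate(ordered)]
```

For three thin binaries and a 16K target offset, the second header moves by 16K and the third by 32K, as described above.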
By the process injection method disclosed by the embodiment, each target thin binary file in the Mach-O format binary file needs to be offset according to the related offset, so that the problems of path errors, call errors and the like when environment variables are subsequently called are reduced as much as possible, and the accuracy of the binary file and the operation of a dynamic library dependent instruction thereof is ensured.
Since the target space address size depends on the instruction size of the dynamic library dependent instruction, the process of calculating the dynamic library dependent instruction size is described below. Please refer to fig. 6, which is a flowchart illustrating another process injection method according to an embodiment of the present application.
601. The instruction size of the dynamic library dependent instruction and the length size of the dynamic library path are obtained.
Specifically, the dynamic library dependent instruction includes at least the path of the dynamic library, and may further include an instruction name, an instruction size and information of the dynamic library, wherein the information of the dynamic library includes the path of the dynamic library, or the path of the dynamic library together with at least one of a timestamp of the dynamic library and a version number of the dynamic library. Therefore, to calculate the target space address size occupied by the dynamic library dependent instruction, the instruction size of the dynamic library dependent instruction and the length size of the dynamic library path need to be obtained.
In one specific embodiment, the instruction size of the dynamic library dependent instruction includes at least the size of the space address occupied by the instruction name and the information of the dynamic library. Because there is a corresponding call dynamic library path when the dynamic library is called, the length of the dynamic library path also needs to be obtained at this time.
It will be appreciated that the dynamic library path length needs to be calculated when calculating the dynamic library dependent instruction size. Meanwhile, since the dynamic library path exists in the memory page in addr form, the result needs to be rounded up to an integer multiple of 8 to give the final instruction size, which is not described in detail herein.
602. And calculating the size of the target space address according to the instruction size of the dynamic library dependent instruction and the length size of the dynamic library path.
After the instruction size of the dynamic library dependent instruction and the length size of the dynamic library path are obtained, the target space address size can be calculated.
In one particular embodiment, a margin must remain after the instruction is inserted and the signature information is added; otherwise the contents of the data segment may be overwritten. Therefore, both the space address size occupied by the dynamic library dependent instruction and the length of the dynamic library path used to call the dynamic library need to be considered in order to calculate the final target space address size.
According to the process injection method disclosed by the embodiment, the instruction size of the dynamic library dependent instruction and the length size of the dynamic library path are calculated, and the final target space address size is determined, so that the allowance is still reserved after the instruction is inserted and signature information is added, and the operation correctness of the dynamic library dependent instruction and the binary file is ensured.
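The size calculation of steps 601 and 602 can be sketched by building the instruction itself. The sketch assumes the standard little-endian `LC_LOAD_DYLIB` layout: a 24-byte fixed part (command, command size, path offset, timestamp, current version, compatibility version) followed by the NUL-terminated path, padded to a multiple of 8. The function name `build_dylib_command` and the library path used in the usage note are hypothetical.

```python
import struct

LC_LOAD_DYLIB = 0xC
# cmd, cmdsize, name offset, timestamp, current and compatibility version
DYLIB_COMMAND_FIXED_SIZE = 24


def build_dylib_command(path: str, timestamp: int = 0,
                        current_version: int = 0,
                        compat_version: int = 0) -> bytes:
    """Build a load-dylib command; its length is the space it occupies."""
    raw_path = path.encode("utf-8") + b"\x00"  # NUL-terminated path
    unpadded = DYLIB_COMMAND_FIXED_SIZE + len(raw_path)
    cmdsize = (unpadded + 7) & ~7  # round up to a multiple of 8
    fixed = struct.pack(
        "<6I", LC_LOAD_DYLIB, cmdsize, DYLIB_COMMAND_FIXED_SIZE,
        timestamp, current_version, compat_version,
    )
    # Pad the tail with zero bytes up to the rounded size.
    return (fixed + raw_path).ljust(cmdsize, b"\x00")
```

For the hypothetical path `@rpath/libInject.dylib` (22 characters), the unpadded size is 24 + 23 = 47 bytes, which rounds up to 48 bytes.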
After the dynamic library dependent instruction is inserted into the increased blank area, the data of the load instruction segment, or of the data area and the code area, needs to be corrected. Please refer to fig. 7, which is a flow chart of another process injection method disclosed in the embodiment of the present application.
701. And analyzing the file header of the Mach-O format binary file to determine the file type of the Mach-O format binary file.
Specifically, since the file type of the binary file is recorded in the file header, the file type of the Mach-O format binary file can be determined by parsing the file header of the Mach-O format binary file. It is to be understood that the file types include a fat binary file and a thin binary file, and the fat binary file includes a plurality of thin binary files.
In one specific embodiment, since the header of different types of binary files is different, for example, the header of a fat binary file is FAT HEADER, and the header of a thin binary file is MACH HEADER, the header of the Mach-O format binary file is parsed, so that the type of the file of the Mach-O format binary file can be determined by the type of the header.
702. When the Mach-O format binary file is a fat binary file, a plurality of thin binary files in the fat binary file are respectively acquired.
Thus, when the Mach-O format binary file is a fat binary file, a plurality of thin binary files in the fat binary file are respectively acquired.
In one specific embodiment, as shown in fig. 1, when the first header of the parsed Mach-O format binary file is FAT HEADER, it may be determined that the Mach-O format binary file is a fat binary file, so that a plurality of thin binary files are determined by the information digest of the thin binary file contained in FAT HEADER.
It will be appreciated that when the first file header of the Mach-O format binary file is MACH HEADER, the current binary file may be determined to be a thin binary file.
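Distinguishing the two header types in practice usually comes down to the first four bytes of the file. A minimal sketch follows, assuming the standard magic numbers (0xCAFEBABE and 0xCAFEBABF for fat headers, 0xFEEDFACE and 0xFEEDFACF for Mach headers, the latter possibly byte-swapped on disk); the function name `mach_o_file_type` is hypothetical.

```python
import struct

FAT_MAGICS = {0xCAFEBABE, 0xCAFEBABF}   # FAT_MAGIC, FAT_MAGIC_64
THIN_MAGICS = {
    0xFEEDFACE, 0xFEEDFACF,             # MH_MAGIC, MH_MAGIC_64
    0xCEFAEDFE, 0xCFFAEDFE,             # same values seen byte-swapped
}


def mach_o_file_type(data: bytes) -> str:
    """Classify a file image as 'fat', 'thin' or 'unknown' by its magic."""
    if len(data) < 4:
        return "unknown"
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic in FAT_MAGICS:
        return "fat"
    if magic in THIN_MAGICS:
        return "thin"
    return "unknown"
```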
703. Any thin binary file of the plurality of thin binary files is determined to be the target thin binary file.
After obtaining the plurality of thin binaries based on step 702, any one of the plurality of thin binaries may be determined to be the target thin binary.
In one specific embodiment, since the ways of determining whether the blank area is enough and analyzing the loading instruction segment are approximate in the plurality of thin binary files, the executing step of any one thin binary file can be used as the reference of the executing steps of other thin binary files, and thus any thin binary file can be determined as the target thin binary file.
704. And obtaining a target thin binary file in the Mach-O format binary file.
705. And analyzing the file header and the loading instruction section of the target thin binary file to obtain a blank area between the last loading instruction and the starting position of the data section in the target thin binary file.
706. And generating a dynamic library dependent instruction according to the dynamic library information on which the Mach-O format binary file depends.
707. The size of the target space address occupied by the dynamic library dependent instruction is calculated.
708. And when the size of the target space address is larger than that of the blank area, acquiring the system architecture of the target thin binary file.
709. And calculating the target offset of the starting position of the data segment according to the system architecture and the target space address size.
710. And shifting the starting position of the data segment by a target shift amount to obtain an increased blank area.
711. And inserting the dynamic library dependent instruction into the increased blank area to realize process injection of the Mach-O format binary file.
It should be noted that, steps 704 to 711 are similar to those described in the embodiment of fig. 2, and are not repeated here.
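The comparison made in steps 705 to 708 can be sketched as follows, assuming a 64-bit little-endian thin binary whose 32-byte Mach header stores the total size of the load commands in its sixth 32-bit field. The function names and the parameter `data_start` (the file offset at which the data segment begins, obtained elsewhere) are hypothetical.

```python
import struct

MACH_HEADER_64_SIZE = 32  # eight 32-bit fields


def blank_area_size(thin: bytes, data_start: int) -> int:
    """Bytes between the end of the last load command and data_start."""
    (_magic, _cputype, _cpusubtype, _filetype,
     _ncmds, sizeofcmds, _flags, _reserved) = struct.unpack_from("<8I", thin, 0)
    end_of_commands = MACH_HEADER_64_SIZE + sizeofcmds
    return data_start - end_of_commands


def needs_shift(thin: bytes, data_start: int, command_size: int) -> bool:
    """True when the new instruction does not fit in the blank area."""
    return command_size > blank_area_size(thin, data_start)
```

Only when `needs_shift` returns true does the method continue with the architecture lookup and the offset calculation; otherwise the instruction already fits in the existing blank area.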
712. And updating the data value representing the size of the target thin binary file in the target thin binary file after the dynamic library dependent instruction is inserted according to the size of the dynamic library dependent instruction and the size of the target thin binary file before the dynamic library dependent instruction is inserted.
And after the dynamic library dependent instruction is inserted into the increased blank area, the data value representing the size of the target thin binary file in the target thin binary file after the dynamic library dependent instruction is inserted can be updated according to the size of the dynamic library dependent instruction and the size of the target thin binary file before the dynamic library dependent instruction is inserted.
In one embodiment, the target thin binary file expands the target space address size and, at the same time, inserts dynamic library dependent instructions, so that the target thin binary file size changes. Thus, the data value representing the target thin-binary file size in the target thin-binary file after the dynamic library dependency instruction is inserted needs to be updated. Wherein it is to be understood that the corresponding updated value should include the size of the dynamic library dependent instruction.
713. And increasing the offset of the target load instruction segment contained in the target thin binary file by a target offset.
After the target thin binary file with the inserted dynamic library dependent instruction is obtained, because the target thin binary file contains instruction segments that represent addresses and offsets, the offset of the target load instruction segment contained in the target thin binary file needs to be increased by the target offset. Here, the target load instruction segment is an instruction segment that represents an address and an offset in the load instruction segment.
In one embodiment, after determining the target offset, since the thin binary file of each architecture is offset by the target offset in the blank area, data correction is also required for the data in the instruction load segment. The type of correction includes data representing address and offset in the instructions such as offset and addr, and the correction is performed by adding the target offset to the original value.
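A simplified sketch of this correction is shown below. It assumes a 64-bit little-endian thin binary and only touches `LC_SEGMENT_64` commands, adding the target offset to each non-zero segment file offset and section file offset; other load commands that carry offsets, as well as address fields, would need analogous handling and are omitted here. The function name `shift_segment_offsets` is hypothetical.

```python
import struct

LC_SEGMENT_64 = 0x19
MACH_HEADER_64_SIZE = 32
SEGMENT_64_SIZE = 72   # size of the segment command without its sections
SECTION_64_SIZE = 80   # size of one section entry


def shift_segment_offsets(thin: bytes, delta: int) -> bytes:
    """Add delta to non-zero file offsets in LC_SEGMENT_64 commands."""
    buf = bytearray(thin)
    ncmds = struct.unpack_from("<I", buf, 16)[0]
    pos = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", buf, pos)
        if cmd == LC_SEGMENT_64:
            # The segment's file offset sits 40 bytes into the command.
            fileoff = struct.unpack_from("<Q", buf, pos + 40)[0]
            if fileoff:
                struct.pack_into("<Q", buf, pos + 40, fileoff + delta)
            nsects = struct.unpack_from("<I", buf, pos + 64)[0]
            for i in range(nsects):
                # Each section's file offset sits 48 bytes into its entry.
                field = pos + SEGMENT_64_SIZE + i * SECTION_64_SIZE + 48
                offset = struct.unpack_from("<I", buf, field)[0]
                if offset:
                    struct.pack_into("<I", buf, field, offset + delta)
        pos += cmdsize
    return bytes(buf)
```

Leaving zero offsets untouched is a design choice of this sketch: a zero value typically means that the field does not point at file content that was moved.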
714. And inserting a blank field at the starting position point of the data segment of the target thin binary file.
In order to obtain the increased blank area, the embodiment of the application can insert a blank field at the starting position point of the data segment of the target thin binary file. The size of the blank field here is the target offset.
In one specific embodiment, for the first data area, i.e., the code area, a blank field needs to be added at the start point of the data, where the size of the blank field is the target offset. The code region or instruction segment field representing the region size is not enlarged here because the purpose of the offset is to enlarge the blank region instead of enlarging the existing region, the only field enlarged being the field representing the current architecture size.
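Inserting the blank field of step 714 can be sketched as a simple byte-level splice. Zero bytes are used as the filler, which is an assumption; the function name `insert_blank_field` and the parameter `data_start` are hypothetical.

```python
def insert_blank_field(thin: bytes, data_start: int, target_offset: int) -> bytes:
    """Insert target_offset filler bytes at data_start, pushing the
    existing data backward without changing it."""
    if not 0 <= data_start <= len(thin):
        raise ValueError("data_start is outside the file")
    if target_offset < 0:
        raise ValueError("target_offset must not be negative")
    return thin[:data_start] + b"\x00" * target_offset + thin[data_start:]
```

The file grows by exactly the target offset, while everything before `data_start` (header and load commands) keeps its position.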
715. And increasing the offset of the target data segment contained in the target thin binary file by a target offset.
In addition to this, it is necessary to correct the offset of the target data segment, where the target data segment is a data segment representing an address or pointer.
In one particular embodiment, the data representing pointers or addresses is corrected uniformly for the data of the code region and the data region, such as a general DATA area (e.g., __got, __la_symbol_ptr), an OC DATA area (e.g., __objc_selrefs, __objc_protorefs, __data), a virtual DATA area (e.g., __objc_const, __data), etc. The correction is typically made by adding the target offset to the original value. The range of areas to be corrected differs between programming languages, such as C/C++ (Core Foundation), Objective-C, Swift and the like; if offsetting binaries written in a given language is to be supported, offset correction of the data areas of that language needs to be implemented.
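The uniform pointer correction can be sketched for a single pointer table as follows. The sketch assumes that the section consists of 8-byte little-endian pointers and that null entries are left unchanged; both points are assumptions, and the function name `shift_pointers` is hypothetical.

```python
import struct


def shift_pointers(section: bytes, delta: int) -> bytes:
    """Add delta to every non-null 8-byte pointer in a pointer table."""
    count = len(section) // 8
    pointers = struct.unpack_from(f"<{count}Q", section, 0)
    # Null pointers stay null; all other entries move by delta.
    shifted = [p + delta if p else 0 for p in pointers]
    # Any trailing bytes that do not form a full pointer are kept as-is.
    return struct.pack(f"<{count}Q", *shifted) + section[count * 8:]
```

Applied with a delta equal to the target offset, this corresponds to adding the target offset to the original value as described above.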
It is to be understood that process injection, as provided by the method disclosed in this embodiment, is the basis of the UEM sandbox function and a precondition for implementing a sandbox security mechanism. After process injection, isolation of and secure access to sandboxed confidential data may be achieved through the injected code. The method for realizing process injection by inserting the dynamic library dependent instruction can replace the earlier method based on environment variable injection. Since Apple introduced ARM architecture machines and the SIP mechanism, the method based on environment variable injection no longer works, and the macOS kernel layer even directly invalidates all environment variables beginning with DYLD_, so as to limit tampering with processes by the application layer. The method of inserting the dynamic library instruction can be used normally on both new and old macOS systems, and has good stability and system compatibility.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The method for process injection in the embodiment of the present application has been described in detail above; a system for process injection in the embodiment of the present application is described below. Referring to fig. 8, one embodiment of the system for process injection in the embodiment of the present application includes:
An obtaining unit 801, configured to obtain a target thin binary file in a Mach-O format binary file, where the target thin binary file includes a file header, a load instruction section, and a data section;
the parsing unit 802 is configured to parse a header and a load instruction segment of the target thin binary file to obtain a blank area between a last load instruction and a start position of a data segment in the target thin binary file;
a generating unit 803, configured to generate a dynamic library dependency instruction according to the dynamic library information on which the Mach-O format binary file depends;
A calculating unit 804, configured to calculate a target space address size occupied by the dynamic library dependent instruction;
the obtaining unit 801 is further configured to obtain a system architecture of the target thin binary file when the target space address size is greater than the blank area;
The calculating unit 804 is further configured to calculate a target offset of the start position of the data segment according to the system architecture and the target space address size;
An offset unit 805, configured to offset the start position of the data segment by a target offset amount, so as to obtain an increased blank area;
an inserting unit 806, configured to insert the dynamic library dependency instruction into the increased blank area, so as to implement process injection on the Mach-O format binary file.
Illustratively, the system further includes a removal unit 807 and a signature unit 808;
a removing unit 807 configured to remove signature information of the load instruction segment in the target thin binary file;
a signing unit 808 for re-signing the target thin binary file into which the dynamic library dependency instruction is inserted.
The system further comprises, illustratively, a determination unit 809;
A determining unit 809, configured to determine, according to the system architecture, an aligned data block size of a memory page of the system architecture;
A determining unit 809, further configured to determine the target space address size as a target offset when the target space address size is an integer multiple of the aligned data block size;
The determining unit 809 is further configured to determine the target offset to be greater than the target space address size and to be a minimum integer multiple of the aligned data block size when the target space address size is a non-integer multiple of the aligned data block size.
Illustratively, the system further includes an update unit 810;
The parsing unit 802 is further configured to parse a header of the Mach-O format binary file to determine a file type of the Mach-O format binary file, where the file type includes a fat binary file and a thin binary file, and the fat binary file includes a plurality of thin binary files;
the obtaining unit 801 is further configured to, when the Mach-O format binary file is a fat binary file, respectively obtain a plurality of thin binary files in the fat binary file;
A determining unit 809 for determining any one of the plurality of thin binary files as a target thin binary file;
The updating unit 810 is configured to update a data value representing a target thin binary file size in the target thin binary file after the dynamic library dependency instruction is inserted according to the size of the dynamic library dependency instruction and the size of the target thin binary file before the dynamic library dependency instruction is inserted.
Illustratively, the system further includes an augmentation unit 811;
a determining unit 809, configured to determine a first target thin binary file from the plurality of thin binary files;
An adding unit 811, configured to, for a plurality of target thin binary files following the first target thin binary file, correspondingly increase the offset of the file header of the Nth target thin binary file following the first target thin binary file by N × target offset according to the ordering of the plurality of target thin binary files, where N is greater than or equal to 1.
Illustratively, the system further comprises:
the adding unit 811 is further configured to add a target offset to an offset of a target load instruction segment included in the target thin binary file, where the target load instruction segment is an instruction segment that represents an address and an offset in a load instruction segment.
Illustratively, the system further comprises:
The inserting unit 806 is further configured to insert a blank field at a starting position point of the data segment of the target thin binary file, where a size of the blank field is the target offset.
Illustratively, the system further comprises:
The adding unit 811 is further configured to add a target offset to an offset of a target data segment included in the target thin binary file, where the target data segment is a data segment representing an address or pointer.
Illustratively, the dynamic library dependency instruction includes an instruction name, an instruction size, and information of the dynamic library, wherein the information of the dynamic library includes the path of the dynamic library, or the path of the dynamic library together with at least one of a timestamp of the dynamic library and a version number of the dynamic library; the system comprises:
an obtaining unit 801, configured to obtain an instruction size of the dynamic library dependent instruction and a length size of the dynamic library path;
The calculating unit 804 is specifically configured to calculate the target space address size according to the instruction size of the dynamic library dependent instruction and the length size of the dynamic library path.
Referring to fig. 9, a schematic structural diagram of a process injection apparatus according to an embodiment of the present application includes:
a central processor 901, a memory 905, an input/output interface 904, a wired or wireless network interface 903, and a power supply 902;
Memory 905 is a transient memory or persistent memory;
The central processor 901 is configured to communicate with the memory 905 and to execute the operations of instructions in the memory 905 to perform the method of process injection in any of the embodiments described above with respect to fig. 2-7.
The embodiment of the application also provides a chip system, which is characterized in that the chip system comprises at least one processor and a communication interface, the communication interface and the at least one processor are interconnected through a line, and the at least one processor is used for running a computer program or instructions to execute the method of process injection in the embodiment shown in any one of the foregoing fig. 2 to 7.
Embodiments of the present application also provide a computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of process injection in the embodiments of any of the preceding figures 2 to 7.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of process injection of the embodiments of any of the preceding figures 2 to 7.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.