CN102043915B - Method and device for detecting malicious code contained in non-executable file

Info

Publication number: CN102043915B
Application number: CN2010105317170A
Authority: CN
Inventors: 郭小春; 张永光; 吴鸿伟
Original assignee: Xiamen Meiya Pico Information Co Ltd
Current assignee: Xiamen Information Security Research Institute Co Ltd
Priority date: 2010-11-03
Filing date: 2010-11-03
Publication date: 2013-01-23
Anticipated expiration: 2030-11-03
Also published as: CN102043915A

Abstract

The invention discloses a method and a device for detecting malicious code contained in a non-executable file. The method comprises the following steps: opening the file to be checked, and reading and decoding its effective content data into memory; checking the data from the beginning, and determining that the file contains malicious code if a certain amount of data at the current position has the form of a valid CPU (central processing unit) instruction code block; if the data at that position does not have the form of a valid CPU instruction code block, moving to the next data position and checking again; and if no valid CPU instruction code block is found after all the data has been checked, determining that the file does not contain malicious code. The invention judges whether malicious code has been injected into a non-executable file by searching the file for executable instruction code blocks. Compared with other detection methods, the method has a higher recognition rate, is not easily bypassed by antivirus-evasion modifications, and can find malicious code bound to a non-executable file that exploits an unknown vulnerability or a vulnerability that cannot be triggered, so it can serve as an effective complement to existing detection methods.

Description

Method and device for detecting malicious codes contained in non-executable file

Technical Field

The present invention relates to computer security detection methods, and more particularly to a method and a device for detecting malicious code contained in a non-executable file.

Background

The development of internet security technology has greatly reduced direct remote attacks. Indirect attacks, in which files containing malicious code are sent to a target who is lured into opening them, have become a main form of attack, making network and computer security a growing problem. Owing to advances in existing technology and users' increased security awareness, indirect attacks that directly use executable files containing malicious code have become largely ineffective. However, operating systems and application software contain unknown security vulnerabilities, such as unknown overflow vulnerabilities, which often allow malicious code to be bound to non-executable files such as documents, spreadsheets, and pictures. Because users trust such files, they open them readily, and the malicious code inside then intrudes.

Therefore, effectively checking a non-executable file and determining whether it contains malicious code is also an important issue for internet security. The traditional signature-based (characteristic code) detection method is easily bypassed by modifying the code so that its signature no longer matches. The behavior-based detection approach also fails when the vulnerability has already been fixed, or when the system environment on which the vulnerability depends is not present and the vulnerability is therefore not triggered. In addition, the method of the invention does not depend on the system environment, so it can be used by ordinary computer users to detect harmful files and also works effectively when professional security companies screen large numbers of samples.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method and a device for detecting malicious codes in a non-executable file.

The technical scheme adopted by the invention for solving the technical problems is as follows: a method for detecting malicious codes contained in a non-executable file comprises the following steps:

opening the file to be checked, reading and decoding the effective content data to the memory;

starting from the beginning of data, checking whether the data contain valid CPU instruction code blocks, and if a certain amount of data at the current position contain valid CPU instruction code blocks, considering that the file contains malicious codes;

if the position data does not contain a valid CPU instruction code block, moving to the next data position for continuous checking;

if all data is checked and no valid CPU instruction code block is found, the file is considered to contain no malicious code.

The process of checking whether the data contains the valid CPU instruction code block comprises the following steps:

detecting whether the data conforms to the rules of an executable instruction code block rather than being a piece of non-executable data;

taking the data as a piece of code and executing it with an instruction simulation executive program, to determine whether the data can run legally, without errors, within a specified limited time or a specified limited number of instructions.

The process of executing by using the instruction simulation executive program comprises the following steps:

a. judging whether the end of the file is reached, if so, ending, otherwise, reading a block of data from the file;

b. judging whether the end of the data block is reached, if so, returning to the step a, otherwise, reading an instruction from the data block;

c. judging whether the instruction is in a preset illegal instruction form; if yes, returning to the step b, otherwise, continuing the next step;

d. analyzing the instruction, and copying the instruction to a buffer area;

e. calling a general processing process to process the codes in the buffer, and in the code processing process in the buffer, if no error occurs in operation under a specified rule and a preset malicious code instruction form is met, determining that the file contains an effective CPU instruction code block, and ending the simulation; otherwise, returning to the step b.

The preset illegal instruction forms include: the instruction arbitrarily uses an uninitialized register, the instruction performs a meaningless operation, the instruction is a privileged instruction, the instruction contains an invalid memory address, the instruction is a rarely used instruction, or the instruction matches another predefined form.

The preset malicious code instruction form comprises: the code contains a decryption operator, the code contains code relocation, the code contains a springboard, the code contains API import or the code contains other rules predefined by users.
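The control flow of steps a to e can be sketched roughly as follows. This is only an illustrative outline under stated assumptions: the 2-byte toy instruction format, `BLOCK_SIZE`, `EmulationError`, and the three callbacks (`split_instructions`, `is_illegal`, `process`) are hypothetical names introduced for the example and are not specified in the text.

```python
import io
from typing import BinaryIO, Callable, Iterable

BLOCK_SIZE = 4096  # assumed block size; the text does not specify one


class EmulationError(Exception):
    """Raised by the general handler when simulated execution faults."""


def simulate(stream: BinaryIO,
             split_instructions: Callable[[bytes], Iterable[bytes]],
             is_illegal: Callable[[bytes], bool],
             process: Callable[[bytes], bool]) -> bool:
    """Return True if a valid CPU instruction code block is found."""
    while True:
        block = stream.read(BLOCK_SIZE)           # step a: read a block
        if not block:
            return False                          # step a: end of file
        for instr in split_instructions(block):   # step b: next instruction
            if is_illegal(instr):                 # step c: illegal form, skip
                continue
            buffer = bytes(instr)                 # step d: copy to the buffer
            try:
                if process(buffer):               # step e: general handler
                    return True
            except EmulationError:
                continue                          # error: back to step b


# Toy demonstration with hypothetical 2-byte "instructions".
def toy_split(block: bytes) -> Iterable[bytes]:
    return [block[i:i + 2] for i in range(0, len(block) - 1, 2)]


def toy_is_illegal(instr: bytes) -> bool:
    return instr[0] == 0xF4        # stands in for a privileged instruction


def toy_process(buffer: bytes) -> bool:
    return buffer[0] == 0xE8       # stands in for a "code relocation" form


found = simulate(io.BytesIO(b"\x01\x00\xf4\x00\xe8\x00"),
                 toy_split, toy_is_illegal, toy_process)
print(found)  # True
```

A real implementation would also have to handle instructions that straddle a block boundary, which this sketch ignores.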

In the method of the invention, the rule for detecting whether the data has the form of executable instructions is as follows: a data block that contains no undefined instructions, privileged instructions, invalid-address instructions, instructions that destroy the operating environment, or other specified instruction forms conforms to the rule of an instruction code block; its code structure resembles the techniques used by common malicious code.

In the method of the invention, the rule that the data, taken as code, can run legally within a limited time and a limited number of instructions means the following: an instruction simulation executive program (virtual machine) that can interpret most common instructions starts interpreting from the beginning of the data that conforms to the rule of executable instructions; if, within a specified number of instructions or a specified time, execution proceeds normally without executing any instruction that causes an error exception, or the code's function is found to resemble the techniques used by common malicious code, the data block is deemed to conform to the rules of an instruction code block.

In the method of the invention, the techniques whose use makes a code structure or function resemble common malicious code include, but are not limited to, malicious code decryption operators, malicious code relocation, malicious code springboards, and malicious code API import.

The method of the invention targets non-executable files to which an intruder has bound malicious code by exploiting a vulnerability in order to carry out deceptive intrusion. Such intrusions include document-bundled and picture-bundled Trojan horses and other intrusions that use non-executable files as attack tools.

The detection method does not depend on the type of vulnerability that the malicious code exploits or on how it is bound to the non-executable file, so non-executable files that contain malicious code and exploit unknown vulnerabilities can be detected effectively.

The invention relates to a device for detecting malicious codes contained in a non-executable file, which comprises:

a file opening device for opening the file to be checked, reading and decoding the effective content data to the memory;

a valid CPU instruction code block detection device for checking whether the data contains a valid CPU instruction code block and outputting the detection result;

an output display device for displaying the detection result output by the valid CPU instruction code block detection device, outputting and displaying alarm information when the detected file contains a valid CPU instruction code block, and outputting information that no malicious code is contained when the detected file does not contain a valid CPU instruction code block;

the output of the file opening device is connected to the input of the valid CPU instruction code block detection device; the output of the valid CPU instruction code block detection device is connected to the input of the output display device.

The valid CPU instruction code block detection device comprises a first detection device and a second detection device. The first detection device detects whether the data of the file conforms to the rules of an executable instruction code block rather than being a non-executable data block. The second detection device executes the data as a piece of code with an instruction simulation executive program and determines whether the data can run legally, without errors, within a specified limited time or a specified limited number of instructions. The input of the first detection device is connected to the output of the file opening device, the output of the first detection device is connected to the input of the second detection device, and the output of the second detection device is connected to the input of the output display device.

The beneficial effects of the invention are as follows. Because the method detects whether malicious code exists in the non-executable file rather than detecting the specific type of vulnerability that the malicious code exploits, its effectiveness is independent of the vulnerability type, and non-executable files that contain malicious code and exploit unknown vulnerabilities can be detected effectively. The method helps discover and defend against harmful non-executable files and helps keep computers safe to use. Compared with other detection methods, it has a high recognition rate, is not easily bypassed by antivirus-evasion modifications, can discover malicious code bound to a non-executable file that exploits an unknown vulnerability or a vulnerability that cannot be triggered, and can serve as an effective supplement to existing detection methods.

The invention is further explained in detail with the accompanying drawings and the embodiments; however, the method and apparatus for detecting malicious codes contained in a non-executable file according to the present invention are not limited to the embodiments.

Drawings

FIG. 1 is a flow chart of the detection steps of the present invention;

FIG. 2 is a schematic diagram of a comparison between the data content of a non-executable file bound with malicious code and the data content of a normal non-executable file;

FIG. 3 is a diagrammatic representation of differences in the form of a generic block of data and a block of executable instruction code;

FIG. 4 is a flowchart of an instruction simulation routine;

FIG. 5 is a schematic diagram of the device of the present invention.

Detailed Description

In an embodiment, referring to the attached drawings, a method for detecting malicious codes contained in a non-executable file of the present invention includes the following steps:

As shown in FIG. 1, the file is first read into memory and decoded. Starting from the current position, the data is then analyzed to determine whether a valid instruction code segment (i.e., a valid CPU instruction code block) exists there. If one exists, warning information is output, that is, the file is considered to contain malicious code, and detection ends. If none exists at the current position, the current position moves to the next position. It is then judged whether the end of the file has been reached: if so, detection finishes; if not, the method returns to analyzing whether a valid instruction code segment exists at the current position (which is now the position just moved to) and continues the detection.
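The loop of FIG. 1 can be sketched as a byte-by-byte scan. This is a minimal illustration, not the patented implementation: `scan`, `is_code_block_at`, and `toy_check` are hypothetical names, and the toy check (four consecutive `0x90` bytes) is only a placeholder for the real two-stage analysis.

```python
from typing import Callable, Optional


def scan(data: bytes,
         is_code_block_at: Callable[[bytes, int], bool]) -> Optional[int]:
    """Return the first offset holding a valid code block, or None."""
    for pos in range(len(data)):          # move to the next position on failure
        if is_code_block_at(data, pos):   # analyze from the current position
            return pos                    # found: the caller issues a warning
    return None                           # end of data: nothing found


def toy_check(data: bytes, pos: int) -> bool:
    # Placeholder for the real analysis: four consecutive 0x90 bytes.
    return data.startswith(b"\x90\x90\x90\x90", pos)


sample = b"plain text" + b"\x90" * 4 + b"tail"
offset = scan(sample, toy_check)
print(offset)  # 10
```

Keeping the per-position check behind a callback lets the cheap static test and the costlier simulated execution be swapped in without changing the scan itself.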

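In the method of the invention, the process of checking whether the data contains a valid CPU instruction code block comprises: detecting whether the data conforms to the rules of an executable instruction code block rather than being non-executable data; and executing the data as a piece of code with an instruction simulation executive program to determine whether it can run legally, without errors, within a specified limited time or a specified limited number of instructions.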

In the detection process of the invention:

The file to be checked is opened, read, and decoded. Some file formats encode their data in some way; Flash animation files, for example, may be compressed. Decoding must therefore be performed so that the valid content data can be read into memory. In this way, the valid content data of the file, other than the necessary format data, enters memory.
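For the Flash example, the decoding step might look like the sketch below. It relies on the SWF container layout (a 3-byte signature, a 1-byte version, and a 4-byte length, with the rest zlib-compressed when the signature is `CWS`); the function name `read_effective_content` and the choice to pass unknown formats through unchanged are assumptions made for illustration.

```python
import struct
import zlib

SWF_HEADER_LEN = 8  # signature (3 bytes) + version (1) + file length (4)


def read_effective_content(raw: bytes) -> bytes:
    """Return the content data of a file, inflating compressed SWF bodies."""
    signature = raw[:3]
    if signature == b"CWS":                       # zlib-compressed SWF
        return zlib.decompress(raw[SWF_HEADER_LEN:])
    if signature == b"FWS":                       # uncompressed SWF
        return raw[SWF_HEADER_LEN:]
    return raw                                    # unknown format: scan as-is


# Build a tiny synthetic compressed file to exercise the function.
body = b"\x00" * 16 + b"embedded payload"
header = b"CWS" + bytes([9]) + struct.pack("<I", SWF_HEADER_LEN + len(body))
packed = header + zlib.compress(body)
print(read_effective_content(packed) == body)  # True
```

A truncated or corrupt compressed body makes `zlib.decompress` raise `zlib.error`, which a caller would need to handle.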

As shown in FIG. 2, some location in a non-executable file that contains malicious code holds code fragments that the CPU can execute correctly, known as shellcode. When executed, this code may decrypt or load further code to carry out various malicious activities.

The data is checked starting from its first byte: at each byte position, it is judged whether the segment of data beginning there is a valid code segment that the CPU can execute correctly. FIG. 3 illustrates the differences between a general data block and an executable instruction code block.

As shown in FIG. 3, a binary code analyst can easily see the differences. A CPU instruction code segment that executes validly initializes its registers properly before using register addressing to access data. It has coherent logic, follows good rules, and contains no code that causes error exceptions.

An invalid data segment, by contrast, arbitrarily uses uninitialized registers, performs various meaningless operations, and arbitrarily overwrites register values, so running it causes errors.

Therefore, the rule according to the present invention for detecting whether the data has the form of executable instructions is as follows: a data block that contains no undefined instructions, privileged instructions, invalid-address instructions, instructions that destroy the operating environment, or other specified instruction forms conforms to the rule of an instruction code block; its code structure resembles the techniques used by common malicious code.

Code that raises error exceptions includes, but is not limited to:

privileged instructions, which require a high system privilege level to execute and are typically used by driver-level software; programs that process files are all user-level programs, so a privileged instruction causes an error exception;

instructions that access invalid memory addresses; such instructions operate on invalid addresses, and executing them necessarily causes an exception; when checking for such instructions, the valid address space needs to be specified according to the operating system's memory layout;

undefined instructions, which the CPU does not define and cannot execute, and which cause errors and exceptions; other instructions that cause errors and exceptions, and instructions that are rarely used in practice even though they do not cause exceptions, can also be treated as undefined instructions.

Some instructions cause an exception but should nevertheless be treated as jump instructions if an exception handler is set up at the beginning of the code.
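Two of these checks can be illustrated with a short sketch. The address bounds below are assumed values chosen to resemble the user-mode range of a 32-bit Windows process, and the opcode set lists only a few single-byte x86 instructions (HLT, CLI, STI, IN, OUT) that typically fault in ordinary user mode; neither is taken from the text, and neither is exhaustive.

```python
# Assumed user-mode address range (illustrative values, not from the text).
USER_SPACE_LOW = 0x0001_0000
USER_SPACE_HIGH = 0x7FFF_0000   # exclusive upper bound

# A few single-byte x86 opcodes that normally fault in user mode.
PRIVILEGED_OPCODES = {
    0xF4,                     # HLT
    0xFA, 0xFB,               # CLI, STI
    0xE4, 0xE5, 0xEC, 0xED,   # IN
    0xE6, 0xE7, 0xEE, 0xEF,   # OUT
}


def is_valid_address(addr: int, size: int = 4) -> bool:
    """True if the whole access [addr, addr + size) lies in user space."""
    return USER_SPACE_LOW <= addr and addr + size <= USER_SPACE_HIGH


def is_privileged(opcode: int) -> bool:
    """True if the opcode byte is in the (partial) privileged set."""
    return opcode in PRIVILEGED_OPCODES


print(is_valid_address(0x00401000))  # True
print(is_valid_address(0x00000000))  # False
print(is_privileged(0xF4))           # True
```

The opcode test assumes the byte is already known to be the first opcode byte of an instruction; prefixes and multi-byte opcodes would need a real decoder.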

In the method of the invention, detecting whether a valid CPU instruction code segment begins at the current data position comprises the following two steps:

In the first step, a certain amount of instruction code (e.g., 512 bytes of instructions, or 512 instructions) is disassembled instruction by instruction starting from the current address. These instructions are analyzed to determine whether any of them causes an error exception. If the data at the current position does not satisfy the rule, the position is discarded and the current data position moves to the next byte. In most cases the very first disassembled instruction is invalid, so the scan can move on quickly.

In the second step, if the first step found no instruction that causes an error exception, an instruction simulation executive program is used to simulate execution of the code for a limited number of steps (e.g., 512 instructions) from the current data position. The instruction simulation executive program is a virtual CPU device; it need not be a complete virtual machine implementation, but it must contain a basic CPU and memory architecture. It can interpret most common CPU instructions and effectively simulate the changes in CPU register and memory state that they cause, so it can simulate the execution of a long instruction sequence almost continuously. As shown in FIG. 4, the instruction simulation executive program starts executing from the current position, and it is determined whether the instructions can execute normally within the limited number of steps. If the data at the current position does not satisfy the rule, the position is discarded and the current data position moves to the next byte.
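The two steps can be illustrated on a deliberately tiny toy instruction set. Everything here is hypothetical: the 2-byte encoding (opcode, register), the three opcodes, and the limit of 4 instructions (the text suggests something like 512) exist only to show a static pass followed by a bounded simulated execution that tracks register initialization.

```python
from typing import Optional, Tuple

MOV_IMM, LOAD, INC = 0x01, 0x02, 0x03   # hypothetical toy opcodes
DEFINED = {MOV_IMM, LOAD, INC}          # anything else counts as undefined
NUM_REGS = 4
LIMIT = 4                               # toy limit on instructions checked


def decode(data: bytes, pos: int) -> Optional[Tuple[int, int]]:
    """Decode the 2-byte toy instruction at pos, or return None."""
    if pos + 2 > len(data):
        return None
    op, reg = data[pos], data[pos + 1]
    if op not in DEFINED or reg >= NUM_REGS:
        return None
    return op, reg


def static_check(data: bytes, pos: int, limit: int = LIMIT) -> bool:
    """Step 1: every one of the next `limit` instructions must decode."""
    return all(decode(data, pos + 2 * i) is not None for i in range(limit))


def emulate(data: bytes, pos: int, limit: int = LIMIT) -> bool:
    """Step 2: run `limit` instructions; fault on uninitialized registers."""
    initialized = set()
    for i in range(limit):
        decoded = decode(data, pos + 2 * i)
        if decoded is None:
            return False
        op, reg = decoded
        if op == MOV_IMM:
            initialized.add(reg)        # register now holds a known value
        elif reg not in initialized:
            return False                # LOAD/INC on an uninitialized register
    return True


def first_code_block(data: bytes, limit: int = LIMIT) -> Optional[int]:
    for pos in range(len(data)):
        if static_check(data, pos, limit) and emulate(data, pos, limit):
            return pos
    return None


good = bytes([MOV_IMM, 0, LOAD, 0, INC, 0, MOV_IMM, 1])
bad = bytes([LOAD, 0, MOV_IMM, 0, INC, 0, MOV_IMM, 1])
print(first_code_block(b"\xff\xff" + good))  # 2
print(first_code_block(bad))                 # None
```

The second sample passes the static pass at offset 0 but is rejected during simulation because it reads through a register before initializing it, which is the kind of case the second step exists to catch.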

During detection, accuracy can be improved by making use of the structural and functional characteristics common to malicious code. These characteristics include, but are not limited to: malicious code decryption operators, malicious code relocation, malicious code springboards, and malicious code API import. Taking malicious code API import as an example: because malicious code needs the addresses of various system API functions in order to carry out malicious behavior, common malicious code always contains a code block whose purpose is to import API functions. If, while simulating execution of a code segment, the instruction simulation executive program finds that the code can load API function addresses, the code segment can be considered to have the API import characteristic of malicious code.
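One way such a characteristic could be recognized is sketched below, under the assumption that the emulator records the memory reads it simulates. On 32-bit Windows the FS segment refers to the thread environment block, whose offset `0x30` holds the pointer to the process environment block that shellcode commonly walks to locate loaded modules and their exports. The `MemRead` record and `has_api_import_feature` are hypothetical names; the text does not say how the characteristic is actually detected.

```python
from dataclasses import dataclass
from typing import Iterable

PEB_POINTER_OFFSET = 0x30   # fs:[0x30] on 32-bit Windows


@dataclass(frozen=True)
class MemRead:
    """One simulated memory read, as recorded by a hypothetical emulator."""
    segment: str   # e.g. "ds" or "fs"
    offset: int


def has_api_import_feature(trace: Iterable[MemRead]) -> bool:
    """True if the simulated code read the PEB pointer through FS."""
    return any(r.segment == "fs" and r.offset == PEB_POINTER_OFFSET
               for r in trace)


trace = [MemRead("ds", 0x00401000), MemRead("fs", 0x30)]
print(has_api_import_feature(trace))  # True
```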

If a data block that satisfies the detection rules is found during detection, the file is likely to contain malicious code. A warning is given to the user, and virus analysts can analyze the file further to discover the vulnerability. Ordinary computer users should handle the file with care or open it in a secure environment (such as a sandbox or virtual machine).

The method of the invention uses an instruction simulation executive program, that is, a simple virtual machine. The virtual machine is a software-simulated CPU that fetches, decodes, and executes instructions like a real CPU and can reproduce the result of running a piece of code on a real CPU. Simulating a CPU completely would require an enormous amount of work. The goal is to simulate only the instructions common to most malicious code, so what is simulated is only a small subset of the instruction set. The virtual machine is therefore designed as a buffered code virtual machine.

An instruction is fetched from the file and compared with the predefined rules. If it does not satisfy the specified rules, it is skipped directly. Otherwise, it is decoded only far enough to find its length, and all such instructions are then passed to a small routine that can generically emulate all commonly used instructions. In this way, the number of instructions that need special handling is greatly reduced and execution speed is increased.

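The process of executing with the instruction simulation executive program comprises the following steps:

a. judging whether the end of the file is reached, if so, ending, otherwise, reading a block of data from the file;

b. judging whether the end of the data block is reached, if so, returning to the step a, otherwise, reading an instruction from the data block;

c. judging whether the instruction is in a preset illegal instruction form; if yes, returning to the step b, otherwise, continuing the next step;

d. analyzing the instruction, and copying the instruction to a buffer area;

e. calling a general processing process to process the codes in the buffer, and in the code processing process in the buffer, if no error occurs in operation under a specified rule and a preset malicious code instruction form is met, determining that the file contains a valid CPU instruction code block, and ending the simulation; otherwise, returning to the step b.

Wherein, the preset illegal instruction forms and the preset malicious code instruction forms are as defined above.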

The method for detecting the malicious codes contained in the non-executable file can improve the recognition rate through file preprocessing (before the simulation execution of the file).

The following process steps may be employed:

firstly, scanning available plug-ins and loading;

calling the plug-in to judge whether the file meets the preprocessing condition of the plug-in;

if yes, calling the plug-in to process the file, and then entering a simulation execution flow;

if not, directly entering the simulation execution flow;

Processing the file through plug-ins in this way improves the effect of the subsequent simulated execution.

For example, some files embed a JS script that encrypts the data; such files can be processed by a plug-in to obtain the decrypted data.
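The plug-in flow could be organized along the following lines. This is a sketch only: the `Preprocessor` interface, the `accepts`/`process` method names, and the `Base64Plugin` (a stand-in for something like a JS decryption plug-in, keyed on a made-up `B64:` marker) are all assumptions introduced for illustration.

```python
import base64
from typing import List, Protocol


class Preprocessor(Protocol):
    """Hypothetical plug-in interface."""

    def accepts(self, data: bytes) -> bool: ...

    def process(self, data: bytes) -> bytes: ...


class Base64Plugin:
    """Stand-in plug-in: decodes content marked with a made-up prefix."""

    MARKER = b"B64:"

    def accepts(self, data: bytes) -> bool:
        return data.startswith(self.MARKER)

    def process(self, data: bytes) -> bytes:
        return base64.b64decode(data[len(self.MARKER):])


def preprocess(data: bytes, plugins: List[Preprocessor]) -> bytes:
    """Run every plug-in whose precondition the file satisfies."""
    for plugin in plugins:
        if plugin.accepts(data):
            data = plugin.process(data)
    return data   # handed on to the simulated-execution stage


plugins: List[Preprocessor] = [Base64Plugin()]
encoded = b"B64:" + base64.b64encode(b"\x90\x90\x90\x90")
print(preprocess(encoded, plugins))   # b'\x90\x90\x90\x90'
print(preprocess(b"plain", plugins))  # b'plain'
```

Files that no plug-in accepts pass through unchanged, matching the branch that enters the simulation execution flow directly.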

The preset illegal instruction forms and the preset malicious code instruction forms used by the detection method can be refined on the basis of a large number of actual test results, thereby reducing the false alarm rate.

Referring to fig. 5, an apparatus for detecting malicious code contained in a non-executable file according to the present invention includes:

a file opening device 51 for opening the file to be checked, reading and decoding the effective content data to the memory;

a valid CPU instruction code block detection device 52 for checking whether the data contains a valid CPU instruction code block and outputting the detection result;

an output display device 53 for displaying the detection result output by the valid CPU instruction code block detection device 52, outputting and displaying alarm information when the detected file contains a valid CPU instruction code block, and outputting information that no malicious code is contained when the detected file does not contain a valid CPU instruction code block;

the output of the file opening device 51 is connected to the input of the valid CPU instruction code block detection device 52; the output of the valid CPU instruction code block detection device 52 is connected to the input of the output display device 53.

The valid CPU instruction code block detection device 52 comprises a first detection device 521 and a second detection device 522. The first detection device 521 detects whether the data of the file conforms to the rules of an executable instruction code block rather than being a non-executable data block. The second detection device 522 executes the data as a piece of code with an instruction simulation executive program and determines whether the data can run legally, without errors, within a specified limited time or a specified limited number of instructions. The input of the first detection device 521 is connected to the output of the file opening device 51, the output of the first detection device 521 is connected to the input of the second detection device 522, and the output of the second detection device 522 is connected to the input of the output display device 53.

The above embodiments are only used to further illustrate the method and apparatus for detecting malicious codes in a non-executable file according to the present invention, but the present invention is not limited to the embodiments, and any simple modifications, equivalent changes and modifications made to the above embodiments according to the technical spirit of the present invention fall within the scope of the technical solution of the present invention.

Claims

1. A method for detecting malicious codes contained in a non-executable file is characterized in that: the method comprises the following steps:

if all the data are checked and no valid CPU instruction code block is found, the file is considered not to contain malicious codes;

if the data is provided with a rule of an executable instruction code block, the data is used as a section of code to be executed by using an instruction simulation executive program, and whether the data can be legally operated within the specified limited time or the specified limited instruction number without errors is judged;

d. analyzing the instruction, and copying the instruction to a buffer area;

2. The method for detecting malicious code contained in a non-executable file according to claim 1, wherein: the preset illegal instruction forms include: the instruction arbitrarily uses an uninitialized register, the instruction performs a meaningless operation, the instruction is a privileged instruction, the instruction contains an invalid memory address, the instruction is a rarely used instruction, or the instruction matches another predefined form.

3. The method for detecting malicious code contained in a non-executable file according to claim 1, wherein: the preset malicious code instruction form comprises: the code contains a decryption operator, the code contains code relocation, the code contains a springboard, the code contains API import or the code contains other rules predefined by users.

4. An apparatus for detecting malicious code contained in a non-executable file, characterized by comprising:

a file opening device for opening the checked file and reading and decoding the effective content data to the memory;

a data detection device, starting from the beginning of data, checking whether the data contains valid CPU instruction code blocks, if a certain amount of data at the current position contains valid CPU instruction code blocks, considering that the file contains malicious codes;

a first judgment execution device, if the position data does not contain a valid CPU instruction code block, moving to the next data position for continuous inspection;

a second judgment execution device, if no valid CPU instruction code block is found by checking all data, the file is considered not to contain malicious codes;

wherein, data detection device includes:

a first data detection device for detecting whether the data conforms to the rules of an executable instruction code block rather than being a piece of non-executable data;

a second data detection device for, if the data conforms to the rules of an executable instruction code block, executing the data as a piece of code with an instruction simulation executive program and judging whether the data can run legally, without errors, within a specified limited time or a specified limited number of instructions;

the process of executing the data as a piece of code with the instruction simulation executive program comprises the following steps:

d. analyzing the instruction, and copying the instruction to a buffer area;