CN118278018A

CN118278018A - Binary code vulnerability detection method, device and equipment based on value stream state machine

Info

Publication number: CN118278018A
Application number: CN202410717015.3A
Authority: CN
Inventors: 张斌; 王晓磊; 谢君
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2024-06-04
Filing date: 2024-06-04
Publication date: 2024-07-02
Anticipated expiration: 2044-06-04
Also published as: CN118278018B

Abstract

The present application relates to a binary code vulnerability detection method, device and equipment based on a value flow state machine. The program to be detected is preprocessed to obtain the corresponding intermediate code, the external data introduction points in the intermediate code are determined, and a static value flow graph is constructed from the intermediate code. On the static value flow graph, each external data introduction point is taken as a data source point, and a value flow state machine determines, according to the value flow graph node type and the state update rules, the state values of all value flow graph nodes that reference the data source point. Pointers in value flow graph nodes whose state value meets a preset value are determined to be potential dangerous pointer access points, and these are screened to obtain the final dangerous pointer access points. Finally, the program to be detected is dynamically instrumented at the dangerous pointer access points to detect arbitrary address pointer dereference vulnerabilities in it. The method can effectively detect arbitrary address pointer dereference vulnerabilities in software.

Description

Binary code vulnerability detection method, device and equipment based on value stream state machine

Technical Field

The present application relates to the field of information security technology, and in particular to a binary code vulnerability detection method, device and equipment based on a value flow state machine.

Background

Software vulnerabilities are defects in the security of a computer system that threaten the confidentiality, integrity, availability, access control, and other properties of the system or its application data. Arbitrary address pointer dereference vulnerabilities are among the more dangerous types. They arise when a pointer that the program dereferences to read, write, or execute memory can be controlled by an attacker through external input data. The attacker thereby gains the ability to read, write, and execute arbitrary memory, leading to serious threats such as data tampering, information leakage, and even arbitrary code execution.

The key to detecting arbitrary address pointer dereference vulnerabilities is to analyze whether the address operands of memory read and write instructions can be contaminated by external input. For such vulnerabilities, however, traditional ASAN cannot track how external data propagates. It detects a memory access error only when the accessed address happens to fall in a red zone that must not be accessed, and even then it reports only an illegal memory access; it cannot determine whether that access amounts to writing arbitrary data to an arbitrary address.

Summary of the Invention

In view of the above technical problems, it is necessary to provide a binary code vulnerability detection method, device and equipment based on a value flow state machine that can effectively detect software vulnerabilities.

A binary code vulnerability detection method based on a value flow state machine, the method comprising:

Obtaining a program to be detected, and preprocessing it to obtain the corresponding intermediate code;

Determining external data introduction points in the intermediate code, and constructing a static value flow graph from the intermediate code;

Based on the static value flow graph, taking the external data introduction points as data source points, using a value flow state machine to determine, according to the value flow graph node type and the state update rules, the state values of all value flow graph nodes that reference the data source points, and determining the pointers in value flow graph nodes whose state value meets a preset value as potential dangerous pointer access points;

Screening the potential dangerous pointer access points to obtain the final dangerous pointer access points;

Dynamically instrumenting the program to be detected according to the dangerous pointer access points, so as to detect arbitrary address pointer dereference vulnerabilities in the program.

In one embodiment, the program to be detected is binary code.

In one embodiment, preprocessing the program to be detected to obtain the corresponding intermediate code includes:

Lifting the binary code to intermediate code to obtain the intermediate code corresponding to the program to be detected.

In one embodiment, a tool such as llvm-mctoll or mcsema is used to lift the binary code to intermediate code;

When mcsema is used to lift the binary code, metadata is added to the resulting intermediate code to record the address and instruction information of the corresponding original assembly instruction.

In one embodiment, determining the external data introduction points in the intermediate code includes:

Defining each API function in the intermediate code as a triple consisting of the function name, the parameter position of the external data in the function, and the read length;

Examining the function calls made through CallInst and InvokeInst instructions in the intermediate code, and determining whether the API function described by each triple is an external data reading function;

Determining the external data introduction points according to the locations of the API functions identified as external data reading functions.

In one embodiment, using the value flow state machine to determine, according to the value flow graph node type and the state update rules, the state values of all value flow graph nodes that reference the data source points includes:

When the value flow graph node type is an external data read or an object creation, the node state value is set to "1";

When the value flow graph node type is variable value copy, pointer calculation, variable comparison, variable binary operation, actual function parameter, formal function parameter, actual function return, or formal function return, the node state value is set to the state value of its parent node;

When the value flow graph node type is memory read and the state value of the node's parent is greater than "0", the node state value is set to the parent's state value minus 1;

When the value flow graph node type is memory write, the node state value is set to the parent's state value plus 1 if the parent variable is used as a value, and to the parent's state value if the parent variable is used as a pointer.

In one embodiment, screening the potential dangerous pointer access points to obtain the final dangerous pointer access points includes:

Among the potential dangerous pointer access points, retaining those whose data comes only from external input sources and screening out the others;

Among the potential dangerous pointer access points remaining after this screening, determining as final dangerous pointer access points those that lie on execution paths along which the external data is not subjected to any comparison operation between the data source point and the dangerous pointer access point.
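The two screening rules can be sketched over a toy value flow graph. This is a minimal illustration, not the patent's implementation: the graph is assumed to be a dict mapping each node to its parent nodes, a node with no parents is treated as a data origin, and the second rule is approximated coarsely by dropping a candidate if any value on its flow also feeds a comparison node, ignoring control-flow ordering. The names `screen`, `ancestors`, `external_sources`, and `compare_nodes` are hypothetical.

```python
def ancestors(preds: dict, node: str) -> set:
    """All nodes from which `node` is reachable by walking value flow edges backwards."""
    seen, stack = set(), [node]
    while stack:
        for p in preds.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def screen(candidates, preds, external_sources, compare_nodes):
    """Apply both screening rules to the potential dangerous pointer access points."""
    # Invert the parent map so we can see which nodes consume each value.
    succs = {}
    for child, parents in preds.items():
        for p in parents:
            succs.setdefault(p, set()).add(child)

    final = []
    for c in candidates:
        anc = ancestors(preds, c)
        origins = {a for a in anc if not preds.get(a)}
        # Rule 1: the data must come only from external input sources.
        if not origins or not origins <= external_sources:
            continue
        # Rule 2 (coarse approximation): drop c if any value on its flow is compared.
        if any(succs.get(a, set()) & compare_nodes for a in anc):
            continue
        final.append(c)
    return final


preds = {
    "p1": ["src"], "store1": ["p1"],                 # pointer derived only from input
    "k": [], "p2": ["src", "k"], "store2": ["p2"],   # mixes in a constant
    "q": ["src"], "cmp": ["q"], "store3": ["q"],     # value is compared before use
}
print(screen(["store1", "store2", "store3"], preds, {"src"}, {"cmp"}))  # ['store1']
```

A path-sensitive version would need the control flow graph to check whether a comparison actually dominates the dereference; the value-flow-only check above over-approximates "was compared".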

In one embodiment, dynamically instrumenting the program to be detected according to the dangerous pointer access points includes:

In the binary code, replacing the instruction corresponding to each dangerous pointer access point with a jump instruction that executes the vulnerability detection code;

The vulnerability detection code includes vulnerability checking code and control flow context recovery code.
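One common way to redirect an instruction to checking code is a trampoline: the instruction is overwritten with a 5-byte x86-64 `jmp rel32` (opcode `E9`) to a stub that runs the check, restores the register and flag context, re-executes the displaced bytes, and jumps back. The sketch below assumes an x86-64 target and shows only the patching arithmetic. The stub and the function names `encode_jmp_rel32` and `patch_site` are hypothetical and are not taken from the patent.

```python
import struct

NOP = 0x90  # x86 one-byte NOP


def encode_jmp_rel32(src_addr: int, dst_addr: int) -> bytes:
    """Encode an x86-64 `jmp rel32` (opcode E9) located at src_addr."""
    rel = dst_addr - (src_addr + 5)  # displacement is relative to the next instruction
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of rel32 range")
    return b"\xe9" + struct.pack("<i", rel)


def patch_site(code: bytearray, base: int, site: int, insn_len: int, stub: int) -> bytes:
    """Overwrite the instruction bytes at `site` with a jump to `stub`.

    Returns the displaced original bytes; the stub is expected to run the
    check, restore the context, re-execute these bytes, and jump back to
    site + insn_len.
    """
    if insn_len < 5:
        raise ValueError("site is shorter than 5 bytes; relocate neighbours first")
    off = site - base
    original = bytes(code[off:off + insn_len])
    code[off:off + insn_len] = encode_jmp_rel32(site, stub) + bytes([NOP] * (insn_len - 5))
    return original


code = bytearray(range(16))                       # stand-in for a .text section at 0x1000
displaced = patch_site(code, 0x1000, 0x1004, 7, 0x2000)
print(code[4:11].hex())                           # e9f70f00009090
```

Displaced instructions that use RIP-relative addressing need their displacements adjusted when copied into the stub, and a site shorter than 5 bytes requires relocating neighboring instructions as well; both steps are omitted here.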

A binary code vulnerability detection device based on a value flow state machine, the device comprising:

A preprocessing module, configured to obtain a program to be detected and preprocess it to obtain the corresponding intermediate code;

An external data introduction point and static value flow graph module, configured to determine the external data introduction points in the intermediate code and construct a static value flow graph from the intermediate code;

A potential dangerous pointer access point determination module, configured to take, based on the static value flow graph, the external data introduction points as data source points, use a value flow state machine to determine, according to the value flow graph node type and the state update rules, the state values of all value flow graph nodes that reference the data source points, and determine the pointers in value flow graph nodes whose state value meets a preset value as potential dangerous pointer access points;

A potential dangerous pointer access point screening module, configured to screen the potential dangerous pointer access points to obtain the final dangerous pointer access points;

A vulnerability detection module, configured to dynamically instrument the program to be detected according to the dangerous pointer access points, so as to detect arbitrary address pointer dereference vulnerabilities in the program.

A computer device, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program:

A computer-readable storage medium storing a computer program that, when executed by a processor, implements the following steps:

In the above binary code vulnerability detection method, device and equipment based on a value flow state machine, the program to be detected is preprocessed to obtain the corresponding intermediate code, the external data introduction points in the intermediate code are determined, and a static value flow graph is constructed from the intermediate code. On the static value flow graph, each external data introduction point is taken as a data source point, and the value flow state machine determines, according to the value flow graph node type and the state update rules, the state values of all value flow graph nodes that reference the data source point. Pointers in value flow graph nodes whose state value meets a preset value are determined to be potential dangerous pointer access points, which are screened to obtain the final dangerous pointer access points. Finally, the program to be detected is dynamically instrumented at the dangerous pointer access points to detect arbitrary address pointer dereference vulnerabilities in it. The method can effectively detect arbitrary address pointer dereference vulnerabilities in software.

Brief Description of the Drawings

FIG. 1 is an example of an arbitrary address pointer dereference vulnerability in one embodiment;

FIG. 2 is a flow chart of the binary code vulnerability detection method based on a value flow state machine in one embodiment;

FIG. 3 is a schematic diagram of a local static value flow in one embodiment;

FIG. 4 is a schematic diagram of the state machine update process in one embodiment;

FIG. 5 is a schematic diagram of a binary code instrumentation method based on static code control flow modification and recovery in one embodiment;

FIG. 6 is a structural block diagram of a binary code vulnerability detection device based on a value flow state machine in one embodiment;

FIG. 7 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.

Arbitrary address pointer dereference vulnerabilities are among the more dangerous types of software vulnerability. Take the code shown in FIG. 1 as an example. At line 11 the program uses fgets to read external data from the standard input stream stdin into data. At lines 30 and 31 the content of data is assigned to linkedListPrev and linkedListNext, which are then used as pointers for memory writes at lines 32 and 33. An attacker can therefore fully control both pointers, linkedListPrev and linkedListNext, and gains the powerful ability to write data to arbitrary addresses.

A tool widely used for vulnerability detection in academia and industry is AddressSanitizer (ASAN), a dynamic memory error detector for C/C++ programs that detects many memory access errors occurring at runtime. ASAN consists of a compiler instrumentation module and a runtime library; the instrumentation module mainly handles stack memory, while the runtime library mainly handles heap memory. ASAN can detect vulnerabilities and problems such as memory leaks, out-of-bounds accesses, uninitialized use, double frees, buffer overflows, stack overflows, wild pointers, and thread deadlocks. Once a memory error is detected at runtime, ASAN crashes the program and outputs useful debugging information, including the call stack, the shadow memory map, the type of memory access violation, the content read or written, the program counter (PC) at which the violation occurred, and the memory contents.

The key to detecting arbitrary address pointer dereference vulnerabilities is to analyze whether the address operands of the program's memory read, write, and execute instructions can be contaminated by external input. The core technique involved is taint analysis (also known as taint propagation analysis). Its main idea is to attach taint labels to input data that the target program reads from the external environment (such as the file system, the command line, or network packets) and then propagate those labels as the program executes. By determining whether external input can affect sensitive program locations, the analysis establishes whether the target program has security issues. Taint analysis is currently an active research topic in information security.

In the prior art, however, traditional ASAN cannot track the propagation of external data, so it detects a memory access error only when the accessed address happens to fall in a red zone that must not be accessed; moreover, it reports only an illegal memory access and cannot determine whether that access writes arbitrary data to an arbitrary address. Because successful detection of arbitrary address pointer dereference vulnerabilities depends on analyzing how external input propagates, one feasible approach is to perform online taint propagation with an existing dynamic taint analysis tool such as DataFlowSanitizer. Using such a tool directly, however, causes two problems: first, the shadow memory it uses to store taint information conflicts with that of ASAN, causing other vulnerability detections to fail; second, taint-tracking every instruction introduces very high runtime overhead, which makes the approach hard to use in practice.

To address these technical defects, a binary code vulnerability detection method based on a value flow state machine is provided, as shown in FIG. 2. The method specifically comprises the following steps:

Step S100: obtain the program to be detected and preprocess it to obtain the corresponding intermediate code.

Step S110: determine the external data introduction points in the intermediate code, and construct a static value flow graph from the intermediate code.

Step S120: based on the static value flow graph, take the external data introduction points as data source points, use the value flow state machine to determine, according to the value flow graph node type and the state update rules, the state values of all value flow graph nodes that reference the data source points, and determine the pointers in value flow graph nodes whose state value meets a preset value as potential dangerous pointer access points.

Step S130: screen the potential dangerous pointer access points to obtain the final dangerous pointer access points.

Step S140: dynamically instrument the program to be detected according to the dangerous pointer access points to detect arbitrary address pointer dereference vulnerabilities in it.

This embodiment combines static and dynamic analysis. The value flow state machine first screens out potential anomaly points, which avoids dynamically analyzing every instruction; dynamic instrumentation then inserts the corresponding arbitrary address pointer dereference detection logic at those points, so that such vulnerabilities are detected with low overhead and high accuracy. Specifically, on the global static value flow graph, typical input functions are located and their effects on memory objects are tracked to find the external input sources. A state-machine-based value flow analysis framework then updates the state of each node as the data flows, distinguishing whether the data is used as a pointer for memory access, and thereby accurately locates potential dangerous pointer access points. Once the external input sources and dangerous pointer access points have been determined, the method instruments only these key points at runtime, achieving efficient detection of arbitrary address pointer dereference vulnerabilities.

In step S100, the C/C++ program, or the binary code produced by compiling it, is converted into the LLVM IR intermediate representation, which is convenient to analyze.

In this embodiment, preprocessing binary code means lifting it to intermediate code to obtain the intermediate code corresponding to the program to be detected.

Specifically, binary code is lifted to LLVM IR with existing tools such as Microsoft's llvm-mctoll or mcsema. So that the dynamic instrumentation stage can map the anomaly points found by static analysis back to the original assembly instructions, the mcsema tool was modified to add metadata to the lifted intermediate code recording the address and instruction information of the corresponding original assembly instruction.
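As an illustration of how the instrumentation stage might read that metadata back, the sketch below assumes a hypothetical custom metadata kind `!orig.addr` attached to each lifted instruction and pointing to a tuple that holds the original instruction address and its disassembly text. The patent does not specify the metadata format, so the metadata name, the tuple layout, and the helper `map_ir_to_asm` are all assumptions.

```python
import re

# Hypothetical layout: "<instruction>, !orig.addr !N" and "!N = !{i64 ADDR, !"ASM"}"
ATTACH = re.compile(r"!orig\.addr !(\d+)")
DEFINE = re.compile(r'^!(\d+) = !\{i64 (\d+), !"([^"]*)"\}')


def map_ir_to_asm(ir_text: str) -> dict:
    """Map 1-based IR line numbers to (original address, original asm text)."""
    lines = ir_text.splitlines()
    # First pass: collect the metadata tuple definitions.
    defs = {}
    for line in lines:
        m = DEFINE.match(line.strip())
        if m:
            defs[m.group(1)] = (int(m.group(2)), m.group(3))
    # Second pass: resolve each instruction's attachment to its tuple.
    mapping = {}
    for lineno, line in enumerate(lines, start=1):
        m = ATTACH.search(line)
        if m and m.group(1) in defs:
            mapping[lineno] = defs[m.group(1)]
    return mapping


ir = """define void @f(ptr %5, i64 %6) {
  store i64 %6, ptr %5, align 8, !orig.addr !12
  ret void
}
!12 = !{i64 4198662, !"mov qword ptr [rax], rdx"}
"""
print(map_ir_to_asm(ir))  # {2: (4198662, 'mov qword ptr [rax], rdx')}
```

A production pass would read the attachment through the LLVM API rather than by parsing text; the regex version only keeps the example self-contained.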

In this embodiment, the program to be detected may also be source code. In that case, preprocessing first uses the LLVM compiler framework to generate an object file for each source file and links the object files; the intermediate code is then extracted from the linked object file to obtain the intermediate code corresponding to the program to be detected.

Specifically, for source code, clang, the front end provided by the LLVM compiler framework, generates an object file for each source file; these object files are linked, and the intermediate code is then extracted, yielding a single IR file that contains the IR of all source files.
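A minimal sketch of the source-code path, written as a Python helper that only builds the command lines. It substitutes the simpler route of emitting bitcode directly with `clang -c -emit-llvm` and merging the modules with `llvm-link`, rather than extracting IR from linked object files as described above. The helper name `build_pipeline` and the output file names are hypothetical.

```python
from pathlib import Path


def build_pipeline(sources: list, out: str = "whole.bc") -> list:
    """Return the argv lists that compile, link, and disassemble a whole-program IR module."""
    cmds, bitcodes = [], []
    for src in sources:
        bc = str(Path(src).with_suffix(".bc"))
        cmds.append(["clang", "-c", "-emit-llvm", src, "-o", bc])  # per-file bitcode
        bitcodes.append(bc)
    cmds.append(["llvm-link", *bitcodes, "-o", out])  # merge into one module
    cmds.append(["llvm-dis", out, "-o", str(Path(out).with_suffix(".ll"))])  # textual IR
    return cmds


for cmd in build_pipeline(["a.c", "b.c"]):
    print(" ".join(cmd))
```

Each command can be run in order with `subprocess.run(cmd, check=True)`; the resulting `whole.bc` (or its textual form `whole.ll`) is the single IR module handed to the later analysis steps.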

In step S110, the external data introduction points are determined in the intermediate code, that is, the locations where data is first read from outside the program. Possible sources of external data mainly include network packets, command line arguments, files, configuration files, and environment variables. Programs read such data mainly through API functions provided by third-party libraries. For example, a libc input function can read a line of at most n characters from a specified stream (such as the standard input stream stdin) and store it in the string pointed to by its parameter str. To unify the analysis, each API function in the intermediate code is described in a uniform triple form consisting of the function name, the parameter position of the external data in the function, and the read length.

Specifically, the triple is written as <name, input_pos, size>, where name is the function name, input_pos is the parameter position of the external data in the function (or the return value), and size is the read length. fgets, for example, can be described in this form.

Next, the function calls made through CallInst and InvokeInst instructions in the intermediate code are examined to determine whether the API function described by each triple is an external data reading function, and the external data introduction points are determined from the locations of the API functions identified as such.
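The triple table and the call-site check can be sketched in Python. The `Call` records are a simplified stand-in for LLVM's `CallInst`/`InvokeInst` (a real pass would iterate over those instructions through the LLVM API), the encoding of `size` as an integer or an `"argN"` reference is our own choice, and the table entries are illustrative assumptions derived from the C library signatures, not the patent's own list.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


class InputSpec(NamedTuple):
    """<name, input_pos, size>; input_pos == -1 means the return value."""
    name: str
    input_pos: int
    size: Union[int, str, None]  # int: fixed; "argN": taken from argument N; None: unknown


# Illustrative entries (assumptions based on the C library signatures).
INPUT_APIS = {
    s.name: s
    for s in (
        InputSpec("fgets", 0, "arg1"),   # char *fgets(char *s, int n, FILE *stream)
        InputSpec("read", 1, "arg2"),    # ssize_t read(int fd, void *buf, size_t count)
        InputSpec("recv", 1, "arg2"),    # ssize_t recv(int fd, void *buf, size_t len, int flags)
        InputSpec("getenv", -1, None),   # char *getenv(const char *name)
    )
}


@dataclass
class Call:
    """Simplified stand-in for an LLVM call site."""
    opcode: str   # "CallInst" or "InvokeInst"
    callee: str
    args: list
    result: str


def find_introduction_points(calls):
    """Return (callee, value carrying external data, read length) for each input call."""
    points = []
    for c in calls:
        if c.opcode not in ("CallInst", "InvokeInst"):
            continue
        spec = INPUT_APIS.get(c.callee)
        if spec is None:
            continue  # not an external data reading function
        value = c.result if spec.input_pos == -1 else c.args[spec.input_pos]
        size = spec.size
        if isinstance(size, str) and size.startswith("arg"):
            size = c.args[int(size[3:])]  # length comes from another argument
        points.append((c.callee, value, size))
    return points


calls = [
    Call("CallInst", "fgets", ["%2", "100", "%stdin"], "%3"),
    Call("CallInst", "puts", ["%2"], "%4"),
    Call("InvokeInst", "getenv", ["%name"], "%7"),
]
print(find_introduction_points(calls))
# [('fgets', '%2', '100'), ('getenv', '%7', None)]
```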

In this embodiment, multiple external data introduction points may be determined.

In step S110, the corresponding static value flow graph is also constructed from the basic data flow relationships in the intermediate code.

The static value flow graph (SVFG) is a graph representation for program analysis and optimization that represents the variables and memory objects in a program, and the relationships between them, as a graph. Building on the traditional data flow graph, the value flow graph strengthens the value propagation relationships between program variables (especially pointers and memory objects) and is an important basis for studying data flow. Its construction also takes variable scope and lifetime into account, so that the result accurately reflects the semantic structure of the program. Mature tools exist for building value flow graphs automatically; the present invention uses the open-source SVF tool to generate the static value flow graph for the intermediate code of the target program.

FIG. 3 shows part of the original static value flow graph of a program as an example. The value of variable %2 affects variable %16 through a getelementptr instruction, and %16 in turn affects a subsequent instruction, until the value is passed to a parameter of the function CWE123_Write_What_Where_Condition__fgets_51b_goodG2BSink.

In step S120, all points where external data is used are located on the global static value flow graph, and among them the points where it is used as a pointer for memory reads, writes, or execution are selected. For example, fgets stores the data it reads in the memory area specified by its first parameter. After this variable is located, a successor analysis on the static value flow graph yields the set of all nodes that the variable can affect. To distinguish the nodes that use the read data as a pointer for memory access, the method adopts a state-machine-based value flow analysis framework: in the static value flow graph, the external data introduction point serves as the data source point, every node on its propagation path is assigned a state value, and each node's state value is updated according to the value flow propagation relationships.

In this embodiment, a value flow state machine determines the state values of all value flow graph nodes that reference the data source point according to the value flow graph node type and the following state update rules. When the node type is an external data read or an object creation, the node state value is set to "1"; that is, the value flow graph node serving as the data source point receives state value "1". When the node type is variable value copy, pointer calculation, variable comparison, variable binary operation, actual function parameter, formal function parameter, actual function return, or formal function return, the node state value is set to the state value of its parent node. When the node type is memory read and the parent's state value is greater than "0", the node state value is set to the parent's state value minus 1. When the node type is memory write, the node state value is set to the parent's state value plus 1 if the parent variable is used as a value, and to the parent's state value if the parent variable is used as a pointer.

Specifically, the value flow graph node types and their state update rules are listed in Table 1, where Sc denotes the state value of the current node and Sp that of its parent node. The state value encodes the pointer level at which external data is used:
- 0 means the current node uses the external data itself.
- 1 means it uses a pointer to external data.
- 2 means it uses a pointer to a pointer to external data.

Larger values indicate higher pointer levels.

Table 1 Value flow graph node state update rules
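Using the notation of Table 1, the update rules stated above can be written as a single piecewise function. A memory read with $S_p = 0$ is left undefined, because the text gives no rule for it:

$$
S_c =
\begin{cases}
1, & \text{external data read or object creation},\\
S_p, & \text{copy, pointer calculation, comparison, binary operation,}\\
    & \text{actual/formal parameter, actual/formal return},\\
S_p - 1, & \text{memory read, if } S_p > 0,\\
S_p + 1, & \text{memory write, parent variable used as a value},\\
S_p, & \text{memory write, parent variable used as a pointer}.
\end{cases}
$$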

Specifically, for each located read-function call, the node holding the parameter that receives the stored data is first set to state 1, indicating a pointer to external data. The state value of each direct successor node is then analyzed dynamically along the value flow relationships. Suppose a direct successor is a LoadVFGNode or StoreVFGNode, and the node holding its pointer has state value 0. Then the pointer of that Load or Store operation is controlled by external data, and the operation is a dangerous pointer access point.

Consider the data flow edge shown in Figure 4, which runs from %2 through %5 to %6.

Assume that %2 is the first parameter of fgets, so its state is set to Sc = 1. The state of the GepVFGNode holding %5 is then updated to 1. Because %6 is obtained from %5 by a Load operation, the state of its LoadVFGNode is updated to 0, which means %6 is external data. Any subsequent Load or Store operation that uses %6 as a pointer is therefore added to the set of dangerous pointer access points.

Note that Figure 4 is a simplified version of the static value flow graph, not the original graph. It only illustrates how the state changes along the data flow edge.
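The propagation walked through above can be sketched as one pass over nodes listed so that every parent precedes its children. The sketch rests on simplifying assumptions made for illustration:
- Each node has a single parent, and the parent of a Load is its pointer operand.
- All pass-through kinds (copy, GEP, compare, parameters, returns) are collapsed into `K_PASS`.
- The `Kind`, `Node`, and `propagate` names are hypothetical.

A Load, or a Store that uses its parent as the pointer, is flagged when that pointer carries state 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical node kinds; K_PASS stands for every Sc = Sp kind. */
typedef enum { K_SOURCE, K_PASS, K_LOAD, K_STORE } Kind;

typedef struct {
    Kind kind;
    int parent;            /* index of the parent node, -1 if none */
    bool parent_is_value;  /* stores only: parent is the value (true) or the pointer (false) */
} Node;

/* Fills state[] (-1 = no state) and dangerous[] for n nodes whose parents
 * precede them; returns the number of dangerous pointer access points. */
size_t propagate(const Node *nodes, size_t n, int *state, bool *dangerous) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        const Node *v = &nodes[i];
        int sp = v->parent >= 0 ? state[v->parent] : -1;
        dangerous[i] = false;
        switch (v->kind) {
        case K_SOURCE:
            state[i] = 1;              /* buffer argument of the read call */
            break;
        case K_PASS:
            state[i] = sp;             /* Sc = Sp */
            break;
        case K_LOAD:
            dangerous[i] = (sp == 0);  /* load through external data */
            state[i] = sp > 0 ? sp - 1 : -1;
            break;
        case K_STORE:
            if (v->parent_is_value) {
                state[i] = sp >= 0 ? sp + 1 : -1;
            } else {
                dangerous[i] = (sp == 0);  /* store through external data */
                state[i] = sp;
            }
            break;
        }
        if (dangerous[i])
            found++;
    }
    return found;
}
```

Encoding the %2 → %5 → %6 example as a source node, a pass-through (GEP) node, and a Load node gives states 1, 1, and 0. Any later Load or Store that uses %6 as its pointer is then flagged.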

In this embodiment, step S110 may yield multiple external data introduction points. Each is taken in turn as a data source point on the value flow graph, and the states of the nodes along the data flow edges propagated from it are assigned dynamically. The pointer in any node with state value 0 is a potentially dangerous pointer access point, so there may also be many such points. To balance efficiency and completeness, the method screens them with two optimizations:
1. It processes only the dangerous pointer access points whose data comes solely from external input sources.
2. It treats as potential vulnerabilities only the paths on which no comparison operation is performed on the external data.

This significantly reduces runtime overhead while still effectively detecting potential arbitrary address read and write vulnerabilities in source code.

In step S130, the potentially dangerous pointer access points are screened in two stages to obtain the final dangerous pointer access points. First, the points whose data comes only from external input sources are retained, and all others are discarded. Second, from the remaining points, the final set is those on an execution path that runs from the data source point to the dangerous pointer access point without any comparison operation on the external data.

Specifically, the first screening processes only dangerous pointer access points whose data comes solely from external input sources. The advantage is that only the data source point and the dangerous pointer access need to be instrumented. Once the program passes through the data source point and reaches the dangerous use point, the data at that access point must come from the external input. The entire data flow propagation path therefore does not need to be instrumented.

Specifically, the second screening treats as vulnerable only those execution paths that perform no compare operation on the external data between the data source point and the dangerous pointer access point. Consider Code Example 1 below. Although x comes from external input and a[x] is a dangerous pointer access point, the code is not vulnerable because it checks the value of x before the access. Fully analyzing whether external data used as a pointer takes legal values would require tools such as symbolic execution to obtain the data's possible value range, which introduces a large runtime overhead. This embodiment therefore follows a principle: a path that compares and validates the external input is not necessarily vulnerable, whereas a path that performs no such comparison necessarily is. Accordingly, only the execution paths that perform no compare operation on the external data between the data source point and the dangerous pointer access point are identified as vulnerabilities.

To implement this optimization, this embodiment also instruments, using the state machine framework described above, the compare instructions whose state value is 0.

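Code Example 1:

```c
#include <stdio.h>

unsigned input(void);          /* stands for any function returning external data */

int a[10];

void example(void) {
    unsigned x = input();      /* x comes from external input */
    if (x < 10) {              /* comparison on x: the legality check */
        printf("%d\n", a[x]);  /* dangerous pointer access point a[x] */
    }
}
```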
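At run time, the two screening steps need only a few hooks:
- one after the data source point;
- one before each compare instruction whose operand has state 0;
- one before each dangerous pointer access.

Below is a minimal single-threaded sketch using global flags. The `vd_*` hook names are hypothetical. A real implementation would presumably need to track each source and each thread separately.

```c
#include <stdbool.h>

/* Hypothetical runtime state for a single source on a single thread. */
static bool vd_source_seen = false;  /* a data source point has executed */
static bool vd_compared = false;     /* a state-0 compare has run since then */

/* Inserted after the external read (data source point). */
void vd_on_source(void) {
    vd_source_seen = true;
    vd_compared = false;
}

/* Inserted before each compare instruction whose operand has state 0. */
void vd_on_compare(void) {
    if (vd_source_seen)
        vd_compared = true;
}

/* Inserted before a dangerous pointer access point. Returns true when the
 * access is reached from the source with no compare in between, which is
 * the path the second screening step treats as vulnerable. */
bool vd_on_access(void) {
    return vd_source_seen && !vd_compared;
}
```

In Code Example 1, the hook before `x < 10` runs before the access to `a[x]`, so `vd_on_access` returns false and nothing is reported. Only the source, the state-0 compares, and the access are hooked, which matches the first screening step's goal of not instrumenting the whole propagation path.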

In step S140, the program to be detected is dynamically instrumented according to the dangerous pointer access points. When the program is source code, this is done directly by invoking an LLVM Pass during compilation. The Pass inserts the vulnerability detection logic into the intermediate code as function calls.

When the program to be detected is binary code, this embodiment provides a binary instrumentation method based on static modification and restoration of the code's control flow. As shown in Figure 5, the instruction at each dangerous pointer access point in the binary code is replaced with a jump to the vulnerability detection routine. That routine consists of vulnerability detection code and control flow context restoration code.

Specifically, the original assembly instruction is first located from the metadata of the abnormal point (that is, the dangerous pointer access point) obtained by static analysis, as shown in the box in Figure 5. The control flow of the target binary is then modified statically: the boxed instruction is overwritten with a jmp loc.xxx jump instruction. Here loc.xxx is the memory address of the vulnerability detection code and the control flow and context restoration code; this address is mapped to a fixed location in the target program's memory space at run time. The resulting execution proceeds as follows:
- When execution reaches the position of the original boxed instruction (execution flow 1), it jumps to loc.xxx (execution flow 2).
- The vulnerability detection code and the control flow and context restoration code then run (execution flow 3).
- Execution returns to the instruction following the box and continues (execution flows 4 and 5).

In this way, binary code can be instrumented.
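The text does not name the target architecture. Assuming x86-64, where `jmp loc.xxx` is typically the 5-byte near jump `E9 rel32`, the patch bytes can be computed as sketched below. Here `encode_jmp_rel32` is a hypothetical helper, not part of the described tool.

```c
#include <stdint.h>

/* Encodes a 5-byte x86-64 near jump "jmp rel32" (opcode 0xE9) into buf,
 * for a jump placed at address `at` and landing on `target`. Returns 0 on
 * success, or -1 when the displacement does not fit in a signed 32-bit value. */
int encode_jmp_rel32(uint8_t buf[5], uint64_t at, uint64_t target) {
    /* rel32 is measured from the end of the 5-byte jump instruction. */
    int64_t disp = (int64_t)(target - (at + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    uint32_t u = (uint32_t)(int32_t)disp;
    buf[0] = 0xE9;
    /* The displacement is stored little-endian regardless of host byte order. */
    buf[1] = (uint8_t)(u & 0xFF);
    buf[2] = (uint8_t)((u >> 8) & 0xFF);
    buf[3] = (uint8_t)((u >> 16) & 0xFF);
    buf[4] = (uint8_t)((u >> 24) & 0xFF);
    return 0;
}
```

The same helper can encode the jump at the end of loc.xxx that returns to the next original instruction (execution flow 4). Three constraints apply:
- rel32 reaches only about ±2 GiB, so the trampoline must be mapped within that range of the patch site.
- The overwritten instruction must be replayed inside the trampoline.
- If that instruction is shorter than 5 bytes, the following instructions it overlaps must be replayed as well.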

The above binary code vulnerability detection method based on a value flow state machine combines static and dynamic analysis. It detects potential arbitrary address pointer dereference vulnerabilities in both source code programs and binary programs efficiently and accurately, achieving high-precision detection of this class of vulnerability at low overhead. Defining separate state transition rules for the semantics of different value flow graph nodes enables accurate analysis of how data is used. Any instruction in released binary code can be instrumented with very low runtime overhead.

The method has two main applications:
- Software vendors can scan and analyze their source code before releasing a product. They can then find and fix potential vulnerabilities so that the released version is free of this type of vulnerability.
- Third-party software evaluation organizations can scan and analyze binary software released by vendors, detect possible vulnerabilities, and produce the corresponding analysis reports.

It should be understood that although the steps in the flowchart of Figure 2 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figure 2 may include multiple sub-steps or stages. These need not be completed at the same moment and may be executed at different moments. Their execution order is also not necessarily sequential: they may run in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in Figure 6, a binary code vulnerability detection device based on a value flow state machine is provided. It comprises a preprocessing module 200, an external data introduction point and static value flow graph obtaining module 210, a potentially dangerous pointer access point determination module 220, a potentially dangerous pointer access point screening module 230, and a vulnerability detection module 240, wherein:

The preprocessing module 200 is configured to obtain the program to be detected and preprocess it to obtain the corresponding intermediate code.

The external data introduction point and static value flow graph obtaining module 210 is configured to determine the external data introduction points in the intermediate code and to construct a static value flow graph from the intermediate code.

The potentially dangerous pointer access point determination module 220 works on the static value flow graph, taking the external data introduction points as data source points. It is configured to:
- use a DC state machine to determine, according to the value flow graph node type and the state update rules, the state values of all value flow graph nodes that reference the data source points;
- determine the pointers in value flow graph nodes whose state values match the preset value as potentially dangerous pointer access points.

The potentially dangerous pointer access point screening module 230 is configured to screen the potentially dangerous pointer access points to obtain the final dangerous pointer access points.

The vulnerability detection module 240 is configured to dynamically instrument the program to be detected according to the dangerous pointer access points, so as to detect arbitrary address pointer dereference vulnerabilities in the program.

For the specific limitations of the binary code vulnerability detection device based on a value flow state machine, refer to the limitations of the corresponding method above; they are not repeated here. Each module of the device may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may take hardware form, embedded in or independent of a processor of a computer device. Alternatively, they may take software form, stored in the computer device's memory so that the processor can invoke them to perform the corresponding operations.

In one embodiment, a computer device is provided, which may be a terminal; its internal structure may be as shown in Figure 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device, all connected via a system bus:
- The processor provides computing and control capabilities.
- The memory includes a non-volatile storage medium, which stores an operating system and a computer program, and an internal memory, which provides an environment for running them.
- The network interface communicates with external terminals over a network connection.
- The display screen may be a liquid crystal display or an electronic ink display.
- The input device may be a touch layer covering the display screen; a key, trackball, or touchpad on the device housing; or an external keyboard, touchpad, or mouse.

When executed by the processor, the computer program implements a binary code vulnerability detection method based on a value flow state machine.

Those skilled in the art will understand that the structure shown in Figure 7 is only a block diagram of the part of the structure relevant to the present application. It does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown, combine certain components, or arrange its components differently.

In one embodiment, a computer device is provided, including a memory and a processor, wherein a computer program is stored in the memory, and when the processor executes the computer program, the following steps are implemented:

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:

Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media in the embodiments of this application may include non-volatile and/or volatile memory:
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.

By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these features has been described. However, any combination of these features that involves no contradiction should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application. Their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the patent. A person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. The scope of protection of this patent shall therefore be determined by the appended claims.

Claims

1. A binary code vulnerability detection method based on a value flow state machine, the method comprising:

acquiring a program to be detected, and preprocessing the program to be detected to obtain a corresponding intermediate code;

determining an external data introduction point in the intermediate code, and constructing a static value flow graph according to the intermediate code;

based on the static value flow graph and taking the external data introduction point as a data source point, using a DC state machine to determine, according to the value flow graph node type and a state update rule, the state values of all value flow graph nodes referencing the data source point, and determining the pointers in value flow graph nodes whose state values match a preset value as potentially dangerous pointer access points;

screening the potentially dangerous pointer access points to obtain final dangerous pointer access points;

and dynamically instrumenting the program to be detected according to the dangerous pointer access points, so as to detect arbitrary address pointer dereference vulnerabilities in the program to be detected.

2. The method for detecting a binary code vulnerability according to claim 1, wherein the program to be detected is a binary code.

3. The method for detecting a binary code vulnerability according to claim 2, wherein when the program to be detected is preprocessed to obtain the corresponding intermediate code, the method comprises:

lifting the binary code to intermediate code to obtain the intermediate code corresponding to the program to be detected.

4. The binary code vulnerability detection method according to claim 3, wherein the binary code is lifted to intermediate code using a tool comprising llvm-mctoll or mcsema;

when mcsema is used to lift the binary code, metadata is added to the resulting intermediate code to store the corresponding original assembly instruction addresses and instruction information.

5. The binary code vulnerability detection method according to claim 4, wherein determining an external data introduction point in the intermediate code comprises:

defining each API function in the intermediate code as a triple comprising the function name, the position of the external data parameter in the function, and the read length;

identifying function calls in the intermediate code through the CallInst and InvokeInst instructions, and determining whether the API function in each triple form is an external data reading function;

and determining the external data introduction point according to the position of the API function determined to be an external data reading function.

6. The binary code vulnerability detection method according to claim 5, wherein using a DC state machine to determine the state values of all value flow graph nodes referencing the data source point according to the value flow graph node type and the state update rule comprises:

when the value flow graph node type is external data reading or object creation, updating the state value of the node to 1;

when the value flow graph node type is variable value copy, pointer calculation, variable comparison, variable binary operation, function actual parameter, function formal parameter, function actual return, or function formal return, updating the state value of the node to the state value of its parent node;

when the value flow graph node type is memory read and the state value of the node's parent node is greater than 0, updating the state value of the node to the parent node's state value minus 1;

when the value flow graph node type is memory write, updating the state value of the node to the parent node's state value plus 1 if the parent node's variable is used as a value, and to the parent node's state value if the parent node's variable is used as a pointer.

7. The binary code vulnerability detection method according to claim 6, wherein screening the potentially dangerous pointer access points to obtain final dangerous pointer access points comprises:

among the potentially dangerous pointer access points, retaining those whose data comes only from external input sources and discarding the other pointer access points;

and, among the retained potentially dangerous pointer access points, determining as final dangerous pointer access points those on an execution path that performs no comparison operation on the external data between the data source point and the dangerous pointer access point.

8. The method for detecting a binary code vulnerability according to claim 7, wherein when the program to be detected is dynamically instrumented according to the dangerous pointer access point, the method comprises:

modifying, in the binary code, the instruction corresponding to the dangerous pointer access point into a jump instruction that executes the vulnerability code detection;

Wherein the vulnerability code detection comprises: vulnerability detection code and control flow context recovery code.

9. A binary code vulnerability detection apparatus based on a value flow state machine, the apparatus comprising:

The preprocessing module is used for acquiring a program to be detected, and preprocessing the program to be detected to obtain a corresponding intermediate code;

an external data introduction point and static value flow graph obtaining module, configured to determine external data introduction points in the intermediate code and to construct a static value flow graph from the intermediate code;

a potentially dangerous pointer access point determination module, configured to, based on the static value flow graph and taking the external data introduction points as data source points, use a DC state machine to determine, according to the value flow graph node type and the state update rule, the state values of all value flow graph nodes referencing the data source points, and to determine the pointers in value flow graph nodes whose state values match a preset value as potentially dangerous pointer access points;

The potential dangerous pointer access point screening module is used for screening the potential dangerous pointer access points to obtain final dangerous pointer access points;

and a vulnerability detection module, configured to dynamically instrument the program to be detected according to the dangerous pointer access points, so as to detect arbitrary address pointer dereference vulnerabilities in the program to be detected.

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 8 when the computer program is executed.