CN110618854A

CN110618854A - Virtual machine behavior analysis system based on deep learning and memory mirror image analysis

Info

Publication number: CN110618854A
Application number: CN201910772362.5A
Authority: CN
Inventors: 吴春明; 陈双喜; 王婉飞; 姜鑫悦; 吴安邦
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-08-21
Filing date: 2019-08-21
Publication date: 2019-12-27
Anticipated expiration: 2039-08-21
Also published as: CN110618854B

Abstract

The invention discloses a virtual machine behavior analysis system based on deep learning and memory image analysis. The system acquires memory image data and delta-encodes it, extracts feature point information from the encoded memory map, trains a neural network on the extracted features to obtain a classifier, and finally runs the classifier to analyze the behavior of unknown virtual machines. The invention is simple to operate, easy to implement, and convenient to modularize. It has a wide range of application and can detect both known and unknown attacks; its detection performance is not affected even if an attacker lies dormant for a period of time before launching an attack. In addition, the invention offers good robustness, reliability, and availability across different system platforms.

Description

Virtual Machine Behavior Analysis System Based on Deep Learning and Memory Image Analysis

Technical Field

The invention belongs to the field of wireless network security, specifically the field of mimic active defense, and relates to a virtual machine behavior analysis system based on deep learning and memory image analysis.

Background

Virtualized cloud platforms are an important part of cloud computing. A virtualized cloud platform runs multiple operating systems on the same platform at the same time, each with its own independent execution space. Running multiple virtual servers on one physical server improves machine utilization and reduces hardware procurement costs, which makes virtualization an important way to build green data centers. Virtual-machine-based cloud platforms let users build their own business environments independently, run stably, and scale and migrate well. They are widely used in finance, retail, digital marketing, education, and government and enterprise organizations.

The architecture of a virtualized cloud platform is open, which gives rise to a series of security problems related to virtual machines. Data and applications running in a virtual machine are vulnerable to damage by intruders. Virtual machines therefore need stronger security mechanisms before large-scale cloud services can be deployed quickly. The primary problem is how to judge virtual machine behavior correctly and in real time, and to determine whether a virtual machine is under malicious attack.

Existing methods for securing running virtual machines judge the virtual machine's state from network flow data, from logs, or from prior knowledge. Methods based on network flow data detect attacks by checking whether the data received by the virtual machine's network card contains malicious packets. They require a parseable communication protocol and cannot handle unknown protocols; processing large numbers of packets also makes them computationally expensive. Methods based on log analysis examine the virtual machine's system logs. Logs lag behind events, however, and the system often has to evaluate a whole series of activities before it can conclude that an intrusion has occurred, which is a serious disadvantage for stopping an active intrusion in time. Methods based on prior knowledge require the attack behavior to be known in advance and cannot cope with unknown vulnerabilities, unknown backdoors, or unknown attacks.

To secure virtual machines in real time, there is an urgent need for a fast and effective virtual machine behavior analysis method that does not depend on known vulnerability or attack databases, so as to improve the accuracy and efficiency of threat discovery and to achieve reliability, availability, and security for virtual machines.

Summary of the Invention

The purpose of the invention is to address the deficiencies of the prior art by providing a virtual machine behavior analysis system based on deep learning and memory image analysis. The invention targets internal and external attacks on the network, known and unknown alike. It secures the virtual machine platform, gives timely warning of unknown threats, judges virtual machine behavior correctly and in real time, and improves the security, reliability, and availability of cloud virtual machines.

The purpose of the invention is achieved through the following technical solution: a virtual machine behavior analysis system based on deep learning and memory image analysis, comprising the following steps:

(1) Acquire memory image data, including the following sub-steps:

(1.1) At the initial time t₀, use a memory forensics tool to acquire the initial memory image data, yielding the initial memory.

(1.2) At any time t₀+Δt, on the VirtualBox and VMware virtualization platforms, automatically sample the memory image data of each heterogeneous executor at the current time, both when it is not under attack and when it is under attack, according to the memory management mechanism of each operating system. This yields the current memory, that is, the normal samples and the malicious samples.

(2) Perform delta encoding, including the following sub-steps:

(2.1) Run the memory forensics tool and apply the pslist and dlllist commands to the initial memory obtained in step (1.1) to determine, respectively, the list of EXE executable files and the list of DLL dynamic link libraries in the initial memory;

(2.2) Apply the pslist and dlllist commands of the memory forensics tool to the current memory obtained in step (1.2) to determine, respectively, the list of EXE executable files and the list of DLL dynamic link libraries in the current memory;

(2.3) Compare the EXE and DLL lists obtained in steps (2.1) and (2.2) and identify the executable files that are in the current memory but not in the initial memory; these are called new executable files;

(2.4) Based on the initial memory, generate a predicted memory for each new executable file, including the following sub-steps:

(2.4.1) Determine the process ID of each new executable file, together with the base address of that process in the virtual memory address space;

(2.4.2) For the process that owns each new executable file, using the process base address from step (2.4.1), run the memmap command of the memory forensics tool on the current memory to extract the mapping between the process's virtual memory and physical memory;

(2.4.3) Copy the new executable file from the virtual disk into the initial memory. For each virtual memory page of the new executable file, perform the following two steps: first, if the virtual memory page is present in the current memory, use the virtual-to-physical mapping extracted in step (2.4.2) to copy the page into the initial memory; then record the page copy information, namely the source page position of the virtual memory page, the target page position in physical memory, and the page length. The result is the predicted memory;

(2.5) Output header information, including the path of each new executable file that needs to be loaded and the page copy information recorded in step (2.4.3) for all new executable files;

(2.6) With the predicted memory generated in step (2.4) as the source and the current memory as the target, encode with xdelta3 to obtain the memory map of the encoded current memory image data. Let M and N denote the number of rows and columns of the memory map, and let I(i,j) = [a,b,c] denote the element in row i and column j, where 0 ≤ i < M, 0 ≤ j < N, and a, b, and c are 32-bit floating-point numbers, so that I(i,j) is a three-dimensional vector;
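Steps (2.1) to (2.3) amount to a set difference between the module lists of the two snapshots. The sketch below assumes the pslist and dlllist output has already been parsed into lists of file paths. The function name `find_new_executables` and the sample paths are hypothetical, and the case-insensitive comparison is an assumption suited to Windows paths rather than something the text specifies.

```python
def find_new_executables(initial_modules, current_modules):
    """Return modules present in the current memory but not in the initial memory.

    Both arguments are iterables of EXE/DLL paths, assumed to have been
    parsed beforehand from the forensics tool's pslist and dlllist output.
    """
    # Assumption: paths are compared case-insensitively (Windows convention).
    known = {path.lower() for path in initial_modules}
    seen = set()
    new_modules = []
    for path in current_modules:
        key = path.lower()
        if key not in known and key not in seen:
            seen.add(key)
            new_modules.append(path)  # keep first spelling, preserve order
    return new_modules


# Hypothetical module lists for the two snapshots.
initial = [r"C:\Windows\System32\svchost.exe", r"C:\Windows\System32\ntdll.dll"]
current = [
    r"C:\Windows\System32\svchost.exe",
    r"C:\WINDOWS\System32\ntdll.dll",   # same module, different case
    r"C:\Users\Public\dropper.exe",     # hypothetical new executable
    r"C:\Users\Public\payload.dll",     # hypothetical new DLL
]
print(find_new_executables(initial, current))
```

Only the entries returned here would go on to step (2.4), where a predicted memory is built for each of them.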

(3) Extract feature point information from the memory map obtained in step (2.6), including feature point position, feature point size, and feature strength, through the following sub-steps:

(3.1) Construct the Hessian matrix: compute the determinant of the Hessian matrix H(i,j) corresponding to each element of the memory map and use it as the feature value of that element. The formula is:

det(H(i,j)) = D_ii·D_jj - 0.9·D_ij·D_ij

where D_ii = I(i+1,j) + I(i-1,j) - 2I(i,j), D_jj = I(i,j+1) + I(i,j-1) - 2I(i,j), and D_ij = I(i+1,j) + I(i,j-1) - 2I(i,j);

(3.2) Build the scale space in the manner of SURF: first filter the original memory map image with a 9×9 box filter to produce the lowest-level image; then gradually increase the size of the box filter and continue filtering the original memory map image; the resulting filter response maps at different scales make up the scale space. The scale space has 4 layers, and the scaling ratio between layers is 2;

(3.3) Locate the feature points precisely: perform non-maximum suppression on the scale space built in step (3.2) within each 3×3×3 local region. Compare the feature value of each element in the scale space with those of the 26 elements in its three-dimensional neighborhood; an element whose feature value is larger than all 26 neighbors or smaller than all 26 neighbors is a feature point. Record the feature point position (i,j) and scale s;

(3.4) Determine the feature points and feature vectors of the map by thresholding: compare the feature value of each feature point from step (3.3), at its scale, with a preset threshold. If the feature value is below the threshold, the point is discarded; if it is greater than or equal to the threshold, the point is kept as a final feature point, with feature vector [i,j,s,det(H(i,j,s))], where i and j are the row and column of the final feature point in the memory map, s is the corresponding filter scale, and det(H(i,j,s)) is the feature value of the point at scale s;

(3.5) Collect and label the feature vectors: determine the source of each feature vector obtained in step (3.4), which is either the memory image data sampled without attack or the memory image data sampled under attack in step (1.2). Assign each feature vector a label z, with z = 0 for vectors from unattacked memory image data and z = 1 for vectors from attacked memory image data. The result is the sequence of feature vectors [i,j,s,det(H(i,j,s)),z];
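Steps (3.4) and (3.5) can be read as a filter followed by a tagging pass. The following is a minimal sketch under that reading: the function name `label_feature_points`, the threshold of 0.5, and the candidate values are all made up for illustration, and the candidates are assumed to arrive as (i, j, s, det) tuples from step (3.3).

```python
def label_feature_points(candidates, threshold, attacked):
    """Keep candidates whose feature value meets the threshold and append label z.

    candidates: iterable of (i, j, s, det) tuples from the extremum search.
    attacked:   True if the memory image was sampled under attack (z = 1),
                False if it was sampled without attack (z = 0).
    """
    z = 1 if attacked else 0
    return [[i, j, s, det, z] for (i, j, s, det) in candidates if det >= threshold]


# Hypothetical candidates from one normal and one attacked memory map.
normal = label_feature_points([(3, 4, 9, 0.8), (5, 1, 15, 0.1)], threshold=0.5, attacked=False)
malicious = label_feature_points([(7, 2, 9, 1.3)], threshold=0.5, attacked=True)
print(normal + malicious)
```

Because the label comes from the memory image as a whole, every feature vector extracted from the same image carries the same z.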

(4) Train the neural network: use the feature vector sequence obtained in step (3.5) as input samples to a deep neural network, with whether the virtual machine behavior is normal as the output, and train the network to obtain a virtual machine behavior classifier;

(5) Run the neural network to analyze unknown virtual machine behavior: use the classifier obtained in step (4) to analyze a virtual machine whose running state is unknown and determine whether its behavior is normal.

Further, the initial time t₀ in step (1.1) is a positive real number.

Further, Δt in step (1.2) is a positive real number.

Further, in step (1.2), normal samples are obtained with common memory imaging methods. For malicious samples, a shared space holding different kinds of malicious tool samples is set up for all heterogeneous executors and a simulated intrusion environment is configured for each of them, which yields memory image data of the virtualization platform under different types of attack.

Further, the 26 elements in the three-dimensional neighborhood of an element in step (3.3) are the 8 neighboring elements at the same scale and the 9 elements in each of the two adjacent scale layers above and below it.

Further, the preset threshold in step (3.4) depends on the number of features to be identified: the higher the threshold, the fewer features are identified.

Further, the deep neural network in step (4) may be any existing deep neural network architecture.

The beneficial effects of the invention are as follows. The invention combines memory image data analysis with deep learning and analyzes virtual machine behavior through the encoded features of memory image data. Compared with existing methods for analyzing virtual platform state, it is simple to operate, easy to implement, and convenient to modularize. It has a wide range of application and can detect both known and unknown attacks; its detection performance is not affected even if an attacker lies dormant for a period of time before launching an attack. In addition, it offers good robustness, reliability, and availability across different system platforms.

Description of the Drawings

Fig. 1 is a schematic diagram of the system model in an embodiment of the invention;

Fig. 2 is a flow chart of the method of the invention.

Detailed Description

The technical solution of the invention is described in detail below with reference to the accompanying drawings and an embodiment.

Because memory image data can fully represent the running state of a virtual machine, the invention combines memory image data with a deep neural network and proposes a virtual machine behavior analysis system based on deep learning and memory image analysis.

As shown in Fig. 1, the system model of this embodiment runs multiple operating systems on one virtual platform, including WinServer, Ubuntu, CentOS, and RedHat. Backdoors and databases of viruses and malicious tools are manually imported into each operating system, so that memory image data of each system can be obtained at any time, both when it is not under attack and after it has been subjected to different types of attack. The method uses this data to extract memory features through memory map encoding, and then uses those features to judge whether the virtual machine is under attack. The flow is shown in Fig. 2 and comprises the following steps:

Step 1. Acquire memory image data. The procedure is as follows:

(1) At the initial time t₀ = 0, use a memory forensics tool to acquire the initial memory image data;

(2) After a time Δt = 1 has elapsed, on the VirtualBox and VMware virtualization platforms, automatically sample the memory image data of each heterogeneous executor at the current time, under normal (unattacked) and attacked conditions, according to the memory management mechanism of each operating system. These are the normal samples and the malicious samples. Normal samples are obtained with common memory imaging methods. For malicious samples, a shared space holding different kinds of malicious tool samples is set up for all heterogeneous executors and a simulated intrusion environment is configured for each of them, which yields memory image data of the virtual platform under different types of attack;
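On VirtualBox, one way to take such a snapshot from the host is the `VBoxManage debugvm ... dumpvmcore` command, which writes the guest's memory to an ELF core file. The text does not name the acquisition tool, so this choice is an assumption, as are the VM names, the output directory, and the file naming scheme. The sketch only builds the command lines and the sample labels; it does not execute anything.

```python
from pathlib import PurePosixPath


def build_dump_command(vm_name, out_dir, attacked, t):
    """Return (command, label) for dumping one VM's memory at sample time t.

    label is 0 for a normal sample and 1 for a sample taken under attack.
    """
    label = 1 if attacked else 0
    state = "attacked" if attacked else "normal"
    out_file = PurePosixPath(out_dir) / f"{vm_name}_{state}_t{t}.elf"
    command = [
        "VBoxManage", "debugvm", vm_name,
        "dumpvmcore", f"--filename={out_file}",
    ]
    return command, label


# Hypothetical executor names and output directory.
for vm in ("winserver", "ubuntu"):
    for attacked in (False, True):
        command, label = build_dump_command(vm, "/dumps", attacked, t=1)
        print(label, " ".join(command))
```

A real collector would pass each command list to `subprocess.run` and keep the label alongside the resulting file, so that step (5) of feature extraction can later assign z without re-deriving it.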

Step 2. Perform delta encoding. The procedure is as follows:

(1) Run the memory forensics tool and apply the pslist and dlllist commands to the initial memory to determine, respectively, the list of EXE executable files and the list of DLL dynamic link libraries in the initial memory;

(2) Apply the pslist and dlllist commands of the memory forensics tool to the current memory to determine, respectively, the list of EXE executable files and the list of DLL dynamic link libraries in the current memory;

(3) Compare the EXE/DLL lists from the previous two steps and identify the portable executable (PE) files that are in the current memory but not in the initial memory;

(4) Based on the initial memory, generate a predicted memory for each new PE:

a) Determine the process ID of each new PE, together with the base address of that process in the virtual memory address space;

b) For the process that owns each new PE, run the memmap command of the memory forensics tool on the current memory to extract the mapping between the process's virtual memory and physical memory;

c) Copy the new PE from its file on the virtual disk into the initial memory. For each virtual memory page of the PE file, perform the following two steps: first, if the page is present in the current memory, use the virtual-to-physical mapping obtained in (4) b) of Step 2 to copy that page of the PE file into the initial memory; second, record the page copy information, namely the source page position in the PE file, the target page position in physical memory, and the page length;

(5) Output header information, including the path of each new PE that needs to be loaded and all copied pages of each PE;

(6) With the predicted memory as the source and the current memory as the target, encode with xdelta3 to obtain the memory map of the encoded current memory image data. Let M and N denote the number of rows and columns of the map, and let I(i,j) = [a,b,c] denote the element in row i and column j, where 0 ≤ i < M, 0 ≤ j < N, and a, b, and c are 32-bit floating-point numbers, so that I(i,j) is a three-dimensional vector;
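The page copy in (4) c) and the header in (5) can be illustrated on plain byte strings. This is a sketch under several assumptions: the page mapping is given as a dictionary from source offsets in the PE file to target offsets in physical memory (in practice it would be derived from memmap output), pages that would fall outside the image are skipped, and the names `build_predicted_memory`, `page_map`, and the example path are hypothetical.

```python
def build_predicted_memory(initial_memory, pe_bytes, page_map, page_size=4096):
    """Overlay a PE file's resident pages onto a copy of the initial memory.

    page_map maps a source offset in the PE file to a target physical
    offset, for pages that are present in the current memory.
    Returns (predicted_memory, copy_records), where each record is
    (source offset, target offset, length).
    """
    predicted = bytearray(initial_memory)   # never mutate the initial image
    records = []
    for src, dst in sorted(page_map.items()):
        page = pe_bytes[src:src + page_size]
        length = len(page)                  # the last page may be short
        if length == 0 or dst + length > len(predicted):
            continue                        # assumption: skip out-of-range pages
        predicted[dst:dst + length] = page
        records.append((src, dst, length))
    return bytes(predicted), records


# Toy example with a 4-byte "page" so the result is easy to inspect.
predicted, records = build_predicted_memory(
    initial_memory=bytes(16),
    pe_bytes=b"ABCDEFGHIJ",
    page_map={0: 8, 4: 0, 8: 12},
    page_size=4,
)
header = {"path": r"C:\Users\Public\dropper.exe", "pages": records}  # hypothetical path
print(predicted)
print(header)
```

The predicted memory would then serve as the source for the delta encoder; with the xdelta3 command-line tool this is typically `xdelta3 -e -s <predicted> <current> <delta>`. How the resulting delta bytes are arranged into the M×N map of three-component elements is not specified in the text and is left out here.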

Step 3. Extract feature point information from the memory map, including feature point position, feature point size, and feature strength. The procedure is as follows:

(1) Construct the Hessian matrix;

The Hessian matrix is the core operator of the feature extraction algorithm. The Hessian matrix H of any function of two variables f(x,y) is:

H(f(x,y)) = [ ∂²f/∂x²  ∂²f/∂x∂y ; ∂²f/∂x∂y  ∂²f/∂y² ]

The determinant of H is used as the feature value of f(x,y):

det(H) = (∂²f/∂x²)·(∂²f/∂y²) - (∂²f/∂x∂y)²

In feature extraction, to speed up computation in practice, the Hessian matrix is approximated. The determinant of the Hessian matrix H(i,j) for the element in row i and column j of the memory map is computed as:

det(H(i,j)) = D_ii·D_jj - 0.9·D_ij·D_ij;

where · denotes the vector dot product, that is, the sum of the element-wise products, and D_ii = I(i+1,j) + I(i-1,j) - 2I(i,j), D_jj = I(i,j+1) + I(i,j-1) - 2I(i,j), D_ij = I(i+1,j) + I(i,j-1) - 2I(i,j);

This computation is applied to every element of the memory map, giving the determinant of the Hessian matrix for each pixel, which is that pixel's feature value;
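The approximate determinant can be computed directly from the three difference formulas. The sketch below follows the formulas exactly as written in the text, including the stated form of D_ij and the 0.9 weight. The memory map is assumed to be a nested list `I[i][j]` of 3-component lists, and border elements, which lack a full set of neighbors, are given a feature value of 0.0; that border rule is an assumption, since the text does not address it.

```python
def dot(u, v):
    """Vector dot product: the sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def second_diff(p, q, centre):
    """Return p + q - 2*centre, element-wise on 3-vectors."""
    return [a + b - 2.0 * c for a, b, c in zip(p, q, centre)]


def hessian_det(I, i, j):
    """Approximate det(H(i,j)) using the difference formulas as stated."""
    c = I[i][j]
    d_ii = second_diff(I[i + 1][j], I[i - 1][j], c)
    d_jj = second_diff(I[i][j + 1], I[i][j - 1], c)
    d_ij = second_diff(I[i + 1][j], I[i][j - 1], c)
    return dot(d_ii, d_jj) - 0.9 * dot(d_ij, d_ij)


def response_map(I):
    """Feature value for every interior element; borders are set to 0.0."""
    M, N = len(I), len(I[0])
    out = [[0.0] * N for _ in range(M)]
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            out[i][j] = hessian_det(I, i, j)
    return out


# Toy 3x3 map: zero everywhere except the centre element.
I = [[[0.0, 0.0, 0.0] for _ in range(3)] for _ in range(3)]
I[1][1] = [1.0, 0.0, 0.0]
print(response_map(I)[1][1])  # D_ii = D_jj = D_ij = [-2, 0, 0], so 4 - 0.9*4, about 0.4
```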

(2) Build the scale space;

A scale space is the representation of a map at different resolutions. To model the multi-scale nature of image data, find extreme points in both the spatial and scale domains, and determine preliminary feature points, a scale space must be built for the map: the feature values of the map at different scales are obtained by repeatedly convolving the two-variable function with Gaussian kernels;

This patent builds the scale space in the manner of SURF. For any memory map, the original image size is kept unchanged and the scale space is built by filtering the original image with box templates of varying size. SURF can also process the layers of the scale space in parallel. Box filter templates of gradually increasing size are convolved with the integral image, the response image is obtained from the Hessian determinant at each pixel, and the pyramid is constructed;

The response image from a 9×9 box filter is used as the lowest-level image; the box size is then gradually increased and the original image is filtered again. The scale space is divided into 4 layers with a scaling ratio of 2 between layers, and each layer contains filter response maps at different scales. Each layer is processed with gradually increasing filter sizes, producing a multi-layer series of maps at different scales;
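The reason box filters pair well with an integral image is that any box sum costs four lookups regardless of the box size, so the image never needs to be downsampled. The sketch below shows this for a single-channel map with a plain box mean. It is a simplification: SURF proper uses second-derivative box templates, a three-component map would be filtered per component, and the filter sizes 9, 15, 21, 27 are an assumed sequence borrowed from SURF's usual first octave rather than values given in the text. Windows are clipped at the border, which is also an assumption.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    M, N = len(img), len(img[0])
    S = [[0.0] * (N + 1) for _ in range(M + 1)]
    for i in range(M):
        row_sum = 0.0
        for j in range(N):
            row_sum += img[i][j]
            S[i + 1][j + 1] = S[i][j + 1] + row_sum
    return S


def box_mean(S, i, j, size, M, N):
    """Mean over the size x size window centred on (i, j), clipped at the border."""
    h = size // 2
    r0, r1 = max(i - h, 0), min(i + h + 1, M)
    c0, c1 = max(j - h, 0), min(j + h + 1, N)
    total = S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]  # four lookups
    return total / ((r1 - r0) * (c1 - c0))


def build_scale_space(img, sizes=(9, 15, 21, 27)):
    """Filter the same full-size image with each box size; one layer per size."""
    M, N = len(img), len(img[0])
    S = integral_image(img)
    return [
        [[box_mean(S, i, j, size, M, N) for j in range(N)] for i in range(M)]
        for size in sizes
    ]


# Toy 3x3 image filtered with a single 3x3 box.
layers = build_scale_space([[0, 0, 0], [0, 9, 0], [0, 0, 0]], sizes=(3,))
print(layers[0][1][1], layers[0][0][0])
```

Every layer keeps the original M×N size, which is what allows the 3×3×3 neighborhood comparison in the next step to index the same (i, j) across layers.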

(3) Locate the feature points precisely;

Non-maximum suppression is performed within each 3×3×3 local region. Each pixel is compared with its 8 neighbors at the same scale and with the 9 points in each of the two adjacent scale layers above and below it. Only extreme points that are larger than all 26 neighboring values or smaller than all of them qualify as feature points. The feature point position (i,j) and scale s are recorded;
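The 26-neighbor comparison can be written as a direct scan over a stack of response maps. In this sketch `stack[s][i][j]` is assumed to hold the feature value at layer index s, row i, and column j, with all layers the same size. The comparison is strict, and elements on the spatial border or in the first and last layers are skipped because they lack a full neighborhood; both choices are assumptions. Note that s here is a layer index, not a filter size.

```python
def find_extrema(stack):
    """Return (i, j, s, value) for each element that is strictly larger than,
    or strictly smaller than, all 26 neighbours in its 3x3x3 region."""
    L, M, N = len(stack), len(stack[0]), len(stack[0][0])
    points = []
    for s in range(1, L - 1):
        for i in range(1, M - 1):
            for j in range(1, N - 1):
                v = stack[s][i][j]
                neighbours = [
                    stack[s + ds][i + di][j + dj]
                    for ds in (-1, 0, 1)
                    for di in (-1, 0, 1)
                    for dj in (-1, 0, 1)
                    if (ds, di, dj) != (0, 0, 0)
                ]
                if all(v > n for n in neighbours) or all(v < n for n in neighbours):
                    points.append((i, j, s, v))
    return points


# Toy stack: three 3x3 layers of zeros with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 5.0
print(find_extrema(stack))
```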

(4) Determine the feature points and feature vectors of the map by thresholding;

For each feature point obtained in the previous step, compare its feature value at the corresponding scale with a preset threshold. If the feature value is below the threshold, the point is discarded. If it is greater than or equal to the threshold, the point is kept as a final feature point, with feature vector [i,j,s,det(H(i,j,s))], where i and j are the row and column of the feature point in the map, s is the filter scale at which the point qualifies as a feature point, and det(H(i,j,s)) is the feature value of the point at scale s;

(5) Collect and label the feature vectors;

Each feature vector obtained in the previous step is assigned a label z according to its source, that is, whether it comes from unattacked or attacked memory image data: z = 0 indicates that the feature vector comes from unattacked memory image data, and z = 1 indicates that it comes from attacked memory image data;

At this point, feature extraction has abstracted the delta-encoded memory map into a sequence of labeled feature vectors with a specific encoding;

Step 4. Train the neural network;

An input sample of the deep neural network is represented as [i,j,s,det(H(i,j,s)),z]. An existing deep neural network architecture is selected and trained to obtain a classifier, which is used at run time to analyze the behavior of unknown virtual machines;
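In a sample [i,j,s,det(H(i,j,s)),z], the first four components are the features and z is the training target. The sketch below makes that split explicit. Since the text leaves the architecture open, it substitutes a single logistic unit trained by full-batch gradient descent, which is deliberately not a deep network and stands in only to show the data flow. The sample values, the normalization constants in `scales`, the learning rate, and the epoch count are all made up for illustration.

```python
import math


def split_sample(sample):
    """Separate [i, j, s, det, z] into the feature list and the label z."""
    return sample[:4], sample[4]


def train_logistic(samples, scales, epochs=2000, lr=0.5):
    """Full-batch gradient descent on one logistic unit (stand-in model)."""
    data = [split_sample(s) for s in samples]
    X = [[x / k for x, k in zip(f, scales)] for f, _ in data]  # normalise features
    y = [z for _, z in data]
    w, b = [0.0] * 4, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * 4, 0.0
        for xi, zi in zip(X, y):
            logit = sum(a * c for a, c in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            for k in range(4):
                gw[k] += (p - zi) * xi[k]
            gb += p - zi
        w = [wk - lr * g / len(X) for wk, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict(model, features, scales):
    """Return 1 (attacked) or 0 (normal) for one feature list [i, j, s, det]."""
    w, b = model
    x = [f / k for f, k in zip(features, scales)]
    return 1 if sum(a * c for a, c in zip(w, x)) + b > 0 else 0


# Hypothetical labelled feature vectors [i, j, s, det, z].
samples = [
    [10, 12, 9, 0.1, 0],
    [40, 7, 15, 0.2, 0],
    [22, 30, 9, 0.8, 1],
    [5, 18, 15, 0.9, 1],
]
scales = (64.0, 64.0, 27.0, 1.0)  # assumed map size and largest filter size
model = train_logistic(samples, scales)
print([predict(model, s[:4], scales) for s in samples])
```

In a real system the same feature and label split would feed whichever deep network is chosen, and at run time only the four feature components would be available for an unknown virtual machine.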

Step 5. Run the neural network to analyze unknown virtual machine behavior;

The neural network trained in Step 4 is used to analyze a virtual machine whose running state is unknown and to determine whether its behavior is normal.

The above is one embodiment of the invention. The invention is not limited to this embodiment; specific implementations can be determined by combining the technical solution of the invention with the actual application scenario.

Claims

1. A virtual machine behavior analysis system based on deep learning and memory image analysis, characterized in that it comprises the following steps:

(1) Acquire memory image data, including the following sub-steps:

(1.1) At the initial time t₀, use a memory forensics tool to acquire the initial memory image data, yielding the initial memory.

(1.2) At any time t₀+Δt, on the VirtualBox and VMware virtualization platforms, automatically sample the memory image data of each heterogeneous executor at the current time, both when it is not under attack and when it is under attack, according to the memory management mechanism of each operating system. This yields the current memory, that is, the normal samples and the malicious samples.

(2) Carry out delta encoding, including the following sub-steps:

(2.1) Run the memory forensics tool, use the pslist and dlllist commands on the initial memory obtained in step (1.1), and determine the executable file of the EXE type and the dynamic link library list of the DLL type in the initial memory respectively.

(2.2) pslist and dlllist commands in the current memory operation memory forensics tool that step (1.2) obtains determine the executable file of the EXE type and the dynamic link library list of the DLL type in the current memory respectively;

(2.3) The executable file of EXE type and the dynamic link library list of DLL type obtained by analysis steps (2.1) and (2.2), determine the executable file in the current memory but not in the initial memory, called new executable file ;

(2.4) According to the initial memory, generate a predicted memory for each new executable file, including the following sub-steps:

(2.4.1) determine the process ID of each new executable file, determine the base address of this process in the virtual memory address space simultaneously;

(2.4.2) For the process of each new executable file, according to the process base address in step (2.4.1), run the memmap command in the memory forensics tool in the current memory to extract the mapping between the process virtual memory and physical memory relation;

(2.4.3) Copy the new executable file from the virtual disk into the initial memory. For each virtual memory page of the new executable file, perform the following two steps: first, if the virtual memory page is present in the current memory, use the mapping between virtual memory and physical memory extracted in step (2.4.2) to copy the new executable file into the initial memory; then, record the page copy information, including the source page position of the virtual memory page, the target page position in physical memory, and the page length; the predicted memory is thereby generated;

(2.5) Output header information, including the path information of the new executable files that need to be loaded and the page copy information of all new executable files recorded in step (2.4.3);

(2.6) Using the predicted memory generated in step (2.4) as the source and the current memory as the target, apply xdelta3 encoding to obtain the memory map, that is, the encoded current memory image data; let M and N denote the number of rows and columns of the memory map respectively, and let I(i,j)=[a,b,c] denote the element in row i and column j of the memory map, where 0≤i<M, 0≤j<N, a, b and c are all 32-bit floating-point numbers, and I(i,j) is a three-dimensional vector;
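
As an illustration of steps (2.3) to (2.5), the following minimal Python sketch derives the new executable files as a set difference and serializes a header containing their paths and page copy records. The file paths, the addresses, the `PageCopy` record and the JSON layout are all hypothetical; the claim does not prescribe a header format, and the real lists would come from the pslist and dlllist output of the memory forensics tool.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class PageCopy:
    """One page copy record as described in step (2.4.3) (hypothetical layout)."""
    source_page: int  # position of the virtual memory page (source)
    target_page: int  # position of the page in physical memory (target)
    length: int       # page length in bytes


def new_executables(initial, current):
    """Step (2.3): files present in the current memory but not in the initial memory."""
    return sorted(set(current) - set(initial))


def build_header(paths, copies):
    """Step (2.5): paths of the new executables plus their page copy records."""
    return json.dumps({
        "new_executables": paths,
        "page_copies": {p: [asdict(c) for c in copies.get(p, [])] for p in paths},
    }, sort_keys=True)


# Hypothetical process lists standing in for parsed pslist output.
initial = ["C:\\Windows\\explorer.exe", "C:\\Windows\\System32\\svchost.exe"]
current = initial + ["C:\\Temp\\dropper.exe"]

added = new_executables(initial, current)
header = build_header(added, {added[0]: [PageCopy(0x400000, 0x1A2000, 4096)]})
```

Keeping the page copy records in the header is what allows a decoder to rebuild the same predicted memory before applying the delta.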

(3) Extract the feature point information of the memory map obtained in step (2.6), namely the position, size and feature strength of each feature point, through the following sub-steps:

(3.1) Construct the Hessian matrix, specifically: calculate the determinant of the Hessian matrix H(i,j) corresponding to each element of the memory map and use it as the feature value of that element, according to the formula:

det(H(i,j)) = D_ii · D_jj - 0.9 · D_ij · D_ij

where D_ii = I(i+1,j) + I(i-1,j) - 2I(i,j), D_jj = I(i,j+1) + I(i,j-1) - 2I(i,j), and D_ij = I(i+1,j) + I(i,j-1) - 2I(i,j);
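
The formula of step (3.1) can be evaluated directly with finite differences. The sketch below is one possible reading, not a definitive implementation: it assumes a single scalar channel of the memory map (the claim defines I(i,j) as a three-component vector and does not say how the components are combined), and it assumes that border elements, which lack a full set of neighbors, receive a feature value of 0.

```python
def hessian_determinant(channel):
    """Return det(H(i,j)) for a 2-D grid of floats, using the differences of step (3.1)."""
    rows, cols = len(channel), len(channel[0])
    det = [[0.0] * cols for _ in range(rows)]  # borders stay 0.0 (assumption)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            centre = channel[i][j]
            d_ii = channel[i + 1][j] + channel[i - 1][j] - 2 * centre
            d_jj = channel[i][j + 1] + channel[i][j - 1] - 2 * centre
            d_ij = channel[i + 1][j] + channel[i][j - 1] - 2 * centre
            det[i][j] = d_ii * d_jj - 0.9 * d_ij * d_ij
    return det


# A single bright element in the middle of an otherwise empty 3x3 grid.
grid = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
response = hessian_determinant(grid)
```

For the central element all three differences equal -2, so the feature value is 4 - 0.9 · 4 = 0.4.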

(3.2) Use SURF to construct the scale space: first, filter the original memory map with a 9×9 box filter to obtain the bottom-layer image; then, gradually increase the size of the box filter and continue filtering the original memory map; finally, obtain filter response maps at different scales, which together form the scale space; the scale space has 4 layers, and the scaling ratio between adjacent layers is 2;
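
Box filters are attractive here because a summed-area table makes the cost per element independent of the filter size. The following sketch is a simplification under stated assumptions: it uses a plain box mean rather than the box approximations of Gaussian second derivatives used by SURF, it clips the window at the borders, and the default filter sizes (9, 15, 21, 27) are an assumed progression, since the claim fixes only the 9×9 starting size and the number of layers.

```python
def integral_image(img):
    """Summed-area table with one extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        row_sum = 0.0
        for j in range(cols):
            row_sum += img[i][j]
            sat[i + 1][j + 1] = sat[i][j + 1] + row_sum
    return sat


def box_mean(img, size):
    """Mean over a size x size window centred on each element, clipped at the borders."""
    rows, cols = len(img), len(img[0])
    sat, half = integral_image(img), size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        top, bottom = max(i - half, 0), min(i + half + 1, rows)
        for j in range(cols):
            left, right = max(j - half, 0), min(j + half + 1, cols)
            total = sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]
            out[i][j] = total / ((bottom - top) * (right - left))
    return out


def scale_space(img, sizes=(9, 15, 21, 27)):
    """One filtered layer per box size, smallest filter first (the bottom layer)."""
    return [box_mean(img, size) for size in sizes]


flat = [[2.0] * 12 for _ in range(12)]
layers = scale_space(flat)
```

A constant input is a convenient sanity check: every layer of the resulting scale space must reproduce the same constant.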

(3.3) Accurately locate the feature points, specifically: perform non-maximum suppression on the scale space constructed in step (3.2) within each 3×3×3 local region; compare the feature value of each element in the scale space with those of the 26 elements in its three-dimensional neighborhood; an element whose feature value is larger or smaller than those of all 26 surrounding elements is a feature point, and its position (i,j) and scale s are recorded;

(3.4) Determine the feature points and feature vectors of the map according to a threshold, specifically: compare the feature value, at the corresponding scale, of each feature point obtained in step (3.3) with a preset threshold; if the feature value is less than the preset threshold, the feature point is discarded; if the feature value is greater than or equal to the preset threshold, the feature point is kept as a final feature point and its feature vector is expressed as [i,j,s,det(H(i,j,s))], where i and j are the row and column numbers of the final feature point in the memory map, s is the filter scale corresponding to the final feature point, and det(H(i,j,s)) is the feature value of the final feature point at scale s;

(3.5) Collect and label the feature vectors, specifically: determine the source of each feature vector obtained in step (3.4), the source being either the memory image data sampled without attack or the memory image data sampled under attack in step (1.2); determine the label z corresponding to each feature vector, where z=0 indicates that the feature vector comes from memory image data sampled without attack and z=1 indicates that it comes from memory image data sampled under attack; finally obtain the feature vector sequence [i,j,s,det(H(i,j,s)),z];
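
Steps (3.4) and (3.5) amount to a filter followed by a labelling pass. A minimal sketch, in which the candidate tuples and both threshold values are invented purely for illustration:

```python
def final_feature_vectors(candidates, threshold, attacked):
    """Apply the threshold of step (3.4) and attach the label z of step (3.5).

    candidates: iterable of (i, j, s, det_h) tuples from step (3.3).
    attacked:   True if the memory image was sampled under attack.
    """
    z = 1 if attacked else 0
    return [[i, j, s, det_h, z] for (i, j, s, det_h) in candidates if det_h >= threshold]


# Hypothetical candidates: (row, column, filter scale, feature value).
candidates = [(4, 7, 9, 0.82), (10, 3, 15, 0.15), (6, 6, 21, 0.40)]

loose = final_feature_vectors(candidates, threshold=0.10, attacked=False)
strict = final_feature_vectors(candidates, threshold=0.40, attacked=True)
```

With the lower threshold all three candidates survive and carry z=0; with the higher threshold only two survive and carry z=1, which matches the relationship between threshold and feature count stated in claim 6.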

(4) Train the neural network, specifically: use the feature vector sequence obtained in step (3.5) as the input samples of a deep neural network, with whether the virtual machine behavior is normal as the output, and train the deep neural network to obtain a virtual machine behavior classifier;

(5) Run the neural network to analyze unknown virtual machine behavior, specifically: use the virtual machine behavior classifier obtained in step (4) to analyze a virtual machine whose running state is unknown, and judge whether the behavior of that virtual machine is normal.
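
Because the network architecture is left open, the sketch below illustrates only the data flow of steps (4) and (5): separating the labelled sequence into inputs and targets, and applying a classifier to unlabelled vectors. The helper names and the `toy_classifier` rule are hypothetical; the rule is a stand-in that makes the example runnable and is not a trained deep neural network.

```python
def split_samples(sequence):
    """Separate [i, j, s, det_h, z] rows into network inputs and target labels."""
    inputs = [row[:4] for row in sequence]
    labels = [row[4] for row in sequence]
    return inputs, labels


def analyse(vectors, classifier):
    """Step (5): apply a classifier to unlabelled [i, j, s, det_h] vectors."""
    return [classifier(v) for v in vectors]


def toy_classifier(vector):
    """Stand-in for a trained network: a fixed rule on the feature value (assumption)."""
    return 1 if vector[3] >= 0.3 else 0


# Hypothetical labelled sequence as produced by step (3.5).
sequence = [[4, 7, 9, 0.82, 1], [10, 3, 15, 0.15, 0], [6, 6, 21, 0.40, 1]]
inputs, labels = split_samples(sequence)
verdicts = analyse(inputs, toy_classifier)
```

In practice `toy_classifier` would be replaced by the prediction function of whichever trained network is chosen, with `inputs` and `labels` serving as its training set.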

2. The virtual machine behavior analysis system based on deep learning and memory image analysis according to claim 1, characterized in that the initial time t₀ in step (1.1) is a positive real number.

3. The virtual machine behavior analysis system based on deep learning and memory image analysis according to claim 1, characterized in that Δt in step (1.2) is a positive real number.

4. The virtual machine behavior analysis system based on deep learning and memory image analysis according to claim 1, characterized in that, in step (1.2), normal samples are obtained by commonly used memory imaging means; for malicious samples, a shared space is opened up for all heterogeneous executors to store different types of malicious tool samples, and a simulated intrusion environment is configured for all heterogeneous executors, so as to obtain memory image data of the virtualization platform under different types of attack.

5. The virtual machine behavior analysis system based on deep learning and memory image analysis according to claim 1, characterized in that the 26 elements in the three-dimensional neighborhood of an element in step (3.3) are the 8 elements adjacent to it on the same scale layer and the 9 elements on each of the two scale layers directly above and below it.
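
The 26-element neighborhood can be enumerated as every offset in a 3×3×3 cube except the centre. The sketch below uses it for the extremum test of step (3.3); it assumes the scale space is stored as a nested list indexed `space[layer][i][j]`, reports the layer index rather than the filter size as s, and treats "larger or smaller" as a strict comparison. All three are assumptions rather than requirements of the claim.

```python
from itertools import product

# All offsets in a 3x3x3 cube except the centre:
# 8 on the same layer and 9 on each of the two adjacent layers.
NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def is_extremum(space, s, i, j):
    """True if space[s][i][j] is strictly above or strictly below all 26 neighbours."""
    value = space[s][i][j]
    others = [space[s + ds][i + di][j + dj] for ds, di, dj in NEIGHBOURS]
    return all(value > o for o in others) or all(value < o for o in others)


def local_extrema(space):
    """Scan every interior element and return the (i, j, s) positions of extrema."""
    found = []
    for s in range(1, len(space) - 1):
        for i in range(1, len(space[s]) - 1):
            for j in range(1, len(space[s][i]) - 1):
                if is_extremum(space, s, i, j):
                    found.append((i, j, s))
    return found


# Three 3x3 layers of zeros with a single peak in the middle layer.
space = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
space[1][1][1] = 5.0
peaks = local_extrema(space)
```

Only interior elements are scanned, so the top and bottom layers of the scale space serve as comparison layers and never yield feature points themselves.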

6. The virtual machine behavior analysis system based on deep learning and memory image analysis according to claim 1, characterized in that the preset threshold in step (3.4) depends on the number of features to be identified: the higher the threshold is set, the fewer features are identified.

7. The virtual machine behavior analysis system based on deep learning and memory image analysis according to claim 1, characterized in that the deep neural network in step (4) is any existing deep neural network structure.