CN101119373B - A gateway-level streaming virus scanning method and system thereof - Google Patents

A gateway-level streaming virus scanning method and system thereof

Info

Publication number
CN101119373B
Authority
CN
China
Prior art keywords
data
scanning
virus
buffer
streaming
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2007101213221A
Other languages
Chinese (zh)
Other versions
CN101119373A (en)
Inventor
龚晓锐
韦韬
朴爱花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN2007101213221A priority Critical patent/CN101119373B/en
Publication of CN101119373A publication Critical patent/CN101119373A/en
Application granted granted Critical
Publication of CN101119373B publication Critical patent/CN101119373B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a gateway-level streaming virus scanning method and a system thereof. The method first organizes session data packets into data unit queues, then determines whether the data type in the data units supports streaming scanning, and performs file-based scanning or streaming scanning according to the data type. The system comprises a data stream acquisition device, a preprocessing device, a virus scanning device, and a table device that stores context information. While ensuring effective and accurate virus detection, the proposed method and system improve the gateway's response speed to the client and save the gateway's storage resources.

Description

A gateway-level streaming virus scanning method and system thereof

Technical field

The invention relates to a gateway-level streaming virus scanning method and a system thereof, and belongs to the technical field of computer networks and data communication.

Background

As viruses increasingly spread over networks, gateway-level virus filtering is attracting growing attention. A gateway-level virus filter detects and removes viruses at the entrance to the local area network (LAN), limiting their spread as early as possible. However, constrained by traditional virus detection methods, most gateway-level filters use file-based scanning: multiple network packets are first reassembled into a complete file, on disk or in memory, and only then scanned. Because the gateway must buffer the packets, assemble and scan the file, and then forward the data, packets are delayed and network performance drops. The impact is greatest on applications with strict real-time requirements, such as Web services.

File-based scanning at the gateway not only affects real-time performance but also consumes a large amount of memory. In a LAN with heavy traffic, buffering every session and releasing it only after scanning adds substantial memory overhead to the gateway and, in severe cases, can affect its stability. Moreover, some viruses appear at the very start of a session, yet file-based scanning buffers the entire session before scanning, which wastes system resources and delays detection.

File-based scanning at the gateway also supports a relatively limited set of protocols. Because all data is buffered at the gateway, the gateway must impersonate the real client when communicating with the server and, once scanning is complete, impersonate the server to establish a connection with the real client and complete the exchange. For some complex application protocols the gateway can hardly emulate the whole communication process, so it is difficult to obtain the complete communication data needed to form a file.

Summary of the invention

The purpose of the invention is to provide a novel gateway-level streaming virus scanning method and a system thereof.

Streaming virus scanning does not wait for all session data to arrive and be restored into a file before scanning. Instead, as the session data is transmitted, the data stream is split into data units that are scanned continuously, and the receiving, scanning, and sending of data units proceed in parallel.

The invention achieves this purpose with the following technical solution:

A novel gateway-level streaming virus scanning method comprises the following steps:

1) Organize the session data packets into data unit queues.

2) Determine whether the data type in the data units supports streaming scanning.

If streaming scanning is not supported, perform file-based scanning.

If streaming scanning is supported, perform the following steps:

a) Group the data units and scan each group for viruses.

b) Determine whether the scanned data units are safe. If they are not, stop scanning, close the connection with the client, and exit the instance process; if they are, send the group of data.

c) Determine whether the session has ended. If it has not, retain the boundary information of the group and perform boundary processing using the context-information table device, as follows: A) copy the group of session data that supports streaming scanning into the buffer of the context-information table device, starting at address buffer+offset+cur_buff_len; if the buffer still holds boundary data from the previous scan, the newly read data immediately follows it, so that the next scan can treat the two parts as a whole; B) copy the same group of data into the data unit queue bb maintained by the context-information table device; C) pass the data in the buffer of the context-information table device to the virus scanning device through its streaming scan interface.

The key fields of the data structure of the context-information table device are:
buffer: in streaming scan mode, stores the data to be scanned and the data retained for handling boundary effects; the data type is a character array.
cur_buff_len: stores the length of the data currently held in buffer; the data type is integer.
offset: stores the offset of the start of the data currently held in buffer from the first address of buffer; the data type is integer.
tmpFile: in file-based scan mode, the temporary file that stores the queue data; the data type is a file pointer.
partcnt: in streaming scan mode, records the sub-patterns of multi-pattern virus signatures already matched during scanning, and their positional relationships; the data type is a linked list of structures.

d) Repeat steps 1) and 2); if the session has ended, exit the instance process.

The boundary information is the trailing data of the group of data units, with a length equal to the maximum virus signature length in the virus database.

The boundary information also includes the sequence numbers of the sub-patterns of multi-pattern virus signatures already detected in the group, and the positional relationships among those sub-patterns.

In the streaming scan method, the boundary problem in multi-pattern signature matching is handled with the context-information table device as follows: the sequence number of the multi-pattern virus signature and the sub-patterns already matched are recorded in partcnt, so that matching can continue when subsequent data arrives. If the scanned data matches all sub-patterns defined in the multi-pattern signature and satisfies the specified positional relationship, the current data is considered infected.

The session data consists of multiple data unit queues; the last data unit of the last queue carries the end-of-session flag EOS.

The payload data types that do not support streaming scanning include, but are not limited to, one or more of the following: gzip, zip, bz, rar, base64.

A gateway-level streaming virus scanning system comprises:

a data stream acquisition device, which acquires incoming data units and organizes them into data unit queues per session;

a preprocessing device, which analyzes whether the data type supports streaming scanning;

a virus scanning device, which scans the data supplied through each of its two interfaces, streaming virus scanning and file-based virus scanning;

a data stream sending device, which forwards the data once processing is complete;

a table device that stores context information, used to handle the virus-splitting problem caused by dividing the data stream during streaming scanning. The key fields of its data structure are:

buffer: in streaming scan mode, stores the data to be scanned and the data retained for handling boundary effects; the data type is a character array;

cur_buff_len: stores the length of the data currently held in buffer; the data type is integer;

offset: stores the offset of the start of the data currently held in buffer from the first address of buffer; the data type is integer;

tmpFile: in file-based scan mode, the temporary file that stores the queue data; the data type is a file pointer;

partcnt: in streaming scan mode, records the sub-patterns already matched during multi-pattern virus scanning, and their positional relationships; the data type is a linked list of structures.

Positive effects and advantages of the invention

The streaming virus detection method and system proposed by the invention scan the network data stream in the order in which it passes through the gateway, with the receiving, scanning, and sending of data units proceeding in parallel, instead of buffering all packets and assembling them into a file before scanning. This reduces transmission delay and improves scanning efficiency. While ensuring effective and accurate virus detection, it improves the gateway's response speed to the client and saves the gateway's storage resources.

Brief description of the drawings

Fig. 1 is a flowchart of the method of the invention;

Fig. 2 is a network topology diagram of an embodiment of the invention;

Fig. 3 is a structural diagram of the system of the invention.

Detailed description

The gateway-level streaming virus scanning system of the invention is described in detail below with reference to the drawings.

The flow of the method is shown in Fig. 1. It works as follows:

(1) System initialization. This step has two parts: initializing the data structure of the table device that stores context information, which assists streaming virus scanning, and initializing the virus scanning device. Initializing the virus scanning device includes loading the virus database. During loading, the length of the longest signature in the current database, maxpatlen, is computed from the length of each signature: the length of a single-pattern signature is the length of its signature string, and the length of a multi-pattern signature is computed from the lengths of its sub-patterns and their offset relationships. The virus scanning device runs from the arrival of the first packet of a protocol-layer session until the last packet, so it must preserve context across one complete protocol-layer session. When the gateway receives packet data for a request, the data stream acquisition device creates a virus scanning device instance dedicated to that request to process the current queue. When the next data belonging to the same session arrives, the data stream acquisition device automatically invokes the filtering instance that belongs to the same request. Once all data of the session has been processed, the scanning instance exits. The table device that stores context information persists throughout the processing of the session.
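The maxpatlen computation described in step (1) might be sketched in Python as follows. The source does not give the formula for a multi-pattern signature, so this sketch assumes its length is the sum of the sub-pattern lengths plus the maximum gap permitted before each sub-pattern; the classes `SingleSignature`, `SubPattern`, and `MultiSignature` and the field `max_gap` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SingleSignature:
    name: str
    pattern: bytes


@dataclass
class SubPattern:
    pattern: bytes
    max_gap: int  # assumed: max bytes allowed between the previous sub-pattern and this one


@dataclass
class MultiSignature:
    name: str
    parts: List[SubPattern]


Signature = Union[SingleSignature, MultiSignature]


def signature_length(sig: Signature) -> int:
    """Worst-case span of one signature in the scanned data."""
    if isinstance(sig, SingleSignature):
        # Single-pattern: simply the length of the signature string.
        return len(sig.pattern)
    # Multi-pattern (assumption): sub-pattern lengths plus the gaps between them.
    return sum(len(p.pattern) + p.max_gap for p in sig.parts)


def compute_maxpatlen(database: List[Signature]) -> int:
    """Length of the longest signature in the loaded database (0 if empty)."""
    return max((signature_length(s) for s in database), default=0)
```

Computing this value once at load time keeps the per-group boundary handling cheap, since the scanner only needs to remember how many trailing bytes to retain.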

(2) Data stream acquisition: organizing session data packets into data unit queues. Packets belonging to the same session share the same basic information, such as source address, destination address, source port, destination port, and protocol number, and the higher-layer protocol they belong to carries corresponding identifiers such as state information and timestamps. From this information the data acquisition module can tell which session a packet belongs to.

Each time the gateway receives a packet, the data acquisition module determines the session the packet belongs to, stores a certain number of packets in arrival order into a data unit, and inserts that data unit into the data unit queue of the session. If no data unit queue exists for the session, a dedicated queue is created. If the packet carries the end-of-session flag, the data stream acquisition device inserts an "EOS" flag into the data unit at the end of the queue. A complete session consists of multiple data unit queues. The data stream acquisition device passes the organized data unit queue to the virus scanning device.
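A minimal Python sketch of this acquisition step is given below, under several simplifying assumptions: the session is identified by the five-tuple alone, each data unit holds a fixed number of packets (`packets_per_unit`, a hypothetical parameter), and the end of the session is modelled as a separate `EOS` sentinel appended to the queue rather than a flag inside the last data unit. The names `SessionKey` and `StreamAcquirer` are illustrative and do not come from the patent.

```python
from collections import defaultdict, namedtuple

# Five-tuple that identifies a session (higher-layer state is ignored in this sketch).
SessionKey = namedtuple("SessionKey", "src_ip dst_ip src_port dst_port proto")

EOS = object()  # hypothetical end-of-session sentinel


class StreamAcquirer:
    def __init__(self, packets_per_unit=2):
        self.packets_per_unit = packets_per_unit
        self.pending = defaultdict(list)  # session -> packets not yet packed into a unit
        self.queues = defaultdict(list)   # session -> data unit queue

    def on_packet(self, key, payload, end_of_session=False):
        """Store the packet in arrival order and emit a data unit when one is full."""
        pending = self.pending[key]
        pending.append(payload)
        if len(pending) >= self.packets_per_unit or end_of_session:
            self.queues[key].append(b"".join(pending))
            pending.clear()
        if end_of_session:
            self.queues[key].append(EOS)
            del self.pending[key]
```

For example, with `packets_per_unit=2`, three packets `b"ab"`, `b"cd"`, and `b"ef"` (the last one ending the session) produce the queue `[b"abcd", b"ef", EOS]`.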

(3) Data stream preprocessing: determining whether the data type of the data units supports streaming scanning. This step prepares for the scanning step that follows and has two cases: preprocessing for streaming scanning and preprocessing for file-based scanning.

The system pre-declares a number of data types that do not support streaming scanning, for example gzip, zip, bz, rar, and base64. In addition, a table device that stores context information is established to help the virus scanning device handle the virus-splitting problem caused by streaming scanning, because boundary effects arise between data units and between queues when data is transmitted in data unit queues. The table device that stores context information is shown in Table 1.
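The type check can be illustrated with a short Python sketch. The set below contains only the types named in the text; how the data type of a data unit is actually detected is not described in the source, so the function simply takes a type label as input. `NON_STREAMABLE` and `supports_streaming` are hypothetical names.

```python
# Types the text lists as not supporting streaming scanning.
NON_STREAMABLE = {"gzip", "zip", "bz", "rar", "base64"}


def supports_streaming(data_type: str) -> bool:
    """Return True if data of this type may be scanned as a stream."""
    return data_type.strip().lower() not in NON_STREAMABLE
```

A caller would route the session to file-based scanning when this returns False and to streaming scanning otherwise.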

Table 1. The streamav_ctx table

Field | Data type | Meaning
buffer | character array | Streaming scan mode: stores the data to be scanned and the data retained for handling boundary effects
cur_buff_len | integer | Streaming scan mode: length of the data currently stored in buffer
offset | integer | Streaming scan mode: offset of the start of the data currently stored in buffer from the first address of buffer
tmpFile | file pointer | File-based scan mode: temporary file that stores the queue data
bb | linked list of character arrays | Streaming scan mode: data unit queue maintained by the virus scanning device instance itself
partcnt | linked list of structures | Streaming scan mode: records the sub-patterns already matched during multi-pattern virus scanning, and their positional relationships
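For readers who prefer code to a table, the streamav_ctx fields could be mirrored by a Python dataclass along the following lines. The field names follow Table 1; the Python types are only an approximation of the C-style types the table implies, and `PartialMatch` is a hypothetical structure for one entry of partcnt.

```python
from dataclasses import dataclass, field
from typing import IO, List, Optional, Tuple


@dataclass
class PartialMatch:
    """One partcnt entry: which multi-pattern signature, and what has matched so far."""
    signature_id: int
    # (sub-pattern index, absolute offset in the stream) for each matched sub-pattern
    matched: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class StreamAvCtx:
    buffer: bytearray = field(default_factory=bytearray)  # data to scan + boundary data
    cur_buff_len: int = 0                                  # length of data held in buffer
    offset: int = 0                                        # start of that data within buffer
    tmpFile: Optional[IO[bytes]] = None                    # temp file for file-based mode
    bb: List[bytes] = field(default_factory=list)          # data unit queue kept by the instance
    partcnt: List[PartialMatch] = field(default_factory=list)
```

Using `default_factory` gives every session its own buffer and lists, which matters because one context is kept per session.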

During preprocessing, if the data type is found to be one of the special types, the data must be preprocessed in file-based scan mode: the data unit data obtained by the data stream acquisition device is written directly into a temporary file (the temporary file pointer is stored in the tmpFile field of the streamav_ctx structure). When the session ends, the content of the temporary file is processed as appropriate (for example, decompressed), and the file-based scan interface of the virus scanning device is then called.
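As a rough illustration of the file-based path, the Python sketch below appends each data unit to a temporary file and, once all units of the session have been written, decompresses and scans the result. It assumes gzip content and plain substring signatures; `file_mode_scan` and `scan_bytes` are hypothetical names, not interfaces defined by the patent.

```python
import gzip
import tempfile


def scan_bytes(data: bytes, signatures) -> bool:
    """Return True if any signature occurs in the data (naive substring search)."""
    return any(sig in data for sig in signatures)


def file_mode_scan(data_units, signatures) -> bool:
    """Buffer a whole session in a temp file, then decompress and scan it.

    Returns True if the content is infected.
    """
    with tempfile.TemporaryFile() as tmp:
        for unit in data_units:  # cache every data unit of the session
            tmp.write(unit)
        tmp.seek(0)
        # "Appropriate processing": here, gzip decompression (an assumption).
        with gzip.GzipFile(fileobj=tmp, mode="rb") as gz:
            plain = gz.read()
    return scan_bytes(plain, signatures)
```

The sketch makes the cost of this mode visible: nothing can be forwarded until the last data unit has arrived, which is the delay that streaming scanning avoids for other data types.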

If the data type is not a special type, the virus scanning device uses the preprocessing mechanism for streaming scan mode and calls its streaming scan interface to perform streaming virus scanning. During streaming scanning, the virus scanning device divides the data units in the queue into groups of three. Each group of three data units is processed completely (preprocessing, scanning, and sending) before the remaining data units are handled. One preprocessing pass proceeds as follows:

A) Copy the data of the first three data units in the data unit queue into the buffer of the streamav_ctx table, starting at address buffer+offset+cur_buff_len. If the buffer still holds boundary data from the previous scan, the newly read data immediately follows it, so that the next scan can treat the two parts as a whole.

B) Copy the data of these three data units into the data unit queue bb maintained by streamav_ctx as well.

C) Pass the data in the buffer of the streamav_ctx table to the virus scanning device.
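Steps A) to C) can be sketched in Python as follows. The fixed buffer size `BUF_SIZE`, the class `Ctx`, and the function `preprocess_group` are assumptions made for illustration; only the field names and the copy address buffer+offset+cur_buff_len come from the text.

```python
BUF_SIZE = 64  # assumed capacity; the source does not state a buffer size


class Ctx:
    """Reduced stand-in for streamav_ctx with only the fields used here."""

    def __init__(self):
        self.buffer = bytearray(BUF_SIZE)
        self.offset = 0
        self.cur_buff_len = 0
        self.bb = []


def preprocess_group(ctx, units):
    """Append a group of data units after any retained boundary data."""
    for unit in units:
        # Step A: the copy starts at buffer + offset + cur_buff_len.
        start = ctx.offset + ctx.cur_buff_len
        if start + len(unit) > len(ctx.buffer):
            raise ValueError("buffer too small for this group")
        ctx.buffer[start:start + len(unit)] = unit
        ctx.cur_buff_len += len(unit)
        # Step B: keep the same data in the instance's own queue bb.
        ctx.bb.append(unit)
    # Step C: boundary data and new data are handed to the scanner as one span.
    return bytes(ctx.buffer[ctx.offset:ctx.offset + ctx.cur_buff_len])
```

If the buffer already holds the three boundary bytes `b"VIR"` with `cur_buff_len` set to 3, preprocessing the units `b"US"`, `b"-x"`, `b"yz"` yields `b"VIRUS-xyz"`, so a signature split across the boundary becomes visible to the scanner.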

(4) Virus scanning. The virus scanning device provides a file-based scan interface and a streaming scan interface. Initialization and preprocessing have already prepared the data for the corresponding interface, so this step only needs to call that interface directly. For file-based scanning, the virus scanning device scans the temporary file pointed to by tmpFile in streamav_ctx; for streaming scanning, it scans the data unit data in the bb queue of the streamav_ctx table. If the scan shows the data to be safe and the session still has data to process, that is, the end-of-session flag EOS has not appeared, the virus scanning device preserves the boundary data.

The virus scanning device has a virus database containing a set of virus signatures. If the scanned content contains a byte sequence that matches a signature in the database, the content is judged to be infected by that virus. During initialization the virus scanning device records the maximum signature length in the database, which it uses during scanning to handle the boundary effects caused by splitting the data. When performing boundary processing, it saves the last maxpatlen bytes of the data stored in buffer.

If the scanned data matches a single-pattern signature, the current data is considered infected. If the scanned data matches all sub-patterns defined in a multi-pattern signature and satisfies the specified positional relationship, the current data is considered infected. If the scanned data matches only some of the sub-patterns defined in a multi-pattern signature, the sequence number of that signature and the sub-patterns already matched are recorded in partcnt of the streamav_ctx table, so that matching can continue when subsequent data arrives. Data detected as infected is discarded immediately, and subsequent data is not processed. The virus scanning device submits the scan result to the data stream sending device.
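The boundary handling for single-pattern signatures can be illustrated with the following Python sketch. It keeps the last maxpatlen bytes of each scanned window and prepends them to the next group, as the text describes. The substring search stands in for whatever matching engine a real scanner would use, and `StreamScanner` is a hypothetical name.

```python
class StreamScanner:
    """Scans a stream group by group, carrying boundary data between groups."""

    def __init__(self, signatures):
        self.signatures = list(signatures)
        # Maximum signature length, recorded once at initialization.
        self.maxpatlen = max((len(s) for s in self.signatures), default=0)
        self.tail = b""  # boundary data retained from the previous group

    def scan_group(self, group: bytes) -> bool:
        """Return True if a signature is found in the boundary data plus this group."""
        window = self.tail + group
        if any(sig in window for sig in self.signatures):
            return True
        # Keep the last maxpatlen bytes so a signature split across groups is still seen.
        self.tail = window[-self.maxpatlen:] if self.maxpatlen else b""
        return False
```

With the signature `b"EVIL"`, scanning `b"xxxEV"` returns False and retains `b"xxEV"`; scanning `b"ILyyy"` next returns True because the retained bytes complete the match.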

(5) Data stream output. If the scanned data is safe, the data stream sending device immediately sends that data to the client. In streaming scan mode, if the current data unit queue still has data units waiting to be processed, or the queue is empty but no EOS data unit marking the end of the session has appeared, the virus scanning device instance remains in memory to process, or wait for, the remaining data unit data.

If the scanned data contains a virus, the data stream sending device sends a notification to the client, closes the connection with the client, and exits the instance process. If the virus scanning device has processed all data of the session, it also exits the instance process. Before the instance process exits, the memory allocated during scanning is released.
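Putting steps (3) to (5) together for the streaming case, a simplified control loop might look like the Python sketch below. It processes a finite list of data units that ends with an `EOS` marker, groups them in threes, forwards clean groups immediately, and closes the connection on detection. The callbacks `send` and `close`, the `EOS` sentinel, and the function name are hypothetical; a real gateway would block waiting for further data instead of iterating over a list.

```python
EOS = None  # hypothetical end-of-session marker in the data unit list


def process_session(units, signatures, send, close, group_size=3):
    """Scan and forward one session; return True if it completed cleanly."""
    maxpatlen = max((len(s) for s in signatures), default=0)
    tail = b""   # boundary data carried between groups
    group = []
    for unit in units:
        ended = unit is EOS
        if not ended:
            group.append(unit)
        # Process a full group, or whatever remains when the session ends.
        if group and (ended or len(group) == group_size):
            data = b"".join(group)
            window = tail + data
            if any(sig in window for sig in signatures):
                close("virus detected")  # notify the client and drop the connection
                return False
            send(data)                   # clean data is forwarded at once
            tail = window[-maxpatlen:] if maxpatlen else b""
            group = []
        if ended:
            break
    return True
```

Only the boundary bytes are retained between groups, which is where the memory saving over file-based scanning comes from.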

A gateway-level streaming virus scanning system is shown in Fig. 3. As shown in Fig. 2, the system runs at the gateway, between the local area network and the external network. The system comprises: a data stream acquisition device, which acquires incoming packets and organizes them into data stream queues per session; a preprocessing device, which analyzes the data stream queues and determines the data type; a virus scanning device, which supports both a streaming scan interface and a file-based scan interface and scans different types of data accordingly; a data stream sending device, which forwards the data once processing is complete and releases the memory occupied during scanning; and a table device that stores context information, used to handle the virus-splitting problem caused by dividing the data stream during streaming scanning. Streaming virus scanning means scanning the network data stream in the order in which it passes through, with the receiving, scanning, and sending of data units proceeding in parallel, rather than buffering all data units and assembling them into a file before scanning. This reduces transmission delay and improves scanning efficiency.

The data stream acquisition device distinguishes incoming network packets by session, stores a certain number of packets together in arrival order to form a "data unit" suited to virus scanning, and arranges the data units belonging to the same session into a "data unit queue". Both "data unit" and "data unit queue" correspond to specific data structures defined by the data acquisition module for efficient memory operation and management. The fields of the data unit structure mainly include the content and size of the network packets it contains, function pointers for basic operations, and a pointer to the data unit queue it belongs to. The data unit queue structure is a circular linked list of related data units that provides flexible and efficient memory operations, such as allocating and reclaiming data units, for use by other modules. A complete session consists of multiple data unit queues. The data stream acquisition device can determine whether the session has ended and inserts an "EOS" flag into the data unit at the end of the queue to mark the end of the session. It passes the organized data unit queue to the virus scanning device.

The preprocessing device analyzes the data type in the data unit queue. If the data is of a type that does not support streaming scanning (compression types such as gzip and rar, or encoding types such as base64), all data of the session must be buffered into a complete file and processed as appropriate (for example, decompressed) before the file-based scan interface of the virus scanning device is called. If the data type in the data unit queue is not a special type, the streaming scan interface of the virus scanning device is called to perform streaming virus scanning.

The virus scanning device runs independently as a process and scans the specified data after receiving a scan request from the preprocessing device. It has a virus database containing a set of virus signatures, which come in two forms: single-pattern and multi-pattern. If the scanned content contains a byte sequence that matches a single-pattern signature, the content is judged to be infected by that virus. If the scanned content contains byte sequences that match the multiple patterns defined in a multi-pattern signature and satisfy the specified conditions, the content is judged to be infected by that virus.
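Multi-pattern matching across groups can be sketched in Python as below. The source does not spell out what positional conditions a multi-pattern signature may impose, so this sketch assumes the simplest one: the sub-patterns must appear in order without overlapping. The list `matched` plays the role of a partcnt entry, and the class name `MultiPatternState` is hypothetical.

```python
class MultiPatternState:
    """Tracks partial matches of one multi-pattern signature across groups."""

    def __init__(self, sig_id, parts):
        self.sig_id = sig_id
        self.parts = list(parts)  # ordered sub-patterns
        self.matched = []         # (sub-pattern index, absolute start offset)
        self.tail = b""           # bytes that may begin the next sub-pattern
        self.tail_start = 0       # absolute stream offset of tail[0]

    def feed(self, group: bytes) -> bool:
        """Consume one group; return True once every sub-pattern has matched in order."""
        window = self.tail + group
        cursor = 0
        while len(self.matched) < len(self.parts):
            part = self.parts[len(self.matched)]
            pos = window.find(part, cursor)
            if pos < 0:
                break
            self.matched.append((len(self.matched), self.tail_start + pos))
            cursor = pos + len(part)
        if len(self.matched) == len(self.parts):
            return True
        # Retain only unconsumed bytes that could start the next sub-pattern.
        keep = len(self.parts[len(self.matched)]) - 1
        rest = window[cursor:]
        new_tail = rest[-keep:] if keep > 0 else b""
        self.tail_start += len(window) - len(new_tail)
        self.tail = new_tail
        return False
```

Feeding `b"xxAA"`, `b"Ayy BB"`, and `b"Bzz"` to a state built from the sub-patterns `b"AAA"` and `b"BBB"` returns False, False, True, with `matched` ending as `[(0, 2), (1, 8)]`: both sub-patterns straddle a group boundary and are still found.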

The table device for storing context information, shown in Table 1, helps the virus scanning device handle the virus segmentation problem that arises when the data stream is split into separately scanned groups during streaming scanning.
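To make the segmentation problem concrete, the sketch below carries the tail of each scanned group over to the next scan, so a signature split across two groups is still seen as a whole. The field names `buffer`, `offset`, and `cur_buff_len` follow the claims; the 4096-byte capacity, the compaction to offset 0, and the function name `stream_scan` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ScanContext:
    """Per-session state; field names follow the patent's context table."""
    buffer: bytearray = field(default_factory=lambda: bytearray(4096))
    offset: int = 0        # where valid data starts inside buffer
    cur_buff_len: int = 0  # number of valid bytes that follow offset


def stream_scan(ctx, chunk, signatures):
    """Append chunk after any retained tail, scan, then retain a new tail.

    Returns True as soon as any single-pattern signature is found.
    """
    start = ctx.offset + ctx.cur_buff_len         # buffer + offset + cur_buff_len
    ctx.buffer[start:start + len(chunk)] = chunk  # new data follows the old tail
    ctx.cur_buff_len += len(chunk)
    window = bytes(ctx.buffer[ctx.offset:ctx.offset + ctx.cur_buff_len])
    if any(sig in window for sig in signatures):
        return True
    # Keep the trailing bytes, up to the longest signature length, as boundary data.
    longest = max((len(sig) for sig in signatures), default=0)
    keep = min(longest, len(window))
    ctx.buffer[0:keep] = window[len(window) - keep:]
    ctx.offset, ctx.cur_buff_len = 0, keep
    return False
```

For example, with the signature `b"EVIL"`, scanning `b"aaaaEV"` returns False and retains `b"aaEV"`; scanning `b"ILbbbb"` next sees `b"aaEVILbbbb"` and returns True.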

Claims (7)

1. A gateway-level streaming virus scanning method, comprising the steps of:
1) organizing session data packets into data unit queues;
2) determining whether the data type in the data units supports streaming scanning; if streaming scanning is not supported, performing file-based scanning; if streaming scanning is supported, performing the following steps:
a) scanning the data units for viruses in groups;
b) determining whether the scanned data units are safe; if not, stopping the scan, closing the connection to the client, and exiting the instance process; if so, sending the group of data;
c) determining whether the session has ended; if not, retaining the boundary information of the group of data and performing boundary processing with the context information table device, as follows:
A) copying the group of session data that supports streaming scanning into the buffer of the context information table device, the start address of the copy being buffer+offset+cur_buff_len; if the buffer holds boundary data from the previous scan, the newly read data immediately follows the boundary data, so that the next scan operation can treat the two parts as a whole;
B) copying the group of data at the same time into the data unit queue bb maintained by the context information table device;
C) passing the data in the buffer of the context information table device to the virus scanning device through the streaming scanning interface of the virus scanning device;
wherein the key fields of the data structure of the context information table device include:
buffer: in streaming scanning mode, stores the data to be scanned and the data that must be retained to handle boundary effects; data type: character array;
cur_buff_len: stores the length of the data currently held in buffer; data type: integer;
offset: stores the offset of the start of the data currently held in buffer from the first address of buffer; data type: integer;
tmpFile: in file-based scanning mode, the temporary file that stores the queue data; data type: file pointer;
partcnt: in streaming scanning, records the sub-patterns of multi-pattern virus signatures already matched during virus scanning, and their positional relationships; data type: linked list of structures;
d) repeating steps 1) and 2), and exiting the instance process if the session has ended.

2. The method of claim 1, wherein the boundary information is the trailing data of the group of data units, of length equal to the maximum virus signature length in the virus database.

3. The method of claim 1, wherein the boundary information records the sequence numbers of the sub-patterns of a multi-pattern virus signature already detected in the group of data, and the positional relationships among those sub-patterns.

4. The method of claim 1, wherein, in streaming scanning, boundary problems in multi-pattern virus signature matching are handled with the context information table device as follows: the sequence number of the multi-pattern virus signature and the sub-pattern information already matched are recorded in partcnt, so that matching can continue when subsequent data arrives; if the scanned data matches all sub-patterns defined in the multi-pattern virus signature and satisfies the specified positional relationships, the current data is considered infected.

5. The method of claim 1, wherein the session data consists of multiple data unit queues, and the last data unit of the last queue carries the end flag EOS.

6. The method of claim 1, wherein the packet payload data types that do not support streaming scanning include, but are not limited to, one or more of the following: gzip, zip, bz, rar, base64.

7. A gateway-level streaming virus scanning system, comprising:
a data stream acquisition device, configured to acquire incoming data units and organize them into data unit queues by session;
a preprocessing device, configured to analyze whether the data type supports streaming scanning;
a virus scanning device, configured to scan data arriving through two interfaces, streaming virus scanning and file-based virus scanning;
a data stream sending device, configured to forward the data after processing is complete;
a table device for storing context information, configured to handle the virus segmentation problem caused by splitting the data stream during streaming scanning, the key fields of its data structure including:
buffer: in streaming scanning mode, stores the data to be scanned and the data that must be retained to handle boundary effects; data type: character array;
cur_buff_len: stores the length of the data currently held in buffer; data type: integer;
offset: stores the offset of the start of the data currently held in buffer from the first address of buffer; data type: integer;
tmpFile: in file-based scanning mode, the temporary file that stores the queue data; data type: file pointer;
partcnt: in streaming scanning, records the sub-patterns already matched during multi-pattern virus scanning, and their positional relationships; data type: linked list of structures.
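The multi-pattern boundary handling of claim 4 can be sketched as an incremental matcher that remembers how far a signature has progressed between groups of data. This is a simplified illustration: here `partcnt` is reduced to a count of sub-patterns matched so far, whereas the claims describe a linked list of structures that also records positions; the function name `feed`, the `tail` argument, and the in-order rule are assumptions.

```python
def feed(partcnt, tail, chunk, sub_patterns):
    """Advance one multi-pattern signature over a newly arrived chunk.

    partcnt is the number of sub-patterns already matched in order, and tail
    holds unmatched trailing bytes carried over from the previous chunk.
    Returns (partcnt, tail, infected).
    """
    window = tail + chunk
    pos = 0
    while partcnt < len(sub_patterns):
        found = window.find(sub_patterns[partcnt], pos)
        if found < 0:
            break
        pos = found + len(sub_patterns[partcnt])
        partcnt += 1
    if partcnt == len(sub_patterns):
        return partcnt, b"", True
    # Keep just enough bytes for the next sub-pattern to straddle the boundary.
    keep = len(sub_patterns[partcnt]) - 1
    rest = window[pos:]
    new_tail = rest[len(rest) - min(keep, len(rest)):]
    return partcnt, new_tail, False
```

Feeding the chunks `b"..A"`, `b"B..C"`, and `b"D.."` for the sub-patterns `[b"AB", b"CD"]` advances `partcnt` from 0 to 1 to 2, and the third call reports the data as infected even though both sub-patterns were split across chunk boundaries.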
CN2007101213221A 2007-09-04 2007-09-04 A gateway-level streaming virus scanning method and system thereof Expired - Fee Related CN101119373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007101213221A CN101119373B (en) 2007-09-04 2007-09-04 A gateway-level streaming virus scanning method and system thereof

Publications (2)

Publication Number Publication Date
CN101119373A CN101119373A (en) 2008-02-06
CN101119373B true CN101119373B (en) 2010-09-08

Family

ID=39055307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007101213221A Expired - Fee Related CN101119373B (en) 2007-09-04 2007-09-04 A gateway-level streaming virus scanning method and system thereof

Country Status (1)

Country Link
CN (1) CN101119373B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102905269B (en) * 2011-07-26 2017-06-13 西门子公司 The detection method and device of a kind of mobile phone viruses
CN102811146B (en) * 2012-08-31 2015-03-04 飞天诚信科技股份有限公司 Method and device for detecting message processing environment
CN103546449A (en) * 2012-12-24 2014-01-29 哈尔滨安天科技股份有限公司 E-mail virus detection method and device based on attachment formats
CN103580949A (en) * 2012-12-27 2014-02-12 哈尔滨安天科技股份有限公司 Method and system for non-complete flow detection and complete flow detection in switchable mode
CN104424438B (en) * 2013-09-06 2018-03-16 华为技术有限公司 A kind of antivirus file detection method, device and the network equipment
CN104216946B (en) * 2014-07-31 2019-03-26 百度在线网络技术(北京)有限公司 A kind of method and apparatus for beating again packet application program for determination
CN109981629A (en) * 2019-03-19 2019-07-05 杭州迪普科技股份有限公司 Antivirus protection method, apparatus, equipment and storage medium
CN110610087A (en) * 2019-09-06 2019-12-24 武汉达梦数据库有限公司 Data acquisition safety detection method and device
CN113132341B (en) * 2020-01-16 2023-03-21 深信服科技股份有限公司 Network attack behavior detection method and device, electronic equipment and storage medium
CN112199679B (en) * 2020-09-29 2024-07-19 珠海豹好玩科技有限公司 Virus checking and killing method and device under Linux system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5495607A (en) * 1993-11-15 1996-02-27 Conner Peripherals, Inc. Network management system having virtual catalog overview of files distributively stored across network domain
US7246227B2 (en) * 2003-02-10 2007-07-17 Symantec Corporation Efficient scanning of stream based data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021074802A1 (en) * 2019-10-17 2021-04-22 International Business Machines Corporation Maintaining system security
US11093612B2 (en) 2019-10-17 2021-08-17 International Business Machines Corporation Maintaining system security

Also Published As

Publication number Publication date
CN101119373A (en) 2008-02-06

Similar Documents

Publication Publication Date Title
CN101119373B (en) A gateway-level streaming virus scanning method and system thereof
CN109309626B (en) DPDK-based high-speed network data packet capturing, distributing and caching method
JP5850896B2 (en) Method and apparatus for monitoring traffic in a network
EP2530874B1 (en) Method and apparatus for detecting network attacks using a flow based technique
CN104978526B (en) The extracting method and device of virus characteristic
CN109450900B (en) Mimic judgment method, device and system
CN106534145B (en) An application identification method and device
CN110222503A (en) Database audit method, system and equipment under a kind of load of high amount of traffic
CN101316232B (en) Fragmentation and reassembly method based on network protocol version six
CN108183893A (en) A kind of fragment packet inspection method, detection device, storage medium and electronic equipment
CN104182519B (en) A kind of file scanning method and device
CN105407096B (en) Message data detection method based on flow management
JP2004528651A5 (en)
CN102932203A (en) Method and device for inspecting deep packets among heterogeneous platforms
CN106330584A (en) A business flow identification method and identification device
CN111200665B (en) User source tracing method and device and computer readable storage medium
CN101170496B (en) An identification method and device for point-to-point media stream
CN104104659B (en) Communication fingerprint extraction method and device
CN106385407A (en) Method and device for noise removing through application of identification data packet to be analyzed
Yan A survey of traffic classification validation and ground truth collection
CN101063952A (en) Universal serial bus host controller rapid test system and method
CN105591833A (en) Flow-acquiring method based on rule engine
CN107426180A (en) A kind of monitoring device to ethernet data frame spreadability
CN108768984B (en) Intrusion detection device and method based on field programmable gate array
CN106027375A (en) System and method for realizing website channel sharing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100908

Termination date: 20130904