Disclosure of Invention
The invention aims to overcome the shortcomings of file-based and packet-based network virus detection techniques by providing a method and a device for detecting network viruses based on network data streams in a TCP/IP network, so that network viruses can be detected quickly and accurately, the information in the network is kept secure, and network users are given a fast and safe network application environment.
The purpose of the invention is achieved by the following technical scheme:
A network virus detection method based on network data streams comprises the following steps:
A. classifying network viruses according to the type of their host files;
B. fragmenting each network virus according to its description mode;
C. acquiring the format features of each web page file type;
D. reassembling a specific number of network data packets into a network data stream;
E. matching the network data stream against the network virus features, using the network virus information base, the virus feature fragments, the web page file format features and the different types of network data stream, to detect network viruses hidden in the network data stream.
Given an available network virus information base, step A comprises:
A1. reading the network virus information base;
A2. decrypting the virus information;
A3. parsing the virus information.
Preferably, the virus information includes: the virus name, virus type, file offset and virus signature.
A4. dividing the network viruses into PE viruses, macro viruses, script viruses and other viruses according to the format type of their host files.
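The following Python sketch illustrates steps A3 and A4 only. It assumes that the decrypted plaintext uses the colon-separated layout of the example record given later (`name:type:offset:signature`), that the type codes are those of Table 2, that the offset field is hexadecimal, and that the description mode is inferred from whether the signature contains regular-expression descriptors; the document does not say how the mode is stored. `VirusRecord`, `parse_record` and `classify` are hypothetical names, and decryption (A2) is left out.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class VirusType(IntEnum):
    """Virus type codes as listed in Table 2."""
    PE = 1
    MACRO = 2
    SCRIPT = 3
    OTHER = 4


# Assumption: these characters mark a signature as regular-expression described.
REGEX_DESCRIPTORS = set("{}*()|?")


@dataclass
class VirusRecord:
    name: str
    vtype: VirusType
    offset: int        # offset of the signature relative to the file header
    signature: str     # hex text, possibly containing regex descriptors
    is_regex: bool     # description mode: True means regular expression


def parse_record(plaintext: str) -> VirusRecord:
    """A3: parse one decrypted record of the assumed form name:type:offset:signature."""
    name, vtype, offset, signature = plaintext.strip().split(":", 3)
    return VirusRecord(
        name=name,
        vtype=VirusType(int(vtype)),
        offset=int(offset, 16),  # assumption: the offset field is hexadecimal
        signature=signature,
        is_regex=any(ch in REGEX_DESCRIPTORS for ch in signature),
    )


def classify(records: List[VirusRecord]) -> Dict[VirusType, List[VirusRecord]]:
    """A4: group records into one library per host-file virus type."""
    libraries: Dict[VirusType, List[VirusRecord]] = {t: [] for t in VirusType}
    for rec in records:
        libraries[rec.vtype].append(rec)
    return libraries
```

For example, `parse_record("Exploit.HTML.ObjectType:3:9a:...")` yields a script-virus record with offset 0x9a, which `classify` places in the script virus library.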
Preferably, step B comprises:
depending on how each network virus is described: for a network virus described by a non-regular expression, two segments are extracted at random, in address order, from its signature; for a network virus described by a regular expression, one segment is extracted at random from the part preceding all regular-expression descriptors, and the remainder after that segment is taken as the second segment.
Preferably, the first of the two virus signature fragments comprises: the virus name, the virus description mode, the file offset, the virus signature code, and a pointer to the second virus signature fragment;
preferably, the second virus signature fragment comprises: a virus offset and a virus signature.
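A minimal sketch of step B, under stated assumptions: signatures are hex text, fragment 1 has a fixed length (10 bytes, as in the later example), the random choice uses Python's `random` module, and the 2nd fragment's offset is measured from the start of the 1st fragment. `Fragment1`, `Fragment2` and `fragment` are illustrative names, not the invention's actual data structures.

```python
import random
from dataclasses import dataclass
from typing import Optional

REGEX_DESCRIPTORS = "{}*()|?"  # assumed regular-expression markers


@dataclass
class Fragment2:
    offset: int   # bytes from the start of fragment 1 to the start of fragment 2
    pattern: str  # hex text (literal) or the remaining regex-described signature


@dataclass
class Fragment1:
    name: str
    is_regex: bool                # description mode
    file_offset: int
    pattern: bytes                # literal bytes for the multi-pattern matcher
    second: Optional[Fragment2]   # "pointer" to the 2nd fragment


def literal_prefix_len(sig_hex: str) -> int:
    """Number of hex characters before the first regex descriptor."""
    for i, ch in enumerate(sig_hex):
        if ch in REGEX_DESCRIPTORS:
            return i
    return len(sig_hex)


def fragment(name: str, sig_hex: str, file_offset: int,
             frag_len: int = 10, rng: Optional[random.Random] = None) -> Fragment1:
    rng = rng or random.Random()
    prefix = literal_prefix_len(sig_hex)
    is_regex = prefix < len(sig_hex)
    if is_regex:
        # Fragment 1 is a random window inside the literal prefix;
        # everything after it becomes fragment 2.
        n_bytes = prefix // 2
        if n_bytes < frag_len:
            raise ValueError("literal prefix shorter than the fragment length")
        start = rng.randrange(n_bytes - frag_len + 1)
        second = Fragment2(frag_len, sig_hex[2 * (start + frag_len):])
    else:
        n_bytes = len(sig_hex) // 2
        if n_bytes < 2 * frag_len:
            raise ValueError("signature too short for two fragments")
        # Two non-overlapping random windows, taken in address order.
        start = rng.randrange(n_bytes - 2 * frag_len + 1)
        start2 = rng.randrange(start + frag_len, n_bytes - frag_len + 1)
        second = Fragment2(start2 - start,
                           sig_hex[2 * start2:2 * (start2 + frag_len)])
    first = bytes.fromhex(sig_hex[2 * start:2 * (start + frag_len)])
    return Fragment1(name, is_regex, file_offset, first, second)
```

Passing a seeded `random.Random` makes the chosen fragments reproducible, which is convenient when the feature library is rebuilt.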
Preferably, step C comprises:
C1. analyzing the various web page file formats to obtain their file format features and form a web page file format feature library;
C2. reading the web page file format feature information from the web page file format feature library;
C3. parsing the web page file format feature information;
C4. inserting the web page file format features into the network virus feature data structure, in order to detect script viruses embedded in PE and document format files.
Preferably, step D comprises:
D1. storing the data packets in a cache in sequence and incrementing a packet counter by 1;
D2. uploading the buffered data packets when the packet counter value is greater than or equal to the sending-end packet-number window value, or when the packet counter value is smaller than that window value but the elapsed time is greater than or equal to the time window value; the packet counter is then reset to 0 and the start time is reset to 0.
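Steps D1 and D2 amount to a buffer that is flushed on whichever window fills first. The class below is a hypothetical sketch; `StreamReassembler`, `count_window`, `time_window` and the injectable `clock` are assumed names, and the time window is only checked when a packet arrives.

```python
import time
from typing import Callable, List, Optional


class StreamReassembler:
    """Buffers packets and reports them as one stream (steps D1 and D2)."""

    def __init__(self, count_window: int, time_window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.count_window = count_window   # sending-end packet-number window
        self.time_window = time_window     # time window, in seconds
        self.clock = clock
        self.buffer: List[bytes] = []
        self.counter = 0
        self.start = clock()

    def push(self, packet: bytes) -> Optional[bytes]:
        # D1: cache the packet in arrival order and increment the counter.
        self.buffer.append(packet)
        self.counter += 1
        elapsed = self.clock() - self.start
        # D2: report when the count window is full, or when the count window
        # is not yet full but the time window has expired.
        if self.counter >= self.count_window or elapsed >= self.time_window:
            stream = b"".join(self.buffer)
            self.buffer.clear()
            self.counter = 0            # reset the packet counter
            self.start = self.clock()   # restart the time window
            return stream
        return None
```

A production implementation would also need a timer so that a quiet stream is flushed without waiting for another packet.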
Preferably, step E comprises:
E1. hooking the network virus feature library that corresponds to each file type, and scanning the network data stream with the first segments of all network viruses using a multi-pattern matching algorithm;
E2. when the E1 match succeeds and the corresponding virus is described by a non-regular expression, calculating the position of the virus's second segment in the network data stream and scanning the network data stream from that position using a single-pattern matching algorithm;
E3. when the E1 match succeeds and the corresponding virus is described by a regular expression, scanning the network data stream using a regular expression algorithm with the automaton generated from the remaining virus features;
E4. responding to the virus if the E2 or E3 scan finds a match.
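The two-stage scan of E1 to E4 can be sketched as follows. The document does not name its multi-pattern matching algorithm; Aho-Corasick is swapped in here as a common choice. For E2 the sketch simply compares fragment 2 at the computed position instead of scanning onward, and for E3 it assumes the remaining features have already been compiled to a Python `re` pattern that must match directly after fragment 1. All function names and the example signatures are hypothetical.

```python
import re
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(patterns: List[bytes]):
    """Aho-Corasick goto, failure and output tables over byte patterns."""
    goto: List[Dict[int, int]] = [{}]
    fail: List[int] = [0]
    out: List[List[int]] = [[]]
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(idx)
    queue = deque(goto[0].values())      # depth-1 states keep fail = 0
    while queue:                          # breadth-first failure links
        s = queue.popleft()
        for b, nxt in goto[s].items():
            queue.append(nxt)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(data: bytes, goto, fail, out, patterns):
    """Yield (pattern_index, start_offset) for every occurrence in data."""
    state = 0
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for idx in out[state]:
            yield idx, i - len(patterns[idx]) + 1


def detect(data: bytes, viruses) -> List[Tuple[str, int]]:
    """E1 to E4. Each virus is (name, fragment1, stage2), where stage2 is
    ("literal", offset_from_fragment1, fragment2) or ("regex", compiled)."""
    patterns = [frag1 for _, frag1, _ in viruses]
    tables = build_automaton(patterns)
    hits = []
    for idx, start in scan(data, *tables, patterns):           # E1
        name, frag1, stage2 = viruses[idx]
        if stage2[0] == "literal":                              # E2
            _, offset, frag2 = stage2
            pos = start + offset
            matched = data[pos:pos + len(frag2)] == frag2
        else:                                                   # E3
            matched = stage2[1].match(data, start + len(frag1)) is not None
        if matched:
            hits.append((name, start))                          # E4: respond
    return hits


# Hypothetical signatures, for illustration only.
EXAMPLE_VIRUSES = [
    ("V.Literal", b"EVIL", ("literal", 6, b"CODE")),
    ("V.Regex", b"<obj", ("regex", re.compile(rb"ect.{2,3}end", re.DOTALL))),
]
```

Only viruses whose first fragment hits in E1 are ever checked in E2 or E3, which is where the reduction in matching work comes from.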
A network virus detection apparatus based on network data streams comprises:
a virus information base for storing virus information, including the virus name, virus type, description mode, offset and virus signature;
a web page file format feature library for storing the feature marks that identify web page files such as html, htm, php, jsp, jspx and asp;
an operating parameter library for storing the on/off switch parameters of the decompression, decoding and unshelling modules and the response modes;
an initialization module that initializes the operating parameters, initializes the web page file format features, preprocesses the virus features and builds the single-pattern regular expression automaton for the virus features; during preprocessing the virus features are read, decrypted, parsed, classified and fragmented, a virus feature tree is created, and the single-pattern regular expression automaton is created;
a virus detection module that preprocesses the data stream (data stream reassembly, decompression, decoding and virus unshelling), matches viruses cross-stream with a multi-pattern matching algorithm and with a single-pattern regular expression algorithm, and responds;
and a cache recovery module for reclaiming the cache space allocated while the apparatus is running.
First, the initialization module reads, decrypts and parses the network virus information base, the web page file format feature library and the operating parameter library; next, the virus detection device preprocesses the network data stream and matches it against the virus features, responding if a virus is found; finally, the dynamically allocated cache is reclaimed when the process terminates.
In the technical scheme provided by the invention, viruses are classified, so that different virus libraries are invoked for different file types and fewer viruses need to be matched; viruses are fragmented and each data stream is matched in two passes, which shortens the matching length for a single virus and improves device performance; and network data packets are reassembled into streams, so that decompression and virus matching can be performed cross-stream, which reduces both the false-positive rate and the false-negative rate.
Detailed Description
The core of the method is as follows. Network viruses are classified by file type so that each file type corresponds to a network virus type, which reduces the number of network viruses matched against each data stream. Each network virus is fragmented and scanned in two passes, which shortens the scan length for a single virus. Network data streams are reassembled, which reduces the false-positive and false-negative rates. Decompression, decoding, unshelling (unpacking) and matching are performed cross-stream, so that the device can detect viruses transmitted over the network. Embedded network viruses are detected, so that the device can detect script viruses hidden in PE and OLE2 files.
As is known to those of ordinary skill in the art, the general workflow of network virus detection is as follows.
In the initialization stage, network virus information is read from a virus information base, then decrypted and parsed. In the detection stage, a network packet capture device acquires data packets, the protocols are parsed and the packets reassembled, and decompression, decoding, unshelling and virus scanning are performed. In the response stage, the detection result and the action to be taken are reported to the control end. The detection stage can work in one of three modes: packet-based, data-stream-based or file-based. When performing data-stream-based network virus detection in a TCP/IP network, the method of the invention retains this general framework and flow.
The networking structure of the device for detecting network viruses based on data streams in a TCP/IP network is shown in FIG. 1, where:
the local area network comprises the network users and network services inside it;
the data-stream-based network virus detection device inspects the network data streams passing through it and provides security protection for the local area network;
the Internet, including its routers, transports and routes network traffic.
The apparatus structure of the method of the present invention will be described in detail with reference to FIG. 2:
The network virus detection device based on network data streams comprises an information base, an initialization module, a network virus feature detection module and a cache recovery module; the information base comprises a virus information base, a web page file format feature base and an operating parameter base.
The initialization module comprises an operating parameter initialization module, a network virus feature preprocessing module and a web page file format feature initialization module. The network virus feature preprocessing module comprises a module for reading, decrypting, parsing, classifying and fragmenting virus information, a feature tree creation module, and a module for creating the single-pattern data structures and regular expression automata;
the network virus feature detection module comprises a network data stream reassembly module, a data stream preprocessing module, a virus feature matching module and a response module; the data stream preprocessing module comprises cross-stream decompression, decoding and virus unshelling modules.
The virus feature matching module comprises a multi-pattern matching module and a single-pattern regular expression matching module.
The cache recovery module reclaims the cache space allocated in the initialization module when the data-stream-based network virus detection device terminates.
The classification model of the network viruses in the present invention is explained in detail with reference to fig. 3:
FIG. 3 shows how the invention classifies virus libraries and file types so that each file type corresponds to a virus library, which reduces the number of viruses used to scan a single network data stream. The virus library is divided into a PE virus library, a macro virus library, a script virus library and other virus libraries. Files are divided into compressed and uncompressed files. Compressed files include zip, rar, chm and cab files. Uncompressed files are divided into specially encoded files, such as Base64-encoded files, and non-specially encoded files. Non-specially encoded files are divided into Windows PE files (exe, com, dll, sys and vxd files), Windows documents (doc, xls and ppt files), web pages (html, htm, php, asp, aspx and jsp files) and other files.
The viruses hidden in Windows PE files form the PE virus library, the viruses hidden in Windows documents form the macro virus library, the viruses hidden in web pages form the script virus library, and the viruses hidden in other files form the other virus libraries.
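The mapping in FIG. 3 can be expressed as a lookup from a file-type label to the library to hook, or to the preprocessing step still required. This is a sketch assuming that file types have already been identified as extension-like labels; `Library` and `library_for` are hypothetical names.

```python
from enum import Enum


class Library(Enum):
    PE = "PE virus library"
    MACRO = "macro virus library"
    SCRIPT = "script virus library"
    OTHER = "other virus library"


# File categories from FIG. 3 (extensions without the leading dot).
COMPRESSED = {"zip", "rar", "chm", "cab"}
SPECIAL_ENCODED = {"base64"}
PE_FILES = {"exe", "com", "dll", "sys", "vxd"}
DOCUMENTS = {"doc", "xls", "ppt"}
WEB_PAGES = {"html", "htm", "php", "asp", "aspx", "jsp"}


def library_for(file_type: str):
    """Return the virus library to hook, or the preprocessing step still needed."""
    t = file_type.lower()
    if t in COMPRESSED:
        return "decompress"
    if t in SPECIAL_ENCODED:
        return "decode"
    if t in PE_FILES:
        return Library.PE
    if t in DOCUMENTS:
        return Library.MACRO
    if t in WEB_PAGES:
        return Library.SCRIPT
    return Library.OTHER
```

Only one library is returned per file, so each data stream is scanned against a single library instead of all of them.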
The fragment model for complex viruses in the present invention is explained in detail with reference to FIG. 4:
to shorten the matching length of a single virus, the invention fragments each virus. First, viruses are divided by description form into those described by a non-regular expression and those described by a regular expression. For a virus described by a non-regular expression, 2 segments are extracted at random in address order, and the 1st segment is inserted into the feature tree of the virus library to which the virus belongs; for a virus described by a regular expression, the 1st segment is inserted into the feature tree of the virus library and the 2nd segment is inserted into the single-pattern regular expression automaton.
To help those skilled in the art better understand the invention, it is described in further detail below with reference to the flowchart shown in FIG. 5. The method comprises the following steps:
Step 501: make the operating parameters of the data-stream-based network virus detection device identifiable. Specifically, the operating parameters are read from the device's operating parameter library, parsed, and assigned to the corresponding variables.
The structure of the device operating parameters is shown in Table 1 below.
Table 1:
Device operating parameter | Values
Detection type | 1 detect compressed files; 2 detect mail; 3 detect shelled virus files; 4 detect embedded virus files
Response type | 1 alarm; 2 packet drop
Matching algorithm type | 1 multi-pattern matching algorithm 1; 2 multi-pattern matching algorithm 2; 3 single-pattern regular expression algorithm
Step 502: make network viruses identifiable. Specifically, virus information is read from the virus library, then decrypted, parsed, fragmented and classified; a feature tree is created; and the single-pattern data structures and regular expression automata are created.
The structure of the network virus information is shown in Table 2 below.
Table 2:
Serial number | Data field
1 | Virus name
2 | Virus type: 1 PE virus; 2 macro virus; 3 script virus; 4 other viruses
3 | Description mode: 1 non-regular expression; 2 regular expression
4 | File offset of the virus signature relative to the file header
5 | Virus signature
The array structure of the 1st segment of the network virus signature is shown in Table 3 below.
Table 3:
Serial number | Data field
1 | Length of the 1st fragment of the virus signature
2 | Case sensitivity: 1 case-sensitive; 2 case-insensitive
3 | Description mode: 1 non-regular expression; 2 regular expression
4 | Virus feature value
Step 503: make web page file format features identifiable. Specifically, the feature values are read from the web page file format feature library, parsed, and finally inserted into the corresponding feature trees.
The structure of the web page file format features is shown in Table 4 below.
Table 4:
Serial number | Data field
1 | html
2 | htm
3 | PHP
4 | asp
5 | aspx
6 | jsp
Step 504: make network data packets identifiable. Specifically, data packets are captured from the network link, and data-frame parsing, IP packet parsing and fragment reassembly, and transport-layer message parsing are performed.
Step 505: reassemble the identified data messages, and report the network data stream when the number of messages is greater than or equal to the message-count window.
Step 506: preprocess the network data stream: if it carries a compressed file, decompress it; if it carries a file in a special encoding format, decode it; if the file is packed with a shell, unshell it; then hook the corresponding virus library.
Step 507: match the 1st virus segments and the web page file format features using a multi-pattern matching algorithm. If a match succeeds and the matched pattern is a web page file format feature, detect embedded viruses; if a match succeeds and the matched pattern is a virus feature, check the network data stream with the virus's 2nd fragment according to its description form.
Step 508: if the virus matching result shows a virus, respond accordingly.
Step 509: when the virus detection device terminates, reclaim the dynamically allocated cache resources.
The above-described flow of fig. 5 is further illustrated by an application example.
For example, the device is configured to detect compressed files, mail, shelled virus files and embedded virus files; when a virus is detected, it raises an alarm and drops the packet; virus scanning uses a multi-pattern matching algorithm and a single-pattern regular expression algorithm.
The virus information is:
Exploit.HTML.ObjectType:3:9a:3c6f626a65637420747970653d222f2f2f2f2f2f2f2f2f2f2f2f7468284461746529203d20313220416e6420446179284461746529203d203239205468656e{2-3}20202020456e64204966*636f6d706f6e656e742e4578706f7274202822433a5c537572726f756e642e6b65792229646174613d226d732d6974733a6d68746d6c3a66696c653a2f2f(63|64)3a5c
The web page file format features are:
html, htm, PHP, asp, aspx and jsp.
The structure of the device operating parameters is shown in Table 5 below.
Table 5:
Device operating parameter | Values
Detection type | 1 & 2 & 3 & 4
Response type | 1 & 2
Matching algorithm type | 1 & 3
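One way to read the Table 5 values is as `&`-separated sets of the codes defined in Table 1. The parser below is a hypothetical sketch of that reading; the document does not specify the parameter file syntax beyond these values, and the names `parse_codes`, `DETECTION`, `RESPONSE` and `MATCHING` are illustrative.

```python
# Code tables from Table 1.
DETECTION = {1: "compressed files", 2: "mail",
             3: "shelled virus files", 4: "embedded virus files"}
RESPONSE = {1: "alarm", 2: "packet drop"}
MATCHING = {1: "multi-pattern matching algorithm 1",
            2: "multi-pattern matching algorithm 2",
            3: "single-pattern regular expression algorithm"}


def parse_codes(value: str, table: dict) -> set:
    """Turn a Table 5 value such as "1 & 3" into the set of enabled options."""
    codes = {int(part) for part in value.split("&")}
    unknown = codes - set(table)
    if unknown:
        raise ValueError(f"unknown codes: {sorted(unknown)}")
    return {table[c] for c in codes}


# The configuration of the application example.
config = {
    "detection": parse_codes("1 & 2 & 3 & 4", DETECTION),
    "response": parse_codes("1 & 2", RESPONSE),
    "matching": parse_codes("1 & 3", MATCHING),
}
```

Rejecting unknown codes early keeps a mistyped parameter from silently disabling a detection type.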
The structure of the network virus information is shown in Table 6 below.
Table 6:
Serial number | Data field
1 | Exploit.HTML.ObjectType
2 | 3
3 | 2
4 | 9a
5 | 3c6f626a65637420747970653d222f2f2f2f2f2f2f2f2f2f2f2f7468284461746529203d20313220416e6420446179284461746529203d203239205468656e{2-3}20202020456e64204966*636f6d706f6e656e742e4578706f7274202822433a5c537572726f756e642e6b65792229646174613d226d732d6974733a6d68746d6c3a66696c653a2f2f(63|64)3a5c
The array structure of the 1st segment of the network virus signature is shown in Table 7 below.
Table 7:
Serial number | Data field
1 | 10 bytes
2 | 1
3 | 3c6f626a656374207479
First, the network data packets are preprocessed; the resulting network data stream is:
01 00 0c cc cc cc 00 0e d7 bd a4 c0 01 63 aa aa 03 00 00 0c 20 00
02 b4 50 76 00 01 00 0d 76 65 6e 75 73 64 65 70 32 00 05 01 00 43
69 73 63 6f 20 49 6e 74 65 72 6e 65 74 77 6f 72 6b 20 4f 70 65 72
61 74 69 6e 67 20 53 79 73 74 65 6d 20 53 6f 66 74 77 61 72 65 20
0a 49 4f 53 20 28 74 6d 29 20 43 32 36 30 30 20 53 6f 66 74 77 61
72 65 20 28 43 32 36 3c6f626a656374207479 30 30 2d 49 2d 4d 29 2c
20 56 65 72 73 30 00 04 00 08 00 00 00 01 00 07 00 09 c0 a8 0a 00
18 00 0b 00 05 00
The 1st network virus feature fragment is then matched against the network data stream, and the match succeeds. Because this virus feature is described by a regular expression, the second-stage match is performed with the regular expression automaton; if that match fails, further data packets are acquired and detected.
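The first-stage hit in this example can be reproduced directly on the hex dump. The sketch below assumes the dump is the raw stream in order; it locates the Table 7 fragment (`<object ty`) and checks whether the literal bytes that follow it in the signature (`pe="`) are present, which decides whether the second stage could succeed on this stream alone.

```python
# Preprocessed network data stream from the example, copied as printed.
STREAM_HEX = """
01 00 0c cc cc cc 00 0e d7 bd a4 c0 01 63 aa aa 03 00 00 0c 20 00
02 b4 50 76 00 01 00 0d 76 65 6e 75 73 64 65 70 32 00 05 01 00 43
69 73 63 6f 20 49 6e 74 65 72 6e 65 74 77 6f 72 6b 20 4f 70 65 72
61 74 69 6e 67 20 53 79 73 74 65 6d 20 53 6f 66 74 77 61 72 65 20
0a 49 4f 53 20 28 74 6d 29 20 43 32 36 30 30 20 53 6f 66 74 77 61
72 65 20 28 43 32 36 3c6f626a656374207479 30 30 2d 49 2d 4d 29 2c
20 56 65 72 73 30 00 04 00 08 00 00 00 01 00 07 00 09 c0 a8 0a 00
18 00 0b 00 05 00
"""

# 1st fragment of Exploit.HTML.ObjectType (Table 7): b"<object ty".
FRAGMENT_1 = bytes.fromhex("3c6f626a656374207479")
# Literal bytes that follow fragment 1 in the signature: b'pe="'.
NEXT_LITERAL = bytes.fromhex("70653d22")

# bytes.fromhex skips ASCII whitespace (Python 3.7+), so the dump parses as-is.
data = bytes.fromhex(STREAM_HEX)

pos = data.find(FRAGMENT_1)  # stage 1: fragment 1 found
stage2_possible = data[pos + len(FRAGMENT_1):].startswith(NEXT_LITERAL)
print(pos, stage2_possible)  # prints: 117 False
```

Fragment 1 is found at byte offset 117, but the bytes after it are `00-` rather than `pe="`, so on this dump the regular-expression stage cannot match and more packets would be needed.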
The present invention is further described in detail with reference to the flowchart shown in fig. 6. The method comprises the following steps:
Step 601: read virus information from the virus library.
Step 602: decrypt the ciphertext of the virus information.
Step 603: parse the plaintext of the virus information.
Step 604: classify the virus features into PE viruses, macro viruses, script viruses and other viruses.
Step 605: fragment the virus features. If a virus feature is described by a non-regular expression, extract 2 virus segments from it at random; if it is described by a regular expression, extract at random a 1st virus feature segment of fixed length from the part before the 1st regular-expression descriptor, and take the remainder as the 2nd virus feature segment.
Step 606: insert the 1st virus fragment of each virus feature into the corresponding virus feature tree.
Step 607: if the virus feature is described by a regular expression, insert the remaining virus features after the 1st virus segment into the single-pattern regular expression automaton.
Step 608: if the virus feature is described by a non-regular expression, save the 2nd virus feature segment and its offset.
The array structure of the 2nd segment of the network virus signature is shown in Table 8 below.
Table 8:
Serial number | Data field
1 | Offset of the 2nd virus fragment from the 1st virus signature fragment
2 | Length of the 2nd virus signature fragment
3 | Virus feature value
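For a non-regular-expression virus, the document caches fragment 2 and builds a "jump table" for it but does not define that table. Assuming it means a Boyer-Moore-Horspool bad-character shift table, the single-pattern scan of step 1005 could look like this; the Table 8 offset gives the position from which the scan starts. The function names are hypothetical.

```python
from typing import List, Optional


def shift_table(pattern: bytes) -> List[int]:
    """Bad-character 'jump' table for Boyer-Moore-Horspool."""
    m = len(pattern)
    table = [m] * 256
    for i, b in enumerate(pattern[:-1]):
        table[b] = m - 1 - i          # distance from last occurrence to the end
    return table


def horspool_find(data: bytes, pattern: bytes, start: int = 0,
                  table: Optional[List[int]] = None) -> int:
    """Return the first index >= start where pattern occurs, or -1."""
    m = len(pattern)
    if m == 0:
        return start if start <= len(data) else -1
    if table is None:
        table = shift_table(pattern)
    pos = start
    while pos + m <= len(data):
        if data[pos:pos + m] == pattern:
            return pos
        pos += table[data[pos + m - 1]]   # jump by the last byte of the window
    return -1


def match_second_fragment(data: bytes, frag1_pos: int, offset: int,
                          frag2: bytes, table: List[int]) -> int:
    """Step 1005: compute where fragment 2 should start (Table 8 offset)
    and scan forward from there."""
    return horspool_find(data, frag2, frag1_pos + offset, table)
```

Building the table once at load time, as the document describes, means each second-stage scan only pays for the comparisons.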
The above-described flow of fig. 6 is further illustrated by an application example.
For example, the ciphertext of the virus information is first read from the virus library and decrypted, and the resulting plaintext is parsed.
The ciphertext of the virus information is:
436f707972696768742028632920313938362d3230303320627920636973636f2053797374656d732c20496e632e0a436f6d70696c6564204672692033302d4d61792d30332030323a3435206279206b656c6c7974687700060010636973636f2032363231584d00020011000000010101cc0004c0a81cfe0003001346617374772e636973636f2e636f6d2f7461630a4152452028666331290a54414376
The plaintext of the virus information is:
Exploit.HTML.ObjectType:3:9a:3c6f626a65637420747970653d222f2f2f2f2f2f2f2f2f2f2f2f7468284461746529203d20313220416e6420446179284461746529203d203239205468656e{2-3}20202020456e64204966*636f6d706f6e656e742e4578706f7274202822433a5c537572726f756e642e6b65792229646174613d226d732d6974733a6d68746d6c3a66696c653a2f2f(63|64)3a5c
The structure of the network virus information is shown in Table 9 below.
Table 9:
Serial number | Data field
1 | Exploit.HTML.ObjectType
2 | 3
3 | 2
4 | 9a
5 | 3c6f626a65637420747970653d222f2f2f2f2f2f2f2f2f2f2f2f7468284461746529203d20313220416e6420446179284461746529203d203239205468656e{2-3}20202020456e64204966*636f6d706f6e656e742e4578706f7274202822433a5c537572726f756e642e6b65792229646174613d226d732d6974733a6d68746d6c3a66696c653a2f2f(63|64)3a5c
Then the virus features are fragmented, and the 1st fragment of each virus feature is inserted into the multi-pattern matching virus feature tree for its virus type;
the array structure of the 1st fragment of the network virus is shown in Table 10 below.
Table 10:
Serial number | Data field
1 | 10 bytes
2 | 1
3 | 3c6f626a656374207479
Finally, depending on the description mode: if the virus is described by a non-regular expression, its 2nd segment is cached and a jump table is created; if it is described by a regular expression, a regular expression automaton is constructed.
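The document does not spell out the regular-expression syntax of its signatures. The descriptors in the example (`{2-3}`, `*`, `(63|64)`) resemble the extended hex signature notation used by ClamAV, so this sketch assumes that `{n-m}` means n to m arbitrary bytes, `*` any number of bytes, and `(aa|bb)` a choice of bytes, and compiles the remaining features into a Python `re` bytes pattern, whose engine stands in for the automaton. `signature_to_regex` is a hypothetical name.

```python
import re

# One token of the assumed signature syntax.
TOKEN = re.compile(
    r"([0-9a-fA-F]{2})"          # 1: literal byte
    r"|\{(\d*)-(\d*)\}"          # 2, 3: between n and m arbitrary bytes
    r"|\{(\d+)\}"                # 4: exactly n arbitrary bytes
    r"|(\*)"                     # 5: any number of arbitrary bytes
    r"|\(([0-9a-fA-F|]+)\)"      # 6: alternative bytes
)


def signature_to_regex(sig: str) -> "re.Pattern[bytes]":
    parts = []
    pos = 0
    while pos < len(sig):
        m = TOKEN.match(sig, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {pos}: {sig[pos:pos + 10]!r}")
        byte, lo, hi, exact, star, alts = m.groups()
        if byte is not None:
            parts.append(re.escape(bytes.fromhex(byte)))
        elif exact is not None:
            parts.append(b".{%d}" % int(exact))
        elif star is not None:
            parts.append(b".*?")
        elif alts is not None:
            options = [re.escape(bytes.fromhex(a)) for a in alts.split("|")]
            parts.append(b"(?:" + b"|".join(options) + b")")
        else:
            parts.append(b".{%s,%s}" % ((lo or "0").encode(), hi.encode()))
        pos = m.end()
    # DOTALL lets "." match any byte, including newline bytes.
    return re.compile(b"".join(parts), re.DOTALL)
```

Python's `re` is a backtracking engine rather than a true finite automaton; a deployment that must bound scan time per byte would compile to a DFA instead.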
The present invention is further described in detail with reference to the flowchart shown in fig. 7. The method comprises the following steps:
Step 701: read the web page file format features from the web page file format feature library.
Step 702: parse the web page file format features.
Step 703: insert the web page file format features into the PE virus feature tree.
Step 704: insert the web page file format features into the macro virus feature tree.
The above-described flow of fig. 7 is further illustrated by an application example.
For example, the web page file format features are first read from the web page file format feature library and parsed;
the structure of the web page file format features is shown in Table 11 below.
Table 11:
Serial number | Data field
1 | html
2 | htm
3 | PHP
4 | asp
5 | aspx
6 | jsp
Then the features in the table are inserted into the PE virus feature tree;
finally, the features in the table are inserted into the macro virus feature tree.
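Steps 703 and 704 can be sketched with a dict-of-dicts trie standing in for each feature tree. The sketch assumes the features are matched as their plain ASCII text and that each trie entry is tagged as either a virus fragment or a format feature; `insert`, `load_web_formats` and `entries_at` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Web page file format features from Table 11.
WEB_FORMAT_FEATURES = ["html", "htm", "PHP", "asp", "aspx", "jsp"]
END = "end"  # key under which a trie node stores the entries ending there


def insert(tree: dict, pattern: bytes, kind: str, label: str) -> None:
    """Insert a byte pattern into a dict-of-dicts trie (the 'feature tree')."""
    node = tree
    for b in pattern:
        node = node.setdefault(b, {})
    node.setdefault(END, []).append((kind, label))


def load_web_formats(trees: Dict[str, dict]) -> None:
    """Steps 703 and 704: add every format feature to the PE and macro trees."""
    for library in ("PE", "macro"):
        for feature in WEB_FORMAT_FEATURES:
            insert(trees[library], feature.encode("ascii"), "format", feature)


def entries_at(tree: dict, data: bytes, start: int) -> List[Tuple[str, str]]:
    """All (kind, label) entries whose pattern occurs in data at `start`."""
    found: List[Tuple[str, str]] = []
    node = tree
    for b in data[start:]:
        node = node.get(b)
        if node is None:
            break
        found.extend(node.get(END, []))
    return found
```

A "format" hit inside a PE or document stream is what tells the matcher to hook the script virus library for embedded scripts.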
The network data stream reassembly is described in further detail with reference to the flowchart shown in fig. 8. The method comprises the following steps:
Step 801: store the data packets in the cache in sequence and increment the packet counter by 1;
Step 802: upload the data packets when the packet counter value is greater than or equal to the sending-end packet-number window value, or when the packet counter value is smaller than that window value but the elapsed time is greater than or equal to the time window value;
Step 803: reset the packet counter to 0 and the start time to 0.
The above-described flow of fig. 8 is further illustrated by an application example.
For example, suppose the packet-number window is 10 and the time window is 1 second. The packet capture device has captured the data packets with sequence numbers 1, 2, 3, 4, 5, 6, 7, 9 and 10, so the packet count is 9, which is less than 10, and less than 1 second has elapsed. It now captures the data packet with sequence number 8, which is inserted into the stream reassembly queue in sequence order, giving the queue 1, 2, 3, 4, 5, 6, 7, 8, 9, 10; the packet count is now equal to 10.
The 10 data packets are then reported, the counter is reset to 0, and the start time is reset to 0.
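The example above is essentially an ordered insertion into the reassembly queue followed by a count-window flush. A minimal sketch, assuming packets carry integer sequence numbers and omitting the time-window check; `OrderedReassembler` is a hypothetical name.

```python
import bisect
from typing import List, Optional, Tuple


class OrderedReassembler:
    """Keeps packets sorted by sequence number and flushes on the count window."""

    def __init__(self, count_window: int):
        self.count_window = count_window
        self.queue: List[Tuple[int, bytes]] = []   # sorted (seq, payload) pairs

    def push(self, seq: int, payload: bytes) -> Optional[List[Tuple[int, bytes]]]:
        bisect.insort(self.queue, (seq, payload))  # insert in sequence order
        if len(self.queue) >= self.count_window:
            flushed, self.queue = self.queue, []   # report and reset the counter
            return flushed
        return None
```

Feeding it packets 1 to 7, 9 and 10 returns nothing; pushing packet 8 returns all ten packets in order 1 to 10 and empties the queue.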
The network data flow preprocessing flow is further described in detail with reference to the flow chart shown in fig. 9. The method comprises the following steps:
Step 901: after protocol parsing, obtain the file type from the file header.
Step 902: if the file is a compressed file, call the decompression algorithm that matches its extension to decompress it.
Step 903: if the file is in a special encoding format, call the corresponding decoding algorithm to decode it.
Step 904: if the file is an ordinary file, determine whether it is a Windows PE file, a Windows document, a web page or another file.
Step 905: if the file is a Windows PE file and needs unshelling, unshell it.
Step 906: hook the virus library for PE format files.
Step 907: hook the macro virus library for Windows documents.
Step 908: hook the script virus library for web page files.
Step 909: hook the picture virus library for picture files.
Step 9010: hook the virus libraries for other files.
The above-described flow of fig. 9 is further illustrated by an application example.
For example, the parsed network data stream turns out to be Base64-encoded, so it is decoded; the decoded file is a rar file, so it is decompressed; the decompressed file is a Windows PE format file, so it is unshelled if necessary and the PE virus library is hooked.
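The FIG. 9 flow can be sketched as a recursive generator that decodes, decompresses and then picks a library. Because Python's standard library has no rar support, the sketch substitutes zip for the rar step of the example; the magic numbers, the `transfer_encoding` argument and the function names are assumptions, and unshelling (step 905) is not shown because it depends on the packer.

```python
import base64
import io
import zipfile
from typing import Iterator, Tuple

# Assumed magic numbers for step 901 (a small subset).
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"Rar!\x1a\x07", "rar"),
    (b"MZ", "pe"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # doc, xls, ppt
]

LIBRARY = {
    "pe": "PE virus library",
    "ole2": "macro virus library",
    "web": "script virus library",
    "other": "other virus library",
}


def sniff(data: bytes) -> str:
    """Step 901: identify the file type from its header."""
    for magic, kind in MAGIC:
        if data.startswith(magic):
            return kind
    head = data[:256].lower()
    if b"<html" in head or b"<?php" in head:
        return "web"
    return "other"


def preprocess(data: bytes, transfer_encoding: str = "") -> Iterator[Tuple[bytes, str]]:
    """Yield (payload, virus library) pairs after decoding and decompression."""
    if transfer_encoding.lower() == "base64":      # step 903
        data = base64.b64decode(data)
    kind = sniff(data)
    if kind == "zip":                              # step 902
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for member in zf.namelist():
                yield from preprocess(zf.read(member))
        return
    if kind == "rar":
        raise NotImplementedError("rar needs a third-party decompressor")
    # Step 905 (unshelling a packed PE) is packer-specific and omitted here.
    yield data, LIBRARY[kind]                      # steps 906 to 9010
```

Recursing on each archive member lets nested archives and encoded attachments be unwrapped before any library is chosen.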
The present invention is further described in detail with reference to the flowchart shown in fig. 10. The method comprises the following steps:
Step 1001: scan the network data stream with a multi-pattern matching algorithm, using the virus feature tree built from the 1st segments of all viruses.
Step 1002: if a match succeeds and the matched pattern is a web page file format feature, hook the script virus library for web page files.
Step 1003: detect script viruses embedded in PE and OLE2 files.
Step 1004: determine whether the virus is described by a non-regular expression or a regular expression.
Step 1005: if it is described by a non-regular expression, calculate the position of the 2nd virus fragment and match it.
Step 1006: if it is described by a regular expression, hook the virus automaton.
Step 1007: scan the network data stream using the virus automaton and a single-pattern regular expression algorithm.
The above-described flow of fig. 10 is further illustrated by an application example.
For example, the virus library is a PE virus library, the web page file format feature is PHP, and the virus feature is described by a regular expression.
First, the feature tree generated from the 1st virus features of the PE virus library and the web page file format features is matched against the network data stream, and the match succeeds. If the matched feature is PHP, the script virus library is hooked and matched; if the matched virus feature is described by a regular expression, that virus feature's regular expression automaton is hooked and matched.
While the present invention has been described with respect to the embodiments, those skilled in the art will appreciate that there are numerous variations and permutations of the present invention without departing from the spirit of the invention, and it is intended that the appended claims cover such variations and modifications as fall within the true spirit of the invention.