CN107612911B

CN107612911B - Method for detecting infected host and C & C server based on DNS traffic

Info

Publication number: CN107612911B
Application number: CN201710850732.3A
Authority: CN
Inventors: 蔡福杰; 范渊; 刘元; 李凯
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2020-05-01
Anticipated expiration: 2037-09-20
Also published as: CN107612911A

Abstract

The invention relates to a method for detecting an infected host and a C & C server based on DNS traffic. The method comprises: constructing a training set and training an algorithm for identifying random domain names; collecting DNS traffic passing through any network card and parsing it to obtain DNS information; using the algorithm to judge whether the domain name in the DNS information is a random domain name and, if so, judging whether the domain name was successfully resolved; if resolution failed, identifying the infected host and storing its information; if resolution succeeded, identifying the C & C server and storing its information; and raising an alarm, then storing and displaying the alarm information. Using DNS traffic information, the invention effectively identifies hosts infected by viruses and trojans that connect back using a DGA (domain generation algorithm), as well as the C & C servers behind them, with high accuracy. Because DNS traffic is much smaller in volume than the traffic examined by other detection means, the method has lower cost and higher efficiency.

Description

Method for detecting infected host and C & C server based on DNS traffic

Technical Field

The invention relates to the technical field of network security, and in particular to a method for detecting an infected host and a C & C server based on DNS (Domain Name System) traffic, within advanced persistent threat (APT) detection.

Background

Detecting hosts infected by viruses and trojans, as well as C & C servers (remote command-and-control servers, which issue instructions to infected hosts), has always been an important part of network security research. After being infected by a virus or trojan, a computer often connects back to a C & C server to obtain new instructions or to send stolen confidential content to it.

In the past, viruses and trojans often used fixed domain names for this connect-back, which made them easy to detect. However, more and more viruses and trojans now use a DGA (domain generation algorithm) to generate the domain names used for connecting back. Within a given period, hundreds or even thousands of relatively random domain names are generated from a seed such as the current date and accessed one by one. The attacker registers only some of these domain names and points them at the C & C server; when the infected host accesses a domain name the attacker has registered, the connect-back succeeds. This evades some detection techniques.
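To make the mechanism concrete, here is a minimal, hypothetical date-seeded generator in Python. `toy_dga` and its hashing scheme are illustrative assumptions, not the DGA of any real malware family. It derives a fixed list of pseudo-random labels from the date, so the infected host and the attacker can compute the same list independently, while most entries are never registered and therefore fail to resolve.

```python
import datetime
import hashlib
from typing import List


def toy_dga(day: datetime.date, count: int = 5, tld: str = "com") -> List[str]:
    """Hypothetical date-seeded DGA, for illustration only.

    Running it with the same date always yields the same list, so the
    attacker only needs to register one or a few of the entries.
    """
    domains = []
    for i in range(count):
        seed = f"{day.isoformat()}-{i}".encode("ascii")
        digest = hashlib.sha256(seed).hexdigest()
        # Map the first 12 hex digits (0-f) onto the letters a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(f"{label}.{tld}")
    return domains


if __name__ == "__main__":
    for name in toy_dga(datetime.date(2017, 9, 20)):
        print(name)
```

Changing the date changes the whole list, which is why blocking yesterday's domain names does not stop today's connect-back attempts.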

Accessing a domain name necessarily generates DNS traffic, and because most DGA-generated domain names are never registered, a DGA necessarily produces many random domain names whose resolution fails. The invention detects infected hosts and C & C servers on this basis.

Disclosure of Invention

The invention aims to discover infected hosts and the C & C servers behind them in a timely manner, and to this end provides a method for detecting infected hosts and C & C servers based on DNS traffic.

The technical scheme adopted by the invention is a method for detecting an infected host and a C & C server based on DNS traffic, comprising the following steps:

step 1: construct a training set and train to obtain an algorithm for identifying random domain names;

step 2: collect DNS traffic passing through any network card using a traffic collection module;

step 3: parse the collected DNS traffic according to the DNS protocol specification to obtain DNS information, which includes the domain name, whether the domain name was successfully resolved, and the client IP;

step 4: use the algorithm for identifying random domain names obtained in step 1 to judge whether the domain name in the DNS information is a random domain name; if so, go to the next step; if not, return to step 2;

step 5: judge whether the domain name was successfully resolved; if resolution failed, go to the next step; if it succeeded, go to step 7;

step 6: identify the infected host; if the host is infected, store the information and go to step 8; otherwise, return to step 2;

step 7: identify the C & C server; if the server is a C & C server, store the information and go to step 8; otherwise, return to step 2;

step 8: raise an alarm, then store and display the alarm information; return to step 2.
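The control flow of steps 4 to 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `DnsInfo`, `handle_record`, and the three callables are hypothetical names standing in for the random-domain algorithm (step 1), infected-host identification (step 6), and C & C identification (step 7). Packet capture and parsing (steps 2 and 3) are assumed to have happened before the call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DnsInfo:
    """Hypothetical record produced by step 3."""
    domain: str
    resolved: bool
    client_ip: str


def handle_record(
    info: DnsInfo,
    is_random: Callable[[str], bool],
    identify_infected: Callable[[DnsInfo], Optional[object]],
    identify_cnc: Callable[[DnsInfo], Optional[object]],
    alarms: List[object],
) -> bool:
    """Run steps 4-8 on one parsed record; return True if an alarm was raised.

    Returning, in either case, corresponds to going back to step 2.
    """
    if not is_random(info.domain):        # step 4: not random
        return False
    if info.resolved:                     # step 5: resolved -> step 7
        finding = identify_cnc(info)
    else:                                 # step 5: failed -> step 6
        finding = identify_infected(info)
    if finding is None:                   # nothing identified
        return False
    alarms.append(finding)                # step 8: store for display
    return True
```

Passing the detectors in as callables keeps the dispatch logic independent of how the random-domain test and the infection/C & C rules are implemented.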

Preferably, in step 1, the training set includes normal domain names and random domain names generated by random algorithms.

Preferably, the algorithm for identifying random domain names is obtained by assigning weights to the domain name length, the number of digits, the frequency of switching between letters and digits, the proportion of digits, the maximum length of consecutive letters, and the number of special characters.

Preferably, in step 3, the DNS information further includes the DNS server, the request time, and, for a successfully resolved domain name, the actual server IP it resolves to.

Preferably, step 6 comprises the following steps:

step 6.1: obtain the client IP of the current DNS information;

step 6.2: if, within a time period T, the client corresponding to the client IP continuously produces random domain names that share the same characteristics and fail to resolve, only a small number of such domain names appear before and after T, and the duration T is within 30 minutes, the client is considered infected, i.e., an infected host;

step 6.3: for an infected host, extract and store the time period of the random domain names that failed to resolve and the characteristics they share, then go to step 8; otherwise, return to step 2.

Preferably, in step 6.2, sharing the same characteristics includes containing an identical partial string or having the same length.

Preferably, the same characteristics further include having an identical second-level domain name under different top-level domains.

Preferably, in step 6, the stored information includes the infected host IP, the random domain names whose access failed, and the time period during which these failures occurred.

Preferably, in step 7, the C & C server is identified as follows: for a successfully resolved domain name, if the client that accessed it has been identified as an infected host in step 6, and the domain name was accessed within the time period in which that host accessed a large number of random domain names that failed to resolve and matches the characteristics of those domain names, the actual server IP corresponding to the domain name is identified as the C & C server.

Preferably, the stored information includes the IP of the client accessing the C & C server, the access time, the domain name accessed, and the C & C server IP, and the infected host is associated with the C & C server.

The method provided by the invention uses DNS traffic information to effectively identify hosts infected by viruses and trojans that connect back using a DGA, as well as the C & C servers behind them, with high accuracy. Because DNS traffic is much smaller than the traffic examined by other detection means, the invention has lower cost and higher efficiency.

Drawings

FIG. 1 is a flow chart of obtaining the algorithm for identifying random domain names according to the present invention;

FIG. 2 is a flow chart of detecting infected hosts and C & C servers based on DNS traffic according to the present invention.

Detailed Description

It should be noted that the method of the present invention for detecting infected hosts and C & C servers through DNS traffic is an application of computer technology in the field of information security. The applicant believes that, after carefully reading this application, a person skilled in the art can correctly understand the principles and objectives of the invention and, in combination with the prior art, implement it using ordinary software programming skills. The prior art referred to herein is not an exhaustive list by the applicant.

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

The invention relates to a method for detecting infected hosts and C & C servers based on DNS traffic, comprising the following steps.

Step 1: construct a training set and train to obtain an algorithm for identifying random domain names.

In step 1, the training set includes normal domain names and random domain names generated by random algorithms.

The algorithm for identifying random domain names is obtained by assigning weights to the domain name length, the number of digits, the frequency of switching between letters and digits, the proportion of digits, the maximum length of consecutive letters, and the number of special characters.

In the invention, the top 50 foreign websites and the top 10 domestic websites in the Alexa ranking are collected as the training set of normal domain names, and 100,000 domain names generated by several collected DGA algorithms are used as the training set of random domain names.

In the invention, the second-level domain name of each domain name is extracted. Based on experience, suitable initial weights are assigned to its length, number of digits, letter-digit switching frequency, digit proportion, maximum length of consecutive letters, and number of special characters; the weights are then adjusted iteratively by an algorithm until an algorithm capable of distinguishing normal domain names from random domain names is obtained.
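As an illustration of the six quantities, the following Python sketch computes them on the second-level label. The patent does not define each quantity precisely, so several choices here are assumptions: the second-level label is simply the label before the last dot (public suffixes such as co.uk are not handled), the letter-digit switching frequency is counted as the number of adjacent letter/digit transitions, and any non-alphanumeric character counts as special. The function names are hypothetical.

```python
import re
from typing import Tuple


def second_level_label(domain: str) -> str:
    """'www.example.com' -> 'example' (naive: ignores suffixes such as co.uk)."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def domain_features(domain: str) -> Tuple[int, int, int, float, int, int]:
    """Return (a1, ..., a6) computed on the second-level label."""
    s = second_level_label(domain)
    a1 = len(s)                                        # length
    a2 = sum(c.isdigit() for c in s)                   # number of digits
    a3 = sum(1 for p, q in zip(s, s[1:])               # letter/digit switches
             if (p.isalpha() and q.isdigit()) or (p.isdigit() and q.isalpha()))
    a4 = a2 / a1 if a1 else 0.0                        # digit proportion
    a5 = max((len(run) for run in re.findall(r"[a-z]+", s)), default=0)  # longest letter run
    a6 = sum(1 for c in s if not c.isalnum())          # special characters, e.g. '-'
    return a1, a2, a3, a4, a5, a6
```

For example, `domain_features("www.ab12cd-e.com")` yields `(8, 2, 2, 0.25, 2, 1)`, while `domain_features("google.com")` yields `(6, 0, 0, 0.0, 6, 0)`.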

In the invention, one embodiment of the algorithm for identifying random domain names is provided; in practice, technicians may adjust it or add further settings according to different technical requirements. Let the domain name length be a1, the number of digits a2, the letter-digit switching frequency a3, the digit proportion a4, the maximum length of consecutive letters a5, and the number of special characters a6, and compute

$$x = x_1 a_1 + \frac{x_2 a_2 + x_3 a_3 + x_4 a_4 + x_5 a_5 + x_6 a_6}{a_1}$$

where x1, x2, x3, x4, x5 and x6 are weights that are continuously updated according to the sample domain names. Finally, whether a domain name is random is judged by whether the obtained x reaches a preset threshold y.
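The scoring rule and threshold test can be written directly from the formula. In this Python sketch the weight values and the threshold-selection routine are assumptions: the patent says only that the weights are adjusted "through a certain algorithm", so `fit_threshold`, a simple accuracy-maximizing search over candidate thresholds, is a stand-in, and tuning of the weights themselves is not shown. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def randomness_score(a: Sequence[float], x: Sequence[float]) -> float:
    """x = x1*a1 + (x2*a2 + x3*a3 + x4*a4 + x5*a5 + x6*a6) / a1."""
    a1 = a[0]
    if a1 == 0:
        return 0.0
    return x[0] * a1 + sum(xi * ai for xi, ai in zip(x[1:], a[1:])) / a1


def is_random_domain(a: Sequence[float], x: Sequence[float], y: float) -> bool:
    # The domain is judged random when the score reaches the threshold y.
    return randomness_score(a, x) >= y


def fit_threshold(samples: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Pick y maximizing accuracy on non-empty (score, is_random) training pairs."""
    best_y, best_acc = 0.0, -1.0
    for y, _ in sorted(samples):
        acc = sum((s >= y) == label for s, label in samples) / len(samples)
        if acc > best_acc:
            best_y, best_acc = y, acc
    return best_y, best_acc
```

For example, with hypothetical weights (0.5, 1, 1, 1, 1, 1), the features (8, 2, 2, 0.25, 2, 1) of the label `ab12cd-e` give x = 0.5 · 8 + 7.25 / 8 = 4.90625.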

Step 2: collect DNS traffic passing through any network card using a traffic collection module.

In the invention, the traffic collection module uses the Libpcap library to capture DNS packets flowing through the network card.

Step 3: parse the collected DNS traffic according to the DNS protocol specification to obtain DNS information, which includes the domain name, whether the domain name was successfully resolved, and the client IP.

In step 3, the DNS information further includes the DNS server, the request time, and, for a successfully resolved domain name, the actual server IP it resolves to.
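Step 3 needs only a few fields of each DNS response. The following is a minimal parser sketch using only Python's standard library. It assumes the raw DNS message bytes have already been extracted from the captured UDP packet (the Libpcap capture is not shown); the client IP and request time would come from the IP header and capture timestamp, which are outside this sketch. `parse_dns_response` and `DnsResult` are hypothetical names, and treating "resolved" as "NOERROR with at least one answer record" is an assumption.

```python
import struct
from typing import List, NamedTuple, Tuple


class DnsResult(NamedTuple):
    qname: str
    rcode: int          # 0 = NOERROR, 3 = NXDOMAIN
    resolved: bool
    ipv4: List[str]     # addresses from A records in the answer section


def _read_name(msg: bytes, off: int) -> Tuple[str, int]:
    """Decode a possibly compressed name; return (name, offset after it)."""
    labels: List[str] = []
    resume = -1                      # where parsing continues after a pointer
    for _ in range(256):             # bound the loop to reject pointer cycles
        length = msg[off]
        if (length & 0xC0) == 0xC0:  # compression pointer (RFC 1035, 4.1.4)
            if resume < 0:
                resume = off + 2
            off = ((length & 0x3F) << 8) | msg[off + 1]
            continue
        off += 1
        if length == 0:              # root label ends the name
            return ".".join(labels), (resume if resume >= 0 else off)
        labels.append(msg[off:off + length].decode("ascii", "replace"))
        off += length
    raise ValueError("malformed or looping domain name")


def parse_dns_response(msg: bytes) -> DnsResult:
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", msg[:12])
    rcode = flags & 0x000F
    off, qname = 12, ""
    for i in range(qdcount):
        name, off = _read_name(msg, off)
        off += 4                     # skip QTYPE and QCLASS
        if i == 0:
            qname = name
    ipv4: List[str] = []
    for _ in range(ancount):
        _, off = _read_name(msg, off)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[off:off + 10])
        off += 10
        rdata = msg[off:off + rdlength]
        off += rdlength
        if rtype == 1 and rdlength == 4:   # A record
            ipv4.append(".".join(str(b) for b in rdata))
    return DnsResult(qname, rcode, rcode == 0 and ancount > 0, ipv4)
```

A DGA lookup for an unregistered name typically comes back with rcode 3 (NXDOMAIN) and no answers, which this sketch reports as `resolved=False`.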

Step 4: use the algorithm for identifying random domain names obtained in step 1 to judge whether the domain name in the DNS information is a random domain name; if so, go to the next step; if not, return to step 2.

Step 5: judge whether the domain name was successfully resolved; if resolution failed, go to the next step; if it succeeded, go to step 7.

Step 6: identify the infected host; if the host is infected, store the information and go to step 8; otherwise, return to step 2.

Step 6 includes the following sub-steps.

Step 6.1: obtain the client IP of the current DNS information.

Step 6.2: if, within a time period T, the client corresponding to the client IP continuously produces random domain names that share the same characteristics and fail to resolve, only a small number of such domain names appear before and after T, and the duration T is within 30 minutes, the client is considered infected, i.e., an infected host.

In step 6.2, sharing the same characteristics includes containing an identical partial string or having the same length.

The same characteristics also include having an identical second-level domain name under different top-level domains.
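The "same characteristics" test can be sketched as a pairwise predicate in Python. `share_features`, `split_domain`, and the minimum shared-substring length `min_common = 4` are hypothetical; the patent does not say how long the common partial string must be. The same-length criterion is loose, which is tolerable here only because it is applied to domain names already judged random in step 4.

```python
from difflib import SequenceMatcher
from typing import Tuple


def split_domain(domain: str) -> Tuple[str, str]:
    """Return (second-level label, top-level domain); naive about suffixes like co.uk."""
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return labels[0], ""
    return labels[-2], labels[-1]


def share_features(d1: str, d2: str, min_common: int = 4) -> bool:
    sld1, tld1 = split_domain(d1)
    sld2, tld2 = split_domain(d2)
    # Identical second-level label under different top-level domains.
    if sld1 == sld2 and tld1 != tld2:
        return True
    # Same length.
    if len(sld1) == len(sld2):
        return True
    # A common substring of at least `min_common` characters.
    match = SequenceMatcher(None, sld1, sld2).find_longest_match(0, len(sld1), 0, len(sld2))
    return match.size >= min_common
```

For example, `kqzvxw.com` and `kqzvxw.net` share features through the second-level label, and `fooabcdbar.com` and `zzabcdz.info` share the substring `abcd`.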

In step 6, the stored information includes the infected host IP, the random domain names whose access failed, and the time period during which these failures occurred.

In the invention, the random domain names that failed to resolve are classified and stored by the client IP obtained in step 3, and each distinct domain name is counted only once.

In general, to prevent false positives, a client showing signs of infection is first treated as a suspected infected host and observed continuously until the number of random domain names whose access fails drops significantly; if the duration lies within a predetermined threshold (e.g., 30 minutes), the client is considered infected, i.e., an infected host.

For example, if the client corresponding to the client IP fails to resolve random domain names with the same characteristics more than 20 times within some 5-minute period, fewer than 2 such failures occur before and after that period, and the duration T is within 30 minutes (here T is 5 minutes), the client may be considered infected, i.e., an infected host.
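A Python sketch of this example rule for one client follows. It assumes the input has already been filtered to random domain names that failed to resolve for a single client IP and a single group of domain names sharing characteristics, and it assumes that "before and after" means the adjacent 5-minute periods. The function name and parameter names are hypothetical; the numeric thresholds (more than 20 inside, fewer than 2 around, a 5-minute period) come from the example.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Optional, Tuple


def detect_burst(
    events: Iterable[Tuple[float, str]],
    window: float = 300.0,   # 5-minute period T
    min_inside: int = 21,    # "more than 20" failures inside T
    max_around: int = 2,     # "fewer than 2" failures before and after T
) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first qualifying burst, or None.

    `events` are (timestamp_seconds, domain) pairs of failed random lookups.
    """
    # Count each distinct domain only once, at its first appearance.
    first_seen: Dict[str, float] = {}
    for t, domain in events:
        if domain not in first_seen or t < first_seen[domain]:
            first_seen[domain] = t
    times = sorted(first_seen.values())

    def count(lo: float, hi: float) -> int:
        # Number of timestamps in the half-open interval [lo, hi).
        return bisect_left(times, hi) - bisect_left(times, lo)

    for start in times:
        end = start + window
        if (count(start, end) >= min_inside
                and count(start - window, start) < max_around
                and count(end, end + window) < max_around):
            return start, end
    return None
```

A steady trickle of failures (for instance one every 10 seconds for an hour) is rejected because the neighbouring periods are just as busy; only a short, isolated burst qualifies.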

In the present invention, the information saved in step 6 can be used to identify the C & C server.

Step 7: identify the C & C server; if the server is a C & C server, store the information and go to step 8; otherwise, return to step 2.

In step 7, the C & C server is identified as follows: for a successfully resolved domain name, if the client that accessed it has been identified as an infected host in step 6, and the domain name was accessed within the time period in which that host accessed a large number of random domain names that failed to resolve and matches the characteristics of those domain names, the actual server IP corresponding to the domain name is identified as the C & C server.

The stored information includes the IP of the client accessing the C & C server, the access time, the domain name accessed, and the C & C server IP, and the infected host is associated with the C & C server.

In the invention, in step 7, if the client IP of a successfully resolved record has not yet been identified as an infected host, the record is held for a period of time; if the client IP is identified as an infected host within that period, it is then further determined whether the domain name is a C & C domain name.

In principle, step 6 and step 7 are never both performed on the same record. Different records, however, may share a client IP: some concern the infected host (failed resolutions) and some concern the C & C server (successful resolutions). If other records with the same client IP have already caused that IP to be identified as an infected host in step 6, and the characteristics and time range of its failed domain names have been extracted, then whether the current record points to a C & C server is determined by checking whether its domain name matches those characteristics and that time range.

In the present invention, the association with the infected host is made through the client IP.
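The matching and holding logic of step 7 can be sketched as a small stateful Python class. All names are hypothetical, the feature comparison is passed in as a function (for example, a predicate implementing the "same characteristics" of step 6.2), and the 30-minute holding period is an assumed value, since the text says only "a period of time".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class InfectionProfile:
    """What step 6.3 stores for one infected client (fields assumed)."""
    start: float
    end: float
    failed_domains: List[str]


@dataclass
class CncFinding:
    """What step 7 stores; client_ip links the C & C server to the infected host."""
    client_ip: str
    time: float
    domain: str
    cnc_ip: str


# (client_ip, time, domain, resolved server IP)
Record = Tuple[str, float, str, str]


class CncMatcher:
    def __init__(self, features_match: Callable[[str, List[str]], bool],
                 hold_seconds: float = 1800.0) -> None:
        self.features_match = features_match
        self.hold_seconds = hold_seconds
        self.profiles: Dict[str, InfectionProfile] = {}
        self.pending: List[Record] = []

    def _match(self, rec: Record) -> Optional[CncFinding]:
        client_ip, t, domain, server_ip = rec
        p = self.profiles[client_ip]
        # Same time range and same characteristics as the failed lookups.
        if p.start <= t <= p.end and self.features_match(domain, p.failed_domains):
            return CncFinding(client_ip, t, domain, server_ip)
        return None

    def on_resolved(self, rec: Record) -> Optional[CncFinding]:
        """Step 7 for a random domain name that resolved successfully."""
        if rec[0] in self.profiles:
            return self._match(rec)
        self.pending.append(rec)   # client not (yet) known to be infected: hold it
        return None

    def on_infected(self, client_ip: str, profile: InfectionProfile,
                    now: float) -> List[CncFinding]:
        """Register a step-6 result and re-check held records of that client."""
        self.profiles[client_ip] = profile
        findings: List[CncFinding] = []
        kept: List[Record] = []
        for rec in self.pending:
            if now - rec[1] > self.hold_seconds:
                continue                       # held too long: discard
            if rec[0] == client_ip:
                finding = self._match(rec)
                if finding is not None:
                    findings.append(finding)
            else:
                kept.append(rec)
        self.pending = kept
        return findings
```

Holding unmatched successes covers the case where the successful lookup is processed before enough failures have accumulated to flag the client as infected.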

Finally, it should be noted that the above are only specific embodiments of the present invention. Obviously, the invention is not limited to these embodiments, and many variations are possible. All modifications that a person skilled in the art can derive or infer from the disclosure of the invention shall be considered within the scope of the invention.

Claims

1. A method for detecting infected hosts and C & C servers based on DNS traffic, characterized in that the method comprises the following steps:

step 6 comprises the following sub-steps:

step 6.1: obtaining the client IP of the current DNS information;

step 6.3: for an infected host, extracting and storing the time period of the random domain names that failed to resolve and the characteristics they share, then carrying out step 8; otherwise, returning to step 2;

2. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 1, wherein: in step 1, the training set includes normal domain names and random domain names generated by random algorithms.

3. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 1, wherein: the algorithm for identifying random domain names is obtained by assigning weights to the domain name length, the number of digits, the frequency of switching between letters and digits, the proportion of digits, the maximum length of consecutive letters, and the number of special characters.

4. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 1, wherein: in step 3, the DNS information further includes the DNS server, the request time, and, for a successfully resolved domain name, the actual server IP it resolves to.

5. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 4, wherein: in step 6.2, sharing the same characteristics includes containing an identical partial string or having the same length.

6. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 5, wherein: the same characteristics further include having an identical second-level domain name under different top-level domains.

7. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 1, wherein: in step 6, the stored information includes the infected host IP, the random domain names whose access failed, and the time period during which these failures occurred.

8. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 1, wherein: in step 7, the C & C server is identified as follows: for a successfully resolved domain name, if the client that accessed it has been identified as an infected host in step 6, and the domain name was accessed within the time period in which that host accessed a large number of random domain names that failed to resolve and matches the characteristics of those domain names, the actual server IP corresponding to the domain name is identified as the C & C server.

9. The method for detecting infected hosts and C & C servers based on DNS traffic of claim 1, wherein: the stored information includes the IP of the client accessing the C & C server, the access time, the domain name accessed, and the C & C server IP, and the infected host is associated with the C & C server.