CN106911717A - Domain name detection method and device - Google Patents
- Publication number
- CN106911717A, CN201710242441.6A
- Authority
- CN
- China
- Prior art keywords
- domain name
- feature code
- detected
- normal
- letter
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1466—Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/30—Managing network names, e.g. use of aliases or nicknames
- H04L61/3015—Name registration, generation or assignment
- H04L61/3025—Domain name generation or assignment
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Embodiments of the invention provide a domain name detection method and device, relating to the field of communications, which can solve the prior-art problem that domain names generated by a domain generation algorithm (DGA) cannot be identified. The method includes: obtaining the feature code of the domain name to be detected and the feature code of the normal domain name, where a feature code indicates the distribution of letters, or of letters and digits, in a domain name; calculating the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name, where the feature gap indicates the degree of similarity between the two feature codes; and determining, according to the feature gap, whether the accessed domain name was generated by a DGA. The invention is used for detecting domain names.
Description
Technical field
The present invention relates to the field of communications, and in particular to a domain name detection method and device.
Background
With the continuous development of Internet technology, networks have become part of every aspect of people's lives. However, hacker intrusion, a by-product of that development, has also become pervasive and poses an increasingly serious threat to network security. In particular, by implanting a remotely controllable malicious program, such as a Trojan, in a terminal connected to the network, a hacker can gain control of that terminal.
To counter hacker intrusion, a firewall can be used to monitor and block the malicious programs that hackers implant in controlled terminals. As technology has developed, however, more and more malicious programs actively initiate connections themselves. Such connections are usually made over HTTP, which lets them bypass the firewall and reach a remote server, giving the hacker remote control of the terminal. To address this, the prior art provides a blacklist-based domain name detection method: when a domain name that a user accesses through a terminal matches a domain name in the blacklist, the user is prevented from continuing to access that domain name through the terminal.
Although this method can block some of the connections actively initiated by implanted malicious programs, more and more malicious programs have begun to generate domain names with a domain generation algorithm (DGA). The prior-art blacklist-based method cannot identify DGA-generated domain names, and a DGA generates them at a high rate: more than 50,000 random domain names can be generated automatically every day, far more than the number of domain names in a blacklist. Malicious programs can therefore usually bypass prior-art domain name detection, which lowers the success rate of detecting abnormal domain names.
Summary of the invention
The present application provides a domain name detection method and device that can solve the prior-art problem that domain names generated by a DGA cannot be identified.
In a first aspect, embodiments of the invention provide a domain name detection method, including: obtaining the feature code of the domain name to be detected and the feature code of the normal domain name, where a feature code indicates the distribution of letters, or of letters and digits, in a domain name; calculating the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name, where the feature gap indicates the degree of similarity between the two feature codes; and determining, according to the feature gap, whether the accessed domain name was generated by a DGA.
In a second aspect, embodiments of the invention provide a domain name detection device, including: a processing module, configured to obtain the feature code of the domain name to be detected and the feature code of the normal domain name, where a feature code indicates the distribution of letters, or of letters and digits, in a domain name; the processing module is further configured to calculate the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name, where the feature gap indicates the degree of similarity between the two feature codes; and a detection module, configured to determine, according to the feature gap, whether the accessed domain name was generated by a DGA.
Embodiments of the invention provide a domain name detection method and device that obtain the feature code of the domain name to be detected and the feature code of the normal domain name and calculate the feature gap between them. Because a feature code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap determines how similar the feature code of the domain name to be detected is to that of the normal domain name. When the two feature codes differ substantially, the domain name to be detected is more likely to have been generated by a DGA, so the feature gap can be used to determine whether the accessed domain name was generated by a DGA. The domain name detection method provided by the embodiments of the invention therefore solves the prior-art problem that DGA-generated domain names cannot be identified, raises the success rate of detecting abnormal domain names, and improves the user experience.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the character distribution probability in domain names generated by a DGA, provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the character distribution probability in normal domain names, provided by an embodiment of the invention;
Fig. 3 is a schematic flowchart of a domain name detection method provided by an embodiment of the invention;
Fig. 4 is a schematic flowchart of a domain name detection method provided by another embodiment of the invention;
Fig. 5 is a schematic diagram of a domain name detection device provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of a domain name detection device provided by another embodiment of the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
To describe the technical solutions of the embodiments clearly, the terms "first", "second" and the like are used in the embodiments to distinguish between identical or similar items whose functions and effects are essentially the same. A person skilled in the art will understand that "first", "second" and the like do not limit quantity or order of execution.
As a by-product of the development of Internet technology, malicious attacks led by hackers seriously affect people's lives. In particular, by implanting a remotely controllable malicious program, such as a Trojan, in a terminal connected to the network, a hacker can gain control of that terminal. Existing malicious programs such as Trojans and bots typically listen on a network port and wait for a remote control server to connect to them in order to receive the hacker's commands. To guard against such attacks, a firewall can be set up to block connections between malicious programs and remote servers. As technology has developed, however, malicious programs have begun to initiate connections actively from inside the network in order to avoid being discovered or detected. These actively initiated connections are usually made over HTTP, which lets them bypass the firewall, reach the remote server, and receive remote control.
To address this, the prior art provides a blacklist-based domain name detection method: when a domain name that a user accesses through a terminal matches a domain name in the blacklist, the user is prevented from continuing to access that domain name through the terminal. This approach does effectively stop some malicious programs from receiving remote control, but the detection method is simple and depends heavily on the blacklist. Meanwhile, malicious programs, with Zeus and Conficker as representative examples, have gradually begun to use a domain generation algorithm (DGA) to generate domain names periodically and automatically and to actively access the generated domain names. A DGA generates domain names at a high rate: a Conficker variant, for example, can automatically generate more than 50,000 random domain names per day at most. By comparison, a blacklist contains far fewer domain names than a DGA generates, so malicious programs can usually bypass prior-art domain name detection, which lowers the success rate of prior-art detection methods.
In view of the above problem, the applicant has observed that the vast majority of characters in commonly used domain names are digits and letters. Domain names are composed of digits and letters so that they are easy to remember; the globally distributed DNS service then converts these memorable domain names into IP addresses for access. Therefore, when a domain name is a normal domain name not generated by a DGA, its characters are, for ease of memory, mostly chosen from words or phrases that carry real meaning. Malicious programs such as Trojans, by contrast, use a DGA to generate domain names in order to bypass existing detection systems, and most DGA-generated domain names consist of randomly selected characters. The character composition of normal domain names therefore differs to some extent from that of DGA-generated domain names.
Fig. 1 shows a schematic diagram of the character distribution probability in domain names generated by a DGA, and Fig. 2 shows a schematic diagram of the character distribution probability in normal domain names. In both figures, the horizontal axis lists all digits, all letters and "-", which are the characters most commonly used in domain names, and the vertical axis gives the probability with which each character occurs across all sampled domain names. As Fig. 1 and Fig. 2 show, digits and letters occur with relatively even probability in DGA-generated domain names, and some commonly used letters actually occur less often. The character distribution of normal domain names, in contrast, is clearly uneven: some characters occur with a far higher probability than others. In addition, compared with normal domain names, DGA-generated domain names make more use of digits and of rarely used letters such as "x", "y" and "z".
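The per-character probabilities plotted in the two figures can be approximated with a short Python sketch. This is only an illustration under stated assumptions: the function name `char_probabilities` and the constant `ALPHABET` are hypothetical, and the sketch assumes the probability of a character is its share of all counted characters in the sample.

```python
from collections import Counter

# Characters on the horizontal axis: digits, lowercase letters and "-".
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz-"

def char_probabilities(domains):
    """Return the relative frequency of each ALPHABET character over all domains."""
    counts = Counter(ch for d in domains for ch in d.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No countable characters: report zero for every character.
        return {ch: 0.0 for ch in ALPHABET}
    return {ch: counts[ch] / total for ch in ALPHABET}
```

A flat result across the alphabet would correspond to the DGA-like picture, and a strongly peaked result to the normal-domain picture.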
Based on these characteristics, as shown in Fig. 3, an embodiment of the invention provides a domain name detection method, including:
101. Obtain the feature code of the domain name to be detected and the feature code of the normal domain name.
A feature code indicates the distribution of letters, or of letters and digits, in a domain name.
Specifically, the domain names to be detected are domain names that may include DGA-generated domain names; they may be all the accessed domain names in the system's dynamic log data. For example, the domain names to be detected may be the domain names accessed by all users in the network, obtained by using a bypass detection device to collect the web access logs of all users in the network and parsing out every domain name in those logs. The normal domain names are domain names confirmed not to include DGA-generated domain names and may be obtained in advance. For example, the top 1,000,000 domain names in the Alexa website ranking may be obtained and used as the normal domain names.
It should be noted that when the domain names to be detected or the normal domain names are numerous, they may be divided into groups so that multiple feature codes are obtained. The domain names to be detected may be grouped by destination IP address, in which case domain names that map to the same destination IP address are placed in the same group. They may also be grouped by identical subdomain name, where the subdomain name is what remains after the TLD and ccTLD are removed. For example, for the two domain names to be detected hezl3.xk80p.com and 14lyu.xk80p.com, the subdomain name after removing the suffix ".com" is xk80p, so they can be placed in the same group.
Likewise, because the normal domain names may be numerous, a fixed number of domain names may be drawn at random from them to form multiple sample groups, each containing the same number of domain names. The sample groups serve as the reference against which the grouped domain names to be detected are compared, in order to find forged domain names among them that may have been generated by a DGA. For example, 1,000 sample groups of 500 domain names each may be generated at random from the top 1,000,000 domain names in the Alexa ranking.
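A minimal Python sketch of drawing such sample groups is given below. The function name `make_sample_groups` and the `seed` parameter are hypothetical, and the sketch assumes each group is drawn without replacement from the full list while different groups may overlap; the text does not say whether groups must be disjoint.

```python
import random

def make_sample_groups(normal_domains, n_groups=1000, group_size=500, seed=None):
    """Draw n_groups random groups of group_size distinct domains each.

    Assumption: groups are drawn independently, so one domain may appear
    in more than one group. normal_domains must be a list with at least
    group_size entries.
    """
    rng = random.Random(seed)
    return [rng.sample(normal_domains, group_size) for _ in range(n_groups)]
```

The defaults mirror the figures in the example (1,000 groups of 500 names); passing a `seed` only makes a run repeatable.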
Specifically, a feature code indicates the distribution of letters, or of letters and digits, in a domain name, where the letters may include a to z and the digits may include 0 to 9. Further, a feature code may be the character string obtained by sorting the letters, or the letters and digits, in the domain names by number of occurrences. When the domain names to be detected or the normal domain names are divided into multiple groups, multiple feature codes may be obtained; each feature code then indicates how often each digit or letter occurs within one group.
For example, the feature code of the domain names to be detected may be obtained as follows:
Divide the domain names to be detected into multiple domain name groups;
When the feature code indicates the distribution of letters and digits in a domain name, count how many times each digit and letter occurs in the domain name group, for example "x" occurs 230 times, "3" occurs 59 times, and so on. Select the 10 digits and letters that occur most often in the group. If fewer than 10 digits and letters are obtained, the group is considered to contain too little information for the subsequent judgment and is discarded. Arrange the 10 selected digits and letters in descending order of occurrence to obtain a 10-byte feature code;
When the feature code indicates the distribution of letters in a domain name, count how many times each letter occurs in the domain name group. Select the 10 letters that occur most often in the group. If fewer than 10 letters are obtained, the group is considered to contain too little information for the subsequent judgment and is discarded. Arrange the 10 selected letters in descending order of occurrence to obtain a 10-byte feature code;
The feature code of the normal domain names may be obtained as follows:
When the feature code indicates the distribution of letters and digits in a domain name, count how many times each digit and letter occurs in the normal domain names. Select the 10 digits and letters that occur most often, and arrange them in descending order of occurrence to obtain a 10-byte feature code;
When the feature code indicates the distribution of letters in a domain name, count how many times each letter occurs in the normal domain names. Select the 10 letters that occur most often, and arrange them in descending order of occurrence to obtain a 10-byte feature code.
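The counting and ranking procedure above can be sketched in Python as follows. The names `feature_code`, `LETTERS` and `LETTERS_AND_DIGITS` are hypothetical. The text does not say how ties in occurrence count are ordered, so breaking ties by character order is an assumption made here only to keep the result deterministic.

```python
from collections import Counter
import string

LETTERS = set(string.ascii_lowercase)
LETTERS_AND_DIGITS = LETTERS | set(string.digits)

def feature_code(domains, alphabet=LETTERS_AND_DIGITS, length=10):
    """Return the `length` most frequent characters in descending order of count.

    Returns None when fewer than `length` distinct characters occur, which
    corresponds to discarding a group that carries too little information.
    Assumption: ties are broken by character order (not stated in the text).
    """
    counts = Counter(ch for d in domains for ch in d.lower() if ch in alphabet)
    if len(counts) < length:
        return None
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(ch for ch, _ in ranked[:length])
```

Passing `alphabet=LETTERS` gives the letters-only variant, and the default gives the letters-and-digits variant.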
102. Calculate the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name.
The feature gap indicates the degree of similarity between the feature code of the domain name to be detected and the feature code of the normal domain name.
Specifically, the larger the feature gap, the lower the similarity between the two feature codes, and the more likely the domain name to be detected is a forged domain name generated by a DGA. Conversely, a smaller feature gap means the two feature codes are more similar, and the domain name to be detected is less likely to be a forged domain name generated by a DGA.
Further, when the feature code indicates the distribution of letters, or of letters and digits, in a domain name, the Jaccard similarity measure between the feature code of the domain name to be detected and the feature code of the normal domain name may be calculated and used as the feature gap between the two feature codes.
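A short Python sketch of the Jaccard measure on two codes is given below, treating each code as a set of characters. The function names are hypothetical. The text names the similarity measure while also stating that a larger gap means lower similarity, so the sketch provides both the similarity and its complement, the Jaccard distance; which of the two the method intends is not settled here.

```python
def jaccard_similarity(code_a, code_b):
    """Size of the intersection divided by size of the union of the two character sets."""
    a, b = set(code_a), set(code_b)
    if not a and not b:
        return 1.0  # two empty codes are treated as identical
    return len(a & b) / len(a | b)

def jaccard_distance(code_a, code_b):
    """Complement of the similarity: 0.0 for identical sets, 1.0 for disjoint sets."""
    return 1.0 - jaccard_similarity(code_a, code_b)
```

Because the measure works on sets, it ignores the order of characters within a code; order only matters for an edit-distance comparison.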
When the feature code indicates the distribution of letters and digits in a domain name, the Damerau-Levenshtein distance between the feature code of the domain name to be detected and the feature code of the normal domain name may be calculated with the Damerau-Levenshtein distance algorithm; this distance is then the feature gap between the two feature codes.
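For illustration, the edit distance between two codes can be computed with the following Python sketch. It implements the optimal string alignment variant (sometimes called restricted Damerau-Levenshtein), which counts insertions, deletions, substitutions and swaps of adjacent characters; the text does not say which variant is intended, so this choice is an assumption, and the function name `osa_distance` is hypothetical.

```python
def osa_distance(s, t):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(s) + 1, len(t) + 1
    # d[i][j] holds the distance between s[:i] and t[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Unlike the set-based Jaccard measure, this distance is sensitive to the order of characters, so it reflects differences in frequency ranking as well as in which characters appear.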
103. Determine, according to the feature gap, whether the accessed domain name was generated by a DGA.
Specifically, after the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name is obtained, it can be compared with a preset standard value. When the comparison result meets the requirement, it can be determined that the distribution of letters in the domain name to be detected differs clearly from that in the normal domain name, or that the distribution of letters and digits in the domain name to be detected differs clearly from that in the normal domain name, and therefore that the domain name to be detected was generated by a DGA.
Further, when the domain name to be detected is determined to have been generated by a DGA, it may be marked.
Embodiments of the invention provide a domain name detection method that obtains the feature code of the domain name to be detected and the feature code of the normal domain name and calculates the feature gap between them. Because a feature code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap determines how similar the feature code of the domain name to be detected is to that of the normal domain name. When the two feature codes differ substantially, the domain name to be detected is more likely to have been generated by a DGA, so the feature gap can be used to determine whether the accessed domain name was generated by a DGA. The domain name detection method provided by the embodiments of the invention therefore solves the prior-art problem that DGA-generated domain names cannot be identified, raises the success rate of detecting abnormal domain names, and improves the user experience.
Further, as shown in Fig. 4, an embodiment of the invention provides a domain name detection method, including:
201. Obtain web access logs, and parse them to obtain the domain names to be detected.
Specifically, the web access logs may be the logs of web accesses by all users in the network, collected by a bypass detection device. All domain names in the logs are parsed out and serve as the basis for subsequent processing. Unlike the common approach of analysing the domain names users access from DNS traffic, using the actual web access logs of user terminals as the raw data is more targeted and more accurate than using DNS traffic.
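Extracting domain names from access logs might look like the Python sketch below. The log layout is not specified in the text, so the sketch assumes, purely for illustration, that each log line contains the requested URL as a whitespace-separated field beginning with "http://" or "https://"; the function name `domains_from_log` is hypothetical.

```python
from urllib.parse import urlsplit

def domains_from_log(lines):
    """Collect the host name of every URL field found in the log lines.

    Assumption: URLs appear as whitespace-separated fields that start
    with "http://" or "https://". Lines without such a field are skipped.
    """
    domains = []
    for line in lines:
        for field in line.split():
            if field.startswith(("http://", "https://")):
                host = urlsplit(field).hostname  # lowercased, port removed
                if host:
                    domains.append(host)
    return domains
```

Repeated accesses are kept as repeated entries here; whether duplicates should be collapsed before counting characters is a design choice the text leaves open.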
202. Remove the top-level domain (TLD) suffix and the country-code top-level domain (ccTLD) suffix from the domain names to be detected and the normal domain names.
Specifically, removing the TLD suffix and the ccTLD suffix from the domain names to be detected and the normal domain names prevents the TLD and ccTLD from strongly distorting the letter statistics, or the letter and digit statistics, of those domain names and thereby affecting the final detection result.
203. Remove the prefix "www" from the domain names to be detected and the normal domain names.
Specifically, removing the prefix "www" from the domain names to be detected and the normal domain names prevents it from distorting the letter statistics, or the letter and digit statistics, of those domain names and thereby affecting the final detection result.
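The suffix and prefix removal in steps 202 and 203 can be sketched in Python as follows. The suffix sets below are hypothetical and deliberately tiny; a real deployment would need complete TLD and ccTLD lists, and the function name `strip_affixes` is likewise an assumption.

```python
# Hypothetical, deliberately small suffix lists for illustration only.
GENERIC_TLDS = {"com", "net", "org"}
COUNTRY_TLDS = {"cn", "uk", "de"}

def strip_affixes(domain):
    """Drop a trailing ccTLD, then a trailing TLD, then a leading "www" label."""
    labels = domain.lower().strip(".").split(".")
    if len(labels) > 1 and labels[-1] in COUNTRY_TLDS:
        labels.pop()
    if len(labels) > 1 and labels[-1] in GENERIC_TLDS:
        labels.pop()
    if len(labels) > 1 and labels[0] == "www":
        labels.pop(0)
    return ".".join(labels)
```

The `len(labels) > 1` guards keep at least one label, so a name is never reduced to an empty string.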
204. Remove the characters other than 0-9, a-z, ".", "_" and "-" from the domain names to be detected and the normal domain names.
Specifically, a regular expression such as ^[0-9a-z._-]+ may be used for this purpose. The expression defines a character set consisting of 0-9 and a-z plus the three characters ".", "_" and "-", 39 characters in total. Only domain names composed freely of characters from this set are treated as valid; all domain names that do not meet this condition are discarded as invalid.
Further, because browsers convert Chinese domain names into Punycode domain names (beginning with "xn--"), the Punycode form of Chinese domain names in the domain names to be detected and the normal domain names needs to be recognized, so that these regular domain names are not misjudged as DGA domain names.
Further, all uppercase letters in the domain names to be detected and the normal domain names may be converted to lowercase to allow uniform comparison later.
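The character filtering, lowercasing and Punycode recognition of step 204 can be sketched in Python as follows. The function names `normalize` and `is_punycode` are hypothetical. The sketch lowercases first and then requires the whole name to match the character set, which is one reasonable reading of the step.

```python
import re

# 0-9, a-z, ".", "_" and "-": the 39-character set named in step 204.
VALID = re.compile(r"[0-9a-z._-]+")

def normalize(domain):
    """Return the lowercased domain, or None if it contains other characters."""
    d = domain.lower()
    if VALID.fullmatch(d) is None:
        return None
    return d

def is_punycode(domain):
    """True when any label carries the "xn--" prefix of a Punycode-encoded name."""
    return any(label.startswith("xn--") for label in domain.lower().split("."))
```

Names flagged by `is_punycode` could be set aside before scoring so that encoded internationalized names are not mistaken for randomly generated ones.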
205. Obtain the feature code of the domain name to be detected and the feature code of the normal domain name.
For details, refer to step 101 in the above embodiment; they are not repeated here.
206. When the feature code indicates the distribution of letters, or of letters and digits, in a domain name, calculate the Jaccard similarity measure between the feature code of the domain name to be detected and the feature code of the normal domain name.
The Jaccard similarity measure is the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name.
Specifically, when the feature code indicates the distribution of letters in a domain name, the Jaccard similarity measure may be calculated between the feature code of each domain name group of the grouped domain names to be detected and the feature code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name for the letters-only case.
When the feature code indicates the distribution of letters and digits in a domain name, the Jaccard similarity measure may likewise be calculated between the feature code of each domain name group of the grouped domain names to be detected and the feature code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name for the letters-and-digits case.
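The averaging in step 206 can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the arithmetic mean is taken separately for each domain name group over all sample groups; the text could also be read as a single mean over all pairs.

```python
from statistics import mean

def jaccard_similarity(code_a, code_b):
    """Intersection size over union size of the two character sets."""
    a, b = set(code_a), set(code_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def mean_measure(group_code, sample_codes, measure=jaccard_similarity):
    """Arithmetic mean of measure(group_code, s) over every sample-group code s.

    Assumption: one mean per domain name group. sample_codes must be non-empty.
    """
    return mean([measure(group_code, s) for s in sample_codes])
```

The `measure` parameter allows a different comparison function to be substituted without changing the averaging.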
207. When the Jaccard similarity measure obtained from the feature code indicating the distribution of letters and digits in a domain name is greater than or equal to 0.8, determine that the accessed domain name was generated by a DGA.
This condition clearly indicates an obvious gap between the feature code of the domain name to be detected and the feature code of the normal domain name. It means that the distribution of letters in the domain name to be detected differs markedly from that in the normal domain name, or that the distribution of digits and letters in the domain name to be detected differs markedly from that in the normal domain name. All domain names to be detected that meet this condition may therefore be forged domain names generated by a DGA.
208. When the ratio of the Jaccard difference to the Jaccard similarity measure obtained from the feature codes indicating the distribution of letters and digits in the domain name is less than 0.1, determine that the accessed domain name is a domain name generated using a DGA algorithm.
Here, the Jaccard difference is the absolute value of the difference between the Jaccard similarity measure obtained from the feature codes indicating the distribution of letters in the domain name and the Jaccard similarity measure obtained from the feature codes indicating the distribution of letters and digits in the domain name.
The above step prevents domain names that contain many digits, whether among the domain names to be detected or among the normal domain names, from being mistaken for forged domain names (for example, China Mobile's website 10086.cn). Although digits are used relatively infrequently according to the digit and letter distribution of normal domain names, it is inappropriate to judge a domain name to be forged solely because it contains many digits.
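The two threshold checks of steps 207 and 208 can be written down as a short sketch. The text does not state how the two checks are combined, so the sketch below only evaluates them separately; the function names are hypothetical, and the handling of a zero denominator is an assumption.

```python
def step_207(j_alnum: float, threshold: float = 0.8) -> bool:
    """Step 207: the letters-and-digits Jaccard measure reaches the threshold."""
    return j_alnum >= threshold


def step_208(j_letters: float, j_alnum: float, threshold: float = 0.1) -> bool:
    """Step 208: the Jaccard difference |j_letters - j_alnum|, taken relative
    to the letters-and-digits measure, stays below the threshold."""
    if j_alnum == 0:
        return False  # assumption: the ratio is undefined, so do not flag
    return abs(j_letters - j_alnum) / j_alnum < threshold
```

For a digit-heavy name such as 10086.cn, the letters-only and letters-and-digits measures would be expected to diverge, so the ratio in step 208 is large and the check does not fire.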
209. When the feature code indicates the distribution of letters and digits in a domain name, calculate the Damerau-Levenshtein distance between the feature code of the domain name to be detected and the feature code of the normal domain names according to the Damerau-Levenshtein distance algorithm.
Here, the Damerau-Levenshtein distance is the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain names.
210. When the Damerau-Levenshtein distance is greater than or equal to 0.9, determine that the accessed domain name is a domain name generated using a DGA algorithm.
The above step indicates that there is a very obvious gap between the feature code of the domain name to be detected and the feature code of the normal domain names. This means that the distribution of letters in the domain name to be detected differs markedly from that in normal domain names, or that the distribution of digits and letters in the domain name to be detected differs markedly from that in normal domain names. All domain names to be detected that meet the above condition may therefore be potential forged domain names generated by a DGA algorithm.
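A distance of this kind can be sketched as below. Two points are assumptions rather than statements from the text: the sketch uses the optimal string alignment variant of Damerau-Levenshtein (each substring is edited at most once), and, because the 0.9 threshold suggests a value scaled to the range 0 to 1, it divides the raw distance by the length of the longer feature code.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the longer input (an assumed normalization)."""
    longest = max(len(a), len(b))
    return osa_distance(a, b) / longest if longest else 0.0
```

Unlike the Jaccard measure, this distance is sensitive to the order of characters in the feature code, so two codes with the same ten characters in a different frequency ranking still produce a non-zero gap.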
211. Match the domain names in the access records of a terminal against the domain names determined to be generated using a DGA algorithm.
Specifically, once the detection result has identified the domain names generated using a DGA algorithm, the result can be taken back to the original web access logs, that is, the original log records from which the domain name groups used for comparison were generated, and matched against them again. The users' web access behavior can then be filtered according to the matching result, which reduces false positives as far as possible and also makes it possible to determine, during behavioral filtering, which machines are potentially infected and how severe the infection is.
When the domain names in the access records of the terminal and the domain names determined to be generated using a DGA algorithm meet the infection matching condition, perform step 212.
When the domain names in the access records of the terminal and the domain names determined to be generated using a DGA algorithm meet the deletion matching condition, perform step 213.
212. Determine that the terminal is an infected terminal.
213. Delete the domain names that meet the deletion matching condition from the domain names determined to be generated using a DGA algorithm.
For example, the web access logs of the past 7 days can be compared against all domain names determined to be generated using a DGA algorithm. Only those access records in which such domain names were accessed on more than 3 days are retained, and the machines that accessed at least 3 different domain names determined to be generated using a DGA algorithm are taken as the final infected machines, so that measures such as disinfection or isolation can be applied to them promptly and further harm is reduced. If a domain name determined to be generated using a DGA algorithm was accessed by only one machine and fewer than 3 times, it is regarded as probable statistical noise and is no longer marked as a forged domain name.
The rationale for this behavioral filter is as follows. A machine infected with a bot program is part of a botnet and must periodically contact the controlling machine behind it to maintain the control relationship. The purpose of forging domain names with a DGA algorithm is to evade possible domain name blacklist filtering, so the DGA-forged domain names accessed by an infected machine must also change frequently. A terminal that exhibits the behavior above can therefore essentially be determined to be an infected terminal.
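The 7-day behavioral filter can be illustrated with the sketch below. It reads "more than 3 days" as a machine having DGA hits on more than 3 distinct days, evaluates infection against the unfiltered DGA list, and assumes log records of the form (machine, domain, day); all of these, along with the function and parameter names, are assumptions made for illustration.

```python
from collections import defaultdict


def backtrack(records, dga_domains, min_days=3, min_domains=3, noise_hits=3):
    """records: iterable of (machine, domain, day) from the past 7 days.

    Returns (infected machines, DGA domains kept after noise removal)."""
    days = defaultdict(set)      # machine -> days on which it hit a DGA domain
    domains = defaultdict(set)   # machine -> distinct DGA domains it accessed
    hits = defaultdict(int)      # domain  -> total number of accesses
    machines = defaultdict(set)  # domain  -> machines that accessed it
    for machine, domain, day in records:
        if domain not in dga_domains:
            continue
        days[machine].add(day)
        domains[machine].add(domain)
        hits[domain] += 1
        machines[domain].add(machine)
    # Infection condition: hits on more than 3 days and at least 3 domains.
    infected = {m for m in days
                if len(days[m]) > min_days and len(domains[m]) >= min_domains}
    # Deletion condition: one machine only, fewer than 3 accesses.
    noise = {d for d in hits
             if len(machines[d]) == 1 and hits[d] < noise_hits}
    return infected, set(dga_domains) - noise
```

Returning the surviving domain list alongside the infected machines mirrors steps 212 and 213, which act on the terminal and on the DGA list respectively.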
An embodiment of the present invention provides a domain name detection method. The feature code of the domain name to be detected and the feature code of normal domain names are obtained, and the feature gap between the two feature codes is calculated. Because a feature code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap determines how similar the feature code of the domain name to be detected is to the feature code of the normal domain names. When the two feature codes differ greatly, the domain name to be detected is more likely to have been generated by a domain generation algorithm (DGA), so whether an accessed domain name was generated using a DGA can be determined from the feature gap. The domain name detection method provided by the embodiment of the present invention therefore solves the prior-art problem that domain names generated using DGA algorithms cannot be identified, improves the success rate of detecting abnormal domain names, and improves user experience.
As shown in Figure 5, an embodiment of the present invention provides a domain name detection device 500, including:
A processing module 501, configured to obtain the feature code of the domain name to be detected and the feature code of normal domain names.
Here, the feature code indicates the distribution of letters, or the distribution of letters and digits, in a domain name.
Specifically, the domain names to be detected may include domain names generated by a DGA algorithm, and may be all accessed domain names in the system's dynamic log data. For example, the domain names to be detected may be the domain names accessed by all users in a network; they can be obtained by using a bypass detection device to collect the web access logs of all users in the network and parsing out every domain name in the logs. Normal domain names are domain names confirmed not to include any generated by a DGA algorithm, and can be obtained in advance. For example, the top 1,000,000 domain names in the Alexa website ranking can be obtained and used as the normal domain names.
It should be noted that when the domain names to be detected or the normal domain names include many domain names, they can be grouped so that multiple feature codes are obtained. The domain names to be detected can be grouped by destination IP address, in which case domain names that correspond to the same destination IP address are placed in the same group. They can also be grouped by identical subdomain, where the subdomain is what remains after the TLD and ccTLD are removed. For example, for the two domain names to be detected hezl3.xk80p.com and 14lyu.xk80p.com, the subdomain after removing the suffix ".com" is xk80p, so they can be placed in the same group.
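Grouping by subdomain can be sketched as follows. The suffix set here is a tiny illustrative stand-in (a real deployment would need a complete suffix list), and taking the last remaining label as the group key is an assumption based on the xk80p example; the names are hypothetical.

```python
from collections import defaultdict

# Illustrative only: a handful of TLD / ccTLD labels, not a complete list.
SUFFIX_LABELS = {"com", "net", "org", "cn", "uk", "co"}


def group_key(domain: str) -> str:
    """Strip trailing TLD/ccTLD labels and return the last remaining label."""
    labels = domain.lower().strip(".").split(".")
    while labels and labels[-1] in SUFFIX_LABELS:
        labels.pop()
    return labels[-1] if labels else ""


def group_by_subdomain(domains):
    """Map each group key to the domain names that share it."""
    groups = defaultdict(list)
    for domain in domains:
        key = group_key(domain)
        if key:
            groups[key].append(domain)
    return dict(groups)
```

Grouping by destination IP address would follow the same pattern, with the resolved address used as the dictionary key instead of the subdomain label.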
Likewise, because the normal domain names may include many domain names, a certain number of domain names can be drawn at random from them to generate multiple sample groups, each containing the same number of domain names. The sample groups serve as reference objects against which the grouped domain names to be detected are compared, in order to find potential forged domain names generated by a DGA algorithm. For example, 1000 sample groups of 500 domain names each can be randomly generated from the top 1,000,000 domain names in the Alexa ranking.
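Drawing the sample groups is straightforward; a minimal sketch is given below. Sampling without replacement inside each group and the optional seed are assumptions, and the function name is hypothetical.

```python
import random


def make_sample_groups(normal_domains, group_count=1000, group_size=500, seed=None):
    """Draw group_count random groups of group_size distinct normal domains."""
    if group_size > len(normal_domains):
        raise ValueError("group_size exceeds the number of normal domains")
    rng = random.Random(seed)  # a seed makes the reference groups reproducible
    return [rng.sample(normal_domains, group_size) for _ in range(group_count)]
```

The defaults match the 1000 groups of 500 domain names mentioned in the example; a domain name may appear in more than one group.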
Specifically, the feature code indicates the distribution of letters, or of letters and digits, in a domain name, where the letters may include a to z and the digits may include 0 to 9. Further, the feature code can be the character string obtained by sorting the letters, or the letters and digits, in the domain names by number of occurrences. When the domain names to be detected or the normal domain names are divided into multiple groups, multiple feature codes may be obtained, and each feature code then indicates the distribution of occurrence counts of the digits or letters within one group.
For example, the feature code of the domain names to be detected can be obtained as follows:
Group the domain names to be detected to obtain multiple domain name groups;
When the feature code indicates the distribution of letters and digits in a domain name: count the number of occurrences of each digit and letter in the domain name group (for example, "x" appears 230 times, "3" appears 59 times, and so on); select the 10 digits and letters with the highest occurrence counts in the group; if fewer than 10 digits and letters are obtained, the information contained in the group is considered insufficient for subsequent judgment and the group is discarded; arrange the 10 selected digits and letters in descending order of occurrence count to obtain a 10-byte feature code;
When the feature code indicates the distribution of letters in a domain name: count the number of occurrences of each letter in the domain name group; select the 10 letters with the highest occurrence counts in the group; if fewer than 10 letters are obtained, the information contained in the group is considered insufficient for subsequent judgment and the group is discarded; arrange the 10 selected letters in descending order of occurrence count to obtain a 10-byte feature code;
The feature code of the normal domain names can be obtained as follows:
When the feature code indicates the distribution of letters and digits in a domain name: count the number of occurrences of each digit and letter in the normal domain names; select the 10 digits and letters with the highest occurrence counts; arrange the 10 selected digits and letters in descending order of occurrence count to obtain a 10-byte feature code;
When the feature code indicates the distribution of letters in a domain name: count the number of occurrences of each letter in the normal domain names; select the 10 letters with the highest occurrence counts; arrange the 10 selected letters in descending order of occurrence count to obtain a 10-byte feature code.
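The feature code construction described above can be sketched as follows. The alphabetical tie-break for equal counts is an assumption (the text does not say how ties are ordered), the function name is hypothetical, and the sketch returns None for any group with fewer than the required number of distinct characters.

```python
import string
from collections import Counter

LETTERS = set(string.ascii_lowercase)
LETTERS_AND_DIGITS = LETTERS | set(string.digits)


def feature_code(domains, letters_only=False, length=10):
    """Return the `length` most frequent characters, most frequent first,
    or None when the group has too few distinct characters."""
    alphabet = LETTERS if letters_only else LETTERS_AND_DIGITS
    counts = Counter(ch for name in domains for ch in name.lower()
                     if ch in alphabet)
    if len(counts) < length:
        return None  # too little information: the group is discarded
    # Descending by count; ties broken by character order (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(ch for ch, _ in ranked[:length])
```

Calling the function once with `letters_only=True` and once with the default yields the two feature codes used for the letters-only and letters-and-digits comparisons.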
The processing module 501 is further configured to calculate the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain names.
Here, the feature gap indicates the degree of similarity between the feature code of the domain name to be detected and the feature code of the normal domain names.
Specifically, the larger the feature gap, the lower the similarity between the feature code of the domain name to be detected and the feature code of the normal domain names, and the more probable it is that the domain name to be detected is a forged domain name generated by a DGA algorithm. Conversely, a smaller feature gap indicates higher similarity between the two feature codes and a lower probability that the domain name to be detected is a forged domain name generated by a DGA algorithm.
Further, when the feature code indicates the distribution of letters, or the distribution of letters and digits, in a domain name, the Jaccard similarity measure between the feature code of the domain name to be detected and the feature code of the normal domain names can be calculated and used as the feature gap between the two feature codes.
When the feature code indicates the distribution of letters and digits in a domain name, the Damerau-Levenshtein distance between the feature code of the domain name to be detected and the feature code of the normal domain names can be calculated according to the Damerau-Levenshtein distance algorithm; this distance is the feature gap between the two feature codes.
A detection module 502, configured to determine, according to the feature gap, whether the accessed domain name is a domain name generated using a domain generation algorithm (DGA).
Specifically, after the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain names is obtained, the feature gap can be compared with a preset standard value. When the comparison result meets the requirement, it can be determined that the letter distribution of the domain name to be detected differs markedly from that of normal domain names, or that its distribution of letters and digits differs markedly from that of normal domain names, and therefore that the domain name to be detected is a domain name generated using a DGA.
Further, when the domain name to be detected is determined to be a domain name generated using a DGA, the domain name to be detected can be marked.
An embodiment of the present invention provides a domain name detection device. The feature code of the domain name to be detected and the feature code of normal domain names are obtained, and the feature gap between the two feature codes is calculated. Because a feature code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap determines how similar the feature code of the domain name to be detected is to the feature code of the normal domain names. When the two feature codes differ greatly, the domain name to be detected is more likely to have been generated by a DGA, so whether an accessed domain name was generated using a DGA can be determined from the feature gap. The domain name detection device provided by the embodiment of the present invention therefore solves the prior-art problem that domain names generated using DGA algorithms cannot be identified, improves the success rate of detecting abnormal domain names, and improves user experience.
Specifically, as shown in Figure 6, the domain name detection device 500 provided by the embodiment of the present invention may further include an acquisition module 503, configured to obtain web access logs and parse them to obtain the domain names to be detected.
Specifically, the web access logs can be the logs of web access by all users in the network, collected by a bypass detection device. Every domain name in the logs is parsed out and used as the basis for subsequent processing. In contrast to the common approach of analyzing the domain names users access on the basis of DNS traffic, using the actual web access logs of user terminals as the raw data is more targeted and more accurate than using DNS traffic.
Specifically, the acquisition module 503 is further configured to:
Remove the top-level domain suffix (TLD) and the country-code top-level domain suffix (ccTLD) of the domain names to be detected and the normal domain names; and/or remove the prefix "www" of the domain names to be detected and the normal domain names; and/or remove the characters other than 0-9, a-z, ".", "_" and "-" from the domain names to be detected and the normal domain names.
Specifically, removing the top-level domain suffix (English full name: top-level domain, abbreviation: TLD) and the country-code top-level domain suffix (English full name: country-code top-level domain, abbreviation: ccTLD) from the domain names to be detected and the normal domain names prevents the TLD and ccTLD from strongly distorting the letter statistics, or the letter and digit statistics, of those domain names and thereby affecting the final detection result for the domain names to be detected.
Similarly, removing the prefix "www" from the domain names to be detected and the normal domain names prevents the prefix from distorting the letter statistics, or the letter and digit statistics, of those domain names and thereby affecting the final detection result for the domain names to be detected.
Specifically, a regular expression such as ^[0-9a-z._-]+ can be used to remove the characters other than 0-9, a-z, ".", "_" and "-" from the domain names to be detected and the normal domain names. The regular expression takes as its character set the 39 characters consisting of 0-9, a-z and the three characters ".", "_" and "-". Only a domain name formed by freely combining characters from this set is treated as a valid domain name; every domain name that does not meet this condition is discarded as invalid.
Further, because browsers convert Chinese domain names into Punycode domain names (beginning with xn--), the Chinese domain names among the domain names to be detected and the normal domain names that have been converted to Punycode need to be identified, so that they are not misjudged as DGA domain names when treated as conventional domain names.
Further, all domain names among the domain names to be detected and the normal domain names can be converted to lower case to allow uniform comparison later.
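The clean-up steps above (lower-casing, dropping the "www" prefix, validating against the 39-character set, and recognizing Punycode labels) can be sketched as follows. The function names are hypothetical, whole-string matching is an assumption about how the pattern is applied, and TLD removal is left out of this sketch.

```python
import re

# Whole-string match against the 39-character set 0-9, a-z, ".", "_", "-".
VALID_DOMAIN = re.compile(r"[0-9a-z._-]+")


def normalize(domain: str):
    """Lower-case the name, drop a leading "www." and validate the characters.
    Returns None for names that fall outside the allowed character set."""
    name = domain.strip().lower()
    if name.startswith("www."):
        name = name[4:]
    if not VALID_DOMAIN.fullmatch(name):
        return None
    return name


def is_punycode(name: str) -> bool:
    """True when any label carries the "xn--" prefix used for Punycode names."""
    return any(label.startswith("xn--") for label in name.split("."))
```

Names flagged by `is_punycode` could then be set aside before feature codes are computed, so that encoded Chinese domain names are not scored against the letter statistics of conventional names.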
Specifically, the processing module 501 is configured to:
When the feature code indicates the distribution of letters, or the distribution of letters and digits, in a domain name, calculate the Jaccard similarity measure between the feature code of the domain name to be detected and the feature code of the normal domain names, the Jaccard similarity measure being the feature gap between the two feature codes; and/or,
When the feature code indicates the distribution of letters and digits in a domain name, calculate the Damerau-Levenshtein distance between the feature code of the domain name to be detected and the feature code of the normal domain names according to the Damerau-Levenshtein distance algorithm, the Damerau-Levenshtein distance being the feature gap between the two feature codes.
Specifically, when the feature code indicates the distribution of letters in a domain name, the Jaccard similarity measure can be calculated between the feature code of each domain name group of the grouped domain names to be detected and the feature code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain names for the case where the feature code indicates the distribution of letters.
When the feature code indicates the distribution of letters and digits in a domain name, the Jaccard similarity measure can likewise be calculated between the feature code of each domain name group of the grouped domain names to be detected and the feature code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the two feature codes for the case where the feature code indicates the distribution of letters and digits.
Specifically, the detection module 502 is configured to:
When the Jaccard similarity measure obtained from the feature codes indicating the distribution of letters and digits in the domain name is greater than or equal to 0.8, determine that the accessed domain name is a domain name generated using a DGA algorithm; and/or,
When the ratio of the Jaccard difference to the Jaccard similarity measure obtained from the feature codes indicating the distribution of letters and digits in the domain name is less than 0.1, determine that the accessed domain name is a domain name generated using a DGA algorithm, the Jaccard difference being the absolute value of the difference between the Jaccard similarity measure obtained from the feature codes indicating the distribution of letters and the Jaccard similarity measure obtained from the feature codes indicating the distribution of letters and digits; and/or,
When the Damerau-Levenshtein distance is greater than or equal to 0.9, determine that the accessed domain name is a domain name generated using a DGA algorithm.
The above conditions indicate that there is an obvious gap between the feature code of the domain name to be detected and the feature code of the normal domain names. This means that the distribution of letters in the domain name to be detected differs markedly from that in normal domain names, or that the distribution of digits and letters in the domain name to be detected differs markedly from that in normal domain names. All domain names to be detected that meet the above conditions may therefore be potential forged domain names generated by a DGA algorithm.
The condition based on the Jaccard difference prevents domain names that contain many digits, whether among the domain names to be detected or among the normal domain names, from being mistaken for forged domain names (for example, China Mobile's website 10086.cn). Although digits are used relatively infrequently according to the digit and letter distribution of normal domain names, it is inappropriate to judge a domain name to be forged solely because it contains many digits.
Specifically, as shown in Figure 6, the domain name detection device 500 provided by the embodiment of the present invention may further include a backtracking module 504. The backtracking module 504 is configured to match the domain names in the access records of a terminal against the domain names determined to be generated using a DGA algorithm; when the two meet the infection matching condition, determine that the terminal is an infected terminal; and/or, when the two meet the deletion matching condition, delete the domain names that meet the deletion matching condition from the domain names determined to be generated using a DGA algorithm.
Specifically, once the detection result has identified the domain names generated using a DGA algorithm, the result can be taken back to the original web access logs, that is, the original log records from which the domain name groups used for comparison were generated, and matched against them again. The users' web access behavior can then be filtered according to the matching result, which reduces false positives as far as possible and also makes it possible to determine, during behavioral filtering, which machines are potentially infected and how severe the infection is.
For example, the web access logs of the past 7 days can be compared against all domain names determined to be generated using a DGA algorithm. Only those access records in which such domain names were accessed on more than 3 days are retained, and the machines that accessed at least 3 different domain names determined to be generated using a DGA algorithm are taken as the final infected machines, so that measures such as disinfection or isolation can be applied to them promptly and further harm is reduced. If a domain name determined to be generated using a DGA algorithm was accessed by only one machine and fewer than 3 times, it is regarded as probable statistical noise and is no longer marked as a forged domain name.
The rationale for this behavioral filter is as follows. A machine infected with a bot program is part of a botnet and must periodically contact the controlling machine behind it to maintain the control relationship. The purpose of forging domain names with a DGA algorithm is to evade possible domain name blacklist filtering, so the DGA-forged domain names accessed by an infected machine must also change frequently. A terminal that exhibits the behavior above can therefore essentially be determined to be an infected terminal.
In summary, the domain name detection device provided by the embodiment of the present invention obtains the feature code of the domain name to be detected and the feature code of normal domain names, and calculates the feature gap between the two feature codes. Because a feature code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap determines how similar the two feature codes are. When the two feature codes differ greatly, the domain name to be detected is more likely to have been generated by a DGA, so whether an accessed domain name was generated using a DGA can be determined from the feature gap. The device therefore solves the prior-art problem that domain names generated using DGA algorithms cannot be identified, improves the success rate of detecting abnormal domain names, and improves user experience.
From the above description of the embodiments, it is apparent to those skilled in the art that the present invention can be implemented in hardware, in firmware, or in a combination thereof. When implemented in software, the above functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium can be any available medium that a computer can access. By way of example and not limitation, computer-readable media can include random access memory (English full name: Random Access Memory, abbreviation: RAM), read-only memory (English full name: Read Only Memory, abbreviation: ROM), electrically erasable programmable read-only memory (English full name: Electrically Erasable Programmable Read Only Memory, abbreviation: EEPROM), compact disc read-only memory (English full name: Compact Disc Read Only Memory, abbreviation: CD-ROM) or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection can properly be regarded as a computer-readable medium. For example, if software is transmitted from a website, server or other remote source using coaxial cable, optical fiber cable, twisted pair, digital subscriber line (English full name: Digital Subscriber Line, abbreviation: DSL), or wireless technologies such as infrared, radio and microwave, then the coaxial cable, optical fiber cable, twisted pair, DSL, or wireless technologies such as infrared, radio and microwave are included in the definition of computer-readable medium.
From the description of the above embodiments, those skilled in the art can also clearly understand that, when the present invention is implemented in software, the instructions or code for performing the above method may be stored on a computer-readable medium or transmitted over a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that a computer can access. By way of example and not limitation, computer-readable media may include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), optical discs, magnetic disks or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (12)
1. A domain name detection method, characterized by comprising:
obtaining a condition code of a domain name to be detected and a condition code of a normal domain name, the condition code indicating the distribution of letters in a domain name, or the distribution of letters and digits in a domain name;
calculating a feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name, the feature gap indicating the degree of similarity between the condition code of the domain name to be detected and the condition code of the normal domain name;
determining, according to the feature gap, whether an accessed domain name is a domain name generated by a domain generation algorithm (DGA).
2. The domain name detection method according to claim 1, characterized in that, before the obtaining of the condition code of the domain name to be detected and the condition code of the normal domain name, the method further comprises:
obtaining web access logs, and parsing the web access logs to obtain the domain name to be detected.
3. The domain name detection method according to claim 1, characterized in that, before the obtaining of the condition code of the domain name to be detected and the condition code of the normal domain name, the method further comprises:
removing the top-level domain (TLD) suffix and the country-code top-level domain (ccTLD) suffix from the domain name to be detected and the normal domain name; and/or,
removing the prefix "www" from the domain name to be detected and the normal domain name; and/or,
removing, from the domain name to be detected and the normal domain name, all characters other than 0-9, a-z, " ", "_" and "-".
4. The domain name detection method according to claim 1, characterized in that the calculating of the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name comprises:
when the condition code indicates the distribution of letters in a domain name, or the distribution of letters and digits in a domain name, calculating the Jaccard similarity measure between the condition code of the domain name to be detected and the condition code of the normal domain name, the Jaccard similarity measure being the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name; and/or,
when the condition code indicates the distribution of letters and digits in a domain name, calculating, according to the Damerau-Levenshtein distance algorithm, the Damerau-Levenshtein distance between the condition code of the domain name to be detected and the condition code of the normal domain name, the Damerau-Levenshtein distance being the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name.
5. The domain name detection method according to claim 4, characterized in that the determining, according to the feature gap, of whether the accessed domain name is a domain name generated by the DGA comprises:
when the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is greater than or equal to 0.8, determining that the accessed domain name is a domain name generated by the DGA; and/or,
when the ratio of a Jaccard difference to the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is less than 0.1, determining that the accessed domain name is a domain name generated by the DGA, the Jaccard difference being the absolute value of the difference between the Jaccard similarity measure obtained from the condition code indicating the distribution of letters in a domain name and the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name; and/or,
when the Damerau-Levenshtein distance is greater than or equal to 0.9, determining that the accessed domain name is a domain name generated by the DGA.
6. The domain name detection method according to any one of claims 1-5, characterized in that the method further comprises:
matching the domain names in the access records of a terminal against the domain names determined to be generated by the DGA;
when the domain names in the access records of the terminal and the domain names determined to be generated by the DGA satisfy an infection matching condition, determining that the terminal is an infected terminal; and/or,
when the domain names in the access records of the terminal and the domain names determined to be generated by the DGA satisfy a deletion matching condition, deleting the domain names that satisfy the deletion matching condition from the domain names determined to be generated by the DGA.
7. A domain name detection device, characterized by comprising:
a processing module, configured to obtain a condition code of a domain name to be detected and a condition code of a normal domain name, the condition code indicating the distribution of letters in a domain name, or the distribution of letters and digits in a domain name;
the processing module being further configured to calculate a feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name, the feature gap indicating the degree of similarity between the condition code of the domain name to be detected and the condition code of the normal domain name;
a detection module, configured to determine, according to the feature gap, whether an accessed domain name is a domain name generated by a domain generation algorithm (DGA).
8. The domain name detection device according to claim 7, characterized in that the device further comprises:
an acquisition module, configured to obtain web access logs and parse the web access logs to obtain the domain name to be detected.
9. The domain name detection device according to claim 7, characterized in that the acquisition module is further configured to:
remove the top-level domain (TLD) suffix and the country-code top-level domain (ccTLD) suffix from the domain name to be detected and the normal domain name; and/or, remove the prefix "www" from the domain name to be detected and the normal domain name; and/or, remove, from the domain name to be detected and the normal domain name, all characters other than 0-9, a-z, " ", "_" and "-".
10. The domain name detection device according to claim 7, characterized in that the processing module is specifically configured to:
when the condition code indicates the distribution of letters in a domain name, or the distribution of letters and digits in a domain name, calculate the Jaccard similarity measure between the condition code of the domain name to be detected and the condition code of the normal domain name, the Jaccard similarity measure being the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name; and/or,
when the condition code indicates the distribution of letters and digits in a domain name, calculate, according to the Damerau-Levenshtein distance algorithm, the Damerau-Levenshtein distance between the condition code of the domain name to be detected and the condition code of the normal domain name, the Damerau-Levenshtein distance being the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name.
11. The domain name detection device according to claim 10, characterized in that the detection module is specifically configured to:
when the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is greater than or equal to 0.8, determine that the accessed domain name is a domain name generated by the DGA; and/or,
when the ratio of a Jaccard difference to the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is less than 0.1, determine that the accessed domain name is a domain name generated by the DGA, the Jaccard difference being the absolute value of the difference between the Jaccard similarity measure obtained from the condition code indicating the distribution of letters in a domain name and the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name; and/or,
when the Damerau-Levenshtein distance is greater than or equal to 0.9, determine that the accessed domain name is a domain name generated by the DGA.
12. The domain name detection device according to any one of claims 7-11, characterized in that the device further comprises:
a backtracking module, configured to match the domain names in the access records of a terminal against the domain names determined to be generated by the DGA; when the domain names in the access records of the terminal and the domain names determined to be generated by the DGA satisfy an infection matching condition, determine that the terminal is an infected terminal; and/or, when the domain names in the access records of the terminal and the domain names determined to be generated by the DGA satisfy a deletion matching condition, delete the domain names that satisfy the deletion matching condition from the domain names determined to be generated by the DGA.
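The preprocessing, distance and threshold steps recited in the claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation. The suffix list `SUFFIXES` is a tiny hypothetical stand-in for a real TLD/ccTLD source. The distance is the optimal-string-alignment (restricted) variant of Damerau-Levenshtein, normalized by the longer string's length on the assumption that the 0.9 threshold refers to a value in [0, 1]. The "and/or" conditions are combined with a plain OR, which is only one possible reading. All function names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny suffix list (assumption, not from the source).
SUFFIXES = {"com", "net", "org", "cn", "uk", "co"}


def normalize(domain: str) -> str:
    """Drop a leading "www" label and trailing TLD/ccTLD labels, then filter characters."""
    labels = domain.lower().strip(".").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    while len(labels) > 1 and labels[-1] in SUFFIXES:
        labels.pop()
    # Keep 0-9, a-z, "_" and "-"; everything else (including dots) is removed.
    return re.sub(r"[^0-9a-z_-]", "", "".join(labels))


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) edit distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return osa_distance(a, b) / longest if longest else 0.0


def looks_dga(jaccard_letters_digits: float, jaccard_letters: float,
              dl_normalized: float) -> bool:
    """Apply the three thresholds from the claims, combined here with OR."""
    if jaccard_letters_digits >= 0.8:
        return True
    jaccard_difference = abs(jaccard_letters - jaccard_letters_digits)
    if jaccard_letters_digits > 0 and jaccard_difference / jaccard_letters_digits < 0.1:
        return True
    return dl_normalized >= 0.9
```

The Jaccard inputs to `looks_dga` are expected to be computed separately from the letter-only and letter-and-digit condition codes. A production system would also replace `SUFFIXES` with a complete public-suffix source.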
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710242441.6A CN106911717A (en) | 2017-04-13 | 2017-04-13 | A kind of domain name detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106911717A (en) | 2017-06-30 |
Family
ID=59209445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710242441.6A Pending CN106911717A (en) | 2017-04-13 | 2017-04-13 | A kind of domain name detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106911717A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1670723A (en) * | 2004-03-16 | 2005-09-21 | 微软公司 | Systems and methods for improved spell checking |
CN103098050A (en) * | 2010-01-29 | 2013-05-08 | 因迪普拉亚公司 | Systems and methods for word offensiveness detection and processing using weighted dictionaries and normalization |
US20160057165A1 (en) * | 2014-08-22 | 2016-02-25 | Mcafee, Inc. | System and method to detect domain generation algorithm malware and systems infected by such malware |
CN106557476A (en) * | 2015-09-24 | 2017-04-05 | 北京奇虎科技有限公司 | The acquisition methods and device of relevant information |
CN105577660A (en) * | 2015-12-22 | 2016-05-11 | 国家电网公司 | DGA domain name detection method based on random forest |
CN105610830A (en) * | 2015-12-30 | 2016-05-25 | 山石网科通信技术有限公司 | Method and device for detecting domain name |
CN105827594A (en) * | 2016-03-08 | 2016-08-03 | 北京航空航天大学 | Suspicion detection method based on domain name readability and domain name analysis behavior |
CN106372056A (en) * | 2016-08-25 | 2017-02-01 | 久远谦长(北京)技术服务有限公司 | Natural language-based topic and keyword extraction method and system |
CN106372659A (en) * | 2016-08-30 | 2017-02-01 | 五八同城信息技术有限公司 | Similar object determination method and device |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019096099A1 (en) * | 2017-11-15 | 2019-05-23 | 瀚思安信(北京)软件技术有限公司 | Real-time detection method and apparatus for dga domain name |
US11334764B2 (en) | 2017-11-15 | 2022-05-17 | Han Si An Xin (Beijing) Software Technology Co., Ltd | Real-time detection method and apparatus for DGA domain name |
US10965697B2 (en) | 2018-01-31 | 2021-03-30 | Micro Focus Llc | Indicating malware generated domain names using digits |
US11108794B2 (en) | 2018-01-31 | 2021-08-31 | Micro Focus Llc | Indicating malware generated domain names using n-grams |
US10911481B2 (en) | 2018-01-31 | 2021-02-02 | Micro Focus Llc | Malware-infected device identifications |
US10880319B2 (en) | 2018-04-26 | 2020-12-29 | Micro Focus Llc | Determining potentially malware generated domain names |
CN109246083A (en) * | 2018-08-09 | 2019-01-18 | 北京奇安信科技有限公司 | A kind of detection method and device of DGA domain name |
CN109246083B (en) * | 2018-08-09 | 2021-08-03 | 奇安信科技集团股份有限公司 | DGA domain name detection method and device |
US11271963B2 (en) | 2018-12-20 | 2022-03-08 | Micro Focus Llc | Defending against domain name system based attacks |
CN109936560A (en) * | 2018-12-27 | 2019-06-25 | 上海银行股份有限公司 | Malware means of defence and device |
US10931714B2 (en) | 2019-01-08 | 2021-02-23 | Acer Cyber Security Incorporated | Domain name recognition method and domain name recognition device |
CN111478877B (en) * | 2019-01-24 | 2022-08-02 | 安碁资讯股份有限公司 | Domain name identification method and domain name identification device |
CN111478877A (en) * | 2019-01-24 | 2020-07-31 | 安碁资讯股份有限公司 | Domain name identification method and domain name identification device |
US11245720B2 (en) | 2019-06-06 | 2022-02-08 | Micro Focus Llc | Determining whether domain is benign or malicious |
CN112751804A (en) * | 2019-10-30 | 2021-05-04 | 北京观成科技有限公司 | Method, device and equipment for identifying counterfeit domain name |
CN112751804B (en) * | 2019-10-30 | 2023-04-07 | 北京观成科技有限公司 | Method, device and equipment for identifying counterfeit domain name |
CN111641663A (en) * | 2020-07-06 | 2020-09-08 | 奇安信科技集团股份有限公司 | Safety detection method and device |
CN111641663B (en) * | 2020-07-06 | 2022-08-12 | 奇安信科技集团股份有限公司 | Safety detection method and device |
CN111935099A (en) * | 2020-07-16 | 2020-11-13 | 兰州理工大学 | Malicious domain name detection method based on deep noise reduction self-coding network |
CN112073551B (en) * | 2020-08-26 | 2021-07-20 | 重庆理工大学 | DGA Domain Name Detection System Based on Character-Level Sliding Window and Deep Residual Network |
CN112073551A (en) * | 2020-08-26 | 2020-12-11 | 重庆理工大学 | DGA Domain Name Detection System Based on Character-Level Sliding Window and Deep Residual Network |
CN116980234A (en) * | 2023-09-25 | 2023-10-31 | 北京源堡科技有限公司 | Domain name imitation detection method and system |
CN116980234B (en) * | 2023-09-25 | 2024-01-05 | 北京源堡科技有限公司 | Domain name imitation detection method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106911717A (en) | A kind of domain name detection method and device | |
CN104113519B (en) | Network attack detecting method and its device | |
CN103428189B (en) | A kind of methods, devices and systems identifying malicious network device | |
CN111949803B (en) | Knowledge graph-based network abnormal user detection method, device and equipment | |
CN102945340B (en) | information object detection method and system | |
CN105024969A (en) | Method and device for realizing malicious domain name identification | |
CN109743294A (en) | Interface access control method, device, computer equipment and storage medium | |
CN110351248B (en) | Safety protection method and device based on intelligent analysis and intelligent current limiting | |
JP2016146114A (en) | Management method of blacklist | |
CN113992356A (en) | Method and device for detecting IP attack and electronic equipment | |
CN112434304B (en) | Method, server and computer-readable storage medium for defending against network attacks | |
CN110493253B (en) | Botnet analysis method of home router based on raspberry group design | |
CN106411819A (en) | Method and apparatus for recognizing proxy Internet protocol address | |
CN107172033A (en) | A kind of WAF erroneous judgement recognition methods and device | |
CN111625700B (en) | Anti-grabbing method, device, equipment and computer storage medium | |
US11627050B2 (en) | Distinguishing network connection requests | |
KR101428725B1 (en) | A System and a Method for Finding Malicious Code Hidden Websites by Checking Sub-URLs | |
CN113726775B (en) | Attack detection method, device, equipment and storage medium | |
CN113923039B (en) | Attack equipment identification method and device, electronic equipment and readable storage medium | |
CN114465746B (en) | Network attack control method and system | |
CN110266684A (en) | A kind of domain name system security means of defence and device | |
CN106850500A (en) | Fishing website processing method and processing device | |
CN112702349B (en) | Network attack defense method and device and electronic bidding transaction platform | |
US11683337B2 (en) | Harvesting fully qualified domain names from malicious data packets | |
CN103581910B (en) | A kind of method and apparatus for following the trail of mobile subscriber |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170630 |