CN114666071A - Botnet identification method and device and terminal equipment

Info

Publication number: CN114666071A
Application number: CN202011403556.7A
Authority: CN
Inventors: 周实奇; 黄倚霄; 钱湖海; 钱成; 周旭莹
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Priority date: 2020-12-04
Filing date: 2020-12-04
Publication date: 2022-06-24
Anticipated expiration: 2040-12-04
Also published as: CN114666071B

Abstract

The present application discloses a botnet identification method, apparatus, and terminal device. The method includes: performing risk identification on a DNS message log to be processed to obtain a risk identification result, where the DNS message log includes multiple pieces of domain name information, each comprising a domain name and the source IP corresponding to that domain name; when the risk identification result indicates that the DNS message log is suspected malicious traffic, determining, based on preconfigured DGA family rules, the DGA family set to which each piece of domain name information in the DNS message log belongs; clustering, based on the DGA family set and the source IP, the domain name information having the same source IP to obtain multiple target traffic groups; separately counting, in each target traffic group, the number of source IPs accessing the same type of DGA family set; and, when the number is greater than a first threshold and the proportion of invalid access returns is greater than a second threshold, determining that the network corresponding to the source IP is a botnet.

Description

Botnet identification method, apparatus, and terminal device

Technical Field

The present application relates to the technical field of network security, and in particular to a botnet identification method, apparatus, and terminal device.

Background

A botnet is a complex, flexible, and efficient network attack platform that is widely distributed across the Internet. Botnets give attackers the ability to carry out large-scale malicious activities, such as launching distributed denial-of-service attacks, sending spam, phishing, and stealing information.

In a botnet, hackers use various means to compromise user hosts, enterprise servers, and the like in order to achieve their own illegal purposes. Botnet activity is mainly divided into three stages: propagation, command and control (C&C), and attack, of which command and control is the core mechanism of botnet operation. For example, Figure 1 shows a botnet implementation based on a central C&C server: a remote-control domain name is used to establish the connection between the C&C server and the bot hosts, and instructions are then issued to launch a collective network attack on the victim host or server.

Because botnets cause serious harm, how to effectively identify and counter them has become one of the hot topics in network security research. However, in their ongoing contest with security organizations and vendors, botnets keep evolving to improve their concealment and survivability, and they now show the following development trends.

1. Decentralization: decentralized structures are being adopted by more and more botnets because of their advantage in resilience.

2. Miniaturization: the scale of botnets is shrinking to a certain extent.

3. Isolation: more botnets will use Domain-Flux (DGA) technology to isolate direct communication between bot nodes and command servers, thereby providing a layer of security protection for the command servers.

In the face of these trends, the botnet detection methods available in the related art are gradually losing their advantages, resulting in poor botnet identification and an inability to effectively counter botnet attacks.

Summary of the Invention

The embodiments of the present application provide a botnet identification method, apparatus, and terminal device that can effectively solve the problems of poor botnet identification and the inability to effectively counter botnet attacks.

To solve the above problems, the present application is implemented as follows:

In a first aspect, an embodiment of the present application provides a botnet identification method, the method comprising: performing risk identification on a DNS message log to be processed to obtain a risk identification result, where the DNS message log includes multiple pieces of domain name information, each comprising a domain name and the source IP corresponding to that domain name; when the risk identification result indicates that the DNS message log is suspected malicious traffic, determining, based on preconfigured DGA family rules, the target DGA family set to which each piece of domain name information in the DNS message log belongs; clustering, based on the DGA family set and the source IP, the domain name information sets having the same source IP to obtain multiple target traffic groups; separately counting, in each target traffic group, the number of source IPs accessing the same type of DGA family set; and, when the number is greater than a first threshold and the proportion of invalid access returns is greater than a second threshold, determining that the network corresponding to the source IP is a botnet.

In a second aspect, an embodiment of the present application further provides a botnet identification apparatus, comprising: an identification module, configured to perform risk identification on a DNS message log to be processed to obtain a risk identification result, where the DNS message log includes multiple pieces of domain name information, each comprising a domain name and the source IP corresponding to that domain name; a first determination module, configured to determine, based on preconfigured DGA family rules, the target DGA family set to which each piece of domain name information in the DNS message log belongs when the risk identification result indicates that the DNS message log is suspected malicious traffic; a clustering module, configured to cluster, based on the DGA family set and the source IP, the domain name information sets having the same source IP to obtain multiple target traffic groups; a statistics module, configured to separately count, in each target traffic group, the number of source IPs accessing the same type of DGA family set; and a second determination module, configured to determine that the network corresponding to the source IP is a botnet when the number is greater than a first threshold and the proportion of invalid access returns is greater than a second threshold.

In a third aspect, an embodiment of the present application further provides a terminal device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the botnet identification method according to the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where instructions in the storage medium, when executed by a processor in a terminal device, enable the terminal device to perform the steps of the botnet identification method according to the first aspect.

At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:

In the embodiments of the present application, the DNS message log to be processed is analyzed using preconfigured DGA family rules, so that botnets can be identified effectively and botnet attacks can be countered effectively.

The above description is only an overview of the technical solution of the present application. So that the technical means of the present application can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objectives, features, and advantages of the present application are more readily apparent, specific embodiments of the present application are set out below.

Description of Drawings

The drawings described here are provided for further understanding of the present application and form a part of it. The schematic embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:

FIG. 1 is a schematic flowchart of a botnet identification method according to an exemplary embodiment.

FIG. 2 is a schematic flowchart of a botnet identification method according to another exemplary embodiment.

FIG. 3 is a block diagram of a botnet identification apparatus according to an exemplary embodiment.

FIG. 4 is a block diagram of a terminal device according to an exemplary embodiment.

Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

The present application provides a botnet identification method, apparatus, and terminal device. The botnet identification method may be applied to a terminal device and may be executed by hardware and/or software installed in the terminal device. Optionally, the terminal device may be, but is not limited to, a terminal-side device such as a mobile phone, a tablet personal computer, a laptop computer (also called a notebook computer), a personal digital assistant (PDA), a handheld computer, a netbook, an ultra-mobile personal computer (UMPC), a mobile Internet device (MID), a wearable device, a vehicle-mounted device (VUE), or a pedestrian terminal (PUE). Wearable devices include bracelets, headphones, glasses, and the like. Note that the embodiments of the present application do not limit the specific type of the terminal device.

FIG. 1 is a schematic flowchart of a botnet identification method 100 provided by an exemplary embodiment of the present application. The method 100 includes at least the following steps.

S110: Perform risk identification on the Domain Name System (DNS) message log to be processed to obtain a risk identification result.

The DNS message log includes multiple pieces of domain name information, each comprising a domain name and the source Internet Protocol (IP) address corresponding to that domain name. The DNS message log may be acquired in real time. For example, switch traffic may be captured by bypass mirroring and then parsed by a deep packet inspection (DPI) method to obtain the DNS message log.

Note that entries in the DNS message log fall into two categories according to the "request or response" field: DNS requests and DNS responses. An example of each is given below.

a. DNS request

15412345677|1231231312123|10.1.1.1|3456|10.1.1.2|53|dns|80|Q|www.a.com|A|IN|

b. DNS response

15412345677|1231231312123|10.1.1.1|3456|10.1.1.2|53|dns|80|A|www.a.com|A|IN|NoError|10.1.1.1
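The two record layouts above can be read with a small parser. The following is only a sketch: the text does not name the fields, so the positions used here (source IP at index 2, the request-or-response flag at index 8, the domain name at index 9, the response code at index 12) are inferred from the two sample lines, and the function name `parse_dns_log_line` is hypothetical.

```python
def parse_dns_log_line(line):
    """Parse one pipe-delimited DNS log record into a small dict.

    Field positions are assumptions inferred from the sample records,
    not a documented schema.
    """
    fields = line.strip().split("|")
    if len(fields) < 12:
        raise ValueError("unexpected number of fields: %d" % len(fields))

    # "Q" marks a request and "A" a response in the sample records.
    kind = {"Q": "request", "A": "response"}.get(fields[8])
    if kind is None:
        raise ValueError("unknown request/response flag: %r" % fields[8])

    record = {
        "src_ip": fields[2],
        "kind": kind,
        "domain": fields[9].lower(),
        "rcode": None,  # only responses carry a response code
    }
    if kind == "response" and len(fields) > 12 and fields[12]:
        record["rcode"] = fields[12]
    return record
```

Keeping only the source IP, domain name, and response code is a deliberate choice here, since those are the values the later steps of the method rely on.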

S120: When the risk identification result indicates that the DNS message log is suspected malicious traffic, determine, based on preconfigured Domain Generation Algorithm (DGA) family rules, the DGA family set to which each piece of domain name information in the DNS message log belongs.

The DGA family rules may be obtained as follows: from the per-family character rules published by 360 and other providers, a rule set is extracted for deciding whether a domain name belongs to a given family. For example, Mirai-family domain names have a fixed length of 12 and consist of the characters a-y, which yields the SQL statement for identifying the Mirai family.

In this embodiment, the DGA family set may be obtained by matching the DNS message log entries suspected of being malicious traffic against the DGA family rules, which yields the DGA family set corresponding to each piece of domain name information in the DNS message log.

Note that appending a family field to a malicious traffic entry completes its domain name family information. For example, after the family field is added, an entry may read: 15412345677|1231231312123|10.1.1.1|3456|10.1.1.2|53|dns|80|A|abcjiaucnoin.com|A|IN|NoError|10.1.1.1|[mirai,tempedreve,tinba].
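As an illustration of rule matching and of the appended family field, the sketch below expresses the Mirai rule as a Python regular expression rather than as the SQL statement the text mentions. It assumes that the rule applies to the leftmost label of the domain name, which the text does not state. Only the Mirai rule is given in the text, so the rule table contains no other families; `FAMILY_RULES`, `match_families`, and `append_family_field` are hypothetical names.

```python
import re

# Hypothetical rule table. Only the Mirai rule (fixed length 12,
# characters a-y) is stated in the text; other families are omitted
# because their rules are not given.
FAMILY_RULES = {
    "mirai": re.compile(r"[a-y]{12}"),
}


def match_families(domain):
    """Return the names of all families whose rule matches the domain.

    Assumption: the rule is checked against the leftmost label only.
    """
    label = domain.lower().split(".")[0]
    return [name for name, rule in FAMILY_RULES.items() if rule.fullmatch(label)]


def append_family_field(line, families):
    """Append the family list to a log line in the '|[a,b,c]' style shown above."""
    return "%s|[%s]" % (line, ",".join(families))
```

With this table, the sample domain `abcjiaucnoin.com` matches only `mirai`, because the rules for `tempedreve` and `tinba` are not reproduced here.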

S130: Based on the DGA family set and the source IP, cluster the domain name information sets having the same source IP to obtain multiple target traffic groups.

S140: Separately count, in each target traffic group, the number of source IPs accessing the same type of DGA family set.

S150: When the number is greater than the first threshold and the proportion of invalid access returns is greater than the second threshold, determine that the network corresponding to the source IP is a botnet.

In this embodiment, in S130 to S150, the first threshold and the second threshold may be set according to actual requirements. For example, let the first threshold be θ1 and the second threshold be θ2. In one possible implementation, the malicious traffic sharing the same source IP (that is, a target traffic group) is treated as one group of size N_IP, and the number δ of accesses by the source IP to the same type of DGA family set within the group is counted, for example the number of domain names in the mirai family set accessed by 10.1.1.1. If δ > θ1, the invalid return ratio η of the DGA family set accessed by the source IP is then evaluated; if η > θ2, the source IP can be determined to belong to a botnet.

The invalid return ratio η may be calculated as follows:

$$\eta = \frac{N_{IP,\,rcode=NXdomain}}{N_{IP}}$$

where $N_{IP}$ denotes the total number of accesses and $N_{IP,\,rcode=NXdomain}$ denotes the number of invalid returns.
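The grouping and the two threshold checks of S130 to S150 might be sketched as follows. Several points are assumptions rather than statements from the text: each record is a dict with the hypothetical keys `src_ip`, `domain`, `families`, and `rcode`; δ is counted as the number of distinct domain names of one family accessed by a source IP; and η is computed over the whole group of that source IP, following the definition of N_IP as the total number of accesses.

```python
from collections import defaultdict


def identify_bot_ips(records, theta1, theta2):
    """Return {source IP: [families with delta > theta1]} for flagged IPs."""
    # S130: one target traffic group per source IP.
    groups = defaultdict(list)
    for rec in records:
        groups[rec["src_ip"]].append(rec)

    flagged = {}
    for ip, group in groups.items():
        # eta = invalid (NXDomain) returns / total accesses in the group.
        n_ip = len(group)
        n_invalid = sum(1 for r in group if r["rcode"].lower() == "nxdomain")
        eta = n_invalid / n_ip

        # S140: delta = distinct domains per DGA family for this source IP.
        domains_per_family = defaultdict(set)
        for r in group:
            for family in r["families"]:
                domains_per_family[family].add(r["domain"])

        # S150: both thresholds must be exceeded.
        families = sorted(f for f, d in domains_per_family.items() if len(d) > theta1)
        if families and eta > theta2:
            flagged[ip] = families
    return flagged
```

Both thresholds are left as parameters because the text says they are set according to actual requirements.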

In this embodiment, the DNS message log to be processed is analyzed using preconfigured DGA family rules so that botnets can be identified effectively. In addition, the DGA family rules are easy to implement and offer broad coverage and generality; in testing, the botnet identification accuracy was high.

FIG. 2 is a schematic flowchart of a botnet identification method 200 provided by an exemplary embodiment of the present application. The method 200 includes at least the following steps.

S210: Obtain the DNS message log to be processed.

For how the DNS message log in S210 is obtained, refer to the description of S110 above; to avoid repetition, it is not described again here.

S220: Preprocess the domain name information included in the DNS message log to obtain multiple corpora to be identified.

In one possible implementation, S220 may include: standardizing each piece of domain name information included in the DNS message log; and performing word segmentation on each standardized domain name to obtain the corpora to be identified corresponding to each piece of domain name information.

For example, suppose the domain name information is QQ.com. Case conversion is applied first, normalizing the letter Q to q, so the standardized domain name information is qq.com. The standardized domain name information is then segmented into individual letters; segmenting qq.com yields the corpus to be identified {q q c o m}.

In another possible implementation, after the corpora to be identified are obtained, each may be adjusted to a predetermined length in order to further improve botnet identification accuracy. For example, suppose the predetermined corpus length is n (such as 32). For each corpus to be identified, if its length exceeds the predetermined length, it may be truncated so that the truncated corpus has the predetermined length; conversely, if its length is less than the predetermined length, it may be padded so that the padded corpus has the predetermined length.

This embodiment places no restriction on how the corpus to be identified is padded or truncated.
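The preprocessing in S220 can be illustrated with a short sketch. Because the embodiment leaves the padding and truncation strategy open, the choices here are assumptions: characters other than a-z are dropped (matching the {q q c o m} example, where the dot disappears), overly long corpora are cut at the tail, and short ones are right-padded with a hypothetical `<pad>` token.

```python
PAD = "<pad>"  # hypothetical padding token


def preprocess(domain, length=32):
    """Lowercase a domain, split it into letters, and fix the length."""
    # Standardize case and keep only the letters a-z.
    tokens = [ch for ch in domain.lower() if "a" <= ch <= "z"]
    # Truncate if too long, then pad on the right if too short.
    tokens = tokens[:length]
    return tokens + [PAD] * (length - len(tokens))
```

For instance, `preprocess("QQ.com", 8)` gives the five letters q, q, c, o, m followed by three padding tokens.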

S230: Query a preconfigured word vector matrix for each domain name letter included in the corpus to be identified, to obtain a vector to be identified.

The word vector matrix may be obtained with a word2vec model. In one possible implementation, the word vector matrix is generated as described in detail under S240 below, and the details are not repeated here.

S240: Input the vector to be identified into a pre-trained DGA identification model to obtain the risk identification result.

The DGA identification model may be, but is not limited to being, trained on the basis of a TextCNN model or the like. In one possible implementation, pre-training of the DGA identification model may include steps (1)-(6) below.

(1) Obtain multiple pieces of domain name information for constructing the training data set.

The multiple pieces of domain name information may be acquired in real time or obtained from HDFS.

In this embodiment, a training data set with sample labels may be obtained. The DGA samples may be obtained from https://data.netlab.360.com/dga/dga.txt (1223607 in total), and the normal domain names may be obtained from http://s3.amazonaws.com/alexa-static/top-1m.csv.zip (1 million in total). This yields a sample data set with positive and negative labels.

(2) Preprocess each piece of domain name information to obtain multiple training corpora corresponding to the domain names, each training corpus including a domain name sample and a sample label.

For the preprocessing of each piece of domain name information, refer to the detailed description in S220 above. Unlike S220, however, the training corpus includes positive and negative labels indicating whether a training sample is positive or negative.

For example, suppose the domain name information used to construct the training data set is qq.com, Google.com, and ntnipngh.com. The training corpora with positive and negative labels may then be:

qq.com: q q c o m, 0;

Google.com: g o o g l e c o m, 0;

ntnipngh.com: n t n i p n g h c o m, 1;

where label 0 denotes a normal domain name and label 1 denotes a DGA domain name.
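The labeled corpus format above could be produced along the following lines. This is a minimal sketch: `tokenize` and `build_training_corpus` are hypothetical helpers, and dropping every character outside a-z is an assumption taken from the examples, in which the dot does not appear among the tokens.

```python
def tokenize(domain):
    """Lowercase a domain and split it into single letters (a-z only)."""
    return [ch for ch in domain.lower() if "a" <= ch <= "z"]


def build_training_corpus(normal_domains, dga_domains):
    """Pair each tokenized domain with its label: 0 = normal, 1 = DGA."""
    corpus = [(tokenize(d), 0) for d in normal_domains]
    corpus += [(tokenize(d), 1) for d in dga_domains]
    return corpus
```

Calling `build_training_corpus(["qq.com", "Google.com"], ["ntnipngh.com"])` reproduces the three labeled examples listed above.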

(3) For each training corpus, query the preconfigured word vector matrix in turn for each domain name letter it contains, to obtain multiple one-dimensional word vectors corresponding one-to-one to the training corpus.

The word vector matrix is obtained as follows: each training corpus is processed into a one-dimensional vector, and the one-dimensional vectors are passed through the word2vec model in turn, which yields a word vector matrix of a predetermined size. For example, if the one-dimensional word vectors produced by the word2vec model have length 64, the word vector matrix may be 27x64 (26 letters plus a padding value).

(4) Concatenate the multiple one-dimensional word vectors to obtain the target vector corresponding to the training corpus.

Using the word vector matrix obtained above, the one-dimensional word vectors found by looking up each letter of the training domain name information are concatenated to give the target vector of each training corpus. In this embodiment, the length of the target vector may be 2048.
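The lookup and concatenation in steps (3) and (4) can be sketched as below. The 27x64 matrix here is filled with random placeholder values and merely stands in for a matrix trained with word2vec, which is not reproduced. The sequence length of 32 is the example value of n given for S220, and 32 letters of 64 dimensions each give the stated target length of 2048. All names in the block are hypothetical.

```python
import random
import string

EMBED_DIM = 64   # word vector length from the example
SEQ_LEN = 32     # example predetermined corpus length n
PAD = "<pad>"    # hypothetical padding token

# 27 rows: the padding value plus the 26 letters.
VOCAB = [PAD] + list(string.ascii_lowercase)
INDEX = {token: i for i, token in enumerate(VOCAB)}


def make_placeholder_matrix(seed=0):
    """Build a 27x64 matrix of random values (stand-in for word2vec output)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def to_target_vector(tokens, matrix):
    """Look up each token's row and concatenate the rows into one flat vector."""
    if len(tokens) != SEQ_LEN:
        raise ValueError("expected %d tokens, got %d" % (SEQ_LEN, len(tokens)))
    vector = []
    for token in tokens:
        vector.extend(matrix[INDEX[token]])
    return vector
```

A flat list is used here only for simplicity; a model that convolves over the letter sequence would more typically receive the same values as a 32x64 array.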

(5) Construct a training data set based on the target vector corresponding to each training corpus.

(6) Train a predetermined neural network model with the training data set to obtain the DGA identification model.

In (5)-(6), the training data set constructed from the training corpora may be input into a TextCNN model to train it, which yields the DGA identification model finally used for prediction.

Note that, to improve the identification accuracy of the DGA identification model, the model may be trained multiple times, for example by using a loss function or by replacing training samples; this is not described further in this embodiment.

Note also that after the risk identification result is obtained with the DGA identification model, each piece of domain name information may be labeled according to that result. For example, if the domain name in a DNS message log entry is labeled normal, the entry is normal traffic; if the domain name in the entry is labeled DGA, the entry is malicious traffic, and the subsequent steps S250-S280 are performed.

S250: When the risk identification result indicates that the DNS message log is suspected malicious traffic, determine, based on the preconfigured DGA family rules, the DGA family set to which each piece of domain name information in the DNS message log to be processed belongs.

S260: Based on the DGA family set and the source IP, cluster the domain name information sets having the same source IP to obtain multiple target traffic groups.

S270: Separately count, in each target traffic group, the number of source IPs accessing the same type of DGA family set.

S280: When the number is greater than the first threshold and the proportion of invalid access returns is greater than the second threshold, determine that the network corresponding to the source IP is a botnet.

For details of S250-S280, refer to the related description of the method 100; they are not repeated here.

In this embodiment, a funnel model with two stages is used: DGA domain names suspected of being malicious traffic are screened out first, and then, based on the screening result, the DGA family rule base is used to identify botnet hosts more precisely. In addition, identifying DGA domain names in word vector space simplifies feature processing, avoids the interference of extensive feature engineering, and gives higher identification accuracy.
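The two-stage funnel can be outlined as a function that takes both stages as callables. This is a sketch under stated assumptions: `is_suspected_dga` stands in for the trained DGA identification model and `match_families` for the DGA family rule base, neither of which is reproduced here, and records are assumed to be dicts with a `domain` key.

```python
def two_stage_funnel(records, is_suspected_dga, match_families):
    """Screen records with a classifier, then tag survivors with DGA families.

    Both callables are supplied by the caller and are hypothetical
    stand-ins for the trained model and the family rule base.
    """
    # Stage 1: keep only records whose domain the model flags as DGA.
    suspected = [r for r in records if is_suspected_dga(r["domain"])]
    # Stage 2: attach the matching family set to each suspected record,
    # returning new dicts so the input records are left unchanged.
    return [dict(r, families=match_families(r["domain"])) for r in suspected]
```

The tagged records returned by the second stage are what the per-source-IP grouping and threshold checks would then operate on.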

如图3所示，为本申请的一示例性实施例提供的僵尸网络识别装置300，所述装置300包括识别模块310，用于对待处理的DNS报文日志进行风险识别，得到风险识别结果，所述DNS报文日志中包括多个域名信息，每个所述域名信息包括域名以及所述域名对应的源IP；第一确定模块320，用于在所述风险识别结果指示所述DNS报文日志疑似恶意流量的情况下，基于预配置的DGA家族规则，确定所述DNS报文日志中的各域名信息所属的目标DGA家族集合；；聚类模块330，用于基于所述DGA家族集合以及所述源IP，对具有相同源IP的域名信息集进行聚类，得到多个目标流量组；统计模块340，用于分别统计各所述目标流量组中访问同一类DGA家族集合的源IP的数量；第二确定模块350，用于在所述数量大于第一阈值、且访问无效返回占比大于第二阈值的情况下，确定所述源IP对应的网络为僵尸网络。As shown in FIG. 3 , a botnet identification device 300 is provided in an exemplary embodiment of the present application. The device 300 includes an identification module 310 for performing risk identification on the DNS message log to be processed to obtain a risk identification result, The DNS message log includes a plurality of domain name information, and each of the domain name information includes a domain name and a source IP corresponding to the domain name; the first determining module 320 is configured to indicate the DNS message in the risk identification result In the case that the log is suspected to be malicious traffic, based on the preconfigured DGA family rules, determine the target DGA family set to which each domain name information in the DNS message log belongs; the clustering module 330 is configured to be based on the DGA family set and For the source IP, cluster the domain name information sets with the same source IP to obtain multiple target traffic groups; the statistics module 340 is used to separately count the source IPs accessing the same type of DGA family set in each of the target traffic groups. The second determination module 350 is configured to determine that the network corresponding to the source IP is a botnet when the number is greater than the first threshold and the proportion of invalid access returns is greater than the second threshold.

The specific manner in which each module of the botnet identification device 300 performs its operations has been described in detail in the method embodiments and is not repeated here.

FIG. 4 is a block diagram of a terminal device 400 according to an exemplary embodiment. The terminal device 400 may include at least a processor 410 and a memory 420 for storing instructions executable by the processor 410. The processor 410 is configured to execute the instructions to implement all or some of the steps of the botnet identification method in the above embodiments.

The processor 410 and the memory 420 are electrically connected, directly or indirectly, to enable data transmission or interaction. For example, these elements may be electrically connected to each other through one or more communication buses or signal lines.

The processor 410 is used to read and write data or programs stored in the memory and to perform the corresponding functions.

The memory 420 is used to store programs or data, such as instructions executable by the processor 410. The memory 420 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.

Further, as a possible implementation, the terminal device 400 may also include a power supply component, a multimedia component, an audio component, an input/output (I/O) interface, a sensor component, a communication component, and the like.

The power supply component provides power to the various components of the terminal device 400. It may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device 400.

The multimedia component includes a screen that provides an output interface between the terminal device 400 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. A touch sensor may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component includes a front camera and/or a rear camera. When the terminal device 400 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focusing and optical zoom capability.

The audio component is configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) that is configured to receive external audio signals when the terminal device 400 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 420 or transmitted via the communication component. In some embodiments, the audio component also includes a speaker for outputting audio signals.

The I/O interface provides an interface between the processing component and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component includes one or more sensors for providing status assessments of various aspects of the terminal device 400. For example, the sensor component can detect the open/closed state of the terminal device 400 and the relative positioning of components, such as the display and keypad of the terminal device 400. The sensor component can also detect a change in position of the terminal device 400 or of one of its components, the presence or absence of user contact with the terminal device 400, the orientation or acceleration/deceleration of the terminal device 400, and changes in its temperature. The sensor component may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component is configured to facilitate wired or wireless communication between the terminal device 400 and other devices. The terminal device 400 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, or 4G), or a combination thereof. In one exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the terminal device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

It should be understood that the structure shown in FIG. 4 is only a schematic structural diagram of the terminal device 400. The terminal device 400 may include more or fewer components than shown in FIG. 4, or have a configuration different from that shown in FIG. 4. Each component shown in FIG. 4 may be implemented in hardware, software, or a combination thereof.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions, where the instructions can be executed by a processor in a terminal device to carry out the above botnet identification method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

It should be noted that the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims

1. A botnet identification method, comprising:

performing risk identification on a DNS message log to be processed to obtain a risk identification result, wherein the DNS message log comprises a plurality of domain name information, and each domain name information comprises a domain name and a source IP (Internet protocol) corresponding to the domain name;

when the risk identification result indicates that the DNS message log is suspected of containing malicious traffic, determining, based on a preconfigured DGA family rule, a DGA family set to which each domain name information in the DNS message log belongs;

based on the DGA family set and the source IP, clustering domain name information with the same source IP to obtain a plurality of target traffic groups;

respectively counting the number of source IPs accessing the same type of DGA family set in each target traffic group;

and when the number is greater than a first threshold and the proportion of invalid access returns is greater than a second threshold, determining that the network corresponding to the source IP is a botnet.

2. The method of claim 1, wherein the DNS message log comprises a DNS request log and a DNS response log.

3. The method of claim 1, wherein performing risk identification on the DNS message log to be processed to obtain a risk identification result comprises:

acquiring a DNS message log to be processed;

preprocessing the domain name information included in the DNS message log to obtain a plurality of corpora to be identified;

querying a preconfigured word vector matrix according to each letter of the domain name included in the corpus to be identified, to obtain a vector to be identified;

and inputting the vector to be identified into a pre-trained DGA recognition model to obtain the risk identification result.
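By way of illustration only, the word vector matrix query can be sketched in Python as a per-letter table lookup. The alphabet, the embedding width `EMBED_DIM`, the randomly initialized matrix, and the reservation of row 0 for unknown letters are assumptions; in practice the matrix would be the preconfigured one, and the resulting vectors would be passed to the pre-trained DGA recognition model, which is not shown.

```python
import random
import string

# Assumed alphabet of letters that may appear in a domain name.
ALPHABET = string.ascii_lowercase + string.digits + "-."
# Row 0 of the matrix is reserved for padding and unknown letters (an assumption).
CHAR_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
EMBED_DIM = 4  # illustrative width of each word vector


def build_matrix(dim=EMBED_DIM, seed=0):
    """Build a stand-in word vector matrix with one row per letter."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim]  # row 0: padding/unknown
    for _ in ALPHABET:
        matrix.append([rng.uniform(-1, 1) for _ in range(dim)])
    return matrix


def lookup(corpus, matrix):
    """Return one word vector per letter of the corpus, in order."""
    return [matrix[CHAR_INDEX.get(ch, 0)] for ch in corpus.lower()]
```

For example, `lookup("ab1.com", build_matrix())` yields seven vectors of width `EMBED_DIM`, one for each letter.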

4. The method according to claim 3, wherein preprocessing the domain name information included in the DNS message log to obtain a plurality of corpora to be identified comprises:

normalizing each domain name information included in the DNS message log;

and performing word segmentation on each normalized domain name to obtain a plurality of corpora to be identified corresponding to each domain name information.
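By way of illustration only, the two preprocessing steps can be sketched in Python. The specific normalization rules (trimming whitespace, lowercasing, removing the trailing root dot) and the choice to segment a domain name into its dot-separated labels are assumptions; other normalization or segmentation rules would fit the same two-step shape.

```python
def normalize(domain):
    """Normalize a raw domain name: trim, lowercase, drop the trailing root dot."""
    return domain.strip().lower().rstrip(".")


def segment(domain):
    """Split a normalized domain name into its labels, skipping empty ones."""
    return [label for label in normalize(domain).split(".") if label]
```

Under these assumptions, `segment(" WWW.Example.COM. ")` returns `["www", "example", "com"]`, so one domain name yields several corpora.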

5. The method according to claim 3, wherein, before the preconfigured word vector matrix is queried according to each letter of the domain name included in the corpus to be identified to obtain the vector to be identified, performing risk identification on the DNS message log to be processed to obtain a risk identification result further comprises:

and processing each corpus to be identified to a preset length.
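By way of illustration only, processing to a preset length can be sketched in Python as truncation plus right-padding. Operating on a sequence of letter indices, padding on the right, and using 0 as the padding value are assumptions.

```python
def fit_length(indices, preset_length, pad_value=0):
    """Truncate or right-pad a sequence so that it has exactly preset_length items."""
    clipped = list(indices[:preset_length])
    return clipped + [pad_value] * (preset_length - len(clipped))
```

A fixed length lets every corpus map to a vector of the same shape, which a model with a fixed input size requires.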

6. The method of claim 3, wherein the pre-training process of the DGA recognition model comprises:

acquiring a plurality of domain name information for constructing a training data set;

preprocessing each domain name information to obtain a plurality of training corpora corresponding to each domain name, wherein each training corpus comprises a domain name sample and a sample label;

sequentially querying a preconfigured word vector matrix according to each letter of the domain name included in each training corpus, to obtain a plurality of one-dimensional word vectors in one-to-one correspondence with the training corpora;

splicing the plurality of one-dimensional word vectors to obtain a target vector corresponding to the training corpus;

constructing a training data set based on the target vector corresponding to each training corpus;

and training a preset neural network model by using the training data set to obtain a DGA recognition model.
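By way of illustration only, the splicing and data set construction steps can be sketched in Python. The sketch assumes one one-dimensional word vector per letter, a letter-to-row mapping `char_index` with row 0 for unknown letters, and samples given as (domain name sample, sample label) pairs; the neural network training itself is not shown.

```python
def splice(vectors):
    """Concatenate one-dimensional word vectors into a single target vector."""
    target = []
    for vec in vectors:
        target.extend(vec)
    return target


def build_training_set(samples, matrix, char_index):
    """Turn (domain sample, label) pairs into (target vector, label) pairs."""
    dataset = []
    for domain, label in samples:
        # Look up one word vector per letter; unknown letters map to row 0.
        vectors = [matrix[char_index.get(ch, 0)] for ch in domain]
        dataset.append((splice(vectors), label))
    return dataset
```

With `char_index = {"a": 1, "b": 2}` and `matrix = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]`, the sample `("ab", 1)` becomes `([1.0, 2.0, 3.0, 4.0], 1)`.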

7. The method of claim 1, wherein before performing risk identification on the DNS message log to be processed and obtaining a risk identification result, the method further comprises:

obtaining switch traffic information by bypass mirroring;

and parsing the switch traffic information by deep packet inspection (DPI) to obtain the DNS message log.

8. A botnet identification device, comprising:

an identification module, configured to perform risk identification on a DNS message log to be processed to obtain a risk identification result, wherein the DNS message log comprises a plurality of domain name information, and each domain name information comprises a domain name and a source IP corresponding to the domain name;

a first determining module, configured to determine, based on a preconfigured DGA family rule, a DGA family set to which each domain name information in the DNS message log belongs, when the risk identification result indicates that the DNS message log is suspected of containing malicious traffic;

a clustering module, configured to cluster the domain name information sets with the same source IP, based on the DGA family set and the source IP, to obtain a plurality of target traffic groups;

a statistics module, configured to respectively count the number of source IPs accessing the same type of DGA family set in each target traffic group;

and a second determining module, configured to determine that the network corresponding to the source IP is a botnet when the number is greater than a first threshold and the proportion of invalid access returns is greater than a second threshold.

9. A terminal device comprising a processor, a memory and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the botnet identification method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a program or instructions are stored, which program or instructions, when executed by a processor, carry out the steps of the botnet identification method according to any one of claims 1-7.