WO2019057048A1

WO2019057048A1 - Low-frequency crawler identification method, device, readable storage medium and equipment

Info

Publication number: WO2019057048A1
Application number: PCT/CN2018/106370
Authority: WO
Inventors: 胡志磊; 刘鑫琪; 陈�峰; 汪海; 陈哲; 丛磊
Original assignee: 北京数安鑫云信息技术有限公司
Priority date: 2017-09-20
Filing date: 2018-09-19
Publication date: 2019-03-28
Also published as: CN107800684A; CN107800684B

Abstract

Disclosed herein are a low-frequency crawler identification method, a device, a readable storage medium and an equipment, the method comprising: computing a behavior feature vector of each user IP within a preset time slot according to a network application log of each user IP; clustering the behavior feature vector of each user IP to acquire a plurality of clusters; and determining an inspection rule, determining a cluster that meets the corresponding inspection rule, and determining each user IP in the cluster as a crawler. The embodiments of the present invention may effectively identify low frequency crawlers, and may solve group threats, low-frequency threats, associated threats and persistent threats, which traditional security products cannot identify. Public cloud or private cloud deployment is supported, and threat identification and blocking may be performed without changing network topology and without embedding any code; the joining of custom blocking interfaces is supported, and a deployment environment being completely switched off under extreme cases will not influence the normal operation of an original service.

Description

Low-frequency crawler identification method, device, readable storage medium and equipment

The present application claims priority to Chinese Patent Application No. 201710857222.9, filed on Sep. 20, 2017, the entire disclosure of which is incorporated herein by reference.

Technical field

This document relates to, but is not limited to, the field of Internet technology, and in particular, to a low frequency crawler identification method, apparatus, readable storage medium and device.

Background technique

The Internet is full of crawlers, and crawlers keep evolving as anti-crawler measures improve. Their evolution has three stages: primary crawlers, browser crawlers, and low-frequency crawlers. A primary crawler does not disguise itself while crawling target pages and can be identified accurately by features such as User-Agent and access frequency. A browser crawler disguises itself with the User-Agent of browsers such as Firefox, Opera, and Chrome, and behaves similarly to a normal user; it can be identified by access frequency, timeline, and similar features. A low-frequency crawler uses a large pool of proxy IPs to mimic ordinary users while crawling data. Its User-Agent, frequency, timeline, and other characteristics are closer to those of an ordinary user; in particular, its access frequency is often only a single-digit number of requests per hour.

The prior art generally performs low frequency crawler identification by collecting a proxy IP library. The prior art has the following disadvantages:

(1) The recognition recall rate is limited by the coverage of the proxy IP library. At present there are hundreds of millions of Internet proxy IPs, and such a library can cover only a small part of them;

(2) Proxy IPs are not static, so the proxy IP library must be updated frequently. Customers are generally reluctant to accept online updates, while offline updates suffer from update delay;

(3) Proxy IPs obtained through ADSL or residential broadband disconnect-and-redial and multi-dial are more concealed. Such an IP is also used by many real users, so a proxy IP library faces problems such as false blocking and inaccurate identification.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

In order to solve the above technical problem, an embodiment of the present invention provides a low frequency crawler identification method, apparatus, readable storage medium, and device.

The low frequency crawler identification method provided by the embodiment of the invention includes:

Calculating a behavior feature vector of each user IP within a preset time period according to the network application log of each user IP; clustering the behavior feature vectors of the user IPs to obtain multiple clusters; and determining an inspection rule, determining a cluster that satisfies the inspection rule, and determining each user IP in that cluster to be a crawler.

The above low frequency crawler identification method also has the following characteristics:

The behavior features include several of the following: average number of bytes sent per request, number of requests per unit time period, proportion of GET requests, request path set space ratio, path maximum similar proportion, path maximum repeat ring ratio, Referer maximum similar proportion, dangerous user agent (UA) proportion, UA maximum similar proportion, UA set space, 404 status code proportion, 2XX status code proportion, 5XX status code proportion, URL type maximum similar proportion, average number of visits to similar URLs, average number of URL types, standard deviation of the HTML request proportion, standard deviation of the proportion of other requests, request response time, request response length, request return length, and number of page views.

Determining the inspection rule includes: determining N target behavior characteristics, setting a corresponding judgment logic and a threshold value of the N target behavior characteristics;

The cluster that satisfies the corresponding test rule includes: calculating an average value of all user IPs for the N target behavior characteristics in the current cluster, and determining that the average values of the N target behavior characteristics satisfy the corresponding judgment logic and the threshold;

or,

Determining the inspection rule includes: determining N target behavior characteristics, setting a judgment logic, a weight, and a threshold corresponding to the N target behavior characteristics;

Determining the cluster that satisfies the corresponding inspection rule includes: calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, calculating the product of each average value and its corresponding weight, and determining that the products for all N target behavior characteristics satisfy the corresponding judgment logic and thresholds;

or,

Determining the inspection rule includes: determining N target behavior characteristics, and setting the judgment logic and threshold corresponding to each of the N target behavior characteristics, together with an access count threshold and/or an access interval duration;

Determining the cluster that satisfies the corresponding inspection rule includes: calculating the average number of accesses and the average access interval of all IPs in the current cluster; after determining that the average number of accesses is greater than the access count threshold and/or the average access interval is greater than the access interval duration, calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and determining that the product of each average value and its corresponding weight satisfies the corresponding judgment logic and threshold.

Determining N target behavior characteristics includes selecting a plurality of target behavior characteristics using a random forest algorithm or a principal component analysis algorithm.

The low frequency crawler identification device provided by the embodiment of the invention includes:

a feature calculation module, configured to calculate, according to a network application log of each user IP, a behavior feature vector of each user IP in a preset time period;

The clustering module is configured to cluster the behavior feature vectors of each user IP to obtain a plurality of clusters;

a rule determination module configured to determine an inspection rule;

The identification module is configured to determine a cluster that satisfies the corresponding inspection rule, and each user IP in the cluster is determined to be a crawler.

The above low frequency crawler identification device also has the following characteristics:

The rule determination module is configured to determine N target behavior characteristics, and set a judgment logic and a threshold corresponding to the N target behavior characteristics;

The identification module is configured to calculate, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and to determine that the average values of the N target behavior characteristics satisfy the corresponding judgment logic and thresholds;

or,

The rule determination module is configured to determine N target behavior characteristics, and set a judgment logic, a weight, and a threshold corresponding to the N target behavior characteristics;

The identification module is configured to calculate, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, to calculate the product of each average value and its corresponding weight, and to determine that the products for all N target behavior characteristics satisfy the corresponding judgment logic and thresholds;

or,

The rule determination module is configured to determine N target behavior characteristics, and set a judgment logic, a threshold, an access threshold, and/or an access interval duration of the N target behavior characteristics;

The identification module is configured to calculate the average number of accesses and the average access interval of all IPs in the current cluster and, after determining that the average number of accesses is greater than the access count threshold and/or the average access interval is greater than the access interval duration, to calculate, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and to determine that the product of each average value and its corresponding weight satisfies the corresponding judgment logic and threshold.

The computer readable storage medium provided by the embodiment of the present invention stores a computer program, and when the program is executed by the processor, the steps of the foregoing method are implemented.

The computer device provided by the embodiment of the present invention includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When the processor executes the program, the following is implemented: calculating a behavior feature vector of each user IP within a preset time period according to the network application log of each user IP; clustering the behavior feature vectors of the user IPs to obtain a plurality of clusters; and determining an inspection rule, determining a cluster that satisfies the inspection rule, and determining each user IP in that cluster to be a crawler.

The above computer equipment also has the following characteristics:

or,

Determining the cluster that satisfies the corresponding inspection rule includes: calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, calculating the product of each average value and its corresponding weight, and determining that the products for all N target behavior characteristics satisfy the corresponding judgment logic and thresholds;

or,

Determining the cluster that satisfies the corresponding inspection rule includes: calculating the average number of accesses and the average access interval of all IPs in the current cluster; after determining that the average number of accesses is greater than the access count threshold and/or the average access interval is greater than the access interval duration, calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and determining that the product of each average value and its corresponding weight satisfies the corresponding judgment logic and threshold.

Embodiments of the invention have the following advantages:

(1) It is possible to effectively identify low frequency crawlers.

(2) Data modeling is based on user behavior and requires no manual analysis or configuration. Various deep-level threats are identified automatically through unsupervised clustering, which can address group threats, low-frequency threats, associated threats, persistent threats, and other threats that traditional security products cannot identify.

(3) Public cloud or private cloud deployment is supported. Threats can be identified and blocked without changing the network topology and without embedding any code, and custom blocking interfaces can be connected. In extreme cases, even if the deployment environment is completely powered off, the normal operation of the original service is not affected.

DRAWINGS

The drawings described herein are intended to provide a further understanding of the embodiments of the invention and do not limit the embodiments of the invention. In the drawings:

Figure 1 is a flow chart of a low frequency crawler identification method in an embodiment;

Figure 2 is a structural view of a low frequency crawler identification device in the embodiment;

Figure 3 is a structural diagram of a computer device for low frequency crawler recognition in the embodiment.

Detailed description

The embodiments of the present invention will be further described with reference to the drawings and specific embodiments.

Fig. 1 is a flowchart of a low frequency crawler identification method in an embodiment. The low frequency crawler identification method includes:

Step 1: Calculate a behavior feature vector of each user IP in the preset time period according to the network application log of each user IP.

Step 2: Cluster the behavior feature vectors of the user IPs to obtain multiple clusters.

Step 3: Determine an inspection rule, determine a cluster that satisfies the corresponding inspection rule, and determine each user IP in the cluster as a crawler.

wherein:

The behavior features in Step 1 include several of the following: average number of bytes sent per request, number of requests per unit time period, proportion of GET requests, request path set space ratio, path maximum similar proportion, path maximum repeat ring ratio, Referer maximum similar proportion, dangerous user agent (UA) proportion, UA maximum similar proportion, UA set space, 404 status code proportion, 2XX status code proportion, 5XX status code proportion, URL type maximum similar proportion, average number of visits to similar URLs, average number of URL types, standard deviation of the HTML request proportion, standard deviation of the proportion of other requests, request response time, request response length, request return length, and number of page views.

For example:

Behavior feature	Value
Average number of bytes sent per request	3128
Number of requests	291
Proportion of GET requests	100%
UA maximum similar proportion	100%
Referer maximum similar proportion	100%
Request path set space ratio	56%
2XX status code proportion	50%
URL type maximum similar proportion	49%
Average number of URL types	28.68
Standard deviation of the HTML request proportion	0.02
Standard deviation of the proportion of other requests	0
Average number of visits to similar URLs	0

The calculated behavior characteristics are sorted in a preset order to form a behavior feature vector.
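As a minimal sketch of Step 1, the following Python computes a handful of the listed features from the parsed log records of one user IP and arranges them in a fixed order. The record keys (`method`, `path`, `status`, `bytes_sent`, `referer`, `user_agent`), the `FEATURE_ORDER` list, and the exact feature definitions (for example, reading "maximum similar proportion" as the share of the most common value) are assumptions made for illustration; the source does not define them.

```python
from collections import Counter

# Assumed order of features in the vector; the source only says the order is preset.
FEATURE_ORDER = [
    "avg_bytes_sent",
    "request_count",
    "get_ratio",
    "ua_max_similar_ratio",
    "referer_max_similar_ratio",
    "path_set_space_ratio",
    "status_2xx_ratio",
]


def max_share(values):
    """Share of the most common value (one possible reading of 'maximum similar proportion')."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def behavior_feature_vector(records):
    """Build one feature vector from the log records of a single user IP.

    Each record is a dict with hypothetical keys: method, path, status,
    bytes_sent, referer, user_agent.
    """
    n = len(records)
    if n == 0:
        raise ValueError("no log records for this IP")
    features = {
        "avg_bytes_sent": sum(r["bytes_sent"] for r in records) / n,
        "request_count": n,
        "get_ratio": sum(r["method"] == "GET" for r in records) / n,
        "ua_max_similar_ratio": max_share([r["user_agent"] for r in records]),
        "referer_max_similar_ratio": max_share([r["referer"] for r in records]),
        "path_set_space_ratio": len({r["path"] for r in records}) / n,
        "status_2xx_ratio": sum(200 <= r["status"] < 300 for r in records) / n,
    }
    # Sort the computed features into the preset order.
    return [features[name] for name in FEATURE_ORDER]


# Hypothetical log records for one IP.
logs = [
    {"method": "GET", "path": "/a", "status": 200, "bytes_sent": 100, "referer": "-", "user_agent": "ua1"},
    {"method": "GET", "path": "/b", "status": 404, "bytes_sent": 300, "referer": "-", "user_agent": "ua1"},
    {"method": "POST", "path": "/a", "status": 200, "bytes_sent": 200, "referer": "x", "user_agent": "ua2"},
    {"method": "GET", "path": "/c", "status": 200, "bytes_sent": 200, "referer": "-", "user_agent": "ua1"},
]
vector = behavior_feature_vector(logs)
```

Keeping the order in a single list means every IP's vector has the same layout, which the clustering step relies on.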

The clustering algorithm in Step 2 may be any clustering algorithm commonly used in the prior art, such as K-Means, K-Medoids, GMM, spectral clustering, or Ncut.
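Step 2 could be sketched with plain K-Means, one of the algorithms named above, written here in pure Python. The deterministic initialization, the iteration count, and the sample vectors are assumptions chosen to keep the sketch small and reproducible; a real deployment would likely normalize features that sit on different scales before clustering.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain K-Means on lists of floats; returns one cluster label per vector.

    Initial centroids are chosen deterministically (first point, then the
    point farthest from the centroids chosen so far). This is an assumption
    made to keep the sketch reproducible.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical two-feature vectors keyed by user IP.
ip_vectors = {
    "10.0.0.1": [1.00, 0.52],
    "10.0.0.2": [0.98, 0.55],
    "10.0.0.3": [0.30, 0.95],
    "10.0.0.4": [0.25, 0.90],
}
labels = kmeans(list(ip_vectors.values()), k=2)

# Group the IPs by cluster label.
clusters = {}
for ip, label in zip(ip_vectors, labels):
    clusters.setdefault(label, []).append(ip)
```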

This method supports three identification methods.

First:

Determining the inspection rule in Step 3 includes: determining N target behavior characteristics, and setting a judgment logic and a threshold corresponding to each of the N target behavior characteristics. Determining the cluster that satisfies the corresponding inspection rule includes: calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and determining that the average values of the N target behavior characteristics satisfy the corresponding judgment logic and thresholds.
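The first identification mode can be sketched as a per-cluster check: average each target feature over the cluster's IPs and compare it with its threshold using the configured judgment logic. The logic names (`gt`, `ge`, `lt`, `le`), the feature names, and all values below are hypothetical.

```python
import operator
from statistics import mean

# Judgment logic names mapped to comparison functions (names are assumptions).
LOGIC = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def cluster_matches(cluster, rules):
    """Return True if the cluster mean of every target feature satisfies its rule.

    cluster: list of per-IP dicts mapping feature name -> value.
    rules: list of (feature name, judgment logic, threshold) tuples.
    """
    for feature, logic, threshold in rules:
        average = mean(ip[feature] for ip in cluster)
        if not LOGIC[logic](average, threshold):
            return False
    return True


# Hypothetical rules and data, for illustration only.
rules = [("referer_max_similar", "gt", 0.95), ("status_2xx", "gt", 0.50)]
cluster_a = [
    {"referer_max_similar": 1.00, "status_2xx": 0.60},
    {"referer_max_similar": 0.98, "status_2xx": 0.70},
]
cluster_b = [
    {"referer_max_similar": 0.80, "status_2xx": 0.90},
    {"referer_max_similar": 0.60, "status_2xx": 0.95},
]
crawler_a = cluster_matches(cluster_a, rules)  # every IP in cluster_a would be flagged
crawler_b = cluster_matches(cluster_b, rules)
```

Because the decision is made on cluster averages, a single IP is never judged on its own; it is flagged only when the whole cluster it belongs to satisfies the rule.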

Second:

Determining the inspection rule in Step 3 includes: determining N target behavior characteristics, and setting a judgment logic, a weight, and a threshold corresponding to each of the N target behavior characteristics. Determining the cluster that satisfies the corresponding inspection rule includes: calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, calculating the product of each average value and its corresponding weight, and determining that the products for all N target behavior characteristics satisfy the corresponding judgment logic and thresholds.
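A sketch of the second mode follows. It differs from the unweighted check only in that each cluster average is multiplied by its weight before the comparison. The feature names, weights, and thresholds are hypothetical.

```python
import operator
from statistics import mean

# Judgment logic names mapped to comparison functions (names are assumptions).
LOGIC = {"gt": operator.gt, "lt": operator.lt}


def weighted_cluster_matches(cluster, rules):
    """Return True if mean(feature) * weight satisfies the rule for every target feature.

    cluster: list of per-IP dicts mapping feature name -> value.
    rules: list of (feature name, judgment logic, weight, threshold) tuples.
    """
    for feature, logic, weight, threshold in rules:
        weighted = mean(ip[feature] for ip in cluster) * weight
        if not LOGIC[logic](weighted, threshold):
            return False
    return True


# Hypothetical rules: (feature, logic, weight, threshold).
rules = [("path_set_space", "gt", 2.0, 1.0), ("status_404", "lt", 0.5, 0.1)]
suspect = [
    {"path_set_space": 0.6, "status_404": 0.10},
    {"path_set_space": 0.8, "status_404": 0.05},
]
normal = [
    {"path_set_space": 0.2, "status_404": 0.10},
    {"path_set_space": 0.4, "status_404": 0.05},
]
suspect_is_crawler = weighted_cluster_matches(suspect, rules)
normal_is_crawler = weighted_cluster_matches(normal, rules)
```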

Third:

Determining the inspection rule in Step 3 includes: determining N target behavior characteristics, and setting the judgment logic and threshold corresponding to each of the N target behavior characteristics, together with an access count threshold and/or an access interval duration. Determining the cluster that satisfies the corresponding inspection rule includes: calculating the average number of accesses and the average access interval of all IPs in the current cluster; after determining that the average number of accesses is greater than the access count threshold and/or the average access interval is greater than the access interval duration, calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and determining that the product of each average value and its corresponding weight satisfies the corresponding judgment logic and threshold.
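The third mode adds an access gate in front of the feature check. In the sketch below a gate left as `None` is treated as not configured, which is one possible reading of "and/or"; the keys `visits` and `interval_s`, the parameter names, and all values are hypothetical.

```python
import operator
from statistics import mean

# Judgment logic names mapped to comparison functions (names are assumptions).
LOGIC = {"gt": operator.gt, "lt": operator.lt}


def gated_cluster_matches(cluster, rules, min_visits=None, min_interval_s=None):
    """Check the access gate first, then the per-feature rules.

    cluster: list of per-IP dicts with hypothetical keys "visits" and
    "interval_s" plus one key per target feature.
    rules: list of (feature, judgment logic, weight, threshold) tuples.
    """
    # Gate 1: average number of accesses must exceed the access count threshold.
    if min_visits is not None and not mean(ip["visits"] for ip in cluster) > min_visits:
        return False
    # Gate 2: average access interval must exceed the access interval duration.
    if min_interval_s is not None and not mean(ip["interval_s"] for ip in cluster) > min_interval_s:
        return False
    # Feature check: weighted cluster averages against their thresholds.
    return all(
        LOGIC[logic](mean(ip[feature] for ip in cluster) * weight, threshold)
        for feature, logic, weight, threshold in rules
    )


cluster = [
    {"visits": 6, "interval_s": 700, "get_ratio": 1.00},
    {"visits": 8, "interval_s": 500, "get_ratio": 0.96},
]
rules = [("get_ratio", "gt", 1.0, 0.95)]
passes_gate = gated_cluster_matches(cluster, rules, min_visits=5, min_interval_s=300)
fails_gate = gated_cluster_matches(cluster, rules, min_visits=5, min_interval_s=900)
```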

In this method, determining the N target behavior characteristics includes selecting the N target behavior characteristics using a random forest algorithm or a principal component analysis algorithm.

Specific embodiment:

Collect the network application logs of each user IP for a given month, and calculate the behavior feature vector of each user IP for that month. Cluster the behavior feature vectors of the user IPs to obtain two clusters.

The inspection rule is determined as follows: the three target behavior characteristics are the Referer maximum similar proportion, the request path set space ratio, and the 2XX status code proportion.

For the Referer maximum similar proportion, the judgment logic is "greater than" and the threshold is 95%.

For the request path set space ratio, the judgment logic is "greater than" and the threshold is 50%.

For the 2XX status code proportion, the judgment logic is "greater than" and the threshold is 50%.

The average of the three target behavior characteristics over all user IPs is calculated for each of the two clusters. In the first cluster the averages are 100%, 50%, and 50%, respectively, so the first cluster satisfies the inspection rule and all user IPs in it are crawlers. In the second cluster the averages are 80%, 40%, and 50%, respectively, so the second cluster does not satisfy the inspection rule and all user IPs in it are normal users.

The software implementing this method provides options for the various behavior features, options for the various clustering algorithms, a display item for data security, and a display item for crawler threats. When using the software, the behavior features and the clustering algorithm can be selected according to requirements. After the method is executed, the resulting clusters are displayed on the software interface. The clusters have different areas, and the area of each cluster corresponds to the number of user IPs in it. As the calculation proceeds, the area of each cluster changes with the number of user IPs it contains. Based on the computed result of the method, the crawling condition of the current system is assessed to determine whether the system is in a data security state or a crawler threat state, and the state is indicated at the corresponding display item.

Fig. 2 is a structural diagram of a low frequency crawler identification device in the embodiment. The low frequency crawler identification device includes a feature calculation module, a clustering module, a rule determination module, and an identification module.

a rule determination module configured to determine an inspection rule;

wherein:

This device supports three recognition methods.

The first:

The identification module is configured to calculate, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and to determine that the average values of the N target behavior characteristics satisfy the corresponding judgment logic and thresholds.

Second:

The identification module is configured to calculate, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, to calculate the product of each average value and its corresponding weight, and to determine that the products for all N target behavior characteristics satisfy the corresponding judgment logic and thresholds.

The third type:

The rule determination module is further configured to select N target behavior features using a random forest algorithm or a principal component analysis algorithm.

The embodiment of the present invention further provides a computer readable storage medium, where the computer program is stored on the storage medium, and the steps of the foregoing method are implemented when the program is executed by the processor.

Fig. 3 is a structural diagram of a computer device for low frequency crawler identification in an embodiment. The computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When executing the program, the processor implements the following: calculating a behavior feature vector of each user IP within a preset time period according to the network application log of each user IP; clustering the behavior feature vectors of the user IPs to obtain multiple clusters; and determining an inspection rule, determining a cluster that satisfies the inspection rule, and determining each user IP in that cluster to be a crawler.

The cluster that satisfies the corresponding test rule includes: calculating an average value of all user IPs for the N target behavior features in the current cluster, and determining that the average values of the N target behavior features satisfy the corresponding judgment logic and the threshold.

or,

Compared with the prior art, the embodiment of the invention has the following advantages:

(1) It is possible to effectively identify low frequency crawlers.

Those skilled in the art should understand that the invention may be modified or equivalently substituted without departing from the spirit and scope of the invention.

Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and functional blocks/units of the methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components working together. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on a computer readable medium, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information, such as computer readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridge, magnetic tape, magnetic disk storage or other magnetic storage device, or any other medium that can be used to store the desired information and that can be accessed by a computer. Moreover, it is well known to those skilled in the art that communication media typically include computer readable instructions, data structures, program modules or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and can include any information delivery media.

Industrial applicability

The low-frequency crawler identification method, device, readable storage medium and device can effectively identify low-frequency crawlers and perform data modeling based on user behavior without any manual analysis or configuration, and automatically identify various deep-level threats through unsupervised clustering. It can solve gang threats, low frequency threats, associated threats, and persistent threats that are not recognized by traditional security products.

Claims

A low frequency crawler identification method includes:

Calculating a behavior feature vector of each user IP within a preset time period according to the network application log of each user IP; clustering the behavior feature vectors of the user IPs to obtain multiple clusters; and determining an inspection rule, determining a cluster that satisfies the inspection rule, and determining each user IP in that cluster to be a crawler.
The low frequency crawler identification method according to claim 1, wherein

The behavior feature includes a plurality of the following features: an average number of request transmission bytes, a number of unit time period requests, a GET request number ratio, a request path set space ratio, a path maximum similar proportion, a path maximum repeat ring ratio, Referer maximum similar proportion, dangerous user agent UA proportion, UA maximum similar proportion, UA collection space, 404 status code proportion, 2XX status code proportion, 5XX status code proportion, maximum similar proportion of URL type, similar URL Average number of visits, average number of URL types, standard deviation of HTML request ratios, standard deviation of other request ratios, request response time, request response length, request return length, page views.
The low frequency crawler identification method according to claim 1, wherein

Determining the inspection rule comprises: determining N target behavior characteristics, and setting a judgment logic and a threshold corresponding to each of the N target behavior characteristics;

Determining the cluster that satisfies the corresponding inspection rule comprises: calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and determining that the average values of the N target behavior characteristics satisfy the corresponding judgment logic and thresholds;

or,

Determining the inspection rule comprises: determining N target behavior characteristics, and setting a judgment logic, a weight, and a threshold corresponding to each of the N target behavior characteristics;

Determining the cluster that satisfies the corresponding inspection rule comprises: calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, calculating the product of each average value and its corresponding weight, and determining that the products for all N target behavior characteristics satisfy the corresponding judgment logic and thresholds.
The low frequency crawler identification method according to claim 1, wherein

Determining the inspection rule comprises: determining N target behavior characteristics, and setting the judgment logic and threshold corresponding to each of the N target behavior characteristics, together with an access count threshold and/or an access interval duration;

Determining the cluster that satisfies the corresponding inspection rule comprises: calculating the average number of accesses and the average access interval of all IPs in the current cluster; after determining that the average number of accesses is greater than the access count threshold and/or the average access interval is greater than the access interval duration, calculating, for each of the N target behavior characteristics, the average value over all user IPs in the current cluster, and determining that the product of each average value and its corresponding weight satisfies the corresponding judgment logic and threshold.
The low frequency crawler identification method according to claim 3 or 4, wherein

Determining the N target behavior characteristics comprises: selecting the N target behavior characteristics using a random forest algorithm or a principal component analysis algorithm.
A low frequency crawler identification device comprising:

a feature calculation module, configured to calculate, according to a network application log of each user IP, a behavior feature vector of each user IP in a preset time period;

The clustering module is configured to cluster the behavior feature vectors of each user IP to obtain a plurality of clusters;

a rule determination module configured to determine an inspection rule;

The identification module is configured to determine a cluster that satisfies the corresponding inspection rule, and each user IP in the cluster is determined to be a crawler.
The low frequency crawler identification device according to claim 6, wherein

The behavior feature includes a plurality of the following features: an average number of request transmission bytes, a number of unit time period requests, a GET request number ratio, a request path set space ratio, a path maximum similar proportion, a path maximum repeat ring ratio, Referer maximum similar proportion, dangerous user agent UA proportion, UA maximum similar proportion, UA collection space, 404 status code proportion, 2XX status code proportion, 5XX status code proportion, maximum similar proportion of URL type, similar URL Average number of visits, average number of URL types, standard deviation of HTML request ratios, standard deviation of other request ratios, request response time, request response length, request return length, page views.
The low frequency crawler identification device according to claim 6, wherein

The rule determination module is configured to determine N target behavior characteristics, and set a judgment logic and a threshold corresponding to the N target behavior characteristics;

The identification module is configured to: calculate, for each of the N target behavior characteristics in the current cluster, the average value over all user IPs, and determine that the average values of the N target behavior characteristics satisfy the corresponding judgment logic and threshold;

or,

The rule determination module is configured to determine N target behavior characteristics, and set a judgment logic, a weight, and a threshold corresponding to the N target behavior characteristics;

The identification module is configured to: calculate, for each of the N target behavior characteristics in the current cluster, the average value over all user IPs, calculate the product of each average value and the corresponding weight, and determine that the products for the N target behavior characteristics all satisfy the corresponding judgment logic and threshold;

or,

The rule determination module is configured to determine N target behavior characteristics, and set a judgment logic and a threshold corresponding to the N target behavior characteristics, together with a threshold of the number of accesses and/or an access interval duration;

The identification module is configured to: calculate an average value of the access counts and an average value of the access intervals of all IPs in the current cluster; and, after determining that the average access count is greater than the threshold of the number of accesses and/or the average access interval is greater than the access interval duration, calculate, for each of the N target behavior characteristics in the current cluster, the average value over all user IPs, and determine that the product of the average value of each of the N target behavior characteristics and the corresponding weight satisfies the corresponding judgment logic and threshold.
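The first two alternatives of this claim differ only in whether the cluster average is compared directly or after being multiplied by a weight. A minimal sketch of that difference follows. The rule layouts, the `JUDGMENT` table used to represent the judgment logic, and the sample values are assumptions made for illustration.

```python
import operator
from statistics import mean

# Hypothetical encoding of "judgment logic" as a comparison operator.
JUDGMENT = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def cluster_averages(cluster, names):
    """Average each named feature over all user IPs in the cluster."""
    return {name: mean(ip[name] for ip in cluster) for name in names}

def matches_plain_rule(cluster, rule):
    """rule: {feature: (logic, threshold)}; the average is compared directly."""
    averages = cluster_averages(cluster, rule)
    return all(JUDGMENT[logic](averages[name], threshold)
               for name, (logic, threshold) in rule.items())

def matches_weighted_rule(cluster, rule):
    """rule: {feature: (logic, weight, threshold)}; average * weight is compared."""
    averages = cluster_averages(cluster, rule)
    return all(JUDGMENT[logic](averages[name] * weight, threshold)
               for name, (logic, weight, threshold) in rule.items())

# Illustrative cluster of two user IPs (assumed values).
cluster = [
    {"get_ratio": 1.0, "ua_set_size": 1},
    {"get_ratio": 0.8, "ua_set_size": 3},
]
plain_rule = {"get_ratio": (">=", 0.85), "ua_set_size": ("<=", 2)}
weighted_rule = {"get_ratio": (">", 0.5, 0.4), "ua_set_size": ("<", 0.1, 0.5)}

plain_hit = matches_plain_rule(cluster, plain_rule)
weighted_hit = matches_weighted_rule(cluster, weighted_rule)
```

Keeping the comparison operator in the rule lets one feature be checked with "greater than" and another with "less than" without changing the identification code.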
A computer readable storage medium having stored thereon a computer program, the program being executed by a processor to perform the steps of the method of any one of claims 1 to 5.
A computer device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the following: calculating, according to a network application log of each user IP, a behavior feature vector of each user IP in a preset time period; clustering the behavior feature vectors of the user IPs to obtain a plurality of clusters; and determining an inspection rule, determining a cluster that satisfies the corresponding inspection rule, and determining each user IP in the cluster to be a crawler.
The computer device according to claim 10, wherein

The behavior features include a plurality of the following: average number of bytes transmitted per request, number of requests per unit time period, GET request ratio, request path set space ratio, path maximum similar proportion, path maximum repeat ring ratio, Referer maximum similar proportion, dangerous user agent (UA) proportion, UA maximum similar proportion, UA collection space, 404 status code proportion, 2XX status code proportion, 5XX status code proportion, URL type maximum similar proportion, average number of visits to similar URLs, average number of URL types, standard deviation of HTML request ratios, standard deviation of other request ratios, request response time, request response length, request return length, and page views.
The computer device according to claim 10, wherein

Determining the inspection rule comprises: determining N target behavior characteristics, and setting a judgment logic and a threshold corresponding to the N target behavior characteristics;

Determining a cluster that satisfies the corresponding inspection rule comprises: calculating, for each of the N target behavior characteristics in the current cluster, the average value over all user IPs, and determining that the average values of the N target behavior characteristics satisfy the corresponding judgment logic and threshold;

or,

Determining the inspection rule comprises: determining N target behavior characteristics, and setting a judgment logic, a weight, and a threshold corresponding to the N target behavior characteristics;

Determining a cluster that satisfies the corresponding inspection rule comprises: calculating, for each of the N target behavior characteristics in the current cluster, the average value over all user IPs, calculating the product of each average value and the corresponding weight, and determining that the products for the N target behavior characteristics all satisfy the corresponding judgment logic and threshold;

or,

Determining the inspection rule comprises: determining N target behavior characteristics, and setting a judgment logic and a threshold corresponding to the N target behavior characteristics, together with a threshold of the number of accesses and/or an access interval duration;

Determining a cluster that satisfies the corresponding inspection rule comprises: calculating an average value of the access counts and an average value of the access intervals of all IPs in the current cluster; and, after determining that the average access count is greater than the threshold of the number of accesses and/or the average access interval is greater than the access interval duration, calculating, for each of the N target behavior characteristics in the current cluster, the average value over all user IPs, and determining that the product of the average value of each of the N target behavior characteristics and the corresponding weight satisfies the corresponding judgment logic and threshold.