CN110807488B - Anomaly detection method and device based on user peer-to-peer group

Info

Publication number
CN110807488B
Authority
CN
China
Prior art keywords
user
peer
probability
target user
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911059212.6A
Other languages
Chinese (zh)
Other versions
CN110807488A (en)
Inventor
张尚志
丁龙
朱纲兴
孙悦
郭晓鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Trusfort Technology Co ltd
Original Assignee
Beijing Trusfort Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Trusfort Technology Co ltd
Priority to CN201911059212.6A
Publication of CN110807488A
Application granted
Publication of CN110807488B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Discrete Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an anomaly detection method and device based on a user peer-to-peer group. The method comprises: clustering a user set containing a target user a preset number of times according to a preset clustering algorithm to obtain the preset number of clusters to which the target user belongs, wherein each clustering run corresponds to one cluster; obtaining, from the preset number of clusters to which the target user belongs, the peer-to-peer group probability of the target user and any user in those clusters, and constructing a probability cluster based on the peer-to-peer group probability; selecting a preset threshold number of behavior features from a preset behavior feature pool and, for each selected behavior feature, constructing an isolated forest tree containing the target user based on the probability cluster and the selected behavior feature; and detecting whether the target user is an abnormal user according to the obtained isolated forest trees and a preset anomaly detection algorithm. The method can effectively improve anomaly detection accuracy.

Description

Anomaly detection method and device based on user peer-to-peer group
Technical Field
The invention relates to the technical field of information detection, in particular to an anomaly detection method and device based on a user peer-to-peer group.
Background
The rapid development of computer communication technology has brought great convenience to users' work and life, but the losses caused by network security incidents and network anomalies have grown with it. Illegal network behavior by users is a major factor affecting network security, so monitoring users' network behavior, discovering abnormal behavior, and handling it promptly are particularly important for keeping a network running normally.
At present, network security inside an enterprise generally relies on statistical analysis: a user network behavior model is built, monitored user network behavior is fed into the model, and the model judges whether that behavior deviates from a behavior baseline trained in advance, giving an anomaly detection result. Behavior baselines include individual baselines and peer-to-peer group baselines. Whereas an individual baseline is based only on the user's own information, a peer-to-peer group baseline also takes into account the group of users whose network behavior is similar: such groups are discovered, and the user network behavior model is built on them. This approach is more extensible and robust, and is therefore more widely used.
User anomaly detection based on a user network behavior model and peer-to-peer groups works as follows: the network behaviors of the users are clustered so that the target user is assigned to a target peer-to-peer group, which divides the users into peer-to-peer groups; a preset anomaly detection algorithm then detects whether the target user in the target peer-to-peer group is an abnormal user. In this method, however, the clustering result for a target user states only that another user either belongs or does not belong to the same peer-to-peer group, that is, the result is either 0 or 1. In practice, whether a user belongs to a peer-to-peer group is a probabilistic event, so anomaly detection performed this way is not very accurate.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for detecting an anomaly based on a peer-to-peer group of users, so as to improve the accuracy of anomaly detection.
In a first aspect, an embodiment of the present invention provides an anomaly detection method based on a user peer group, including:
clustering a user set containing a target user a preset number of times according to a preset clustering algorithm to obtain the preset number of clusters to which the target user belongs, wherein each clustering run corresponds to one cluster;
obtaining, from the preset number of clusters to which the target user belongs, the peer-to-peer group probability of the target user and any user in those clusters, and constructing a probability cluster based on the peer-to-peer group probability;
selecting a preset threshold number of behavior features from a preset behavior feature pool and, for each selected behavior feature, constructing an isolated forest tree containing the target user based on the probability cluster and the selected behavior feature;
and detecting whether the target user is an abnormal user according to the obtained isolated forest trees and a preset anomaly detection algorithm.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the clustering algorithm includes a binary K-means algorithm, and clustering the user set containing the target user a preset number of times according to the preset clustering algorithm to obtain the preset number of clusters to which the target user belongs includes:
taking the behavior features contained in the user set as the column vectors of a trunk cluster, and taking the values of the behavior features of each user in the user set as a row vector of the trunk cluster;
traversing the trunk cluster, performing binary clustering on the row and column vectors of the trunk cluster with the binary K-means algorithm, and dividing the trunk cluster into two class clusters using the sum of squared errors as the index;
traversing the class clusters, and performing binary clustering on the row and column vectors of the class clusters with the binary K-means algorithm until a preset number of clusters is reached;
and selecting the cluster that contains the target user as the cluster to which the target user belongs, and repeating the clustering until the number of clusters to which the target user belongs reaches the preset number of times.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where obtaining the peer-to-peer group probability of the target user and any user in the preset number of clusters to which the target user belongs includes:
obtaining each user in the preset number of clusters to which the target user belongs, and counting the number of times the target user and that user belong to the same cluster;
and for each user, obtaining the peer-to-peer group probability of the target user and that user from the number of times the two belong to the same cluster and the preset number of times.
With reference to the first aspect, the first possible implementation manner of the first aspect, or the second possible implementation manner, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where constructing an isolated forest tree containing the target user based on the probability cluster and the selected behavior feature includes:
selecting a plurality of users, including the target user, from the probability cluster to form a root node data set;
determining a root threshold according to the upper and lower bounds of the values of the selected behavior feature in the root node data set, and splitting the root node data set into subsets based on the root threshold;
and determining a sub-threshold for the splitting feature according to the upper and lower bounds of the values of the selected behavior feature in each subset, and splitting the subset based on the sub-threshold until the size of a resulting subset is 1 or the depth of the isolated forest tree formed by the splitting reaches a preset tree depth threshold.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where splitting the root node data set into subsets based on the root threshold includes:
if the value of the selected behavior feature for a user is greater than the root threshold, placing the user, together with the peer-to-peer group probability of the target user and the user, in a first subset, and if the value is less than the root threshold, placing the user, together with the peer-to-peer group probability of the target user and the user, in a second subset;
if the value of the selected behavior feature for the user is equal to the root threshold:
if the peer-to-peer group probability of the target user and the user is less than or equal to a preset peer-to-peer group probability threshold, placing the user, together with the peer-to-peer group probability of the target user and the user, in the first subset or the second subset;
if the peer-to-peer group probability of the target user and the user is greater than the peer-to-peer group probability threshold, splitting the peer-to-peer group probability of the target user and the user into a first probability and a second probability, wherein the sum of the first probability and the second probability is equal to the peer-to-peer group probability;
and placing the user and the first probability in the first subset, and the user and the second probability in the second subset.
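The splitting rule of the fourth implementation manner can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record layout `(user, value, probability)`, the function name `split_with_probability`, and the `ratio` parameter are hypothetical. The source only requires that the two split probabilities sum to the original probability; an even split is assumed here. For the low-probability tie case the source allows either subset, and the sketch arbitrarily picks the first.

```python
def split_with_probability(records, threshold, prob_threshold, ratio=0.5):
    """Split records of (user, feature_value, peer_probability) at a threshold.

    Users whose value equals the threshold and whose peer-to-peer group
    probability exceeds prob_threshold are placed in BOTH subsets, each
    carrying part of the probability (the parts sum to the original).
    """
    first, second = [], []
    for user, value, prob in records:
        if value > threshold:
            first.append((user, value, prob))
        elif value < threshold:
            second.append((user, value, prob))
        elif prob <= prob_threshold:
            # Tie with a low probability: either subset is allowed by the
            # text; choosing the first subset is an assumption of this sketch.
            first.append((user, value, prob))
        else:
            # Tie with a high probability: split the probability in two.
            p1 = prob * ratio  # assumed split ratio, not given in the source
            first.append((user, value, p1))
            second.append((user, value, prob - p1))
    return first, second


records = [("B", 5, 0.8), ("C", 3, 0.6), ("D", 4, 0.9), ("E", 4, 0.1)]
first, second = split_with_probability(records, threshold=4, prob_threshold=0.5)
print(first)
print(second)
```

In this example user D sits exactly on the threshold with a high probability, so D appears in both subsets with half of its probability in each, while user E (low probability) is placed in only one subset.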
With reference to the fourth possible implementation manner of the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the method further includes:
and detecting whether users with values equal to the root threshold value are placed in the same subset, and if so, moving part of the users with the values equal to the root threshold value placed in the same subset to another subset.
With reference to the first aspect, the first possible implementation manner of the first aspect, or the second possible implementation manner, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where the detecting, according to the obtained isolated forest tree and a preset anomaly detection algorithm, whether the target user is an anomalous user includes:
calculating the average depth of the isolated forest trees according to the acquired longest tree depth of each isolated forest tree;
acquiring the average depth of the target user based on the longest depth of the target user in each isolated forest tree;
acquiring an abnormal score of the target user according to the average depth of the isolated forest tree and the average depth of the target user;
and if the abnormal score of the target user is larger than a preset abnormal score threshold value, determining that the target user is an abnormal user.
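The scoring steps above can be sketched as follows. The text does not give the scoring formula at this point, so the formula used here is an assumption: it borrows the shape of the classic isolation forest score, 2 raised to the negative ratio of the target's average depth to a normalizing depth, and uses the forest's average depth as the normalizer. The function names and the example threshold of 0.6 are hypothetical.

```python
def anomaly_score(target_depths, tree_depths):
    """Score a target user from per-tree depths.

    target_depths: longest depth at which the target appears in each tree.
    tree_depths:   longest depth of each tree (all assumed positive).
    The formula is an assumed stand-in for the patent's unspecified one.
    """
    target_avg = sum(target_depths) / len(target_depths)
    forest_avg = sum(tree_depths) / len(tree_depths)
    # A target isolated near the root (small depth) gets a score close to 1.
    return 2 ** (-target_avg / forest_avg)


def is_abnormal(target_depths, tree_depths, score_threshold=0.6):
    """Flag the target when its score exceeds the preset threshold."""
    return anomaly_score(target_depths, tree_depths) > score_threshold


shallow = anomaly_score([1, 2, 1], [6, 7, 5])
deep = anomaly_score([5, 6, 5], [6, 7, 5])
print(shallow > deep)
```

Under this assumed formula, a user who is isolated after only a few splits scores higher than a user who stays grouped with others deep into the trees, which matches the comparison against a preset score threshold described above.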
In a second aspect, an embodiment of the present invention further provides an apparatus for detecting an anomaly based on a peer-to-peer group of a user, including:
the clustering module is used for clustering a user set containing a target user a preset number of times according to a preset clustering algorithm to obtain the preset number of clusters to which the target user belongs, wherein each clustering run corresponds to one cluster;
the peer-to-peer group probability calculation module is used for obtaining, from the preset number of clusters to which the target user belongs, the peer-to-peer group probability of the target user and any user in those clusters, and constructing a probability cluster based on the peer-to-peer group probability;
the forest tree generation module is used for selecting a preset threshold number of behavior features from a preset behavior feature pool and, for each selected behavior feature, constructing an isolated forest tree containing the target user based on the probability cluster and the selected behavior feature;
and the anomaly detection module is used for detecting whether the target user is an abnormal user or not according to the acquired isolated forest tree and a preset anomaly detection algorithm.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of the above method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, performs the steps of the method described above.
The anomaly detection method and device based on a user peer-to-peer group provided by the embodiment of the invention cluster a user set containing a target user a preset number of times according to a preset clustering algorithm to obtain the preset number of clusters to which the target user belongs, wherein each clustering run corresponds to one cluster; obtain, from those clusters, the peer-to-peer group probability of the target user and any user in the clusters, and construct a probability cluster based on the peer-to-peer group probability; select a preset threshold number of behavior features from a preset behavior feature pool and, for each selected behavior feature, construct an isolated forest tree containing the target user based on the probability cluster and the selected behavior feature; and detect whether the target user is an abnormal user according to the obtained isolated forest trees and a preset anomaly detection algorithm.
In this way, a peer-to-peer group probability distribution is obtained by clustering the user set multiple times, a plurality of isolated forest trees containing the target user are built from that distribution and the randomly selected behavior features, and whether the target user is abnormal is judged from these trees. The peer-to-peer group probability distribution describes the peer-to-peer group more accurately and comprehensively than a yes-or-no membership result, and generating multiple isolated forest trees reflects the behavior of the target user across multiple features, so the accuracy of the anomaly detection result can be effectively improved.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flowchart illustrating an anomaly detection method based on a user peer-to-peer group according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram illustrating an anomaly detection apparatus based on a user peer-to-peer group according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device 300 according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a method and a device for detecting an abnormality based on a user peer-to-peer group, which are described by embodiments below.
At present, when peer-to-peer group clustering is performed for a target user, the clustering result states only that another user either belongs or does not belong to the same peer-to-peer group, that is, the result is either 0 or 1, so the input received by the subsequent anomaly detection algorithm is also binary. In practice, however, "whether user B belongs to the peer-to-peer group of user A" is a probabilistic event. A user may belong to one or several peer-to-peer groups, with a different probability for each; for example, the probability that user B belongs to the peer-to-peer group of user A may be 80%, and the probability that user B belongs to the peer-to-peer group of user C may be 60%. The above anomaly detection method considers neither the influence of the 60% probability that user B belongs to the peer-to-peer group of user C, nor the influence of the probability that user B belongs to the peer-to-peer group of user A being only 80% (and not 100%), on whether user B is an abnormal user, so its detection accuracy is not high.
In a business scenario, whether user B belongs to the peer-to-peer group of user A is a probabilistic event: if the behavior baselines of the two users are close, the probability is high; if they differ greatly, the probability is low. The higher the probability, the more important user B is in user A's peer-to-peer group, and the greater user B's influence on that group.
In the embodiment of the invention, aiming at the technical problem of low accuracy of the detection result existing in the anomaly detection based on the peer-to-peer group, the peer-to-peer group anomaly detection method based on the probability distribution is provided by considering the probability distribution of the peer-to-peer group, so as to improve the accuracy of the detection result.
Fig. 1 shows a schematic flow chart of an anomaly detection method based on a user peer-to-peer group according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step 101, clustering a user set containing a target user a preset number of times according to a preset clustering algorithm to obtain the preset number of clusters to which the target user belongs, wherein each clustering run corresponds to one cluster;
In the embodiment of the invention, a pre-acquired user set is clustered with the preset clustering algorithm, and the resulting cluster that contains the target user is taken as the cluster for that clustering run.
In the embodiment of the invention, because the clustering algorithm chooses the initial centroids of the clusters at random, running the same clustering algorithm on the same user set several times yields different clusters. Therefore, the user set containing the target user is clustered multiple times to obtain multiple clusters containing the target user, so that the clusters to which the target user belongs are reflected comprehensively. As an optional embodiment, the preset number of times may be set according to actual needs, for example 50, 100, or 120.
In the embodiment of the present invention, as an optional embodiment, the clustering algorithm includes, but is not limited to: binary (bisecting) K-means algorithms, density-based algorithms, and grid-based algorithms.
In this embodiment, as an optional embodiment, the clustering algorithm includes a binary K-means algorithm, and clustering the user set containing the target user a preset number of times according to the preset clustering algorithm to obtain the preset number of clusters to which the target user belongs includes:
A11, taking the behavior features contained in the user set as the column vectors of a trunk cluster, and taking the values of the behavior features of each user in the user set as a row vector of the trunk cluster;
In the embodiment of the invention, all users in the user set are initialized as a single trunk cluster.
In the embodiment of the present invention, for employees inside an enterprise, behavior features are used to characterize the employees' network behavior, including but not limited to: the number of times the user accesses each resource, the number of accesses to resources in one week, the number of accesses to resources in one month, the access duration, the cumulative access duration, and the like.
A12, traversing the trunk cluster, performing binary clustering on the row and column vectors of the trunk cluster with the binary K-means algorithm, and dividing the trunk cluster into two class clusters using the sum of squared errors as the index;
In the embodiment of the invention, the sum of squared errors is used as the index to select the optimal clustering in the current cycle, and the trunk cluster is divided into two class clusters according to the result. The specific clustering process is described in the related technical literature and is not detailed here.
A13, traversing the class clusters, and performing binary clustering on the row and column vectors of the class clusters with the binary K-means algorithm until the preset number of clusters is reached;
A14, selecting the cluster that contains the target user as the cluster to which the target user belongs, and repeating the clustering until the number of clusters to which the target user belongs reaches the preset number of times.
In the embodiment of the invention, the binary K-means algorithm is a distance-based clustering algorithm whose objective function is to minimize the squared error between the samples and the centroids. The degree of distortion is measured by the squared distance error between the centroid of each cluster and the sample points within that cluster, and it characterizes how loosely the members of the cluster are grouped.
In the embodiment of the present invention, as an optional embodiment, the preset number k of clusters is 8.
In the embodiment of the invention, a plurality of peer-to-peer groups (cluster clusters) to which the target user belongs can be obtained by clustering the user set containing the target user for a plurality of times, so that the network behavior of the target user can be comprehensively reflected.
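Steps A11 to A14 can be sketched with a small standard-library Python example. It is a hedged illustration rather than the patented implementation: the function names, the fixed iteration count, and the choice to always bisect the class cluster with the largest sum of squared errors are assumptions, since the text only says the sum of squared errors is used as the index. Each row of `rows` holds one user's behavior feature values.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Sum of squared errors of points around their centroid."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(rows, ids, rng, iters=10):
    """Split the users in ids into two groups with plain 2-means."""
    centers = [rows[i] for i in rng.sample(ids, 2)]  # random initial centroids
    parts = [list(ids), []]
    for _ in range(iters):
        parts = [[], []]
        for i in ids:
            nearer_first = sq_dist(rows[i], centers[0]) <= sq_dist(rows[i], centers[1])
            parts[0 if nearer_first else 1].append(i)
        if not parts[0] or not parts[1]:
            break
        centers = [mean([rows[i] for i in part]) for part in parts]
    return parts


def bisecting_kmeans(rows, k, rng):
    """Start from one trunk cluster and bisect until k clusters exist."""
    clusters = [list(range(len(rows)))]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        # Assumption: bisect the cluster with the largest SSE.
        worst = max(candidates, key=lambda c: sse([rows[i] for i in c]))
        clusters.remove(worst)
        a, b = two_means(rows, worst, rng)
        if not a or not b:  # degenerate split, e.g. identical rows
            a, b = worst[:1], worst[1:]
        clusters += [a, b]
    return clusters


def clusters_of_target(rows, target, k=8, runs=100, seed=0):
    """Repeat the clustering and keep the cluster containing the target."""
    rng = random.Random(seed)
    kept = []
    for _ in range(runs):
        for cluster in bisecting_kmeans(rows, k, rng):
            if target in cluster:
                kept.append(set(cluster))
    return kept
```

Because the initial centroids are drawn at random in every run, the kept clusters can differ from run to run, which is what the later probability estimate relies on.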
Step 102, obtaining, from the preset number of clusters to which the target user belongs, the peer-to-peer group probability of the target user and any user in those clusters, and constructing a probability cluster based on the peer-to-peer group probability;
In this embodiment, as an optional embodiment, obtaining the peer-to-peer group probability of the target user and any user in the preset number of clusters to which the target user belongs includes:
A21, obtaining each user in the preset number of clusters to which the target user belongs, and counting the number of times the target user and that user belong to the same cluster;
In the embodiment of the present invention, if the user set containing the target user is clustered 100 times, 100 clusters containing the target user are obtained; the users other than the target user in these 100 clusters are collected, and the number of times the target user and each other user belong to the same cluster is counted. For example, if user A and user B appear together in 80 of the clusters, the number of times that user A and user B belong to the same cluster is 80.
A22, for each user, obtaining the peer-to-peer group probability of the target user and that user from the number of times the two belong to the same cluster and the preset number of times.
In the embodiment of the invention, if clustering is performed 100 times and user A and user B belong to the same cluster 80 times, the peer-to-peer group probability of user A and user B is 80%; if user A and user C belong to the same cluster once, the peer-to-peer group probability of user A and user C is 1%. These peer-to-peer group probabilities together form the peer-to-peer group probability distribution of user A.
According to the law of large numbers, a probability distribution can be approximated by repeating an experiment many times and counting the outcomes. Therefore, in the embodiment of the present invention, repeated peer-to-peer group (cluster) divisions are used to form an approximate peer-to-peer group probability distribution over the members (users) of the peer-to-peer group, so that the peer-to-peer group probability distribution of the target user is approximated from multiple peer-to-peer groups, and the accuracy of subsequent anomaly detection is improved on that basis.
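The counting in steps A21 and A22 amounts to a co-occurrence frequency divided by the number of clustering runs. A minimal Python sketch, with a hypothetical function name and with each cluster represented as a set of user identifiers, might look like this:

```python
from collections import Counter


def peer_group_probabilities(target, clusters_with_target):
    """Estimate peer-to-peer group probabilities for one target user.

    clusters_with_target holds one cluster (a set of users) per clustering
    run, each containing the target. The probability for another user is
    the share of runs in which that user shares the target's cluster.
    """
    runs = len(clusters_with_target)
    counts = Counter()
    for cluster in clusters_with_target:
        for user in cluster:
            if user != target:
                counts[user] += 1
    return {user: n / runs for user, n in counts.items()}


# 100 runs: B shares A's cluster 80 times, C shares it once.
clusters = [{"A", "B"}] * 79 + [{"A", "B", "C"}] + [{"A"}] * 20
probs = peer_group_probabilities("A", clusters)
print(probs["B"], probs["C"])  # 0.8 0.01
```

The example reproduces the figures from the text: 80 co-occurrences in 100 runs give 80%, and a single co-occurrence gives 1%.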
In the embodiment of the invention, taking a preset number of times of 100, a cluster number of 8, and clustering with the binary K-means algorithm as an example, the resulting probability cluster, that is, the peer-to-peer group frequency distribution of the target user, is shown in Table 1. The peer-to-peer group probability is the ratio of the frequency to the preset number of times.
TABLE 1
[Table 1 is provided as images in the original publication (BDA0002257409160000111, BDA0002257409160000121).]
103, selecting a preset threshold value of behavior features from a preset behavior feature pool, and constructing an isolated forest tree containing the target user based on the probability cluster and the selected behavior features for each selected behavior feature;
in the embodiment of the invention, the preset threshold value can be set according to actual needs. The behavior features which can be segmented can be randomly selected from a behavior feature pool, and a plurality of selected behavior features form a candidate feature set.
In the embodiment of the present invention, as an optional embodiment, acquiring an isolated forest tree including the target user based on the probabilistic cluster and the selected behavior feature includes:
A31, selecting a plurality of users including the target user from the probability cluster to form a root node data set;
In the embodiment of the invention, a subset of users including the target user is selected from the user set to construct the root node data set, and the number of selected users can be set according to actual needs.
A32, determining a root threshold according to the upper and lower bounds of the values of the selected behavior feature in the root node data set, and segmenting the root node data set into subsets based on the root threshold;
In the embodiment of the present invention, for each behavior feature in the candidate feature set, the root threshold corresponding to that behavior feature is determined from the upper and lower bounds of the values the behavior feature takes in the root node data set, and the root node data set is split into a plurality of subsets, for example, a first subset and a second subset, based on the root threshold.
In the embodiment of the invention, for each round of segmentation: a behavior feature that can produce a split is randomly selected from the candidate feature set, a threshold (root threshold or sub-threshold) is randomly selected between the upper and lower bounds of that feature's values in the node data set (root node data set or subset), and the node data set is split into two subsets based on the threshold.
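One round of this selection might look like the sketch below. The names `choose_split`, `node` and `features`, and the representation of each user as a dict of feature values, are hypothetical; a feature is treated as able to produce a split when its minimum and maximum in the node differ.

```python
import random


def choose_split(node, features, rng=random):
    """Pick a splittable feature and a threshold within its value range.

    `node` is a list of dicts mapping feature name -> numeric value.
    Returns (feature, threshold), or None if no feature can split the node.
    """
    splittable = [
        f for f in features
        if min(row[f] for row in node) < max(row[f] for row in node)
    ]
    if not splittable:
        return None
    feature = rng.choice(splittable)
    low = min(row[feature] for row in node)
    high = max(row[feature] for row in node)
    # The threshold is drawn uniformly between the lower and upper bound.
    return feature, rng.uniform(low, high)


node = [
    {"visits": 3, "hours": 5},
    {"visits": 9, "hours": 5},
    {"visits": 6, "hours": 5},
]
feature, threshold = choose_split(node, ["visits", "hours"], random.Random(0))
print(feature, threshold)
```

Here `hours` is constant across the node, so only `visits` can be chosen.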
In this embodiment, as an optional embodiment, the segmenting the root node dataset into subsets based on the root threshold includes:
A321, if the user's value for the selected behavior feature is greater than the root threshold, placing the user, together with the peer-to-peer group probability of the target user and the user, in a first subset; if the value is less than the root threshold, placing the user, together with that peer-to-peer group probability, in a second subset;
A322, if the user's value for the selected behavior feature is equal to the root threshold:
if the peer-to-peer group probability of the target user and the user is less than or equal to a preset peer-to-peer group probability threshold, placing the user, together with that peer-to-peer group probability, in the first subset or the second subset;
if the peer-to-peer group probability of the target user and the user is greater than the peer-to-peer group probability threshold, splitting the peer-to-peer group probability into a first probability and a second probability, wherein the sum of the first probability and the second probability is equal to the peer-to-peer group probability;
and placing the user with the first probability in the first subset, and the user with the second probability in the second subset.
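The segmentation rules A321 and A322 can be sketched as a single pass over the node. This is an illustrative sketch only: the tuple layout `(user, value, prob)`, the function name `split_node`, and the use of a uniform draw to split a probability into two parts are assumptions, since the text only requires that the two parts sum to the original probability.

```python
import random


def split_node(entries, threshold, p_max, rng=random):
    """Split entries into two subsets around `threshold`.

    entries: list of (user, value, prob), where prob is the peer group
    probability between the target user and `user`.
    """
    first, second = [], []
    for user, value, prob in entries:
        if value > threshold:
            first.append((user, value, prob))
        elif value < threshold:
            second.append((user, value, prob))
        elif prob <= p_max:
            # Tie with low probability: the whole entry goes to one side.
            rng.choice((first, second)).append((user, value, prob))
        else:
            # Tie with high probability: the probability is divided so
            # that the two parts add up to the original value.
            p1 = rng.uniform(0.0, prob)
            first.append((user, value, p1))
            second.append((user, value, prob - p1))
    return first, second


entries = [("u1", 9.0, 0.3), ("u2", 2.0, 0.6), ("u3", 5.0, 0.1), ("u4", 5.0, 0.9)]
first, second = split_node(entries, threshold=5.0, p_max=0.5, rng=random.Random(1))
print(first)
print(second)
```

In this run `u4` appears in both subsets with probabilities that sum to 0.9, while `u3` appears in exactly one.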
In the embodiment of the present invention, the users in the root node data set whose value for the selected behavior feature is equal to the root threshold are recorded as a set of entries x_j : p_j (the set notation is provided as an image in the original publication),
wherein,
x_j is the value of the selected behavior feature for user j in the probability cluster;
p_j is the peer-to-peer group probability of the target user and user j in the probability cluster.
In the embodiment of the invention, the peer-to-peer group probability threshold is determined according to the peer-to-peer group probability distribution.
In the embodiment of the invention, if the range of the peer-to-peer group probability distribution is [min_p, max_p], a peer-to-peer group probability threshold p_max is selected from the peer-to-peer group probability distribution, wherein min_p < p_max < max_p.
In the embodiment of the invention, the set of entries whose value is equal to the root threshold is traversed. For any entry x_j : p_j:
when p_j ≤ p_max, x_j : p_j is randomly assigned to the first subset or to the second subset;
when p_j > p_max, the peer-to-peer group probability p_j is randomly split into p_{j1} and p_{j2}, x_j : p_{j1} is assigned to the first subset, and x_j : p_{j2} is assigned to the second subset.
In the embodiment of the present invention, as an optional embodiment, the points (users) with higher peer-to-peer group probability are further split, so as to implement the differentiated treatment of peer-to-peer group probability distribution.
In this embodiment of the present invention, as an optional embodiment, the method further includes:
detecting whether all users whose values are equal to the root threshold have been placed in the same subset, and if so, moving some of those users to the other subset.
In the embodiment of the invention, the set of entries whose value is equal to the root threshold is checked: if all of them have been assigned to the same subset, one or more of these entries are randomly selected and moved from that subset to the other subset, so as to ensure that the set is split into two non-empty parts.
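A possible form of this check is sketched below. The function name `rebalance_ties` and the `(user, value, prob)` tuple layout are hypothetical, and moving exactly one randomly chosen entry is a simplifying assumption; the text allows one or more entries to be moved.

```python
import random


def rebalance_ties(first, second, threshold, rng=random):
    """If every entry tied at `threshold` sits in one subset, move one across."""
    first, second = list(first), list(second)
    ties_first = [e for e in first if e[1] == threshold]
    ties_second = [e for e in second if e[1] == threshold]
    if len(ties_first) + len(ties_second) < 2:
        return first, second  # fewer than two tied entries: nothing to spread
    if not ties_second:
        moved = rng.choice(ties_first)
        first.remove(moved)
        second.append(moved)
    elif not ties_first:
        moved = rng.choice(ties_second)
        second.remove(moved)
        first.append(moved)
    return first, second


first = [("u1", 5, 0.2), ("u2", 5, 0.3), ("u3", 7, 0.1)]
second = [("u4", 3, 0.4)]
new_first, new_second = rebalance_ties(first, second, 5, random.Random(0))
print(new_first, new_second)
```

After the call, the tied entries are present on both sides and no entry is lost.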
A33, determining a segmentation feature sub-threshold according to the upper and lower value bounds of the selected behavior features in the subset, and segmenting the subset based on the segmentation feature sub-threshold until the size of the segmented subset is 1 or the tree depth of the isolated forest tree formed after segmentation reaches a preset tree depth threshold.
In the embodiment of the invention, the subset formed by segmenting the root node data set is segmented again according to the same processing mode as the root node data set until the preset segmentation finishing condition is met, namely the size of the segmented subset is 1, or the tree depth of the isolated forest tree formed after segmentation reaches the preset tree depth threshold.
In this embodiment, as an optional embodiment, after the root node dataset is partitioned into subsets, before determining a partition characteristic subthreshold according to upper and lower value bounds of the selected behavior characteristic in the subsets, the method further includes:
judging whether the size of each subset obtained by segmentation is 1, or whether the tree depth of the isolated forest tree formed after segmentation has reached a preset tree depth threshold; if so, the isolated forest tree containing the target user is obtained; if not, executing the step of determining the segmentation feature sub-threshold according to the upper and lower bounds of the values of the selected behavior feature in the subsets.
In the embodiment of the invention, users with a high peer-to-peer group probability are more likely to be split during generation of the isolated forest tree; the split entries enter the left child node and the right child node respectively, which expands the tree structure, so that such users are less likely to be detected as abnormal users.
In the embodiment of the invention, the pseudo code for obtaining the isolated forest tree containing the target user based on the probability cluster and the selected behavior features is as follows:
[The pseudo code is provided as images in the original publication]
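Because the original pseudo code survives only as images, the following is an independent sketch of the recursion described in steps A31 to A33, not a transcription of it. The names `build_tree` and `leaf_depths`, the nested-dict tree layout, the default `max_depth` and `p_max` values, and the probability of 1.0 assigned to the target user itself are all assumptions made for the example.

```python
import random


def build_tree(entries, depth=0, max_depth=8, p_max=0.5, rng=random):
    """Recursively build one tree for a single behavior feature.

    entries: list of (user, value, prob) tuples.
    Returns a nested dict; leaves carry their entries and depth.
    """
    leaf = {"leaf": True, "entries": entries, "depth": depth}
    if len(entries) <= 1 or depth >= max_depth:
        return leaf
    values = [value for _, value, _ in entries]
    if min(values) == max(values):
        return leaf  # the feature can no longer produce a split
    threshold = rng.uniform(min(values), max(values))
    first, second = [], []
    for user, value, prob in entries:
        if value > threshold:
            first.append((user, value, prob))
        elif value < threshold:
            second.append((user, value, prob))
        elif prob <= p_max:
            rng.choice((first, second)).append((user, value, prob))
        else:
            p1 = rng.uniform(0.0, prob)
            first.append((user, value, p1))
            second.append((user, value, prob - p1))
    if not first or not second:
        return leaf  # degenerate split: stop instead of recursing forever
    return {
        "leaf": False,
        "threshold": threshold,
        "first": build_tree(first, depth + 1, max_depth, p_max, rng),
        "second": build_tree(second, depth + 1, max_depth, p_max, rng),
    }


def leaf_depths(tree, user):
    """Depths of all leaves that contain `user` (more than one after a split)."""
    if tree["leaf"]:
        return [tree["depth"]] if any(u == user for u, _, _ in tree["entries"]) else []
    return leaf_depths(tree["first"], user) + leaf_depths(tree["second"], user)


# Assumption: the target user is given probability 1.0 with respect to itself.
entries = [("target", 4.0, 1.0), ("u1", 4.0, 0.8), ("u2", 7.5, 0.1),
           ("u3", 1.0, 0.05), ("u4", 9.0, 0.6)]
tree = build_tree(entries, max_depth=6, rng=random.Random(42))
print(max(leaf_depths(tree, "target")))
```

Taking the maximum over `leaf_depths` corresponds to the "longest depth of the target user" used later when scoring.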
Step 104, detecting whether the target user is an abnormal user according to the obtained isolated forest trees and a preset anomaly detection algorithm.
In the embodiment of the present invention, the anomaly detection algorithm includes, but is not limited to: an isolated forest (iForest) algorithm, a one-class support vector machine (One-Class SVM) algorithm, a local outlier factor (LOF) algorithm, a singular value decomposition (SVD) algorithm, an autoencoder (AE) algorithm, a variational autoencoder (VAE) algorithm, and the like.
In the embodiment of the invention, as an optional embodiment, the iForest algorithm is adopted as the anomaly detection algorithm; the iForest algorithm is an extension of the random forest (RF) algorithm.
In the embodiment of the present invention, as an optional embodiment, detecting whether the target user is an abnormal user according to the obtained isolated forest tree and a preset abnormal detection algorithm includes:
A41, calculating the average depth of the isolated forest trees according to the longest tree depth of each obtained isolated forest tree;
A42, acquiring the average depth of the target user based on the longest depth of the target user in each isolated forest tree;
A43, acquiring the abnormality score of the target user according to the average depth of the isolated forest trees and the average depth of the target user;
A44, if the abnormality score of the target user is greater than a preset abnormality score threshold, determining that the target user is an abnormal user.
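Steps A41 to A44 can be sketched as below. The score is assumed here to take the usual isolation-forest form, 2 raised to the negative ratio of the target user's average depth to the average tree depth; the function names and the threshold value 0.6 are hypothetical and chosen only for illustration.

```python
from statistics import mean


def anomaly_score(tree_max_depths, target_depths):
    """Score in (0, 1]; shallower target depths give higher scores."""
    avg_tree_depth = mean(tree_max_depths)    # A41: average over all trees
    avg_target_depth = mean(target_depths)    # A42: target's average depth
    return 2 ** (-avg_target_depth / avg_tree_depth)  # A43 (assumed form)


def is_abnormal(score, threshold=0.6):
    """A44: compare against a preset abnormality score threshold."""
    return score > threshold


# Four trees of depth 8; the first user is isolated early, the second late.
shallow = anomaly_score([8, 8, 8, 8], [2, 3, 2, 1])
deep = anomaly_score([8, 8, 8, 8], [8, 8, 8, 8])
print(shallow, is_abnormal(shallow))
print(deep, is_abnormal(deep))
```

A user that reaches a leaf after only a few splits scores close to 1, while a user that sits as deep as the trees themselves scores 0.5.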
In the embodiment of the present invention, as an optional embodiment, the abnormality score of the target user is calculated by using the following formula:
s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}
wherein,
s(x, n) is the abnormality score of the target user;
c(n) is the average depth of the isolated forest trees;
h(x) is the average depth of the target user;
E(·) denotes averaging.
In the embodiment of the present invention, as an optional embodiment, the average depth of the target user is calculated by using the following formula:
h(x) = e + C(n)
wherein,
n is the number of users contained in the leaf node in which the target user is located in the isolated forest tree;
e is the number of edges from the root node to that leaf node in the isolated forest tree;
C(n) = 2H(n-1) - \frac{2(n-1)}{n} is a correction to the average depth of the target user;
H(n-1) ≈ \ln(n-1) + ξ,
wherein ξ is the Euler constant (approximately 0.5772).
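As a numerical illustration of the correction term, the sketch below assumes it takes the standard isolation-forest form C(n) = 2H(n-1) - 2(n-1)/n with H(n-1) approximated by ln(n-1) + ξ. The function names are hypothetical, and returning 0 for n ≤ 1 (a leaf holding a single user needs no correction) is an assumption, since ln(0) is undefined.

```python
import math

EULER_GAMMA = 0.5772156649  # the Euler constant ξ


def harmonic_approx(m):
    """H(m) ≈ ln(m) + ξ, valid for m >= 1."""
    return math.log(m) + EULER_GAMMA


def depth_correction(n):
    """C(n) = 2*H(n-1) - 2*(n-1)/n; assumed to be 0 for n <= 1."""
    if n <= 1:
        return 0.0
    return 2 * harmonic_approx(n - 1) - 2 * (n - 1) / n


def corrected_depth(e, n):
    """h(x) = e + C(n): edge count plus the leaf-size correction."""
    return e + depth_correction(n)


print(round(depth_correction(256), 4))  # 10.2448
print(corrected_depth(5, 1))            # 5.0
```

For n = 256 the correction is about 10.24, so a user that stops in a large unsplit leaf is credited with the depth it would probably have reached had splitting continued.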
In the embodiment of the invention, the abnormal score of the target user to be detected is calculated and compared with the abnormal score threshold, and if the abnormal score of the target user is greater than the abnormal score threshold, the target user is determined to be the abnormal user.
In the embodiment of the invention, with the isolated forest tree generation method provided by the embodiment of the invention, the probability that a user is detected as an abnormal user is inversely related to the user's peer-to-peer group probability in the probability cluster (peer group): a user with a low peer-to-peer group probability lies at the edge of the peer group and is therefore more easily detected as an abnormal user by the anomaly detection algorithm.
According to the anomaly detection method based on the user peer-to-peer group provided by the embodiment of the invention, a user set containing a target user is clustered a preset number of times according to a preset clustering algorithm to obtain the preset number of clusters to which the target user belongs, where each clustering corresponds to one cluster; the peer-to-peer group probability of the target user and any user in these clusters is acquired, and a probability cluster is constructed based on the peer-to-peer group probability; a preset threshold number of behavior features is selected from a preset behavior feature pool, and for each selected behavior feature an isolated forest tree containing the target user is constructed based on the probability cluster and the selected behavior feature; and whether the target user is an abnormal user is detected according to the obtained isolated forest trees and a preset anomaly detection algorithm. Introducing the peer-to-peer group probability distribution describes peer group information more fully, so that the information on whether a user belongs to the peer group is more accurate and comprehensive. Because multiple isolated forest trees containing the target user are obtained from the peer-to-peer group probability distribution and randomly selected behavior features, and the judgment of whether the target user is abnormal is based on these multiple trees, the accuracy of the anomaly detection result is effectively improved.
Fig. 2 is a schematic structural diagram illustrating an anomaly detection apparatus based on a user peer-to-peer group according to an embodiment of the present invention. As shown in fig. 2, the apparatus includes:
the clustering module 201 is configured to perform clustering for a preset number of times on a user set including a target user according to a preset clustering algorithm to obtain a preset number of clusters to which the target user belongs, where each clustering corresponds to one cluster;
in the embodiment of the invention, the user set containing the target user is clustered for multiple times according to a clustering algorithm to respectively obtain a plurality of clustering clusters containing the target user.
In the embodiment of the present invention, the clustering algorithm includes, but is not limited to: binary K-means algorithm, density-based algorithm, grid-based algorithm. As an alternative embodiment, a binary K-means algorithm is used for performing clustering analysis, and the clustering module 201 includes:
a trunk cluster constructing unit (not shown in the figure), configured to use the behavior features included in the user set as column vectors of a trunk cluster, and use a value of a behavior feature corresponding to each user in the user set as a row vector of the trunk cluster;
in the embodiment of the present invention, the behavior characteristics include, but are not limited to: the access times of the user to each resource, the access times to the resources in one week, the access times to the resources in one month, the access duration, the access accumulated duration and the like.
The class cluster dividing unit is used for traversing the trunk cluster, performing binary clustering on the row-column vector of the trunk cluster by using a binary K-means algorithm, and dividing the trunk cluster into two class clusters by taking the sum of squares of errors as an index;
the dividing traversing unit is used for traversing the class clusters and performing binary clustering on the row-column vectors of the class clusters by using a binary K-means algorithm until the preset clustering number is reached;
and the cluster acquisition unit is used for selecting the cluster in which the target user is located as the cluster to which the target user belongs for that clustering, until the number of clusters to which the target user belongs reaches the preset number of times.
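The units above can be illustrated with a small pure-Python sketch of binary (bisecting) K-means. It is a simplified reading of the description, not the patented implementation: choosing the cluster with the largest sum of squared errors as the next one to split, the farthest-point initialization, and all function names are assumptions.

```python
import random


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def sse(points):
    """Sum of squared errors of a cluster around its centroid."""
    c = centroid(points)
    return sum(dist2(p, c) for p in points)


def two_means(points, rng, iters=20):
    """Split one cluster in two with plain 2-means."""
    first = rng.choice(points)
    second = max(points, key=lambda p: dist2(p, first))  # farthest point
    centers, groups = [first, second], None
    for _ in range(iters):
        new = ([], [])
        for p in points:
            nearer = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            new[nearer].append(p)
        if not new[0] or not new[1]:
            break
        groups = new
        centers = [centroid(g) for g in groups]
    return groups


def bisecting_kmeans(points, k, seed=0):
    """Repeatedly bisect until k clusters exist (rows are feature tuples)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)  # assumed selection rule
        if len(set(worst)) < 2:
            break  # identical rows cannot be split further
        clusters.remove(worst)
        clusters.extend(two_means(worst, rng))
    return clusters


rows = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = bisecting_kmeans(rows, k=2, seed=3)
target = (10, 11)
target_cluster = next(c for c in clusters if target in c)
print(sorted(target_cluster))
```

Repeating the call with different seeds, and recording `target_cluster` each time, yields the preset number of clusters to which the target user belongs.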
A peer-to-peer group probability calculation module 202, configured to obtain, according to a preset number of clusters to which the target user belongs, a peer-to-peer group probability of the target user and any user in the preset number of clusters, and construct a probability cluster based on the peer-to-peer group probability;
in this embodiment of the present invention, as an optional embodiment, the peer-to-peer group probability calculating module 202 includes:
a counting unit (not shown in the figure) configured to obtain each user in a preset number of clustering clusters to which the target user belongs, and count the number of times that the target user and the user belong to the same clustering cluster;
and the peer-to-peer group probability calculating unit is used for obtaining the peer-to-peer group probability of the target user and the user according to the times of the target user and the user belonging to the same cluster and the preset times aiming at each user.
A forest tree generation module 203, configured to select a preset threshold number of behavior features from a preset behavior feature pool, and construct, for each selected behavior feature, an isolated forest tree including the target user based on the probabilistic cluster and the selected behavior feature;
in this embodiment of the present invention, as an optional embodiment, the forest tree generation module 203 includes:
a feature selection unit (not shown in the figure) for selecting a preset threshold number of behavior features from a preset behavior feature pool;
in the embodiment of the invention, a part of users including the target user are selected from the user set, the root node data set is constructed, and the number of the selected users can be set according to actual needs.
A root node data set generating unit, configured to select, for each selected behavior feature, multiple users including a target user from the probabilistic cluster, and form a root node data set;
the segmentation unit is used for determining a root threshold according to the upper and lower value bounds of the selected behavior characteristics in the root node data set, and segmenting the root node data set into subsets based on the root threshold;
in the embodiment of the present invention, as an optional embodiment, the segmentation unit includes:
a root threshold determination subunit, configured to determine a root threshold according to upper and lower bounds of values of the selected behavior feature in the root node dataset;
a first processing subunit, configured to, if the user's value for the selected behavior feature is greater than the root threshold, place the user, together with the peer-to-peer group probability of the target user and the user, in a first subset, and if the value is less than the root threshold, place the user, together with that peer-to-peer group probability, in a second subset;
a second processing subunit, configured to, if the user's value for the selected behavior feature is equal to the root threshold:
if the peer-to-peer group probability of the target user and the user is less than or equal to a preset peer-to-peer group probability threshold, place the user, together with that peer-to-peer group probability, in the first subset or the second subset;
if the peer-to-peer group probability of the target user and the user is greater than the peer-to-peer group probability threshold, split the peer-to-peer group probability into a first probability and a second probability, wherein the sum of the first probability and the second probability is equal to the peer-to-peer group probability;
and place the user with the first probability in the first subset, and the user with the second probability in the second subset.
In the embodiment of the present invention, the users in the root node data set whose value for the selected behavior feature is equal to the root threshold are recorded as a set of entries x_j : p_j (the set notation is provided as an image in the original publication).
In the embodiment of the invention, this set is traversed. For any entry x_j : p_j:
when p_j ≤ p_max, x_j : p_j is randomly assigned to the first subset or to the second subset;
when p_j > p_max, the peer-to-peer group probability p_j is randomly split into p_{j1} and p_{j2}, x_j : p_{j1} is assigned to the first subset, and x_j : p_{j2} is assigned to the second subset.
In this embodiment of the present invention, as an optional embodiment, the segmentation unit further includes:
and the detecting subunit is used for detecting whether all users whose values are equal to the root threshold have been placed in the same subset, and if so, moving some of those users to the other subset.
And the isolated forest tree generation unit is used for determining a segmentation characteristic subthreshold according to the upper and lower value boundaries of the selected behavior characteristics in the subset, and segmenting the subset based on the segmentation characteristic subthreshold until the size of the segmented subset is 1 or the tree depth of the isolated forest tree formed after segmentation reaches a preset tree depth threshold.
In the embodiment of the invention, for each round of segmentation: a behavior feature that can produce a split is randomly selected from the candidate feature set, a threshold (root threshold or sub-threshold) is randomly selected between the upper and lower bounds of that feature's values in the node data set (root node data set or subset), and the node data set is split into two subsets based on the threshold.
And the anomaly detection module 204 is configured to detect whether the target user is an abnormal user according to the obtained isolated forest tree and a preset anomaly detection algorithm.
In the embodiment of the present invention, the anomaly detection algorithm includes, but is not limited to: the isolated forest algorithm, the one-class support vector machine algorithm, the local outlier factor algorithm, the singular value decomposition algorithm, the autoencoder algorithm, the variational autoencoder algorithm, and the like.
In this embodiment of the present invention, as an optional embodiment, the anomaly detection module 204 includes:
the first calculating subunit is used for calculating the average depth of the isolated forest trees according to the acquired longest tree depth of each isolated forest tree;
the second calculating subunit is used for acquiring the average depth of the target user based on the longest depth of the target user in each isolated forest tree;
the abnormal score calculating unit is used for acquiring the abnormal score of the target user according to the average depth of the isolated forest tree and the average depth of the target user;
and the abnormality determining unit is used for determining the target user as an abnormal user if the abnormal score of the target user is larger than a preset abnormal score threshold value.
In the embodiment of the present invention, as an optional embodiment, the abnormality score of the target user is calculated by using the following formula:
s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}
wherein,
s(x, n) is the abnormality score of the target user;
c(n) is the average depth of the isolated forest trees;
h(x) is the average depth of the target user;
E(·) denotes averaging.
In the embodiment of the present invention, as an optional embodiment, the average depth of the target user is calculated by using the following formula:
h(x) = e + C(n)
wherein,
n is the number of users contained in the leaf node in which the target user is located in the isolated forest tree;
e is the number of edges from the root node to that leaf node in the isolated forest tree;
C(n) = 2H(n-1) - \frac{2(n-1)}{n} is a correction to the average depth of the target user;
H(n-1) ≈ \ln(n-1) + ξ,
wherein ξ is the Euler constant (approximately 0.5772).
As shown in fig. 3, an embodiment of the present application provides a computer device 300, configured to execute the user peer-to-peer group based anomaly detection method in fig. 1, where the device includes a memory 301, a processor 302, and a computer program stored in the memory 301 and executable on the processor 302, where the processor 302 implements the steps of the user peer-to-peer group based anomaly detection method when executing the computer program.
Specifically, the memory 301 and the processor 302 can be general-purpose memories and processors, which are not limited to specific examples, and the processor 302 can execute the above-mentioned user peer group-based anomaly detection method when executing the computer program stored in the memory 301.
Corresponding to the user peer-to-peer group based anomaly detection method in fig. 1, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the user peer-to-peer group based anomaly detection method.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, or the like, and when the computer program on the storage medium is executed, the above-mentioned anomaly detection method based on the user peer group can be executed.
In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and there may be other divisions in actual implementation, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of systems or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. An anomaly detection method based on user peer-to-peer groups is characterized by comprising the following steps:
clustering a user set containing target users for a preset number of times according to a preset clustering algorithm to obtain a preset number of clustering clusters to which the target users belong, wherein each clustering corresponds to one clustering cluster;
acquiring peer-to-peer group probability of the target user and any user in the preset times of clustering clusters according to the preset times of clustering clusters to which the target user belongs, and constructing a probability clustering cluster based on the peer-to-peer group probability;
selecting a preset threshold number of behavior features from a preset behavior feature pool, and, for each selected behavior feature, constructing an isolated forest tree containing the target user based on the probability cluster and the selected behavior feature;
detecting whether the target user is an abnormal user or not according to the obtained isolated forest tree and a preset abnormal detection algorithm;
the obtaining of the isolated forest tree including the target user based on the probabilistic cluster and the selected behavior feature includes:
selecting a plurality of users including target users from the probability cluster to form a root node data set;
determining a root threshold according to the upper and lower value bounds of the selected behavior characteristics in the root node data set, and segmenting the root node data set into subsets based on the root threshold;
determining a segmentation characteristic subthreshold value according to the upper and lower value bounds of the selected behavior characteristics in the subset, and segmenting the subset based on the segmentation characteristic subthreshold value until the size of the segmented subset is 1 or the tree depth of the isolated forest tree formed after segmentation reaches a preset tree depth threshold value;
the segmenting the root node dataset into subsets based on the root threshold comprises:
if the user's value for the selected behavior feature is greater than the root threshold, placing the user, together with the peer-to-peer group probability of the target user and the user, in a first subset, and if the value is less than the root threshold, placing the user, together with that peer-to-peer group probability, in a second subset;
if the user's value for the selected behavior feature is equal to the root threshold:
if the peer-to-peer group probability of the target user and the user is less than or equal to a preset peer-to-peer group probability threshold, placing the user, together with that peer-to-peer group probability, in the first subset or the second subset;
if the peer-to-peer group probability of the target user and the user is greater than the peer-to-peer group probability threshold, splitting the peer-to-peer group probability into a first probability and a second probability, wherein the sum of the first probability and the second probability is equal to the peer-to-peer group probability;
and placing the user with the first probability in the first subset, and the user with the second probability in the second subset.
2. The method according to claim 1, wherein the clustering algorithm comprises a bisecting K-means algorithm, and the clustering a user set including a target user according to a preset clustering algorithm for a preset number of times to obtain a preset number of clusters to which the target user belongs comprises:
taking the behavior characteristics contained in the user set as column vectors of the trunk cluster, and taking the values of the behavior characteristics corresponding to each user in the user set as a row vector of the trunk cluster;
traversing the trunk cluster, bisecting the row and column vectors of the trunk cluster by using the bisecting K-means algorithm, and dividing the trunk cluster into two class clusters by taking the sum of squared errors as the criterion;
traversing the class clusters, and bisecting the row and column vectors of the class clusters by using the bisecting K-means algorithm until a preset number of clusters is reached;
and selecting the class cluster that contains the target user as a cluster to which the target user belongs, and repeating the clustering until the number of clusters to which the target user belongs reaches the preset number of times.
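The repeated clustering of claim 2 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the helper names are hypothetical, the cluster with the largest sum of squared errors (SSE) is the one chosen for the next split, the two initial centers are sampled at random, and the randomness between runs comes from a seeded `random.Random`.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(rows: List[Vector]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sse(users: List[str], data: Dict[str, Vector]) -> float:
    rows = [data[u] for u in users]
    center = _mean(rows)
    return sum(_dist2(r, center) for r in rows)


def _two_means(users: List[str], data: Dict[str, Vector], rng: random.Random,
               iters: int = 20) -> Optional[Tuple[List[str], List[str]]]:
    """Plain 2-means on one cluster; returns None if the split degenerates."""
    centers = [data[u] for u in rng.sample(users, 2)]
    parts = None
    for _ in range(iters):
        new: Tuple[List[str], List[str]] = ([], [])
        for u in users:
            closer_to_second = _dist2(data[u], centers[1]) < _dist2(data[u], centers[0])
            new[1 if closer_to_second else 0].append(u)
        if not new[0] or not new[1]:
            break
        parts = new
        centers = [_mean([data[u] for u in p]) for p in parts]
    return parts


def bisecting_kmeans(data: Dict[str, Vector], k: int,
                     rng: random.Random) -> List[List[str]]:
    clusters = [list(data)]  # the trunk cluster holds every user
    for _ in range(50 * k):  # bounded retries in case of degenerate seeds
        if len(clusters) >= k:
            break
        splittable = [c for c in clusters if _sse(c, data) > 0]
        if not splittable:
            break
        # Assumption: split the cluster with the largest SSE next.
        worst = max(splittable, key=lambda c: _sse(c, data))
        halves = _two_means(worst, data, rng)
        if halves is None:
            continue
        clusters.remove(worst)
        clusters.extend(halves)
    return clusters


def clusters_of_target(data: Dict[str, Vector], target: str, k: int,
                       times: int, seed: int = 0) -> List[Set[str]]:
    """Cluster `times` times and keep the cluster containing the target each time."""
    rng = random.Random(seed)
    found: List[Set[str]] = []
    for _ in range(times):
        clusters = bisecting_kmeans(data, k, rng)
        found.append(next(set(c) for c in clusters if target in c))
    return found
```

Each element of the returned list is one cluster to which the target user belongs, so the list length equals the preset number of times.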
3. The method according to claim 1, wherein the obtaining the peer-to-peer group probability of the target user and any user in the predetermined number of clusters according to the predetermined number of clusters to which the target user belongs comprises:
acquiring each user in the preset number of clusters to which the target user belongs, and counting the number of times that the target user and the user belong to the same cluster;
and, for each user, obtaining the peer-to-peer group probability of the target user and the user according to the number of times the target user and the user belong to the same cluster and the preset number of times.
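A minimal sketch of the probability computation in claim 3, assuming the probability is the simple ratio of the co-occurrence count to the preset number of clusterings (the claim names the two inputs but does not spell out the formula). The function name and the list-of-sets input are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def peer_group_probabilities(clusters_of_target: List[Set[str]]) -> Dict[str, float]:
    """Map each user to (times seen in the target's cluster) / (number of clusterings).

    clusters_of_target holds, for each clustering run, the set of users in the
    cluster that contains the target user. Assumption: a plain ratio is used.
    """
    times = len(clusters_of_target)
    counts = Counter(user for cluster in clusters_of_target for user in cluster)
    return {user: n / times for user, n in counts.items()}
```

Because the target user appears in every one of its own clusters, it naturally receives probability 1.0 under this ratio.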
4. The method of claim 1, further comprising:
detecting whether the users whose values are equal to the root threshold have all been placed in the same subset, and if so, moving some of those users to the other subset.
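The check in claim 4 can be sketched as below. The claim does not say how many users to move, so moving half of them is an assumption, as are the function name and the in-place dictionary update.

```python
from typing import Dict, List


def rebalance_ties(first: Dict[str, float], second: Dict[str, float],
                   tied: List[str]) -> None:
    """If every tied user landed in one subset only, move some to the other.

    first/second map user -> probability; tied lists the users whose feature
    value equals the root threshold. Assumption: half of them are moved.
    """
    only_first = [u for u in tied if u in first and u not in second]
    only_second = [u for u in tied if u in second and u not in first]
    if only_first and not only_second:
        src, dst, movers = first, second, only_first
    elif only_second and not only_first:
        src, dst, movers = second, first, only_second
    else:
        return  # ties are already spread over both subsets
    for user in movers[: len(movers) // 2]:
        dst[user] = src.pop(user)
```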
5. The method according to any one of claims 1 to 3, wherein the detecting whether the target user is an abnormal user according to the obtained isolated forest trees and a preset anomaly detection algorithm comprises:
calculating the average depth of the isolated forest trees according to the longest tree depth obtained for each isolated forest tree;
acquiring the average depth of the target user based on the longest depth of the target user in each isolated forest tree;
acquiring an anomaly score of the target user according to the average depth of the isolated forest trees and the average depth of the target user;
and if the anomaly score of the target user is larger than a preset anomaly score threshold, determining that the target user is an abnormal user.
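The scoring steps of claim 5 can be sketched as follows. The claim does not state the formula that combines the two averages, so this sketch borrows the exponential form used by the standard isolation forest, score = 2^(-target average / forest average); that formula, the function names, and the input layout are assumptions, not the patent's stated method.

```python
from statistics import mean
from typing import Sequence


def anomaly_score(tree_max_depths: Sequence[float],
                  target_max_depths: Sequence[float]) -> float:
    """Score in (0, 1); a shallower target gives a higher score.

    tree_max_depths:   longest depth of each isolated forest tree
    target_max_depths: longest depth of the target user in each tree
    Assumption: the two averages are combined as 2 ** (-target_avg / forest_avg).
    """
    forest_avg = mean(tree_max_depths)
    target_avg = mean(target_max_depths)
    return 2.0 ** (-target_avg / forest_avg)


def is_abnormal(tree_max_depths: Sequence[float],
                target_max_depths: Sequence[float],
                score_threshold: float) -> bool:
    return anomaly_score(tree_max_depths, target_max_depths) > score_threshold
```

The "longest depth of the target user" is needed because a user whose probability was split can sit in more than one leaf of the same tree.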
6. An anomaly detection apparatus based on a user peer-to-peer group, comprising:
a clustering module, configured to cluster a user set containing a target user a preset number of times according to a preset clustering algorithm, to obtain a preset number of clusters to which the target user belongs, wherein each clustering corresponds to one cluster;
a peer-to-peer group probability calculation module, configured to obtain, according to the preset number of clusters to which the target user belongs, the peer-to-peer group probability of the target user and any user in those clusters, and to construct a probability cluster based on the peer-to-peer group probability;
a forest tree generation module, configured to select a preset threshold number of behavior features from a preset behavior feature pool and, for each selected behavior feature, to construct an isolated forest tree containing the target user based on the probability cluster and the selected behavior feature;
an anomaly detection module, configured to detect whether the target user is an abnormal user according to the obtained isolated forest trees and a preset anomaly detection algorithm;
wherein constructing the isolated forest tree containing the target user based on the probability cluster and the selected behavior feature comprises:
selecting a plurality of users including the target user from the probability cluster to form a root node data set;
determining a root threshold according to the upper and lower value bounds of the selected behavior characteristics in the root node data set, and segmenting the root node data set into subsets based on the root threshold;
determining a segmentation sub-threshold according to the upper and lower value bounds of the selected behavior characteristic in the subset, and segmenting the subset based on that sub-threshold, until the size of a segmented subset is 1 or the depth of the isolated forest tree formed by the segmentation reaches a preset tree depth threshold;
wherein segmenting the root node data set into subsets based on the root threshold comprises:
if the value of the selected behavior characteristic for a user is larger than the root threshold, placing the user, together with the peer-to-peer group probability of the target user and the user, in a first subset, and if the value is smaller than the root threshold, placing the user, together with that peer-to-peer group probability, in a second subset;
if the value corresponding to the selected behavior feature of the user is equal to the root threshold:
if the peer-to-peer group probability of the target user and the user is smaller than or equal to a preset peer-to-peer group probability threshold, placing the user, together with the peer-to-peer group probability of the target user and the user, in either the first subset or the second subset;
if the peer-to-peer group probability of the target user and the user is greater than the peer-to-peer group probability threshold, splitting the peer-to-peer group probability of the target user and the user into a first probability and a second probability, wherein the sum of the first probability and the second probability is equal to the peer-to-peer group probability;
and placing the user and the first probability in the first subset, and the user and the second probability in the second subset.
7. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the user peer group based anomaly detection method according to any one of claims 1 to 5.
8. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the anomaly detection method based on a user peer-to-peer group according to any one of claims 1 to 5.
CN201911059212.6A 2019-11-01 2019-11-01 Anomaly detection method and device based on user peer-to-peer group Active CN110807488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911059212.6A CN110807488B (en) 2019-11-01 2019-11-01 Anomaly detection method and device based on user peer-to-peer group

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911059212.6A CN110807488B (en) 2019-11-01 2019-11-01 Anomaly detection method and device based on user peer-to-peer group

Publications (2)

Publication Number Publication Date
CN110807488A CN110807488A (en) 2020-02-18
CN110807488B CN110807488B (en) 2022-03-08

Family

ID=69500910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911059212.6A Active CN110807488B (en) 2019-11-01 2019-11-01 Anomaly detection method and device based on user peer-to-peer group

Country Status (1)

Country Link
CN (1) CN110807488B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612037B (en) * 2020-04-24 2024-06-21 平安直通咨询有限公司上海分公司 Abnormal user detection method, device, medium and electronic equipment
CN111695597B (en) * 2020-05-07 2023-05-02 数尊信息科技(上海)有限公司 Credit fraud group identification method and system based on improved isolated forest algorithm
CN114201999A (en) * 2020-08-31 2022-03-18 中国移动通信集团浙江有限公司 Identification method, system, computing device and storage medium of abnormal account
CN112632609B (en) * 2020-12-23 2024-07-02 深圳云天励飞技术股份有限公司 Abnormality detection method, abnormality detection device, electronic device, and storage medium
CN112632532B (en) * 2020-12-28 2022-08-23 重庆邮电大学 User abnormal behavior detection method based on deep forest in edge calculation
CN112905583A (en) * 2021-04-01 2021-06-04 辽宁工程技术大学 High-dimensional big data outlier detection method
CN113705478B (en) * 2021-08-31 2024-02-27 中国林业科学研究院资源信息研究所 Mangrove single wood target detection method based on improved YOLOv5
CN113949652B (en) * 2021-10-12 2023-03-21 平安普惠企业管理有限公司 User abnormal behavior detection method and device based on artificial intelligence and related equipment
CN114417957A (en) * 2021-12-03 2022-04-29 奇安信科技集团股份有限公司 User abnormal behavior detection method, device, electronic device, medium and program
CN114793205B (en) * 2022-04-25 2024-09-06 咪咕文化科技有限公司 Abnormal link detection method, device, equipment and storage medium
CN117576823B (en) * 2023-11-29 2024-05-14 上海徽视科技集团有限公司 Queuing and calling system terminal

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150286783A1 (en) * 2014-04-02 2015-10-08 Palo Alto Research Center Incorporated Peer group discovery for anomaly detection
US9544321B2 (en) * 2015-01-30 2017-01-10 Securonix, Inc. Anomaly detection using adaptive behavioral profiles
US9953160B2 (en) * 2015-10-13 2018-04-24 Paypal, Inc. Applying multi-level clustering at scale to unlabeled data for anomaly detection and security
CN109828825A (en) * 2019-01-07 2019-05-31 平安科技(深圳)有限公司 Abnormal deviation data examination method, device, computer equipment and storage medium
CN110162970B (en) * 2019-01-08 2024-12-27 腾讯科技(深圳)有限公司 A program processing method, device and related equipment
CN109873812B (en) * 2019-01-28 2020-06-23 腾讯科技(深圳)有限公司 Anomaly detection method and device and computer equipment
CN110060087B (en) * 2019-03-07 2023-08-04 创新先进技术有限公司 Abnormal data detection method, device and server
CN110149258A (en) * 2019-04-12 2019-08-20 北京航空航天大学 A kind of automobile CAN-bus network data method for detecting abnormality based on isolated forest
CN110046665A (en) * 2019-04-17 2019-07-23 成都信息工程大学 Based on isolated two abnormal classification point detecting method of forest, information data processing terminal
CN110351307B (en) * 2019-08-14 2022-01-28 杭州安恒信息技术股份有限公司 Abnormal user detection method and system based on ensemble learning

Also Published As

Publication number Publication date
CN110807488A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
CN110807488B (en) Anomaly detection method and device based on user peer-to-peer group
CN107563429B (en) Method and device for classifying network user groups
US20190166024A1 (en) Network anomaly analysis apparatus, method, and non-transitory computer readable storage medium thereof
CN108809745A (en) A kind of user's anomaly detection method, apparatus and system
CN107622072B (en) Identification method for webpage operation behavior, server and terminal
CN111914253B (en) Method, system, equipment and readable storage medium for intrusion detection
CN117978461B (en) Abnormal login detection method and system based on isolated forest
CN110245687B (en) User classification method and device
CN109257383B (en) BGP anomaly detection method and system
CN111090807A (en) Knowledge graph-based user identification method and device
Megantara et al. Feature importance ranking for increasing performance of intrusion detection system
CN106301979B (en) Method and system for detecting abnormal channel
CN110866249A (en) Method and device for dynamically detecting malicious code and electronic equipment
EP2095277A1 (en) Fuzzy database matching
CN113992340A (en) User Abnormal Behavior Recognition Method, Apparatus, Equipment, Storage Medium and Program
CN113125903A (en) Line loss anomaly detection method, device, equipment and computer-readable storage medium
CN110968802B (en) Analysis method and analysis device for user characteristics and readable storage medium
CN112437053A (en) Intrusion detection method and device
CN109783805B (en) Network community user identification method and device and readable storage medium
CN111586001A (en) Abnormal user identification method, device, electronic device and storage medium
CN114780606B (en) Big data mining method and system
CN114969738B (en) Interface abnormal behavior monitoring method, system, device and storage medium
CN108076032B (en) Abnormal behavior user identification method and device
CN111506624A (en) Electric power missing data identification method and related device
CN112801113B (en) Data denoising method based on multi-scale reliable clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant