CN109508733A - Anomaly detection method based on a distribution-probability similarity measure - Google Patents


Info

Publication number
CN109508733A
Authority
CN
China
Prior art keywords
data
sample
training
node
test point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811233705.2A
Other languages
Chinese (zh)
Inventor
高欣
井潇
何杨
查森
纪维佳
任昺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201811233705.2A
Publication of CN109508733A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Complex Calculations (AREA)

Abstract

An embodiment of the present invention proposes an anomaly detection method based on a distribution-probability similarity measure, comprising: performing random sampling multiple times to obtain multiple subsets of the normal sample data, saving the random isolation process of each subset in a full binary tree structure, and setting the threshold depth for backtracking according to a drift ratio; according to the external leaf node in which a test point falls in each tree and the threshold depth, backtracking from that leaf node to its ancestor node at the threshold depth and extracting all data under that node as the training data against which similarity to the test point is measured; and, taking the test point and a point in the training data set as endpoints, computing in each attribute dimension the probability that the remaining data points fall between the two points, then computing, in combination with the Minkowski distance, the dissimilarity between the test point and all points in the data set to obtain the anomaly score of the test point. The technical solution provided by the embodiment can effectively address both the absence of abnormal data in the training data set and the detection of local anomalies.

Description

Anomaly detection method based on a distribution-probability similarity measure
[Technical Field]
The present invention relates to anomaly detection methods in the field of machine learning, and in particular to an anomaly detection method based on a distribution-probability similarity measure.
[Background]
When anomaly detection is approached with machine learning, several problems arise: there may be no abnormal data to train on, anomalies may be local, and the dimensions of the data may differ widely in scale and distribution. Solving the unsupervised local anomaly detection problem with a suitable algorithm, so as to improve the model's recognition rate over normal and abnormal samples as a whole, is a current research focus. Conventional approaches to anomaly detection fall into three main types. The first is statistical: it judges whether a point is abnormal from the probability with which each dimension of the data point, or each combination of its dimensions, occurs. The second is distance-based: it judges whether a test point is abnormal mainly by computing how close the point is to the normal data, locally or globally. The third is based on the data distribution: it judges whether a point is abnormal by clustering or by computing the relative density of the distribution.
However, these methods rest on certain assumptions, for example that abnormal points lie far from the clusters of normal data, or that the density of abnormal data is far lower than that of normal data. In real applications, abnormal data may be highly concentrated or lie close to the normal clusters, and abnormal data may even be enclosed within a normal cluster. The assumptions made by the above algorithms are overly idealized; in practice they do not always hold, and may hold only rarely, which makes the recognition performance of the model unstable. In addition, clustered anomalies appear in events such as disease outbreaks and network attacks. When such anomalies occur, large amounts of abnormal data concentrate in one or more regions, that is, the anomalies are concentrated and densely distributed, and density-based algorithms cannot judge this situation correctly. In practice, concentrated anomalies are of great research value: scattered abnormal points indicate that such anomalies occur with low probability, whereas a region of concentrated anomalies indicates a relatively high probability of occurrence, so detecting it can greatly reduce losses. For example, in a network attack, if clustered anomalies are detected in time and the attack pattern is identified, network operations staff gain effective information support and damage to the system can be avoided.
In conclusion for abnormality detection problem, there are following difficult points at present: abnormal data acquisition is more difficult;Abnormal number More closely there is local anomaly according to away from normal data, existing major part algorithm only judges abnormal journey by global or local data distribution Degree, can not comprehensively consider global and local distributed intelligence;Abnormal data distribution is more intensive, and distribution density is close with normal data Even higher, highdensity abnormal data cluster is easily judged as normal data by the algorithm based on relative density;Each attribute of data point Cloth range difference is larger, and by traditional distance calculating method such as Euclidean distance, weight has big difference between different dimensions, and early period Normalization, standardization can adapt data original distribution state again;Abnormal data is distributed in inside normal data, and is distributed Range has certain intersection;Have label training data concentrate data point distribution will not completely high density concentrate on a region, need to sieve The training data for having information redundancy is selected, guarantees that subsequent processing is interference-free.
[Summary of the Invention]
In view of this, embodiments of the present invention propose an anomaly detection method based on a distribution-probability similarity measure, in order to improve the classification model's recognition rate over positive and negative samples as a whole.
An anomaly detection method based on a distribution-probability similarity measure proposed by an embodiment of the present invention comprises:
Performing random sampling multiple times to obtain multiple subsets of the normal sample data, saving the random isolation process of each subset in a full binary tree structure, and setting the threshold depth for backtracking according to a drift ratio;
According to the external leaf node in which a test point falls in each tree and the threshold depth, backtracking from that leaf node to its ancestor node at the threshold depth, and extracting all data under that node as the training data against which similarity to the test point is measured;
Taking the test point and a point in the training data set as endpoints, computing separately in each attribute dimension the probability that the remaining data points fall between these two points, and computing, in combination with the Minkowski distance, the dissimilarity between the test point and all points in the data set, thereby obtaining the anomaly score of the test point.
In the above method, the step of performing random sampling multiple times to obtain multiple subsets of the normal sample data, saving the random isolation process of each subset in a full binary tree structure, and setting the threshold depth for backtracking according to the drift ratio proceeds as follows. Several training subsets X_all are obtained by random sampling from the training data set D. Each subset X contains m samples, X = {X_1, X_2, ..., X_m}, where m is a positive integer smaller than the size of D and can be chosen to suit the application; each sample point has n dimensions, that is, the i-th sample X_i is an n-dimensional vector. A dimension and an isolation threshold are selected at random, the isolation threshold being a random value between the maximum and the minimum of the subset in that dimension. This is iterated until one of the following three conditions is met: (1) each isolated space contains only one sample; (2) all sample points in the space have the same value in the selected dimension; (3) the iteration limit is reached. The process is recorded in a tree structure, forming a full binary tree in which every node has either zero or two child nodes: leaf nodes store the samples in each isolated space, and internal nodes store the isolating dimension and the corresponding threshold. Dt, the depth threshold used when backtracking in a tree to search for neighbouring training points, is determined from the average E(h(x)) of the depth h(x) of each training point over the trees. Here E(h(x)) is the mean depth of sample x after it has traversed all t isolation trees, t is a suitable positive integer chosen for the application, and l_i(x) is the path depth of x in the i-th tree.
Setting Dt requires a drift ratio r, 0 ≤ r ≤ 1, namely the proportion of data in the all-normal training data set D that deviates relatively from the range of the normal data distribution. The value of r is chosen by weighing the dispersion of the data distribution against the requirements placed on the model's performance indicators in the application. Given r, the minimum of the top ((1 - r) * 100)% of the training samples' mean depths is selected as the depth threshold Dt for backtracking to search for local training points in a tree.
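The mean depth and the threshold rule described above can be written compactly. The block below is a reconstruction from the definitions in the text, not the patent's own typeset formula; the symbol N (the number of training points), the order-statistic notation E_(k), and the ceiling used to turn the percentage into an index are assumptions introduced here for illustration.

```latex
% mean path depth of sample x over the t isolation trees
E(h(x)) = \frac{1}{t} \sum_{i=1}^{t} l_i(x)

% sort the N mean depths in descending order:
% E_{(1)} \ge E_{(2)} \ge \dots \ge E_{(N)}
% Dt is the smallest value among the top (1 - r) fraction
D_t = E_{(k)}, \qquad k = \lceil (1 - r)\, N \rceil
```

For example, with N = 10 mean depths 1, 2, ..., 10 and r = 0.5, the top 50% are 10, 9, 8, 7, 6, so Dt = 6.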
In the above method, the step of backtracking, according to the external leaf node in which the test point falls in each tree and the threshold depth, from that leaf node to its ancestor node at the threshold depth, and extracting all data under that node as the training data against which similarity to the test point is measured, proceeds as follows. The test point is passed through every tree. If the test sample falls in a node below depth Dt, backtracking proceeds upward from the node containing the test point to the ancestor node at depth Dt, and all training samples Ltd under that ancestor node are extracted as the data used next to compute the test sample's degree of abnormality; the data extracted from all trees are merged to form the training samples for the next step. If the test sample lies above depth Dt, all data in Ltd are used as the training samples for the next step: such a test sample lies far from the normal data distribution globally and has little local training data nearby, so all training samples below Dt are extracted as the training samples for the next step, in order to further verify the test data's degree of abnormality.
In the above method, the step of taking the test point and a point in the training data set as endpoints, computing separately in each attribute dimension the probability that the remaining data points fall between these two points, and computing, in combination with the Minkowski distance, the dissimilarity between the test point and all points in the data set to obtain the anomaly score of the test point, proceeds as follows. First, R_i(x, y) is defined as the region between the values x_i and y_i of sample x and sample y in the i-th dimension, where x ∈ Ltd. Let S be the subspace occupied by all data in Ltd and S_i the range of S in the i-th dimension; if y_i lies outside S_i, the extreme value of S_i is taken as the boundary and R_i(x, y) becomes R_i(x, S). Num_i(x, y, d, S) is a Boolean indicating whether d_i, the i-th dimension of d, lies within R_i(x, y), where d ranges over the samples in Ltd other than x. M_i(x, y | Ltd, S) is the number of training points in Ltd that lie between x and y in the i-th dimension; its formula uses an indicator function I(·), which equals 1 if the condition in its brackets is true and 0 otherwise. The proportion of Ltd accounted for by M_i(x, y | Ltd, S) is then taken as the dissimilarity of x and y in the i-th dimension, and the dissimilarity D'(x, y) of x and y over all dimensions is computed with p as the exponent of the Minkowski distance. Finally, the anomaly score p(y) of test point y is the sum of the dissimilarities between y and all points in Ltd. Test points are ranked by p(y): the larger p(y), the higher the degree of abnormality.
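One way to write the three quantities defined above is sketched below. It is a reading consistent with the surrounding definitions rather than the patent's own typeset formulas; in particular, dividing by |Ltd| to form the per-dimension ratio and leaving the final sum unnormalised are assumptions. Note that the text uses p both for the Minkowski exponent and for the score p(y).

```latex
% number of training points between x and y in dimension i
M_i(x, y \mid Ltd, S) = \sum_{d \in Ltd \setminus \{x\}} I\big(d_i \in R_i(x, y)\big)

% Minkowski-style combination of the per-dimension ratios
D'(x, y) = \Big( \sum_{i=1}^{n} \Big( \frac{M_i(x, y \mid Ltd, S)}{|Ltd|} \Big)^{p} \Big)^{1/p}

% anomaly score of test point y
p(y) = \sum_{x \in Ltd} D'(x, y)
```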
[Brief Description of the Drawings]
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the anomaly detection method based on a distribution-probability similarity measure proposed by an embodiment of the present invention.
[Detailed Description]
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
An embodiment of the present invention provides an anomaly detection method based on a distribution-probability similarity measure. Fig. 1 is a schematic flowchart of the method; as shown in Fig. 1, the method comprises the following steps:
Step 101: random sampling is performed multiple times to obtain multiple subsets of the normal sample data, the random isolation process of each subset is saved in a full binary tree structure, and the threshold depth for backtracking is set according to a drift ratio.
Specifically, several training subsets X_all are obtained by random sampling from the training data set D. Each subset X contains m samples, X = {X_1, X_2, ..., X_m}, where m is a positive integer smaller than the size of D and can be chosen to suit the application; each sample point has n dimensions, that is, the i-th sample X_i is an n-dimensional vector. A dimension and an isolation threshold are selected at random, the isolation threshold being a random value between the maximum and the minimum of the subset in that dimension. This is iterated until one of the following three conditions is met: (1) each isolated space contains only one sample; (2) all sample points in the space have the same value in the selected dimension; (3) the iteration limit is reached. The process is recorded in a tree structure, forming a full binary tree in which every node has either zero or two child nodes: leaf nodes store the samples in each isolated space, and internal nodes store the isolating dimension and the corresponding threshold. Dt, the depth threshold used when backtracking in a tree to search for neighbouring training points, is determined from the average E(h(x)) of the depth h(x) of each training point over the trees. Here E(h(x)) is the mean depth of sample x after it has traversed all t isolation trees, t is a suitable positive integer chosen for the application, and l_i(x) is the path depth of x in the i-th tree.
Setting Dt requires a drift ratio r, 0 ≤ r ≤ 1, namely the proportion of data in the all-normal training data set D that deviates relatively from the range of the normal data distribution. The value of r is chosen by weighing the dispersion of the data distribution against the requirements placed on the model's performance indicators in the application. Given r, the minimum of the top ((1 - r) * 100)% of the training samples' mean depths is selected as the depth threshold Dt for backtracking to search for local training points in a tree.
Algorithm 1 and Algorithm 2 are the pseudocode for the isolation process and the depth-threshold setting method of step 101.
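The isolation process and the depth threshold of step 101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's Algorithm 1 or Algorithm 2: the names `Node`, `build_tree`, `path_depth`, `mean_depths` and `depth_threshold` are hypothetical, the "iteration limit" is modelled as a maximum tree depth, and the percentage is converted to an index with a ceiling.

```python
import math
import random


class Node:
    """Isolation-tree node: a leaf stores samples, an internal node stores a split."""

    def __init__(self, depth, samples=None, dim=None, threshold=None,
                 left=None, right=None):
        self.depth = depth
        self.samples = samples      # training points (leaf nodes only)
        self.dim = dim              # isolating dimension (internal nodes only)
        self.threshold = threshold  # isolating threshold (internal nodes only)
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None


def build_tree(samples, rng, depth=0, max_depth=16):
    """Recursively isolate `samples` with random axis-parallel splits."""
    # Stop conditions (1) and (3); max_depth stands in for the iteration limit.
    if len(samples) <= 1 or depth >= max_depth:
        return Node(depth, samples=list(samples))
    dim = rng.randrange(len(samples[0]))
    lo = min(s[dim] for s in samples)
    hi = max(s[dim] for s in samples)
    if lo == hi:  # stop condition (2): identical values in the chosen dimension
        return Node(depth, samples=list(samples))
    threshold = rng.uniform(lo, hi)
    left = [s for s in samples if s[dim] < threshold]
    right = [s for s in samples if s[dim] >= threshold]
    return Node(depth, dim=dim, threshold=threshold,
                left=build_tree(left, rng, depth + 1, max_depth),
                right=build_tree(right, rng, depth + 1, max_depth))


def path_depth(tree, x):
    """Depth l_i(x) of the leaf that point x reaches in one tree."""
    node = tree
    while not node.is_leaf():
        node = node.left if x[node.dim] < node.threshold else node.right
    return node.depth


def mean_depths(trees, points):
    """E(h(x)) for every point: the average leaf depth over all trees."""
    return [sum(path_depth(t, x) for t in trees) / len(trees) for x in points]


def depth_threshold(depths, r):
    """Smallest value among the top (1 - r) fraction of the mean depths."""
    ranked = sorted(depths, reverse=True)
    k = max(1, math.ceil((1 - r) * len(ranked)))
    return ranked[k - 1]


# Usage on synthetic 2-D data: 20 trees, each built on a random subset of 32 points.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
trees = [build_tree(rng.sample(data, 32), rng) for _ in range(20)]
depths = mean_depths(trees, data)
dt = depth_threshold(depths, r=0.1)
```

Because Dt is taken from mean depths it is generally not an integer; rounding it before using it as a tree level is a further choice left to the implementer.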
Step 102: according to the external leaf node in which the test point falls in each tree and the threshold depth, backtracking is performed from that leaf node to its ancestor node at the threshold depth, and all data under that node are extracted as the training data against which similarity to the test point is measured.
Specifically, the test point is passed through every tree. If the test sample falls in a node below depth Dt, backtracking proceeds upward from the node containing the test point to the ancestor node at depth Dt, and all training samples Ltd under that ancestor node are extracted as the data used next to compute the test sample's degree of abnormality; the data extracted from all trees are merged to form the training samples for the next step. If the test sample lies above depth Dt, all data in Ltd are used as the training samples for the next step: such a test sample lies far from the normal data distribution globally and has little local training data nearby, so all training samples below Dt are extracted as the training samples for the next step, in order to further verify the test data's degree of abnormality.
Algorithm 3 and Algorithm 4 are the pseudocode for the method of extracting local training data in step 102.
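The extraction of local training data in step 102 can be sketched as below. This is an illustrative reading, not the patent's Algorithm 3 or Algorithm 4. The dictionary-based tree, the helper names (`leaf`, `split`, `leaves_under`, `samples_at_or_below`, `local_training_data`, `merge_local_sets`) and the integer threshold `dt` are hypothetical. For a test point whose leaf is shallower than Dt, the sketch assumes that "all training samples below Dt" means every training sample stored at depth Dt or deeper in that tree.

```python
def leaf(samples):
    """Leaf node holding the training samples of one isolated space."""
    return {"samples": samples, "left": None, "right": None}


def split(dim, threshold, left, right):
    """Internal node holding the isolating dimension and threshold."""
    return {"dim": dim, "threshold": threshold, "left": left, "right": right}


def leaves_under(node):
    """All training samples stored in the leaves below `node`."""
    if node["left"] is None:
        return list(node["samples"])
    return leaves_under(node["left"]) + leaves_under(node["right"])


def samples_at_or_below(node, dt, depth=0):
    """All training samples stored at depth >= dt (assumed fallback set)."""
    if depth >= dt:
        return leaves_under(node)
    if node["left"] is None:
        return []
    return (samples_at_or_below(node["left"], dt, depth + 1)
            + samples_at_or_below(node["right"], dt, depth + 1))


def local_training_data(tree, y, dt):
    """Ltd for test point y in one tree, given an integer depth threshold dt."""
    path = [tree]                     # path[k] is the node visited at depth k
    node = tree
    while node["left"] is not None:
        node = node["left"] if y[node["dim"]] < node["threshold"] else node["right"]
        path.append(node)
    leaf_depth = len(path) - 1
    if leaf_depth >= dt:              # backtrack to the ancestor at depth dt
        return leaves_under(path[dt])
    return samples_at_or_below(tree, dt)   # shallow leaf: use the fallback set


def merge_local_sets(trees, y, dt):
    """Union of the per-tree local sets, keeping first-seen order."""
    return list(dict.fromkeys(
        p for tree in trees for p in local_training_data(tree, y, dt)))


# A small hand-built 1-D tree for illustration.
tree = split(0, 5.0,
             leaf([(1.0,)]),
             split(0, 8.0,
                   leaf([(6.0,), (7.0,)]),
                   split(0, 9.5, leaf([(9.0,)]), leaf([(10.0,)]))))
deep = local_training_data(tree, (9.2,), dt=2)
shallow = local_training_data(tree, (0.0,), dt=2)
```

In the usage above, the point 9.2 reaches a leaf at depth 3 and is traced back to its depth-2 ancestor, while the point 0.0 reaches a leaf at depth 1 and receives the fallback set.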
Step 103: taking the test point and a point in the training data set as endpoints, the probability that the remaining data points fall between these two points is computed separately in each attribute dimension, and the dissimilarity between the test point and all points in the data set is computed in combination with the Minkowski distance, yielding the anomaly score of the test point.
Specifically, R_i(x, y) is first defined as the region between the values x_i and y_i of sample x and sample y in the i-th dimension, where x ∈ Ltd. Let S be the subspace occupied by all data in Ltd and S_i the range of S in the i-th dimension; if y_i lies outside S_i, the extreme value of S_i is taken as the boundary and R_i(x, y) becomes R_i(x, S). Num_i(x, y, d, S) is a Boolean indicating whether d_i, the i-th dimension of d, lies within R_i(x, y), where d ranges over the samples in Ltd other than x. M_i(x, y | Ltd, S) is the number of training points in Ltd that lie between x and y in the i-th dimension; its formula uses an indicator function I(·), which equals 1 if the condition in its brackets is true and 0 otherwise. The proportion of Ltd accounted for by M_i(x, y | Ltd, S) is then taken as the dissimilarity of x and y in the i-th dimension, and the dissimilarity D'(x, y) of x and y over all dimensions is computed with p as the exponent of the Minkowski distance. Finally, the anomaly score p(y) of test point y is the sum of the dissimilarities between y and all points in Ltd. Test points are ranked by p(y): the larger p(y), the higher the degree of abnormality.
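The scoring of step 103 can be sketched as follows, as one plausible reading of the definitions above rather than the patent's exact formulas. The function names are hypothetical, and three details are assumptions: interval endpoints are treated as inclusive, the per-dimension count is divided by |Ltd| to form the ratio, and the final score is an unnormalised sum over Ltd.

```python
def mass_between(x_index, y, ltd, i):
    """Count points of ltd (other than x) lying between x and y in dimension i."""
    x = ltd[x_index]
    s_lo = min(d[i] for d in ltd)          # range S_i of the local data
    s_hi = max(d[i] for d in ltd)
    # Clamp the region R_i(x, y) to S_i when y lies outside the local range.
    lo = max(min(x[i], y[i]), s_lo)
    hi = min(max(x[i], y[i]), s_hi)
    return sum(1 for j, d in enumerate(ltd)
               if j != x_index and lo <= d[i] <= hi)


def dissimilarity(x_index, y, ltd, p=2):
    """Minkowski-style combination of the per-dimension mass ratios."""
    n_dims = len(y)
    total = sum((mass_between(x_index, y, ltd, i) / len(ltd)) ** p
                for i in range(n_dims))
    return total ** (1.0 / p)


def anomaly_score(y, ltd, p=2):
    """Sum of dissimilarities between y and every point of ltd."""
    return sum(dissimilarity(k, y, ltd, p) for k in range(len(ltd)))


# Usage: a point far outside the local data scores higher than one inside it.
ltd = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
inside = anomaly_score((2.0, 2.0), ltd)
outside = anomaly_score((10.0, 10.0), ltd)
```

Because the measure counts how many training points separate two values instead of using their numeric difference, it does not depend on the scale of each attribute, which matches the motivation given in the background section.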
Table 1 shows the preprocessing of the data sets used when the anomaly detection method based on a distribution-probability similarity measure provided by the embodiment of the present invention is applied to the anomaly detection tasks of 10 public data sets, namely how the classes in each data set are divided into normal data and abnormal data.
Table 1
Table 2 shows the comparative experimental results for the AUC value (the probability that a randomly chosen positive sample is ranked higher than a randomly chosen negative sample) when the anomaly detection method based on a distribution-probability similarity measure provided by the embodiment of the present invention is applied to the anomaly detection tasks of the 10 public data sets. The comparison methods in the embodiment are eight methods for this type of imbalanced classification problem: KNN, iForest, SCiForest, iNNE, ALSH, L1SH, L2SH and KLSH. Table 2 shows that the proposed method, DPSM, clearly improves the AUC value over the comparison methods on the public data sets. The improvement is especially large in the first seven groups of experiments, where the proposed method reaches the highest level among the eight groups of methods; in the remaining two groups it is also close to the highest level. The method proposed by the embodiment of the present invention therefore represents a certain advance among anomaly detection methods.
Table 2
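The AUC definition quoted above (the probability that a randomly chosen positive sample is ranked above a randomly chosen negative one) can be computed directly by comparing every positive/negative pair. The sketch below is illustrative only: the function name `pairwise_auc` is hypothetical, counting ties as half a win is a common convention assumed here, and the scores are made-up numbers, not results from the patent.

```python
def pairwise_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    wins = 0.0
    for sp in positive_scores:
        for sn in negative_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5      # assumed convention: a tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up anomaly scores: anomalies are the positive class.
auc = pairwise_auc([0.9, 0.8], [0.1, 0.85])   # 3 of 4 pairs are ordered correctly
```

The quadratic loop is fine for small evaluations; rank-based formulas give the same value more efficiently on large data sets.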
In conclusion the embodiment of the present invention has the advantages that
In the technical solution that the present invention is implemented, multiple stochastical sampling obtains multiple subsets of normal sample data, with complete two Fork tree construction saves the random isolation processes of each subset, and the threshold depth of backtracking delimited according to drift ratio;According to test point External leaf node position and the threshold depth for falling in each tree, the ancestors that leaf node where it traces back to threshold depth save Point extracts training data of all data as measurement and test point similarity under the node;With test point and training dataset Certain interior point is endpoint, probability of the remaining data points appearance between this two o'clock is calculated separately in each attribute dimensions, in conjunction with Min Shi Distance calculates the dissimilar degree of all the points in test point and data set, obtains the exceptional value of the point.According to embodiments of the present invention The technical solution of offer can effectively solve local anomaly test problems, can be using original training data according to where test point The distribution of regional area normal data obtains its intensity of anomaly, improve the local anomaly of abnormality detection model detectability and its Overall target.
The foregoing is merely illustrative of the preferred embodiments of the present invention, is not intended to limit the invention, all in essence of the invention Within mind and principle, any modification, equivalent substitution, improvement and etc. done be should be included within the scope of the present invention.

Claims (4)

1. An anomaly detection method based on a distribution-probability similarity measure, characterized in that the method comprises the steps of:
(1) performing random sampling multiple times to obtain multiple subsets of the normal sample data, saving the random isolation process of each subset in a full binary tree structure, and setting the threshold depth for backtracking according to a drift ratio;
(2) according to the external leaf node in which a test point falls in each tree and the threshold depth, backtracking from that leaf node to its ancestor node at the threshold depth, and extracting all data under that node as the training data against which similarity to the test point is measured;
(3) taking the test point and a point in the training data set as endpoints, computing separately in each attribute dimension the probability that the remaining data points fall between these two points, and computing, in combination with the Minkowski distance, the dissimilarity between the test point and all points in the data set, thereby obtaining the anomaly score of the test point.
2. The method according to claim 1, characterized in that performing random sampling multiple times to obtain multiple subsets of the normal sample data, saving the random isolation process of each subset in a full binary tree structure, and setting the threshold depth for backtracking according to the drift ratio is specified as follows: several training subsets X_all are obtained by random sampling from the training data set D; each subset X contains m samples, X = {X_1, X_2, ..., X_m}, where m is a positive integer smaller than the size of D and can be chosen to suit the application; each sample point has n dimensions, that is, the i-th sample X_i is an n-dimensional vector; a dimension and an isolation threshold are selected at random, the isolation threshold being a random value between the maximum and the minimum of the subset in that dimension; this is iterated until one of the following three conditions is met: (1) each isolated space contains only one sample; (2) all sample points in the space have the same value in the selected dimension; (3) the iteration limit is reached; the process is recorded in a tree structure, forming a full binary tree in which every node has either zero or two child nodes, leaf nodes storing the samples in each isolated space and internal nodes storing the isolating dimension and the corresponding threshold; Dt, the depth threshold used when backtracking in a tree to search for neighbouring training points, is determined from the average E(h(x)) of the depth h(x) of each training point over the trees, where E(h(x)) is the mean depth of sample x after it has traversed all t isolation trees, t is a suitable positive integer chosen for the application, and l_i(x) is the path depth of x in the i-th tree;
setting Dt requires a drift ratio r, 0 ≤ r ≤ 1, namely the proportion of data in the all-normal training data set D that deviates relatively from the range of the normal data distribution; the value of r is chosen by weighing the dispersion of the data distribution against the requirements placed on the model's performance indicators in the application; given r, the minimum of the top ((1 - r) * 100)% of the training samples' mean depths is selected as the depth threshold Dt for backtracking to search for local training points in a tree.
3. The method according to claim 1, characterized in that backtracking, according to the external leaf node in which the test point falls in each tree and the threshold depth, from that leaf node to its ancestor node at the threshold depth, and extracting all data under that node as the training data against which similarity to the test point is measured, is specified as follows: the test point is passed through every tree; if the test sample falls in a node below depth Dt, backtracking proceeds upward from the node containing the test point to the ancestor node at depth Dt, and all training samples Ltd under that ancestor node are extracted as the data used next to compute the test sample's degree of abnormality, the data extracted from all trees being merged to form the training samples for the next step; if the test sample lies above depth Dt, all data in Ltd are used as the training samples for the next step, since such a test sample lies far from the normal data distribution globally and has little local training data nearby, so that all training samples below Dt are extracted as the training samples for the next step in order to further verify the test data's degree of abnormality.
4. The method according to claim 1, characterized in that taking the test point and a point in the training data set as endpoints, computing separately in each attribute dimension the probability that the remaining data points fall between these two points, and computing, in combination with the Minkowski distance, the dissimilarity between the test point and all points in the data set to obtain the anomaly score of the test point, is specified as follows: first, R_i(x, y) is defined as the region between the values x_i and y_i of sample x and sample y in the i-th dimension, where x ∈ Ltd; let S be the subspace occupied by all data in Ltd and S_i the range of S in the i-th dimension; if y_i lies outside S_i, the extreme value of S_i is taken as the boundary and R_i(x, y) becomes R_i(x, S); Num_i(x, y, d, S) is a Boolean indicating whether d_i, the i-th dimension of d, lies within R_i(x, y), where d ranges over the samples in Ltd other than y; M_i(x, y | Ltd, S) is the number of training points in Ltd that lie between x and y in the i-th dimension, its formula using an indicator function I(·) that equals 1 if the condition in its brackets is true and 0 otherwise; the proportion of Ltd accounted for by M_i(x, y | Ltd, S) is then taken as the dissimilarity of x and y in the i-th dimension, and the dissimilarity D'(x, y) of x and y over all dimensions is computed with p as the exponent of the Minkowski distance; finally, the anomaly score p(y) of test point y is the sum of the dissimilarities between y and all points in Ltd; test points are ranked by p(y), and the larger p(y), the higher the degree of abnormality.
CN201811233705.2A 2018-10-23 2018-10-23 Anomaly detection method based on a distribution-probability similarity measure Pending CN109508733A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811233705.2A CN109508733A (en) 2018-10-23 2018-10-23 A kind of method for detecting abnormality based on distribution probability measuring similarity

Publications (1)

Publication Number Publication Date
CN109508733A (en) 2019-03-22

Family

ID=65745932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811233705.2A Pending CN109508733A (en) 2018-10-23 2018-10-23 A kind of method for detecting abnormality based on distribution probability measuring similarity

Country Status (1)

Country Link
CN (1) CN109508733A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040064029A1 (en) * 2002-09-30 2004-04-01 The Government Of The Usa As Represented By The Secretary Of The Dept. Of Health & Human Services Computer-aided classification of anomalies in anatomical structures
CN102664961A (en) * 2012-05-04 2012-09-12 北京邮电大学 Method for anomaly detection in MapReduce environment
CN103400152A (en) * 2013-08-20 2013-11-20 哈尔滨工业大学 High sliding window data stream anomaly detection method based on layered clustering
CN103473540A (en) * 2013-09-11 2013-12-25 天津工业大学 Vehicle track incremental modeling and on-line abnormity detection method of intelligent traffic system
CN104317681A (en) * 2014-09-02 2015-01-28 上海交通大学 Behavioral abnormality automatic detection method and behavioral abnormality automatic detection system aiming at computer system
WO2015167562A1 (en) * 2014-04-30 2015-11-05 Hewlett-Packard Development Company, L.P. Using local memory nodes of a multicore machine to process a search query
US20150332523A1 (en) * 2014-05-19 2015-11-19 EpiSys Science, Inc. Method and apparatus for biologically inspired autonomous infrastructure monitoring
CN106503086A (en) * 2016-10-11 2017-03-15 成都云麒麟软件有限公司 The detection method of distributed local outlier
CN106598822A (en) * 2015-10-15 2017-04-26 华为技术有限公司 Abnormal data detection method and device applied to capacity estimation
CN107426207A (en) * 2017-07-21 2017-12-01 哈尔滨工程大学 A kind of network intrusions method for detecting abnormality based on SA iForest
CN107657288A (en) * 2017-10-26 2018-02-02 国网冀北电力有限公司 A kind of power scheduling flow data method for detecting abnormality based on isolated forest algorithm
CN108333314A (en) * 2018-04-02 2018-07-27 深圳凯达通光电科技有限公司 A kind of air pollution intelligent monitor system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NISHAD P ET AL: "Anomaly detection for IGBTs using Mahalanobis distance", 《MICROELECTRONICS RELIABILITY》 *
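DONG Guobin et al.: "Abnormal path detection based on RFID path data" (基于RFID路径数据的异常路径检测), Application Research of Computers (《计算机应用研究》) *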

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110266680B (en) * 2019-06-17 2021-08-24 辽宁大学 An Anomaly Detection Method for Industrial Communication Based on Double Similarity Metrics
CN110266680A (en) * 2019-06-17 2019-09-20 辽宁大学 A Method of Industrial Communication Anomaly Detection Based on Dual Similarity Measures
CN110377828A (en) * 2019-07-22 2019-10-25 腾讯科技(深圳)有限公司 Information recommendation method, device, server and storage medium
CN110377828B (en) * 2019-07-22 2023-05-26 腾讯科技(深圳)有限公司 Information recommendation method, device, server and storage medium
CN110781433A (en) * 2019-10-11 2020-02-11 腾讯科技(深圳)有限公司 Data type determination method and device, storage medium and electronic device
CN110781433B (en) * 2019-10-11 2023-06-02 腾讯科技(深圳)有限公司 Data type determining method and device, storage medium and electronic device
CN111639680A (en) * 2020-05-09 2020-09-08 西北工业大学 Identity recognition method based on expert feedback mechanism
CN111639680B (en) * 2020-05-09 2022-08-09 西北工业大学 Identity recognition method based on expert feedback mechanism
CN111784966A (en) * 2020-06-15 2020-10-16 武汉烽火众智数字技术有限责任公司 Personnel management and control method and system based on machine learning
CN112085053B (en) * 2020-07-30 2022-08-26 山东浪潮科学研究院有限公司 Data drift discrimination method and device based on nearest neighbor method
CN112085053A (en) * 2020-07-30 2020-12-15 济南浪潮高新科技投资发展有限公司 Data drift discrimination method and device based on nearest neighbor method
CN112181706A (en) * 2020-10-23 2021-01-05 北京邮电大学 An anomaly detection method for power dispatching data based on logarithmic interval isolation
CN112181706B (en) * 2020-10-23 2023-09-22 北京邮电大学 Power dispatching data anomaly detection method based on logarithmic interval isolation
WO2022134578A1 (en) * 2020-12-22 2022-06-30 深圳壹账通智能科技有限公司 Method and apparatus for determining answer sequence
CN113204542A (en) * 2021-04-22 2021-08-03 武汉大学 Abnormal electricity sample cleaning and behavior recognition method
CN113204542B (en) * 2021-04-22 2023-08-22 武汉大学 Abnormal electricity consumption sample cleaning and behavior recognition method
CN114022284A (en) * 2021-11-10 2022-02-08 中国工商银行股份有限公司 Abnormal transaction detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109508733A (en) A kind of method for detecting abnormality based on distribution probability measuring similarity
CN111753985B (en) Image deep learning model testing method and device based on neuron coverage rate
CN113887616A (en) Real-time abnormity detection system and method for EPG (electronic program guide) connection number
CN111833172A (en) Consumption credit fraud detection method and system based on isolated forest
Arbin et al. Comparative analysis between k-means and k-medoids for statistical clustering
US20080306715A1 (en) Detecting Method Over Network Intrusion
CN106153340B (en) A kind of Fault Diagnosis of Roller Bearings
CN112784881A (en) Network abnormal flow detection method, model and system
CN111562108A (en) An Intelligent Fault Diagnosis Method of Rolling Bearing Based on CNN and FCMC
KR102387885B1 (en) Method for refining clean labeled data for artificial intelligence training
CN114707571B (en) Credit data anomaly detection method based on enhanced isolation forest
CN106897392A (en) Technology competition and patent prewarning analysis method that a kind of knowledge based finds
CN110493221A (en) A kind of network anomaly detection method based on the profile that clusters
CN110246134A (en) A kind of rail defects and failures sorter
CN115628776B (en) A method for detecting abnormal data in water supply network
US20250036541A1 (en) Method for health evaluation based on intelligent operation and maintenance scenarios, and device thereof
Bruzzese et al. DESPOTA: DEndrogram slicing through a permutation test approach
CN109902754A (en) An efficient semi-supervised multi-level intrusion detection method and system
Xu et al. An improved LOF outlier detection algorithm
CN105824785A (en) Rapid abnormal point detection method based on penalized regression
CN112884167B (en) Multi-index anomaly detection method based on machine learning and application system thereof
CN115841110B (en) A method and system for acquiring scientific knowledge discovery
CN111652733B (en) Financial information management system based on cloud computing and block chain
Mercioni et al. Evaluating hierarchical and non-hierarchical grouping for develop a smart system
CN113792141A (en) Feature selection method based on covariance measure factor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190322