Online Anomaly Detection Using Non-Parametric Technique For Big Data Streams in Cloud Collaborative Environment
Abstract— Big Data and cloud computing are complementary technological paradigms with a core focus on scalability, agility, and on-demand availability. The rise of cloud computing and cloud data stores has been a precursor and facilitator to the emergence of big data. Cloud computing turns traditional siloed computing assets into shared pools of resources built on an underlying internet foundation. As a result, a number of enterprises are building efficient and agile cloud environments, and cloud providers continue to expand their service offerings. Many cloud providers offer an online collaboration service which is loosely coupled in nature. Online anomaly detection aims to detect anomalies in data that arrives in a streaming fashion. Such stream data is commonplace in today's cloud-centric collaborations, which enable participating domains to interoperate dynamically by sharing and accessing information. To forestall unauthorized disclosure and possible misappropriation of the shared resources, there is a need to identify anomalous access requests. To the best of our knowledge, the detection of anomalous access requests in cloud-based collaborations through non-parametric statistical techniques has not been studied in earlier works. This paper proposes an online anomaly detection algorithm based on a non-parametric statistical technique to detect anomalous access requests in a cloud environment at runtime.

Keywords—Big data, cloud collaboration, online anomaly detection, non-parametric statistical technique

I. INTRODUCTION

Big data is among the most promising research trends of the decade, drawing attention from every segment of the market and society. According to Adrian [1], "Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population." This flood of data is generated by connected devices, from PCs and smart phones to sensors such as RFID readers and traffic cameras. It is also heterogeneous and comes in many formats, including text, document, image, video, and more. The new scenario is characterized by problems that cannot be adequately or efficiently addressed with the standard computing resources available today. Because of bottlenecks such as poor scalability, difficulties in installation and maintenance, weak fault tolerance and low performance in the traditional information technology infrastructure, cloud computing techniques and solutions are needed to deal with big data problems.

Cloud computing and big data are complementary to each other and have an inherent connection of dialectical unity. The challenge of big data in the cloud is how to balance cost (operational and infrastructure) against business impact (customer satisfaction, time to market and competitiveness). Given the high volume and velocity of the data involved, storing it in the cloud is an effective option, as the cloud has the capacity to hold massive data sets and to serve a high volume of client, customer and user access requests. With the development of cloud computing base technology [2], we now have enough computing power to process large-scale data of high velocity and varied formats.

Cloud computing is a business model in the computing world. According to the official NIST definition, "cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." The three most popular cloud paradigms are: (i) Infrastructure-as-a-Service (IaaS), (ii) Platform-as-a-Service (PaaS), and (iii) Software-as-a-Service (SaaS). One of the prominent offerings of the SaaS cloud is the online collaboration service [3]. In this service, the cloud vendor provides online collaboration tools that enable faster sharing and accessing of data. Examples of cloud-based collaboration include document-driven collaboration, wiki pages, file sharing and synchronization, blogs, project management and so on.

Many cloud providers, such as Google Cloud Storage and Microsoft Azure, offer an online collaboration service which is loosely coupled in nature. In such collaborations, autonomous domains interoperate dynamically and expose only limited information about their services and policies, relevant to the collaboration. It is standard practice for systems to deploy a monitoring mechanism that remains in constant vigilance against any deviation from normal system behavior. Such deviations are usually a sign of negative phenomena that call for remedial measures. The targeted cyber attack is rapidly developing into a persistent social and national threat. It aims at stealing industrial or military secrets from major government organizations or enterprises and client data from smart devices and personal computers, and at incapacitating industrial control systems, thereby causing severe damage.
Conversely, the proposed strategy is lightweight compared with windowing-based systems that observe changes in the distribution of data. Table I shows a survey of techniques on anomaly detection.

Anomaly detection techniques have also been proposed for purely temporal data. Gupta et al. [20] present an overview of anomaly detection on different types of temporal data. They characterize anomalies as outliers and present detection strategies for both the discrete and the continuous cases. While the overview covers a wide range of methods, it also notes that many different problems can be addressed by detecting outliers on time series data and that solutions should be adapted to the requirements of the particular problem. For stream data, the use of models that update and decay over time is recommended.

The major assumption behind statistical techniques is that normal data instances occur in the high-probability region of a stochastic model, while anomalies occur in the low-probability region. Both parametric and non-parametric techniques are applied to fit a statistical model. Parametric techniques assume knowledge of the underlying distribution and estimate its parameters from the given data. They perform a statistical hypothesis test in which the null hypothesis states that the data instance is generated by the estimated distribution [21]. If the statistical test rejects the null hypothesis, the data point is an anomaly. The literature on parametric techniques can be categorized into Gaussian models and regression models. Alternatively, the variability of the process over a prolonged period of time can be assessed and interpreted using historical data, and threshold limits can be computed from that assessment of variability. This approach avoids the need to make assumptions about the shape of the distribution of the data. Such strategies are known as non-parametric techniques.

Statistical metrics over sliding windows in streams have also been proposed for identifying anomalies. Datar et al. [22] present approximate summary statistics for sliding windows over a stream. Since regularities in streams may evolve over time, data decay is handled by giving more weight to recent objects, aggregating, and eventually discarding older data. The authors store data using exponential histograms, a data structure that uses timestamps as bins and the count of items in the stream as the value for each temporal extent. It is suitable for computing approximate statistics with bounded errors over parts of the content of a stream.

As is apparent from the above survey, the state-of-the-art on anomaly detection does not address the recognition of atypical requests in a cloud environment using non-parametric statistical techniques. Such requests, if not recognized, can compromise a genuine user's account, leading to unauthorized disclosure of the shared data. Moreover, although researchers have applied parametric and non-parametric statistical strategies in various domains, no work that uses a non-parametric statistical technique for securing cloud-based collaborations has been reported. These limitations in the state-of-the-art have motivated the authors to address this issue.

III. CATEGORIZATION OF STATISTICAL METHODS FOR ANOMALY DETECTION

Statistical methods for anomaly detection can be broadly classified into parametric, non-parametric and semi-parametric techniques.

A. Parametric techniques

Parametric techniques assume that sample data come from a population that follows a probability distribution with a fixed set of parameters. The approaches to estimating the parameters are numerous and differ across distributions of the data. The larger the training set, the higher the probability of obtaining a good estimate of the parameters, but the most critical requirement is to apply the right kind of model to the right type of data.
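As a minimal illustration of the parametric view (a sketch under assumptions, not part of the proposed method), the snippet below treats the monitored metric as approximately Gaussian, estimates its mean and standard deviation from training data, and flags values whose tail probability under the fitted model falls below a chosen cutoff. The training data, cutoff and function names are all hypothetical.

```python
# Hypothetical sketch of a parametric (Gaussian) anomaly check.
# Assumes the training sample is well described by a normal distribution;
# the data and the alpha cutoff below are illustrative only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=5.0, size=1000)   # benign training data
mu, sigma = train.mean(), train.std(ddof=1)          # estimate the fixed parameters

def is_anomalous_parametric(x, alpha=0.001):
    """Flag x if it lies in the low-probability tail of the fitted Gaussian."""
    p_value = 2 * norm.sf(abs(x - mu) / sigma)        # two-sided tail probability
    return p_value < alpha

print(is_anomalous_parametric(52.0))   # near the mean, expected False
print(is_anomalous_parametric(90.0))   # far in the tail, expected True
```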
B. Non-parametric techniques

Non-parametric techniques make no assumption about the probability distribution of the data. The difference between parametric and non-parametric models is that the former have a fixed number of parameters, while the latter grow the number of parameters with the amount of training data. The quantities these methods maintain are usually the mean, median, variance and similar summary measures. These techniques are therefore heavily reliant on the data: large training sets are required, and even then good forecasts are not guaranteed. The reason is that the training data needs to have covered nearly all possible sets of values so that the technique assigns a non-negligible probability to observing such values later on. For instance, if there are three sets of possible outcomes and the training data hits only two of them, the technique will behave as though there is no, or only a small, probability of the third set occurring. It is therefore important to use the right type of data, and enough of it, to build these techniques.
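As a contrast to the Gaussian sketch above, the following is a minimal non-parametric variant (again illustrative only, not the paper's algorithm): it estimates the density with a histogram learned from the training data and flags values that fall outside the observed range or into very rare bins. The bin count and rarity threshold are assumed values.

```python
# Hypothetical non-parametric counterpart: a binned empirical distribution
# replaces the fixed-parameter Gaussian. Bin count and min_freq are assumptions.
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(loc=50.0, scale=5.0, size=1000)

counts, edges = np.histogram(train, bins=30)
freq = counts / counts.sum()                      # empirical bin probabilities

def is_anomalous_nonparametric(x, min_freq=0.002):
    """Flag x if it falls outside the observed range or into a rare bin."""
    if x < edges[0] or x > edges[-1]:
        return True
    b = min(np.searchsorted(edges, x, side="right") - 1, len(freq) - 1)
    return freq[b] < min_freq

print(is_anomalous_nonparametric(51.0))   # dense region, expected False
print(is_anomalous_nonparametric(80.0))   # outside the training range, expected True
```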
C. Semi-parametric techniques

Semi-parametric techniques use the available training data both to estimate parameters, as parametric techniques do, and to shape the model itself, as non-parametric techniques do. These techniques are therefore a mix of the parametric and non-parametric families. They are frequently used in situations where a non-parametric technique would not perform as desired, or where a researcher wants to use a parametric technique but lacks prior information about the data.

The majority of past studies on online anomaly detection have assumed parametric or semi-parametric models of the underlying probability distributions, i.e., the random variables are generated by known distributions such as the Gaussian or Bernoulli distributions. However, parametric models may not always be available in applications. In many cases the distributions can be arbitrary, need not be Gaussian or Bernoulli, and may not be known in advance. Hence, it is desirable to develop non-parametric methods that are distribution free.
IV. ON-STREAM ANOMALY DETECTION FRAMEWORK

On-stream anomaly detection includes preprocessing of the raw stream data as the initial step. Preprocessing performs basic noise elimination and converts the data into multidimensional time series that can be subjected to machine learning for predictive purposes. Stream data analytics includes two prominent methodologies, namely data mining and statistical approaches. K-means is a popular semi-supervised or unsupervised data mining technique that builds clusters of data points from training data under the assumption that the clusters are benign. If a new data point falls into any of the clusters, it is marked as benign; otherwise it is labeled as an anomaly. To accommodate the evolution of stream data, the training model has to be updated periodically.
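For illustration only (this clustering route is background to the paper, not the proposed statistical method), a sketch of such a cluster-membership check might look as follows; the cluster count and the acceptance radius are assumed values.

```python
# Hypothetical sketch of the clustering-based (data mining) route described
# above: fit K-means on benign training points, then label a new point as
# benign only if it lies close enough to some learned cluster center.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
train = np.vstack([rng.normal(c, 0.5, size=(200, 2)) for c in (2.0, 6.0, 10.0)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)
radius = 2.0                                  # assumed acceptance radius per cluster

def label_point(x):
    """Return 'benign' if x falls within `radius` of its nearest centroid."""
    d = np.linalg.norm(km.cluster_centers_ - x, axis=1).min()
    return "benign" if d <= radius else "anomalous"

print(label_point(np.array([6.1, 5.8])))      # near a cluster: benign
print(label_point(np.array([20.0, 20.0])))    # far from all clusters: anomalous

# In a streaming setting the model would be refit periodically on recent
# benign windows so that the clusters track the evolving stream.
```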
Fig. 1. Classification based on-stream framework

Figure 1 portrays the basic flow of a classification-based on-stream framework. The training model is built using the input stream data and is in turn used for making predictions on future data. It is critical to update the training model periodically in light of the dynamic nature of the data.

A. Algorithm: Statistical Algorithm for Anomaly Detection

Input: (W, TH, ITH)
1: Store data of the jth time interval in Wj
2: Add Wj to the set of previously stored windows W
3: For k = 1 to j-1 do
4: Compute the test statistic, TS, for (Wk, Wj)
5: If TS < TH
6: Ik++
7: If Ik > ITH
8: Set status = benign
9: Else
10: Set status = anomalous
This algorithm gives a framework for the anomaly detection process using a non-parametric statistical method and describes the general methodology for detecting anomalies. It periodically gathers data from the incoming stream. Data gathered in the jth time interval is stored in the jth window. Wj is compared against all previously seen windows by means of an appropriate statistical technique. If the test statistic TS is below a pre-defined threshold TH, the match counter Ik of the kth window Wk currently being compared with Wj is incremented. If Ik exceeds a pre-defined limit ITH, Wj is declared benign. If no window is found whose match counter exceeds ITH, Wj is flagged as anomalous.
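The following sketch is one possible reading of this framework (a sketch under assumptions, not the authors' implementation): the test statistic is left as a pluggable function, and a match counter is kept per stored window exactly as in steps 1-10 above.

```python
# Hypothetical sketch of the windowed framework above. `test_statistic` is a
# placeholder; the paper later instantiates it with a multinomial
# goodness-of-fit / relative-entropy statistic.
from typing import Callable, List, Sequence

def detect_window(
    new_window: Sequence[float],
    stored_windows: List[Sequence[float]],
    match_counts: List[int],
    test_statistic: Callable[[Sequence[float], Sequence[float]], float],
    TH: float,
    ITH: int,
) -> str:
    """Compare the new window against every stored window (steps 3-10)."""
    status = "anomalous"
    for k, w_k in enumerate(stored_windows):
        ts = test_statistic(w_k, new_window)      # step 4
        if ts < TH:                               # step 5
            match_counts[k] += 1                  # step 6
            if match_counts[k] > ITH:             # step 7
                status = "benign"                 # step 8
    stored_windows.append(list(new_window))       # steps 1-2
    match_counts.append(0)
    return status

# Usage with a toy test statistic (absolute difference of window means):
def toy(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

windows, counts = [], []
for w in ([1, 2, 3], [1, 2, 4], [1, 2, 3], [50, 60, 70]):
    print(detect_window(w, windows, counts, toy, TH=1.0, ITH=1))
# The third window matches earlier ones often enough to be declared benign;
# the last window matches nothing and is flagged as anomalous.
```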
V. PROPOSED WORK

Recognizing individual data points as anomalous can lead to false alerts. To avoid this, one may instead analyze windows of data consisting of a collection of data points and then make a determination on the window. Since we are working with collections of points rather than individual ones, we concentrate on changes in the probability distribution compared with a reference set, i.e., the expected probability distribution. Thus, we propose an approach to detecting anomalies that manifest as changes in system behavior conflicting with what is normal.

The methodology rests on a classical question in hypothesis testing [23], namely determining whether the observed data is consistent with a given distribution. In the classical theory of hypothesis testing, we are required to decide which of two hypotheses, the null hypothesis and the alternate hypothesis, best describes the data. The two hypotheses are each characterized by a single distribution or a collection of distributions. Formally, let X1, X2, ..., Xq denote the observed sample of size q. Let M0 denote the distribution representing the null hypothesis and M1 the alternate hypothesis. Then the optimal test for minimizing the probability of falsely rejecting the null hypothesis, under a constraint on the probability of incorrectly accepting the null hypothesis, is given by the Neyman-Pearson theorem. The theorem states that the optimal test is obtained by computing the likelihood ratio

\[
\frac{M_0(X_1, X_2, \ldots, X_q)}{M_1(X_1, X_2, \ldots, X_q)} \ge T \qquad (1)
\]

where M0(X1, X2, ..., Xq) is the probability assigned to the observed sample by M0, M1(X1, X2, ..., Xq) is the corresponding probability assigned by M1, and T is a threshold that can be determined from the constraint on the probability of incorrectly accepting the null hypothesis.

Sometimes the alternate hypothesis is simply the statement that the observed data is not drawn from the null hypothesis. Tests designed for this setting are called goodness-of-fit tests. For this problem, we invoke a specific test known as the multinomial goodness-of-fit test, which applies when the data Xi are random variables that can take at most k values, say {1, 2, ..., k}. Let M0 = (m1, m2, ..., mk), where Σi mi = 1, denote the distribution corresponding to the null hypothesis (mi denotes the probability of observing i). Let qi denote the number of times i was observed in the sample (X1, X2, ..., Xq). Let N̂ = (n̂1, n̂2, ..., n̂k), where n̂i = qi/q, denote the empirical distribution of the observed sample X1, X2, ..., Xq. Then the likelihood ratio in (1) reduces to

\[
L = \frac{\prod_{i=1}^{k} \hat{n}_i^{\,q\hat{n}_i}}{\prod_{i=1}^{k} m_i^{\,q\hat{n}_i}} = \prod_{i=1}^{k} \left(\frac{\hat{n}_i}{m_i}\right)^{q\hat{n}_i} \qquad (2)
\]
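As a quick numerical check of this reduction (illustrative only, with made-up bin counts), the snippet below computes the empirical distribution N̂ of a sample, evaluates the ratio in (2), and verifies that its logarithm equals q·D(N̂||M0), the relative-entropy form introduced next.

```python
# Illustrative check of equation (2) with made-up counts: k = 4 bins,
# a null model M0 and observed per-bin counts q_i for a sample of size q.
import numpy as np

m = np.array([0.40, 0.30, 0.20, 0.10])        # null hypothesis M0 (sums to 1)
counts = np.array([35, 33, 22, 10])           # q_i: observed counts per bin
q = counts.sum()
n_hat = counts / q                            # empirical distribution N-hat

ratio = np.prod((n_hat / m) ** (q * n_hat))   # likelihood ratio from (2)

# Its logarithm equals q * D(N-hat || M0), the relative-entropy form used below.
kl = np.sum(n_hat * np.log(n_hat / m))
print(ratio, np.log(ratio), q * kl)           # log(ratio) and q*kl should agree
```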
For the purpose of hypothesis testing, we compare two probability distributions, the reference model and the monitoring data, in terms of entropy. In the monitoring context we try to match the entropy of a reference set M(x) with the entropy of the current window of data, N(x). In other words, we need to check whether the system is behaving now (sample distribution) as it has behaved previously (historical distribution).

The relative entropy (or Kullback-Leibler divergence) between two distributions N = (n1, n2, ..., nk) and M = (m1, m2, ..., mk) that are defined over the same classification criteria is given by:

\[
D(N \,\|\, M) = \sum_{i=1}^{k} n_i \log \frac{n_i}{m_i} \qquad (3)
\]

Thus the logarithm of the likelihood ratio in (2) is q·D(N̂||M). Note that the relative entropy has particular properties. Its two arguments play distinct roles: N represents the observations, the data, what we see, while M represents the expectations, the model, what we assume. D(N||M) is an asymmetric measure of the dissimilarity between the observed distribution N(x) and the model M(x). This dissimilarity measure satisfies 0 ≤ D(N||M) ≤ ∞, with D(N||M) = 0 if and only if N ≡ M. Another important property of relative entropy is that, if the null hypothesis M is true, then as the number of samples q grows, 2·q·D(N̂||M) converges to a chi-square distribution with k-1 degrees of freedom. The multinomial goodness-of-fit test relies on this property. The chi-square relationship is important in hypothesis testing because it allows us to compare 2·q·D(N̂||M) with a threshold determined from the cumulative distribution function (cdf) of the chi-square distribution and a desired upper bound on the false negative probability.

The relative entropy based methodologies described here have several advantages. They are non-parametric; in particular, they do not assume any predefined form for the distribution of the data. They can easily be extended to handle multidimensional time series, thereby incorporating correlation, and can be adapted to workloads whose distribution changes over time. Even though relative-entropy-based techniques and hypothesis testing methodologies have been used for anomaly detection, we are not aware of any work that uses them in the context of the multinomial goodness-of-fit test for anomaly detection in cloud-centric collaborations. The latter gives a systematic way (based on the cdf of the chi-square distribution) to pick a threshold above which alerts are raised.

The proposed online anomaly detection algorithm illustrates this statistical procedure. Here, we focus on a set of data points in a particular time interval, called a window, rather than on individual data points. An anomalous window conveys more information and represents a pattern or distribution of unusual characteristics of the data. After receiving a window, the framework generates its unknown distribution. It stores the most recent N distributions (the past N models). We consider the current model an anomaly if it does not match any of the past N models.

A. Algorithm: Proposed Online Anomaly Detection

Input: (DS, NOB, DSmin, DSmax, TH, NOWTH)
1: Initialize n = 0
2: Procedure OAD (DS)
3: Begin
4: Take each instance of DS and extract a data point, DP, with its value
5: Generic bin index, Bcurrent = ((DP - DSmin)/(DSmax - DSmin))*NOB
6: Return Bcurrent with its frequency
7: Count the total number of frequencies for each bin
8: Calculate the empirical frequency, N̂
9: If n = 0
10: Set M1 = N̂
11: n = 1
12: NOW1 = 1
13: Declare the current window as benign
14: Else if D(N̂||Mi) < TH for any hypothesis Mi
15: Increment NOWi by 1 (if more than one such i exists, select the one with the lowest D(N̂||Mi))
16: If NOWi > NOWTH
17: Declare the window as benign
18: Else
19: Declare the window as anomalous
20: Increment NOWi by 1
21: Else
22: Declare the window as anomalous
23: Increment n by 1
24: Set Mn = N̂
25: NOWn = 1
26: End procedure

where DS is the input data stream; NOB is the number of bins into which DS has to be quantized; DSmin and DSmax are the minimum and maximum values that DS can take; TH is the threshold against which the test statistic is compared, usually set to the point of the chi-square cdf with NOB-1 degrees of freedom that corresponds to 0.95 or 0.99; NOWTH is the threshold against which NOWi is compared to determine whether a hypothesis has occurred frequently enough; and NOWi is a score that tracks the number of windows agreeing with hypothesis Mi.
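A compact sketch of how this procedure might be realized is given below. It follows one reading of steps 4-26 and of the accompanying description: select the best-matching stored hypothesis, update its score, and compare 2·q·D(N̂||Mi) against a chi-square quantile. The binning constants, the smoothing of empty bins, and the use of 2·q·D as the quantity compared with TH are assumptions made for illustration, not the authors' reference implementation.

```python
# Hypothetical sketch of the proposed OAD procedure (one reading of the
# pseudocode above). Windows arrive as lists of numeric values.
import numpy as np
from scipy.stats import chi2

class OAD:
    def __init__(self, ds_min, ds_max, nob=10, conf=0.99, now_th=2):
        self.ds_min, self.ds_max, self.nob = ds_min, ds_max, nob
        self.th = chi2.ppf(conf, df=nob - 1)   # TH from the chi-square cdf
        self.now_th = now_th
        self.models = []                       # M_1 ... M_n (stored hypotheses)
        self.scores = []                       # NOW_1 ... NOW_n

    def _empirical(self, window):
        """Steps 4-8: bin the window and return the empirical frequency N-hat."""
        edges = np.linspace(self.ds_min, self.ds_max, self.nob + 1)
        counts, _ = np.histogram(window, bins=edges)
        counts = counts + 1e-6                 # avoid zero bins in the divergence
        return counts / counts.sum()

    def process_window(self, window):
        q = len(window)
        n_hat = self._empirical(window)
        if not self.models:                    # steps 9-13: first window
            self.models.append(n_hat)
            self.scores.append(1)
            return "benign"
        div = [np.sum(n_hat * np.log(n_hat / m)) for m in self.models]
        best = int(np.argmin(div))
        if 2 * q * div[best] < self.th:        # step 14: some hypothesis matches
            self.scores[best] += 1             # steps 15 / 20: update the score
            return "benign" if self.scores[best] > self.now_th else "anomalous"
        self.models.append(n_hat)              # steps 21-25: start a new hypothesis
        self.scores.append(1)
        return "anomalous"

# Usage: feed synthetic windows; a shifted window should eventually be flagged.
rng = np.random.default_rng(3)
oad = OAD(ds_min=0.0, ds_max=100.0)
for _ in range(6):
    print(oad.process_window(rng.normal(50, 5, size=200)))
print(oad.process_window(rng.normal(90, 5, size=200)))   # distribution shift
```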
As we do not have any prior knowledge of the data, a bin-based histogram method is used to determine the unknown distribution in Step (5). The algorithm then calculates the empirical frequency, N̂, in Step (8). Note that M1, M2, ..., Mn denote the n null hypotheses at any instant of time. A test statistic involving N̂ and Mi is calculated for each Mi and compared with the threshold TH in Step (14). If the test statistic exceeds the threshold for all of the null hypotheses, the window is declared anomalous, n is incremented by 1, and a new hypothesis is created based on the present window. When the test statistic is matched against past hypotheses, several hypotheses may have test statistics lower than the threshold, so the hypothesis with the lowest test statistic is selected. If several hypotheses share the same lowest statistic, the one with the highest score NOWi is selected. The score records how many times a hypothesis has been matched; whenever a past hypothesis is matched by the current test statistic, the score of that hypothesis is increased. If the smallest test statistic is below the threshold but corresponds to a hypothesis that was accepted fewer than NOWTH times previously, then the window is declared anomalous; however, the number of appearances of that hypothesis is still incremented. If neither of the two conditions is fulfilled, the window is declared anomalous and the appropriate bookkeeping is performed.

B. Correctness and completeness of the proposed algorithm

Correctness: For n = 0, the window is considered benign, n is set to 1, and the first null hypothesis is created. M1, M2, ..., Mn denote the n null hypotheses at any instant of time. The algorithm periodically gathers data from the incoming stream. Data gathered in the ith time interval is stored in the ith window. Wi is compared against all previously seen windows by means of the multinomial goodness-of-fit test. If the test statistic D(N̂||Mi) is below a pre-defined limit, then NOWi is incremented by 1; if NOWi is greater than the predefined threshold NOWTH the window is declared benign, else it is declared anomalous. If the test statistic exceeds the threshold for all of the null hypotheses, the window is declared anomalous, n is incremented by 1, and a new hypothesis is created based on the present window. If none of the conditions is satisfied, the window is declared anomalous.

Completeness: Trivial; no assumption is made on the incoming data stream DS, and a window can be checked for anomaly whenever the input accepted by the program contains at least one data point.

Termination: At least one data point is processed per time interval and the size of each window is finite.

VI. CONCLUSION

In order to cope with security breaches in a cloud collaborative environment, where customers can share and access resources remotely, an online anomaly detection algorithm based on a non-parametric statistical technique for big data streams has been proposed. The proposed algorithm detects anomalous access requests to shared resources at runtime.

References

[1] M. Adrian, "Big Data," Teradata Magazine. [Online]. Available: http://www.teradatamagazine.com/v11n01/Features/Big-Data/ (accessed on March 29, 2013).
[2] H. Kim et al., "Behavior-based anomaly detection on big data," in Proc. 13th Australian Information Security Management Conference, 2015, pp. 73-80.
[3] F. Liu et al., "NIST cloud computing reference architecture," NIST Special Publication 500-292, 2011.
[4] L. D. Baker et al., "A hierarchical probabilistic model for novelty detection in text," in Proc. International Conference on Machine Learning, 1999.
[5] J. B. D. Cabrera, L. Lewis, and R. K. Mehra, "Detection and classification of intrusions and faults using sequences of system calls," ACM SIGMOD Record, vol. 30, no. 4, pp. 25-34, 2001.
[6] D.-Y. Yeung and C. Chow, "Parzen-window network intrusion detectors," in Proc. 16th International Conference on Pattern Recognition, vol. 4, IEEE, 2002.
[7] A. K. Ghosh, A. Schwartzbard, and M. Schatz, "Learning program behavior profiles for intrusion detection," in Proc. Workshop on Intrusion Detection and Network Monitoring, vol. 51462, 1999.
[8] J. P. Buzen and A. W. Shum, "MASF - multivariate adaptive statistical filtering," in Proc. Int. CMG Conference, 1995.
[9] R. Chitrakar and C. Huang, "Anomaly based intrusion detection using hybrid learning approach of combining k-Medoids clustering and Naïve Bayes classification," in Proc. 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), IEEE, 2012.
[10] S. Fu, J. Liu, and H. Pannu, "A hybrid anomaly detection framework in cloud computing using one-class and two-class support vector machines," in Proc. International Conference on Advanced Data Mining and Applications, Springer Berlin Heidelberg, 2012.
[11] D. M. Farid, N. Harbi, and M. Z. Rahman, "Combining naive Bayes and decision tree for adaptive intrusion detection," arXiv preprint arXiv:1005.4496, 2010.
[12] Y. Yasami and S. P. Mozaffari, "A novel unsupervised classification approach for network anomaly detection by k-Means clustering and ID3 decision tree learning methods," The Journal of Supercomputing, vol. 53, no. 1, pp. 231-245, 2010.
[13] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[14] V. J. Hodge and J. Austin, "A survey of outlier detection methodologies," Artificial Intelligence Review, vol. 22, no. 2, pp. 85-126, 2004.
[15] M. Agyemang, K. Barker, and R. Alhajj, "A comprehensive survey of numeric and symbolic outlier mining techniques," Intelligent Data Analysis, vol. 10, no. 6, pp. 521-538, 2006.
[16] M. Markou and S. Singh, "Novelty detection: a review - part 1: statistical approaches," Signal Processing, vol. 83, no. 12, pp. 2481-2497, 2003.
[17] A. Patcha and J.-M. Park, "An overview of anomaly detection techniques: Existing solutions and latest technological trends," Computer Networks, vol. 51, no. 12, pp. 3448-3470, 2007.
[18] R. J. Beckman and R. D. Cook, "Outlier..........s," Technometrics, vol. 25, no. 2, pp. 119-149, 1983.
[19] Z. A. Bakar et al., "A comparative study for outlier detection techniques in data mining," in Proc. 2006 IEEE Conference on Cybernetics and Intelligent Systems, IEEE, 2006.
[20] M. Gupta et al., "Outlier detection for temporal data," Synthesis Lectures on Data Mining and Knowledge Discovery, vol. 5, no. 1, pp. 1-129, 2014.
[21] V. Barnett and T. Lewis, Outliers in Statistical Data, 3rd ed. Wiley, New York, 1994.
[22] M. Datar et al., "Maintaining stream statistics over sliding windows," SIAM Journal on Computing, vol. 31, no. 6, pp. 1794-1813, 2002.
[23] E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses. Springer Science & Business Media, 2006.