Online Anomaly Detection Using Non-Parametric Technique For Big Data Streams in Cloud Collaborative Environment
Abstract— Big Data and cloud computing are complementary technological paradigms with a core focus on scalability, agility, and on-demand availability. The rise of cloud computing and cloud data stores has been a precursor and facilitator to the emergence of big data. Cloud computing turns traditional siloed computing assets into shared pools of resources built on an underlying internet foundation. As a result, a number of enterprises are building efficient and agile cloud environments, and cloud providers continue to expand their service offerings. Many cloud providers offer an online collaboration service which is loosely coupled in nature. Online anomaly detection aims to detect anomalies in data that arrives in a streaming fashion. Such stream data is commonplace in today's cloud-centric collaborations, which enable participating domains to interoperate dynamically by sharing and accessing information. To forestall unauthorized disclosure and possible misappropriation of the shared resources, there is a need to identify anomalous access requests. To the best of our knowledge, the detection of anomalous access requests in cloud-based collaborations through non-parametric statistical techniques has not been studied in earlier works. This paper proposes an online anomaly detection algorithm based on a non-parametric statistical technique to detect anomalous access requests in a cloud environment at runtime.

Keywords—Big data, cloud collaboration, online anomaly detection, non-parametric statistical technique

I. INTRODUCTION

Big data is among the most promising research trends of the decade, drawing attention from every segment of the market and society. According to Adrian [1], "Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population." This flood of data is generated by connected devices, from PCs and smart phones to sensors such as RFID readers and traffic cameras. It is also heterogeneous and comes in many formats, including text, document, image, video, and more. The new scenario is characterized by problems that cannot be adequately or efficiently addressed with the standard computing resources available today. Because of bottlenecks such as poor scalability, difficulties in installation and maintenance, weak fault tolerance and low performance in the traditional information technology infrastructure, cloud computing techniques and solutions are needed to deal with big data problems.

Cloud computing and big data are complementary to each other and have an inherent connection of dialectical unity. The challenge of big data in the cloud is how to balance cost (operational and infrastructure) against business impact (customer satisfaction, time to market and competitiveness). Given the high volume and velocity of the data involved, storing it in the cloud is an effective option, as the cloud has the capacity to hold massive data sets and to serve a high volume of client, customer and user access requests. With the development of cloud computing base technology [2], we now have enough computing power to process large-scale data of high velocity and varied formats.

Cloud computing is a business model in the computing world. According to the official NIST definition, "cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." The three most popular cloud paradigms are: (i) Infrastructure-as-a-Service (IaaS), (ii) Platform-as-a-Service (PaaS), and (iii) Software-as-a-Service (SaaS). One of the prominent offerings of the SaaS cloud is the online collaboration service [3]. In this service, the cloud vendor provides online collaboration tools that enable faster sharing and accessing of data. Examples of cloud-based collaboration include document-driven collaboration, wiki pages, file sharing and synchronization, blogs, project management and so on.

Many cloud providers, such as Google Cloud Storage and Microsoft Azure, offer an online collaboration service which is loosely coupled in nature. In such collaborations, autonomous domains interoperate dynamically and expose only limited information about their services and policies, relevant to the collaboration. It is standard practice for systems to deploy a monitoring mechanism that remains in constant vigilance against any deviation from normal system behavior. Such deviations are usually a sign of negative phenomena that call for remedial measures. The targeted cyber attack is rapidly developing into a persistent social and national threat. It aims at stealing industrial or military secrets from major government organizations or enterprises and client data from smart devices and personal computers, and at incapacitating industrial control systems, thereby causing severe damage.
Conversely, the proposed strategy is lightweight compared with windowing-based systems that observe changes in the distribution of data. Table I shows a survey of techniques on anomaly detection.

Anomaly detection techniques have also been proposed for purely temporal data. Gupta et al. [20] present an overview of anomaly detection on different types of temporal data. They characterize anomalies as outliers and present detection strategies for both the discrete and the continuous cases. While the overview covers a wide range of methods, it also notes that many different problems can be addressed by detecting outliers on time series data and that solutions should be adapted to the requirements of the particular problem. For stream data, the use of models that update and decay over time is recommended.

The major assumption behind statistical techniques is that normal data instances occur in the high-probability region of a stochastic model, while anomalies occur in the low-probability region. Both parametric and non-parametric techniques are applied to fit a statistical model. Parametric techniques assume knowledge of the underlying distribution and estimate its parameters from the given data. They perform a statistical hypothesis test in which the null hypothesis states that the data instance is generated by the estimated distribution [21]. If the statistical test rejects the null hypothesis, the data point is an anomaly. The literature on parametric techniques can be categorized into Gaussian models and regression models. Alternatively, the variability of the process over a prolonged period of time can be assessed and interpreted using historical data, and threshold limits can be computed from that assessment of variability. This approach avoids the need to make assumptions about the shape of the distribution of the data. Such strategies are known as non-parametric techniques.

Statistical metrics over sliding windows in streams have also been proposed for identifying anomalies. Datar et al. [22] present approximate summary statistics for sliding windows over a stream. Since regularities in streams may evolve over time, data decay is handled by giving more weight to recent objects, aggregating, and eventually discarding older data. The authors store data using exponential histograms, a data structure that uses timestamps as bins and the count of items in the stream as the value for each temporal extent. It is suitable for computing approximate statistics with bounded errors over parts of the content of a stream.

As is apparent from the above survey, the state-of-the-art on anomaly detection does not address the recognition of atypical requests in a cloud environment using non-parametric statistical techniques. Such requests, if not recognized, can compromise a genuine user's account, leading to unauthorized disclosure of the shared data. Moreover, although researchers have applied parametric and non-parametric statistical strategies in various domains, no work that uses a non-parametric statistical technique for securing cloud-based collaborations has been reported. These limitations in the state-of-the-art have motivated the authors to address this issue.

III. CATEGORIZATION OF STATISTICAL METHODS FOR ANOMALY DETECTION

Statistical methods for anomaly detection can be broadly classified into parametric, non-parametric and semi-parametric techniques.

A. Parametric techniques

Parametric techniques assume that sample data come from a population that follows a probability distribution with a fixed set of parameters. The approaches to estimating the parameters are numerous and differ across distributions of the data. The larger the training set, the higher the probability of obtaining a good estimate of the parameters, but the most critical requirement is to apply the right kind of model to the right type of data.
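As a minimal illustration of the parametric view (a sketch under assumptions, not part of the proposed method), the snippet below treats the monitored metric as approximately Gaussian, estimates its mean and standard deviation from training data, and flags values whose tail probability under the fitted model falls below a chosen cutoff. The training data, cutoff and function names are all hypothetical.

```python
# Hypothetical sketch of a parametric (Gaussian) anomaly check.
# Assumes the training sample is well described by a normal distribution;
# the data and the alpha cutoff below are illustrative only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=5.0, size=1000)   # benign training data
mu, sigma = train.mean(), train.std(ddof=1)          # estimate the fixed parameters

def is_anomalous_parametric(x, alpha=0.001):
    """Flag x if it lies in the low-probability tail of the fitted Gaussian."""
    p_value = 2 * norm.sf(abs(x - mu) / sigma)        # two-sided tail probability
    return p_value < alpha

print(is_anomalous_parametric(52.0))   # near the mean, expected False
print(is_anomalous_parametric(90.0))   # far in the tail, expected True
```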
B. Non-parametric techniques

Non-parametric techniques make no assumption about the probability distribution of the data. The difference between parametric and non-parametric models is that the former have a fixed number of parameters, while the latter grow the number of parameters with the amount of training data. The quantities these methods maintain are usually the mean, median, variance and similar summary measures. These techniques are therefore heavily reliant on the data: large training sets are required, and even then good forecasts are not guaranteed. The reason is that the training data needs to have covered nearly all possible sets of values so that the technique assigns a non-negligible probability to observing such values later on. For instance, if there are three sets of possible outcomes and the training data hits only two of them, the technique will behave as though there is no, or only a small, probability of the third set occurring. It is therefore important to use the right type of data, and enough of it, to build these techniques.
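As a contrast to the Gaussian sketch above, the following is a minimal non-parametric variant (again illustrative only, not the paper's algorithm): it estimates the density with a histogram learned from the training data and flags values that fall outside the observed range or into very rare bins. The bin count and rarity threshold are assumed values.

```python
# Hypothetical non-parametric counterpart: a binned empirical distribution
# replaces the fixed-parameter Gaussian. Bin count and min_freq are assumptions.
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(loc=50.0, scale=5.0, size=1000)

counts, edges = np.histogram(train, bins=30)
freq = counts / counts.sum()                      # empirical bin probabilities

def is_anomalous_nonparametric(x, min_freq=0.002):
    """Flag x if it falls outside the observed range or into a rare bin."""
    if x < edges[0] or x > edges[-1]:
        return True
    b = min(np.searchsorted(edges, x, side="right") - 1, len(freq) - 1)
    return freq[b] < min_freq

print(is_anomalous_nonparametric(51.0))   # dense region, expected False
print(is_anomalous_nonparametric(80.0))   # outside the training range, expected True
```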
C. Semi-parametric techniques

Semi-parametric techniques use the available training data both to estimate parameters, as parametric techniques do, and to shape the model itself, as non-parametric techniques do. These techniques are therefore a mix of the parametric and non-parametric families. They are frequently used in situations where a non-parametric technique would not perform as desired, or where a researcher wants to use a parametric technique but lacks prior information about the data.

The majority of past studies on online anomaly detection have assumed parametric or semi-parametric models of the underlying probability distributions, i.e., the random variables are generated by known distributions such as the Gaussian or Bernoulli distributions. However, parametric models may not always be available in applications. In many cases the distributions can be arbitrary, need not be Gaussian or Bernoulli, and may not be known in advance. Hence, it is desirable to develop non-parametric methods that are distribution free.
IV. ON-STREAM ANOMALY DETECTION FRAMEWORK

On-stream anomaly detection includes preprocessing of the raw stream data as the initial step. Preprocessing performs basic noise elimination and converts the data into multidimensional time series that can be subjected to machine learning for predictive purposes. Stream data analytics includes two prominent methodologies, namely data mining and statistical approaches. K-means is a popular semi-supervised or unsupervised data mining technique that builds clusters of data points from training data under the assumption that the clusters are benign. If a new data point falls into any of the clusters, it is marked as benign; otherwise it is labeled as an anomaly. To accommodate the evolution of stream data, the training model has to be updated periodically.
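For illustration only (this clustering route is background to the paper, not the proposed statistical method), a sketch of such a cluster-membership check might look as follows; the cluster count and the acceptance radius are assumed values.

```python
# Hypothetical sketch of the clustering-based (data mining) route described
# above: fit K-means on benign training points, then label a new point as
# benign only if it lies close enough to some learned cluster center.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
train = np.vstack([rng.normal(c, 0.5, size=(200, 2)) for c in (2.0, 6.0, 10.0)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)
radius = 2.0                                  # assumed acceptance radius per cluster

def label_point(x):
    """Return 'benign' if x falls within `radius` of its nearest centroid."""
    d = np.linalg.norm(km.cluster_centers_ - x, axis=1).min()
    return "benign" if d <= radius else "anomalous"

print(label_point(np.array([6.1, 5.8])))      # near a cluster: benign
print(label_point(np.array([20.0, 20.0])))    # far from all clusters: anomalous

# In a streaming setting the model would be refit periodically on recent
# benign windows so that the clusters track the evolving stream.
```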
Fig. 1. Classification based on-stream framework

Figure 1 portrays the basic flow of a classification-based on-stream framework. The training model is built using the input stream data and is in turn used for making predictions on future data. It is critical to update the training model periodically in light of the dynamic nature of the data.

A. Algorithm: Statistical Algorithm for Anomaly Detection

Input: (W, TH, ITH)
1: Store data of the jth time interval in Wj
2: Add Wj to the set of previously stored windows W
3: For k = 1 to j-1 do
4: Compute the test statistic, TS, for (Wk, Wj)
5: If TS < TH
6: Ik++
7: If Ik > ITH
8: Set status = benign
9: Else
10: Set status = anomalous
This algorithm gives a framework for the anomaly detection process using a non-parametric statistical method and describes the general methodology for detecting anomalies. It periodically gathers data from the incoming stream. Data gathered in the jth time interval is stored in the jth window. Wj is compared against all previously seen windows by means of an appropriate statistical technique. If the test statistic TS is below a pre-defined threshold TH, the match counter Ik of the kth window Wk currently being compared with Wj is incremented. If Ik exceeds a pre-defined limit ITH, Wj is declared benign. If no window is found whose match counter exceeds ITH, Wj is flagged as anomalous.
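The following sketch is one possible reading of this framework (a sketch under assumptions, not the authors' implementation): the test statistic is left as a pluggable function, and a match counter is kept per stored window exactly as in steps 1-10 above.

```python
# Hypothetical sketch of the windowed framework above. `test_statistic` is a
# placeholder; the paper later instantiates it with a multinomial
# goodness-of-fit / relative-entropy statistic.
from typing import Callable, List, Sequence

def detect_window(
    new_window: Sequence[float],
    stored_windows: List[Sequence[float]],
    match_counts: List[int],
    test_statistic: Callable[[Sequence[float], Sequence[float]], float],
    TH: float,
    ITH: int,
) -> str:
    """Compare the new window against every stored window (steps 3-10)."""
    status = "anomalous"
    for k, w_k in enumerate(stored_windows):
        ts = test_statistic(w_k, new_window)      # step 4
        if ts < TH:                               # step 5
            match_counts[k] += 1                  # step 6
            if match_counts[k] > ITH:             # step 7
                status = "benign"                 # step 8
    stored_windows.append(list(new_window))       # steps 1-2
    match_counts.append(0)
    return status

# Usage with a toy test statistic (absolute difference of window means):
def toy(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

windows, counts = [], []
for w in ([1, 2, 3], [1, 2, 4], [1, 2, 3], [50, 60, 70]):
    print(detect_window(w, windows, counts, toy, TH=1.0, ITH=1))
# The third window matches earlier ones often enough to be declared benign;
# the last window matches nothing and is flagged as anomalous.
```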
V. PROPOSED WORK

Recognizing individual data points as anomalous can lead to false alerts. To avoid this, one may instead analyze windows of data consisting of a collection of data points and then make a determination on the window. Since we are working with collections of points rather than individual ones, we concentrate on changes in the probability distribution compared with a reference set, i.e., the expected probability distribution. Thus, we propose an approach to detecting anomalies that manifest as changes in system behavior conflicting with what is normal.

The methodology rests on a classical question in hypothesis testing [23], namely determining whether the observed data is consistent with a given distribution. In the classical theory of hypothesis testing, we are required to decide which of two hypotheses, the null hypothesis and the alternate hypothesis, best describes the data. The two hypotheses are each characterized by a single distribution or a collection of distributions. Formally, let X1, X2, ..., Xq denote the observed sample of size q. Let M0 denote the distribution representing the null hypothesis and M1 the alternate hypothesis. Then the optimal test for minimizing the probability of falsely rejecting the null hypothesis, under a constraint on the probability of incorrectly accepting the null hypothesis, is given by the Neyman-Pearson theorem. The theorem states that the optimal test is obtained by computing the likelihood ratio

\[
\frac{M_0(X_1, X_2, \ldots, X_q)}{M_1(X_1, X_2, \ldots, X_q)} \ge T \qquad (1)
\]

where M0(X1, X2, ..., Xq) is the probability assigned to the observed sample by M0, M1(X1, X2, ..., Xq) is the corresponding probability assigned by M1, and T is a threshold that can be determined from the constraint on the probability of incorrectly accepting the null hypothesis.

Sometimes the alternate hypothesis is simply the statement that the observed data is not drawn from the null hypothesis. Tests designed for this setting are called goodness-of-fit tests. For this problem, we invoke a specific test known as the multinomial goodness-of-fit test, which applies when the data Xi are random variables that can take at most k values, say {1, 2, ..., k}. Let M0 = (m1, m2, ..., mk), where Σi mi = 1, denote the distribution corresponding to the null hypothesis (mi denotes the probability of observing i). Let qi denote the number of times i was observed in the sample (X1, X2, ..., Xq). Let N̂ = (n̂1, n̂2, ..., n̂k), where n̂i = qi/q, denote the empirical distribution of the observed sample X1, X2, ..., Xq. Then the likelihood ratio in (1) reduces to

\[
L = \frac{\prod_{i=1}^{k} \hat{n}_i^{\,q\hat{n}_i}}{\prod_{i=1}^{k} m_i^{\,q\hat{n}_i}} = \prod_{i=1}^{k} \left(\frac{\hat{n}_i}{m_i}\right)^{q\hat{n}_i} \qquad (2)
\]
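As a quick numerical check of this reduction (illustrative only, with made-up bin counts), the snippet below computes the empirical distribution N̂ of a sample, evaluates the ratio in (2), and verifies that its logarithm equals q·D(N̂||M0), the relative-entropy form introduced next.

```python
# Illustrative check of equation (2) with made-up counts: k = 4 bins,
# a null model M0 and observed per-bin counts q_i for a sample of size q.
import numpy as np

m = np.array([0.40, 0.30, 0.20, 0.10])        # null hypothesis M0 (sums to 1)
counts = np.array([35, 33, 22, 10])           # q_i: observed counts per bin
q = counts.sum()
n_hat = counts / q                            # empirical distribution N-hat

ratio = np.prod((n_hat / m) ** (q * n_hat))   # likelihood ratio from (2)

# Its logarithm equals q * D(N-hat || M0), the relative-entropy form used below.
kl = np.sum(n_hat * np.log(n_hat / m))
print(ratio, np.log(ratio), q * kl)           # log(ratio) and q*kl should agree
```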
For the purpose of hypothesis testing, we compare two probability distributions, the reference model and the monitoring data, in terms of entropy. In the monitoring context we try to match the entropy of a reference set M(x) with the entropy of the current window of data, N(x). In other words, we need to check whether the system is behaving now (sample distribution) as it has behaved previously (historical distribution).

The relative entropy (or Kullback-Leibler divergence) between two distributions N = (n1, n2, ..., nk) and M = (m1, m2, ..., mk) that are defined over the same classification criteria is given by:

\[
D(N \,\|\, M) = \sum_{i=1}^{k} n_i \log \frac{n_i}{m_i} \qquad (3)
\]

Thus the logarithm of the likelihood ratio in (2) is q·D(N̂||M). Note that the relative entropy has particular properties. Its two arguments play distinct roles: N represents the observations, the data, what we see, while M represents the expectations, the model, what we assume. D(N||M) is an asymmetric measure of the dissimilarity between the observed distribution N(x) and the model M(x). This dissimilarity measure satisfies 0 ≤ D(N||M) ≤ ∞, with D(N||M) = 0 if and only if N ≡ M. Another important property of relative entropy is that, if the null hypothesis M is true, then as the number of samples q grows, 2·q·D(N̂||M) converges to a chi-square distribution with k-1 degrees of freedom. The multinomial goodness-of-fit test relies on this property. The chi-square relationship is important in hypothesis testing because it allows us to compare 2·q·D(N̂||M) with a threshold determined from the cumulative distribution function (cdf) of the chi-square distribution and a desired upper bound on the false negative probability.

The relative entropy based methodologies described here have several advantages. They are non-parametric; in particular, they do not assume any predefined form for the distribution of the data. They can easily be extended to handle multidimensional time series, thereby incorporating correlation, and can be adapted to workloads whose distribution changes over time. Even though relative-entropy-based techniques and hypothesis testing methodologies have been used for anomaly detection, we are not aware of any work that uses them in the context of the multinomial goodness-of-fit test for anomaly detection in cloud-centric collaborations. The latter gives a systematic way (based on the cdf of the chi-square distribution) to pick a threshold above which alerts are raised.

The proposed online anomaly detection algorithm illustrates this statistical procedure. Here, we focus on a set of data points in a particular time interval, called a window, rather than on individual data points. An anomalous window conveys more information and represents a pattern or distribution of unusual characteristics of the data. After receiving a window, the framework generates its unknown distribution. It stores the most recent N distributions (the past N models). We consider the current model an anomaly if it does not match any of the past N models.

A. Algorithm: Proposed Online Anomaly Detection

Input: (DS, NOB, DSmin, DSmax, TH, NOWTH)
1: Initialize n = 0
2: Procedure OAD (DS)
3: Begin
4: Take each instance of DS and extract a data point, DP, with its value
5: Generic bin index, Bcurrent = ((DP - DSmin)/(DSmax - DSmin))*NOB
6: Return Bcurrent with its frequency
7: Count the total number of frequencies for each bin
8: Calculate the empirical frequency, N̂
9: If n = 0
10: Set M1 = N̂
11: n = 1
12: NOW1 = 1
13: Declare the current window as benign
14: Else if D(N̂||Mi) < TH for any hypothesis Mi
15: Increment NOWi by 1 (if more than one such i exists, select the one with the lowest D(N̂||Mi))
16: If NOWi > NOWTH
17: Declare the window as benign
18: Else
19: Declare the window as anomalous
20: Increment NOWi by 1
21: Else
22: Declare the window as anomalous
23: Increment n by 1
24: Set Mn = N̂
25: NOWn = 1
26: End procedure

where DS is the input data stream; NOB is the number of bins into which DS has to be quantized; DSmin and DSmax are the minimum and maximum values that DS can take; TH is the threshold against which the test statistic is compared, usually set to the point of the chi-square cdf with NOB-1 degrees of freedom that corresponds to 0.95 or 0.99; NOWTH is the threshold against which NOWi is compared to determine whether a hypothesis has occurred frequently enough; and NOWi is a score that tracks the number of windows agreeing with hypothesis Mi.
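A compact sketch of how this procedure might be realized is given below. It follows one reading of steps 4-26 and of the accompanying description: select the best-matching stored hypothesis, update its score, and compare 2·q·D(N̂||Mi) against a chi-square quantile. The binning constants, the smoothing of empty bins, and the use of 2·q·D as the quantity compared with TH are assumptions made for illustration, not the authors' reference implementation.

```python
# Hypothetical sketch of the proposed OAD procedure (one reading of the
# pseudocode above). Windows arrive as lists of numeric values.
import numpy as np
from scipy.stats import chi2

class OAD:
    def __init__(self, ds_min, ds_max, nob=10, conf=0.99, now_th=2):
        self.ds_min, self.ds_max, self.nob = ds_min, ds_max, nob
        self.th = chi2.ppf(conf, df=nob - 1)   # TH from the chi-square cdf
        self.now_th = now_th
        self.models = []                       # M_1 ... M_n (stored hypotheses)
        self.scores = []                       # NOW_1 ... NOW_n

    def _empirical(self, window):
        """Steps 4-8: bin the window and return the empirical frequency N-hat."""
        edges = np.linspace(self.ds_min, self.ds_max, self.nob + 1)
        counts, _ = np.histogram(window, bins=edges)
        counts = counts + 1e-6                 # avoid zero bins in the divergence
        return counts / counts.sum()

    def process_window(self, window):
        q = len(window)
        n_hat = self._empirical(window)
        if not self.models:                    # steps 9-13: first window
            self.models.append(n_hat)
            self.scores.append(1)
            return "benign"
        div = [np.sum(n_hat * np.log(n_hat / m)) for m in self.models]
        best = int(np.argmin(div))
        if 2 * q * div[best] < self.th:        # step 14: some hypothesis matches
            self.scores[best] += 1             # steps 15 / 20: update the score
            return "benign" if self.scores[best] > self.now_th else "anomalous"
        self.models.append(n_hat)              # steps 21-25: start a new hypothesis
        self.scores.append(1)
        return "anomalous"

# Usage: feed synthetic windows; a shifted window should eventually be flagged.
rng = np.random.default_rng(3)
oad = OAD(ds_min=0.0, ds_max=100.0)
for _ in range(6):
    print(oad.process_window(rng.normal(50, 5, size=200)))
print(oad.process_window(rng.normal(90, 5, size=200)))   # distribution shift
```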
As we do not have any prior knowledge of the data, a bin-based histogram method is used to determine the unknown distribution in Step (5). The algorithm then calculates the empirical frequency, N̂, in Step (8). Note that M1, M2, ..., Mn denote the n null hypotheses at any instant of time. A test statistic involving N̂ and Mi is calculated for each Mi and compared with the threshold TH in Step (14). If the test statistic exceeds the threshold for all of the null hypotheses, the window is declared anomalous, n is incremented by 1, and a new hypothesis is created based on the present window. When the test statistic is matched against past hypotheses, several hypotheses may have test statistics lower than the threshold, so the hypothesis with the lowest test statistic is selected. If several hypotheses share the same lowest statistic, the one with the highest score NOWi is selected. The score records how many times a hypothesis has been matched; whenever a past hypothesis is matched by the current test statistic, the score of that hypothesis is increased. If the smallest test statistic is below the threshold but corresponds to a hypothesis that was accepted fewer than NOWTH times previously, then the window is declared anomalous; however, the number of appearances of that hypothesis is still incremented. If neither of the two conditions is fulfilled, the window is declared anomalous and the appropriate bookkeeping is performed.

B. Correctness and completeness of the proposed algorithm

Correctness: For n = 0, the window is considered benign, n is set to 1, and the first null hypothesis is created. M1, M2, ..., Mn denote the n null hypotheses at any instant of time. The algorithm periodically gathers data from the incoming stream. Data gathered in the ith time interval is stored in the ith window. Wi is compared against all previously seen windows by means of the multinomial goodness-of-fit test. If the test statistic D(N̂||Mi) is below a pre-defined limit, then NOWi is incremented by 1; if NOWi is greater than the predefined threshold NOWTH the window is declared benign, else it is declared anomalous. If the test statistic exceeds the threshold for all of the null hypotheses, the window is declared anomalous, n is incremented by 1, and a new hypothesis is created based on the present window. If none of the conditions is satisfied, the window is declared anomalous.

Completeness: Trivial; no assumption is made on the incoming data stream DS, and a window can be checked for anomaly whenever the input accepted by the program contains at least one data point.

Termination: At least one data point is processed per time interval and the size of each window is finite.

VI. CONCLUSION

In order to cope with security breaches in a cloud collaborative environment, where customers can share and access resources remotely, an online anomaly detection algorithm based on a non-parametric statistical technique for big data streams has been proposed. The proposed algorithm detects anomalous access requests to shared resources at runtime.

References

[1] M. Adrian, "Big Data," Teradata Magazine. [Online]. Available: http://www.teradatamagazine.com/v11n01/Features/Big-Data/ (accessed on March 29, 2013).
[2] H. Kim et al., "Behavior-based anomaly detection on big data," in Proc. 13th Australian Information Security Management Conference, 2015, pp. 73-80.
[3] F. Liu et al., "NIST cloud computing reference architecture," NIST Special Publication 500-292, 2011.
[4] L. D. Baker et al., "A hierarchical probabilistic model for novelty detection in text," in Proc. International Conference on Machine Learning, 1999.
[5] J. B. D. Cabrera, L. Lewis, and R. K. Mehra, "Detection and classification of intrusions and faults using sequences of system calls," ACM SIGMOD Record, vol. 30, no. 4, pp. 25-34, 2001.
[6] D.-Y. Yeung and C. Chow, "Parzen-window network intrusion detectors," in Proc. 16th International Conference on Pattern Recognition, vol. 4, IEEE, 2002.
[7] A. K. Ghosh, A. Schwartzbard, and M. Schatz, "Learning program behavior profiles for intrusion detection," in Proc. Workshop on Intrusion Detection and Network Monitoring, vol. 51462, 1999.
[8] J. P. Buzen and A. W. Shum, "MASF - multivariate adaptive statistical filtering," in Proc. Int. CMG Conference, 1995.
[9] R. Chitrakar and C. Huang, "Anomaly based intrusion detection using hybrid learning approach of combining k-Medoids clustering and Naïve Bayes classification," in Proc. 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), IEEE, 2012.
[10] S. Fu, J. Liu, and H. Pannu, "A hybrid anomaly detection framework in cloud computing using one-class and two-class support vector machines," in Proc. International Conference on Advanced Data Mining and Applications, Springer Berlin Heidelberg, 2012.
[11] D. M. Farid, N. Harbi, and M. Z. Rahman, "Combining naive Bayes and decision tree for adaptive intrusion detection," arXiv preprint arXiv:1005.4496, 2010.
[12] Y. Yasami and S. P. Mozaffari, "A novel unsupervised classification approach for network anomaly detection by k-Means clustering and ID3 decision tree learning methods," The Journal of Supercomputing, vol. 53, no. 1, pp. 231-245, 2010.
[13] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[14] V. J. Hodge and J. Austin, "A survey of outlier detection methodologies," Artificial Intelligence Review, vol. 22, no. 2, pp. 85-126, 2004.
[15] M. Agyemang, K. Barker, and R. Alhajj, "A comprehensive survey of numeric and symbolic outlier mining techniques," Intelligent Data Analysis, vol. 10, no. 6, pp. 521-538, 2006.
[16] M. Markou and S. Singh, "Novelty detection: a review - part 1: statistical approaches," Signal Processing, vol. 83, no. 12, pp. 2481-2497, 2003.
[17] A. Patcha and J.-M. Park, "An overview of anomaly detection techniques: Existing solutions and latest technological trends," Computer Networks, vol. 51, no. 12, pp. 3448-3470, 2007.
[18] R. J. Beckman and R. D. Cook, "Outlier..........s," Technometrics, vol. 25, no. 2, pp. 119-149, 1983.
[19] Z. A. Bakar et al., "A comparative study for outlier detection techniques in data mining," in Proc. 2006 IEEE Conference on Cybernetics and Intelligent Systems, IEEE, 2006.
[20] M. Gupta et al., "Outlier detection for temporal data," Synthesis Lectures on Data Mining and Knowledge Discovery, vol. 5, no. 1, pp. 1-129, 2014.
[21] V. Barnett and T. Lewis, Outliers in Statistical Data, 3rd ed. Wiley, New York, 1994.
[22] M. Datar et al., "Maintaining stream statistics over sliding windows," SIAM Journal on Computing, vol. 31, no. 6, pp. 1794-1813, 2002.
[23] E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses. Springer Science & Business Media, 2006.