Journal of Modern Applied Statistical Methods
May 2016, Vol. 15, No. 1, 789-801.
doi: 10.22237/jmasm/1462077480
Copyright © 2016 JMASM, Inc. ISSN 1538-9472
Model-Based Outlier Detection System with
Statistical Preprocessing
D. Asir Antony Gnana Singh E. Jebalamar Leavline
Anna University Anna University
Tiruchirappalli, India Tiruchirappalli, India
Reliability, freedom from error, and security are important contributors to quality of service. Outlier detection is the process of detecting erroneous parts or abnormal objects in defined populations, and it can contribute to secured and error-free services. Outlier detection approaches can be categorized into four types: statistics-based, unsupervised, supervised, and semi-supervised. A model-based outlier detection system with statistical preprocessing is proposed, which takes advantage of a statistical approach to preprocess the training data and uses unsupervised learning to construct the model. The robustness of the proposed system is evaluated using the performance evaluation metrics sum of squared error (SSE) and time to build the model (TBM). The proposed system performs better for detecting outliers regardless of the application domain.
Keywords: Outlier, Preprocessing, Inter-quartile range, Anomaly Detection
D. Asir Antony Gnana Singh is in the Department of Computer Science and Engineering. Email him at: asirantony@gmail.com. Dr. Leavline is a Professor in the Department of Electronics and Communication Engineering. Email her at: jebi.lee@gmail.com.

Introduction

Outlier detection is the process of identifying objects or events that do not follow the standard pattern, or whose behaviour changes compared to the other members of a group. Outlier detection is also known as anomaly detection (Chandola, Banerjee, & Kumar, 2009). Anomaly detection helps to solve problems such as bank fraud detection, network- or host-based intrusion detection, credit card fraud detection (Akhilomen, 2013), detection of adversaries in military environments, surveillance of enemy activities, detection of outliers in data captured in ad-hoc and wireless sensor networks (Singhal, Gankotiya, Agarwal, & Verma, 2012), structural defect identification, error detection in intensive and safety systems, detection of errors in text, detection of anomalies in computer networks to prevent attacks (Tartakovsky, Polunchenko, & Sokolov, 2013), detection of abnormal radar readings due to faults in aircraft parts (Fujimaki, Yairi, & Machida, 2005), and detection of chromosomal abnormalities in the medical field (Manning & Hudgins, 2010). Anomalies are also termed exceptions, deviations, novelties, and noise (Hodge & Austin, 2004).

Figure 1 shows objects that deviate from the cluster groups Ci in a vector space. Similar objects are grouped into clusters. The outliers are denoted o1, o2, …, on and the clusters are denoted Ci, where i denotes the cluster index.
Figure 1. Similarity-based grouping of objects with outliers
Commonly, outliers are detected using the following steps. First, a model of the normal activities or behaviour is constructed; the outliers are then detected using that model. The model can be a statistical summary or a pattern of the overall samples or objects, and it can be constructed by learning the behaviours or actions of the samples or objects. Using the built model, outliers are detected by observing activities or characteristics that differ significantly from the normal model. In this paper, a model-based outlier detection system with statistical preprocessing is proposed. The statistical measure inter-quartile range (IQR) is used for preprocessing, and an unsupervised learner is used to build the model in order to detect outliers across various domains.
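This generic two-step workflow can be illustrated with a brief sketch in Python. The per-feature mean and standard deviation summary and the three-standard-deviation threshold used here are illustrative assumptions for the example only, not the method proposed in this paper.

```python
# Illustrative sketch of the generic two-step workflow: build a model of normal
# behaviour, then flag objects that deviate significantly from that model.
# The mean/std summary and the 3-standard-deviation threshold are assumptions.
import numpy as np

def build_normal_model(train):
    """Summarize normal behaviour as column-wise mean and standard deviation."""
    train = np.asarray(train, dtype=float)
    return train.mean(axis=0), train.std(axis=0)

def is_outlier(x, model, n_std=3.0):
    """Flag an object whose features deviate strongly from the normal model."""
    mean, std = model
    return bool(np.any(np.abs(np.asarray(x, dtype=float) - mean) > n_std * std))

normal_activity = [[10.1, 0.2], [9.8, 0.1], [10.3, 0.3], [9.9, 0.2]]
model = build_normal_model(normal_activity)
print(is_outlier([10.0, 0.2], model))   # False: close to normal behaviour
print(is_outlier([25.0, 5.0], model))   # True: deviates significantly
```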
Related Works

Outliers can be detected in various ways, including graphical, statistical, unsupervised, supervised, and semi-supervised methods. In the graphical method, one- or multi-dimensional plots such as spin plots, scatter plots, and box plots are used to detect and identify the outliers. The major drawback of this approach is the increased time consumed in detecting the outliers. Also, the plots are specific to the detection problem and may be subject to many constraints.
The statistical method generates a parametric model that describes the data distribution; statistical measures are then used to find the parameters of the data distribution, such as the mean, variance, and median, in order to determine the ranges of the outlier confidence level. Based on the distribution parameters or outlier ranges, the outliers are detected and separated from the normal distribution of the data model. Zhang et al. (2012) used a statistical method to detect outliers in wireless sensor networks (WSN) in order to preprocess the data acquired from the sensor nodes, improving the accuracy of the acquired data and reducing the transmission overhead. In this approach, statistical measures such as temporal correlation, spatial correlation, and spatial-temporal correlation are used to detect the outliers.
Rousseeuw and Hubert (2011) developed an outlier detection scheme using robust location and scatter estimators for outlier detection in multivariate data. The location refers to the coordinate-wise mean and the scatter refers to the covariance matrix. The statistical measure is computed in three phases, namely c-step data iteration, data partitioning, and data nesting (Rousseeuw & Hubert, 2011). Hido, Tsuboi, Kashima, Sugiyama, and Kanamori (2011) proposed a statistical outlier detection method using direct density ratio estimation. The major drawback of the statistical method is that most statistical tests cannot be applied to multi-attribute problems. These methods also require prior knowledge of the probability distribution of the data, and it is difficult to estimate the real distribution of high-dimensional data (Hido et al., 2011).
In the unsupervised method, the data are structured as attributes (column-wise) and objects (row-wise). Objects refer to samples or populations that contain the attributes. Commonly, a clustering approach is used with similarity criteria such as distance-based, density-based, or nearest-neighbor-based measures. Casas, Mazel, and Owezarski (2011) proposed a network anomaly detection system using an unsupervised clustering approach based on sub-space and density-based metrics to prevent network attacks. The network attacks can be denial-of-service, illegal access, probing, and the spreading of worms and viruses (Casas et al., 2011). In some cases, multiple clustering techniques are combined for outlier detection using evidence accumulation (Fred & Jain, 2005). Researchers have observed that the distance-based method is better at detecting outliers because it does not require any probability distribution, and computing the probability distribution of high-dimensional data is difficult.
Sugiyama and Borgwardt (2013) developed a sampling-based unsupervised outlier detection method and reported that the sampling method outperforms methods that search using the k-nearest neighbor principle. Koupaie, Ibrahim, and Hosseinkhani (2014) suggested an unsupervised outlier detection method for stream data, in which a multi-objective genetic algorithm (MOGA) is used to search for outliers in an object space and k-means clustering is used to develop the model for detecting the outliers. Casas, Mazel, and Owezarski (2012) presented a knowledge-independent method based on a multi-clustering approach, which is adopted to detect network attacks. An anti-k-nearest-neighbor approach has been used to detect credit card fraud (Ganji & Mannem, 2012). The prime advantage of unsupervised outlier detection is that it does not require labelled data; labelled data are costlier than unlabelled data because special mechanisms are required to label the data. Therefore, this approach is simpler and more cost-effective than the supervised approach.
In the supervised method, classifiers are used to detect outliers by learning from labelled data. The following works were carried out using the supervised method. The RIPPER classification algorithm has been used to learn logical rules for detecting outliers in temporal time series data (Salvador & Chan, 2005). Cabrera, Lewis, and Mehra (2001) presented an outlier detection algorithm to detect network anomalies, and Masud, Gao, Khan, Han, and Thuraisingham (2011) built an ensemble model with a voting approach to detect the class label in order to solve data stream classification problems. Mourão-Miranda et al. (2011) used a one-class support vector machine classifier to detect outlier patients in a group of patients based on disease in a medical environment. In image processing, the support vector machine classifier has been used to detect changes in very high geometrical resolution (VHR) images (Volpi, Tuia, Bovolo, Kanevski, & Bruzzone, 2013). The supervised approach requires labelled data for detection and is therefore more expensive than the unsupervised method.
In the semi-supervised method, both labelled and unlabelled data are used to detect the outliers. Several semi-supervised approaches have been reported in the literature. Xue, Shang, and Feng (2010) presented a fuzzy rough c-means clustering method to detect outliers. Noto, Brodley, and Slonim (2012) proposed an outlier detection system in which normal instances are used to build an ensemble of feature models to detect anomalies in received instances. Daneshpazhouh and Sami (2014) used an entropy measure to detect outliers: reliable negative samples are first extracted from the unlabelled and positive data, and the outliers are then detected and removed based on their entropy scores. Zamoner and Zhao (2013) presented a score-based outlier detection technique using a stochastic network method. A semi-supervised clustering approach has also been proposed to detect outliers in digital mammograms (Thangavel & Mohideen, 2010).
Proposed Model-Based Outlier Detection System with Statistical Preprocessing
Various outlier detection approaches have been developed. Outlier detection requires a database of domain activity, also known as a training dataset. These data are classified into two types, namely labelled data and unlabelled data. Preparing labelled data is costlier than preparing unlabelled data because special mechanisms are required to label the data. Unfortunately, the supervised method fails to process unlabelled data. In order to overcome this problem, the proposed outlier detection system is developed in an unsupervised fashion, in which labelled data are not required; thereby, the initial cost is reduced.

The flowchart representation of the proposed system is depicted in Figure 2. This system consists of three major phases, namely removal of outliers, building of the outlier detection model, and detection. In the removal-of-outliers phase, the outliers are removed from the training dataset using the IQR. In the second phase, the outlier detection model (ODM) is built using the k-means clustering algorithm. In the detection phase, the ODM-based detector identifies outlier objects in the unknown object space.
The IQR measure is used to preprocess the training dataset and identify its outliers. The IQR flags data that lie outside the typical range of the dataset, and it is computed as IQR = Q3 - Q1, where Q3 and Q1 are the upper and lower quartiles, respectively. Consider a dataset with numerical values arranged in ascending order as 4, 7, 9, 11, 12, and 20. The midpoint is identified to divide the dataset into two halves: 4, 7, 9 | 11, 12, 20. The medians of the lower half and the upper half of the data are then found: Q1 = 7 is the median of the lower half and Q3 = 12 is the median of the upper half. The IQR is therefore Q3 - Q1 = 12 - 7 = 5, and the outlier value is OV = 1.5 × 5 = 7.5, where 1.5 is the outlier factor. The lower extreme is LE = Q1 - OV = 7 - 7.5 = -0.5 and the upper extreme is UE = Q3 + OV = 12 + 7.5 = 19.5. Normal objects lie between the lower and upper extremes, and any object that lies outside these two extremes (LE and UE) is treated as an outlier; in this example, the value 20 exceeds UE and is flagged.
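This calculation can be sketched in a few lines of Python. The quartile convention below (medians of the lower and upper halves after splitting the sorted data at its midpoint) matches the worked example; other quartile definitions, such as NumPy's default interpolation, can give slightly different bounds.

```python
# Minimal sketch of the IQR-based preprocessing step, following the worked
# example in the text. The outlier factor of 1.5 matches the example.
import numpy as np

def iqr_bounds(values, factor=1.5):
    """Return (lower extreme, upper extreme) from the inter-quartile range."""
    x = np.sort(np.asarray(values, dtype=float))
    half = len(x) // 2
    q1 = np.median(x[:half])                  # median of the lower half
    q3 = np.median(x[-half:])                 # median of the upper half
    ov = factor * (q3 - q1)                   # outlier value OV
    return q1 - ov, q3 + ov

data = [4, 7, 9, 11, 12, 20]
le, ue = iqr_bounds(data)
print(le, ue)                                 # -0.5 19.5
print([v for v in data if v < le or v > ue])  # [20.0] is flagged as an outlier
```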
Figure 2. Flowchart representation of the proposed outlier detection system
The unsupervised k-means algorithm (Sammut & Webb, 2011) is used to build the ODM from the training dataset preprocessed with the IQR. The ODM is used to categorize an unknown object, i.e., to detect whether the given object is an outlier or a normal object. In order to build the ODM, the number of clusters k is defined, and the k-means algorithm defines a centroid for each cluster. The data points of the training dataset are then grouped around the centroid of each cluster using the Euclidean distance as the similarity measure. The cluster center points are refined in an iterative fashion to form the final k groups of data points according to Equation 1.
\[
W(C) \;=\; \frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(j)=k}\left\lVert x_i - x_j \right\rVert^{2}
\;=\; \sum_{k=1}^{K} N_k \sum_{C(i)=k}\left\lVert x_i - m_k \right\rVert^{2}
\qquad (1)
\]
where x1, …, xN are the data points of the training dataset. Each observed data point is assigned to exactly one of the k clusters. C(i) indicates the cluster index of the ith data point, mk denotes the mean vector of the kth cluster, and Nk denotes the number of observations in the kth cluster.
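A minimal sketch of building the ODM with k-means and applying it to an unknown object is given below, using scikit-learn. The rule that turns the fitted clusters into an outlier decision is not fully specified in the text, so the per-cluster radius threshold (the largest training-point distance to its own centroid) is an assumption made for illustration.

```python
# Sketch of building the outlier detection model (ODM) with k-means and using
# it to judge an unknown object. The per-cluster radius rule is an assumption.
import numpy as np
from sklearn.cluster import KMeans

def build_odm(train, k=3, max_iter=500, seed=0):
    train = np.asarray(train, dtype=float)
    km = KMeans(n_clusters=k, max_iter=max_iter, n_init=10, random_state=seed)
    labels = km.fit_predict(train)
    # Per-cluster radius: largest Euclidean distance of a training point
    # to its own centroid (an assumed notion of "normal" spread).
    dists = np.linalg.norm(train - km.cluster_centers_[labels], axis=1)
    radii = np.array([dists[labels == c].max() for c in range(k)])
    return km, radii

def is_outlier(x, km, radii):
    x = np.asarray(x, dtype=float).reshape(1, -1)
    c = km.predict(x)[0]
    d = np.linalg.norm(x - km.cluster_centers_[c])
    return d > radii[c]

rng = np.random.default_rng(0)
train = rng.normal(size=(300, 2))          # stand-in for the preprocessed data
km, radii = build_odm(train, k=3)
print(is_outlier([0.1, 0.2], km, radii))   # likely normal
print(is_outlier([8.0, 8.0], km, radii))   # far from every cluster: outlier
```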
Experimental Setup and Discussion of Results
In order to evaluate the performance of the proposed system, real-world datasets were collected from the UCI repository and other databases (Lichman, 2013): the e-mail dataset Spambase, the communication and control system dataset Ionosphere, the panel dataset Unbalanced, the banking dataset Credit-g, and the service dataset CPU-with-vendor.

Table 1 shows the details of the datasets used for the experiments. To conduct the experiment, the IQR algorithm is first applied to the datasets with the extreme value factor and outlier factor set to 6.0 and 3.0, respectively. The outliers detected by the IQR algorithm are shown in Table 2, and the detected outlier instances are removed from the datasets. To build the outlier detection model, the datasets without outliers are fed into the k-means algorithm with the Euclidean distance measure and a maximum of 500 iterations. The outlier detection model is then built using the k-means algorithm, and unknown objects can be detected using the built model. Test results are obtained to evaluate the performance of the proposed system by observing the sum of squared error (SSE) and the time taken to build the model (TBM) for each number of clusters k from 2 to 10, as tabulated in Table 3 and Table 4.
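The evaluation loop can be sketched as follows, using a randomly generated array as a stand-in for a preprocessed dataset and scikit-learn's inertia_ attribute (the within-cluster sum of squared distances) as the SSE. The original experiments were presumably run with a different tool, so this sketch mirrors the stated settings (Euclidean distance, 500 iterations, k = 2 to 10) but will not reproduce the exact values in Tables 3 and 4.

```python
# Sketch of the evaluation loop: for each k, build the model on preprocessed
# data and record the SSE and the time taken to build the model (TBM).
import time
import numpy as np
from sklearn.cluster import KMeans

def evaluate(data, k_values=range(2, 11), max_iter=500, seed=0):
    results = []
    for k in k_values:
        start = time.perf_counter()
        km = KMeans(n_clusters=k, max_iter=max_iter, n_init=10,
                    random_state=seed).fit(data)
        tbm = time.perf_counter() - start
        results.append((k, km.inertia_, tbm))   # (clusters, SSE, TBM in seconds)
    return results

data = np.random.default_rng(0).normal(size=(500, 10))  # stand-in dataset
for k, sse, tbm in evaluate(data):
    print(f"k={k:2d}  SSE={sse:10.2f}  TBM={tbm:.3f}s")
```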
Table 1. Dataset details
S.No. Dataset name No. of instances No. of attributes
1 Spambase 4601 57
2 Ionosphere 351 34
3 Unbalanced 856 32
4 Credit-g 1000 20
5 CPU-with-vendor 209 9
Figure 3 shows the comparison of SSE without and with IQR on the various datasets; it is evident that the proposed approach drastically reduces the SSE compared to outlier detection without IQR. Figure 4 shows the comparison of TBM in seconds without and with IQR on the various datasets; it shows that the proposed approach drastically reduces the time to build the outlier detection model for the various datasets compared to the same process without IQR.
Figure 3. Comparison of SSE without and with IQR on various datasets
Figure 4. Comparison of TBM in seconds without and with IQR on various datasets
Table 2. Identified normal and outlier instances
S.No. Dataset name No. of normal instances No. of outlier instances
1 Spambase 2997 1604
2 Ionosphere 286 65
3 Unbalanced 717 139
4 Credit-g 822 178
5 CPU-with-vendor 171 38
Table 3. Comparison of SSE without and with IQR on various datasets

No. of     Spambase            Ionosphere          Unbalanced          Credit-g              CPU-with-vendor
clusters   without    with     without    with     without    with     without     with      without    with
           IQR        IQR      IQR        IQR      IQR        IQR      IQR         IQR       IQR        IQR
2          805.85     95.77    726.10     381.19   816.98     745.86   5665.99     4514.86   182.01     172.92
3          779.89     89.72    698.25     300.54   704.30     634.68   5429.21     4351.55   169.85     162.98
4          737.68     80.83    585.51     254.90   651.60     532.59   5226.21     4067.71   154.92     153.98
5          718.09     66.25    537.73     234.59   564.56     499.952  5000.33     3981.12   156.57     145.80
6          682.87     58.53    518.41     213.18   497.39     464.60   4755.00     3882.03   143.78     138.12
7          669.83     55.19    512.51     198.76   467.14     423.99   4707.86     3762.71   138.60     128.38
8          597.81     53.91    492.88     181.63   452.88     412.22   4715.21     3705.64   124.80     121.90
9          648.86     51.25    466.82     175.07   424.46     402.31   4583.54     3645.53   119.09     118.00
10         576.90     49.87    452.85     170.46   410.49     376.22   4501.12     3611.47   111.84     109.55
Table 4. Comparison of TBM in seconds without and with IQR on various datasets

No. of     Spambase            Ionosphere          Unbalanced          Credit-g            CPU-with-vendor
clusters   without    with     without    with     without    with     without    with     without    with
           IQR        IQR      IQR        IQR      IQR        IQR      IQR        IQR      IQR        IQR
2          0.49       0.14     0.24       0.20     0.38       0.18     0.18       0.17     0.07       0.06
3          5.39       0.18     0.17       0.15     0.44       0.20     0.32       0.32     0.04       0.04
4          5.43       0.23     0.22       0.20     0.76       0.35     0.21       0.13     0.07       0.04
5          6.34       0.24     0.20       0.18     0.72       0.64     0.42       0.19     0.05       0.04
6          6.19       0.25     0.33       0.21     0.68       0.51     0.40       0.19     0.04       0.03
7          6.64       0.26     0.30       0.18     0.56       0.28     0.39       0.36     0.06       0.03
8          7.82       0.26     0.21       0.21     0.49       0.40     0.34       0.26     0.10       0.05
9          7.99       0.26     0.20       0.17     0.94       0.62     0.48       0.32     0.10       0.06
10         8.54       0.27     0.26       0.25     0.56       0.55     0.39       0.33     0.08       0.07
Conclusion
This paper presented a model-based outlier detection system using a statistical preprocessing method. The performance of the proposed system was tested on various well-known, publicly available real-world datasets using the performance evaluation metrics sum of squared error (SSE) and time to build the model (TBM). From the experimental results, it is observed that the proposed system performs better in terms of SSE and TBM for detecting outliers regardless of the application domain.
References
Akhilomen, J. (2013). Data Mining Application for Cyber Credit-Card
Fraud Detection System. In P. Perner (Ed.), Advances in Data Mining.
Applications and Theoretical Aspects: 13th Industrial Conference, ICDM 2013,
New York, NY, USA, July 16-21, 2013. Proceedings (pp. 218-228). Berlin,
Heidelberg: Springer Berlin Heidelberg.
Cabrera, J. B. D., Lewis, L., & Mehra, R. K. (2001). Detection and
classification of intrusions and faults using sequences of system calls. ACM
SIGMOD Record, 30(4), 25-34. doi:10.1145/604264.604269
Casas, P., Mazel, J., & Owezarski, P. (2011). UNADA: Unsupervised
Network Anomaly Detection Using Sub-space Outliers Ranking. In J. Domingo-
Pascual, P. Manzoni, S. Palazzo, A. Pont, & C. Scoglio (Eds.), NETWORKING
2011 (pp. 40–51). Springer Berlin Heidelberg. Retrieved September 4, 2014, from
http://link.springer.com/chapter/10.1007/978-3-642-20757-0_4
Casas, P., Mazel, J., & Owezarski, P. (2012). Knowledge-independent
traffic monitoring: Unsupervised detection of network attacks. IEEE Network,
26(1), 13-21. doi:10.1109/MNET.2012.6135851
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A
survey. ACM Computing Surveys, 41(3), 1-58. doi:10.1145/1541880.1541882
Daneshpazhouh, A., & Sami, A. (2014). Entropy-based outlier detection
using semi-supervised approach with few positive examples. Pattern Recognition
Letters, 49, 77-84. doi:10.1016/j.patrec.2014.06.012
Fred, A. L. N., & Jain, A. K. (2005). Combining multiple clusterings using
evidence accumulation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 27(6), 835-850. doi:10.1109/TPAMI.2005.113
Fujimaki, R., Yairi, T., & Machida, K. (2005). An approach to spacecraft
anomaly detection problem using kernel feature space. Proceedings of the
Eleventh ACM SIGKDD International Conference on Knowledge Discovery in
Data Mining, KDD ’05 (pp. 401-410). New York, NY: ACM.
doi:10.1145/1081870.1081917
Ganji, V. R., & Mannem, S. N. P. (2012). Credit card fraud detection using
anti-k nearest neighbor algorithm. International Journal on Computer Science
and Engineering, 4(6), 1035-1039.
Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T. (2011).
Statistical outlier detection using direct density ratio estimation. Knowledge and
Information Systems, 26(2), 309-336. doi:10.1007/s10115-010-0283-2
Hodge, V. J., & Austin, J. (2004). A survey of outlier detection
methodologies. Artificial Intelligence Review, 22(2), 85-126.
doi:10.1007/s10462-004-4304-y
Koupaie, M. H., Ibrahim, S., & Hosseinkhani, J. (2014). Outlier detection in
stream data by clustering method. International Journal of Advanced Computer
Science and Information Technology (IJACSIT), 2(3), 25-34. Retrieved
September 4, 2014, from http://papers.ssrn.com/abstract=2376191
Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA:
University of California, School of Information and Computer Science. Retrieved
from: http://archive.ics.uci.edu/ml
Manning, M., & Hudgins, L. (2010). Array-based technology and
recommendations for utilization in medical genetics practice for detection of
chromosomal abnormalities. Genetics in Medicine, 12(11), 742-745.
doi:10.1097/GIM.0b013e3181f8baad
Masud, M., Gao, J., Khan, L., Han, J., & Thuraisingham, B. M. (2011).
Classification and novel class detection in concept-drifting data streams under
time constraints. IEEE Transactions on Knowledge and Data Engineering, 23(6),
859-874. doi:10.1109/TKDE.2010.61
Mourão-Miranda, J., Hardoon, D. R., Hahn, T., Marquand, A. F., Williams,
S. C. R., Shawe-Taylor, J., & Brammer, M. (2011). Patient classification as an
outlier detection problem: An application of the One-Class Support Vector
Machine. NeuroImage, 58(3), 793-804. doi:10.1016/j.neuroimage.2011.06.042
Noto, K., Brodley, C., & Slonim, D. (2012). FRaC: A feature-modeling
approach for semi-supervised and unsupervised anomaly detection. Data Mining
and Knowledge Discovery, 25(1), 109-133. doi:10.1007/s10618-011-0234-x
Rousseeuw, P. J., & Hubert, M. (2011). Robust statistics for outlier
detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery, 1(1), 73-79. doi:10.1002/widm.2
Salvador, S., & Chan, P. (2005). Learning states and rules for detecting
anomalies in time series. Applied Intelligence, 23(3), 241-255.
doi:10.1007/s10489-005-4610-3
Sammut, C., & Webb, G. I. (Eds.). (2011). Encyclopedia of Machine
Learning. Springer Science & Business Media.
Singhal, S., Gankotiya, A. K., Agarwal, S., & Verma, T. (2012). An
investigation of wireless sensor network: A distributed approach in smart
environment. Advanced Computing & Communication Technologies (ACCT),
2012 Second International Conference on, Rohtak, Haryana, 2012, pp. 522-529.
doi:10.1109/ACCT.2012.22
Sugiyama, M., & Borgwardt, K. (2013). Rapid distance-based outlier
detection via sampling. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani,
& K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems
26 (pp. 467–475). Curran Associates, Inc.
Tartakovsky, A. G., Polunchenko, A. S., & Sokolov, G. (2013). Efficient
computer network anomaly detection by changepoint detection methods. IEEE
Journal of Selected Topics in Signal Processing, 7(1), 4-11.
doi:10.1109/JSTSP.2012.2233713
Thangavel, K., & Mohideen, A. K. (2010). Semi-supervised k-means
clustering for outlier detection in mammogram classification. Trendz in
Information Sciences Computing (TISC), 2010 (pp. 68-72). Presented at the
Trendz in Information Sciences Computing (TISC), 2010.
doi:10.1109/TISC.2010.5714611
Volpi, M., Tuia, D., Bovolo, F., Kanevski, M., & Bruzzone, L. (2013).
Supervised change detection in VHR images using contextual information and
support vector machines. International Journal of Applied Earth Observation and
Geoinformation, 20, 77-85. doi:10.1016/j.jag.2011.10.013
Xue, Z., Shang, Y., & Feng, A. (2010). Semi-supervised outlier detection
based on fuzzy rough C-means clustering. Mathematics and Computers in
Simulation, 80(9), 1911-1921. doi:10.1016/j.matcom.2010.02.007
Zamoner, F. W., & Zhao, L. (2013). A network-based semi-supervised
outlier detection technique using particle competition and cooperation. Intelligent
Systems (BRACIS), 2013 Brazilian Conference on (pp. 225-230).
doi:10.1109/BRACIS.2013.45
Zhang, Y., Hamm, N. A. S., Meratnia, N., Stein, A., van de Voort, M., &
Havinga, P. J. M. (2012). Statistics-based outlier detection for wireless sensor
networks. International Journal of Geographical Information Science, 26(8),
1373-1392. doi:10.1080/13658816.2012.654493