APPLICATIONS OF DATA MINING AND ISSUES IN
DATA MINING
Applications:
Financial Data Analysis
Retail Industry
Telecommunication Industry
Biological Data Analysis
Other Scientific Applications
Intrusion Detection
Financial Data Analysis:
Financial Data
Collected from Banks and Financial Institutions
Usually complete and reliable
Design and Construction of data Warehouses for
multi-dimensional data analysis and mining
Analyze changes by month, by region, by
sector, and by max, min, total, average, trend, etc.
Characteristic and Comparative analysis, Outlier
Analysis
Loan payment and customer credit policy analysis
Feature Selection and attribute relevance ranking
(Debt ratio, credit history, income, education
level)
Loan granting policy can be adjusted
Low risk Customers are granted loans
Classification and Clustering of customers for
targeted marketing
Customer group identification
Multidimensional clustering techniques
Can associate new customer with existing groups
Detection of money laundering and financial crimes
Data from several sources integrated
Data Analysis tools can be used to detect unusual
patterns
Data Visualization tools, Linkage Analysis tools
Classification tools, Clustering tools
Outlier Analysis tools
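The outlier analysis tools listed above can be illustrated with a small sketch. It is not a description of any particular banking tool: it assumes a single numeric attribute (transaction amount) and uses the modified z-score based on the median absolute deviation (MAD), one common robust outlier test. The function name mad_outliers, the threshold of 3.5, and the sample amounts are all hypothetical.

```python
from statistics import median


def mad_outliers(amounts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread, so deviations cannot be scaled
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are roughly normal.
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical transaction amounts for one account.
transactions = [120.0, 95.5, 130.0, 110.0, 9800.0, 105.0, 125.0]
flagged = mad_outliers(transactions)
print(flagged)  # [4]
```

The median and MAD are used instead of the mean and standard deviation because a single very large transfer would otherwise inflate the scale and hide itself. A real investigation would combine several attributes and the linkage and visualization tools listed above.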
Retail Industry:
Sales Data, Customer Shopping history, Goods
Transportation, E-Commerce
Mining can help to
Identify buying behaviour, discover shopping
trends
Improve the quality of customer service, retain
customers
Design and Construction of data warehouses
Several ways to design a warehouse
Entities involved: Sales, Customers,
Employees, Goods transportation
Preliminary data mining exercises can help to
guide the design process
Which dimensions and levels to include and what preprocessing to perform
Multi-dimensional analysis of sales, customers,
products, time and region
Multi-feature data cube
Visualization tools
Analysis of effectiveness of sales campaigns
Compare sales and transaction volume
Multidimensional analysis
Compare sales amount, number of
transactions containing same items before and
after the campaign
Association Analysis
Identify items likely to be purchased together
Customer Retention
Customer loyalty and trends
Sequential pattern mining
Adjust pricing strategy and goods range
Purchase recommendation and cross-reference of
items
Recommender Systems
Sales promotion by displaying deal information in
association with items of interest
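The association analysis mentioned above (identifying items likely to be purchased together) can be sketched as follows. This is a minimal illustration restricted to item pairs, not a full Apriori or FP-growth implementation. The function name pair_rules, the minimum support of 0.4, and the sample baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support >= min_support:
            # confidence(a -> b) estimates P(b in basket | a in basket)
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, "
          f"confidence={confidence:.2f}")
```

In this toy data the strongest rule is butter -> bread, since every basket with butter also contains bread. Rules like this are the kind of input a recommender or cross-selling display could use.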
Telecommunication Industry:
Computer and Web data transmission, fax, Mobile
phone, Telephone services
Multidimensional analysis of telecommunication data
Helps to identify and compare the data traffic,
System work load, Resource usage, User Group
Behavior, Profit, etc.
Time-of-day usage patterns
Fraudulent pattern analysis
Identify fraudulent users and atypical usage
patterns
Illegal Customer account access
Automatic Dial-out equipment
Switch and route congestion patterns
Multidimensional association and sequential pattern
analysis
Usage patterns for a set of communication
services by customer group, time of day
Sales Promotion
Mobile Telecommunication Services
Spatio-temporal data mining
Use of visualization tools
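The time-of-day usage patterns by customer group mentioned above amount to aggregating call records along two dimensions. The sketch below shows one way to do that with the standard library. The record layout (customer group, ISO start time, minutes), the function names, and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def usage_by_group_and_hour(records):
    """Total call minutes keyed by (customer_group, hour_of_day)."""
    totals = defaultdict(float)
    for group, start, minutes in records:
        hour = datetime.fromisoformat(start).hour
        totals[(group, hour)] += minutes
    return dict(totals)


def peak_hour(totals, group):
    """Hour of day with the most minutes for one customer group."""
    hours = {h: m for (g, h), m in totals.items() if g == group}
    return max(hours, key=hours.get)


# Hypothetical call detail records.
records = [
    ("business", "2024-03-04T09:15:00", 12.0),
    ("business", "2024-03-04T09:40:00", 30.0),
    ("business", "2024-03-04T14:05:00", 8.0),
    ("residential", "2024-03-04T20:10:00", 25.0),
    ("residential", "2024-03-04T20:45:00", 40.0),
    ("residential", "2024-03-04T09:30:00", 5.0),
]
totals = usage_by_group_and_hour(records)
print(peak_hour(totals, "business"))     # 9
print(peak_hour(totals, "residential"))  # 20
```

The same grouping could be extended with further dimensions such as service type or region, which is what a multidimensional data cube does at scale.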
Biomedical and DNA Data Analysis:
Research in DNA Analysis has led to
Development of new drugs
Cancer therapies
Human genome study
Discovery of genetic causes for many diseases
Genome Research
Study of DNA Sequences
Adenine, Cytosine, Guanine, Thymine
About 100,000 genes, each made up of hundreds of
nucleotides that can be combined in a very large
number of ways
Identifying Gene Sequence patterns is challenging
Semantic Integration of Heterogeneous, distributed
genome databases
Highly distributed generation and use of DNA
data
Integrated data warehouses and distributed
federated databases
Efficient Data Cleaning and Integration methods
Similarity Search and Comparison among DNA
Sequences
Gene sequences isolated from healthy and
diseased tissues
Compare frequently occurring patterns in each
class
Help to identify the genetic factors of the disease
and immune factors
Non-numeric nature of data poses difficulties
Association Analysis: Identification of co-occurring
gene sequences
Diseases triggered by a combination of genes
acting together
Association analysis helps to detect the kinds of
genes that may co-occur
Study interactions and relationships between them
Path Analysis: Linking genes to different stages of
disease development
Different genes become active at different stages
of the disease
Develop drug interventions that target specific
stages
Visualization tools and genetic data analysis
Complex gene structures: graphs, trees,
cuboids and visualization tools
Support better understanding and interactive data
exploration
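The idea of comparing frequently occurring patterns in gene sequences from healthy and diseased tissues can be sketched with short substrings (k-mers). This is a simplified illustration, not a bioinformatics-grade method: it assumes exact matching, fixed-length patterns, and tiny made-up sequences. The function names, k=3, and the min_gap threshold are hypothetical.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Fraction of sequences that contain each k-mer at least once."""
    counts = Counter()
    for seq in sequences:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return {kmer: c / len(sequences) for kmer, c in counts.items()}


def discriminating_kmers(diseased, healthy, k=3, min_gap=0.5):
    """k-mers whose support in diseased exceeds healthy by min_gap."""
    d = kmer_support(diseased, k)
    h = kmer_support(healthy, k)
    gaps = {kmer: d[kmer] - h.get(kmer, 0.0) for kmer in d}
    return sorted(kmer for kmer, gap in gaps.items() if gap >= min_gap)


# Made-up sequences over the alphabet A, C, G, T.
diseased = ["ACGTTGCA", "TTGCAACG", "GGTTGCAT"]
healthy = ["ACGTACGT", "CGTACGAA", "TACGTAGG"]
candidates = discriminating_kmers(diseased, healthy)
print(candidates)  # ['GCA', 'GTT', 'TGC', 'TTG']
```

Because the data are symbolic rather than numeric, the comparison works on set membership and counts instead of distances, which reflects the difficulty noted above about the non-numeric nature of sequence data.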
Intrusion Detection:
Intrusions
Any set of actions that threaten the integrity,
availability, or confidentiality of a network
resource
Misuse detection: use patterns of well-known attacks
to identify intrusions
Signatures must be updated
Classification based on known intrusions
E.g., three consecutive login failures: password
guessing.
Anomaly detection: use deviation from normal usage
patterns to identify intrusions
Any significant deviations from the expected
behavior are reported as possible attacks
Data Mining Algorithms
Misuse detection
training data labeled normal / intrusion
Classifier can be used to detect known
intrusions
Classification algorithms, Association rule
mining
Anomaly detection
Builds models of normal behavior and detects
significant deviations
Supervised: training data labeled as normal
Unsupervised: no label information about the
training data
Classification, clustering
Association and Correlation Analysis
Finds relationships between system attributes
describing the network data
Helps in selection of useful attributes
Analysis of Stream data
Transient and dynamic nature of intrusions
An event may be normal on its own but malicious
when viewed as a part of a sequence
Distributed Data Mining
Analysis of data from several locations
Visualization and Querying tools
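The misuse detection example given above (three consecutive login failures indicating password guessing) can be written as a small signature check over an event stream. This is only a sketch: the event format (user, success flag), the function name, and the sample events are assumptions, and a real system would also consider time windows and source addresses.

```python
from collections import defaultdict


def detect_password_guessing(events, max_failures=3):
    """Flag users once they reach max_failures consecutive failed logins."""
    streak = defaultdict(int)
    flagged = []
    for user, success in events:
        if success:
            streak[user] = 0  # a successful login resets the streak
        else:
            streak[user] += 1
            if streak[user] == max_failures:
                flagged.append(user)
    return flagged


# Hypothetical login events in arrival order: (user, login succeeded?).
events = [
    ("alice", False), ("bob", False), ("alice", True),
    ("bob", False), ("alice", False), ("bob", False),
    ("alice", False), ("bob", True),
]
suspects = detect_password_guessing(events)
print(suspects)  # ['bob']
```

The example also shows why the sequence matters: each failed login is unremarkable on its own, and only the run of three for the same user matches the signature.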
Data Mining in other Scientific Applications:
Old Scenario: Small, homogeneous data sets
Formulate hypothesis, build model, evaluate
results
Current Scenario: High-dimensional data, stream
data, heterogeneous data (spatial, temporal)
Collect and store data, mine for new hypotheses,
confirm with data or experimentation
Vast amounts of data have been collected from
Scientific domains
Climate and ecosystem modeling, Chemical
engineering, fluid dynamics, structural
mechanics
Data Warehouses and data preprocessing
Scientific applications: methods are needed for
integrating data from heterogeneous sources
(e.g., a geospatial data warehouse) and identifying
events (e.g., in climate and ecosystem data)
Mining complex data types
Scientific data: semi-structured and unstructured
Multimedia and Spatial data
Graph-based mining
Labeled graphs capture spatial, topological,
geometric and other relational characteristics
present in scientific data
Nodes: objects to be mined; edges:
relationships
Scalable and efficient mining methods are needed
Visualization tools and domain specific knowledge
High level GUIs and visualization tools are
needed
Integrated with existing domain-specific systems
and database systems
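The labeled-graph representation described above (nodes as objects, edges as relationships) can be sketched as follows. The example counts single labeled-edge patterns across a set of graphs, which is only the simplest building block of frequent subgraph mining and not a complete algorithm. The graph encoding, the function name, and the molecule-like sample graphs are hypothetical.

```python
from collections import Counter


def frequent_edge_patterns(graphs, min_support=2):
    """Labeled-edge patterns found in at least min_support graphs."""
    support = Counter()
    for node_labels, edges in graphs:
        patterns = set()  # count each pattern once per graph
        for u, v, edge_label in edges:
            # Edges are treated as undirected, so endpoint labels
            # are put in a canonical (sorted) order.
            a, b = sorted((node_labels[u], node_labels[v]))
            patterns.add((a, edge_label, b))
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


# Each graph: (node id -> node label, list of (u, v, edge label)).
g1 = ({0: "C", 1: "O", 2: "H"}, [(0, 1, "double"), (0, 2, "single")])
g2 = ({0: "O", 1: "C", 2: "N"}, [(0, 1, "double"), (1, 2, "single")])
g3 = ({0: "C", 1: "H"}, [(0, 1, "single")])
patterns = frequent_edge_patterns([g1, g2, g3])
print(patterns)
```

Here the patterns ("C", "double", "O") and ("C", "single", "H") each appear in two of the three graphs. Scalable methods extend such frequent edges into larger subgraphs, which is where the efficiency concerns noted above arise.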
Issues in Data Mining:
Mining methodology and user interaction
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple
levels of abstraction
Incorporation of background knowledge
Data mining query languages and ad-hoc data
mining
Expression and visualization of data mining
results
Handling noise and incomplete data
Pattern evaluation
Issues relating to the diversity of data types
Handling relational and complex types of data
Mining information from heterogeneous
databases and global information systems
(WWW)
Performance and scalability
Efficiency and scalability of data mining
algorithms
Parallel, distributed and incremental mining
methods