Data Mining Techniques and Methods

Clustering refers to grouping data points with similar characteristics. There are several methods of clustering including partitioning, hierarchical, density-based, grid-based, and model-based. Clustering allows businesses to better understand customer demographics and behaviors. Association rules are used to discover relationships between variables, such as products frequently purchased together. Classification assigns data points to categories to allow for comparative analysis and insight into broad customer groups. There are various classification methods including logistic regression, decision trees, K-nearest neighbors, naive Bayes, and support vector machines. Data visualization, cleaning, and mining techniques provide businesses with actionable insights from their data.

Uploaded by

Saweera Rasheed

Data mining techniques and methods

1. Clustering
Clustering is a technique used to group similar data points. The
resulting groups are often shown visually, such as in graphs of buying
trends or sales demographics for a particular product.

What Is Clustering in Data Mining?

Clustering refers to the process of grouping a series of different data
points based on their characteristics. By doing so, data miners can
seamlessly divide the data into subsets, allowing for more informed
decisions in terms of broad demographics (such as consumers or users)
and their respective behaviors.

Methods for Data Clustering

 Partitioning method: This involves dividing a data set into a
group of specific clusters for evaluation based on the criteria of
each individual cluster. In this method, data points belong to just
one group or cluster.
 Hierarchical method: With the hierarchical method, each data
point starts as a single cluster, and clusters are then merged step
by step based on their similarities. The clusters formed at each
level can then be analyzed separately from each other.
 Density-based method: A machine learning method where data
points that lie close together in dense regions are grouped into
clusters, while isolated data points are labeled “noise” and
discarded.
 Grid-based method: This involves dividing data into cells on a
grid, which then can be clustered by individual cells rather than by
the entire database. As a result, grid-based clustering has a fast
processing time.
 Model-based method: In this method, a statistical model is
assumed for each data cluster, and each data point is assigned to
the model it fits best.
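
As an illustration of the partitioning method, the following is a minimal
k-means sketch using only the Python standard library. The function name
k_means, the customer data, and the choice of two clusters are assumptions
made for this example; k-means is one common partitioning algorithm, not
the only one.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Partition 2-D points into k clusters; each point belongs to exactly one."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k data points chosen at random
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignments, centroids

# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
labels, centers = k_means(customers, k=2)
print(labels)  # the first three customers share one label, the last three the other
```

The cluster numbers themselves are arbitrary; only the grouping matters.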

Examples of Clustering in Business

Clustering helps businesses manage their data more effectively. For
example, retailers can use clustering models to determine which
customers buy particular products, on which days, and with what
frequency. This can help retailers target products and services to
customers in a specific demographic or region.
2. Association
Association rules are used to find correlations, or associations, between
points in a data set.

What Is Association in Data Mining?

Data miners use association to discover unique or interesting
relationships between variables in databases. Association is often
employed to help companies shape their market research and strategy.

Methods for Data Mining Association

Two primary approaches to association in data mining are the
single-dimensional and multi-dimensional methods.

 Single-dimensional association: This involves looking for one
repeating instance of a data point or attribute. For instance, a
retailer might search its database for the instances a particular
product was purchased.
 Multi-dimensional association: This involves looking for more
than one data point in a data set. That same retailer might want to
know more than what a customer purchased, such as their age or
method of purchase (cash or credit card).
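
To make the idea of products purchased together concrete, here is a small
sketch that measures two standard association-rule quantities, support and
confidence, for pairs of items. The function name pair_rules, the 0.5
support threshold, and the baskets are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets that contain both items
        if support < min_support:
            continue
        # Confidence of "a -> b": of the baskets with a, the share that also have b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

# Hypothetical shopping baskets.
baskets = [
    {"diapers", "wipes", "coffee"},
    {"diapers", "coffee"},
    {"diapers", "wipes"},
    {"bread", "coffee"},
    {"diapers", "wipes", "coffee"},
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this made-up data, every basket with wipes also contains diapers, so the
rule from wipes to diapers has a confidence of 1.0.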

Examples of Association in Business

The analysis of impromptu shopping behavior is an example of
association — that is, retailers notice in data studies that parents
shopping for childcare supplies are more likely to purchase specialty
food or beverage items for themselves during the same trip. These
purchases can be analyzed through statistical association.
3. Data Cleaning
Data cleaning is the process of preparing data to be mined.

What Is Data Cleaning in Data Mining?

Data cleaning involves organizing data, eliminating duplicate or
corrupted data, and filling in any null values. When this process is
complete, the most useful information can be harvested for analysis.

Methods for Data Cleaning

 Verifying the data: This involves checking that each data point in
the data set is in the proper format (e.g., telephone numbers, social
security numbers).
 Converting data types: This ensures data is uniform across the
data set. For instance, numeric variables only contain numbers,
while string variables can contain letters, numbers, and characters.
 Removing irrelevant data: This clears useless or inapplicable
data so full emphasis can be placed on necessary data points.
 Eliminating duplicate data points: This helps speed up the
mining process by boosting efficiency and reducing errors.
 Removing errors: This eliminates typing mistakes, spelling
errors, and input errors that could negatively affect analysis
outcomes.
 Completing missing values: This provides an estimated value for
all data and reduces missing values, which can lead to skewed or
incorrect results.
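
Several of the steps above can be sketched in a few lines of Python. The
record layout (name and age fields), the function name clean, and the
choice to fill missing ages with the rounded mean are all assumptions for
this example; real projects choose cleaning rules to suit their data.

```python
from statistics import mean

def clean(records):
    """Apply a few of the cleaning steps above to a list of dict records."""
    seen, cleaned = set(), []
    for row in records:
        name = row["name"].strip().title()  # remove stray spaces, fix casing
        raw_age = str(row.get("age", "")).strip()
        age = int(raw_age) if raw_age.isdigit() else None  # convert type, flag bad values
        key = name.lower()
        if key in seen:  # eliminate duplicate data points
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = round(mean(known)) if known else None  # estimate for missing values
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned

# Hypothetical raw records with a duplicate, a bad value, and mixed types.
raw = [
    {"name": " alice ", "age": "34"},
    {"name": "Alice", "age": "34"},
    {"name": "bob", "age": "n/a"},
    {"name": "Carol", "age": 40},
]
print(clean(raw))
```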

Examples of Data Cleaning in Business

According to Experian, 95 percent of businesses say they have been
impacted by poor data quality. Working with incorrect data wastes time
and resources, increases analysis costs (because models need to be
repeated), and often leads to faulty analytics.
4. Data Visualization
Data visualization is the translation of data into graphic form to illustrate
its meaning to business stakeholders.

What Is Data Visualization in Data Mining?

Data can be presented in visual ways through charts, graphs, maps,
diagrams, and more. This is a primary way in which data scientists
display their findings.

Methods for Data Visualization

Many methods exist for representing data visually. Here are a few:

 Comparison charts: Charts and tables express relationships in the
data, such as monthly product sales over a one-year period.
 Maps: Data maps are used to visualize data pertaining to specific
geographic locations. Through maps, data can be used to show
population density and changes; compare populations of
neighboring states, counties, and countries; detect how populations
are spread over geographic regions; and compare characteristics in
one region to those in other regions.
 Heat maps: This is a popular visualization technique that
represents data through different colors and shading to indicate
patterns and ranges in the data. It can be used to track everything
from a region’s temperature changes to its food and pop culture
trends.
 Density plots: These visualizations show how the values of a
variable are distributed over a continuous range, such as a period
of time, as a smooth curve that can look like a mountain range.
Density plots make it easy to represent occurrences of single
events over time (e.g., month, year, decade).
 Histograms: These are similar to density plots but are represented
by bars on a graph instead of a smooth curve.
 Network diagrams: These diagrams show how data points relate
to each other by using a series of lines (or links) to connect objects
together.
 Scatter plots: These graphs represent data point relationships on
two axes, one per variable. Scatter plots can be used to compare two
variables, such as a country’s life expectancy and the amount of
money it spends on healthcare annually.
 Word clouds: These graphics are used to highlight specific word
or phrase instances appearing in a body of text; the larger the
word’s size in the cloud, the more frequent its use.
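
As a rough illustration of how a histogram is built, the sketch below
groups values into fixed-width bins and draws one text bar per bin. The
function name text_histogram, the bin width, and the order values are
assumptions for this example; charting libraries do the same counting
before drawing.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Count values per fixed-width bin and render one bar of '#' per bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):  # bins with no values are simply not shown
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * bins[start]}")
    return lines

# Hypothetical order values in dollars.
orders = [12, 18, 25, 27, 29, 31, 34, 38, 45, 52]
print("\n".join(text_histogram(orders, bin_width=10)))
```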

Examples of Data Visualization in Business

Representing data visually is an important skill because it makes data
readily understandable to executives, clients, and customers. According
to Markets and Markets, the market size for global data visualization
tools is expected to nearly double (to $10.2 billion) by 2026.
5. Classification
Classification is a fundamental technique in data mining and can be
applied to nearly every industry. It is a process in which data points from
large data sets are assigned to categories based on how they’re being
used.

What Is Classification in Data Mining?

In data mining, classification is related to clustering in that both
group comparable data points for comparative analysis. The difference
is that classification assigns data points to categories that are
defined in advance. Classification is also used to designate broad
groups within a demographic, target audience, or user base through
which businesses can gain stronger insights.

Methods for Data Mining Classification

 Logistic regression: This algorithm estimates the probability of
one of two possible outcomes. For example, an email service can use
logistic regression to predict whether or not an email is spam.
 Decision trees: Data is split by a sequence of questions, and the
results are diagrammed into a chart called a decision tree. For
example, if a computer company wants to predict the likelihood of
laptop purchases, it may ask, Is the potential buyer a student? The
data is split into “Yes” and “No” branches, with further questions
asked on each branch in a similar fashion.
 K-nearest neighbors (KNN): This is an algorithm that tries to
identify an unknown object by comparing it to the most similar
known objects (its nearest neighbors). For instance, grocery chains
might use the K-nearest neighbors algorithm to decide whether to
include a sushi or hot meals station in their new store layout
based on consumer habits in the local marketplace.
 Naive Bayes: Based on Bayes’ theorem, this algorithm uses
historical data to estimate how likely an outcome is for a new set
of data, treating each attribute as independent of the others.
 Support Vector Machine (SVM): This machine learning
algorithm is often used to define the line (or, with more than two
variables, the hyperplane) that best divides a data set into two
classes. An SVM can help classify images and is used in facial and
handwriting recognition software.
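
Of the methods above, K-nearest neighbors is the simplest to sketch.
The example below labels a new store location by majority vote among its
three closest labeled neighbors. The function name knn_predict, the two
features (share of residents under 35, share of office workers), and all
data values are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical neighborhoods: (share under 35, share of office workers) -> best station.
training = [
    ((0.8, 0.7), "sushi"),
    ((0.7, 0.9), "sushi"),
    ((0.9, 0.6), "sushi"),
    ((0.2, 0.3), "hot meals"),
    ((0.3, 0.1), "hot meals"),
    ((0.1, 0.2), "hot meals"),
]
print(knn_predict(training, (0.75, 0.8)))
```

Both features here are on the same 0 to 1 scale; when features use
different units, they are usually rescaled first so that no single feature
dominates the distance.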

Examples of Classification in Business

Financial institutions classify consumers based on many variables to
market new loans or project credit card risks. Meanwhile, weather apps
classify data to project snowfall totals and other similar figures. Grocery
stores also use classification to group products by the consumers who
buy them, helping forecast buying patterns.
6. Machine Learning
Machine learning is the process by which computers use algorithms to
learn on their own. An increasingly relevant part of modern technology,
machine learning makes computers “smarter” by teaching them how to
perform tasks based on the data they have gathered.

What Is Machine Learning in Data Mining?

In data mining, machine learning’s applications are vast. Machine
learning and data mining fall under the umbrella of data science but
aren’t interchangeable terms. For instance, computers perform data
mining as part of their machine learning functions.

Methods for Machine Learning

 Supervised learning: In this method, algorithms train machines to
learn using pre-labeled data with correct values, which the
machines then classify on their own. It’s called supervised because
the process trains (or “supervises”) computers to classify data and
predict outcomes. Supervised machine learning is used in data
mining classification.
 Unsupervised learning: When computers handle unlabeled data,
they engage in unsupervised learning. In this case, the computer
groups the data itself and looks for patterns on its own.
Unsupervised models are used to perform clustering and
association.
 Semi-supervised learning: Semi-supervised learning uses a
combination of labeled and unlabeled data, making it a hybrid of
the above models.
 Reinforcement learning: This is a more layered process in which
computers learn to make decisions by trial and error, receiving
rewards or penalties for the actions they take in a specific
environment. For example, a computer might learn to play chess by
playing thousands of games and adjusting its strategy based on
the results.

Examples of Machine Learning in Business

With machine learning, companies can use computers to quickly identify
all sorts of data patterns (in sales, product usage, buying habits, etc.) and
develop business plans using those insights. This is a growing need in
many industries.
According to a MicroStrategy survey, 18 percent of analytics
professionals said machine learning and AI will have the most
significant impact on their strategies over the next five years. Learning
more advanced topics like machine learning is thus becoming imperative
for data scientists.
7. Neural Networks
Computers process large amounts of data much faster than human brains
but don’t yet have the capacity to apply common sense and imagination
in working with the data. Neural networks are one way to help
computers reason more like humans.

What Are Neural Networks in Data Mining?

Artificial neural networks attempt to digitally mimic the way the human
brain operates. Neural networks combine many simple processing units,
or nodes (similar to the way the brain uses neurons), to process data,
make decisions, and learn as a human would, or at least as closely as
possible.

Neural Network Methods

Neural networks consist of three main layers: input, “hidden,” and
output. Data enters through the input layer, is processed in the hidden
layer, and is resolved in the output layer where any relevant action based
on the data is then taken. The hidden layer can consist of many
processing layers, depending on the amount of data being used and
learning taking place.
Supervised and unsupervised learning also apply to neural networks;
neural networks use these types of algorithms to “train” themselves to
function in ways similar to the human brain.
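
The flow from input layer to hidden layer to output layer can be sketched
as a single forward pass. The weights and biases below are made up and
untrained, and the sigmoid activation is one common choice assumed for
this example; training (adjusting the weights) is not shown.

```python
from math import exp

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    hidden = layer(inputs, hidden_w, hidden_b)  # hidden layer processes the input
    return layer(hidden, output_w, output_b)    # output layer produces the result

# Illustrative, untrained weights: 2 inputs -> 2 hidden nodes -> 1 output.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
output_w = [[1.2, -0.7]]
output_b = [0.05]
prediction = forward([1.0, 0.5], hidden_w, hidden_b, output_w, output_b)
print(prediction)
```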

Examples of Neural Networks in Business

Neural networks have a wide range of applications. They can help
businesses predict consumer buying patterns and focus marketing
campaigns on specific demographics. They can also help retailers make
accurate sales forecasts and understand how to use dynamic pricing.
Furthermore, they help to improve diagnostic and treatment methods in
healthcare, improving care and performance.
8. Outlier Detection
Outlier detection is a key component of maintaining safe databases.
Companies use it to test for fraudulent transactions, such as abnormal
credit card usage that might suggest theft.

What Is Outlier Detection in Data Mining?

While other data mining methods seek to identify patterns and trends,
outlier detection looks for the unique: the data point or points that differ
from the rest or diverge from the overall sample. Outlier detection finds
errors, such as data that was input incorrectly or extracted from the
wrong sample. Natural data deviations can be instructive as well.

Methods for Outlier Detection

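 Numeric outlier: Outliers are detected based on the interquartile
range (IQR), or the spread of the middle 50 percent of values. Data
points that fall more than 1.5 times the IQR below the first
quartile or above the third quartile are commonly considered
outliers.
 Z-score: The Z-score denotes how many standard deviations a
data point is from the sample’s mean; points with a large absolute
Z-score (a common rule of thumb is above 3) are flagged. This is
also known as extreme value analysis.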
 DBSCAN: This stands for “density-based spatial clustering of
applications with noise” and is a method that defines data as core
points, border points, and noise points, which are the outliers.
 Isolation forest: This method isolates anomalies in large sets of
data using many random decision trees (the forest). Anomalies are
separated from the rest in fewer splits, so the algorithm searches
for them directly instead of profiling normal data points.
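
The first two methods can be sketched with Python's statistics module.
The 1.5 multiplier for the interquartile range and the Z-score threshold
of 3 are widely used conventions rather than fixed rules, and the function
names and transaction amounts are assumptions for this example.

```python
from statistics import mean, pstdev, quantiles

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def z_score_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical, so nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts with one suspicious charge.
amounts = [20, 22, 19, 24, 21, 23, 20, 22, 21, 19, 23,
           24, 20, 22, 21, 23, 19, 24, 22, 21, 500]
print(iqr_outliers(amounts))
print(z_score_outliers(amounts))
```

With very small samples, a single extreme value inflates the standard
deviation so much that its own Z-score can stay below the threshold, which
is one reason the interquartile range is often preferred for short lists.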

Examples of Outlier Detection in Business

Almost every business can benefit from understanding anomalies in their
production or distribution lines and how to fix them. Retailers can use
outlier detection to learn why their stores witness an odd increase in
purchases, such as snow shovels being bought in the summer, and how
to respond to such findings.
9. Prediction
Predictive modeling seeks to turn data into a projection of future action
or behavior. These models examine data sets to find patterns and trends,
then calculate the probabilities of a future outcome.

What Is Prediction in Data Mining?

Predictive modeling is among the most common uses of data mining and
works best with large data sets that represent a broad sample size.

Methods for Prediction

Predictive modeling uses some of the same techniques and terminology
as other data mining processes. Here are four examples:

 Forecast modeling: This is a common technique in which the
computer answers a question (for instance, How much milk should
a store have in stock on Monday?) by analyzing historical data.
 Classification modeling: Classification places data into groups
where it can be used to answer direct questions.
 Cluster modeling: By clustering data into groups with shared
characteristics, a predictive model can be used to study those data
sets and make decisions.
 Time series modeling: This model analyzes data in the order in
which it was recorded over time. A study of sales trends over a
year is an example of time series modeling.
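
A very simple form of forecast modeling is a moving average, which
predicts the next value as the mean of the most recent observations. The
function name, the three-week window, and the sales figures below are
assumptions for this sketch; production forecasts typically also account
for trend and seasonality.

```python
from statistics import mean

def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    return mean(history[-window:])

# Hypothetical litres of milk sold on the last six Mondays.
monday_sales = [120, 132, 128, 140, 135, 145]
forecast = moving_average_forecast(monday_sales)
print(forecast)  # mean of 140, 135, and 145
```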

Examples of Prediction in Business

Predictive modeling is a business imperative that impacts nearly every
corner of the public and private sectors. According to MicroStrategy, 52
percent of global businesses consider advanced and predictive modeling
their top priority in analytics.
10. Data Warehousing
Data warehousing is the process by which data is collected and stored
before it is evaluated.

What Is Data Warehousing in Data Mining?

Data miners collect data from multiple sources into a common archive
before it can be used in business analysis. This process, called data
warehousing, typically occurs before the data mining process.

Methods for Data Warehousing

Data enters a data warehouse through a three-stage process known as
ETL, which stands for extract, transform, and load:

 Extract: Data is copied and moved from its source to a warehouse
staging area. Data can be structured (names, dates, credit card
numbers, etc.) or unstructured (photos, videos, audio files, social
media posts).
 Transform: In this step, the data is filtered and cleaned — errors
are removed and the data is validated. The data is also formatted to
fit the warehouse.
 Load: In the final step, the transformed data is uploaded to the
data warehouse. These steps can be repeated as data is updated.
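
The three ETL steps can be sketched with Python's built-in csv and sqlite3
modules. Here an in-memory SQLite database stands in for a real data
warehouse, and the sales table, column names, and sample rows are
hypothetical.

```python
import csv
import io
import sqlite3

def run_etl(csv_text, connection):
    # Extract: read raw rows from the source (here, CSV text).
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Transform: validate, clean, and convert types to fit the target table.
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        cleaned.append((row["customer"].strip().title(), amount))
    # Load: write the transformed rows into the warehouse table.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    connection.commit()
    return len(cleaned)

source = "customer,amount\n alice ,19.99\nbob,not-a-number\ncarol,5\n"
warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
loaded = run_etl(source, warehouse)
print(loaded, "rows loaded")
```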

Examples of Data Warehousing in Business

Data warehouses make working with big data easier — particularly for
businesses that deal with large customer bases, sales and billing reports,
and resource plans. Through data warehousing, businesses can segment
and target customers from vast collections of sales orders, product
searches, or loyalty program registrations. They also can store and
analyze a wide variety of data points, even social media posts about
products and businesses.
Data warehousing also consolidates various data sources into one place,
making mining and decision-making more efficient and saving
businesses time and money.
