Basic Data Mining Tasks

The document outlines the Knowledge Discovery in Databases (KDD) process, which involves several steps including selection, pre-processing, transformation, data mining, and interpretation/evaluation to extract useful information from data. It discusses various visualization techniques and highlights challenges in data mining such as overfitting, outliers, and high dimensionality. Additionally, it emphasizes the importance of expert interpretation and the need for effective algorithms to handle large and diverse datasets.

Uploaded by

devipriya210387

•Knowledge Discovery in Databases (KDD): the process of finding useful information and patterns in data.

•Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process.

•The KDD process involves many steps.

•Input is the data; output is the desired useful information.

•Five steps of the KDD process:

•Selection: obtain data from various databases.
•Pre-processing: the data may be incorrect or missing; wrong data is corrected or removed, and missing data must be supplied or predicted.
•Transformation: transformation techniques make the data easier to mine, more useful, and more likely to give meaningful results.
•Data mining: applies algorithms to the transformed data to generate the desired results.
•Interpretation/Evaluation: results are presented using various visualization and GUI strategies.
Visualization refers to the visual representation of the data.
It includes the following techniques:
• Graphical: graphs
• Geometric: box plots, scatter diagrams
• Icon based: colors, icons
• Pixel based: uniquely colored pixels
• Hierarchical: divides the screen into regions based on data values
• Hybrid: combines any of these methods
Displays may be 2D or 3D.
KDD Process Example: Web Log
• Selection:
  Select log data (dates and locations) to use
• Preprocessing:
  Remove identifying URLs
  Remove error logs
• Transformation:
  Sessionize (sort and group)
• Data Mining:
  Identify and count patterns
  Construct a data structure
• Interpretation/Evaluation:
  Identify and display frequently accessed sequences
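
The web log steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the slides: the log records, their field layout (user, timestamp in seconds, URL, HTTP status), the 30-minute session gap, and the helper names `preprocess`, `sessionize`, and `count_sequences` are all hypothetical. Only error removal is shown for pre-processing.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (user, timestamp in seconds, URL, HTTP status).
LOG = [
    ("u1", 0, "/home", 200),
    ("u1", 60, "/products", 200),
    ("u1", 120, "/cart", 200),
    ("u2", 30, "/home", 200),
    ("u2", 90, "/products", 200),
    ("u2", 100, "/missing", 404),
    ("u1", 5000, "/home", 200),
    ("u1", 5060, "/products", 200),
]

SESSION_GAP = 1800  # assumed: 30 minutes of inactivity starts a new session


def preprocess(records):
    """Pre-processing: drop error entries (HTTP status 400 and above)."""
    return [r for r in records if r[3] < 400]


def sessionize(records, gap=SESSION_GAP):
    """Transformation: sort by time, group by user, split on long gaps."""
    by_user = defaultdict(list)
    for user, ts, url, _status in sorted(records, key=lambda r: r[1]):
        by_user[user].append((ts, url))
    sessions = []
    for visits in by_user.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > gap:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def count_sequences(sessions, length=2):
    """Data mining: count contiguous page sequences of a given length."""
    counts = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts


sessions = sessionize(preprocess(LOG))
frequent = count_sequences(sessions).most_common(1)
print(frequent)
```

With this toy log, the most frequently accessed two-page sequence is `/home` followed by `/products`. A real log would need a parser for its own format and a session rule chosen for the site.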
Data mining development draws on several fields:
• Information retrieval (IR): similarity measures, hierarchical clustering, IR systems, web search engines
• Databases (DB): relational data model, SQL, association rules, data warehousing
• Algorithms: algorithm design, algorithm analysis, data structures
• Machine learning: neural networks, decision trees
• Statistics: regression, EM algorithm, k-means clustering, time series analysis

Human interaction: technical experts are needed to formulate the queries and assist in interpreting the results.

Overfitting: occurs when a model fitted to the current data does not fit future database states.

Outliers: data entries that do not fit the derived model.

Interpretation of results: an expert is needed to interpret the results correctly.

Visualization of results: visualization is needed to view and understand the results easily.
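
The overfitting issue above can be made concrete with a small sketch. It assumes a hypothetical one-attribute dataset with one noisy record, and compares a model that memorizes the training data (nearest-neighbour lookup) with a single cut-off rule. The data, the cut-off of 4.5, and the function names are illustrative assumptions only.

```python
# Hypothetical training data: the true rule is "label is 1 when x >= 5",
# but the record (2, 1) is noisy.
TRAIN = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]

# Hypothetical future database state that follows the same true rule.
FUTURE = [(1.8, 0), (2.2, 0), (4.4, 0), (5.6, 1), (7.4, 1)]


def memorizer(x):
    """Overfit model: copy the label of the nearest training record."""
    nearest = min(TRAIN, key=lambda point: abs(point[0] - x))
    return nearest[1]


def threshold_rule(x):
    """Simpler model: one cut-off that does not chase the noisy record."""
    return 1 if x >= 4.5 else 0


def accuracy(model, data):
    """Fraction of records whose label the model predicts correctly."""
    correct = sum(1 for x, label in data if model(x) == label)
    return correct / len(data)


for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    print(name, accuracy(model, TRAIN), accuracy(model, FUTURE))
```

The memorizing model is perfect on the training data but worse on the future data, because it reproduces the noisy record; the simpler rule misses that one training record yet fits the future data better.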

Large datasets: massive datasets create problems when algorithms designed for smaller datasets are applied to them. Sampling can mitigate this.

High dimensionality: many attributes are involved, and it is difficult to determine which ones should be used (the curse of dimensionality). The solution is to reduce the number of attributes (dimensionality reduction).

Multimedia data: different data types affect how the algorithms can be applied.
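
As a rough sketch of the two remedies named above, the following code takes a simple random sample of the rows and then drops attributes whose variance is close to zero. Variance filtering is only one crude form of dimensionality reduction, chosen here for brevity; the dataset, the cut-off of 0.01, the fixed seed, and the function names are assumptions made for illustration.

```python
import random
import statistics


def sample_rows(rows, k, seed=0):
    """Large datasets: mine a simple random sample of k rows."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(rows, k)


def drop_low_variance(rows, min_variance=0.01):
    """High dimensionality: keep attributes whose variance exceeds a cut-off."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns)
            if statistics.pvariance(col) > min_variance]
    reduced = [tuple(row[i] for i in keep) for row in rows]
    return reduced, keep


# Hypothetical dataset: 1000 rows, 3 attributes; attribute 1 is constant.
rows = [(i % 10, 1.0, (i * 7) % 13) for i in range(1000)]

sample = sample_rows(rows, k=100)
reduced, kept = drop_low_variance(sample)
print(len(sample), kept)
```

The constant attribute carries no information for mining, so it is removed, and later steps work on 100 rows with two attributes instead of 1000 rows with three.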

Missing data: during pre-processing, missing values must be supplied or predicted.

Irrelevant data: some data may not be relevant to the task.

Noisy data: some values may be incorrect or invalid.

Changing data: databases cannot be assumed to be static.

Integration: integrating data mining functions into the database system is important.

Application: making effective use of the algorithms to obtain results.
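
A minimal sketch of handling noisy and missing values during pre-processing follows. It assumes a hypothetical numeric attribute (ages), treats values outside an assumed valid range of 0 to 120 as noise, and fills gaps with the mean of the remaining values. Mean imputation is just one simple way to supply a missing value; the function names are illustrative.

```python
import statistics


def mark_invalid(values, low=0, high=120):
    """Noisy data: treat out-of-range values as missing (None)."""
    return [v if v is not None and low <= v <= high else None
            for v in values]


def impute_mean(values):
    """Missing data: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


raw_ages = [20, None, 40, 30, -5]  # hypothetical attribute values
cleaned = mark_invalid(raw_ages)
filled = impute_mean(cleaned)
print(filled)
```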

The effectiveness or usefulness of data mining should be measured using metrics:

ROI (return on investment): examines the difference between what the data mining technique costs and the savings or benefits it brings

Sales/advertising

Traditional metrics: space and time requirements, based on complexity analysis

Accuracy
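
Two of these metrics reduce to simple arithmetic, sketched below. The cost, benefit, and label values are hypothetical, and expressing ROI as a ratio of net return to cost is a common convention assumed here rather than something the slides state.

```python
def net_return(benefit, cost):
    """Difference between the benefit gained and the cost of the technique."""
    return benefit - cost


def roi_ratio(benefit, cost):
    """Net return as a fraction of the cost (assumed convention)."""
    return (benefit - cost) / cost


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical figures: the project cost 100000 and saved 150000.
print(net_return(150000, 100000))
print(roi_ratio(150000, 100000))

# Hypothetical predictions compared with the true labels.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```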

Social implications: profiling is the process of evaluating data from past sources and analyzing and summarizing useful information about it.

Example: similar credit card purchases

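Implementation issues

Scalability: algorithms that do not scale to large datasets are of limited use.

Real-world data: contains noisy data and missing values.

Update: many algorithms work only with static data.

Ease of use: algorithms may be difficult to use or to understand.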
