DM Lesson3

A database is a structured collection of data for storage and retrieval, while data mining involves analyzing that data to extract insights. Data mining can be descriptive or predictive, with various task primitives guiding the process. Major issues in data mining include handling diverse data types, ensuring efficiency and scalability, and integrating with data warehousing systems.

Uploaded by

Eugene A. Estacio

What is the difference between a database and data mining?

A database is a collection of structured data organized for efficient storage and retrieval, while data mining is the analysis of data to extract insights or patterns.

Descriptive data mining is used to summarize and describe the data, while predictive data mining is used to make predictions about future events.

Both techniques have their own advantages and applications, and the choice of technique depends on the specific problem and the nature of the data.

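
To make the descriptive/predictive distinction concrete, here is a minimal Python sketch. The sales figures and the function names `describe` and `fit_line` are assumptions made up for illustration; the least-squares line stands in for any predictive technique and is not prescribed by the lesson.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data, not from the lesson).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]


def describe(values):
    """Descriptive mining: summarize what the data already contains."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Predictive mining: fit y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


summary = describe(sales)                   # describes the past
slope, intercept = fit_line(months, sales)  # learns a trend
forecast_month_6 = slope * 6 + intercept    # predicts a future value
```

The summary only restates the existing data, while the fitted line is used to estimate a value that has not been observed yet.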
DATA MINING TASK PRIMITIVES:

A DM task is represented in the form of a DM query, which is defined in terms of DM task primitives.

These primitives allow the user to communicate interactively with the DM system.

There are 5 DM task primitives:

1. The set of task-relevant data to be mined
2. The kind of knowledge to be mined
3. The background knowledge to be used in the discovery process
4. The interestingness measures and thresholds for pattern evaluation
5. The expected representation for visualizing the discovered patterns
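
One way to picture a DM query built from the five primitives is as a small record with one field per primitive. The sketch below is only an illustration: the class name `MiningQuery`, its field names, and the sample values are all assumptions, not part of any real data mining query language.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container with one field per task primitive."""
    task_relevant_data: str                                    # 1. which data to mine
    knowledge_kind: str                                        # 2. e.g. "association"
    background_knowledge: dict = field(default_factory=dict)  # 3. e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # 4. measures and thresholds
    presentation: str = "table"                                # 5. how results are shown


# Example query; every value here is made up for illustration.
query = MiningQuery(
    task_relevant_data="sales_2023",
    knowledge_kind="association",
    background_knowledge={"city": "country"},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
```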
INTERESTINGNESS OF PATTERNS:

- A data mining system can generate millions of patterns.

- Among all the patterns generated, how many are really interesting?

- In practice, only a small fraction of the patterns generated would be of interest to any given user.

This raises three questions:

1. What makes a pattern interesting?
- it is easily understood by humans
- it is valid on new/test data
- it is potentially useful

2. Can a DM system generate all of the interesting patterns?
- refers to the completeness of a DM system
- in reality, it is not possible for a DM system to generate all interesting patterns

3. Can a DM system generate only interesting patterns?
- refers to the optimization of a DM system
- if only interesting patterns are generated, the system becomes easier and more efficient for the user (time is saved)
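
As a rough illustration of filtering patterns by interestingness, the sketch below scores a rule with support and confidence and keeps it only if both pass a threshold. Support and confidence are common objective measures but are an assumption here, since the lesson does not name specific measures; the transactions and threshold values are invented.

```python
# Toy market-basket data (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def is_interesting(antecedent, consequent, transactions,
                   min_support=0.5, min_confidence=0.6):
    """Keep a rule only if it passes both (assumed) thresholds."""
    rule_support = support(antecedent | consequent, transactions)
    rule_confidence = confidence(antecedent, consequent, transactions)
    return rule_support >= min_support and rule_confidence >= min_confidence


keep_bread_milk = is_interesting({"bread"}, {"milk"}, transactions)
keep_butter_milk = is_interesting({"butter"}, {"milk"}, transactions)
```

With these toy numbers, the rule bread -> milk passes both thresholds, while butter -> milk is rejected because too few transactions contain both items.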
INTEGRATING A DATA MINING SYSTEM WITH A DB/DW SYSTEM
Integration means combining the DM system with a DB/DW system so that the two can communicate.
If there is no integration, the DM system has no communication with the DB.
There are 4 integration schemes:
1. No coupling (0/100)
- Coupling means combining.
- There is no communication with the DB.
- Instead, the DM system reads its data from other storage, such as a file system.
2. Loose coupling (e.g. 15/100)
- Uses some of the functionality of the DB/DW system, but only to a limited extent.
- Better than no coupling, because the DM system can fetch its data from the DB.
- Suitable for small data sets.
3. Semitight coupling (50/100)
- The DM system is linked to the DB.
- Some of the DM primitives are also implemented in the DB.
4. Tight coupling (100/100)
- The DM system is completely linked to the DB.
- Most efficient of the four schemes.
- The DB system is fully integrated, in such a way that it becomes part of the DM system.
- Allows an efficient and optimized implementation of DM.
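
The difference between the looser and tighter schemes can be sketched with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: the `purchases` table and its rows are invented, and a `GROUP BY` query merely stands in for "mining work done inside the database"; real tight coupling would mean mining primitives implemented in the DB engine itself.

```python
import sqlite3
from collections import Counter

# In-memory database with a hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("a", "bread"), ("a", "milk"), ("b", "bread"),
     ("c", "milk"), ("c", "bread")],
)

# Loose coupling: the DB is used only to fetch rows;
# all counting happens in the application.
rows = conn.execute("SELECT item FROM purchases").fetchall()
loose_counts = Counter(item for (item,) in rows)

# Tighter coupling (approximation): the aggregation is pushed
# into the database, and only the result is returned.
tight_counts = dict(
    conn.execute("SELECT item, COUNT(*) FROM purchases GROUP BY item").fetchall()
)
conn.close()
```

Both approaches give the same counts; what changes is where the work is done and how much data has to leave the database.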
MAJOR ISSUES IN DATA WAREHOUSING AND MINING

1. Mining different kinds of knowledge in databases

2. Interactive mining of knowledge at multiple levels of abstraction
3. Incorporation of background knowledge
4. Presentation and visualization of data mining results
5. Handling noise and incomplete data
6. Efficiency and scalability of data mining algorithms
Issues:
Data mining is not an easy task: the algorithms used can be very complex, and the data is not always available in one place, so it has to be integrated from various heterogeneous data sources. These factors create a number of issues. This lesson discusses the major issues regarding:

- Mining Methodology and User Interaction
- Performance Issues
- Diverse Data Types Issues

[Figure: diagram of the major issues in data mining]
Mining Methodology and User Interaction Issues:

This refers to the following kinds of issues:

- Mining different kinds of knowledge in databases: Different users may be interested in different kinds of knowledge. Therefore, data mining needs to cover a broad range of knowledge discovery tasks.
- Interactive mining of knowledge at multiple levels of abstraction: The data mining process needs to be interactive, so that users can focus the search for patterns, providing and refining data mining requests based on the returned results.
- Incorporation of background knowledge: Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but also at multiple levels of abstraction.
- Data mining query languages and ad hoc data mining: A data mining query language that allows the user to describe ad hoc mining tasks should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.
- Presentation and visualization of data mining results: Once patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable.
- Handling noisy or incomplete data: Data cleaning methods are required to handle noise and incomplete objects while mining data regularities. Without data cleaning methods, the accuracy of the discovered patterns will be poor.
- Pattern evaluation: Many discovered patterns are uninteresting because they either represent common knowledge or lack novelty, so the patterns need to be evaluated for interestingness.
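
The point about noisy and incomplete data can be illustrated with a very small cleaning step. The sketch below is one simple option among many: the records, the plausible range for `age`, and the choice to fill bad values with the median are all assumptions for illustration.

```python
from statistics import median

# Hypothetical records with a missing value and a noisy one.
records = [
    {"age": 25, "income": 30000},
    {"age": None, "income": 42000},   # incomplete: age is missing
    {"age": 31, "income": 39000},
    {"age": 400, "income": 38000},    # noise: implausible age
    {"age": 40, "income": 50000},
]


def clean(records, name, low, high):
    """Replace missing or out-of-range values of one field with the median
    of the valid values. The range [low, high] is an assumed sanity check."""
    def is_valid(value):
        return value is not None and low <= value <= high

    fill = median(r[name] for r in records if is_valid(r[name]))
    return [
        {**r, name: r[name] if is_valid(r[name]) else fill}
        for r in records
    ]


cleaned = clean(records, "age", low=0, high=120)
cleaned_ages = [r["age"] for r in cleaned]
```

Mining the uncleaned ages would let the value 400 distort any average or pattern involving age, which is the loss of accuracy the lesson warns about.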
Performance Issues:
There can be performance-related issues such as the following:

- Efficiency and scalability of data mining algorithms: To extract information effectively from the huge amounts of data in databases, data mining algorithms must be efficient and scalable.
- Parallel, distributed, and incremental mining algorithms: Factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions, which are processed in parallel; the results from the partitions are then merged. Incremental algorithms incorporate database updates without mining the data again from scratch.
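
The partition, process, and merge idea, together with incremental updating, can be sketched as follows. Item counting stands in for a real mining algorithm, and the function names are hypothetical. A thread pool is used only to show the structure; in CPython, threads do not give a true speed-up for CPU-bound work, so a real system would use processes or a distributed framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Mine one partition: here, simply count item occurrences."""
    counts = Counter()
    for transaction in partition:
        counts.update(transaction)
    return counts


def mine_in_partitions(transactions, n_partitions=2):
    """Split the data, process each partition concurrently, merge the results."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_results = list(pool.map(count_items, partitions))
    merged = Counter()
    for partial in partial_results:
        merged.update(partial)
    return merged


def update_incrementally(counts, new_transactions):
    """Fold new data into existing results without re-mining the old data."""
    updated = Counter(counts)
    updated.update(count_items(new_transactions))
    return updated


old_data = [["bread", "milk"], ["bread"], ["milk", "butter"], ["bread", "butter"]]
new_data = [["milk"], ["bread", "milk"]]

counts = mine_in_partitions(old_data)
counts_after_update = update_incrementally(counts, new_data)
```

The incremental result equals what a full re-run over the old and new data together would produce, but only the new transactions are scanned.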
Diverse Data Types Issues:
- Handling of relational and complex types of data: A database may contain complex data objects, multimedia data objects, spatial data, temporal data, etc. It is not realistic to expect one system to mine all these kinds of data.
- Mining information from heterogeneous databases and global information systems: The data is available from different data sources on a LAN or WAN. These data sources may be structured, semi-structured, or unstructured, so mining knowledge from them adds challenges to data mining.
