Data Mining Techniques for Managers (DMTM)
By Kushal Anjaria
Session-1
Nowadays, we are witnessing enormous growth of data, from
terabytes to petabytes. This poses a major challenge for
companies that must manage, analyse, and visualize data. In this
note, we will discuss the top big data trends that are shaping
the future. For data, we have multiple data collection tools
and sources. The devices include various types of disks,
servers, and processing units. The sources can be
classified into three categories:
1. Business: transactions, stocks, the Web, and e-commerce
2. Science: remote sensing, bioinformatics, and simulations
3. Society: news, digital cameras, social networking sites, and so on
“We are drowning in data but starving for knowledge.” – Prof.
Pabitra Mitra. Data is growing exponentially, and the volume of
data is expected to double every two years. As a result,
there is a tremendous increase in the demand for data
scientists who can manage this data and make sense of it. In
this situation, data mining comes into the picture.
Definition of data mining: the extraction of interesting,
nontrivial, implicit, previously unknown, and potentially
useful patterns or knowledge from a vast amount of data is
known as data mining.

An alternative name for data mining is Knowledge Discovery
from Data (KDD). KDD can be defined as “the process of
discovering new patterns in large data sets to gain insight into
the problem at hand.” Data mining is a subset of KDD that
involves the use of specific algorithms and approaches to
analyse large datasets. While doing data mining, one should
be clear about what data mining is not: for example, a simple
search in a search engine or a query in a database is not a
data mining procedure.

A data mining process is a set of tasks to analyse data,
uncover patterns, and make predictions. It can be used for
many purposes, such as fraud detection, marketing analysis,
and business intelligence. Data mining uses various techniques,
such as machine learning and statistical pattern recognition,
to identify hidden patterns in large datasets, and it focuses on
understanding the relationships between variables or items in a
dataset. Data analytics, by contrast, is a process that uses
statistical and mathematical techniques to extract information
from data in order to reveal patterns and relationships.

A normal data analytics procedure will not be able to handle
the following:
• Data streams (from sensors), time-series data, temporal data, sequential data
• Graphs, graphical data, multi-linked data, social network data
• Heterogeneous databases and legacy databases
• Multimedia, large text, and web data
• Simulation and forecasting data

Procedure for knowledge discovery from data
• In this diagram, the entire KDD process is described.
• From the vast amount of data, it is crucial to search for the attributes that fulfil our requirements. This process is known as the selection process.
• Once the data is selected, we check whether any data points are missing. This task of scanning the data is known as data pre-processing. In the pre-processing stage, we may also fill in missing data points using statistical functions.
• In the transformation phase, we combine our data into meaningful repositories. We create a data warehouse where relational databases may be given formal meanings and interpretations.
• In the data mining phase, we apply mathematical models and data mining algorithms to the transformed data. This stage helps us identify the underlying patterns in the data, using various statistical analysis and learning algorithms.
• The final stage of the KDD process is interpretation and evaluation. In this phase, we convert the patterns obtained in the data mining phase into a human-understandable form. Only proper data interpretation and visualization leads to knowledge generation.
• The entire KDD process is iterative: once knowledge is generated, one can go back to the data selection and pre-processing stages.

We will start our discussion on data mining by understanding
the meaning of data.

In this course, we consider data in tabular form. Suppose a
bank has provided us with historical data. From the patterns
available in the data, we intend to evaluate new loan
applications: we aim to identify whether a new applicant is
fraudulent or legitimate.
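As a toy illustration of the KDD stages described above applied to tabular loan data, the pipeline can be sketched in a few lines of Python. The attribute names, values, and the one-rule "model" are all invented for this sketch; a real data mining phase would use an actual learning algorithm.

```python
from statistics import mean

# Toy loan-application records (attributes are illustrative only).
raw = [
    {"id": 1001, "age": 35, "income": 52000, "defaulted": False},
    {"id": 1002, "age": 51, "income": None,  "defaulted": True},
    {"id": 1003, "age": 28, "income": 31000, "defaulted": False},
]

# 1. Selection: keep only the attributes relevant to the question at hand.
selected = [{"age": r["age"], "income": r["income"], "defaulted": r["defaulted"]}
            for r in raw]

# 2. Pre-processing: fill the missing income using a statistical function (the mean).
known = [r["income"] for r in selected if r["income"] is not None]
for r in selected:
    if r["income"] is None:
        r["income"] = mean(known)

# 3. Transformation: rescale income so attribute values are comparable.
max_income = max(r["income"] for r in selected)
for r in selected:
    r["income"] /= max_income

# 4. Data mining: a trivial one-rule "model" standing in for a real algorithm.
pattern = [r for r in selected if r["income"] < 0.9]

# 5. Interpretation/evaluation: present the pattern in human-readable form.
print(f"{len(pattern)} applicants flagged for review out of {len(selected)}")
# prints: 2 applicants flagged for review out of 3
```

Because the process is iterative, in practice one would inspect the flagged applicants and then return to the selection and pre-processing stages with better attributes.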
The data will have specific attributes and objects, and it can
be structured or unstructured. The data mining process is a
complex process that involves many factors; it includes data
cleaning, data integration, data pre-processing, data mining
algorithms, and data visualization.

In the above example, the table columns are the attributes, and
the rows of the table are the records. In the data mining
literature, attributes or columns are also known as features,
variables, or inputs.

One important thing to note about this representation is that
each row can be thought of as a vector whose components are the
individual attribute values. These vectors are sometimes known
as object vectors or feature vectors. Mathematically, each
vector has a dimension associated with it: the number of
attributes determines the dimension of the vectors. In the
present example, we have five attributes, so each record is a
five-dimensional vector. In data mining, each vector is
considered a point in a coordinate system; for the present
example, the ten objects can be represented as data points in
the five-dimensional coordinate system.

Furthermore, a bank may have a collection of one lakh loan
applications from the past year. These loans can be thought of
as one lakh points in a five-dimensional coordinate system,
and this plotting exercise helps you visualize the nature of
the data.

There are four types of attributes used in the data mining
process:

1. Nominal attributes: e.g., ID numbers, eye colour, zip codes
Nominal attributes are just symbols; in other words, they are
arbitrary. For a coin, attributes such as its colour and size
are nominal: they describe it without measuring anything. For
example, the ID number of a bank account is only a number; it
has no other meaning. Similarly, eye colour (black, brown, or
blue) and the zip or PIN code of a place are just values or
symbols. These attributes act as identifiers. Why are they
only symbols? Suppose one person has bank account number 1001
and another has account number 1002; you cannot say that the
person with 1002 is greater than the person with 1001. You
cannot compare these values; they are purely symbolic.
Consider another example: one person's name is Kushal Anjaria
and another's is, say, Ram Kumar. A name tells us nothing
beyond identity.

2. Ordinal attributes: e.g., rankings, grades, weights, measurements
Ordinal values can be compared and ordered. For example, you
may rate a movie or a packet of potato chips on a scale of
1 to 10 according to how good or bad it is.

3. Interval attributes: e.g., dates, temperature ranges
The values of interval attributes represent positions in some
interval space. For example, a calendar date tells you whether
another date, say the date of a loan application, falls within
some time interval or not. From interval attribute values, you
can say that one value belongs to a given interval and another
does not.

4. Ratio attributes: e.g., time, or temperature on a scale
where you can change the unit and meaningful ratios can be
obtained.

Properties of the attributes:
Which of the four types an attribute belongs to depends on
which of the following properties it possesses:
1. Distinctness
2. Order
3. Addition or subtraction
4. Multiplication or division

Nominal attribute: distinctness
Ordinal attribute: distinctness and order
Interval attribute: distinctness, order, and addition
Ratio attribute: all four properties above

Each data mining algorithm is defined with respect to which
attributes we use and which properties those attributes
possess. The attribute types and the operations they support
are summarized above. In data mining, whatever operations you
perform have to be compatible with the attribute type.

An attribute can also be represented as a discrete or a
continuous attribute.

Discrete attribute:
• Has a finite or countably infinite set of values
• Examples: zip codes, the set of words in a document, counts of any entity, the number of accounts in a bank, the number of products in a warehouse
• Often represented as integer values
• Note that binary attributes are a special case of discrete attributes

Continuous attributes:
• Have real numbers as attribute values
• Examples: temperature, height, weight
• In practice, real numbers are represented using a finite number of digits
• Continuous variables are represented as floating-point variables

Before going on to data transformation, it is crucial to check
the quality of the data. Data can be considered of bad quality
if:
• Some attribute values are missing
• The data domain is not satisfied
• Incorrect data has been inserted
• Duplicate or redundant data exists
• The data contains noise or distortion
• The data contains outliers (e.g., a percentage attribute with values above 100%, or a decimal value with the point missing)

Data pre-processing increases the value of the data. Moreover,
it also decreases the computational load. Data pre-processing
tasks can be completed in the following ways:
1. Aggregation: Aggregation means that you sometimes consider a group of data together, and then the cumulative information from all of it is used.
2. Sampling: In the sampling technique, only a few representative data points are kept, and the rest are discarded. The idea is that the sample alone is enough for processing.
3. Dimensionality reduction: We pick only the required characteristics of the data. For example, if you go to a doctor with many symptoms and many measurements, the doctor will not look at all of them; the doctor will select a few of them and complete the diagnosis.
4. Discretization and binarization: Sometimes we have to convert continuous data into a discrete or binary form.

Now, we will focus on data transformation and pattern mining.
The first pattern we will consider is known as association
rules. The association pattern originated in one of the
earliest uses of data mining, in retail. Say, for example, you
go to a supermarket or a mall and buy some items. The bill can
be recorded after a person buys something, so for each
transaction or purchase by a customer there is a row for that
basket of items, and over time there is a massive number of
rows. In the table below, the rows describe different
transactions: TID 1 is transaction ID 1, the first customer's
transaction; the next row is the subsequent customer's
transaction, and so on. Along with each transaction, the items
purchased by that customer are recorded. You can see in the
table that customer one has bought bread and milk; customer two
has bought bread, diaper, beer, and eggs; customer three has
bought milk, diaper, beer, and coke; and so on.

These types of transactions are called market basket
transactions. Each transaction consists of two parts: the first
is the ID of the transaction of a particular customer, and the
second is the list of items purchased by that customer. Suppose
every day thousands of people come to the supermarket and make
this kind of transaction; over, say, one or two years, there
will be an enormous amount of data. IBM was the first company
to analyse this type of data and come up with the association
rule generation and mining technique.

Let's observe the following table and find out what IBM
discovered from the data.

TID  ITEMS
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

The table shows that people who buy bread and milk are likely
to also buy diapers, and people who buy diapers are likely to
buy beer. This kind of pattern has commercial significance:
for example, a store could offer customers who buy diapers a
discount on beer, or arrange the placement of items in the
store accordingly. Now the question is: from a vast amount of
data, how do we calculate association rules?

For association rules, the following terminology is useful:
1. Itemset: a collection of one or more items, e.g., {Bread, Milk, Diaper, Coke}. A k-itemset is an itemset that contains k items.
2. Support count (𝜎): the frequency of occurrence of an itemset, e.g., 𝜎({Bread, Milk, Diaper}) = 2.
3. Support (s): the fraction of transactions that contain an itemset, e.g., s({Bread, Milk, Diaper}) = 2/5.
4. Frequent itemset: an itemset whose support is greater than or equal to some minimum support threshold.
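These definitions can be checked directly against the five transactions in the table. The sketch below (plain Python, no libraries) computes support count and support, and enumerates frequent itemsets by brute force; a real miner would use an algorithm such as Apriori or FP-Growth instead of trying every combination.

```python
from itertools import combinations

# The five market-basket transactions from the table (TID -> items).
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

def support_count(itemset):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for items in transactions.values() if itemset <= items)

def support(itemset):
    """s(X): fraction of transactions containing X."""
    return support_count(itemset) / len(transactions)

def frequent_itemsets(min_support):
    """Brute-force enumeration of all itemsets with s(X) >= min_support.
    Feasible here because there are only six distinct items."""
    all_items = sorted(set().union(*transactions.values()))
    frequent = {}
    for k in range(1, len(all_items) + 1):
        for combo in combinations(all_items, k):
            s = support(set(combo))
            if s >= min_support:
                frequent[combo] = s
    return frequent

print(support_count({"Bread", "Milk", "Diaper"}))  # -> 2
print(support({"Bread", "Milk", "Diaper"}))        # -> 0.4
print(frequent_itemsets(0.6))
```

With a minimum support threshold of 0.6 (at least three of the five transactions), the frequent itemsets are the four popular single items and the pairs {Bread, Milk}, {Bread, Diaper}, {Milk, Diaper}, and {Diaper, Beer} — exactly the co-purchase patterns discussed above.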