Predictive Analytics

The document outlines the role of forecasting in supply chain management, emphasizing its importance for planning decisions across various processes. It discusses the characteristics of forecasts, methods for forecasting, and key points in the forecasting process, including the integration of demand planning. Additionally, it introduces machine learning techniques such as decision trees and entropy for predictive analytics.


BY KHAKIM HABIBI
RENNES SCHOOL OF BUSINESS

Outline
◦ Forecasting
◦ Machine Learning

Forecasting

Role of Forecasting in a Supply Chain
• The basis for all planning decisions in a supply chain
• Used for both push and pull processes
◦ Production scheduling, inventory, aggregate planning
◦ Sales force allocation, promotions, new product introduction
◦ Plant/equipment investment, budgetary planning
◦ Workforce planning, hiring, layoffs
• All of these decisions are interrelated

Chopra and Meindl (2019)
Characteristics of Forecasts
1. Forecasts are always inaccurate and should thus include both the expected value of the forecast and a measure of forecast error
2. Long-term forecasts are usually less accurate than short-term forecasts
3. Aggregate forecasts are usually more accurate than disaggregate forecasts
4. In general, the farther up the supply chain a company is, the greater is the distortion of information it receives

Components and Methods (1 of 2)
• Companies must identify the factors that influence future demand and then ascertain the relationship between these factors and future demand
◦ Past demand
◦ Lead time of product replenishment
◦ Planned advertising or marketing efforts
◦ Planned price discounts
◦ State of the economy
◦ Actions that competitors have taken

Components and Methods (2 of 2)
1. Qualitative
◦ Primarily subjective
◦ Rely on judgment
2. Time Series
◦ Use historical demand only
◦ Best with stable demand
3. Causal
◦ Relationship between demand and some other factor
4. Simulation
◦ Imitate consumer choices that give rise to demand

Components of an Observation
Observed demand (O) = systematic component (S) + random component (R)
Systematic component – expected value of demand
– Level (current deseasonalized demand)
– Trend (growth or decline in demand)
– Seasonality (predictable seasonal fluctuation)
Random component – part of forecast that deviates from systematic part
Forecast error – difference between forecast and actual demand

Chopra and Meindl (2019)
Five Important Points in the Forecasting Process
1. Understand the objective of forecasting.
2. Integrate demand planning and forecasting throughout the supply chain.
3. Identify the major factors that influence the demand forecast.
4. Forecast at the appropriate level of aggregation.
5. Establish performance and error measures for the forecast.

Chopra and Meindl (2019)

Techniques
◦ Simple Average Smoothing
◦ Exponential Smoothing
◦ Moving Average
◦ ARIMA
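As a rough illustration (not part of the original slides), the sketch below applies two of the listed techniques, a 3-period moving average and simple exponential smoothing, to a made-up demand series; the series and the smoothing constant alpha = 0.3 are illustrative assumptions.

```python
demand = [100, 104, 98, 110, 115, 109, 120, 118]   # toy demand history (assumed)

def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exponential_smoothing(series, alpha=0.3):
    """Run simple exponential smoothing over the series;
    the final smoothed level is the forecast for the next period."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

print(moving_average_forecast(demand, window=3))    # (109 + 120 + 118) / 3
print(round(exponential_smoothing(demand, 0.3), 2))
```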

Machine Learning

Machine Learning (ML)
◦ Supervised
  – Linear Regression
  – Decision Tree
  – K-Nearest Neighbor
◦ Unsupervised
  – K-Means
  – Hierarchical Clustering
Decision Tree

Outline
◦ Structure
◦ Entropy and Information Gain
◦ An example
◦ Issues
  – Overfitting
  – Continuous Variables

Structure
(Figure: the structure of a decision tree. The data is split step by step, with the aim of ending up with homogeneous sub-sets of the data at the leaves.)

Entropy
Entropy measures how mixed a set is; it is highest when the probabilities of the − and + classes are equal (0.5 and 0.5).
Entropy
A measure of the impurity of a collection of examples:

$$Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $p_i$ is the probability of element/class $i$ in our data.

Information Gain
Measures the expected reduction of entropy. The gain of an attribute $A$, relative to a collection of examples $S$, denoted $Gain(S, A)$, is defined as

$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Entropy(S_v)$$
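As a small illustration (not part of the slides), the sketch below implements these two formulas in Python and checks them on the Play Tennis example used in the following slides; the 14 class labels and Outlook values are reconstructed from the splits shown there.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(attribute_values):
        subset = [y for x, y in zip(attribute_values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Play Tennis: 9 'Yes' / 5 'No' over days D1..D14, with their Outlook values
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]

print(round(entropy(play), 2))                    # 0.94
print(round(information_gain(outlook, play), 3))  # ~0.247 (0.2464 on the slides, up to rounding)
```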

Data Set
(Figure: the 14-day Play Tennis data set, with the four attribute columns Outlook, Temperature, Humidity and Wind, and the class to predict, whether tennis is played.)
Notes: c in the entropy formula equals 2 here, because there are only two classes (Yes and No). An entropy of 0 means a set is completely homogeneous; an entropy of 1 means it is completely heterogeneous.

Entropies of Attribute: Outlook
Values = Sunny, Overcast, Rain
The whole data set contains 9 Yes and 5 No examples, so $S = [9+, 5-]$ and
$Entropy(S) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) = 0.94$,
a value close to 1, so the full set is quite heterogeneous.
$S_{Sunny} \leftarrow [2+, 3-]$: $Entropy(S_{Sunny}) = -(2/5)\log_2(2/5) - (3/5)\log_2(3/5) = 0.971$
$S_{Overcast} \leftarrow [4+, 0-]$: $Entropy(S_{Overcast}) = 0$ (all examples positive)
$S_{Rain} \leftarrow [3+, 2-]$: $Entropy(S_{Rain}) = -(3/5)\log_2(3/5) - (2/5)\log_2(2/5) = 0.971$
$Gain(S, Outlook) = Entropy(S) - (5/14)\,Entropy(S_{Sunny}) - (4/14)\,Entropy(S_{Overcast}) - (5/14)\,Entropy(S_{Rain}) = 0.2464$

Entropies of Attribute: Humidity
Values = High, Normal
$S = [9+, 5-]$, $Entropy(S) = 0.94$
$S_{High} \leftarrow [3+, 4-]$: $Entropy(S_{High}) = -(3/7)\log_2(3/7) - (4/7)\log_2(4/7) = 0.985$
$S_{Normal} \leftarrow [6+, 1-]$: $Entropy(S_{Normal}) = -(6/7)\log_2(6/7) - (1/7)\log_2(1/7) = 0.5916$
$Gain(S, Humidity) = Entropy(S) - (7/14)\,Entropy(S_{High}) - (7/14)\,Entropy(S_{Normal}) = 0.1516$
Entropies of Attribute: Wind
Values = Strong, Weak
$S = [9+, 5-]$, $Entropy(S) = 0.94$
$S_{Strong} \leftarrow [3+, 3-]$: $Entropy(S_{Strong}) = -(3/6)\log_2(3/6) - (3/6)\log_2(3/6) = 1$
$S_{Weak} \leftarrow [6+, 2-]$: $Entropy(S_{Weak}) = -(6/8)\log_2(6/8) - (2/8)\log_2(2/8) = 0.8113$
$Gain(S, Wind) = Entropy(S) - (6/14)\,Entropy(S_{Strong}) - (8/14)\,Entropy(S_{Weak}) = 0.0478$

Recapitulation
$Gain(S, Outlook) = 0.2464$
$Gain(S, Temperature) = 0.0289$
$Gain(S, Humidity) = 0.1516$
$Gain(S, Wind) = 0.0478$
Outlook has the largest information gain, so it becomes the root attribute.
Tree after the first split (root attribute: Outlook), starting from {D1, D2, …, D14} with [9+, 5−]:
◦ Sunny → {D1, D2, D8, D9, D11}, [2+, 3−] → ?
◦ Overcast → {D3, D7, D12, D13}, [4+, 0−] → Yes
◦ Rain → {D4, D5, D6, D10, D14}, [3+, 2−] → ?
Next, consider the datapoints having 'Sunny' as Outlook.

Entropies of Attribute: Temperature (within the Sunny branch)
Values = Hot, Mild, Cool
$S_{Sunny} = [2+, 3-]$: $Entropy(S_{Sunny}) = -(2/5)\log_2(2/5) - (3/5)\log_2(3/5) = 0.97$
$S_{Hot} \leftarrow [0+, 2-]$: $Entropy(S_{Hot}) = 0$
$S_{Mild} \leftarrow [1+, 1-]$: $Entropy(S_{Mild}) = 1$
$S_{Cool} \leftarrow [1+, 0-]$: $Entropy(S_{Cool}) = 0$
$Gain(S_{Sunny}, Temperature) = Entropy(S_{Sunny}) - (2/5)\,Entropy(S_{Hot}) - (2/5)\,Entropy(S_{Mild}) - (1/5)\,Entropy(S_{Cool}) = 0.570$

Entropies of Attribute: Humidity (within the Sunny branch)
Values = High, Normal
$S_{Sunny} = [2+, 3-]$, $Entropy(S_{Sunny}) = 0.97$
$S_{High} \leftarrow [0+, 3-]$: $Entropy(S_{High}) = 0$
$S_{Normal} \leftarrow [2+, 0-]$: $Entropy(S_{Normal}) = 0$
$Gain(S_{Sunny}, Humidity) = Entropy(S_{Sunny}) - (3/5)\,Entropy(S_{High}) - (2/5)\,Entropy(S_{Normal}) = 0.97$
Entropies of Attribute: Wind (within the Sunny branch)
Values = Strong, Weak
$S_{Sunny} = [2+, 3-]$: $Entropy(S_{Sunny}) = -(2/5)\log_2(2/5) - (3/5)\log_2(3/5) = 0.97$
$S_{Strong} \leftarrow [1+, 1-]$: $Entropy(S_{Strong}) = 1$
$S_{Weak} \leftarrow [1+, 2-]$: $Entropy(S_{Weak}) = -(1/3)\log_2(1/3) - (2/3)\log_2(2/3) = 0.9183$
$Gain(S_{Sunny}, Wind) = Entropy(S_{Sunny}) - (2/5)\,Entropy(S_{Strong}) - (3/5)\,Entropy(S_{Weak}) = 0.0192$

Recapitulation (Sunny branch)
$Gain(S_{Sunny}, Temperature) = 0.57$
$Gain(S_{Sunny}, Humidity) = 0.97$
$Gain(S_{Sunny}, Wind) = 0.0192$
Humidity has the largest gain, so the Sunny branch is split on Humidity.

Tree after splitting the Sunny branch on Humidity (root {D1, D2, …, D14}, [9+, 5−]):
◦ Sunny → {D1, D2, D8, D9, D11}, [2+, 3−] → Humidity
  – High → {D1, D2, D8} → No
  – Normal → {D9, D11} → Yes
◦ Overcast → {D3, D7, D12, D13}, [4+, 0−] → Yes
◦ Rain → {D4, D5, D6, D10, D14}, [3+, 2−] → ?
Datapoints having 'Rain' as Outlook

Entropies of Attribute: Temperature (within the Rain branch)
Values = Hot, Mild, Cool
$S_{Rain} = [3+, 2-]$: $Entropy(S_{Rain}) = -(3/5)\log_2(3/5) - (2/5)\log_2(2/5) = 0.97$
$S_{Hot} \leftarrow [0+, 0-]$: $Entropy(S_{Hot}) = 0$
$S_{Mild} \leftarrow [2+, 1-]$: $Entropy(S_{Mild}) = -(2/3)\log_2(2/3) - (1/3)\log_2(1/3) = 0.9183$
$S_{Cool} \leftarrow [1+, 1-]$: $Entropy(S_{Cool}) = 1$
$Gain(S_{Rain}, Temperature) = Entropy(S_{Rain}) - (0/5)\,Entropy(S_{Hot}) - (3/5)\,Entropy(S_{Mild}) - (2/5)\,Entropy(S_{Cool}) = 0.0192$

Entropies of Attribute: Humidity (within the Rain branch)
Values = High, Normal
$S_{Rain} = [3+, 2-]$, $Entropy(S_{Rain}) = 0.97$
$S_{High} \leftarrow [1+, 1-]$: $Entropy(S_{High}) = 1$
$S_{Normal} \leftarrow [2+, 1-]$: $Entropy(S_{Normal}) = -(2/3)\log_2(2/3) - (1/3)\log_2(1/3) = 0.9183$
$Gain(S_{Rain}, Humidity) = Entropy(S_{Rain}) - (2/5)\,Entropy(S_{High}) - (3/5)\,Entropy(S_{Normal}) = 0.0192$

Entropies of Attribute: Wind (within the Rain branch)
Values = Strong, Weak
$S_{Rain} = [3+, 2-]$, $Entropy(S_{Rain}) = 0.97$
$S_{Strong} \leftarrow [0+, 2-]$: $Entropy(S_{Strong}) = 0$
$S_{Weak} \leftarrow [3+, 0-]$: $Entropy(S_{Weak}) = 0$
$Gain(S_{Rain}, Wind) = Entropy(S_{Rain}) - (2/5)\,Entropy(S_{Strong}) - (3/5)\,Entropy(S_{Weak}) = 0.97$
Recapitulation (Rain branch)
$Gain(S_{Rain}, Temperature) = 0.0192$
$Gain(S_{Rain}, Humidity) = 0.0192$
$Gain(S_{Rain}, Wind) = 0.97$
Wind has the largest gain, so the Rain branch is split on Wind.

Current tree (root {D1, D2, …, D14}, [9+, 5−]):
◦ Sunny → {D1, D2, D8, D9, D11}, [2+, 3−] → Humidity
  – High → {D1, D2, D8} → No
  – Normal → {D9, D11} → Yes
◦ Overcast → {D3, D7, D12, D13}, [4+, 0−] → Yes
◦ Rain → {D4, D5, D6, D10, D14}, [3+, 2−] → ?

Final tree (root {D1, D2, …, D14}, [9+, 5−]):
◦ Sunny → {D1, D2, D8, D9, D11}, [2+, 3−] → Humidity
  – High → {D1, D2, D8} → No
  – Normal → {D9, D11} → Yes
◦ Overcast → {D3, D7, D12, D13}, [4+, 0−] → Yes
◦ Rain → {D4, D5, D6, D10, D14}, [3+, 2−] → Wind
  – Strong → {D6, D14} → No
  – Weak → {D4, D5, D10} → Yes

Issues in Decision Tree Learning
◦ Overfitting the Data
◦ Incorporating Continuous Attributes
◦ Missing Attribute Values
◦ Different Costs in Attributes
◦ Alternative Measures for Selecting Attributes
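For comparison (not part of the slides), the same example can be fed to scikit-learn, which also grows its tree with the entropy criterion. scikit-learn and pandas are assumed dependencies, and the table below is the full 14-row Play Tennis data set implied by the counts on the previous slides.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
                    "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"],
    "Temperature": ["Hot","Hot","Hot","Mild","Cool","Cool","Cool",
                    "Mild","Cool","Mild","Mild","Mild","Hot","Mild"],
    "Humidity":    ["High","High","High","High","Normal","Normal","Normal",
                    "High","Normal","Normal","Normal","High","Normal","High"],
    "Wind":        ["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
                    "Weak","Weak","Weak","Strong","Strong","Weak","Strong"],
    "Play":        ["No","No","Yes","Yes","Yes","No","Yes",
                    "No","Yes","Yes","Yes","Yes","Yes","No"],
})

X = pd.get_dummies(data.drop(columns="Play"))   # one-hot encode the categorical attributes
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, data["Play"])
print(export_text(clf, feature_names=list(X.columns)))
```

Because the categorical attributes are one-hot encoded, the printed tree may phrase its splits differently (for example "Outlook_Overcast <= 0.5"), but it separates the same 14 examples.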
Avoid Overfitting in Decision Trees
◦ Stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data
◦ Or allow the tree to overfit the data, and then post-prune the tree
  ◦ Split the training data into two parts, training and validation, and use the validation set to assess the utility of post-pruning
  ◦ Reduced-error pruning
  ◦ Rule post-pruning

Reduced-Error Pruning
1. Each node is a candidate for pruning
2. Pruning consists in removing the subtree rooted at a node; the node becomes a leaf and is assigned the most common classification
3. A node is removed only if the resulting pruned tree performs no worse than the original on the validation set
4. Nodes are pruned iteratively: at each iteration, the node whose removal most increases accuracy on the validation set is pruned

Rule Post-Pruning
1. Convert the tree into an equivalent set of rules
  ◦ Each path from the root to a leaf corresponds to a rule
  ◦ Each node along a path corresponds to a pre-condition
  ◦ Each leaf classification corresponds to the post-condition
2. Prune (generalize) each rule by removing those preconditions whose removal improves accuracy over the validation set
Rule Post-Pruning: rules obtained from the final tree
◦ Outlook = Sunny ∧ Humidity = High → No
◦ Outlook = Sunny ∧ Humidity = Normal → Yes
◦ Outlook = Overcast → Yes
◦ Outlook = Rain ∧ Wind = Strong → No
◦ Outlook = Rain ∧ Wind = Weak → Yes

To prune the first rule, break it into the following candidate generalizations, each obtained by dropping one precondition:
◦ Outlook = Sunny → No
◦ Humidity = High → No
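As a small illustration (not from the slides), the rule set above can be stored as plain data and applied with a first-match classifier; the instance at the bottom is made up.

```python
rules = [  # (preconditions, post-condition), mirroring the five rules above
    ({"Outlook": "Sunny", "Humidity": "High"},   "No"),
    ({"Outlook": "Sunny", "Humidity": "Normal"}, "Yes"),
    ({"Outlook": "Overcast"},                    "Yes"),
    ({"Outlook": "Rain", "Wind": "Strong"},      "No"),
    ({"Outlook": "Rain", "Wind": "Weak"},        "Yes"),
]

def classify(instance, rules, default="Yes"):
    """Return the post-condition of the first rule whose preconditions all hold."""
    for preconditions, label in rules:
        if all(instance.get(attr) == value for attr, value in preconditions.items()):
            return label
    return default

print(classify({"Outlook": "Rain", "Wind": "Weak", "Humidity": "High"}, rules))  # Yes
```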

Dealing with Continuous Attributes
Example: Temperature in the Play Tennis data
◦ Sort the examples according to Temperature
◦ Determine candidate thresholds by averaging consecutive values where there is a change in classification: (48 + 60)/2 = 54 and (80 + 90)/2 = 85
◦ Evaluate the information gain of the candidate threshold attributes Temperature > 54 and Temperature > 85, then select the threshold with the larger information gain
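A small sketch (not from the slides) of this threshold-selection step: only the temperatures 48, 60, 80 and 90 are mentioned above, so the remaining two values (40 and 72) and the label sequence are assumptions chosen to match the class counts on the next two slides.

```python
from math import log2

temps  = [40, 48, 60, 72, 80, 90]                  # already sorted by temperature
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]   # assumed classifications

def entropy(ys):
    n = len(ys)
    return -sum((ys.count(c) / n) * log2(ys.count(c) / n) for c in set(ys))

def gain_for_threshold(threshold):
    """Information gain of the binary attribute Temperature > threshold."""
    below = [y for t, y in zip(temps, labels) if t < threshold]
    above = [y for t, y in zip(temps, labels) if t > threshold]
    n = len(labels)
    return entropy(labels) - len(below)/n*entropy(below) - len(above)/n*entropy(above)

# candidate thresholds: midpoints where the class label changes
candidates = [(t1 + t2) / 2 for (t1, y1), (t2, y2)
              in zip(zip(temps, labels), zip(temps[1:], labels[1:])) if y1 != y2]
print(candidates)                                               # [54.0, 85.0]
print([round(gain_for_threshold(c), 4) for c in candidates])    # ~[0.4591, 0.1908]
```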
Entropies of Attribute: Temperature (threshold 54)
Values = < 54, > 54
$S = [3+, 3-]$: $Entropy(S) = -(3/6)\log_2(3/6) - (3/6)\log_2(3/6) = 1$
$S_{<54} \leftarrow [0+, 2-]$: $Entropy(S_{<54}) = 0$
$S_{>54} \leftarrow [3+, 1-]$: $Entropy(S_{>54}) = -(3/4)\log_2(3/4) - (1/4)\log_2(1/4) = 0.8113$
$Gain(S, Temperature > 54) = Entropy(S) - (2/6)\,Entropy(S_{<54}) - (4/6)\,Entropy(S_{>54}) = 0.4591$

Entropies of Attribute: Temperature (threshold 85)
Values = < 85, > 85
$S = [3+, 3-]$, $Entropy(S) = 1$
$S_{<85} \leftarrow [3+, 2-]$: $Entropy(S_{<85}) = -(3/5)\log_2(3/5) - (2/5)\log_2(2/5) = 0.971$
$S_{>85} \leftarrow [0+, 1-]$: $Entropy(S_{>85}) = 0$
$Gain(S, Temperature > 85) = Entropy(S) - (5/6)\,Entropy(S_{<85}) - (1/6)\,Entropy(S_{>85}) = 0.1908$

Recapitulation
$Gain(S, Temperature > 54) = 0.4591$
$Gain(S, Temperature > 85) = 0.1908$
The threshold 54 gives the larger information gain, so the attribute Temperature > 54 is selected.

K-Nearest Neighbour
K-Nearest Neighbor
◦ Finds records in a database that have similar numerical values of a set of predictor variables
◦ Measures the Euclidean distance between records in the training data set
◦ The nearest neighbor to a record in the training data set is the one that has the smallest distance from it
  – If k = 1, the 1-NN rule classifies a record in the same category as its nearest neighbor
  – The k-NN rule finds the k nearest neighbors in the training data set to each record we want to classify and then assigns the classification of the majority of the k nearest neighbors
◦ Typically, various values of k are used and the results inspected to determine which is best

Evans (2016)

Example of k-NN classification: the test sample (green dot) should be classified either as a blue square or a red triangle. If k = 3 (solid circle) it is assigned to the red triangles because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed circle) it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle).
Source: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

Data
ID   Age   Gender   Product
1    54    Male     C
2    32    Female   B
3    43    Female   B
4    75    Female   C
5    15    Female   A
6    52    Male     B
7    49    Female   B
8    10    Female   A
9    17    Female   C
10   49    Female   B

Prediction
With k = 3, predict the product that may be bought by a new customer with the following information: Age = 48, Gender = Male.
Encode the column 'Gender'
Gender encoding: Male = 0, Female = 1.

Compute the distance
Euclidean distance between each customer and the new customer (ID 11: Age 48, Gender 0); for example, for customer 1: √((54 - 48)² + (0 - 0)²) = 6.
ID   Age   Gender   Distance
1    54    0        6
2    32    1        16.03122
3    43    1        5.09902
4    75    1        27.01851
5    15    1        33.01515
6    52    0        4
7    49    1        1.41421
8    10    1        38.01316
9    17    1        31.01612
10   49    1        1.41421
New customer: ID 11, Age 48, Gender 0

Sort the customers based on the distance
Age   Gender   Distance   Product
49    1        1.41421    B
49    1        1.41421    B
52    0        4          B
43    1        5.09902    B
54    0        6          C
32    1        16.03122   B
75    1        27.01851   C
17    1        31.01612   C
15    1        33.01515   A
10    1        38.01316   A

Identify the nearest neighbors, k = 3
The three smallest distances belong to customers 7 and 10 (distance 1.41421 each) and customer 6 (distance 4); their products are B, B, and B.
Conclusion
All three of the nearest neighbors bought product B, so we predict that the new customer will choose Product B.

Exercise: k = 4!
Using the same approach, predict the demand of patient 11.
Patient ID   Age   Gender   Blood Type   Demand
1    78   Female   AB+   Fresh Frozen Plasma
2    61   Male     B+    Red Blood Cell
3    18   Male     A-    Whole Blood
4    85   Female   B-    Platelets
5    95   Female   A+    Whole Blood
6    84   Male     A-    Fresh Frozen Plasma
7    33   Female   O-    Red Blood Cell
8    37   Female   A-    Fresh Frozen Plasma
9    34   Male     A+    Fresh Frozen Plasma
10   93   Female   A+    Whole Blood
11   57   Male     A+    ????
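A short sketch (not part of the slides) that reproduces the worked example above: 3-nearest-neighbour classification with Euclidean distance and a majority vote. The customer data and the Male = 0 / Female = 1 encoding come from the slides.

```python
from collections import Counter
from math import dist   # Euclidean distance, Python 3.8+

customers = [  # (age, gender, product)
    (54, 0, "C"), (32, 1, "B"), (43, 1, "B"), (75, 1, "C"), (15, 1, "A"),
    (52, 0, "B"), (49, 1, "B"), (10, 1, "A"), (17, 1, "C"), (49, 1, "B"),
]

def knn_predict(age, gender, k=3):
    """Majority label among the k records closest to (age, gender)."""
    neighbours = sorted(customers, key=lambda c: dist((age, gender), (c[0], c[1])))[:k]
    return Counter(label for _, _, label in neighbours).most_common(1)[0][0]

print(knn_predict(48, 0, k=3))   # 'B': customers 7, 10 and 6 are the nearest
```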

K-Means
K-means captures the insight that each point in a cluster should be near the center of that cluster.
1. Choose k, the number of clusters we want to find in the data. Then initialize the centers of those k clusters, called centroids, in some fashion.
2. Repeat:
  a. Reassign Points step: assign every point in the data to the cluster whose centroid is nearest to it.
  b. Update Centroids step: recalculate each centroid's location as the mean (center) of all the points assigned to its cluster.
  If the points stop switching clusters, end the algorithm; otherwise, go back to step 2a.
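A compact sketch (not part of the slides) of the two-step loop just described, in plain Python. Like the worked example on the following slides, it uses Manhattan distance and takes two of the data points as the initial centroids; both choices are made here purely for illustration.

```python
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        # Reassign Points step
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: manhattan(p, centroids[i]))
            clusters[nearest].append(p)
        # Update Centroids step
        new_centroids = [tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
                         for i, cluster in enumerate(clusters)]
        if new_centroids == centroids:   # nothing changed: stop
            return clusters, centroids
        centroids = new_centroids
    return clusters, centroids

# (age, willingness to pay) for the 10 customers of the next slides
data = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9), (62, 61), (37, 13)]
clusters, centroids = kmeans(data, centroids=[(8, 4), (5, 8)])
print(centroids)   # ~[(4.375, 5.875), (49.5, 37.0)]
```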
Consider the following data
Customer ID   Age   Willingness to pay (in €)
1    2    10
2    2    5
3    8    4
4    5    8
5    7    5
6    6    4
7    1    2
8    4    9
9    62   61
10   37   13
Cluster the customers into two clusters!

Iteration 1: determine the initial centroids
The initial centroids are (8; 4) and (5; 8), i.e. the positions of customers 3 and 4.

Iteration 1: compute the distance between each customer and each centroid
Manhattan distance is used; for example, for customer 1 and the first centroid: |8 - 2| + |4 - 10| = 12.
ID   (8;4)   (5;8)
1    12      5
2    7       6
3    0       7
4    7       0
5    2       5
6    2       5
7    9       10
8    9       2
9    111     110
10   38      37
Iteration 1: calculate the minimum distance for each customer and assign each customer to the closest centroid
ID   (8;4)   (5;8)   Min distance   Cluster
1    12      5       5              2
2    7       6       6              2
3    0       7       0              1
4    7       0       0              2
5    2       5       2              1
6    2       5       2              1
7    9       10      9              1
8    9       2       2              2
9    111     110     110            2
10   38      37      37             2

Iteration 1: determine the new centroid per cluster
Cluster 1: customers 3, 5, 6, and 7
Cluster 2: customers 1, 2, 4, 8, 9, and 10

Cluster 1
ID   Age   Willingness to pay
3    8     4
5    7     5
6    6     4
7    1     2
New centroid: Age = (8 + 7 + 6 + 1)/4 = 5.5, Willingness to pay = (4 + 5 + 4 + 2)/4 = 3.75

Cluster 2
ID   Age   Willingness to pay
1    2     10
2    2     5
4    5     8
8    4     9
9    62    61
10   37    13
New centroid: Age = 18.66667, Willingness to pay = 17.66667
Iteration 2: compute the distance of each customer to the new centroids and calculate the minimum distance
ID   (5.5; 3.75)   (18.67; 17.67)   Min distance
1    9.75          24.33333         9.75
2    4.75          29.33333         4.75
3    2.75          24.33333         2.75
4    4.75          23.33333         4.75
5    2.75          24.33333         2.75
6    0.75          26.33333         0.75
7    6.25          33.33333         6.25
8    6.75          23.33333         6.75
9    113.75        86.66667         86.66667
10   40.75         23               23

Iteration 2: assign each customer to the closest centroid and determine the new centroid for each cluster
ID   (5.5; 3.75)   (18.67; 17.67)   Min distance   Cluster
1    9.75          24.33333         9.75           1
2    4.75          29.33333         4.75           1
3    2.75          24.33333         2.75           1
4    4.75          23.33333         4.75           1
5    2.75          24.33333         2.75           1
6    0.75          26.33333         0.75           1
7    6.25          33.33333         6.25           1
8    6.75          23.33333         6.75           1
9    113.75        86.66667         86.66667       2
10   40.75         23               23             2

Cluster 1: customers 1 to 8; new centroid: Age = 4.375, Willingness to pay = 5.875
Cluster 2: customers 9 and 10; new centroid: Age = 49.5, Willingness to pay = 37
Iteration 3: compute the Manhattan distance of each customer to the new centroids and calculate the minimum distance
ID   (4.375; 5.875)   (49.5; 37)   Min distance   Cluster
1    6.5              74.5         6.5            1
2    3.25             79.5         3.25           1
3    5.5              74.5         5.5            1
4    2.75             73.5         2.75           1
5    3.5              74.5         3.5            1
6    3.5              76.5         3.5            1
7    7.25             83.5         7.25           1
8    3.5              73.5         3.5            1
9    112.75           36.5         36.5           2
10   39.75            36.5         36.5           2
81 82

Iteration 3 Iteration 3
Assign each customer to the closest centroid Assign each customer to the closest centroid

(4,375; (4,375;
ID (49,5; 37) Min Cluster ID (49,5; 37) Min Cluster
5,875) 5,875)

1 6,5 74,5 6,5 1 1 6,5 74,5 6,5 1

2 3,25 79,5 3,25 1 2 3,25 79,5 3,25 1

3 5,5 74,5 5,5 1 3 5,5 74,5 5,5 1

4 2,75 73,5 2,75 1 Cluster 1: 1 - 8 4 2,75 73,5 2,75 1 Cluster 1: 1 - 8


5 3,5 74,5 3,5 1
Cluster 2: 9, and 10 5 3,5 74,5 3,5 1
Cluster 2: 9, and 10
6 3,5 76,5 3,5 1 6 3,5 76,5 3,5 1

7 7,25 83,5 7,25 1 7 7,25 83,5 7,25 1 As there is no change in each


8 3,5 73,5 3,5 1 8 3,5 73,5 3,5 1 cluster, stop the algorithm
9 112,75 36,5 36,5 2 9 112,75 36,5 36,5 2

10 39,75 36,5 36,5 2 10 39,75 36,5 36,5 2

83 84
Exercise: k = 3
Cluster the suppliers into three clusters, using suppliers 1, 4, and 7 as the initial centroids.
Supplier ID   Experience (in years)   Capacity (in ton)
1    2    10
2    2    5
3    8    4
4    5    8
5    7    5
6    6    4
7    1    2
8    4    9

Hierarchical Clustering

Introduction
A hierarchical clustering method works by grouping data into a tree of clusters.
Hierarchical (agglomerative) clustering begins by treating every data point as a separate cluster. Then it repeatedly executes the following steps:
◦ Identify the two clusters that are closest together, and
◦ Merge these two most similar clusters.
◦ Continue these steps until all the clusters are merged together.

Dendrogram
A tree-like structure which represents the hierarchical technique:
◦ Leaf: an individual data point
◦ Root: one cluster containing all the data
A cluster at level i is the merger of its child clusters at level i + 1.
Dendrogram
(Figure: an example dendrogram over the points A, B, C, D, E, F.)

Basic Methods
◦ Agglomerative
◦ Divisive

Distances
◦ Single (nearest) distance, or single linkage: the distance between the closest members of the two clusters
◦ Complete (farthest) distance, or complete linkage: the distance between the members that are the farthest apart
◦ Average distance, or average linkage: look at the distances between all pairs of members and average all of these distances (called Unweighted Pair Group Mean Averaging)
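As an aside (not part of the slides), these three linkage options map directly onto SciPy's `method` argument. The sketch below is a minimal example with six arbitrary 2-D points; numpy and scipy are assumed dependencies. The slides that follow carry out the same single-linkage procedure by hand.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# six illustrative 2-D points (chosen arbitrarily, not the slides' data)
points = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
                   [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])

Z = linkage(points, method="single")   # "complete" or "average" for the other linkages
print(Z)                                # each row: the two clusters merged, their distance, cluster size

labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
print(labels)

# dendrogram(Z)   # with matplotlib installed, this draws the tree
```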
Example
(Figure: six example points, labelled 1 to 6, in the plane.)

Distance Matrix
(Figure: the pairwise distance matrix of the six points. The smallest entry is the distance between P3 and P6, so these two points are merged first.)

Dendrogram
(Figure: the dendrogram so far, with leaves ordered 3, 6, 2, 5, 4, 1; P3 and P6 have been joined.)
Update the distance matrix
Using single linkage, the distance between the new cluster (P3, P6) and each remaining point is the minimum of the two old distances:
◦ P1: MIN(dist(P3,P1); dist(P6,P1)) = MIN(0.22; 0.24) = 0.22
◦ P2: MIN(dist(P3,P2); dist(P6,P2)) = MIN(0.14; 0.24) = 0.14
◦ P4: MIN(dist(P3,P4); dist(P6,P4)) = MIN(0.16; 0.22) = 0.16
◦ P5: MIN(dist(P3,P5); dist(P6,P5)) = MIN(0.28; 0.39) = 0.28

Updated Distance Matrix
(Figure: the updated distance matrix. The smallest remaining distance is selected; let's say we choose to merge P2 and P5 next.)
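A tiny sketch (not from the slides) of the single-linkage update rule used above: after merging clusters A and B, the distance from the merged cluster to any other cluster C is min(d(A, C), d(B, C)). Only the distances listed above are used; the rest of the matrix is not shown on the slides.

```python
# pairwise distances taken from the update computations above
d = {frozenset(p): v for p, v in {
    ("P3", "P1"): 0.22, ("P6", "P1"): 0.24,
    ("P3", "P2"): 0.14, ("P6", "P2"): 0.24,
    ("P3", "P4"): 0.16, ("P6", "P4"): 0.22,
    ("P3", "P5"): 0.28, ("P6", "P5"): 0.39,
}.items()}

def single_linkage_update(d, a, b, others):
    """Distances from the merged cluster (a, b) to every other cluster."""
    return {c: min(d[frozenset((a, c))], d[frozenset((b, c))]) for c in others}

print(single_linkage_update(d, "P3", "P6", ["P1", "P2", "P4", "P5"]))
# {'P1': 0.22, 'P2': 0.14, 'P4': 0.16, 'P5': 0.28}
```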
Dendrogram
(Figure: the dendrogram so far, with leaves ordered 3, 6, 2, 5, 4, 1; P2 and P5 have now been joined as well.)

Update the distance matrix
Using single linkage, the distances between the new cluster (P2, P5) and the remaining clusters are:
◦ P1: MIN(dist(P2,P1); dist(P5,P1)) = MIN(0.23; 0.34) = 0.23
◦ (P3,P6): MIN(dist(P2,(P3,P6)); dist(P5,(P3,P6))) = MIN(0.14; 0.28) = 0.14
◦ P4: MIN(dist(P2,P4); dist(P5,P4)) = MIN(0.19; 0.28) = 0.19

Updated Distance Matrix
(Figure: the updated distance matrix. The smallest entry is now the distance between (P2, P5) and (P3, P6), so these two clusters are merged next.)
Dendrogram
(Figure: the dendrogram so far; the clusters (P2, P5) and (P3, P6) have been joined into (P2, P5, P3, P6).)

Update the distance matrix
Using single linkage, the distances between the new cluster (P2, P5, P3, P6) and the remaining points are:
◦ P1: MIN(dist((P2,P5),P1); dist((P3,P6),P1)) = MIN(0.23; 0.22) = 0.22
◦ P4: MIN(dist((P2,P5),P4); dist((P3,P6),P4)) = MIN(0.19; 0.16) = 0.16

Updated Distance Matrix
(Figure: the updated distance matrix. The smallest entry is now the distance to P4, so P4 is merged into the cluster next.)
Dendrogram
(Figure: the final dendrogram, with leaves ordered 3, 6, 2, 5, 4, 1; after P4, the last point P1 is merged in, leaving a single cluster at the root.)
How many clusters to pick? The optimal value of k
A fundamental step for any unsupervised algorithm is to determine the optimal number of clusters into which the data may be grouped.
The Elbow Method is one of the most popular ways to determine this optimal value of k.
◦ It may use distortion, the average of the squared distances from all samples to the cluster center they belong to, computed for a range of values of k; the "elbow" of the resulting curve, where adding more clusters stops reducing the distortion much, indicates a good value of k.
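A short sketch (not part of the slides) of the Elbow Method using scikit-learn (an assumed dependency): it fits k-means for k = 1 to 6 on the customer data used earlier and prints the distortion for each k; plotting these values and looking for the "elbow" suggests a good number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5],
                 [6, 4], [1, 2], [4, 9], [62, 61], [37, 13]])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    distortion = km.inertia_ / len(data)   # average squared distance to the assigned center
    print(k, round(distortion, 2))
```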

References
◦ Chopra, S. and Meindl, P. (2019). Supply Chain Management: Strategy, Planning, and Operation (7th edition). Pearson.
◦ Evans, J. R. (2016). Business Analytics (2nd edition). Pearson.
◦ http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
◦ https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
◦ https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
◦ https://www.geeksforgeeks.org/hierarchical-clustering-in-data-mining/?ref=gcse
◦ Various tutorials from http://www.anuradhabhatia.com
◦ Various lecture series from https://www.vtupulse.com/
◦ https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8
