MACHINE LEARNING
Lecture 02
Dr. Samana Batool
INTRODUCTION TO ML
[Figure: a labeled training set for spam classification (an example of supervised learning): labeled training samples and a new sample to be classified]
[Figure: excerpt from the scikit-learn algorithm cheat-sheet: starting from labeled data and whether the number of categories is known, the paths lead to classifiers, ensemble methods, SVR (kernel='linear'), SVR (kernel='rbf'), and kernel approximation]
[Figure: a taxonomy of unsupervised learning: time-series prediction and modelling (AR, Markov chain (discrete), HMM, LDS (continuous), SLDS, CRF (undirected)), anomaly detection, latent variable models (mixed-membership models, Latent Dirichlet Allocation, latent linear models), dimension reduction (PCA, factor analysis, NMF, PLSA, sparse basis, under/over-complete), clique decomposition, and visualisation]
MACHINE LEARNING ALGORITHMS
Supervised
• Continuous: Regression (Linear, Polynomial), Decision Trees, Random Forests
• Categorical: Classification: KNN, Trees, Logistic Regression, Naïve Bayes, SVM

Unsupervised
• Continuous: Clustering & Dimensionality Reduction: SVD, PCA, K-means
• Categorical: Association Analysis: Apriori, FP-Growth
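To make the table concrete, here is a minimal sketch (assuming scikit-learn and NumPy are installed; the toy data is invented for illustration) that runs one algorithm from each quadrant:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                   # toy feature matrix
y_cat = (X[:, 0] > 0).astype(int)               # toy categorical target
y_cont = X @ np.array([1.0, 2.0, 0.0, -1.0])    # toy continuous target

print(LinearRegression().fit(X, y_cont).score(X, y_cont))   # supervised, continuous
print(LogisticRegression().fit(X, y_cat).score(X, y_cat))   # supervised, categorical
print(KMeans(n_clusters=3, n_init=10).fit(X).labels_[:5])   # unsupervised clustering
print(PCA(n_components=2).fit_transform(X).shape)           # dimensionality reduction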
Data Mining Problems

• Prediction: making a model based on given data, then applying the model to new cases for predictions. Example: predicting quality specifications from ingredients and environment (pressure, temperature, humidity, etc.) in the fishing industry.
• Clustering: comparing the properties of the data and forming clusters based on similar characteristics. Example: grouping processes with similar characteristics out of various processes.
• Classification: determining where a particular case belongs in a given series of classified categories. Example: quality ratings of good/normal/bad, used to determine the quality of new products.
• Association Rule: identifying the attributes or relationships between items for which the appearance of one pattern implies the appearance of another. Example: predicting what will happen to the entire process when there is an abnormal pattern in one process.
Data Source Collecting

• Internal data: manual entry, log collector
• External data: web (automated crawling), sensing, media
Data Transformation

• Standardization: Z-transform, Normalization (feature scaling)
• Sampling: Random Sampling, Stratified Sampling, Cluster Sampling
• Normal Distribution: Log Transformation, Square Root Transformation
• Dimensionality Reduction: Factor Analysis, PCA
• Categorization: Discretization, Binarization
• Signal Data Compression: Fourier Transform, Wavelet Transform
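As a quick illustration of the standardization entries above, here is a minimal NumPy sketch (the sample values are invented) of the Z-transform and min-max feature scaling:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Z-transform (standardization): zero mean, unit variance
z = (x - x.mean()) / x.std()

# Normalization (min-max feature scaling): rescale to [0, 1]
x_scaled = (x - x.min()) / (x.max() - x.min())

print("z-scores:", z)
print("min-max scaled:", x_scaled)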
[Figure: the entire data set is split: one part is used to train the model, another to choose the model, and a final part to test the model]
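A minimal sketch of this split, assuming scikit-learn's train_test_split is available (the array shapes are invented for illustration):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # entire data: 10 samples, 2 features
y = np.arange(10)                  # labels

# Hold out 30% of the samples as the test set; train on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)   # (7, 2) (3, 2)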
Confusion matrix (rows: predicted class, columns: true/actual class):

True/Actual       Cat   Fish   Hen
Predicted Cat      4     6      3
Predicted Fish     1     2      0
Predicted Hen      1     2      6
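From this confusion matrix, overall accuracy is the sum of the diagonal (correct predictions) divided by the total count. A minimal NumPy sketch:

import numpy as np

# Rows: predicted class; columns: true/actual class (values from the table)
cm = np.array([[4, 6, 3],    # predicted Cat
               [1, 2, 0],    # predicted Fish
               [1, 2, 6]])   # predicted Hen

accuracy = np.trace(cm) / cm.sum()
print("Accuracy:", accuracy)   # 12 / 25 = 0.48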
Traditional Approach
Problem Research → Making Rules → Solution Evaluation → Launch!, with Error Analysis feeding back into the rules.

Machine Learning Approach
Problem Research → Data + Machine Learning Algorithm Training → Solution Evaluation → Launch!, with Error Analysis feeding back into training.
UNIT 01

A Rare Database in a Vast Space: each MNIST image has 28 x 28 = 784 pixels, so even binary images span 2^784 possibilities, yet the MNIST training set contains only 60,000 samples.
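A minimal sketch for loading MNIST, assuming scikit-learn is installed and can download the 'mnist_784' dataset from OpenML on first use:

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target
print(X.shape)    # (70000, 784): each image is 28 x 28 = 784 pixels
print(y[:10])     # digit labels as strings, e.g. '5', '0', '4', ...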
Sampling Noise and Sampling Bias
[Figure: graph of an inaccurate prediction at $x_0$ due to overfitting]
Examples of equations:
• $y = 5x + 3$
• $3x + 5y = 7$
• $(x - 2)^3 = x^3 - 6x^2 + 12x - 8$
• $x^2 + 3x + 2 = 0$
• $x^2 + xyz + 2xy + 3y^2 = 9$
A polynomial with 4 terms:

Term          1    2    3    4
Coefficient   4   -3    4    6
Degree        1    1    3    0
Graph of a First-Degree Equation

First-degree equation in $x$: $y = ax + b$ (where $a \neq 0$); $a$ is the slope and $b$ is the y-intercept.

[Figure: lines $y = 5x + 3$ ($a > 0$, rising) and $y = -7x + 5$ ($a < 0$, falling), each crossing the y-axis at the intercept $b$]

‣ $a$ and $b$ are treated as constants.
‣ The degree of the term $ax$ is 1, and the degree of the term $b$ is 0 because it contains no $x$. Thus, this equation is a first-degree equation.
Graph of a Second-Degree Equation
Second-Degree 𝑎>0 𝑎<0
𝑦 𝑦
Equation for 𝑥,
𝑎𝑥 2 + 𝑏𝑥 + 𝑐 𝑐
(𝑏𝑢𝑡 𝑎 ≠ 0)
𝑥 𝑥
0 0
c
Convex Downward Convex Upward
‣ a and b are treated as constants. The degree of ax is 2, the degree of b is 1, and the degree of c is 0 because there is no
letter x. Thus, this equation is a second-degree equation.
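A minimal matplotlib sketch (assuming NumPy and matplotlib; it reuses the example equations $y = 5x + 3$ and $x^2 + 3x + 2$ from above) that draws one first-degree and one second-degree graph:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, 5 * x + 3)           # first-degree: y = 5x + 3 (a > 0)
ax1.set_title("y = 5x + 3")
ax2.plot(x, x**2 + 3 * x + 2)    # second-degree: y = x^2 + 3x + 2 (a > 0, convex downward)
ax2.set_title("y = x^2 + 3x + 2")
for ax in (ax1, ax2):
    ax.grid(True)
plt.show()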
[Figure: number line from -3 to 4 illustrating two distances of 3]

$\|p - q\| = \sqrt{(p - q) \cdot (p - q)} = \sqrt{\|p\|^2 + \|q\|^2 - 2\, p \cdot q}$
The distance of segment AB is the distance between point $A(a_1, a_2)$ and point $B(b_1, b_2)$. From the horizontal side $(a_1 - b_1)$ and the vertical side $(a_2 - b_2)$ of the right triangle:

$\overline{AB} = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$
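A minimal NumPy sketch (using invented example points) checking that the norm of the difference matches the inner-product identity above:

import numpy as np

p = np.array([3.0, 1.0])   # point A
q = np.array([2.0, 3.0])   # point B

d1 = np.linalg.norm(p - q)                  # ||p - q||
d2 = np.sqrt(p @ p + q @ q - 2 * (p @ q))   # sqrt(||p||^2 + ||q||^2 - 2 p.q)
print(d1, d2)   # both sqrt(5) = 2.2360679...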
$2^3 = 2 \cdot 2 \cdot 2 = 8$

Exponentials           Logarithms
$8 = 2^3$              $3 = \log_2 8$
$100 = 10^2$           $2 = \log_{10} 100$
$81 = 3^4$             $4 = \log_3 81$
$0.01 = 10^{-2}$       $-2 = \log_{10} 0.01$

In the pair $8 = 2^3$ and $3 = \log_2 8$: 2 is the base, the exponent 3 is the logarithm, and 8 is the antilog.
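Each exponential/logarithm pair in the table can be checked with Python's math module; a minimal sketch:

import math

print(2 ** 3, math.log(8, 2))        # 8  3.0
print(10 ** 2, math.log10(100))      # 100  2.0
print(3 ** 4, math.log(81, 3))       # 81  ~4.0
print(10 ** -2, math.log10(0.01))    # 0.01  -2.0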
$f(x) = a^x$

For a real number $a$ satisfying $a > 0$, $a \neq 1$, the graph of the exponential function $y = a^x$ with base $a$ can be drawn as shown below.

[Figure: graphs of $y = a^x$: (a) when $a > 1$ the curve increases; (b) when $0 < a < 1$ it decreases; both pass through $(0, 1)$]
The graph of the exponential function $y = a^x$ with base $a$

[Figure: example graphs. Left: $y = 3^x$ and $y = 3^{x-1} + 2$. Right: $y = 2^{-x}$ and $y = -2^{-x-2}$]
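A minimal matplotlib sketch reproducing the example curves from the figure:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1.0, 2.0, 200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, 3.0 ** x, label="y = 3^x")
ax1.plot(x, 3.0 ** (x - 1) + 2, label="y = 3^(x-1) + 2")
ax2.plot(x, 2.0 ** (-x), label="y = 2^(-x)")
ax2.plot(x, -(2.0 ** (-x - 2)), label="y = -2^(-x-2)")
for ax in (ax1, ax2):
    ax.legend()
    ax.grid(True)
plt.show()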
$f(x) = \log_a x$

For a real number $a$ satisfying $a > 0$, $a \neq 1$, the graph of the logarithmic function $y = \log_a x$ with base $a$ can be drawn as shown below.
[Figure: $y = \log_a x$ is the reflection of $y = a^x$ across the line $y = x$: (a) when $a > 1$, (b) when $0 < a < 1$; both pass through $(1, 0)$]

The Graph of the Logarithmic Function $y = \log_a x$ with base $a$
[Figure: graphs of $y = \log_2(x)$ and $y = \log_2(x + 1) - 1$]
import numpy as np
import matplotlib.pyplot as plt

data = np.random.exponential(scale=2.0, size=1000)
log_data = np.log(data + 1)  # Apply log transform (shift by 1 to avoid log(0))

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(data, bins=30)
plt.title("Original Data (Exponential Distribution)")
plt.subplot(1, 2, 2)
plt.hist(log_data, bins=30)
plt.title("Log Transformed Data")
plt.show()
For binary classification (i.e., predicting between two classes), the
formula for cross-entropy loss is:
$L(y, \hat{y}) = -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right]$
import numpy as np

y_true = np.array([1, 0, 1, 1])           # True labels
y_pred = np.array([0.9, 0.1, 0.8, 0.4])   # Predicted probabilities

cross_entropy_loss = -np.mean(
    y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print("Binary Cross-Entropy Loss:", cross_entropy_loss)
Comparing Logarithmic and Linear Loss

Create two functions, one for computing linear loss and one for logarithmic loss, and compare the results when applied to the same data.

Task:
• Implement a linear_loss(y_true, y_pred) function that calculates the mean absolute error (MAE).
• Implement a logarithmic_loss(y_true, y_pred) function that calculates the logarithmic (binary cross-entropy) loss between predictions and true values.
• Compare both loss functions using the same test data.
import numpy as np

def linear_loss(y_true, y_pred):
    # Mean absolute error (MAE)
    return np.mean(np.abs(y_true - y_pred))

def logarithmic_loss(y_true, y_pred):
    # Binary cross-entropy (log loss)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# Test the functions
y_true = np.array([1, 0, 1])
y_pred = np.array([0.9, 0.2, 0.8])
print("Linear Loss:", linear_loss(y_true, y_pred))
print("Logarithmic Loss:", logarithmic_loss(y_true, y_pred))
As $x$ approaches 0 (from either side), $(1 + x)^{1/x}$ approaches $e \approx 2.71828$:

x        (1+x)^(1/x)   |   x          (1+x)^(1/x)
0.0100   2.70481383    |   0.00010    2.71814593
0.0099   2.70494730    |   0.00009    2.71815950
0.0098   2.70508079    |   0.00008    2.71817311
0.0097   2.70521431    |   0.00007    2.71818669
0.0096   2.70534785    |   0.00006    2.71820028
0.0095   2.70548142    |   0.00005    2.71821387
0.0094   2.70561501    |   0.00004    2.71822746
0.0093   2.70574863    |   0.00003    2.71824106
0.0092   2.70588227    |   0.00002    2.71825465
0.0091   2.70601593    |   0.00001    2.71826824
0.0090   2.70614962    |   0.00000    undefined (limit = e)
0.0089   2.70628333    |  -0.00001    2.71829542
0.0088   2.70641707    |  -0.00002    2.71830901
0.0087   2.70655083    |  -0.00003    2.71832260
0.0086   2.70668461    |  -0.00004    2.71833620
0.0085   2.70681842    |  -0.00005    2.71834979
0.0084   2.70695225    |  -0.00006    2.71836338
0.0083   2.70708611    |  -0.00007    2.71837697
0.0082   2.70722000    |  -0.00008    2.71839057
0.0081   2.70735390    |  -0.00009    2.71840416
0.0080   2.70748783    |  -0.00010    2.71841776
0.0079   2.70762179    |  -0.00011    2.71843135
0.0078   2.70775577    |  -0.00012    2.71844494
0.0077   2.70788977    |  -0.00013    2.71845854
0.0076   2.70802380    |  -0.00014    2.71847213
0.0075   2.70815785    |  -0.00015    2.71848573
0.0074   2.70829193    |  -0.00016    2.71849932
0.0073   2.70842603    |  -0.00017    2.71851292
0.0072   2.70856016    |  -0.00018    2.71852651
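The same convergence can be checked numerically; a minimal NumPy sketch:

import numpy as np

# (1 + x)^(1/x) approaches e as x approaches 0 from either side
for x in [0.01, 0.001, 0.0001, 1e-05, -1e-05, -0.0001]:
    print(f"x = {x:>8}: (1 + x)^(1/x) = {(1 + x) ** (1 / x):.8f}")
print("e =", np.e)   # 2.718281828...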
$e^{\infty} = \infty$

$e^{-\infty} = \dfrac{1}{e^{\infty}} = \dfrac{1}{\infty} = 0$
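NumPy follows the same limits for floating-point infinity; a minimal sketch:

import numpy as np

print(np.exp(np.inf))     # inf
print(np.exp(-np.inf))    # 0.0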
[Figure: the logistic curve over $x \in [-6, 6]$, rising from 0 to 1 and passing through 0.5 at $x = 0$]
import numpy as np
import matplotlib.pyplot as plt
# Example of plotting the sigmoid function
x = np.linspace(-10, 10, 100)
y = 1 / (1 + np.exp(-x))
plt.plot(x, y)
plt.title("Sigmoid Function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.grid(True)
plt.show()
Sigmoid Function Graph: $y = \dfrac{1}{1 + \exp(-ax)}$ (the tabulated values correspond to a steep slope, $a = 100$)

  x     y
-1.0    3.72008E-44
-0.9    8.19401E-40
-0.8    1.80485E-35
-0.7    3.97545E-31
-0.6    8.75651E-27
-0.5    1.92875E-22
-0.4    4.24835E-18
-0.3    9.35762E-14
-0.2    2.06115E-09
-0.1    4.53979E-05
 0.0    0.5
 0.1    0.999954602
 0.2    0.999999998
 0.3    1
 0.4    1
 0.5    1
 0.6    1
 0.7    1
 0.8    1
 0.9    1
 1.0    1

[Figure: the corresponding step-like sigmoid curve plotted over $x \in [-1.5, 1.5]$]
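A minimal sketch (the slope parameter a is made explicit here as an assumption, chosen to match the table) showing how a larger a makes the sigmoid steeper:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x, a=1.0):
    # y = 1 / (1 + exp(-a*x)); larger a gives a steeper transition at 0
    return 1 / (1 + np.exp(-a * x))

x = np.linspace(-1.5, 1.5, 300)
plt.plot(x, sigmoid(x, a=1), label="a = 1")
plt.plot(x, sigmoid(x, a=100), label="a = 100 (as in the table)")
plt.legend()
plt.grid(True)
plt.show()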
Linear Relationship between Time and Height of Water

[Figure: the height of water rises linearly with time; along segment PQ the height changes 5 units per 3 units of time, so the slope (rate) is $v = \dfrac{5}{3}$]
[Figure: vectors $a = (3, 1)$ and $b = (2, 3)$ in the plane]
‣ The result of the inner product of two vectors is a real number (a magnitude) rather than a vector.
‣ This real number is called a scalar, and so the inner product is also called the scalar product.
‣ The operator symbol of the inner product is not $\times$ but $\cdot$, read as "dot".
[Figure: the angle $\theta$ between vectors $a = (3, 1)$ and $b = (2, 3)$]
[Figure: three vector pairs illustrating cosine similarity. Left: $a = (-4, -2)$ and $b = (6, 3)$ point in opposite directions, so cosine similarity is -1. Middle: $a = (6, 3)$ and $b = (1, -2)$ are perpendicular ($b$ is a normal vector $p$ at 90°), so cosine similarity is 0. Right: $a = (4, 2)$ and $b = (6, 3)$ point in the same direction, so cosine similarity is 1]
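A minimal NumPy sketch verifying the three cases with the vectors from the figure:

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([-4, -2]), np.array([6, 3])))   # -1.0 (opposite directions)
print(cosine_similarity(np.array([6, 3]), np.array([1, -2])))    #  0.0 (perpendicular)
print(cosine_similarity(np.array([4, 2]), np.array([6, 3])))     #  1.0 (same direction)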