Classification
Outline:
1. Introduction
2. K Nearest Neighbor (KNN)
3. Artificial Neural Network (ANN)
4. Support Vector Machine (SVM)
ANN
§ Overview of ANN
§ Components of ANN
§ ANN training (forward and backward propagation)
§ ANN characteristics
§ ANN design
What is an ANN?
§ Computing system inspired by the biological neural network
§ Connection of units (or nodes) called artificial neurons
Ø Each node ~ biological neuron
Ø Each connection ~ synapse in brain
Biological neural networks
§ Our brain consists of ~100 billion neurons
§ A neuron may connect to as many as 100,000 other neurons
§ Signals “move” via electrochemical signals
§ Biological neural network: interconnected network of billions of neurons with trillions of interconnections between them
Structure of a biological neuron
§ Dendrite: receives signals from other neurons
§ Cell body: sums all the incoming signals to generate input
§ Axon: transfers signals to the other neurons when the neuron “fires” (sum reaches a threshold)
§ Synapses: points of interconnection of one neuron with other neurons
McCulloch and Pitts neuron model (1943)
§ A mathematical computing paradigm that models the human neuron
[Figure: inputs x1, …, xN with weights w1, …, wN feed a summing unit; the net input u passes through f to produce the output y]

$$u = \sum_{j=1}^{N} x_j w_j, \qquad y = f(u)$$
Perceptron neuron model
§ An enhanced version of the McCulloch-Pitts model:
Ø Incorporates the Hebbian learning rule for adjusting weights
Ø Adds a bias
[Figure: inputs x1, …, xN with weights and bias b = θ feed the net input u; a step function produces the output y]

$$u = \sum_{j=1}^{N} x_j w_j + \theta, \qquad y = f(u) = \begin{cases} 1, & u \ge 0 \\ 0, & u < 0 \end{cases}$$
ANN
§ Overview of ANN
§ Components of ANN
§ ANN training (forward and backward propagation)
§ ANN characteristics
§ ANN design
General neuron model
[Figure: inputs x1, …, xN with synaptic weights feed the net function in the cell body; the activation function produces the output y]

Net function:
$$u = \sum_{j=1}^{N} x_j w_j + \theta$$

Activation function:
$$y = f(u), \quad \text{e.g., the sigmoid } y = f(u) = \frac{1}{1 + e^{-u}}$$

{wj; 1 ≤ j ≤ N}: synaptic weights
θ: threshold
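To make the model concrete, here is a minimal Python sketch (not from the slides) of this general neuron: a weighted sum plus threshold, passed through the sigmoid activation. The inputs, weights, and threshold are illustrative.

import numpy as np

# Minimal sketch of the general neuron model (illustrative values).
def neuron(x, w, theta):
    u = np.dot(x, w) + theta           # net function: weighted sum + threshold
    return 1.0 / (1.0 + np.exp(-u))    # activation function: sigmoid

x = np.array([1.0, 0.5, -0.2])         # inputs x1..x3
w = np.array([0.4, -0.3, 0.8])         # synaptic weights w1..w3
print(neuron(x, w, theta=0.1))         # output y in (0, 1)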
Popular net functions
Popular activation functions
Multilayer perceptron model (MLP)
§ Layered network of perceptron neurons
§ Fully connected between consecutive layers
ANN
§ Overview of ANN
§ Components of ANN
§ ANN training (forward and backward propagation)
§ ANN characteristics
§ ANN design
ANN training process?
§ Calibrating all of the weights by repeating forward-backward propagation steps until the output is predicted accurately
§ Forward propagation:
Ø Applying a set of weights to the input data
Ø Calculating the output
§ Backward propagation:
Ø Measuring the error of the output (difference between desired output and actual output)
Ø Adjusting the weights to decrease the error in the next step
ANN training example – epoch 1
[Input data shown as images on the slide]
Desired value   Actual output
0               0.21
0               0.156
1               0.78
1               0.83
ANN training example – epoch 2
[Input data shown as images on the slide]
Desired value   Actual output
0               0.194
0               0.143
1               0.802
1               0.895
ANN training example – epoch n
[Input data shown as images on the slide]
Desired value   Actual output
0               0.119
0               0.056
1               0.884
1               0.926
Error back propagation learning
§ Step 1: initialization
§ Step 2: output calculating
§ Step 3: error calculating

$$E = \sum_{k=1}^{K} [e(k)]^2 = \sum_{k=1}^{K} [d(k) - z(k)]^2$$

d – desired output values (target)
z – actual outputs
§ Step 4: weight updating, then go back to step (2) until the stop condition is satisfied
Weight updating
§ To achieve the minimum error, each weight is adjusted in the direction that decreases the error (gradient descent):

$$w \leftarrow w - \eta \frac{\partial E}{\partial w}$$

w – weight
E – error
η – learning rate
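As an illustration of steps 1–4, here is a Python sketch of a single sigmoid neuron trained by gradient descent on a toy dataset. The data, learning rate, and stopping threshold are made up for the example, and the constant factor of 2 from differentiating E is folded into the learning rate.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # toy inputs
d = np.array([0., 0., 1., 1.])                           # desired outputs

w = rng.normal(scale=0.1, size=2)   # step 1: initialize weights randomly
b = 0.0                             # ... and the bias as zero
eta = 0.5                           # learning rate

for epoch in range(10000):
    z = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # step 2: calculate outputs
    e = d - z                                # step 3: errors, E = sum(e^2)
    if np.sum(e ** 2) < 0.05:                # stop condition
        break
    g = e * z * (1 - z)                      # -dE/du (factor 2 folded into eta)
    w += eta * (g @ X)                       # step 4: w <- w - eta * dE/dw
    b += eta * g.sum()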
Choosing the learning rate
Stopping conditions
§ Average squared error change: the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.01, 0.1]).
§ Generalization-based criterion: after each epoch the ANN is tested for generalization. If the generalization performance is adequate, then stop.
§ Good generalization: the I/O mapping is nearly correct for new data
Worked exercise (manual calculation)
§ Given a simple neural network with one neuron, two inputs, and one output. The bias is fixed at −0.2 and the weights are w1 = 0.3, w2 = −0.1; the net function is linear and the activation function is the step function. Determine the output and update the weights with the perceptron learning rule, given the learning rate η = 0.1 and the training example x1 = 1, x2 = 0, desired output y = 0.
Solution – round 1
§ Compute the output:
net = w1 x1 + w2 x2 + b = 0.3(1) + (−0.1)(0) + (−0.2) = 0.1 → output = 1
§ Compare the actual output with the desired output: there is an error → the weights must be updated
§ Update the weights with the perceptron learning rule:
w1 = 0.3 + 0.1(0 − 1)(1) = 0.2
w2 = −0.1 + 0.1(0 − 1)(0) = −0.1
Solution – round 2
§ Compute the output:
net = w1 x1 + w2 x2 + b = 0.2(1) + (−0.1)(0) + (−0.2) = 0 → output = 1
§ Compare the actual output with the desired output: there is an error → the weights must be updated
§ Update the weights:
w1 = 0.2 + 0.1(0 − 1)(1) = 0.1
w2 = −0.1 + 0.1(0 − 1)(0) = −0.1
Solution – round 3
§ Compute the output:
net = w1 x1 + w2 x2 + b = 0.1(1) + (−0.1)(0) + (−0.2) = −0.1 → output = 0
§ Compare the actual output with the desired output: no error → converged
§ The trained network weights:
w1 = 0.1
w2 = −0.1
b = −0.2
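The whole exercise can be checked in a few lines of Python. This sketch implements the perceptron learning rule exactly as applied above; the round() call only suppresses floating-point noise so that net = 0 is handled as on the slide.

def step(u):
    return 1 if u >= 0 else 0

w1, w2, b = 0.3, -0.1, -0.2        # initial weights and fixed bias
x1, x2, y = 1, 0, 0                # training example and desired output
eta = 0.1                          # learning rate

for rnd in range(1, 10):
    net = round(w1 * x1 + w2 * x2 + b, 9)   # round off floating-point noise
    out = step(net)
    print(f"round {rnd}: net = {net}, output = {out}")
    if out == y:                   # no error: converged
        break
    w1 += eta * (y - out) * x1     # perceptron learning rule
    w2 += eta * (y - out) * x2

print(round(w1, 2), round(w2, 2), b)   # -> 0.1 -0.1 -0.2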
ANN
§ Overview of ANN
§ Components of ANN
§ ANN training (forward and backward propagation)
§ ANN characteristics
§ ANN design
ANN characteristics
§ Parameters, hyperparameters
§ Shallow NN, deep NN
§ Underfitting, overfitting
§ Generalization
ANN parameters
§ Parameters: values that change while training the ANN
Ø Weights
Ø Biases
§ Hyperparameters: constant values related to the ANN configuration, defined before training
Ø Learning rate
Ø Number of hidden layers
Ø Net function
Ø Activation function
Ø Number of examples in the training dataset…
Shallow NN and deep NN
§ Shallow NN:
Ø One hidden layer
Ø Used for simple problems
§ Deep NN:
Ø Many hidden layers
Ø Used for complex problems
Ø Each layer is used for a specific role in the entire problem
Deep learning video
Model fitting
§ Underfitting (high bias):
Ø Model is too simple for the data
Ø Training error is large; validation/test error is large too
Ø Model cannot make accurate predictions: the initial assumption about the data is incorrect
Model fitting (cont)
§ Overfitting (high variance):
Ø Model is too complex for the data
Ø Model memorizes the training data rather than generalizing from it → error on the training set is small, error on the testing set is large
Model fitting (cont)
[Figure: a model with good generalization]
Model fitting (cont)
[Figure: underfitting and overfitting are bad models; the good model lies between them]
How to avoid underfitting?
§ Try a more complex model
Ø More powerful model with a larger number of parameters
Ø More layers
Ø More neurons per layer
§ Try a larger number of features
Ø Get additional features
Ø Feature engineering
§ Data cleaning, cross-validation (hold-out, K-fold, LOOCV)
How to avoid overfitting?
§ Try a simpler model
Ø Less powerful model with fewer parameters
Ø Fewer layers, fewer neurons per layer
§ Try a smaller number of features
Ø Remove additional features
Ø Feature selection
How to avoid overfitting? (cont)
§ Enlarge the data
Ø Data cleaning
Ø Cross-validation (hold-out, K-fold, LOOCV)
Ø Data augmentation (rotate, flip, scale,…)
§ More regularization (see the sketch below)
Ø Early stopping
Ø Dropout
Ø L1, L2 regularization
Early stopping
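A minimal Keras sketch (illustrative layer sizes, not a model from the slides) combining three of the regularization techniques listed above: L2 regularization, dropout, and an early-stopping callback that halts training when the validation loss stops improving.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),                                # 20 illustrative features
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 regularization
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Early stopping: monitor the validation loss each epoch
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])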
Generalization
§ Good generalization: the I/O mapping is nearly correct for new data
Generalization
§ Factors that influence generalization:
Ø Training set size
Ø ANN architecture
Ø Problem complexity
§ How to improve the generalization?
Ø Collect more data for training
Ø Train several networks then select the best one
Ø Avoid overfitting, avoid underfitting
ANN
§ Overview of ANN
§ Components of ANN
§ ANN training (forward and backward propagation)
§ ANN characteristics
§ ANN design
ANN design process
§ Data collection and representation
§ Setup network topology
§ Create network parameters
§ Initialize weight and bias values
§ Training
§ Validation → re-design or use
ANN design process
§ Data collection and representation
§ Setup network topology
§ Create network parameters
§ Initialize weight and bias values
§ Training
§ Validation → re-design or use
Data representation
One-hot encoding
$$d_{k,j} = \begin{cases} 1, & x_j \in C_k \\ 0, & x_j \notin C_k \end{cases}
\qquad
\begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \leftarrow k\text{th element}$$

C_k – class k
x_j – input j
d_{k,j} – desired output
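A short NumPy version of this encoding (illustrative helper, not from the slides):

import numpy as np

def one_hot(k, num_classes):
    d = np.zeros(num_classes)
    d[k] = 1.0                 # 1 in the k-th element, 0 elsewhere
    return d

print(one_hot(2, 5))           # k = 2 (0-based) -> [0. 0. 1. 0. 0.]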
ANN design process
§ Data collection and representation
§ Setup network topology
§ Create network parameters
§ Initialize weight and bias values
§ Training
§ Validation → re-design or use
Network topology
§ The way to connect neurons to form a network
§ Topology consists of:
Ø Neural framework: described by the number of neuron layers and the number of neurons per layer
Ø Interconnection structure: different kinds of connections such as interlayer connection, intralayer connection, self-connection, sublayer connection
Types of ANN structure
§ Feed-forward neural network: may have no hidden layer, or one or multiple hidden layers
§ Radial basis function neural network
§ Self organizing neural network
§ Recurrent neural network
§ Convolutional neural network
§ Modular neural network
ANN design process
§ Data collection and representation
§ Setup network topology
§ Create network parameters
§ Initialize weight and bias values
§ Training
§ Validation → re-design or use
Network parameters
§ Learning rate
§ Activation function
§ Net function
§ Data preprocessing
§ Number of examples in the training data set
Heuristic 1
§ Maximization of information content: every training example presented to the backpropagation algorithm must maximize the information content.
Ø Use of an example that results in the largest training error.
Ø Use of an example that is radically different from all those previously used.
Heuristic 2
§ Activation function: the network learns faster with antisymmetric activation functions, f(−u) = −f(u), than with nonsymmetric ones.
[Figure: an antisymmetric activation function vs. a nonsymmetric one]
Heuristic 3
§ How much training data?
Rule of thumb: the number of training examples should be at least five to ten times the number of weights in the network
ANN design process
§ Data collection and representation
§ Setup network topology
§ Create network parameters
§ Initialize weight and bias values
§ Training
§ Validation → re-design or use
Initialization
§ Initializing weights and biases before training process
§ Heuristics:
Ø Weights should be initialized randomly with small nonzero values
Ø Biases should be initialized as zero
ANN design process
§ Data collection and representation
§ Setup network topology
§ Create network parameters
§ Initialize weight and bias values
§ Training
§ Validation → re-design or use
Learning modes
§ Online learning: learning as the data comes in (one example at a time)
Ø Sequential mode or stochastic mode
§ Offline learning: learning over the entire dataset
Ø Batch mode: updating the parameters after consuming the whole batch (see the sketch below)
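An illustrative Python sketch contrasting the two modes on a single sigmoid neuron; the toy data and learning rate are made up. Note the one update per example in sequential mode versus one update per epoch in batch mode.

import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # toy inputs
d = np.array([0., 0., 1., 1.])                           # desired outputs
eta = 0.5                                                # learning rate

def grad(w, x, t):
    z = 1.0 / (1.0 + np.exp(-(x @ w)))   # sigmoid neuron output
    return (t - z) * z * (1 - z) * x     # descent direction for one example

# Sequential (online) mode: one weight update per example
w = np.zeros(2)
for epoch in range(100):
    for x, t in zip(X, d):
        w += eta * grad(w, x, t)         # update immediately

# Batch (offline) mode: one weight update per epoch
w = np.zeros(2)
for epoch in range(100):
    w += eta * sum(grad(w, x, t) for x, t in zip(X, d))  # accumulate, then update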
ANN sequential training mode
§ Presenting the first I/O pair x(1)–y(1)
§ Performing a sequence of forward and backward computations
§ Updating the weights
§ Doing the same for x(2)–y(2), …, x(N)–y(N)
§ The learning process continues on an epoch-by-epoch basis until the stopping condition is satisfied
ANN design process
§ Data collection and representation
§ Setup network topology
§ Create network parameters
§ Initialize weight and bias values
§ Training
§ Validation → re-design or use
Method
§ Hold out
§ K-fold cross validation
§ LOOCV
Performance
§ Confusion matrix → precision, recall, accuracy, F1-score, ROC, AUC, IoU,…
§ Algorithm complexity, cost,…
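For example, scikit-learn computes these metrics directly from the true and predicted labels (the labels below are illustrative):

from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, accuracy_score, f1_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # ground-truth class labels
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]   # classifier predictions

print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))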
Deep learning video
Convolutional neural network (CNN)
CNN to classify handwritten digits
Convolution
Convolution layer
Convolution
Pooling layer
Classification
CNN to classify handwritten digits
Convolutional neural network

Layer (type)                   Output Shape           Param #
cv0 (Conv2D)                   (None, 128, 128, 16)   448
max_pooling2d_11 (MaxPooling)  (None, 64, 64, 16)     0
cv1 (Conv2D)                   (None, 64, 64, 32)     4640
max_pooling2d_12 (MaxPooling)  (None, 32, 32, 32)     0
cv2 (Conv2D)                   (None, 32, 32, 64)     18496
max_pooling2d_13 (MaxPooling)  (None, 16, 16, 64)     0
cv3 (Conv2D)                   (None, 16, 16, 128)    73856
max_pooling2d_14 (MaxPooling)  (None, 8, 8, 128)      0
cv4 (Conv2D)                   (None, 8, 8, 256)      295168
max_pooling2d_15 (MaxPooling)  (None, 4, 4, 256)      0
cv5 (Conv2D)                   (None, 4, 4, 128)      32896
cv6 (Conv2D)                   (None, 4, 4, 64)       8256
cv7 (Conv2D)                   (None, 4, 4, 32)       2080
flatten_3 (Flatten)            (None, 512)            0
hiddenlayer1 (Dense)           (None, 512)            262656
hiddenlayer2 (Dense)           (None, 128)            65664
dense_3 (Dense)                (None, 51)             6579
activation_3 (Activation)      (None, 51)             0
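As a cross-check, the following Keras sketch reproduces the summary above. The input shape (128, 128, 3) and the kernel sizes (3×3 for cv0–cv4, 1×1 for cv5–cv7) are inferred from the parameter counts rather than stated on the slide.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(128, 128, 3)),
    layers.Conv2D(16, 3, padding="same", activation="relu", name="cv0"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same", activation="relu", name="cv1"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu", name="cv2"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu", name="cv3"),
    layers.MaxPooling2D(),
    layers.Conv2D(256, 3, padding="same", activation="relu", name="cv4"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 1, activation="relu", name="cv5"),   # 1x1 convolutions
    layers.Conv2D(64, 1, activation="relu", name="cv6"),
    layers.Conv2D(32, 1, activation="relu", name="cv7"),
    layers.Flatten(),
    layers.Dense(512, activation="relu", name="hiddenlayer1"),
    layers.Dense(128, activation="relu", name="hiddenlayer2"),
    layers.Dense(51),
    layers.Activation("softmax"),
])
model.summary()   # parameter counts match the table above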
Deep learning crash course for beginners
https://www.youtube.com/watch?v=VyWAvY2CF9c
Course developed by Jason Dsouza
Duration: 1hr 30 minutes