Deep Learning Classification-3

This document discusses the use of the MNIST dataset for handwritten digit recognition through deep learning classification models. It covers data preparation, model building, error calculation, and the importance of normalization and one-hot encoding in training neural networks. Additionally, it addresses issues like underfitting and overfitting, emphasizing the need for model complexity in capturing non-linear relationships in data.

DEEP LEARNING

HANDOUT: CLASSIFICATION

Handwritten Digit Picture Dataset, Building a Model, Error Calculation, Non-Linear Model, Model
Complexity, Optimization Method

1. Handwritten Digit Picture Dataset:

We will take 0–9 digit picture recognition as an example to explore how to use machine learning to
solve the classification problem.

Dataset: MNIST data set.

• It is a handwritten digit dataset; images are generally scaled to a fixed size, such as 28×28 pixels. For
simplicity, only grayscale information is retained.

• These pictures will be used as the input data x. The data is labeled.

• The MNIST dataset contains real handwritten pictures of the numbers 0–9. Each number has a total of
7,000 pictures, collected from different writing styles.

• Of these, 60,000 images are used for training and 10,000 for testing.

NOTE:

• Generally, pixel values are integers ranging from 0 to 255 to express color intensity information.

• For example, 0 represents the lowest intensity, and 255 indicates the highest intensity.
• If it is a color picture, each pixel contains the intensity information of the three channels R, G, and
B, which, respectively, represent the color intensity of colors red, green, and blue.
• Therefore, each pixel of a color picture is represented by a one-dimensional vector with three
elements, which represent the intensity of R, G, and B colors.

• As a result, a color image is saved as a tensor with dimension [h, w, 3].

• A grayscale picture only needs a two-dimensional matrix with shape [h, w] or a three-dimensional
tensor with shape [h, w, 1] to represent its information.

• Consider the matrix content of a picture of the number 8: the black pixels in the picture are
represented by 0, and the grayscale information is represented by values in 0–255.

• The whiter pixels in the picture correspond to the larger values in the matrix.

Steps to download, manage, and load the MNIST dataset:

Deep learning frameworks like TensorFlow and PyTorch can easily download, manage, and load the
MNIST dataset through a few lines of code. Here we use TensorFlow to automatically download the
MNIST dataset and convert it to a Numpy array format:
import tensorflow as tf
from tensorflow.keras import datasets

# download the dataset and load it as NumPy arrays
(x, y), (x_val, y_val) = datasets.mnist.load_data()
# convert to float type and rescale to [-1, 1]
x = 2 * tf.convert_to_tensor(x, dtype=tf.float32) / 255. - 1
# convert to integer tensor
y = tf.convert_to_tensor(y, dtype=tf.int32)
# convert labels to one-hot encoding
y = tf.one_hot(y, depth=10)
# create training dataset and split it into batches
train_dataset = tf.data.Dataset.from_tensor_slices((x, y))
train_dataset = train_dataset.batch(512)

Summary:
The load_data() function returns two tuple objects: the first is the training set, and the second is the
test set. The first element of the first tuple is the training picture data X, and the second element is the
corresponding category number Y. Each image (Figure 3) in the training set X consists of 28×28 pixels,
and there are 60,000 images in the training set X, so the final dimension of X is (60000, 28, 28). The
shape of Y is (60000,), representing the 60,000 digit labels ranging from 0–9.
Similarly, the test set contains 10,000 test pictures and corresponding digit labels with dimensions
(10000, 28, 28) and (10000,) respectively. The MNIST dataset loaded from TensorFlow contains images
with values from 0 to 255. In machine learning, it is generally desired that the data is distributed in a
small range around 0. Therefore, we rescale the pixel range to the interval [−1, 1], which benefits the
model optimization process.

NOTE:

• We use a matrix of shape [h, w] to represent a picture.


• For multiple pictures, we can add one more dimension in front and use a tensor of shape [b, h, w] to
represent them.

• Here b represents the batch size.

• Color pictures can be represented by a tensor with the shape of [b, h, w, c], where c
represents the number of channels, which is 3 for color pictures.

• TensorFlow’s Dataset object can be used to conveniently convert a dataset into batches using the
batch() function.

Q1: Why do we rescale pixel values to the range [-1, 1]?


Answer: Neural networks train faster and more effectively when the input data is centered around 0.
The original MNIST pixel values range from 0 to 255, so we scale them to [-1, 1] to ensure:
• Better convergence
• Stable gradients
• Improved optimization
This normalization improves learning efficiency and is a common best practice.

Q2: What is the shape of the data after loading and converting it using TensorFlow?

Answer: After loading:


(x, y), (x_val, y_val) = datasets.mnist.load_data()
• x shape: (60000, 28, 28) → 60,000 grayscale images
• y shape: (60000,) → 60,000 labels
After one-hot encoding:
y = tf.one_hot(y, depth=10)
y shape: (60000, 10) → labels converted to one-hot vectors (0–9 classes)

Q3: What is the purpose of train_dataset.batch(512)?


Answer: The batch() function splits the dataset into smaller groups of size 512. This is called batch
training.
Benefits:
• Reduces memory usage
• Increases training speed (vectorized computation)
• Enables stable updates in gradient descent
train_dataset = train_dataset.batch(512)
Each batch now contains 512 images and their corresponding labels.

Q4: Why do we use one-hot encoding for labels?


Answer: One-hot encoding transforms categorical labels into binary vectors. For classification tasks:
• It allows the model to assign probabilities to each class.
• It works well with loss functions like categorical cross-entropy.
Example: Label: 3 → One-hot: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

Q5: How does the datasets.mnist.load_data() function simplify dataset preparation?


Answer:
This function:
• Automatically downloads the MNIST dataset
• Returns training and test data as NumPy arrays
• Ensures data is pre-structured as (images, labels)
(x, y), (x_val, y_val) = datasets.mnist.load_data()
This eliminates manual downloading, file parsing, or reshaping, making it ideal for rapid
experimentation and teaching.
Q6: Why do we flatten the image?
Answer: Neural networks expect a vector input. So, we reshape [28, 28] into a vector of length 784.
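A minimal sketch of this flattening step, assuming x holds a batch of 28×28 images as loaded above:

x = tf.reshape(x, (-1, 28 * 28))  # flatten each [28, 28] image into a 784-dimensional vector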

Task: You trained a neural network on MNIST images without rescaling the pixel values (left
them between 0–255). You notice the model converges slowly and shows unstable training
behaviour.

Question:
Why is this happening, and how can you fix it?

Answer:
Raw pixel values in the range 0–255 are large and far from zero-centered, which leads to large,
unstable gradients and poor convergence.

Fix: Normalize input to a smaller range (e.g., [-1, 1]) using:

x = 2 * tf.convert_to_tensor(x, dtype=tf.float32)/255. - 1
Task: You forgot to use the .batch() method while training with TensorFlow's Dataset object.
Your training is slow and inefficient.
Question:
Why is batching important, and how do you apply it?
Answer:
Batching groups multiple samples together to:
• Improve training speed (vectorized processing)
• Use memory efficiently
• Stabilize gradient updates
Apply batching like this:
train_dataset = train_dataset.batch(512)
2. BUILDING A MODEL:

Linear Model:
• For a single input scalar: we reduce the input vector to a single input scalar x, and the model can be
expressed as y = xw + b.

• For multi-input, single output: y = w1x1 + w2x2 + … + wdin xdin + b, a weighted sum of the input
features plus a bias.

• For multi-input, multi-output: y = Wx + b

• In batch form: Y = X @ W + b

Where:
• X: Input matrix [batch_size, input_dim]
• W: Weight matrix [input_dim, output_dim]
• b: Bias vector [output_dim]
• @ represents matrix multiplication
• din represents the input dimension, and dout indicates the output dimension.

• X has shape [b, din], where b is the number of samples and din is the length of each sample.

• W has shape [din, dout], containing din ∗ dout parameters.

• The bias vector b has shape [dout].

• Since the result of the operation X @ W is a matrix of shape [b, dout], it cannot
be directly added to the vector b.
• Therefore, the + sign in batch form needs to support broadcasting, that is, expanding the
vector b into a matrix of shape [b, dout] by replicating b.

Let us build a neural network of this form with 3 inputs and 2 outputs:
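A minimal sketch in TensorFlow, assuming a batch of 4 samples; the random values stand in for real data:

X = tf.random.normal([4, 3])               # input matrix: batch of 4 samples, din = 3 features each
W = tf.Variable(tf.random.normal([3, 2]))  # weight matrix with shape [din, dout] = [3, 2]
b = tf.Variable(tf.zeros([2]))             # bias vector with shape [dout] = [2]
Y = X @ W + b                              # output shape [4, 2]; b is broadcast across the batch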
A grayscale image is stored using a matrix with shape [h, w], and b pictures are stored using a tensor
with shape [b, h, w].

• However, our model can only accept vectors, so we need to flatten the [h, w] matrix into a vector of
length h ⋅ w. Thus the length of the input features is din = h ⋅ w.

• The output can be set to a vector of length dout, where dout is the same as the number of categories.

For example, if the output belongs to the first category, then the
corresponding index is set to 1, and the other positions are set to 0.
This encoding method is called one-hot encoding.

One-hot encoding is very sparse. Compared with digital encoding, it needs more storage, so digital
encoding is generally used for storage. During calculation, digital encoding is converted to one-hot
encoding, which can be achieved through the tf.one_hot() function as follows:
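For example (the same call appears in Q4 below):

y = tf.constant([0, 1, 2, 3])   # digital (integer) encoding of four labels
y = tf.one_hot(y, depth=10)     # converted to one-hot encoding with 10 classes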

Task: You are designing a neural network for a classification task with 10 classes. Your final
weight matrix W has a shape of [784, 1], and your model outputs shape [b, 1].
Question:
Why is this wrong, and how should you fix the shape of W?
Answer:
A [b, 1] output means the model is producing a single value per sample, which is suitable for
regression or binary classification.
But for multi-class classification with 10 classes, you need:
W.shape = [784, 10] → Output shape = [b, 10]

Task: While implementing Y = X @ W + b, you encounter a shape mismatch error because b has
shape [10], and X @ W returns shape [64, 10].
Question:
What causes this, and how is it resolved?
Answer:
The issue is that b is a 1D vector, but it’s being added to a 2D matrix. TensorFlow uses
broadcasting to expand b into shape [64, 10] automatically (replicating it across the batch).
Ensure that broadcasting is supported. If needed, reshape b as:
b = tf.reshape(b, (1, 10)) # shape becomes broadcastable

Q1: What is the shape of X @ W when:


• X.shape = [32, 784] (32 samples, 784 features each)
• W.shape = [784, 10] (weights for 10 output classes)
Answer: Matrix multiplication results in:
X @ W → shape = [32, 10]
This gives 10 outputs (logits) for each of the 32 input samples.
Q2: Why is broadcasting needed when adding the bias vector b to X @ W?
Answer: After computing X @ W, the result has shape [batch_size, output_dim] (e.g., [32, 10]). But:
b.shape = [10] (1D vector)
To add them:
• TensorFlow broadcasts the bias vector b to match shape [32, 10] by replicating it across all
samples.
This enables element-wise addition:
output = X @ W + b # broadcasting b

Q3: What is the purpose of one-hot encoding in classification tasks?


Answer: One-hot encoding converts class labels (e.g., 0–9) into binary vectors so that:
• The model outputs a probability for each class
• A proper loss function (like categorical cross-entropy) can compare the predicted and true
distributions
Example:
Label: 2 → One-hot: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

Q4: What will be the output of the following code?


import tensorflow as tf
y = tf.constant([0, 1, 2, 3])
y_onehot = tf.one_hot(y, depth=10)
print(y_onehot)
Answer: The output will be a tensor of shape (4, 10) where each row is a one-hot encoded vector of a
digit from 0 to 3:
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
These represent one-hot encoded values of digits 0 to 3.
3. ERROR CALCULATION

• For classification problems, our goal is to maximize a certain performance metric, such as accuracy.

• But when accuracy is used as a loss function, it is in fact non-differentiable.

• As a result, the gradient descent algorithm cannot be used to optimize the model parameters.

• For the error calculation of a classification problem, it is more common to use the cross-entropy
loss function instead of the mean squared error loss function introduced in the regression problem.
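A brief sketch of this loss in TensorFlow, assuming one-hot labels and raw model outputs (logits); the example values here are made up:

y_onehot = tf.constant([[0., 1., 0.], [1., 0., 0.]])    # true labels, one-hot encoded
out = tf.constant([[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]])  # raw model outputs (logits)
loss = tf.keras.losses.categorical_crossentropy(y_onehot, out, from_logits=True)
loss = tf.reduce_mean(loss)                             # average the differentiable per-sample losses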

MAJOR ISSUES in handwritten digit picture recognition problems are:

1. A linear model is not enough because:

– It is one of the simplest models in machine learning.

– It has only a few parameters

– It can only express linear relationships.

The perception and decision-making of the brain are far more complex than what a linear model can express.

2. Complexity:

• It is the model’s ability to approximate complex distributions.

• The preceding solution only uses a one-layer neural network model composed of a small number of neurons.

• Compared with the 100 billion neuron interconnection structure in the human brain, its generalization ability is obviously weaker.

Example of model complexity and data distribution:

• The distribution of sampling points with observation errors is plotted. The actual
distribution may be a quadratic parabolic model.

• If you use a linear model to fit the data, it is difficult to learn a good model.

• If you use a suitable polynomial function model to learn, such as a quadratic polynomial, you can
learn a suitable model.

• But when the model is too complex, such as a ten-degree polynomial, it is likely
to overfit and hurt the generalization ability of the model.
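A minimal illustration of this effect with NumPy polynomial fitting; the data and polynomial degrees here are illustrative assumptions:

import numpy as np

x = np.linspace(-3, 3, 30)
y = x ** 2 + np.random.normal(0, 1, size=x.shape)  # quadratic data with observation noise

w_under = np.polyfit(x, y, 1)   # linear fit: too simple, underfits the parabola
w_good = np.polyfit(x, y, 2)    # quadratic fit: matches the underlying model
w_over = np.polyfit(x, y, 10)   # ten-degree fit: likely fits the noise and overfits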
Task:
You are training a neural network to recognize handwritten digits. You observe three different
models as shown in the diagram:
• (a) Linear model (b) Matching model (c) Complex model
Question:
Which model is likely underfitting, which one fits well, and which one is overfitting? Justify your
answer.
Answer:
• (a) Underfitting – Model is too simple, can’t capture the data pattern.
• (b) Good fit – Just enough complexity to capture the data trend.
• (c) Overfitting – Too complex; learns noise, not just patterns.

1. What is underfitting in machine learning?


Answer:
Underfitting happens when the model is too simple to learn the underlying pattern of the data. It
performs poorly on both training and test data.
2. What is overfitting in machine learning?
Answer:
Overfitting occurs when a model learns the training data too well, including its noise. It performs well
on training data but poorly on new, unseen data.
3. Why is a linear model not suitable for handwritten digit recognition?
Answer:
Because a linear model cannot capture complex patterns in images. Handwritten digits require models
that can learn non-linear relationships.
4. What is the role of model complexity in machine learning?
Answer:
Model complexity refers to a model’s ability to fit complex patterns. Higher complexity can fit more
detailed trends, but too much can lead to overfitting.
5. Why is accuracy not used as a loss function in classification tasks?
Answer:
Because accuracy is not differentiable, so gradient-based optimization (like gradient descent) cannot
use it. Cross-entropy loss is used instead.
6. What is a good alternative to Mean Squared Error for classification problems?
Answer:
Cross-entropy loss is a better alternative because it is differentiable and works well for classification
tasks.
NON-LINEAR MODEL

• Since a linear model is not feasible, we can embed a nonlinear function in the linear
model and convert it to a nonlinear model.

• We call this nonlinear function the activation function, which is represented by σ:

o = σ(Wx + b)

Activation Function:

• Activation functions introduce non-linearities into the network, allowing it to learn complex patterns in the data.

• Common activation functions are ReLU, Sigmoid, Tanh etc.

• The ReLU function only retains the positive part of the function y = x and sets the
negative part to zero.

• It has a unilateral suppression characteristic. Although simple, the ReLU function has excellent
nonlinear characteristics, easy gradient calculation, and a stable training process.

• It is one of the most widely used activation functions for deep learning models.

• We convert the model to a nonlinear model by embedding the ReLU function:

o = ReLU(Wx + b)
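A quick check of this behaviour using TensorFlow’s built-in ReLU:

x = tf.constant([-2.0, 0.0, 3.0])
print(tf.nn.relu(x))  # [0. 0. 3.]: negative values are suppressed, positive values pass through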

Model Complexity:

• To increase the model complexity, we can repeatedly stack multiple transformations such as:

h1 = ReLU(W1x + b1)
h2 = ReLU(W2h1 + b2)
o = W3h2 + b3
In the preceding equations, we take:

• the output value h1 of the first-layer neuron as the input of the second-layer neuron.

• Then take the output h2 of the second-layer neuron as the input of the third-layer neuron.

• The output of the last-layer neuron is the model output.

• We call the layer where the input node x is located the input layer.

• The output of each nonlinear module hi along with its parameters Wi and bi is called a
network layer.

• In particular, the layer in the middle of the network is called the hidden layer, and
the last layer is called the output layer.

• This network structure formed by the connection of a large number of neurons is called a
neural network.

• The number of nodes in each layer and the number of layers determine the
complexity of the neural network.
Task:
You are building a deep neural network model to classify images. You decide to use two hidden
layers. Each layer applies a non-linear activation to its input, following the stacked form shown
above: h1 = ReLU(W1x + b1), h2 = ReLU(W2h1 + b2), o = W3h2 + b3.
Question:
1. Why is it necessary to use a non-linear activation function (like ReLU) in hidden layers?
2. What would happen if all activation functions were removed and only linear operations
remained?
Answer:
1. Without non-linearity, a neural network—regardless of how many layers it has—behaves like a
single linear transformation. Non-linear activation functions like ReLU allow the model to
learn complex, non-linear patterns in data, which is essential for tasks like image recognition.
2. If activation functions were removed, the model would collapse into a linear model, limiting
its expressiveness and making it unable to capture the complexity in the data.
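A minimal sketch of such a two-hidden-layer model with Keras layers; the layer widths (256 and 128) are illustrative assumptions, not values from the handout:

from tensorflow.keras import layers, Sequential

model = Sequential([
    layers.Dense(256, activation='relu'),  # first hidden layer with non-linear activation
    layers.Dense(128, activation='relu'),  # second hidden layer
    layers.Dense(10)                       # output layer: one logit per digit class
])
out = model(tf.random.normal([4, 784]))    # forward pass on a dummy batch of 4 flattened images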

1. What is the ReLU activation function?


Answer:
ReLU (Rectified Linear Unit) outputs the input directly if it is positive; otherwise, it outputs zero.
It is defined as: ReLU(x) = max(x, 0).
2. What is the main difference between ReLU and Sigmoid?


Answer:
• Sigmoid outputs values between 0 and 1 and is non-linear but can cause vanishing gradients.
• ReLU is faster, does not saturate for large values, and is more commonly used in deep networks.

3. What is a hidden layer in a neural network?
Answer:
A layer between the input and output layers where intermediate computations and feature
extraction happen.

4. Why do we stack multiple layers in a neural network?


Answer:
To increase model complexity and allow the network to learn hierarchical features from data.

5. What determines the complexity of a neural network?


Answer:
• Number of layers
• Number of neurons per layer
• Type of activation functions used
OPTIMISATION METHOD: CLASSIFICATION

• Optimization methods similar to regression can also be used to solve classification problems.

For a network model with a single layer:

• We can directly derive the partial derivative expressions ∂L/∂w and ∂L/∂b of the loss L with respect to the parameters.

• We can then calculate the gradient at each step and update the parameters w and b using the gradient
descent algorithm.
• As complex nonlinear functions are embedded, the number of network layers and the length of data
features also increase.

• The model becomes very complicated, and it is difficult to manually derive the gradient expressions.

• Once the network structure changes, the model function and corresponding gradient expressions also
change.

• Therefore, it is obviously not feasible to rely on the manual calculation of the gradient.

SOLUTION TO THIS PROBLEM:

• Invention of deep learning frameworks.


• With the help of auto-differentiation technology, deep learning frameworks can build the neural
network’s computational graph while calculating each layer’s output and the corresponding loss
function, and then automatically calculate the gradient of the loss with respect to any parameter.
• Users only need to set up the network structure, and the gradient will automatically be calculated
and updated, which is very convenient and efficient to use.
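A minimal sketch of this automatic gradient calculation using tf.GradientTape; the tiny single-layer model and dummy data are illustrative only:

w = tf.Variable(tf.random.normal([784, 10]))
b = tf.Variable(tf.zeros([10]))
x_batch = tf.random.normal([32, 784])                            # dummy batch of flattened images
y_batch = tf.one_hot(tf.zeros([32], dtype=tf.int32), depth=10)   # dummy one-hot labels

with tf.GradientTape() as tape:              # record operations for auto-differentiation
    out = x_batch @ w + b
    loss = tf.reduce_mean(
        tf.keras.losses.categorical_crossentropy(y_batch, out, from_logits=True))
grads = tape.gradient(loss, [w, b])          # gradients of the loss w.r.t. w and b, computed automatically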
