ML Unit-2
UNIT-II
Multi-Layer Perceptron: Going Forwards, Going Backwards, Back Propagation Error, Multi-
layer perceptron in practice, Examples of using the MLP, Deriving Back-propagation.
Radial Basis Functions and Splines: Concepts, RBF Network, Curse of Dimensionality,
Interpolations and Basis Functions, Support Vector Machine
Input: A = 1, B = 0
At neuron C:
1×1 + 0×1 + 1×(-0.5) = 1 + 0 - 0.5 = 0.5 > threshold 0
Neuron C fires, so its output is 1.
At neuron D:
1×1 + 0×1 + 1×(-1) = 1 + 0 - 1 = 0, which is not above the threshold 0
Neuron D does not fire, so its output is 0.
At neuron E:
1×1 + 0×(-1) + 1×(-0.5) = 1 - 0 - 0.5 = 0.5 > threshold 0
Neuron E fires, so its output is 1.
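The same forward pass can be written as a short program. A minimal sketch with simple threshold units, using the weights and bias from the worked example above (the step function and variable names are illustrative):

def step(weighted_sum, threshold=0.0):
    # Fire (output 1) only if the weighted sum exceeds the threshold
    return 1 if weighted_sum > threshold else 0

A, B = 1, 0     # input pattern from the example
bias = 1        # constant bias input

# Hidden layer
C = step(A * 1 + B * 1 + bias * (-0.5))      # 0.5 > 0, so C = 1
D = step(A * 1 + B * 1 + bias * (-1.0))      # 0 is not > 0, so D = 0

# Output layer
E = step(C * 1 + D * (-1) + bias * (-0.5))   # 0.5 > 0, so E = 1

print(C, D, E)   # 1 0 1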
Going Forwards:
   Training the MLP consists of two parts: working out what the outputs are for the given
    inputs and the current weights, and then updating the weights according to the error, which
    is a function of the difference between the outputs and the targets.
   These are generally known as going forwards and backwards through the network.
   Each neuron in the network (whether in a hidden layer or the output layer) has one extra input with a fixed value, called the bias.
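A minimal sketch of the forward pass for a one-hidden-layer MLP with sigmoid activations (the bias is implemented here as an extra input fixed at -1; array names and shapes are illustrative, not taken from the notes):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(inputs, v, w):
    # inputs: (n_samples, n_in); v: input-to-hidden weights; w: hidden-to-output weights
    inputs = np.concatenate((inputs, -np.ones((inputs.shape[0], 1))), axis=1)
    hidden = sigmoid(inputs @ v)     # hidden-layer activations
    hidden = np.concatenate((hidden, -np.ones((hidden.shape[0], 1))), axis=1)
    outputs = sigmoid(hidden @ w)    # network outputs y
    return hidden, outputs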
Going Backwards- Back Propagation of Error:
   Back-propagation of error makes it clear that the errors are sent backwards through the
    network.
   It is a form of gradient descent.
   The problem is that when we try to adapt the weights of the Multi-layer Perceptron, we
    have to work out which weights caused the error.
   This could be the weights connecting the inputs to the hidden layer, or the weights
    connecting the hidden layer to the output layer.
   We use the sum-of-squares error function, which calculates the difference between the output y and the target t for each output node, squares these differences, and adds them all together (a sketch follows this list).
       We need an activation function that looks like a threshold function but is differentiable
        so that we can compute the gradient.
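A minimal sketch of the sum-of-squares error mentioned above (y are the network outputs, t the targets; the names are illustrative):

import numpy as np

def sum_of_squares_error(y, t):
    # E = ½ Σ_k (y_k - t_k)², summed over the output nodes
    return 0.5 * np.sum((y - t) ** 2)

E = sum_of_squares_error(np.array([0.8, 0.1, 0.3]), np.array([1.0, 0.0, 0.0]))
print(E)   # 0.5 * (0.04 + 0.01 + 0.09) = 0.07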
    Activation Functions:
       The activation function basically decides whether a neuron should be activated or not.
       The activation function is a non-linear transformation that we do over the input before
        sending it to the next layer of neurons or finalizing it as output.
    Sigmoid Function:
           The Sigmoid activation function, also known as the logistic activation function,
            takes inputs and turns them into outputs ranging between 0 and 1.
           For this reason, sigmoid is referred to as the “squashing function” and is
            differentiable.
           Larger, more positive inputs should produce output values close to 1.0, with
            smaller, more negative inputs producing outputs closer to 0.0.
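A minimal sketch of the sigmoid (logistic) activation and its derivative; the simple derivative is what makes the gradient easy to compute:

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ≈ [0.0067, 0.5, 0.9933]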
       A local minimum is a point in the parameter space where the loss function is minimized
        in a local neighborhood.
       A global minimum is a point in the parameter space where the loss function is
        minimized globally.
Picking Up Momentum:
       Momentum in neural networks is a parameter optimization technique that accelerates
        gradient descent by adding a fraction of the previous weight update to the current
        weight update.
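A minimal sketch of the momentum update in its standard form (η is the learning rate, α the momentum, e.g. 0.9; the names are illustrative):

def momentum_step(w, grad, prev_update, eta=0.1, alpha=0.9):
    # New update = gradient-descent step plus a fraction of the previous update
    update = -eta * grad + alpha * prev_update
    return w + update, update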
       The training of the MLP requires that the algorithm runs over the entire dataset many
        times, with the weights changing as the network makes errors in each iteration.
       Two options
            o Predefined number of Iterations
            o Predefined minimum error reached
       Using both of these options together can help, as can terminating the learning once the
        error stops decreasing.
       We train the network for some predetermined amount of time, and then use the
        validation set to estimate how well the network is generalising.
       We then carry on training for a few more iterations, and repeat the whole process.
       At some stage the error on the validation set will start increasing again, because the
        network has stopped learning about the function that generated the data, and started to
        learn about the noise that is in the data itself.
       At this stage we stop the training. This technique is called early stopping.
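A minimal sketch of early stopping as described above (the network object, its train and error methods, and the patience value are illustrative assumptions):

def train_with_early_stopping(net, train_set, valid_set, chunk_epochs=10, patience=2):
    # Train in short bursts; stop once the validation error keeps rising
    best_err = float("inf")
    rises = 0
    while rises < patience:
        net.train(train_set, epochs=chunk_epochs)   # a few more iterations
        err = net.error(valid_set)                  # estimate of generalisation
        if err < best_err:
            best_err, rises = err, 0
        else:
            rises += 1                              # validation error increasing
    return net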
Regression:
    The loss functions that can be used in a regression MLP include Mean Squared Error (MSE) and Mean Absolute Error (MAE).
    MSE is suitable for datasets with few outliers, while MAE is a better measure for datasets with many outliers.
    Example: Rainfall prediction, Stock price prediction
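A minimal sketch of the two loss functions (y are predictions, t targets; the numbers are made up for illustration):

import numpy as np

def mse(y, t):
    return np.mean((y - t) ** 2)      # penalises large errors heavily

def mae(y, t):
    return np.mean(np.abs(y - t))     # more robust to outliers

y = np.array([2.5, 0.0, 2.1])
t = np.array([3.0, -0.5, 2.0])
print(mse(y, t), mae(y, t))           # 0.17  0.3666...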
Classification:
        If the output variable is categorical, then we have to use classification for prediction.
Example: Iris Flower classification
        The aim is to classify iris flowers among three species (Setosa, Versicolor, or Virginica)
         from the sepals’ and petals’ length and width measurements.
        The neural network used for this task has one input layer, two hidden layers and one output layer.
        In the hidden layers we use sigmoid as an activation function for all neurons.
        In the output layer, we use softmax as an activation function for the three output
         neurons.
        In this regard, all outputs are between 0 and 1, and their sum is 1.
        The neural network has three outputs since the target variable contains three classes
         (Setosa, Versicolor, and Virginica).
Working of Softmax:
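A minimal sketch of softmax, written in a numerically stable form (the raw scores are made up for illustration):

import numpy as np

def softmax(z):
    # Turn raw output scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))   # subtracting the max avoids overflow
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])   # raw outputs for the three classes
print(softmax(scores))               # ≈ [0.659, 0.242, 0.099], sums to 1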
        An auto-associative network finds a different representation of the input data, one that extracts the important components of the data and ignores the noise.
        Such a network can be used to compress images and other data.
Deriving Back-propagation:
Things to know:
     1. Derivative of ½x² is x
     2. Chain rule:
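Written out (a standard statement of the chain rule, since the original formula is not reproduced here): if E depends on x only through an intermediate variable y, then

    dE/dx = (dE/dy) · (dy/dx)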
        Note that i is an index over the input nodes, j is an index over the hidden layer neurons,
         and k is an index over the output neurons.
The Error of the Network:
        The notation E(v, w) reminds us that the only things we can change are the weights v and w.
        We will choose the sum-of-squares error function.
        We are going to use a gradient descent algorithm that adjusts each weight.
        The gradient that we want to know is how the error function changes with respect to
         the different weights
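In the notation above (t_k the targets, y_k the outputs, η the learning rate), a sketch of the error function and the gradient-descent update for a second-layer weight:

    E(v, w) = ½ Σ_k (y_k - t_k)²,        w_jk ← w_jk - η · ∂E/∂w_jk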
There is a family of functions, called sigmoid functions because they are S-shaped, that satisfies all of these criteria perfectly.
We don't know much about the inputs to a neuron, only about its output; that is fine, because we can use the chain rule again.
The important thing that we need to remember is that inputs to the output layer neurons come
from the activations of the hidden layer neurons multiplied by the second layer weights:
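In symbols (using the index convention above, writing a_j for the hidden activations, w_jk for the second-layer weights and g for the activation function):

    input to output neuron k:  h_k = Σ_j a_j · w_jk,        output:  y_k = g(h_k)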
Radial Basis Function (RBF) Networks:
        RBF networks are conceptually similar to k-Nearest Neighbour (k-NN) models, though their implementation is distinct.
        The fundamental idea is that an item’s predicted target value is influenced by nearby
         items with similar predictor variable values.
        Here’s how RBF Networks operate:
             o Input Vector: The network receives an n-dimensional input vector that needs
                 classification or regression.
             o RBF Neurons: Each neuron in the hidden layer represents a prototype vector
                 (center, radius/spread) from the training set. The network computes the
                 Euclidean distance between the input vector and each neuron’s center.
             o Activation Function: The Euclidean distance is transformed using a Radial
                 Basis Function (typically a Gaussian function) to compute the neuron’s
                 activation value. This value decreases exponentially as the distance increases.
            o Output Nodes: Each output node calculates a score based on a weighted sum of
              the activation values from all RBF neurons. For classification, the category with
              the highest score is chosen.
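A minimal sketch of an RBF network's forward pass with Gaussian basis functions (the centres, spread and output weights are illustrative; in practice they are learned or chosen from the training data):

import numpy as np

def rbf_forward(x, centres, sigma, weights):
    # centres: (n_rbf, n_dim) prototype vectors; weights: (n_rbf, n_out) output weights
    dists = np.linalg.norm(centres - x, axis=1)                 # Euclidean distances
    activations = np.exp(-(dists ** 2) / (2 * sigma ** 2))      # Gaussian RBF
    return activations @ weights                                # weighted sum per output

centres = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])        # 3 RBF neurons, 2 outputs
scores = rbf_forward(np.array([0.9, 1.1]), centres, sigma=0.5, weights=weights)
print(scores.argmax())   # index of the highest-scoring output (the predicted class)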
Interpolation:
        Interpolation is the process of estimating unknown values that lie between known data points.
        Example: if a child's height was measured at age 5 and age 6, interpolation could be used to estimate the child's height at age 5.5.
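For example, with linear interpolation (the heights 100 cm and 106 cm are made-up numbers for illustration):

    h(5.5) ≈ h(5) + (5.5 - 5)/(6 - 5) · (h(6) - h(5)) = 100 + 0.5 × 6 = 103 cm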
Basis Function:
        Radial basis functions and several other machine learning algorithms can be written in
         this form:
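A common way of writing this weighted-sum-of-basis-functions form (φ is the basis function and the c_i are its centres; the exact notation of the original notes is not reproduced here):

    f(x) = Σ_i w_i · φ(‖x - c_i‖)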
Curse of Dimensionality:
        The Curse of Dimensionality refers to the phenomenon where the efficiency and effectiveness of algorithms deteriorate rapidly as the dimensionality of the data increases.
        It is crucial to understand this concept because as the number of features or dimensions
         in a dataset increases, the amount of data we need to generalize accurately grows
         exponentially.
        Dimensions refer to the features or attributes of data.
        For instance, if we consider a dataset of houses, the dimensions could include the
         house's price, size, number of bedrooms, location, and so on.
What problems does it cause?
     1. Data sparsity. As mentioned, data becomes sparse, meaning that most of the high-
        dimensional space is empty. This makes clustering and classification tasks challenging.
     2. Increased computation. More dimensions mean more computational resources and
        time to process the data.
     3. Overfitting. With higher dimensions, models can become overly complex, fitting to
        the noise rather than the underlying pattern. This reduces the model's ability to
        generalize to new data.
     4. Distances lose meaning. In high dimensions, the difference in distances between data
        points tends to become negligible, making measures like Euclidean distance less
        meaningful.
     5. Performance degradation. Algorithms, especially those relying on distance
        measurements like k-nearest neighbors, can see a drop in performance.
     6. Visualization challenges. High-dimensional data is hard to visualize, making
        exploratory data analysis more difficult.
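A small sketch illustrating point 4 (distances losing meaning): for random data, the relative gap between the farthest and nearest neighbour of a query point shrinks as the dimensionality grows.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((1000, d))                        # 1000 random points in d dimensions
    dists = np.linalg.norm(points - rng.random(d), axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()  # relative spread of distances
    print(d, round(contrast, 3))                          # shrinks as d increases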
Support Vector Machine (SVM):
        Support Vector Machine (SVM) is a supervised learning algorithm whose goal is to find the best decision boundary (hyperplane) separating the classes.
        The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Types of SVM:
        Linear SVM: Linear SVM is used for linearly separable data; if a dataset can be classified into two classes using a single straight line, it is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
        Non-linear SVM: Non-linear SVM is used for non-linearly separable data; if a dataset cannot be classified using a straight line, it is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Hyperplane:
        A hyperplane is the decision boundary that separates the classes in the feature space; in two dimensions it is a line, and in three dimensions a plane.
        We always create the hyperplane that has the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of either class.
Support Vectors:
        The data points or vectors that are closest to the hyperplane, and which affect its position, are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Linear SVM:
        Suppose we have a dataset with two classes (green and blue), and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue.
        Since this is a 2-D space, we can separate the two classes with a straight line. However, there can be multiple lines that separate these classes.
        Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary is called a hyperplane.
        The SVM algorithm finds the points from both classes that are closest to this line. These points are called support vectors.
        The distance between these vectors and the hyperplane is called the margin.
        And the goal of SVM is to maximize this margin.
        The hyperplane with maximum margin is called the optimal hyperplane.
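A minimal sketch using scikit-learn's SVC with a linear kernel (the tiny 2-D dataset is made up for illustration):

import numpy as np
from sklearn.svm import SVC

# Toy 2-D dataset: class 0 clustered near the origin, class 1 further away
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)            # the points that define the margin
print(clf.predict([[3, 3], [7, 5]]))   # -> [0 1]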
Non-Linear SVM:
        If data is linearly separable, then we can separate it using a straight line, but for non-linear data we cannot draw a single straight line.
        So to separate these data points, we need to add one more dimension. For linear data,
         we have used two dimensions x and y, so for non-linear data, we will add a third
         dimension z. It can be calculated as:
        z = x² + y²
        By adding this third dimension, the points that could not be separated in two dimensions become separable.
        So now, SVM can divide the dataset into classes using a linear boundary (a plane) in this higher-dimensional space.
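A minimal sketch of this idea: points on two concentric circles are not linearly separable in (x, y), but after adding z = x² + y² a linear SVM separates them (the generated data is illustrative):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add the extra dimension z = x² + y²
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X3 = np.hstack([X, z])

clf = SVC(kernel="linear").fit(X3, y)
print(clf.score(X3, y))   # close to 1.0: the classes are now linearly separable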
Kernels:
        The most interesting feature of SVM is that it can work even with a non-linear dataset; for this we use the "Kernel Trick", which makes it easier to classify the points. Suppose we have a dataset that is not linearly separable.
        We cannot draw a single line (or hyperplane) that classifies these points correctly.
        So we convert this lower-dimensional space into a higher-dimensional space using, for example, quadratic functions, which allows us to find a decision boundary that clearly divides the data points.
        The functions that help us do this are called kernels, and which kernel to use is determined by hyperparameter tuning.
               For a degree-2 (quadratic) mapping we need x1², x2² and x1·x2 in addition to x1 and x2, so the 2 original dimensions get converted into 5 dimensions.
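A small sketch of this explicit degree-2 feature mapping (the kernel trick computes the same inner products without ever building these features explicitly):

import numpy as np

def quadratic_features(x1, x2):
    # Map a 2-D point into the 5-D space {x1, x2, x1², x2², x1·x2}
    return np.array([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

print(quadratic_features(2.0, 3.0))   # [2. 3. 4. 9. 6.]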
Sigmoid Kernel
          It takes the inputs and maps them to values between 0 and 1 so that they can be separated by a simple straight line.
RBF Kernel
        It creates non-linear combinations of the features to lift the samples into a higher-dimensional feature space where a linear decision boundary can separate the classes.
        It is the most widely used kernel in SVM classification; the following formula expresses it mathematically:
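The standard Gaussian RBF kernel (γ controls the width of the Gaussian; x and x′ are two input vectors):

    K(x, x′) = exp(-γ · ‖x - x′‖²)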
          – identify the support vectors as those that are within some specified distance of the closest point, and dispose of the rest of the training data
          – compute b* using the corresponding equation
Advantages of SVM:
        SVM works well when the data is linearly separable
        It is more effective in high dimensions
        With the help of the kernel trick, we can solve any complex problem
        SVM is not sensitive to outliers
        Can help us with Image classification
Disadvantages of SVM:
        Choosing a good kernel is not easy
        It doesn’t show good results on a large dataset
        The main SVM hyperparameters are the cost (C) and gamma; they are not easy to fine-tune, and it is hard to visualize their impact.
******