UNIT IV Learning
In this topic, we will provide a detailed description of the types of Machine Learning along with their respective algorithms:

1. Supervised Machine Learning

As its name suggests, supervised machine learning is based on supervision. In the supervised learning technique, we train the machines using a "labelled" dataset, and based on that training, the machine predicts the output. Here, labelled data means that some of the inputs are already mapped to outputs. More precisely, we first train the machine with inputs and their corresponding outputs, and then we ask the machine to predict the output for a test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we train the machine to understand the images using features such as the shape and size of the tail of cats and dogs, the shape of the eyes, colour, and height (dogs are taller, cats are smaller). After training, we input the picture of a cat and ask the machine to identify the object and predict the output. Since the machine is now well trained, it will check these features and classify the image as a cat.

Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given below:

o Classification
o Regression

a) Classification

Classification algorithms are used to solve classification problems in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. Classification algorithms predict the categories present in the dataset. Some real-world examples of classification are spam detection and email filtering.

Some popular classification algorithms are given below:

o Random Forest Algorithm
o Decision Tree Algorithm

b) Regression

Some popular regression algorithms are given below:

o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Applications of Supervised Learning

o Image Segmentation: Supervised learning algorithms are used in image segmentation. In this process, image classification is performed on different image data with pre-defined labels.

o Medical Diagnosis: Supervised algorithms are also used in the medical field for diagnosis purposes. This is done by using medical images and past data labelled with disease conditions. With such a process, the machine can identify a disease for new patients.

o Fraud Detection: Supervised learning classification algorithms are used for identifying fraudulent transactions, fraudulent customers, etc. This is done by using historical data to identify patterns that can lead to possible fraud.

Advantages and Disadvantages of Supervised Learning

Advantages:

o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
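The train-then-predict workflow described above can be sketched with a deliberately tiny classifier. The heights, labels, and the midpoint-threshold rule are illustrative inventions, not part of any standard algorithm:

```python
# A minimal sketch of the supervised workflow: train on labelled data,
# then predict on unseen inputs. The heights and labels are made-up
# illustrative numbers, not a real dataset.

def train(heights, labels):
    """Learn a decision threshold as the midpoint of the two class means."""
    cats = [h for h, l in zip(heights, labels) if l == "cat"]
    dogs = [h for h, l in zip(heights, labels) if l == "dog"]
    return (sum(cats) / len(cats) + sum(dogs) / len(dogs)) / 2

def predict(threshold, height):
    """Animals taller than the threshold are classified as dogs."""
    return "dog" if height > threshold else "cat"

# Labelled training set: each input (height in cm) is mapped to an output (label).
heights = [23, 25, 28, 55, 60, 48]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

threshold = train(heights, labels)   # midpoint of class means, about 39.8
print(predict(threshold, 30))        # → cat
print(predict(threshold, 52))        # → dog
```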
Disadvantages:
o These algorithms may predict the wrong output if the test data is different from the training data.
o Training these algorithms can require a lot of computation time.

2. Unsupervised Machine Learning

Unsupervised learning is different from the supervised learning technique; as its name suggests, there is no need for supervision. In unsupervised machine learning, the machine is trained using an unlabeled dataset, and it predicts the output without any supervision.

In unsupervised learning, the models are trained with data that is neither classified nor labelled, and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.

Let's take an example to understand it more precisely. Suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the objects.

Categories of Unsupervised Machine Learning

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups in the data. It is a way to group objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with objects of other groups. An example of a clustering task is grouping customers by their purchasing behaviour.

Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis
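The K-Means idea from the list above can be sketched in one dimension: assign each point to its nearest centroid, move each centroid to the mean of its cluster, and repeat until the centroids stop moving. The data points are illustrative:

```python
import random

# A minimal 1-D K-Means sketch (k = 2). The points form two obvious
# groups around 2 and 10 (e.g. customers' monthly purchase counts).

def kmeans_1d(points, k, iters=100):
    centroids = random.sample(points, k)          # random initial centroids
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:            # converged
            break
        centroids = new_centroids
    return sorted(centroids)

random.seed(0)
print(kmeans_1d([1, 2, 3, 9, 10, 11], k=2))       # → [2.0, 10.0]
```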
3. Semi-Supervised Learning

To overcome the drawbacks of supervised learning and unsupervised learning algorithms, the concept of semi-supervised learning was introduced. The main aim of semi-supervised learning is to effectively use all the available data, rather than only the labelled data as in supervised learning. Initially, similar data is clustered with an unsupervised learning algorithm, and this then helps to label the unlabeled data. This is because labelled data is comparatively more expensive to acquire than unlabeled data.

We can illustrate these algorithms with an example. Supervised learning is where a student is under the supervision of an instructor at home and college. Further, if that student is self-analysing the same concept without any help from the instructor, it comes under unsupervised learning. Under semi-supervised learning, the student has to revise on his own after analysing the same concept under the guidance of an instructor at college.

Disadvantages:

o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy is low.

4. Reinforcement Learning

Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by hit and trial: taking actions, learning from experiences, and improving its performance. The agent gets rewarded for each good action and punished for each bad action; hence, the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data as in supervised learning; agents learn from their experiences only.

o Positive Reinforcement Learning: Positive reinforcement learning increases the tendency that the required behaviour would occur again by adding something. It enhances the strength of the behaviour of the agent and positively impacts it.

o Negative Reinforcement Learning: Negative reinforcement learning works exactly opposite to positive RL. It increases the tendency that the specific behaviour would occur again by avoiding the negative condition.

One real-world application of this kind is being implemented with the help of Reinforcement Learning by the Salesforce company.

Advantages and Disadvantages of Reinforcement Learning

Advantages
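The feedback loop of actions, rewards, and punishments described in this section can be sketched with a tiny Q-learning agent. The 4-state corridor environment and all hyperparameter values are illustrative assumptions, not part of the text:

```python
import random

# A minimal Q-learning sketch. The environment is a made-up 4-state
# corridor: the agent starts in state 0, reaching state 3 gives reward
# +1, and every other step gives a small punishment of -0.01.

random.seed(1)
n_states, actions = 4, [-1, +1]            # actions: move left / move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration

for _ in range(300):                        # training episodes
    s = 0
    while s != 3:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == 3 else -0.01
        # Update the action value toward reward plus discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, act)] for act in actions) - Q[(s, a)])
        s = s2

# The learned greedy policy moves right (+1) in every non-terminal state.
policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
print(policy)   # → [1, 1, 1]
```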
Decision Tree
o Decision Tree is a Supervised learning technique that can be used
for both classification and Regression problems, but mostly it is
preferred for solving Classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node
represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.

Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated further after reaching a leaf node.

Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called the child nodes.

o Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.

o The logic behind the decision tree can be easily understood because it shows a tree-like structure.

For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, selected by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:

Attribute Selection Measures

While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure, or ASM. With this measure, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:

o Information Gain
o Gini Index
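The job-offer example above can be sketched as nested decision rules. The salary threshold, distance cutoff, and cab-facility check are hypothetical values chosen only to show how an input is routed from the root node down to a leaf:

```python
# The job-offer decision tree as nested conditionals. All attribute
# values and thresholds are illustrative assumptions.

def decide(salary, distance_km, cab_facility):
    if salary >= 50000:              # root decision node (Salary, via ASM)
        if distance_km <= 30:        # decision node: distance from office
            return "Accepted offer"  # leaf node
        elif cab_facility:           # decision node: cab facility
            return "Accepted offer"  # leaf node
        else:
            return "Declined offer"  # leaf node
    else:
        return "Declined offer"      # leaf node

print(decide(60000, 10, False))   # → Accepted offer
print(decide(60000, 45, True))    # → Accepted offer
print(decide(40000, 10, True))    # → Declined offer
```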
1. Information Gain:

Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)

Where:

o S = total number of samples
o P(yes) = probability of yes
o P(no) = probability of no

2. Gini Index:

A too-large tree increases the risk of overfitting, and a small tree may not capture all the important features of the dataset. Therefore, a technique that decreases the size of the learning tree without reducing accuracy is known as Pruning. There are mainly two types of tree pruning technology used:

o Cost Complexity Pruning
o Reduced Error Pruning
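The two ASM measures can be sketched on a toy node split. The entropy() function follows the formula above; the Gini computation uses the standard 1 - Σp² form, which is an assumption here since the text does not spell it out:

```python
import math

# Entropy and Gini index for a binary node, plus the information gain
# of a split (parent entropy minus weighted child entropies). The
# yes/no counts are illustrative.

def entropy(n_yes, n_no):
    total = n_yes + n_no
    e = 0.0
    for n in (n_yes, n_no):
        if n:                       # 0 * log(0) is treated as 0
            p = n / total
            e -= p * math.log2(p)
    return e

def gini(n_yes, n_no):
    total = n_yes + n_no
    return 1 - (n_yes / total) ** 2 - (n_no / total) ** 2

print(entropy(5, 5))    # → 1.0  (maximally impure 50/50 node)
print(entropy(10, 0))   # → 0.0  (pure node)
print(gini(5, 5))       # → 0.5

# Information gain of splitting a (5 yes, 5 no) parent into children
# of (4 yes, 0 no) and (1 yes, 5 no):
gain = entropy(5, 5) - (4 / 10) * entropy(4, 0) - (6 / 10) * entropy(1, 5)
print(round(gain, 2))   # → 0.61
```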
Advantages of the Decision Tree
In regression, we plot a graph between the variables which best fits the given datapoints; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether the model has captured a strong relationship or not.

Some examples of regression can be given as:

o Outliers: An outlier may hamper the result, so it should be avoided.

o Multicollinearity: If the independent variables are highly correlated with each other, then such a condition is called multicollinearity. It should not be present in the dataset, because it creates problems while ranking the most affecting variable.

o Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. And if our algorithm does not perform well even with the training dataset, the problem is called underfitting.
Types of Regression

There are various types of regression which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on a dependent variable. Here we are discussing some important types of regression, which are given below:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression

Linear Regression:

o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o The relationship between variables in the linear regression model can be explained using the below image. Here we are predicting the salary of an employee on the basis of years of experience.

Some applications of linear regression:

o Real estate prediction
o Arriving at ETAs in traffic
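The salary-versus-experience example can be sketched with the closed-form least-squares fit, which minimises exactly the vertical distances described above. The (experience, salary) pairs are made-up numbers:

```python
# Simple linear regression via the closed-form least-squares solution:
# slope a = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2),
# intercept b = mean_y - a * mean_x. Data is illustrative.

def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]   # in thousands; perfectly linear here
a, b = fit_line(years, salary)
print(a, b)                      # → 5.0 25.0
print(a * 6 + b)                 # predicted salary at 6 years → 55.0
```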
Logistic Regression:

When we provide the input values (data) to the function, it gives the S-curve as follows:

Polynomial Regression:

o Polynomial regression is a type of regression which models a non-linear dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the value of x and the corresponding conditional values of y.
o Suppose there is a dataset consisting of datapoints which are present in a non-linear fashion; in such a case, linear regression will not best fit those datapoints. To cover such datapoints, we need polynomial regression.
o In polynomial regression, the original features are transformed into polynomial features of a given degree and then modelled using a linear model. This means the datapoints are best fitted using a polynomial line.
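The transformation described in the last bullet can be sketched directly: a dataset following y = x² admits no good straight-line fit in x, but after creating the polynomial feature z = x², an ordinary linear fit recovers the relationship exactly. The datapoints are illustrative:

```python
# Polynomial regression as "transform the feature, then fit linearly".
# The data follows y = x^2, so no straight line in x fits it well.

xs = [-2, -1, 0, 1, 2]
ys = [4.0, 1.0, 0.0, 1.0, 4.0]

zs = [x ** 2 for x in xs]           # degree-2 polynomial feature z = x^2

# Least-squares slope for the linear model y = w * z (no intercept
# needed for this symmetric toy data).
w = sum(z * y for z, y in zip(zs, ys)) / sum(z * z for z in zs)
print(w)            # → 1.0, i.e. the fit recovers y = 1.0 * x^2
print(w * 3 ** 2)   # prediction at x = 3 → 9.0
```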
Support Vector Regression:

Here, the blue line is called the hyperplane, and the other two lines are known as the boundary lines.
Ridge Regression:

o A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, ridge regression can be used.
o Ridge regression is a regularization technique, which is used to reduce the complexity of the model. It is also called L2 regularization.
o It helps to solve problems where we have more parameters than samples.

Lasso Regression:

o Lasso regression is another regularization technique used to reduce the complexity of the model.
o It is similar to ridge regression except that the penalty term contains only the absolute weights instead of the square of the weights.
o Since it takes absolute values, it can shrink the slope to 0, whereas ridge regression can only shrink it close to 0.
o It is also called L1 regularization. The equation for lasso regression will be:

Classification Algorithm in Machine Learning

As we know, Supervised Machine Learning algorithms can be broadly classified into Regression and Classification algorithms. With regression algorithms, we predict the output for continuous values, but to predict categorical values, we need classification algorithms.

What is the Classification Algorithm?

The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories.

Unlike regression, the output variable of classification is a category, not a value, such as "Green or Blue" or "fruit or animal". Since the classification algorithm is a supervised learning technique, it takes labelled input data, which means it contains inputs with the corresponding outputs.

In the classification algorithm, a discrete output function (y) is mapped to an input variable (x).
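The contrast noted earlier between ridge and lasso (lasso can shrink a slope exactly to 0, ridge only close to 0) can be sketched with one-feature closed forms. The data, the penalty strength, and the penalty conventions used here ((λ/2)·w² for ridge, λ·|w| for lasso) are illustrative assumptions:

```python
# One feature, no intercept: OLS slope is sxy / sxx. With the penalty
# (lam/2) * w^2, ridge gives sxy / (sxx + lam); with lam * |w|, lasso
# gives the soft-thresholded slope max(|sxy| - lam, 0) / sxx (sxy is
# positive here, so the sign term is omitted). Data is illustrative.

xs = [1.0, 2.0, 3.0]
ys = [0.1, 0.2, 0.3]                        # weak signal: OLS slope 0.1
sxy = sum(x * y for x, y in zip(xs, ys))    # 1.4
sxx = sum(x * x for x in xs)                # 14.0
lam = 2.0

w_ols = sxy / sxx                           # 0.1
w_ridge = sxy / (sxx + lam)                 # 0.0875: shrunk near 0, not 0
w_lasso = max(abs(sxy) - lam, 0.0) / sxx    # 0.0: |sxy| <= lam, exactly 0
print(w_ols, w_ridge, w_lasso)
```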
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy learner case, classification is done on the basis of the most related data stored in the training dataset. It takes less time in training but more time for predictions. Example: K-NN algorithm, case-based reasoning.

2. Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time in learning and less time in prediction. Example: Decision Trees, Naïve Bayes, ANN.
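The lazy-learner behaviour of K-NN can be sketched in a few lines: "training" is just storing the labelled examples, and all the work (distance sorting and majority voting) happens at prediction time. The data and k = 3 are illustrative:

```python
# A minimal K-NN sketch on 1-D data. Storing the dataset IS the whole
# training step; classification happens lazily at query time.

def knn_predict(training, query, k=3):
    # Sort stored examples by distance to the query, take the k nearest.
    nearest = sorted(training, key=lambda ex: abs(ex[0] - query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)   # majority vote

training = [(1.0, "A"), (1.5, "A"), (2.0, "A"),
            (8.0, "B"), (8.5, "B"), (9.0, "B")]
print(knn_predict(training, 1.2))   # → A
print(knn_predict(training, 8.7))   # → B
```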
o Linear Models

1. Log Loss or Cross-Entropy Loss:

o It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
o For a good binary classification model, the value of log loss should be near 0.
o The lower the log loss, the higher the accuracy of the model.

Confusion matrix:

                     Actual Positive    Actual Negative
Predicted Positive   True Positive      False Positive
Predicted Negative   False Negative     True Negative
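The log-loss bullets above can be sketched numerically using the standard binary cross-entropy formula (an assumption, as the text does not write it out): a confident, correct classifier scores a lower loss than an unsure one:

```python
import math

# Binary log loss: -(1/N) * sum( y*log(p) + (1-y)*log(1-p) ), where y
# is the true label (0 or 1) and p the predicted probability of class 1.
# The labels and probabilities below are illustrative.

def log_loss(y_true, p_pred):
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / n

confident = log_loss([1, 0, 1], [0.9, 0.1, 0.8])   # near 0: good model
unsure = log_loss([1, 0, 1], [0.6, 0.5, 0.55])     # larger loss
print(confident < unsure)   # → True: lower log loss, better classifier
```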
The given figure illustrates the typical diagram of a Biological Neural Network. The typical Artificial Neural Network looks something like the given figure.

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Synapse                      Weights
Axon                         Output

An Artificial Neural Network is a network in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up a human brain, so that computers have an option to understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.

There are around 1000 billion neurons in the human brain. Each neuron has an association point somewhere in the range of 1,000 to 100,000. In the human brain, data is stored in such a manner as to be distributed, and we can extract more than one piece of this data, when necessary, from our memory in parallel. We can say that the human brain is made up of incredibly amazing parallel processors.
Output Layer:

The artificial neural network takes input and computes the weighted sum of the inputs and includes a bias. This computation is represented in the form of a transfer function.

Artificial neural networks have a numerical value that can perform more than one task simultaneously.

There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is arrived at through experience and trial and error.
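The weighted-sum-plus-bias computation described above can be sketched for a single neuron, using a sigmoid as the transfer function; the input, weight, and bias values are illustrative:

```python
import math

# One artificial neuron: weighted sum of inputs plus a bias, passed
# through a sigmoid transfer (activation) function. Values illustrative.

def neuron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))   # sigmoid transfer function

out = neuron(inputs=[1.0, 0.5], weights=[0.4, -0.2], bias=0.1)
print(round(out, 3))   # weighted sum is 0.4, and sigmoid(0.4) ≈ 0.599
```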
Storing data on the entire network:

Data that is used in traditional programming is stored on the whole network, not in a database. The disappearance of a couple of pieces of data in one place doesn't prevent the network from working.

Capability to work with incomplete knowledge:

After ANN training, the network may produce output even with inadequate data. The loss of performance here depends on the significance of the missing data.

Unrecognized behavior of the network:

This is the most significant issue of ANNs. When an ANN produces a testing solution, it does not provide insight concerning why and how. This decreases trust in the network.

Hardware dependence:

The network is reduced to a specific value of the error, and this value does not give us optimum results.
Artificial neural networks, a branch of science that stepped into the world in the mid-20th century, are developing exponentially. In the present time, we have investigated the pros of artificial neural networks and the issues encountered in the course of their utilization. It should not be overlooked that the cons of ANNs, a flourishing branch of science, are being eliminated one by one, while their pros are increasing day by day. This means that artificial neural networks will progressively turn into an irreplaceable part of our lives.
How do artificial neural networks work?

An Artificial Neural Network can be best represented as a weighted directed graph, where the artificial neurons form the nodes.

Afterward, each of the inputs is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnection between neurons inside the network.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put new data points in the correct category in the future. This best decision boundary is called a hyperplane.

o Linear SVM: Linear SVM is used for linearly separable data; if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

o Non-linear SVM: Non-linear SVM is used for non-linearly separated data; if a dataset cannot be classified by using a straight line, the classifier used is called a Non-linear SVM classifier.

How does SVM work?

Linear SVM:

Since it is a 2-D space, by just using a straight line we can easily separate these two classes. But there can be multiple lines that can separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.

By adding a third dimension, the sample space will become as in the below image:
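The hyperplane and margin ideas above can be sketched in 2-D: points are classified by the sign of w·x + b, and a point's distance to the hyperplane is |w·x + b| / ||w||. The hyperplane and points here are illustrative values, not the result of training an SVM:

```python
import math

# Classification by a fixed 2-D hyperplane w.x + b = 0 and the
# point-to-hyperplane distance used to measure the margin. The
# hyperplane (w, b) and the points are illustrative.

w = (1.0, 1.0)
b = -3.0                      # hyperplane: x1 + x2 - 3 = 0

def classify(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

def distance(x):
    return abs(w[0] * x[0] + w[1] * x[1] + b) / math.hypot(*w)

print(classify((3, 2)))            # → 1   (above the line)
print(classify((0, 1)))            # → -1  (below the line)
print(round(distance((2, 2)), 3))  # |2 + 2 - 3| / sqrt(2) ≈ 0.707
```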