UNIT IV Learning

The document provides an overview of various forms of machine learning, including supervised, unsupervised, semi-supervised, and reinforcement learning, along with their applications and algorithms. It discusses the advantages and disadvantages of each type, highlighting real-world use cases such as fraud detection, medical diagnosis, and robotics. Additionally, it explains decision trees as a supervised learning technique used for classification and regression tasks.
UNIT IV

Learning: Forms of Learning, Supervised Learning, Machine Learning (Decision Trees), Regression and Classification with Linear Models, Artificial Neural Networks, Support Vector Machines.

Applications of AI: Natural Language Processing, Text Classification and Information Retrieval, Speech Recognition, Image Processing and Computer Vision, Robotics.

Types of Machine Learning

Machine learning is a subset of AI that enables a machine to learn automatically from data, improve its performance from past experience, and make predictions. Machine learning comprises a set of algorithms that work on large amounts of data. Data is fed to these algorithms to train them, and on the basis of that training they build a model and perform a specific task.

These ML algorithms help to solve different business problems such as Regression, Classification, Forecasting, Clustering, and Association.

Based on the methods and way of learning, machine learning is divided into mainly four types:

1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1|Page UNIT IV :- Learning By Kamalakar Hegde


In this topic, we provide a detailed description of the types of Machine Learning along with their respective algorithms:

1. Supervised Machine Learning

As its name suggests, supervised machine learning is based on supervision. In the supervised learning technique, we train the machine using a "labelled" dataset, and based on that training, the machine predicts the output. Here, labelled data means that the training inputs are already mapped to their outputs. More precisely, we first train the machine with inputs and their corresponding outputs, and then we ask the machine to predict the output for a test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we train the machine to understand the images using features such as the shape and size of the tail, the shape of the eyes, colour, and height (dogs are typically taller, cats smaller). After training, we input the picture of a cat and ask the machine to identify the object and predict the output. Because the machine is now trained, it will check the features of the object, such as height, shape, colour, eyes, ears, and tail, and find that it is a cat. So, it will put it in the Cat category. This is how the machine identifies objects in supervised learning.

The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y). Some real-world applications of supervised learning are Risk Assessment, Fraud Detection, Spam Filtering, etc.

Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given below:

o Classification
o Regression

a) Classification

Classification algorithms are used to solve classification problems, in which the output variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc. The classification algorithms predict the categories present in the dataset. Some real-world examples of classification are Spam Detection, Email Filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm
o Decision Tree Algorithm



o Logistic Regression Algorithm
o Support Vector Machine Algorithm

b) Regression

Regression algorithms are used to solve regression problems, in which the output variable is continuous and is modelled as a function of the input variables (a linear function in the simplest case). They are used to predict continuous output variables, such as market trends, weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Advantages and Disadvantages of Supervised Learning

Advantages:

o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

o These algorithms may not be able to solve very complex tasks.
o They may predict the wrong output if the test data differs from the training data.
o Training the algorithm can require a lot of computational time.

Applications of Supervised Learning

Some common applications of Supervised Learning are given below:

o Image Segmentation: Supervised learning algorithms are used in image segmentation. In this process, image classification is performed on different image data with pre-defined labels.
o Medical Diagnosis: Supervised algorithms are also used in the medical field for diagnosis purposes. This is done by using medical images and past data labelled with disease conditions. With such a process, the machine can identify a disease for new patients.
o Fraud Detection: Supervised learning classification algorithms are used for identifying fraudulent transactions, fraudulent customers, etc. This is done by using historic data to identify the patterns that can lead to possible fraud.



o Spam Detection: Classification algorithms are used in spam detection and filtering. These algorithms classify an email as spam or not spam. The spam emails are sent to the spam folder.
o Speech Recognition: Supervised learning algorithms are also used in speech recognition. The algorithm is trained with voice data, and various identifications can be done with it, such as voice-activated passwords, voice commands, etc.

2. Unsupervised Machine Learning

Unsupervised learning differs from the supervised learning technique; as its name suggests, there is no need for supervision. In unsupervised machine learning, the machine is trained using an unlabelled dataset, and the machine predicts the output without any supervision.

In unsupervised learning, the models are trained with data that is neither classified nor labelled, and the model acts on that data without any supervision.

The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.

Let's take an example to understand it more precisely. Suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the objects. So, the machine will discover patterns and differences, such as colour differences and shape differences, and predict the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups in the data. It is a way to group objects into clusters such that the objects with the most similarities remain in one group and have few or no similarities with the objects of other groups. An example of clustering is grouping customers by their purchasing behaviour.

Some popular clustering algorithms are given below:

o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm

The following unsupervised algorithms are often listed alongside them, although they perform dimensionality reduction or signal separation rather than clustering:

o Principal Component Analysis
o Independent Component Analysis
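As a rough illustration of clustering, the sketch below groups a handful of two-dimensional points with plain k-means. The data values, the function name `kmeans`, and the fixed iteration count are assumptions made for this example and do not come from the notes.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups with plain k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Hypothetical customers described by two purchasing-behaviour features.
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 9.0), (8.5, 9.5), (9.0, 8.8)]
labels, centroids = kmeans(customers, k=2)
```

No labels are supplied; the two groups emerge only from the distances between points, which is the sense in which the learning is unsupervised.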



2) Association

Association rule learning is an unsupervised learning technique that finds interesting relations among variables within a large dataset. The main aim of this learning algorithm is to find the dependency of one data item on another data item and map those variables accordingly, so that the relationship can be exploited, for example to increase profit. This technique is mainly applied in Market Basket analysis, Web usage mining, continuous production, etc.

Some popular algorithms for association rule learning are the Apriori algorithm, Eclat, and the FP-growth algorithm.

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

o These algorithms can be used for more complicated tasks than the supervised ones because they work on unlabelled datasets.
o Unsupervised algorithms are preferable for various tasks because obtaining an unlabelled dataset is easier than obtaining a labelled one.

Disadvantages:

o The output of an unsupervised algorithm can be less accurate because the dataset is not labelled and the algorithms are not trained with the exact output beforehand.
o Working with unsupervised learning is more difficult because it works with an unlabelled dataset that is not mapped to any output.

Applications of Unsupervised Learning

o Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright in document network analysis of text data for scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised learning techniques for building recommendation applications for different web applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised learning, which can identify unusual data points within the dataset. It is used to discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition, or SVD, is used to extract particular information from the database, for example, extracting information about each user located at a particular location.
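Returning to association rule learning, the dependency of one item on another is usually quantified with two standard measures that the notes do not define: support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a made-up set of shopping baskets; the function name `rule_stats` and the data are assumptions for illustration only.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions containing every item of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    # Transactions containing the antecedent items.
    n_ante = sum(1 for t in transactions if antecedent <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
support, confidence = rule_stats(baskets, {"bread"}, {"butter"})
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only evaluates a single rule.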

3. Semi-Supervised Learning



Semi-supervised learning is a type of machine learning algorithm that lies between supervised and unsupervised machine learning. It represents the intermediate ground between supervised learning (with labelled training data) and unsupervised learning (with no labelled training data) and uses a combination of labelled and unlabelled datasets during the training period.

Although semi-supervised learning is the middle ground between supervised and unsupervised learning and operates on data that contains a few labels, the data is mostly unlabelled. Labels are costly to obtain, so in practice an organization may have only a few of them. Semi-supervised learning differs from both supervised and unsupervised learning, which are defined by the presence and absence of labels respectively.

To overcome the drawbacks of supervised and unsupervised learning algorithms, the concept of semi-supervised learning was introduced. The main aim of semi-supervised learning is to use all the available data effectively, rather than only labelled data as in supervised learning. Initially, similar data is clustered using an unsupervised learning algorithm, and the clusters then help to label the unlabelled data. This is useful because labelled data is comparatively more expensive to acquire than unlabelled data.

We can picture these algorithms with an example. Supervised learning is where a student is under the supervision of an instructor at home and college. If that student analyses the same concept independently, without any help from the instructor, it comes under unsupervised learning. Under semi-supervised learning, the student revises the concept independently after first analysing it under the guidance of an instructor at college.

Advantages and Disadvantages of Semi-supervised Learning

Advantages:

o The algorithm is simple and easy to understand.
o It is highly efficient.
o It is used to address the drawbacks of supervised and unsupervised learning algorithms.

Disadvantages:

o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy can be low.

4. Reinforcement Learning

Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by trial and error: taking actions, learning from experience, and improving its performance. The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data as in supervised learning, and agents learn from their experiences only.
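The reward-and-punishment loop of reinforcement learning can be sketched with tabular Q-learning, a specific RL algorithm chosen here for illustration and not named in the notes. The environment is a hypothetical five-cell corridor in which the agent starts at the left end and is rewarded for reaching the right end; all parameter values are assumptions.

```python
import random

def train_corridor_agent(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
                         epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor; reaching the right end pays +1."""
    rng = random.Random(seed)
    actions = (-1, +1)                           # index 0 = left, index 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]    # q[state][action_index]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):                     # cap the episode length
            # Trial and error: explore sometimes (and on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + actions[a], 0), goal)
            # Reward for reaching the goal, small punishment for every other step.
            reward = 1.0 if next_state == goal else -0.01
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][a] += alpha * (reward + gamma * best_next - q[state][a])
            state = next_state
            if state == goal:
                break
    return q

q_table = train_corridor_agent()
# Greedy policy for every non-goal state.
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
```

The agent is never told which action is correct; it only receives rewards and small penalties, and the learned table ends up preferring the moves that lead to the goal.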



The reinforcement learning process is similar to how a human being learns; for example, a child learns various things through experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the moves of the agent at each step define the states, and the goal of the agent is to get a high score. The agent receives feedback in terms of punishments and rewards.

Due to its way of working, reinforcement learning is employed in different fields such as Game theory, Operations Research, Information theory, and multi-agent systems.

A reinforcement learning problem can be formalized using a Markov Decision Process (MDP). In an MDP, the agent constantly interacts with the environment and performs actions; after each action, the environment responds and generates a new state.

Categories of Reinforcement Learning

Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning increases the tendency that the required behaviour will occur again by adding something. It enhances the strength of the agent's behaviour and positively impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works in the opposite way to positive RL. It increases the tendency that the specific behaviour will occur again by avoiding the negative condition.

Real-world Use Cases of Reinforcement Learning

o Video Games: RL algorithms are very popular in gaming applications, where they are used to achieve super-human performance. Well-known game-playing programs built with RL are AlphaGo and AlphaGo Zero.
o Resource Management: The "Resource Management with Deep Reinforcement Learning" paper showed how to use RL to automatically learn to allocate and schedule computer resources to waiting jobs in order to minimize average job slowdown.
o Robotics: RL is widely used in robotics applications. Robots are used in industrial and manufacturing settings, and these robots are made more capable with reinforcement learning. Various industries have a vision of building intelligent robots using AI and machine learning technology.
o Text Mining: Text mining, one of the major applications of NLP, is now being implemented with the help of reinforcement learning by the company Salesforce.

Advantages and Disadvantages of Reinforcement Learning

Advantages



o It helps in solving complex real-world problems that are difficult to solve with general techniques.
o The learning model of RL is similar to the learning of human beings; hence it can produce accurate results.
o It helps in achieving long-term results.

Disadvantages

o RL algorithms are not preferred for simple problems.
o RL algorithms require huge amounts of data and computation.
o Too much reinforcement learning can lead to an overload of states, which can weaken the results.
o The curse of dimensionality limits reinforcement learning for real physical systems.

Decision Tree

o A decision tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further branches and constructs a tree-like structure.
o In order to build a tree, we can use the CART algorithm, which stands for Classification and Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o The diagram below explains the general structure of a decision tree:



Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.

Why use Decision Trees?

There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. Below are two reasons for using a decision tree:

o Decision trees usually mimic human thinking while making a decision, so they are easy to understand.
o The logic behind a decision tree can be easily understood because it shows a tree-like structure.

Decision Tree Terminologies

Root Node: The root node is where the decision tree starts. It represents the entire dataset, which then gets divided into two or more homogeneous sets.

Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after reaching a leaf node.

Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.

Branch/Sub Tree: A tree formed by splitting the tree.

Pruning: Pruning is the process of removing unwanted branches from the tree.

Parent/Child node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its child nodes; the root node is therefore a parent node.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows the branch and jumps to the next node.

For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:



o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final node is called a leaf node.
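The five steps above can be sketched as a short recursive function. This is a minimal illustration, assuming categorical attributes and using information gain as the attribute selection measure; the function names and the tiny job-offer dataset are invented for the example and are not taken from the notes.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Impurity of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Step-2: pick the attribute with the highest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Steps 1-5: returns a class label (leaf) or (attribute, {value: subtree})."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]       # leaf node
    attr = best_attribute(rows, labels, attributes)       # Step-2
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {row[attr] for row in rows}:             # Step-3
        keep = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree(                     # Step-5
            [rows[i] for i in keep], [labels[i] for i in keep], remaining
        )
    return (attr, branches)                               # Step-4

def predict(tree, row):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Hypothetical job-offer data, invented for illustration.
rows = [
    {"salary": "high", "distance": "near"},
    {"salary": "high", "distance": "far"},
    {"salary": "low", "distance": "near"},
    {"salary": "low", "distance": "far"},
]
labels = ["accept", "decline", "decline", "decline"]
tree = build_tree(rows, labels, ["salary", "distance"])
```

Calling `predict(tree, {"salary": "high", "distance": "near"})` walks from the root to a leaf in the way the prediction procedure describes. The sketch does no pruning and raises a `KeyError` for attribute values it never saw during training.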

Example: Suppose there is a candidate who has a job offer and wants to decide whether to accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the diagram below:

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve this problem, there is a technique called the Attribute Selection Measure, or ASM. With this measure, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:

o Information Gain
o Gini Index

1. Information Gain:



o Information gain is the measurement of the change in entropy after a dataset is segmented on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the formula below:

$$\text{Information Gain} = \text{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

Here $S_v$ is the subset of $S$ for which the attribute takes the value $v$, so the second term is the weighted average of the entropy of each subset.

Entropy: Entropy is a metric that measures the impurity of a given attribute. It specifies the randomness in the data. Entropy can be calculated as:

$$\text{Entropy}(S) = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no})$$

Where,

o S = the set of samples
o P(yes) = probability of yes
o P(no) = probability of no

2. Gini Index:

o The Gini index is a measure of impurity (or purity) used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o The CART algorithm uses the Gini index to create splits, and it creates only binary splits.
o The Gini index can be calculated using the formula below:

$$\text{Gini Index} = 1 - \sum_{j} P_j^{2}$$

where $P_j$ is the proportion of samples belonging to class $j$.

Pruning: Getting an Optimal Decision Tree

Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal decision tree.

A tree that is too large increases the risk of overfitting, and a small tree may not capture all the important features of the dataset. Therefore, a technique that decreases the size of the learned tree without reducing accuracy is known as pruning. There are mainly two types of tree pruning techniques:

o Cost Complexity Pruning
o Reduced Error Pruning
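To make the entropy, information gain, and Gini index formulas concrete, here is a small worked example. The counts are assumed for illustration: a set S of 14 samples with 9 "yes" and 5 "no", split by some attribute into S_1 (6 yes, 2 no) and S_2 (3 yes, 3 no).

$$\text{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940$$

$$\text{Entropy}(S_1) = -\tfrac{6}{8}\log_2\tfrac{6}{8} - \tfrac{2}{8}\log_2\tfrac{2}{8} \approx 0.811, \qquad \text{Entropy}(S_2) = -\tfrac{3}{6}\log_2\tfrac{3}{6} - \tfrac{3}{6}\log_2\tfrac{3}{6} = 1$$

$$\text{Information Gain} \approx 0.940 - \tfrac{8}{14}(0.811) - \tfrac{6}{14}(1) \approx 0.048$$

$$\text{Gini Index}(S) = 1 - \left(\tfrac{9}{14}\right)^{2} - \left(\tfrac{5}{14}\right)^{2} = \tfrac{90}{196} \approx 0.459$$

A gain this close to zero indicates that the attribute separates the classes poorly, so an attribute with a higher gain would be preferred for the split.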
Advantages of the Decision Tree



o It is simple to understand because it follows the same process that a human follows while making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o It requires less data cleaning than other algorithms.

Disadvantages of the Decision Tree

o A decision tree can contain many layers, which makes it complex.
o It may have an overfitting issue, which can be mitigated using the Random Forest algorithm.
o With more class labels, the computational complexity of the decision tree may increase.

Regression Analysis in Machine Learning

Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to understand how the value of the dependent variable changes with respect to one independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.

We can understand the concept of regression analysis using the example below:

Example: Suppose there is a marketing company A, which runs various advertisements every year and earns sales from them. The list below shows the advertising spend of the company over the last 5 years and the corresponding sales:

Now, the company wants to spend $200 on advertising in the year 2019 and wants to predict the sales for this year. To solve such prediction problems in machine learning, we need regression analysis.

Regression is a supervised learning technique that helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series



modeling, and determining the cause-and-effect relationship between variables.

In regression, we fit a line or curve to the given datapoints on a graph of the variables; using this fit, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes as close as possible to the datapoints on the target-predictor graph, in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether the model has captured a strong relationship or not.

Some examples of regression are:

o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving

Terminologies Related to the Regression Analysis:

o Dependent Variable: The main factor in regression analysis that we want to predict or understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors that affect the dependent variable, or that are used to predict its values, are called independent variables, also called predictors.
o Outliers: An outlier is an observation that has either a very low or a very high value in comparison to the other observed values. An outlier may hamper the result, so it should be avoided.
o Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. If our algorithm does not perform well even with the training dataset, the problem is called underfitting.

Why do we use Regression Analysis?

As mentioned above, regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world where we need future predictions, such as weather conditions, sales, and marketing trends. For such cases we need a technique that can make predictions accurately, and regression analysis, a statistical method used in machine learning and data science, serves this purpose. Below are some other reasons for using regression analysis:

o Regression estimates the relationship between the target and the independent variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.



o By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the other factors.

Types of Regression

There are various types of regression used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all regression methods analyse the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, which are given below:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression

Linear Regression:

o Linear regression is a statistical regression method used for predictive analysis.
o It is one of the simplest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.



o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.
o The relationship between variables in the linear regression model can be explained using the image below. Here we are predicting the salary of an employee on the basis of years of experience.
o Below is the mathematical equation for linear regression:

$$Y = aX + b$$

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.

Some popular applications of linear regression are:

o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic

Logistic Regression:

o Logistic regression is another supervised learning algorithm, which is used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm that works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
o Logistic regression uses the sigmoid function, also called the logistic function, which maps any real-valued input to a value between 0 and 1. This sigmoid function is used to model the data in logistic regression. The function can be represented as:



f(x) = 1 / (1 + e^(-x))

o f(x) = output between 0 and 1
o x = input to the function
o e = base of the natural logarithm

When we provide the input values (data) to the function, it gives the
S-curve as follows:

o It uses the concept of threshold levels: values above the threshold
level are rounded up to 1, and values below the threshold level are
rounded down to 0.

There are three types of logistic regression:

o Binary (0/1, pass/fail)
o Multi (cats, dogs, lions)
o Ordinal (low, medium, high)

Polynomial Regression:

o Polynomial Regression is a type of regression which models a
non-linear dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve
between the value of x and the corresponding conditional values of y.
o Suppose there is a dataset whose datapoints are arranged in a
non-linear fashion. In such a case, linear regression will not fit
those datapoints well. To cover such datapoints, we need
Polynomial regression.
o In Polynomial regression, the original features are transformed
into polynomial features of a given degree and then modeled
using a linear model. This means the datapoints are best fitted
using a polynomial curve.
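Polynomial regression as described in this section can be sketched in a few lines of plain Python. The sketch below is only an illustration under stated assumptions: the helper names (`polynomial_features`, `fit_polynomial`, `predict`), the toy dataset, and the use of the normal equations solved by Gaussian elimination are choices made for this example, not something prescribed by the notes.

```python
def polynomial_features(x, degree):
    """Turn a single input x into the features [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]


def solve_linear_system(a, b):
    """Solve a * coeffs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - s) / m[r][r]
    return coeffs


def fit_polynomial(xs, ys, degree):
    """Least-squares fit: a linear model on the polynomial features."""
    rows = [polynomial_features(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate b0 + b1*x + b2*x^2 + ... for the fitted coefficients."""
    return sum(b * x ** d for d, b in enumerate(coeffs))


# Toy data (an assumption for the example): points on y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
```

The fit is an ordinary linear regression; only the inputs were transformed, which is why the model is still called linear.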



o The equation for polynomial regression is also derived from the linear
regression equation: the linear regression equation Y = b0 + b1x
is transformed into the polynomial regression equation
Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
o Here Y is the predicted/target output, b0, b1, ..., bn are the
regression coefficients, and x is the independent/input variable.
o The model is still linear because it is linear in the coefficients, even
though it contains quadratic and higher-order terms of x.

Note: This is different from Multiple Linear regression in such a way
that in Polynomial regression, a single element has different degrees
instead of multiple variables with the same degree.

Support Vector Regression:

Support Vector Machine is a supervised learning algorithm which can
be used for regression as well as classification problems. If we use
it for regression problems, it is termed Support Vector
Regression.

Support Vector Regression is a regression algorithm which works for
continuous variables. Below are some keywords which are used
in Support Vector Regression:

o Kernel: It is a function used to map lower-dimensional data into
higher-dimensional data.
o Hyperplane: In general SVM, it is a separation line between two
classes, but in SVR, it is a line which helps to predict the continuous
variables and covers most of the datapoints.
o Boundary line: Boundary lines are the two lines on either side of the
hyperplane, which create a margin for the datapoints.
o Support vectors: Support vectors are the datapoints which are
nearest to the hyperplane.

In SVR, we always try to determine a hyperplane with a maximum
margin, so that the maximum number of datapoints is covered in that
margin. The main goal of SVR is to include the maximum number of
datapoints within the boundary lines, and the hyperplane (best-fit
line) must contain a maximum number of datapoints. Consider the
below image:



Here, the blue line is called the hyperplane, and the other two lines are
known as boundary lines.

Decision Tree Regression:

o Decision Tree is a supervised learning algorithm which can be used
for solving both classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision Tree regression builds a tree-like structure in which each
internal node represents the "test" for an attribute, each branch
represents the result of the test, and each leaf node represents the
final decision or result.
o A decision tree is constructed starting from the root node/parent
node (dataset), which splits into left and right child nodes (subsets of
the dataset). These child nodes are further divided into their own child
nodes, and themselves become the parent nodes of those nodes.
Consider the below image:

The above image shows an example of Decision Tree regression. Here,
the model is trying to predict the choice of a person between a sports
car and a luxury car.

Random Forest Regression:

o Random forest is one of the most powerful supervised learning
algorithms, capable of performing regression as well as
classification tasks.



o Random Forest regression is an ensemble learning method which
combines multiple decision trees and predicts the final output based
on the average of the outputs of the trees. The combined decision trees
are called base models, and the ensemble can be represented more formally as:

g(x) = f0(x) + f1(x) + f2(x) + ....

o Random forest uses the Bagging (Bootstrap Aggregation) technique
of ensemble learning, in which the aggregated decision trees run in
parallel and do not interact with each other.
o With the help of Random Forest regression, we can reduce
overfitting in the model by training each tree on a random subset of
the dataset.
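The bagging idea can be illustrated with a deliberately small sketch. It assumes one-split regression trees ("stumps") on a single feature as the base models and a made-up step-shaped dataset; the function names are hypothetical. A real random forest grows deeper trees and also samples features at each split, which this sketch leaves out.

```python
import random


def fit_stump(points):
    """Fit a one-split regression tree by minimizing the squared error."""
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i][0] == points[i - 1][0]:
            continue  # no split possible between equal x values
        threshold = (points[i - 1][0] + points[i][0]) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every x in the sample was identical
        mean = sum(y for _, y in points) / len(points)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagged_stumps(points, n_trees, seed=0):
    """Train each base model on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]
        trees.append(fit_stump(sample))
    return trees


def predict(trees, x):
    """Final output: the average of the individual tree outputs."""
    return sum(tree(x) for tree in trees) / len(trees)


# Toy data (an assumption for the example): a step from 1.0 to 9.0 at x = 5.
data = [(x, 1.0 if x < 5 else 9.0) for x in range(10)]
trees = fit_bagged_stumps(data, n_trees=25)
```

Each stump is trained independently of the others, so the loop in `fit_bagged_stumps` could run in parallel without changing the result.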

Ridge Regression:

o Ridge regression is one of the most robust versions of linear


regression in which a small amount of bias is introduced so that we
can get better long term predictions.
o The amount of bias added to the model is known as Ridge
Regression penalty. We can compute this penalty term by



multiplying lambda by the squared weight of each individual
feature.
The equation for ridge regression will be:

minimize Σi (yi - ŷi)^2 + λ Σj wj^2

where ŷi is the model's prediction for sample i, wj are the weights,
and λ controls the strength of the penalty.

o A general linear or polynomial regression will fail if there is high
collinearity between the independent variables, so to solve such
problems, Ridge regression can be used.
o Ridge regression is a regularization technique, which is used to
reduce the complexity of the model. It is also called L2
regularization.
o It helps to solve the problems if we have more parameters than
samples.

Lasso Regression:

o Lasso regression is another regularization technique to reduce the
complexity of the model.
o It is similar to Ridge Regression except that the penalty term contains
only the absolute weights instead of the squares of the weights.
o Since it takes absolute values, it can shrink the slope to 0,
whereas Ridge Regression can only shrink it near to 0.
o It is also called L1 regularization. The equation for Lasso
regression will be:

minimize Σi (yi - ŷi)^2 + λ Σj |wj|

Classification Algorithm in Machine Learning

As we know, Supervised Machine Learning algorithms can be
broadly classified into Regression and Classification algorithms. With
Regression algorithms, we predict the output for continuous
values, but to predict categorical values, we need Classification
algorithms.

What is the Classification Algorithm?

The Classification algorithm is a Supervised Learning technique that is
used to identify the category of new observations on the basis of
training data. In Classification, a program learns from the given dataset
or observations and then classifies new observations into a number of
classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat
or dog, etc. Classes can also be called targets, labels, or categories.

Unlike regression, the output variable of Classification is a category,
not a value, such as "Green or Blue", "fruit or animal", etc. Since the
Classification algorithm is a Supervised learning technique, it
takes labeled input data, which means it contains input with the
corresponding output.

In a classification algorithm, a discrete output function (y) is mapped to
the input variable (x).



1. y = f(x), where y = categorical output

The best example of an ML classification algorithm is an Email Spam
Detector.

The main goal of the Classification algorithm is to identify the category
of a given dataset, and these algorithms are mainly used to predict the
output for categorical data.

Classification algorithms can be better understood using the below
diagram. In the below diagram, there are two classes, Class A and Class
B. The members of each class have features that are similar to each
other and dissimilar to those of the other class.

The algorithm which implements the classification on a dataset is
known as a classifier. There are two types of Classifications:

o Binary Classifier: If the classification problem has only two possible
outcomes, then it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or
DOG, etc.
o Multi-class Classifier: If a classification problem has more than two
outcomes, then it is called a Multi-class Classifier.
Example: Classification of types of crops, Classification of types of
music.

Learners in Classification Problems:
In classification problems, there are two types of learners:

1. Lazy Learners: A Lazy Learner first stores the training dataset and waits
until it receives the test dataset. In the Lazy Learner case, classification is
done on the basis of the most related data stored in the training
dataset. It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager Learners develop a classification model based
on a training dataset before receiving a test dataset. Opposite to Lazy
Learners, an Eager Learner takes more time in learning and less time in
prediction. Example: Decision Trees, Naïve Bayes, ANN.

Types of ML Classification Algorithms:



Classification algorithms can be further divided into mainly two
categories:

o Linear Models
    o Logistic Regression
    o Support Vector Machines
o Non-linear Models
    o K-Nearest Neighbours
    o Kernel SVM
    o Naïve Bayes
    o Decision Tree Classification
    o Random Forest Classification

Note: We will learn the above algorithms in later chapters.

Evaluating a Classification model:

Once our model is completed, it is necessary to evaluate its
performance, whether it is a Classification or a Regression model. For
evaluating a Classification model, we have the following ways:

1. Log Loss or Cross-Entropy Loss:

o It is used for evaluating the performance of a classifier whose output
is a probability value between 0 and 1.
o For a good binary Classification model, the value of log loss should
be near 0.
o The value of log loss increases if the predicted value deviates from
the actual value.
o A lower log loss indicates a more accurate model.
o For Binary classification, cross-entropy can be calculated as:

1. -(y log(p) + (1 - y) log(1 - p))

Where y = actual output, p = predicted output.

2. Confusion Matrix:

o The confusion matrix provides us a matrix/table as output and
describes the performance of the model.
o It is also known as the error matrix.
o The matrix consists of the prediction results in a summarized form,
with the total number of correct predictions and incorrect predictions.
The matrix looks like the below table:

                      Actual Positive    Actual Negative
Predicted Positive    True Positive      False Positive
Predicted Negative    False Negative     True Negative
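Both evaluation measures can be computed by hand on a small example. The sketch below is illustrative only: the function names, the 0.5 decision threshold, the clipping constant `eps`, and the five sample predictions are assumptions made for the example. It averages the per-sample cross-entropy -(y log(p) + (1 - y) log(1 - p)) and counts the four cells of the confusion matrix.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def confusion_matrix(y_true, p_pred, threshold=0.5):
    """Count TP, FP, FN, TN after thresholding the predicted probabilities."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, p in zip(y_true, p_pred):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1
        elif predicted == 0 and y == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Made-up labels and predicted probabilities for five samples.
y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.8, 0.6]

loss = log_loss(y_true, p_pred)
matrix = confusion_matrix(y_true, p_pred)
```

With these inputs the thresholded predictions are 1, 0, 0, 1, 1, so the matrix holds two true positives and one each of the other three cells.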



3. AUC-ROC curve:

o ROC curve stands for Receiver Operating Characteristics Curve and
AUC stands for Area Under the Curve.
o It is a graph that shows the performance of the classification model
at different thresholds.
o To visualize the performance of the multi-class classification model,
we use the AUC-ROC Curve.
o The ROC curve is plotted with TPR and FPR, where TPR (True Positive
Rate) is on the Y-axis and FPR (False Positive Rate) is on the X-axis.

Use cases of Classification Algorithms

Classification algorithms can be used in different places. Below are
some popular use cases of Classification Algorithms:

o Email Spam Detection
o Speech Recognition
o Identification of cancer tumor cells
o Drugs Classification
o Biometric Identification, etc.

Artificial Neural Network

The term "Artificial neural network" refers to a biologically inspired
sub-field of artificial intelligence modeled after the brain. An Artificial
neural network is usually a computational network based on the biological
neural networks that make up the structure of the human brain.
Just as a human brain has neurons interconnected to each other,
artificial neural networks also have neurons that are linked to each
other in various layers of the networks. These neurons are known as
nodes.

Artificial neural network tutorial covers all the aspects related to the
artificial neural network. In this tutorial, we will discuss ANNs, Adaptive
resonance theory, Kohonen self-organizing map, Building blocks,
unsupervised learning, Genetic algorithm, etc.

What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from the biological
neural networks that make up the structure of a human brain. Similar
to the human brain, which has neurons interconnected to one another,
artificial neural networks also have neurons that are interconnected to
one another in various layers of the networks. These neurons are
known as nodes.



The given figure illustrates the typical diagram of a Biological
Neural Network.

The typical Artificial Neural Network looks something like the
given figure.

Dendrites from the Biological Neural Network represent inputs in Artificial
Neural Networks, the cell nucleus represents Nodes, synapses represent
Weights, and the Axon represents Output.

Relationship between Biological neural network and artificial neural
network:

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

In the field of artificial intelligence, an Artificial Neural Network
attempts to mimic the network of neurons that makes
up a human brain, so that computers can understand
things and make decisions in a human-like manner. The artificial neural
network is designed by programming computers to behave like
interconnected brain cells.

There are roughly 86 billion neurons in the human brain. Each
neuron has somewhere between 1,000 and 100,000 connection
points. In the human brain, data is stored in a distributed manner,
and we can retrieve more than one piece of this data from our memory
in parallel when necessary. We can say that the human
brain is made up of incredibly powerful parallel processors.



We can understand the artificial neural network with an example.
Consider a digital logic gate that takes an input and
gives an output, such as an "OR" gate, which takes two inputs. If one or
both of the inputs are "On," then we get "On" in output. If both the inputs are
"Off," then we get "Off" in output. Here the output depends only upon the
input. Our brain does not work the same way. The relationship between
outputs and inputs keeps changing because the neurons in our brain
are "learning."

The architecture of an artificial neural network:

To understand the architecture of an artificial neural
network, we have to understand what a neural network consists of. A
neural network consists of a large number of
artificial neurons, termed units, arranged in a sequence of
layers. Let us look at the various types of layers available in an artificial
neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats
provided by the programmer.

Hidden Layer:

The hidden layer is present in between the input and output layers. It
performs all the calculations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden
layer, which finally results in output that is conveyed using this layer.

The artificial neural network takes input and computes the weighted
sum of the inputs and includes a bias. This computation is represented
in the form of a transfer function.
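As a minimal sketch of the weighted sum plus bias, the snippet below wires a single artificial neuron so that it behaves like the "OR" gate from the example. The weights, the bias, and the threshold of 0 are hand-picked assumptions for the illustration; a real network would learn such values from data.

```python
def neuron(inputs, weights, bias, threshold=0.0):
    """Compute the weighted sum of the inputs plus a bias, then fire (1) or not (0)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > threshold else 0


# Hand-picked (not learned) parameters that reproduce the OR truth table.
OR_WEIGHTS = [1.0, 1.0]
OR_BIAS = -0.5


def or_gate(a, b):
    """OR gate built from one neuron: 'On' if at least one input is 'On'."""
    return neuron([a, b], OR_WEIGHTS, OR_BIAS)


truth_table = {(a, b): or_gate(a, b) for a in (0, 1) for b in (0, 1)}
```

Changing only the bias to -1.5 would turn the same neuron into an AND gate, which hints at how adjusting weights and biases changes what a network computes.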



The weighted total is passed as an input to an activation
function to produce the output. Activation functions choose whether
a node should fire or not. Only the nodes that fire make it to the
output layer. There are distinctive activation functions available that
can be applied depending on the sort of task we are performing.

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks have the numerical strength to perform
more than one task simultaneously.

Storing data on the entire network:

Unlike in traditional programming, where data is stored in a database,
the information in an ANN is stored on the whole
network. The disappearance of a couple of pieces
of data in one place doesn't prevent the network from working.

Capability to work with incomplete knowledge:

After ANN training, the network may produce output even with
inadequate data. The loss of performance here depends upon the
significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to determine the
examples and to teach the network according to the desired
output by demonstrating these examples to the network. The
success of the network is directly proportional to the chosen
instances, and if the event can't be shown to the network in all its aspects,
it can produce false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from
generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of
artificial neural networks. The appropriate network structure is
found through experience and trial and error.

Unrecognized behavior of the network:

It is the most significant issue of ANN. When an ANN produces a
solution, it does not provide insight concerning why and how. This
decreases trust in the network.

Hardware dependence:



Artificial neural networks need processors with parallel processing
power, as per their structure. Therefore, their realization depends on
the available equipment.

Difficulty of showing the issue to the network:

ANNs can work only with numerical data. Problems must be converted into
numerical values before being introduced to an ANN. The presentation
mechanism chosen here will directly impact the performance
of the network. It relies on the user's abilities.

The duration of the network is unknown:

Training stops when the error of the network is reduced to a specific
value, and this value does not guarantee optimum results.

The science of artificial neural networks, which stepped into the world in
the mid-20th century, is developing rapidly. At present, we
have investigated the pros of artificial neural networks and the issues
encountered in the course of their utilization. It should not be overlooked
that the cons of ANNs, which are a flourishing branch of science,
are being eliminated one by one, and their pros are increasing day by day.
This means that artificial neural networks will progressively become an
irreplaceable and important part of our lives.

How do artificial neural networks work?

An Artificial Neural Network can be best represented as a weighted
directed graph, where the artificial neurons form the nodes. The
association between the neuron outputs and neuron inputs can be
viewed as the directed edges with weights. The Artificial Neural
Network receives the input signal from the external source in the form
of a pattern or image, represented as a vector. These inputs are then
mathematically denoted by x(n) for every n number of
inputs.

Afterward, each of the inputs is multiplied by its corresponding weight
(these weights are the details utilized by the artificial neural network
to solve a specific problem). In general terms, these weights
represent the strength of the interconnection between neurons inside



the artificial neural network. All the weighted inputs are summed
inside the computing unit.

If the weighted sum is equal to zero, then a bias is added to make the
output non-zero or otherwise to scale up the system's
response. The bias has a constant input, and its weight is equal to 1. Here
the total of the weighted inputs can be in the range of 0 to positive infinity.
To keep the response within the limits of the desired value, a certain
maximum value is benchmarked, and the total of the weighted inputs is
passed through the activation function.

The activation function refers to the set of transfer functions used to
achieve the desired output. There are different kinds of activation
functions, but they are primarily either linear or non-linear sets of functions.
Some of the commonly used sets of activation functions are the Binary,
linear, and Tan hyperbolic sigmoidal activation functions. Let us take a
look at each of them in detail:

Binary:

In the binary activation function, the output is either 1 or 0. To
accomplish this, a threshold value is set up. If the net weighted
input of the neuron is more than the threshold (here 1), then the final
output of the activation function is returned as 1, or else the output is
returned as 0.

Sigmoidal Hyperbolic:

The Sigmoidal Hyperbola function is generally seen as an "S" shaped
curve. Here an S-shaped function is used to approximate the output
from the actual net input. The function is defined as:

F(x) = 1 / (1 + exp(-σx))

Where σ is considered the steepness parameter.

Types of Artificial Neural Network:

There are various types of Artificial Neural Networks (ANN). Depending
upon the human brain neuron and network functions they are modeled on,
artificial neural networks perform tasks in a similar way. The majority of
artificial neural networks have some similarities with their more complex
biological counterpart and are very effective at their intended tasks, for
example, segmentation or classification.

Feedback ANN:

In this type of ANN, the output returns into the network to accomplish
the best-evolved results internally. According to the University of
Massachusetts Lowell Centre for Atmospheric Research, feedback
networks feed information back into themselves and are well suited
to solve optimization issues. Internal system error corrections
utilize feedback ANNs.

Feed-Forward ANN:

A feed-forward network is a basic neural network comprising an
input layer, an output layer, and at least one layer of neurons.
Through assessment of its output by reviewing its input, the intensity
of the network can be noticed based on the group behavior of the
associated neurons, and the output is decided. The primary
advantage of this network is that it figures out how to evaluate and
recognize input patterns.
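A forward pass through a tiny feed-forward network can be sketched as follows. Everything specific here is an assumption for illustration: the layer sizes, the weights and biases (chosen by hand, not learned), and the use of the logistic sigmoid with a `steepness` argument as the activation in every layer.

```python
import math


def sigmoid(x, steepness=1.0):
    """S-shaped activation: 1 / (1 + exp(-steepness * x))."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def binary_step(x, threshold=0.0):
    """Binary activation: 1 above the threshold, otherwise 0."""
    return 1 if x > threshold else 0


def layer_forward(inputs, weights, biases, activation):
    """Each neuron: weighted sum of the inputs plus its bias, then the activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the values through the layers in order; nothing is fed back."""
    values = inputs
    for weights, biases in layers:
        values = layer_forward(values, weights, biases, sigmoid)
    return values


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
output = feed_forward([1.0, 1.0], layers)
```

A larger `steepness` makes the sigmoid approach the binary step function, while `math.tanh` gives a similar S-shaped curve with outputs between -1 and 1.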

Support Vector Machine Algorithm


Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as
Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that
we can easily put the new data point in the correct category in the
future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the
hyperplane. These extreme cases are called support vectors, and
hence the algorithm is termed Support Vector Machine. Consider the
below diagram in which there are two different categories that are
classified using a decision boundary or hyperplane:

Example: SVM can be understood with the example that we have used
in the KNN classifier. Suppose we see a strange cat that also has some
features of dogs. If we want a model that can accurately identify
whether it is a cat or a dog, such a model can be created by using the
SVM algorithm. We will first train our model with lots of images of cats
and dogs so that it can learn about the different features of cats and dogs,
and then we test it with this strange creature. Since the support vector
machine creates a decision boundary between these two sets of data (cat
and dog) and chooses the extreme cases (support vectors), it will see the
extreme cases of cat and dog. On the basis of the support vectors, it will
classify the creature as a cat. Consider the below diagram:




SVM algorithm can be used for Face detection, image classification,
text categorization, etc.

Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which
means if a dataset can be classified into two classes by using a single
straight line, then such data is termed linearly separable data, and
the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separated
data, which means if a dataset cannot be classified by using a straight
line, then such data is termed non-linear data, and the classifier used is
called a Non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries to
segregate the classes in n-dimensional space, but we need to find out
the best decision boundary that helps to classify the data points. This
best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the features present in
the dataset, which means if there are 2 features (as shown in the image),
then the hyperplane will be a straight line. And if there are 3 features, then
the hyperplane will be a 2-dimensional plane.

We always create a hyperplane that has a maximum margin, which
means the maximum distance between the hyperplane and the nearest
data points of each class.

Support Vectors:

The data points or vectors that are the closest to the hyperplane and
which affect the position of the hyperplane are termed Support
Vectors. Since these vectors support the hyperplane, they are called
Support vectors.

How does SVM work?

Linear SVM:



The working of the SVM algorithm can be understood by using an
example. Suppose we have a dataset that has two tags (green and
blue), and the dataset has two features x1 and x2. We want a classifier
that can classify the pair (x1, x2) of coordinates as either green or blue.
Consider the below image:

Since it is a 2-d space, we can easily separate these two classes
by just using a straight line. But there can be multiple lines that can
separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision
boundary; this best boundary or region is called a hyperplane. The SVM
algorithm finds the points from both classes that are closest to the line.
These points are called support vectors. The distance between the
vectors and the hyperplane is called the margin. The goal of SVM
is to maximize this margin. The hyperplane with maximum margin is
called the optimal hyperplane.
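The notions of hyperplane, support vectors, and margin can be made concrete with a small sketch. Note the assumptions: the hyperplane w·x + b = 0 below is given by hand rather than learned (a real SVM finds it by solving an optimization problem), and the green and blue points are invented for the example.

```python
import math


def decision_value(w, b, point):
    """Signed value w.x + b; its sign tells which side of the hyperplane we are on."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b


def classify(w, b, point):
    """Assign the tag by the side of the hyperplane."""
    return "blue" if decision_value(w, b, point) >= 0 else "green"


def distance_to_hyperplane(w, b, point):
    """Perpendicular distance |w.x + b| / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, point)) / norm


def closest_points(w, b, points, tol=1e-9):
    """Points at the minimum distance: the support vectors for this hyperplane."""
    distances = [distance_to_hyperplane(w, b, p) for p in points]
    smallest = min(distances)
    return [p for p, d in zip(points, distances) if d - smallest <= tol]


# Hypothetical 2-d data and a hand-chosen separating line x1 + x2 - 5 = 0.
green = [(1, 1), (2, 2), (1, 2)]
blue = [(3, 3), (4, 3), (4, 4)]
w, b = (1.0, 1.0), -5.0

support = closest_points(w, b, green + blue)
margin = distance_to_hyperplane(w, b, support[0])
```

For this data the closest points are (2, 2) and (3, 3), one from each class, and they sit at equal distance from the line, which is the situation a maximum-margin hyperplane produces.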



Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight
line, but for non-linear data, we cannot draw a single straight line.
Consider the below image:

So to separate these data points, we need to add one more dimension.
For linear data, we have used two dimensions x and y, so for non-linear
data, we will add a third dimension z. It can be calculated as:

z = x^2 + y^2

By adding the third dimension, the sample space will become as in the below
image:



So now, SVM will divide the datasets into classes in the following way.
Consider the below image:

Since we are in 3-d space, the boundary looks like a plane parallel to
the x-axis. If we convert it to 2-d space with z = 1, then it will become as:




Hence we get a circular boundary of radius 1 in the case of non-linear data.
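The effect of the extra dimension z = x^2 + y^2 can be checked numerically. In the sketch below the two groups of points are invented for the example, and the names `lift` and `classify` are hypothetical; the plane z = 1 in the lifted space corresponds to the circle of radius 1 in the original 2-d space.

```python
def lift(point):
    """Map a 2-d point (x, y) to 3-d by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


def classify(point, z_threshold=1.0):
    """Separate with the plane z = z_threshold in the lifted space."""
    return "inner" if lift(point)[2] < z_threshold else "outer"


# Made-up data: one class near the origin, the other surrounding it.
# No single straight line in the x-y plane separates these two groups.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -1.5), (-1.2, 1.6)]

inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]
```

Every inner point ends up with z below 1 and every outer point with z above 1, so a flat plane is enough to separate them after the mapping.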

