
UNIT 5: Supervised & Unsupervised Learning

Supervised Machine Learning


● Supervised learning is the type of machine learning in which machines are trained using
well "labelled" training data, and on the basis of that data, machines predict the output.
● Labelled data means that the input data is already tagged with the correct output.
● In supervised learning, the training data provided to the machine works as the
supervisor that teaches the machine to predict the output correctly.
● It applies the same concept as a student learning under the supervision of a teacher.
● Supervised learning is a process of providing input data as well as correct output data to
the machine learning model.
● The aim of a supervised learning algorithm is to find a mapping function to map the
input variable (x) to the output variable (y).
● In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.

How Does Supervised Learning Work?


● In supervised learning, models are trained using a labelled dataset, where the model
learns about each type of data. Once the training process is completed, the model is
tested on test data (data held back from training), and then it predicts the
output.
● Example

● Suppose we have a dataset of different types of shapes, which includes squares,
rectangles, triangles, and polygons. Now the first step is that we need to train the model
for each shape.
○ If the given shape has four sides, and all the sides are equal, then it will be
labelled as a square.
○ If the given shape has three sides, then it will be labelled as a triangle.
○ If the given shape has six equal sides, then it will be labelled as a hexagon.
● Now, after training, we test our model using the test set, and the task of the model is to
identify the shape.
● The machine is already trained on all types of shapes, and when it finds a new shape, it
classifies the shape on the basis of its number of sides and predicts the output.

Steps Involved in Supervised Learning:


● First, determine the type of training dataset.
● Collect/gather the labelled training data.
● Split the dataset into a training set, a test set, and a validation set.
● Determine the input features of the training dataset, which should carry enough
information so that the model can accurately predict the output.
● Determine a suitable algorithm for the model, such as a support vector machine,
a decision tree, etc.
● Execute the algorithm on the training dataset. Sometimes we need a validation set to
tune control parameters; it is a subset of the training data held out for tuning.
● Evaluate the accuracy of the model by providing the test set. If the model predicts the
correct output, it means our model is accurate. (These steps are sketched in code below.)
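
As a rough illustration of these steps, the sketch below uses Python with scikit-learn; the Iris dataset, the decision tree algorithm, and the 60/20/20 split are assumptions made for the example, not requirements.

# Minimal sketch of the supervised learning steps above (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: gather a labelled dataset (Iris is used here purely as an example).
X, y = load_iris(return_X_y=True)

# Step 3: split into training, validation, and test sets (60/20/20 assumed).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Steps 4-6: choose a suitable algorithm and execute it on the training set;
# the validation set is used to check the chosen control parameter (max_depth).
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)
print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))

# Step 7: evaluate the final model on the held-out test set.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))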

Types of Supervised Machine Learning Algorithms:


Supervised learning can be further divided into two types of problems:

● Classification: A classification problem is when the output variable is a category, such
as "red" or "blue", "disease" or "no disease".
● Regression: A regression problem is when the output variable is a real value, such as
"dollars" or "weight".
Classification
● The Classification algorithm is a supervised learning technique that is used to identify
the category of new observations on the basis of training data. In classification, a
program learns from the given dataset or observations and then classifies new
observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not
Spam, cat or dog, etc. Classes can be called targets/labels or categories.
● In a classification algorithm, the input variable (x) is mapped to a discrete output
variable (y):
● y = f(x), where y = categorical output
● The best example of an ML classification algorithm is an email spam detector.
● The main goal of a classification algorithm is to identify the category of a given
data point, and these algorithms are mainly used to predict the output for categorical
data.
● Classification can be better understood with a diagram of two classes, class A and class B,
where the points within each class have features that are similar to each other and
dissimilar to those of the other class.

The algorithm which implements classification on a dataset is known as a classifier. There
are two types of classification:
● Binary Classifier: If the classification problem has only two possible outcomes, it is
called a binary classifier (illustrated in the sketch below).
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
● Multi-class Classifier: If a classification problem has more than two outcomes, it is
called a multi-class classifier.
Examples: classification of types of crops, classification of types of music.
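
As a hedged sketch of the binary case, the code below trains a logistic regression classifier on scikit-learn's breast cancer dataset, whose two outcomes (malignant/benign) play the role of "disease" or "no disease"; the dataset and the choice of logistic regression are illustrative assumptions.

# Illustrative binary classification: exactly two possible outcomes (0 or 1).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)           # y holds the two classes: 0 or 1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)              # a linear classification model
clf.fit(X_train, y_train)
print("Predicted classes for 5 test samples:", clf.predict(X_test[:5]))
print("Test accuracy:", clf.score(X_test, y_test))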

Types of Classification Algorithms

Classification algorithms can be mainly divided into two categories:

● Linear Models
○ Logistic Regression
○ Support Vector Machines
● Non-linear Models
○ K-Nearest Neighbors
○ Kernel SVM
○ Naïve Bayes
○ Decision Tree Classification
○ Random Forest Classification

Learners in Classification Problems:


In classification problems, there are two types of learners:
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives
the test dataset. In the lazy learner's case, classification is done on the basis of the most
related data stored in the training dataset. It takes less time in training but more time
for predictions.
Examples: K-NN algorithm, case-based reasoning

2. Eager Learners: Eager learners develop a classification model based on a training
dataset before receiving a test dataset. Opposite to lazy learners, eager learners take
more time in learning and less time in prediction. Examples: decision trees, Naïve Bayes,
ANN. (A small timing sketch contrasting the two follows below.)
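
The difference can be seen by timing fit and predict for one learner of each kind; the sketch below is an assumption-laden illustration using scikit-learn's K-NN (lazy) and decision tree (eager) on synthetic data.

# Contrasting a lazy learner (K-NN) with an eager learner (decision tree).
import time
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

for name, model in [("lazy K-NN", KNeighborsClassifier()),
                    ("eager decision tree", DecisionTreeClassifier())]:
    t0 = time.perf_counter(); model.fit(X, y); fit_time = time.perf_counter() - t0
    t0 = time.perf_counter(); model.predict(X); predict_time = time.perf_counter() - t0
    print(f"{name}: fit {fit_time:.3f}s, predict {predict_time:.3f}s")

# Typically the tree spends more time in fit (building the model), while K-NN
# spends more time in predict (searching the stored data).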

K-nearest Neighbors

● K-Nearest Neighbors is one of the simplest machine learning algorithms, based on the
supervised learning technique.
● The K-NN algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available categories.
● The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a
well-suited category by using the K-NN algorithm.
● The K-NN algorithm can be used for regression as well as for classification, but it is
mostly used for classification problems.
● K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data.
● It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and, at the time of classification, performs an
action on the dataset.
● At the training phase, the KNN algorithm just stores the dataset, and when it gets new
data, it classifies that data into the category that is most similar to the new data.
● Example: Suppose we have an image of a creature that looks similar to both a cat and a
dog, but we want to know whether it is a cat or a dog. For this identification, we can use
the KNN algorithm, as it works on a similarity measure. Our KNN model will find the
features of the new image that are similar to the cat and dog images, and based on the most
similar features it will put it in either the cat or the dog category.

Why do we need a K-NN Algorithm?


● Suppose there are two categories, i.e., category A and category B, and we have a new
data point x1; in which of these categories will this data point lie?
● To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can
easily identify the category or class of a particular data point.
How does K-NN work?
The working of K-NN can be explained on the basis of the following steps:

● Step-1: Select the number K of neighbors.
● Step-2: Calculate the Euclidean distance from the new data point to the existing data points.
● Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
● Step-4: Among these K neighbors, count the number of data points in each category.
● Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.
● Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category.
● Firstly, we will choose the number of neighbors, so we will choose k = 5.
● Next, we will calculate the Euclidean distance between the data points. The Euclidean
distance is the distance between two points, which we have already studied in
geometry. It can be calculated as:
distance = √((x2 − x1)² + (y2 − y1)²)
● By calculating the Euclidean distances, we get the nearest neighbors: three nearest
neighbors in category A and two nearest neighbors in category B.
● As the majority of the nearest neighbors (3 of 5) are from category A, this new data
point must belong to category A. These steps are implemented in the sketch below.
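
The six steps can be written out directly; the sketch below is a from-scratch illustration in Python, where the labelled points, the query point, and the two categories A and B are made-up example values, and k = 5 follows the walkthrough above.

# From-scratch sketch of the K-NN steps (example data is made up).
import math
from collections import Counter

# Labelled training points: (x, y, category)
training_data = [(1.0, 2.0, "A"), (2.0, 3.0, "A"), (3.0, 3.5, "A"),
                 (6.0, 7.0, "B"), (7.0, 8.0, "B"), (8.0, 6.5, "B")]
new_point = (3.5, 4.0)
k = 5                                        # Step 1: choose the number of neighbors

# Step 2: calculate the Euclidean distance to every training point.
def euclidean(p, q):
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

distances = [(euclidean(new_point, (x, y)), label) for x, y, label in training_data]

# Step 3: take the k nearest neighbors.
nearest = sorted(distances)[:k]

# Steps 4-5: count the data points in each category and assign the majority one.
votes = Counter(label for _, label in nearest)
print("Predicted category:", votes.most_common(1)[0][0])   # prints "A" (3 of 5 neighbors)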

Advantages of KNN Algorithm:


● It is simple to implement.
● It is robust to noisy training data.
● It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


● The value of K always needs to be determined, which may be complex at times.
● The computation cost is high because the distance between the new data point and all
the training samples must be calculated.

Applications of Classification Algorithms

● Sentiment Analysis
● Email Spam Classification
● Document Classification
● Image Classification
● Predicting bank customers' willingness to pay loans
● Cancer tumor cell identification
● Drug classification
● Facial key point detection
● Pedestrian detection in autonomous driving

Overview of Regression
● Regression is a supervised learning technique which helps in finding the correlation
between variables and enables us to predict a continuous output variable based on
one or more predictor variables.
● It is mainly used for prediction, forecasting, time series modeling, and determining the
cause-and-effect relationship between variables.
● In regression, we plot a graph between the variables which best fits the given
data points; using this plot, the machine learning model can make predictions about the
data.
● In simple words, "Regression shows a line or curve that passes through all the
data points on the target-predictor graph in such a way that the vertical distance between
the data points and the regression line is minimum." The distance between the data points
and the line tells whether a model has captured a strong relationship or not.
● Some examples of regression are:
● Prediction of rain using temperature and other factors
● Determining market trends
● Prediction of road accidents due to rash driving

● Example:

Suppose there is a marketing company A, which does various advertisements every year
and gets sales from them. The list below shows the advertisements made by the company in
the last 5 years and the corresponding sales:
Now, the company wants to spend $200 on advertisement in the year 2019 and wants
to know the predicted sales for this year. To solve such prediction
problems in machine learning, we need regression analysis.
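
Since the original table is not reproduced here, the sketch below uses made-up advertisement/sales figures purely to show how regression analysis would answer the $200 question; the numbers and the use of scikit-learn's LinearRegression are assumptions.

# Hypothetical advertisement/sales figures (the original table is not shown above),
# used only to illustrate how regression predicts sales for a $200 advertisement.
from sklearn.linear_model import LinearRegression

advertisement = [[90], [120], [150], [100], [130]]   # yearly ad spend (made-up values)
sales = [1000, 1300, 1800, 1200, 1380]               # corresponding sales (made-up values)

model = LinearRegression()
model.fit(advertisement, sales)
print("Predicted sales for a $200 advertisement:", model.predict([[200]])[0])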

Terminologies Related to the Regression Analysis:

● Dependent Variable: The main factor in regression analysis which we want to predict or
understand is called the dependent variable. It is also called the target variable.
● Independent Variable: The factors which affect the dependent variable or which are
used to predict the values of the dependent variable are called independent variables,
also called predictors.

Types of Regression
● Linear Regression
● Logistic Regression
● Polynomial Regression
● Support Vector Regression
● Decision Tree Regression
● Random Forest Regression
● Ridge Regression
● Lasso Regression
Linear Regression:
● Linear regression is a statistical regression method which is used for predictive analysis.
● It is one of the simplest and easiest algorithms; it works on regression and shows
the relationship between continuous variables.
● It is used for solving regression problems in machine learning.
● Linear regression shows the linear relationship between the independent variable (X-
axis) and the dependent variable (Y-axis), hence called linear regression.
● If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
● The relationship between the variables in a linear regression model can be explained with
a simple example: predicting the salary of an employee on the basis of years of
experience.

● Below is the mathematical equation for linear regression:

Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients (a is the slope and b is the intercept).
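
The coefficients a and b can be estimated with ordinary least squares; the sketch below does this with NumPy for the salary vs. years-of-experience example, using made-up illustrative numbers.

# Estimating a (slope) and b (intercept) of Y = aX + b by ordinary least squares.
# Years of experience vs. salary, with made-up illustrative numbers.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])                       # years of experience
Y = np.array([30000.0, 35000.0, 41000.0, 45000.0, 52000.0])   # salary

a = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b = Y.mean() - a * X.mean()

print(f"Y = {a:.1f} * X + {b:.1f}")
print("Predicted salary for 6 years of experience:", a * 6 + b)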

Applications of linear regression:

● Analyzing trends and sales estimates


● Salary forecasting
● Real estate prediction
● Arriving at ETAs in traffic.

Advantages of Supervised Learning

● With the help of supervised learning, the model can predict the output on the basis of
prior experiences.
● In supervised learning, we can have an exact idea about the classes of objects.
● Supervised learning models help us solve various real-world problems, such as fraud
detection, spam filtering, etc.

Disadvantages of supervised learning:

● Supervised learning models are not suitable for handling complex tasks.
● Supervised learning cannot predict the correct output if the test data is different from
the training dataset.
● Training requires a lot of computation time.
● In supervised learning, we need enough knowledge about the classes of objects.
