Jntuk ML RECORD Full

Experiment-1
Aim: To Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV
file.

Description:
Introduction :

The Find-S algorithm is a basic concept learning algorithm in machine learning. It finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples. Find-S starts with the most specific hypothesis and generalizes it each time it fails to cover an observed positive training example. Hence, the Find-S algorithm moves from the most specific hypothesis towards the most general hypothesis.

Important Representation :

1. ? indicates that any value is acceptable for the attribute.
2. A single specific value (e.g., Cold) indicates that only that value is acceptable for the attribute.
3. Φ indicates that no value is acceptable.
4. The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
5. The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}

Steps Involved In Find-S :

1. Start with the most specific hypothesis:
   h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
2. Take the next example; if it is negative, make no changes to the hypothesis.
3. If the example is positive and the current hypothesis is too specific, generalize the hypothesis just enough to cover it.
4. Keep repeating the above steps until all the training examples have been processed.
5. After all training examples have been processed, the final hypothesis can be used to classify new examples.

Algorithm:

1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
       For each attribute constraint aᵢ in h:
           If the constraint aᵢ is satisfied by x,
               then do nothing;
           else replace aᵢ in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.
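The following is a minimal Python sketch of FIND-S (an illustration, not the recorded program), assuming a CSV file named enjoysport.csv whose last column is the class label ("yes"/"no"); the file name and label values are assumptions.

import csv

def find_s(csv_path):
    """Return the most specific hypothesis consistent with the positive examples."""
    with open(csv_path) as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]

    hypothesis = None
    for *attributes, label in data:
        if label.strip().lower() != "yes":      # ignore negative examples
            continue
        if hypothesis is None:                   # first positive example
            hypothesis = list(attributes)
        else:                                    # generalize only where values differ
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

if __name__ == "__main__":
    print("Most specific hypothesis:", find_s("enjoysport.csv"))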


PROGRAM:

OUTPUT:


Experiment-2
Aim: For a given set of training data examples stored in a .CSV file, to implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.

Description:
The candidate elimination algorithm incrementally builds the version space given a
hypothesis space H and a set E of examples. The examples are added one by one; each
example possibly shrinks the version space by removing the hypotheses that are inconsistent
with the example. The candidate elimination algorithm does this by updating the general and
specific boundary for each new example.
• This can be considered an extended form of the Find-S algorithm.
• It considers both positive and negative examples.
• Positive examples are handled as in the Find-S algorithm: they are used to generalize the specific boundary.
• Negative examples are used to make the general boundary more specific.

Terms Used:
• Concept learning: The learning task of the machine, i.e. learning a concept from training data.
• General Hypothesis: A hypothesis that does not constrain the feature values; G = {'?', '?', '?', '?', …}, with one '?' per attribute.
• Specific Hypothesis: A hypothesis that constrains the feature values; S = {'pi', 'pi', 'pi', …}, where the number of entries depends on the number of attributes.
• Version Space: The set of hypotheses lying between the general and specific boundaries. It contains not just one hypothesis but all hypotheses consistent with the training data set.

Algorithm:

Step 1: Load the data set.
Step 2: Initialize the General Hypothesis and the Specific Hypothesis.
Step 3: For each training example:
Step 4: If the example is positive:
            if attribute_value == hypothesis_value:
                do nothing
            else:
                replace the attribute value with '?' (i.e. generalize it)
Step 5: If the example is negative:
            make the general hypothesis more specific.
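A minimal Python sketch of this procedure (an illustration, not the recorded program), again assuming an enjoysport.csv file whose last column is "yes"/"no"; the specializations of the general boundary are guided by S, as is common in simplified lab implementations.

import csv

def candidate_elimination(csv_path):
    with open(csv_path) as f:
        data = list(csv.reader(f))[1:]            # skip header row

    n = len(data[0]) - 1
    S = ["0"] * n                                  # most specific boundary
    G = [["?"] * n]                                # most general boundary (list of hypotheses)

    for *x, label in data:
        if label.strip().lower() == "yes":         # positive example: generalize S
            S = [xi if s in ("0", xi) else "?" for s, xi in zip(S, x)]
            # drop general hypotheses that no longer cover the positive example
            G = [g for g in G if all(gi in ("?", xi) for gi, xi in zip(g, x))]
        else:                                      # negative example: specialize G
            new_G = []
            for g in G:
                for i in range(n):
                    if g[i] == "?" and S[i] not in ("?", "0") and S[i] != x[i]:
                        h = list(g)
                        h[i] = S[i]                # minimal specialization consistent with S
                        new_G.append(h)
            G = new_G or G                         # keep G unchanged if no specialization is possible yet
    return S, G

if __name__ == "__main__":
    S, G = candidate_elimination("enjoysport.csv")
    print("Specific boundary S:", S)
    print("General boundary G:", G)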


PROGRAM:


OUTPUT:


Experiment-3
Aim: To Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample.

Description:
Decision Trees:

In simple words, a decision tree is a structure that contains nodes (rectangular boxes) and edges (arrows) and is built from a dataset (a table whose columns represent features/attributes and whose rows correspond to records). Each node is either used to make a decision (known as a decision node) or represents an outcome (known as a leaf node).

Decision tree example:
[Figure: a decision tree used to classify whether a person is Fit or Unfit.]

The decision nodes here are questions like 'Is the person less than 30 years of age?', 'Does the person eat junk food?', etc., and the leaves are one of the two possible outcomes, viz. Fit and Unfit.
Looking at the decision tree, we can make the following decisions: if a person is less than 30 years of age and doesn't eat junk food, then he is Fit; if a person is less than 30 years of age and eats junk food, then he is Unfit; and so on. The initial node is called the root node (coloured blue in the figure), the final nodes are called the leaf nodes (coloured green), and the rest of the nodes are called intermediate or internal nodes.


The root and intermediate nodes represent the decisions while the leaf nodes represent the
outcomes.

ID3

ID3 stands for Iterative Dichotomiser 3 and is named so because the algorithm iteratively (repeatedly) dichotomizes (divides) the features into two or more groups at each step. Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In simple words, the top-down approach means that we start building the tree from the top, and the greedy approach means that at each iteration we select the best feature at the present moment to create a node. ID3 is generally used only for classification problems with nominal features.

Metrics in ID3
As mentioned previously, the ID3 algorithm selects the best feature at each step while
building a Decision tree. Before you ask, the answer to the question: ‘How does ID3 select
the best feature?’ is that ID3 uses Information Gain or just Gain to find the best feature.
Information Gain calculates the reduction in the entropy and measures how well a given
feature separates or classifies the target classes. The feature with the highest Information
Gain is selected as the best one. In simple words, Entropy is the measure of disorder and the
Entropy of a dataset is the measure of disorder in the target feature of the dataset.
In the case of binary classification (where the target column has only two classes), entropy is 0 if all values in the target column are homogeneous (the same class) and 1 if the target column has an equal number of values for both classes.
Denoting our dataset as S, entropy is calculated as:

Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ; i = 1 to n

where
n is the total number of classes in the target column (in our case n = 2, i.e. YES and NO), and
pᵢ is the probability of class i, i.e. the ratio of the "number of rows with class i in the target column" to the "total number of rows" in the dataset.

Information Gain for a feature column A is calculated as:


IG(S, A) = Entropy(S) - ∑((|Sᵥ| / |S|) * Entropy(Sᵥ))
where Sᵥ is the set of rows in S for which the feature column A has value v, |Sᵥ| is the number
of rows in Sᵥ and likewise |S| is the number of rows in S.
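As an illustration of these formulas (not the recorded program), the following Python sketch computes entropy and information gain with pandas; the toy 'play tennis' style data and its column names are hypothetical.

import numpy as np
import pandas as pd

def entropy(target: pd.Series) -> float:
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes in the target column."""
    p = target.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(df: pd.DataFrame, feature: str, target: str) -> float:
    """IG(S, A) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v)) over values v of feature A."""
    total = entropy(df[target])
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(feature)
    )
    return total - weighted

if __name__ == "__main__":
    df = pd.DataFrame({
        "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
        "Humidity": ["High", "High", "High", "Normal", "Normal", "Normal"],
        "Play": ["No", "No", "Yes", "Yes", "No", "Yes"],
    })
    # ID3 would create the next decision node from the feature with the highest gain.
    for col in ["Outlook", "Humidity"]:
        print(col, "IG =", round(information_gain(df, col, "Play"), 3))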


PROGRAM:


OUTPUT:


EXPERIMENT-4
Aim: Exercises to solve the real-world problems using the following machine learning
methods: a) Linear Regression b) Logistic Regression

Description:

LINEAR REGRESSION: Linear regression is one of the easiest and most popular Machine
Learning algorithms. It is a statistical method that is used for predictive analysis. Linear
regression makes predictions for continuous/real or numeric variables such as sales, salary,
age, product price, etc.

The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since linear regression models a linear relationship, it finds how the value of the dependent variable changes with the value of the independent variable.

The linear regression model provides a sloped straight line representing the relationship between the variables.

Mathematically, we can represent a linear regression as:

y = a₀ + a₁x + ε

where a₀ is the intercept, a₁ is the slope (regression coefficient), and ε is the random error.
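A minimal scikit-learn sketch of linear regression (not the recorded program), using a small hypothetical experience-vs-salary dataset:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data: years of experience vs. salary.
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([30000, 35000, 40000, 47000, 52000, 60000, 66000, 73000])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)        # learns a0 (intercept) and a1 (slope)
y_pred = model.predict(X_test)

print("intercept a0:", model.intercept_, "slope a1:", model.coef_[0])
print("MSE:", mean_squared_error(y_test, y_pred), "R2:", r2_score(y_test, y_pred))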

LOGISTIC REGRESSION: Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised Learning technique. It is used for predicting the
categorical dependent variable using a given set of independent variables.

o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.


o Logistic Regression is similar to Linear Regression except in how it is used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
o In Logistic Regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic (sigmoid) function, whose output is bounded by the two extreme values (0 and 1).
o The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete datasets.
o Logistic Regression can be used to classify observations using different types of data and can easily determine the most effective variables for the classification.
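A minimal scikit-learn sketch of logistic regression (not the recorded program), using the built-in breast-cancer dataset as an example of binary classification:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=5000)     # sigmoid output, thresholded at 0.5 for class labels
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("Predicted probabilities (first 3):\n", clf.predict_proba(X_test[:3]))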

PROGRAM:
Linear Regression:


Logistic Regression:


EXPERIMENT-5

Aim: Develop a program for Bias, Variance, Remove duplicates, Cross Validation
Description: The bias error is an error from erroneous assumptions in the learning algorithm.
High bias can cause an algorithm to miss the relevant relations between features and target
outputs (underfitting). The variance is an error from sensitivity to small fluctuations in the
training set. High variance may result from an algorithm modelling the random noise in the
training data (overfitting). The bias–variance tradeoff is a central problem in supervised
learning. Ideally, one wants to choose a model that both accurately captures the regularities in
its training data, but also generalizes well to unseen data. Unfortunately, it is typically
impossible to do both simultaneously. High-variance learning methods may be able to represent
their training set well but are at risk of overfitting to noisy or unrepresentative training data. In
contrast, algorithms with high bias typically produce simpler models that may fail to capture
important regularities (i.e. underfit) in the data. The bias–variance decomposition is a way of
analysing a learning algorithm's expected generalization error with respect to a particular
model. The following diagram illustrates the bias–variance tradeoff.

Preparing a dataset before designing a machine learning model is an important task for the data
scientist. When you gather a dataset for modelling a machine learning model, you may find
some instances repeated several times. It is very important for you to remove duplicates from
the dataset to maintain accuracy and to avoid misleading statistics. Cross-validation is a
technique for evaluating a machine learning model and testing its performance. CV is
commonly used in applied ML tasks. It can be used to estimate the test error associated with a
given statistical learning method in order to evaluate its performance, or to select the
appropriate level of flexibility. In this experiment, students need to take a learning model and
an appropriate data set, remove duplicates in the data set, fit a model, measure bias and variance
components of the error rate, and fine-tune the parameters using cross-validation. They may
use built-in APIs if needed.
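A minimal Python sketch along these lines (not the recorded program), assuming a regression setting with hypothetical synthetic data; bias² and variance are estimated with a simple bootstrap over a fixed test set.

import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical noisy quadratic data, with a few duplicate rows added on purpose.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 1, size=300)
df = pd.DataFrame({"x": X[:, 0], "y": y})
df = pd.concat([df, df.head(10)], ignore_index=True)    # introduce duplicates

df = df.drop_duplicates()                               # remove duplicates
X, y = df[["x"]].to_numpy(), df["y"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Bootstrap estimate of bias^2 and variance of the model's predictions.
preds = []
for _ in range(100):
    idx = rng.integers(0, len(X_train), len(X_train))
    model = DecisionTreeRegressor(max_depth=4).fit(X_train[idx], y_train[idx])
    preds.append(model.predict(X_test))
preds = np.array(preds)
mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - y_test) ** 2)
variance = np.mean(preds.var(axis=0))
print("bias^2 ~", round(bias_sq, 3), " variance ~", round(variance, 3))

# Cross-validation to fine-tune the max_depth parameter.
for depth in [2, 4, 8, None]:
    scores = cross_val_score(DecisionTreeRegressor(max_depth=depth), X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print("max_depth =", depth, " CV MSE =", round(-scores.mean(), 3))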


PROGRAM:


Experiment-6
Aim: To Write a program to implement Categorical Encoding, One-hot Encoding.
Description:
One Hot Encoding:
One hot encoding is a technique that we use to represent categorical variables as numerical
values in a machine learning model.
The advantages of using one hot encoding include:
1. It allows the use of categorical variables in models that require numerical input.
2. It can improve model performance by providing more information to the model
about the categorical variable.
3. It can help to avoid the problem of ordinality, which can occur when a categorical
variable has a natural ordering (e.g. “small”, “medium”, “large”).

The disadvantages of using one hot encoding include:


1. It can lead to increased dimensionality, as a separate column is created for each category in the variable. This can make the model more complex and slow to train.
2. It can lead to sparse data, as most observations will have a value of 0 in most of the one-hot encoded columns.
3. It can lead to overfitting, especially if there are many categories in the variable and the sample size is relatively small.

In short, one-hot encoding is a powerful technique for treating categorical data, but it can lead to increased dimensionality, sparsity, and overfitting. It is important to use it cautiously and to consider other methods such as ordinal encoding or binary encoding.

Examples:

Fruit     Categorical value of fruit     Price
apple     1                              5
mango     2                              10
apple     1                              15
orange    3                              20


The output after applying one-hot encoding on the data is given as follows:

apple    mango    orange    price
1        0        0         5
0        1        0         10
1        0        0         15
0        0        1         20

Categorical Encoding:
Encoding categorical data is a process of converting categorical data into integer format so
that the data with converted categorical values can be provided to the different models.
Categorical data can be considered as gathered information that is divided into groups. For
example, a list of many people with their blood group: A+, A-, B+, B-, AB+, AB-,O+, O- etc.
in which each of the blood types is a categorical value.

There can be two kinds of categorical data:

• Nominal data
• Ordinal data

Nominal data: This type of categorical data consists of the name variable without any
numerical values. For example, in any organization, the name of the different departments
like research and development department, human resource department, accounts and billing
department etc.

Ordinal data: This type of categorical data consists of a set of orders or scales. For example,
a list of patients consists of the level of sugar present in the body of a person which can be
divided into high, low and medium classes.
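A minimal pandas/scikit-learn sketch of both encodings (not the recorded program), using the fruit example above plus a hypothetical ordinal "size" column:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

df = pd.DataFrame({
    "fruit": ["apple", "mango", "apple", "orange"],
    "size":  ["small", "medium", "large", "medium"],   # hypothetical ordinal attribute
    "price": [5, 10, 15, 20],
})

# Categorical (label) encoding: each category is mapped to an integer.
df["fruit_label"] = LabelEncoder().fit_transform(df["fruit"])

# Ordinal encoding with an explicit order for the ordinal attribute.
order = [["small", "medium", "large"]]
df["size_ordinal"] = OrdinalEncoder(categories=order).fit_transform(df[["size"]]).ravel()

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["fruit"], prefix="fruit")
print(pd.concat([df, one_hot], axis=1))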


PROGRAM:


EXPERIMENT-7
Aim: Build an Artificial Neural Network by implementing the Back propagation algorithm
and test the same using appropriate data sets.

Description:
1. Artificial Neural Network (ANN):
• An ANN is composed of interconnected artificial neurons or nodes organized into layers: an input layer, hidden layer(s), and an output layer.
• Each neuron receives inputs, performs a weighted sum of those inputs, applies an activation function, and produces an output.
• The connections between neurons are associated with weights that determine the strength of the connection.
• The activation function introduces non-linearity into the network, allowing it to learn complex patterns and make predictions.

2. Backpropagation Algorithm:
• Backpropagation is a supervised learning algorithm used to train an ANN by adjusting its weights and biases.
• It utilizes the gradient descent optimization technique to minimize the network's error or loss function.
• The algorithm consists of two main phases: forward propagation and backward propagation.
• Forward Propagation:
  1. During forward propagation, the input data is fed into the network, and the outputs of each neuron are calculated successively through the layers.
  2. The output of the network is compared to the desired output, and the error or loss is calculated.
• Backward Propagation:
  1. Backward propagation involves propagating the error from the output layer back to the previous layers.
  2. The error is used to calculate the gradients of the weights and biases, which indicate the direction and magnitude of the adjustments required.
  3. The weights and biases are updated in the opposite direction of the gradients, effectively reducing the error.
  4. This process is repeated iteratively for a defined number of epochs or until the network converges to a satisfactory level of accuracy.
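A minimal NumPy sketch of a two-layer network trained with backpropagation (not the recorded program), using the XOR problem as a stand-in data set; the layer sizes and learning rate are arbitrary choices for illustration.

import numpy as np

# XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
lr = 0.5

for epoch in range(10000):
    # Forward propagation.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward propagation: gradients of the squared error through the sigmoids.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Update weights and biases in the opposite direction of the gradients.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print("Predictions after training:", out.ravel().round(3))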

PROGRAM:


Experiment-8
Aim: To Write a program to implement k-Nearest Neighbour algorithm to classify the iris data
set. Print both correct and wrong predictions.

Description: The k-nearest neighbours algorithm, also known as KNN or k-NN, is a non-
parametric, supervised learning classifier, which uses proximity to make classifications or
predictions about the grouping of an individual data point. While it can be used for either
regression or classification problems, it is typically used as a classification algorithm, working
off the assumption that similar points can be found near one another.

Applications:
- Data preprocessing: Datasets frequently have missing values, but the KNN algorithm can estimate those values in a process known as missing data imputation.
- Recommendation engines: Using clickstream data from websites, the KNN algorithm has been used to provide automatic recommendations to users for additional content. Research shows that a user is assigned to a particular group and, based on that group's user behaviour, is given a recommendation. However, given the scaling issues with KNN, this approach may not be optimal for larger datasets.
- Finance: It has also been used in a variety of finance and economic use cases. For example, one paper shows how using KNN on credit data can help banks assess the risk of a loan to an organization or individual; it is used to determine the credit-worthiness of a loan applicant. Another journal highlights its use in stock market forecasting, currency exchange rates, trading futures, and money-laundering analyses.
- Healthcare: KNN has also been applied within the healthcare industry, making predictions on the risk of heart attacks and prostate cancer. The algorithm works by calculating the most likely gene expressions.

Advantages:

- Easy to implement: Given the algorithm’s simplicity and accuracy, it is one of the first
classifiers that a new data scientist will learn.
- Adapts easily: As new training samples are added, the algorithm adjusts to account for any new data, since all training data is stored in memory.
- Few hyperparameters: KNN only requires a value of k and a distance metric, which is low compared to other machine learning algorithms.

Disadvantages:
o Large datasets take longer to process.
o It requires feature scaling; failing to do so can result in incorrect predictions.
o Noisy data can result in over-fitting or under-fitting of the data.
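A minimal scikit-learn sketch of k-NN on the iris data set, printing both correct and wrong predictions (not the recorded program); k = 3 is an arbitrary choice.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

for features, actual, predicted in zip(X_test, y_test, y_pred):
    status = "Correct" if actual == predicted else "Wrong"
    print(f"{status}: predicted={iris.target_names[predicted]}, "
          f"actual={iris.target_names[actual]}")
print("Accuracy:", accuracy_score(y_test, y_pred))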


PROGRAM:

OUTPUT:


Experiment-9
Aim: To Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.

Description:
Locally Weighted Regression algorithm:
• Locally weighted linear regression is a supervised learning algorithm.
• It is a non-parametric algorithm.
• There is no training phase; all the work is done during the testing phase, i.e. while making predictions.
• The dataset must therefore always be available when making predictions.
• Locally weighted regression methods are a generalization of k-Nearest Neighbour.
• In locally weighted regression an explicit local approximation of the target function is constructed for each query instance.
• The local approximation of the target function can take a form such as a constant, linear, or quadratic function, with the contribution of each training point weighted by a localized kernel function.

Steps involved in locally weighted linear regression (see the sketch below):
• Compute the minimum of the locally weighted cost around the query point.
• Predict the output for the query point.
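A minimal NumPy/Matplotlib sketch of locally weighted regression with a Gaussian kernel (not the recorded program), fitting a noisy sine curve; the bandwidth tau and the data set are hypothetical choices.

import numpy as np
import matplotlib.pyplot as plt

def lwr_predict(x_query, X, y, tau=0.3):
    """Fit a weighted linear model around x_query and return its prediction."""
    Xb = np.c_[np.ones(len(X)), X]                      # add a bias column
    xq = np.array([1.0, x_query])
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))  # Gaussian kernel weights
    W = np.diag(w)
    # theta = (X^T W X)^(-1) X^T W y  (weighted normal equations)
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

# Noisy sine data.
rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + rng.normal(0, 0.2, size=X.shape)

xs = np.linspace(0, 2 * np.pi, 200)
ys = np.array([lwr_predict(x, X, y, tau=0.3) for x in xs])

plt.scatter(X, y, s=10, label="data")
plt.plot(xs, ys, color="red", label="LWR fit")
plt.legend()
plt.show()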


PROGRAM:

OUTPUT:


EXPERIMENT-10

Aim: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.

Description:
The Naive Bayes algorithm is a supervised machine learning algorithm based on the Bayes’
theorem. It is a probabilistic classifier that is often used in NLP tasks like sentiment analysis
(identifying a text corpus’ emotional or sentimental tone or opinion). The Bayes’ theorem is
used to determine the probability of a hypothesis when prior knowledge is available. It depends
on conditional probabilities.
The formula is given below:

P(A|B) = [ P(B|A) · P(A) ] / P(B)

where P(A|B) is posterior probability i.e. the probability of a hypothesis A given the event B
occurs. P(B|A) is likelihood probability i.e. the probability of the evidence given that
hypothesis A is true. P(A) is prior probability i.e. the probability of the hypothesis before
observing the evidence and P(B) is marginal probability i.e. the probability of the evidence.
There are 5 types of Naive Bayes classifiers available in scikit-learn – namely Bernoulli Naive
Bayes, Categorical NB, Complement NB, Gaussian NB, and Multinomial NB.

Working of Naïve Bayes' Classifier:


The working of the Naïve Bayes classifier can be understood with the help of the following example. Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not we should play on a particular day according to the weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.

Types of Naïve Bayes Model:



There are three types of Naïve Bayes model, which are given below:
o Gaussian: The Gaussian model assumes that the features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e. deciding which category a particular document belongs to, such as Sports, Politics, Education, etc. The classifier uses the frequencies of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also well known for document classification tasks.

Advantages of Naïve Bayes Classifier:


o Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class predictions compared to other algorithms.
o It is a popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.

Applications of Naïve Bayes Classifier:


o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager
learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.
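Although the aim permits built-in Java classes, the following minimal scikit-learn (Python) sketch illustrates the same task (not the recorded program); the tiny labelled corpus is hypothetical, and accuracy, precision, and recall are reported on a held-out split.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labelled documents: 1 = positive sentiment, 0 = negative sentiment.
docs = [
    "I love this movie", "What a great film", "Fantastic acting and story",
    "Absolutely wonderful experience", "I hate this movie", "Terrible plot and acting",
    "Worst film I have seen", "Completely boring and bad",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, random_state=0, stratify=labels)

vectorizer = CountVectorizer()                     # bag-of-words word frequencies
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = MultinomialNB().fit(X_train_vec, y_train)
y_pred = clf.predict(X_test_vec)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))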

PROGRAM:


EXPERIMENT-11

Aim: Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment
on the quality of clustering. You can add Java/Python ML library classes/API in the program.

Description:
Clustering is an unsupervised learning technique that separates data of similar nature. It aims
to find a structure (intrinsic grouping) in a collection of unlabelled data. A cluster is therefore
a collection of objects which are ‘similar’ between each other and are ‘dissimilar’ to the objects
belonging to other clusters. Two representatives of the clustering algorithms are the K-means
algorithm and the expectation maximization (EM) algorithm. The K-means algorithm uses
Euclidean distance while EM uses statistical methods.

K-means clustering:
Input: The number of clusters k and a database containing n objects.
Output: A set of k clusters that minimizes the squared-error criterion.
1. Arbitrarily choose k objects as the initial cluster centres.
2. Repeat:
   a. (Re)assign each object to the cluster to which it is most similar, based on the mean value of the objects in the cluster.
   b. Update the cluster means, i.e. calculate the mean value of the objects in each cluster.
   until no change occurs.


EM clustering:
Input: Cluster number k, a database, and a stopping tolerance.
Output: A set of k clusters with weights that maximize the log-likelihood function.
1. Expectation step: For each database record x, compute the membership probability of x in each cluster h = 1, …, k.
2. Maximization step: Update the mixture model parameters (probability weights).
3. Stopping criterion: If the stopping criterion is satisfied, stop; else set j = j + 1 and go to step 1.

Steps in EM Algorithm

The EM algorithm is completed mainly in four steps: the Initialization step, the Expectation step, the Maximization step, and the Convergence check, in which the E and M steps are repeated until the parameters stop changing.
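A minimal scikit-learn sketch comparing k-Means and EM (Gaussian Mixture) clustering (not the recorded program), assuming a hypothetical heart.csv file with numeric columns and a "target" label column used only for comparison:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, adjusted_rand_score

df = pd.read_csv("heart.csv")                     # hypothetical heart disease data set
X = StandardScaler().fit_transform(df.drop(columns=["target"]))
y = df["target"]

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# Compare clustering quality: silhouette (internal) and agreement with the known labels.
print("k-Means : silhouette =", round(silhouette_score(X, kmeans_labels), 3),
      " ARI vs target =", round(adjusted_rand_score(y, kmeans_labels), 3))
print("EM (GMM): silhouette =", round(silhouette_score(X, em_labels), 3),
      " ARI vs target =", round(adjusted_rand_score(y, em_labels), 3))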

PROGRAM:


Experiment-12
Aim: Exploratory Data Analysis for Classification using Pandas or Matplotlib
Description:
Exploratory data analysis (EDA) was promoted by John Tukey to encourage statisticians to explore data and possibly formulate hypotheses that might lead to new data collection and experiments. EDA focuses on checking the assumptions required for model fitting and hypothesis testing, on handling missing values, and on making transformations of variables as needed.
EDA builds a robust understanding of the data and of the issues associated with either the data or the process that produced them. It is a scientific approach to getting the story of the data.
TYPES OF EXPLORATORY DATA ANALYSIS:
1. Univariate Non-graphical
2. Multivariate Non-graphical
3. Univariate graphical
4. Multivariate graphical
TOOLS REQUIRED FOR EXPLORATORY DATA ANALYSIS:
Some of the most common tools used to create an EDA are:
1. R: An open-source programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians for developing statistical observations and data analysis.
2. Python: An interpreted, object-oriented programming language with dynamic semantics. Its high-level, built-in data structures, combined with dynamic typing and binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components together. Python and EDA are often used together to spot missing values in a data set, which is important so that you can decide how to handle missing values for machine learning.
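A minimal pandas/Matplotlib EDA sketch for a classification data set (not the recorded program), assuming a hypothetical iris.csv file whose label column is named "species":

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")                      # hypothetical classification data set

# Univariate non-graphical EDA: summary statistics, class balance, missing values.
print(df.info())
print(df.describe())
print(df["species"].value_counts())
print("Missing values per column:\n", df.isnull().sum())

# Univariate graphical EDA: histograms of each numeric feature.
df.hist(figsize=(8, 6))
plt.tight_layout()
plt.show()

# Multivariate graphical EDA: scatter plot of the first two features coloured by class.
for name, group in df.groupby("species"):
    plt.scatter(group.iloc[:, 0], group.iloc[:, 1], label=name, s=15)
plt.xlabel(df.columns[0])
plt.ylabel(df.columns[1])
plt.legend()
plt.show()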


PROGRAM:


Experiment-13
Aim: To Write a Python program to construct a Bayesian network considering medical data.
Use this model to demonstrate the diagnosis of heart patients using standard Heart Disease
Data Set.

Description:
Bayesian networks are a widely used class of probabilistic graphical models. They consist of two parts: a structure and parameters. The structure is a directed acyclic graph (DAG) that expresses conditional independencies and dependencies among random variables associated with nodes. The parameters consist of conditional probability distributions associated with each node. A Bayesian network is a compact, flexible, and interpretable representation of a joint probability distribution. It is also a useful tool in knowledge discovery, as directed acyclic graphs allow representing causal relations between variables. Typically, a Bayesian network is learned from data; much of the work in this area concerns learning the network structure with scalable algorithms that have theoretical guarantees.
Example:
[Figure: a directed acyclic graph over the medical variables.]
The goal is to calculate the posterior conditional probability distribution of each of the possible unobserved causes given the observed evidence.
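A minimal sketch of this idea in Python (not the recorded program), assuming the third-party pgmpy library is installed and that a heart.csv file provides discretized columns with the hypothetical names "age", "sex", "cp", "chol", and "target"; the DAG structure and the evidence values are also assumptions for illustration.

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Hypothetical discretized heart-disease data with these column names.
df = pd.read_csv("heart.csv")

# Hand-chosen DAG structure (an assumption for illustration, not a learned one).
model = BayesianNetwork([
    ("age", "target"), ("sex", "target"),
    ("cp", "target"), ("target", "chol"),
])

# Learn the conditional probability tables from the data.
model.fit(df, estimator=MaximumLikelihoodEstimator)

# Query the posterior probability of heart disease given observed evidence.
infer = VariableElimination(model)
result = infer.query(variables=["target"], evidence={"cp": 2, "sex": 1})
print(result)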

PROGRAM:


OUTPUT:


Experiment-14
Aim: To Write a program to Implement Support Vector Machines and Principal Component
Analysis.

Description:
Support Vector Machine (SVM) is a very powerful and versatile machine learning model, capable of performing linear or nonlinear classification, regression, and even outlier detection. SVM can be used for both regression and classification, but it is most widely applied to classification tasks. The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly classifies the data points.
There are numerous hyperplanes from which to choose to split the two kinds of data points. Our goal is to discover the plane with the greatest margin, i.e. the greatest distance between data points of both classes. Maximizing the margin distance adds some reinforcement, making it easier to classify future data points.
We can use PCA to find the first two principal components and visualize the data in this new, two-dimensional space with a single scatter plot.

Principal Component Analysis


PCA is a linear dimensionality reduction technique that uses Singular Value Decomposition of the data to project it onto a lower-dimensional space. It is an unsupervised machine learning method: a transformation of the data that attempts to find out which features explain the most variance in the data.

Principal Component Analysis is:
• Used in exploratory data analysis (EDA).
• Used to visualize genetic distance and relatedness between populations.

Method:
• Eigenvalue decomposition of the data covariance (or correlation) matrix, or
• Singular value decomposition of the data matrix, after mean-centering (and normalizing) the data matrix for each attribute.

Output:
• Component scores, sometimes called factor scores (the transformed variable values), and loadings (the weights).

Uses:
• Data compression and information preservation.
• Visualization, noise filtering, feature extraction and engineering.
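A minimal scikit-learn sketch combining PCA and an SVM classifier (not the recorded program), using the built-in iris data set; the kernel and C value are arbitrary choices.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = load_iris(return_X_y=True)

# Project the 4-dimensional data onto its first two principal components.
X_2d = PCA(n_components=2).fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_2d, y, test_size=0.3, random_state=0)

svm = SVC(kernel="rbf", C=1.0)         # maximum-margin classifier with an RBF kernel
svm.fit(X_train, y_train)

print(classification_report(y_test, svm.predict(X_test)))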
PROGRAM:


EXPERIMENT-15
Aim: Write a program to Implement Principal Component Analysis
Description:
Principal Component Analysis

Principal Component Analysis is an unsupervised learning algorithm that is used for the
dimensionality reduction in machine learning. It is a statistical process that converts the
observations of correlated features into a set of linearly uncorrelated features with the help of
orthogonal transformation. These new transformed features are called the Principal
Components. It is one of the popular tools that is used for exploratory data analysis and
predictive modelling. It is a technique to draw strong patterns from the given dataset by
reducing the variances.

PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.

PCA works by considering the variance of each attribute, because an attribute with high variance shows a good split between the classes; this is how it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.

The PCA algorithm is based on some mathematical concepts such as:

o Variance and Covariance
o Eigenvalues and Eigenvectors

Some common terms used in the PCA algorithm:

o Dimensionality: The number of features or variables present in the given dataset. More simply, it is the number of columns present in the dataset.
o Correlation: It signifies how strongly two variables are related to each other, i.e. if one changes, the other variable also changes. The correlation value ranges from -1 to +1. Here, -1 occurs if the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other.
o Orthogonal: It means that the variables are not correlated with each other, and hence the correlation between the pair of variables is zero.
o Eigenvectors: Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v.
o Covariance Matrix: A matrix containing the covariances between the pairs of variables is called the covariance matrix.


PROGRAM:

OUTPUT:
