Machine Learning
EX.NO: 1
DATE:
AIM
PROCEDURE
Linear regression is a statistical model that analyzes the linear relationship between a dependent variable and a given set of independent variables. A linear relationship between variables means that when the value of one or more independent variables changes (increases or decreases), the value of the dependent variable changes accordingly (increases or decreases). The simple linear model is written as
Y = mX + b
where m is the slope of the regression line, which represents the effect X has on Y, and b is the intercept, the predicted value of Y when X is 0.
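As a quick illustration (a sketch, not part of the prescribed program), the least-squares estimates of m and b can be computed directly from sample means; the data below are made up for the example:
import numpy as np
# Hypothetical sample data used only to illustrate the least-squares formulas
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])
# Least-squares estimates: m = cov(X, Y) / var(X), b = mean(Y) - m * mean(X)
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b = Y.mean() - m * X.mean()
print(m, b)   # slope close to 2 and intercept close to 0 for this data
The general procedure for fitting and checking a simple linear regression model is as follows: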
1. Draw the scatterplot. Look for 1) linear or non-linear pattern of the data and 2)
deviations from the pattern (outliers). If the pattern is non-linear, consider a
transformation. If there are outliers, you may consider removing them only IF
there is a non-statistical reason to do so.
2. Fit the least-squares regression line to the data and check the assumptions of
the model by looking at the Residual Plot (for constant standard deviation
assumption) and normal probability plot (for normality assumption). If the
assumptions of the model appear not to be met, a transformation may be
necessary.
3. If necessary, transform the data and re-fit the least-squares regression line
using the transformed data.
4. If a transformation was done, go back to step 1. Otherwise, proceed to step 5.
5. Once a “good-fitting” model is determined, write the equation of the least-squares regression line. Include the standard errors of the estimates, the estimate of σ (the residual standard deviation), and R-squared.
6. Determine if the explanatory variable is a significant predictor of the response variable by performing a t-test or F-test. Include a confidence interval for the estimate of the regression coefficient (slope); a minimal sketch of this check follows these steps.
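As an illustration of step 6 (a sketch, separate from the program below), scipy.stats.linregress reports the slope, its standard error, and the two-sided p-value of the t-test for a zero slope, from which an approximate confidence interval can be formed; the data here are made up:
import numpy as np
from scipy import stats

# Hypothetical data used only to illustrate the significance test
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 4.1, 6.4, 8.2, 9.8, 12.1])

res = stats.linregress(x, y)
print("slope:", res.slope, "p-value:", res.pvalue)

# Approximate 95% confidence interval for the slope (t critical value, n - 2 df)
t_crit = stats.t.ppf(0.975, len(x) - 2)
print("95% CI:", res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr)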
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)   # load the diabetes dataset
diabetes_X = diabetes_X[:, np.newaxis, 2]   # use a single feature
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]
regr = linear_model.LinearRegression()   # create the linear regression model
regr.fit(diabetes_X_train, diabetes_y_train)   # train it on the training set
diabetes_y_pred = regr.predict(diabetes_X_test)   # predict on the test set
print('Coefficients: \n', regr.coef_)
print('Mean squared error: %.2f'
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print('Coefficient of determination: %.2f'
      % r2_score(diabetes_y_test, diabetes_y_pred))
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')   # plot the test points
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)   # and the fitted line
plt.xticks(())
plt.yticks(())
plt.show()
OUTPUT
RUBRICS
RESULT
EX.NO: 2
DATE:
AIM:
PROCEDURE:
The K-nearest neighbors (KNN) algorithm uses ‘feature similarity’ to predict the values of new data points: a new data point is assigned a value based on how closely it matches the points in the training set. We can understand its working with the help of the following steps (a small sketch of steps 2 and 3 follows this list) −
Step 1 − For implementing any algorithm, we need a dataset. So during the first step of KNN, we must load the training as well as the test data.
Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data points to consult. K can be any positive integer.
Step 3 − For each point in the test data, calculate its distance to every row of the training data (Euclidean distance is the most common choice), sort the distances in ascending order, take the top K rows, and assign the test point the most frequent class (for classification) or the average value (for regression) among those K neighbors.
Step 4 − End.
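As an illustration of the distance-and-vote idea in steps 2 and 3 (a sketch on made-up points, independent of the scikit-learn program below):
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # majority vote among the k nearest labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up dataset: two groups of 2-D points
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))   # expected class: 0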
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
headernames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv(path, names=headernames)   # load the Iris dataset
dataset.head()
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
ax = plt.subplot()
sns.heatmap(result, annot=True, ax=ax)   # visualize the confusion matrix
ax.set_title('Confusion Matrix')
plt.show()
result1 = classification_report(y_test, y_pred)
print("Classification Report:",)
print(result1)
result2 = accuracy_score(y_test, y_pred)
print("Accuracy:", result2)
OUTPUT:
Confusion Matrix:
[[23 0 0]
[ 0 22 0]
[ 0 3 12]]
Classification Report:
accuracy: 0.95 (support: 60 test samples)
Accuracy: 0.95
RUBRICS
RESULT
EX.NO: 3
DATE:
AIM:
PROGRAMMING TOOL:
PROCEDURE:
The K-means clustering algorithm computes the centroids and iterates until it finds the optimal centroids. It assumes that the number of clusters is already known. It is also called a flat clustering algorithm. The number of clusters identified from the data by the algorithm is represented by ‘K’ in K-means.
In this algorithm, the data points are assigned to clusters in such a manner that the sum of the squared distances between the data points and the centroids is minimized. It is to be understood that less variation within the clusters leads to more similar data points within the same cluster.
We can understand the working of the K-Means clustering algorithm with the help of the following steps −
Step 1 − First, we need to specify K, the number of clusters to be generated by this algorithm.
Step 2 − Next, randomly select K data points and assign each data point to a cluster. In simple words, classify the data based on the number of data points.
Step 3 − Now it will compute the initial cluster centroids.
Step 4 − Next, keep iterating the following until we find the optimal centroids, i.e. until the assignment of data points to the clusters no longer changes −
4.1 − First, the sum of squared distances between the data points and the centroids is computed.
4.2 − Next, assign each data point to the cluster whose centroid is closest.
4.3 − At last, compute the centroids of the clusters by taking the average of all data points belonging to that cluster.
While working with the K-means algorithm we need to take care of a few things: since K-means is distance-based, it is recommended to standardize the data, and since the centroids are initialized randomly, the algorithm may converge to a local optimum, so it is usually run with several different initializations. A small sketch of one assignment-and-update iteration is given below.
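The assignment and update rules of step 4 can be illustrated directly with NumPy (a sketch on randomly generated data, separate from the scikit-learn program below):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # hypothetical 2-D data points
K = 3
centroids = X[rng.choice(len(X), K, replace=False)]   # random initial centroids (steps 2 and 3)

for _ in range(10):                      # a few assignment/update iterations (step 4)
    # 4.1/4.2: assign each point to its nearest centroid
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # 4.3: recompute each centroid as the mean of its assigned points
    centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                          else centroids[k] for k in range(K)])
print(centroids)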
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)   # sample data: four blobs
plt.scatter(X[:, 0], X[:, 1], s=50)   # plot the raw data
plt.show()
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')   # points coloured by cluster
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5)   # cluster centers
plt.show()
OUTPUT:
RUBRICS
RESULT:
EX.NO: 4
DATE:
HIERARCHICAL CLUSTERING
AIM:
PROCEDURE:
Hierarchical clustering algorithms fall into two categories. In agglomerative hierarchical algorithms, each data point is treated as a single cluster and pairs of clusters are then successively merged (bottom-up approach). On the other hand, in divisive hierarchical algorithms, all the data points are treated as one big cluster, and the process of clustering involves dividing (top-down approach) the one big cluster into various small clusters.
We are going to explain the most used and important hierarchical clustering, i.e. agglomerative. The steps to perform it are as follows −
Step 1 − Treat each data point as a single cluster. Hence, we will have K clusters at the start. The number of data points will also be K at the start.
Step 2 − Now, in this step we need to form a bigger cluster by joining the two closest data points. This will result in a total of K-1 clusters.
Step 3 − Now, to form more clusters we need to join the two closest clusters. This will result in a total of K-2 clusters.
Step 4 − Now, repeat this merging step until only one big cluster containing all the data points remains (K = 1).
Step 5 − At last, after making one single big cluster, dendrograms are used to split it into multiple clusters depending upon the problem, as illustrated in the sketch below.
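The successive merging and the final cut of the dendrogram can be illustrated with SciPy (a sketch on made-up points, separate from the program below):
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D points: two obvious groups
X = np.array([[1, 2], [2, 2], [1, 1], [9, 9], [10, 9], [9, 10]])

# Agglomerative merging: each row of Z records one merge (steps 2 and 3 above)
Z = linkage(X, method='ward')

# Cut the dendrogram so that two clusters remain (step 5 above)
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)   # e.g. [1 1 1 2 2 2]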
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
X = np.array(
[[7,8],[12,20],[17,19],[26,15],[32,37],[87,75],[73,85],[62,80],[73,60],[87,96],])
labelList = list(range(1, 11))   # point labels for the dendrogram
plt.scatter(X[:, 0], X[:, 1])    # plot the raw data points
plt.subplots_adjust(bottom=0.1)
plt.show()
linked = linkage(X, 'single')    # single-linkage agglomerative merging
dendrogram(linked, orientation='top', labels=labelList,
           distance_sort='descending', show_leaf_counts=True)
plt.show()
cluster = AgglomerativeClustering(n_clusters=2, linkage='ward')
print(cluster.fit_predict(X))    # cluster label of each point
plt.scatter(X[:, 0], X[:, 1], c=cluster.labels_, cmap='rainbow')
plt.show()
OUTPUT:
RUBRICS
RESULT:
EX.NO:5
DATE:
AIM
PROCEDURE
Decision tree analysis is a predictive modelling tool that can be applied across many areas. Decision trees can be constructed by an algorithmic approach that splits the dataset in different ways based on different conditions. Decision trees are among the most powerful algorithms that fall under the category of supervised learning.
They can be used for both classification and regression tasks. The two main entities of a tree are decision nodes, where the data is split, and leaves, where we get the outcomes.
Gini Index
The Gini index is the name of the cost function used to evaluate binary splits in the dataset; it works with a categorical target variable such as “Success” or “Failure”.
The lower the value of the Gini index, the higher the homogeneity of the node. A perfect Gini index value is 0 and the worst is 0.5 (for a 2-class problem). The Gini index for a split can be calculated with the help of the following steps −
First, calculate the Gini index for each sub-node using the formula 1 − (p^2 + q^2), where p^2 + q^2 is the sum of the squared probabilities of success and failure.
Next, calculate the Gini index for the split using the weighted Gini score of each node of that split.
Split Creation
A split basically comprises an attribute in the dataset and a value for it. We can create a split in the dataset with the help of the following three parts −
Part 1: Calculating the Gini score − we have just discussed this part in the previous section.
Part 2: Splitting a dataset − It may be defined as separating the dataset into two lists of rows, given the index of an attribute and a split value for that attribute. After getting the two groups, right and left, from the dataset, we can calculate the value of the split by using the Gini score calculated in the first part. The split value decides in which group a row will reside.
Part 3: Evaluating all splits − The next part after finding the Gini score and splitting the dataset is the evaluation of all splits. For this purpose, first, we must check every value associated with each attribute as a candidate split. Then we need to find the best possible split by evaluating the cost of the split. The best split will be used as a node in the decision tree. A small worked sketch of the Gini calculation for one candidate split is given below.
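To make the Gini calculation concrete (a sketch on a made-up candidate split, not part of the prescribed program), the weighted Gini impurity of a split can be computed as follows:
import numpy as np

def gini_impurity(labels):
    # 1 - (p^2 + q^2 + ...), summed over the class proportions in this group
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(left_labels, right_labels):
    # weighted Gini score of a split: weight each group by its share of the rows
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_impurity(left_labels) \
         + (len(right_labels) / n) * gini_impurity(right_labels)

# Hypothetical split: the left group is pure, the right group is mixed
left = ['Success', 'Success', 'Success']
right = ['Success', 'Failure', 'Failure', 'Failure']
print(split_gini(left, right))   # lower values indicate a better split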
PROGRAM:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv("diabetes.csv", header=None, names=col_names)   # Pima Indians Diabetes data (local CSV file assumed)
pima.head()
feature_cols = ['pregnant', 'insulin', 'bmi', 'age', 'glucose', 'bp', 'pedigree']
X = pima[feature_cols]   # Features
y = pima.label           # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
result1 = classification_report(y_test, y_pred)
print("Classification Report:",)
print(result1)
result2 = accuracy_score(y_test, y_pred)
print("Accuracy:", result2)
OUTPUT:
Classification Report:
Accuracy: 0.6623376623376623
RUBRICS
RESULT
EX.NO:6
DATE:
AIM
PROCEDURE
We can understand the working of the Random Forest algorithm with the help of the following steps (a small sketch of the voting idea follows this list) −
Step 1 − First, start with the selection of random samples (bootstrap samples) from a given dataset.
Step 2 − Next, the algorithm constructs a decision tree for every sample. Then it gets the prediction result from every decision tree.
Step 3 − In this step, voting is performed over every predicted result.
Step 4 − At last, select the most voted prediction result as the final prediction result.
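The bootstrap-and-vote idea in these steps can be sketched with scikit-learn building blocks (an illustration on generated data, separate from the program below):
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
rng = np.random.default_rng(0)

# Steps 1 and 2: train one decision tree per bootstrap sample
trees = []
for _ in range(10):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Steps 3 and 4: each tree votes; the majority vote is the forest's prediction
x_new = X[:1]
votes = [int(t.predict(x_new)[0]) for t in trees]
print(Counter(votes).most_common(1)[0][0])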
PROGRAM
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
headernames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv(path, names=headernames)   # load the Iris dataset
dataset.head()
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
classifier = RandomForestClassifier(n_estimators=50)   # 50 trees in the forest
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
result1 = classification_report(y_test, y_pred)
print("Classification Report:",)
print(result1)
result2 = accuracy_score(y_test, y_pred)
print("Accuracy:", result2)
OUTPUT:
Confusion Matrix:
[[13 0 0]
[ 0 14 3]
[ 0 1 14]]
Classification Report:
accuracy: 0.91 (support: 45 test samples)
Accuracy: 0.9111111111111111
RUBRICS
RESULT
EX.NO:7
DATE:
AIM:
PROCEDURE:
Working of SVM
The following are important concepts in SVM −
Support Vectors − Data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
Hyperplane − It is the decision plane or boundary that separates a set of objects having different classes.
Margin − It may be defined as the gap between two lines drawn through the closest data points of the different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
The main goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH), which is done in the following two steps −
First, SVM generates hyperplanes iteratively that segregate the classes in the best way.
Then, it chooses the hyperplane that separates the classes correctly with the largest margin.
SVM Kernels
In simple words, a kernel converts non-separable problems into separable problems by adding more dimensions. It makes SVM more powerful, flexible and accurate. The following are some of the types of kernels used by SVM.
Linear Kernel
It can be used as a dot product between any two observations. The formula of the linear kernel is as below −
K(x, xi) = sum(x * xi)
From the above formula, we can see that the product between two vectors, say x and xi, is the sum of the multiplication of each pair of input values.
Polynomial Kernel
It is a more generalized form of the linear kernel and can distinguish curved or non-linear input spaces. Its formula is
K(X, Xi) = (1 + sum(X * Xi))^d
where d is the degree of the polynomial, which has to be specified manually.
Radial Basis Function (RBF) Kernel
The RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space. The following formula explains it mathematically −
K(x, xi) = exp(-gamma * sum((x - xi)^2))
where gamma is a parameter that has to be specified manually.
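These kernel functions can be written out directly to check the formulas (a sketch with made-up vectors; the values of d and gamma are illustrative choices):
import numpy as np

def linear_kernel(x, xi):
    return np.sum(x * xi)                            # K(x, xi) = sum(x * xi)

def polynomial_kernel(x, xi, d=3):
    return (1 + np.sum(x * xi)) ** d                 # K(x, xi) = (1 + sum(x * xi))^d

def rbf_kernel(x, xi, gamma=0.1):
    return np.exp(-gamma * np.sum((x - xi) ** 2))    # K(x, xi) = exp(-gamma * ||x - xi||^2)

x = np.array([1.0, 2.0, 3.0])
xi = np.array([2.0, 1.0, 0.5])
print(linear_kernel(x, xi), polynomial_kernel(x, xi), rbf_kernel(x, xi))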
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
iris = datasets.load_iris()
X = iris.data[:, :2]   # use only sepal length and sepal width
y = iris.target
svc = svm.SVC(kernel='linear', C=1.0).fit(X, y)
x_min, x_max, y_min, y_max = X[:, 0].min() - 1, X[:, 0].max() + 1, X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max / x_min) / 100   # mesh step size for the decision-region plot
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
plt.subplot(1, 1, 1)
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.show()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)   # held-out test set
y_pred = svm.SVC(kernel='linear', C=1.0).fit(X_train, y_train).predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
OUTPUT:
RUBRICS
RESULT:
EX.NO:8
DATE:
AIM:
PROCEDURE:
Principal Component Analysis (PCA) is a simple yet powerful technique used for dimensionality reduction. Through it, we can directly decrease the number of feature variables, thereby narrowing down the important features and saving on computation. From a high-level view, PCA has three main steps:
(1) Compute the covariance matrix of the (standardized) data.
(2) Compute the eigenvalues and eigenvectors of this covariance matrix.
(3) Use the eigenvalues and eigenvectors to select only the most important feature vectors and then transform the data onto those vectors for reduced dimensionality.
A small sketch of these three steps follows.
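These three steps can be carried out directly with NumPy (a sketch on randomly generated data, independent of the scikit-learn program below):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # hypothetical data: 100 samples, 5 features
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature

# Step 1: covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# Step 2: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: keep the 2 eigenvectors with the largest eigenvalues and project the data
order = np.argsort(eigvals)[::-1][:2]
components = eigvecs[:, order]
X_reduced = X_std @ components
print(X_reduced.shape)   # (100, 2)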
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df.head()
scalar = StandardScaler()
scalar.fit(df)
scaled_data = scalar.transform(df)
pca = PCA(n_components=2)   # keep only the first two principal components
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)
x_pca.shape
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=cancer['target'], cmap='plasma')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()
df_comp = pd.DataFrame(pca.components_,
                       columns=cancer['feature_names'])   # loadings of each feature
sns.heatmap(df_comp)
plt.show()
OUTPUT
RUBRICS
RESULT:
EX.NO:9
DATE:
AIM
PROCEDURE
In Bayesian classification, the main interest is to find the posterior probabilities, i.e. the probability of a label given some observed features, P(L | features).
The Python library scikit-learn is the most useful library for building a Naïve Bayes model in Python. We have the following three types of Naïve Bayes model under scikit-learn −
Gaussian Naïve Bayes − It is the simplest Naïve Bayes classifier, having the assumption that the data for each label is drawn from a simple Gaussian distribution.
Multinomial Naïve Bayes − Another useful Naïve Bayes classifier is Multinomial Naïve Bayes, in which the features are assumed to be drawn from a simple multinomial distribution. This kind of Naïve Bayes is most appropriate for features that represent discrete counts.
Bernoulli Naïve Bayes − Another important model is Bernoulli Naïve Bayes, in which the features are assumed to be binary (0s and 1s). Text classification with a ‘bag of words’ model can be an application of Bernoulli Naïve Bayes.
A small sketch of the posterior computation behind Gaussian Naïve Bayes is given below.
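As an illustration of how the posterior P(L | features) is formed in the Gaussian case (a sketch on made-up one-dimensional data, separate from the program below):
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D training data for two labels
x_a = np.array([1.0, 1.2, 0.8, 1.1])    # label A
x_b = np.array([3.0, 3.2, 2.9, 3.1])    # label B

x_new = 1.4
# Likelihoods P(x | L) under a Gaussian fitted to each label's data
like_a = norm.pdf(x_new, x_a.mean(), x_a.std())
like_b = norm.pdf(x_new, x_b.mean(), x_b.std())

# Equal priors P(L); posterior by Bayes' rule, normalized over the two labels
post_a = like_a * 0.5 / (like_a * 0.5 + like_b * 0.5)
print("P(A | x) =", post_a)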
PROGRAM
from sklearn import datasets, metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Fit Gaussian Naive Bayes on the full Iris dataset and evaluate on the same data
dataset = datasets.load_iris()
model = GaussianNB()
model.fit(dataset.data, dataset.target)
expected = dataset.target
predicted = model.predict(dataset.data)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
# Repeat with a train/test split
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
OUTPUT
confusion matrix:
[[50 0 0]
[ 0 47 3]
[ 0 3 47]]
RUBRICS
RESULT
EX.NO:10
DATE:
AIM
PROCEDURE
Neural networks (NN), also called artificial neural networks (ANN), are a subset of learning algorithms within the machine learning field that are loosely based on the concept of biological neural networks.
Sigmoid function
We’ll use the sigmoid function, which draws a characteristic “S”-shaped curve, as the activation function of the neural network. This function can map any value to a value between 0 and 1, and it will help us normalize the weighted sum of the inputs.
Thereafter, we’ll use the derivative of the sigmoid function to compute the necessary adjustments to the weights. The output of the sigmoid function can be used to form its derivative: if the output value is x, then the derivative is x * (1 - x), as shown in the sketch below.
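A minimal sketch of these two functions, matching the ones used inside the program further below:
import numpy as np

def sigmoid(x):
    # maps any real value into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is assumed to already be a sigmoid output, so the derivative is x * (1 - x)
    return x * (1 - x)

print(sigmoid(0.0))                       # 0.5
print(sigmoid_derivative(sigmoid(0.0)))   # 0.25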
This is the stage where we’ll teach the neural network to make an accurate prediction. Every input will have a weight, either positive or negative. This implies that an input with a large positive or large negative weight will influence the resulting output more strongly.
Here is the procedure for the training process used in this neural network example problem:
1. Take the inputs from the training dataset, adjust them by their weights, and pass them through a method that computes the output of the ANN.
2. Compute the back-propagated error rate. In this case, it is the difference between the neuron’s predicted output and the expected output of the training dataset.
3. Based on the extent of the error, perform some minor weight adjustments using the Error Weighted Derivative formula (sketched after this list).
4. Iterate this process an arbitrary number of times, here 15,000. In every iteration, the whole training set is processed simultaneously.
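The Error Weighted Derivative adjustment in step 3 can be written out as follows (a sketch that mirrors the update used in the program below; the inputs and targets are the example training set):
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Training set: 4 examples with 3 inputs each, plus the current weights
inputs = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
targets = np.array([[0, 1, 1, 0]]).T
weights = np.zeros((3, 1))

output = sigmoid(inputs @ weights)            # forward pass (step 1)
error = targets - output                      # back-propagated error (step 2)
adjustments = inputs.T @ (error * output * (1 - output))   # Error Weighted Derivative (step 3)
weights += adjustments                        # apply the weight update
print(weights)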
EXAMPLE
The .T function transposes the matrix from a horizontal position to a vertical position; for example, the training outputs [0, 1, 1, 0] are stored as a 4 x 1 column vector.
Ultimately, the weights of the neuron will be optimized for the provided training data. Consequently, if the neuron is asked to think about a new situation that is similar to a previous one, it should make an accurate prediction. This is how back-propagation takes place.
PROGRAM
import numpy as np

class NeuralNetwork():
    def __init__(self):
        np.random.seed(1)
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1   # 3x1 weights in (-1, 1)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)

    def train(self, training_inputs, training_outputs, training_iterations):
        for iteration in range(training_iterations):
            output = self.think(training_inputs)
            error = training_outputs - output
            adjustments = np.dot(training_inputs.T, error * self.sigmoid_derivative(output))
            self.synaptic_weights += adjustments

    def think(self, inputs):
        inputs = inputs.astype(float)
        output = self.sigmoid(np.dot(inputs, self.synaptic_weights))
        return output

if __name__ == "__main__":
    neural_network = NeuralNetwork()
    print("Random starting synaptic weights:")
    print(neural_network.synaptic_weights)
    training_inputs = np.array([[0,0,1],[1,1,1],[1,0,1],[0,1,1]])
    training_outputs = np.array([[0,1,1,0]]).T
    neural_network.train(training_inputs, training_outputs, 15000)
    print("Synaptic weights after training:")
    print(neural_network.synaptic_weights)
    user_input_one = str(input("User Input One: "))
    user_input_two = str(input("User Input Two: "))
    user_input_three = str(input("User Input Three: "))
    print(neural_network.think(np.array([user_input_one, user_input_two,
                                         user_input_three])))
OUTPUT
[[-0.16595599]
[ 0.44064899]
[-0.99977125]]
[[10.08740896]
[-0.20695366]
[-4.83757835]]
[0.00785099]
RUBRICS
RESULT
Thus, the Neural Network is trained and tested using Back Propagation.