ML Lab (2MCA)
Numpy:
Installation:
1. In the terminal, type the command:
pip install numpy
Testing:
python -c "import numpy; print(numpy.__version__)"
Troubleshooting:
Pandas:
Installation:
In the terminal, type the command:
pip install pandas
Matplotlib:
Matplotlib is a Python library for plotting graphs. It is used for data visualization and graphical plotting.
To use matplotlib, we need to install it with pip. First, check that pip is available:
pip -V
The version of pip is displayed if pip is installed on your system. Then install matplotlib:
pip install matplotlib
Installing scikit-learn
There are different ways to install scikit-learn:
- Install the latest official release. This is the best approach for most users. It provides a stable version, and pre-built packages are available for most platforms.
- Install the version of scikit-learn provided by your operating system or Python distribution. This is a quick option for those who have operating systems or Python distributions that distribute scikit-learn. It might not provide the latest release version.
- Build the package from source. This is best for users who want the latest-and-greatest features and aren’t afraid of running brand-new code. This is also needed for users who wish to contribute to the project.
To install the latest official release, run:
pip install -U scikit-learn
Then, to check the installation, run:
python -m pip show scikit-learn  # to see which version and where scikit-learn is installed
python -m pip freeze  # to see all packages installed in the active virtualenv
python -c "import sklearn; sklearn.show_versions()"
In a Jupyter notebook, prefix the pip command with an exclamation mark. For example, to install the NumPy package, you would type !pip install numpy in a cell.
Wait for the installation to complete. pip ends with a message of the form Successfully installed numpy-<version>.
Features of Python
Following are key features of Python:
Python supports functional and structured programming methods as well as
OOP.
It can be used as a scripting language or can be compiled to byte-code for
building large applications.
It provides very high-level dynamic data types and supports dynamic type
checking.
It supports automatic garbage collection.
Variables in Python
A variable is a name that refers to a value stored in memory. When you create a variable, Python stores the value and binds the name to it. Let’s create a variable.
a = 10
Above, a is a variable assigned the integer value 10.
Lists in Python
The list is the most versatile data type available in Python. It is written as comma-separated values (items) between square brackets, and the items need not all be of the same type. Let’s see how to create lists with different types.
myList1 = ['abc', 'pq']
myList2 = [1, 'abc', 2.5]  # items of different types
Tuples in Python
Tuples are sequences, just like lists. The differences are that tuples cannot be changed (they are immutable), unlike lists, and that tuples use parentheses, whereas lists use square brackets.
Creating a tuple is as simple as putting different comma-separated values. Optionally, you can put these comma-separated values between parentheses. Let’s see how to create a tuple.
myTuple1 = ('abc', 'pq')
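A minimal sketch of the mutability difference described above. It reuses the names from the list and tuple examples: reassigning a list item works, while the same assignment on a tuple raises a TypeError.

```python
myList1 = ['abc', 'pq']
myTuple1 = ('abc', 'pq')

# Lists are mutable: an item can be replaced in place
myList1[0] = 'xyz'
print(myList1)            # ['xyz', 'pq']

# Tuples are immutable: item assignment raises TypeError
try:
    myTuple1[0] = 'xyz'
    tuple_changed = True
except TypeError as err:
    tuple_changed = False
    print("Cannot modify a tuple:", err)

print(myTuple1)           # ('abc', 'pq')
```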
Dictionary in Python
A dictionary in Python is a mapping from keys to values. Each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces. Keys are unique within a dictionary, while values need not be.
The values of a dictionary can be of any type, but the keys must be of an immutable (hashable) data type such as strings, numbers, or tuples.
Let’s see how to create a dictionary:
# Creating two Dictionaries
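A minimal sketch of creating two dictionaries and working with their keys. The keys and values are hypothetical examples, chosen to show string, number and tuple keys. It also shows that a list cannot be used as a key.

```python
# Two dictionaries with hypothetical example data
dict1 = {'Name': 'Asha', 'Age': 7, 'Class': 'First'}
dict2 = {1: 'one', (2, 3): 'tuple key'}   # numbers and tuples are valid (immutable) keys

# Access a value by its key
print(dict1['Name'])          # Asha

# Add a new key, then assign to an existing key (its old value is overwritten)
dict1['School'] = 'City School'
dict1['Age'] = 8
print(len(dict1))             # 4
print(dict2[(2, 3)])          # tuple key

# A list is mutable, so it cannot be a key
try:
    bad = {['a', 'b']: 1}
    list_key_ok = True
except TypeError as err:
    list_key_ok = False
    print("Invalid key:", err)
```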
Functions in Python
A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reuse.
Function blocks begin with the keyword def followed by the function name and parentheses ( ). Let’s create a function.
def demo(s):
    print(s)
    return
# Function call
demo("Function Called")
Output
Function Called
Basics of Python:
1) Write a program to read two numbers from user and
display the result using bitwise & , | and ^ operators on
the numbers.
Answer:
&: Bitwise AND operator
|: Bitwise OR operator
^: Bitwise XOR operator
Program:
a = int(input("Enter first number: "))
b = int(input("Enter second number: "))
print("Bitwise AND Operation of", a, "and", b, "=", a & b)
print("Bitwise OR Operation of", a, "and", b, "=", a | b)
print("Bitwise XOR Operation of", a, "and", b, "=", a ^ b)
Output:
Enter first number: 12
Enter second number: 25
Bitwise AND Operation of 12 and 25 = 8
Bitwise OR Operation of 12 and 25 = 29
Bitwise XOR Operation of 12 and 25 = 21
max_num = 20
# start from 1 and print every number up to max_num
# that is divisible by neither 2 nor 3
n = 1
while n <= max_num:
    if n % 2 != 0 and n % 3 != 0:
        print(n)
    n = n + 1
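test_list1 = [1, 3, 4, 5, 2, 6]  # initializing lists
test_list2 = [3, 4, 8, 3, 10, 1]
print("The original list 1 is :", test_list1)
print("The original list 2 is :", test_list2)
print("The maximum of both lists is :", max(test_list1 + test_list2))
print("The minimum of both lists is :", min(test_list1 + test_list2))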
Output:
The original list 1 is : [1, 3, 4, 5, 2, 6]
The original list 2 is : [3, 4, 8, 3, 10, 1]
The maximum of both lists is : 10
The minimum of both lists is : 1
Time Complexity: O(n)
Auxiliary Space: O(n)
Experiment 2:
1) Implement a Python program to load structured data into a
DataFrame and perform exploratory data analysis
1. Importing a dataset
3. Preparation
4. Understanding of variables
6. Brainstorming
Grouping:
import pandas as pd
# Import data
employee = pd.read_csv("Employees.csv")
# Group by department and count the rows in each group
dept_emp_num = employee.groupby('DEPT')['DEPT'].count()
print(dept_emp_num)
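Employees.csv is not included with this manual. This sketch therefore builds a small hypothetical DataFrame with the same DEPT column (the NAME and SALARY columns are also assumptions), so the grouping can be tried directly. It also shows agg() computing several statistics per group.

```python
import pandas as pd

# Hypothetical stand-in for Employees.csv
employee = pd.DataFrame({
    'NAME': ['Asha', 'Ravi', 'Meena', 'John', 'Priya'],
    'DEPT': ['HR', 'IT', 'IT', 'Sales', 'IT'],
    'SALARY': [40000, 55000, 60000, 45000, 52000],
})

# Number of employees in each department
dept_emp_num = employee.groupby('DEPT')['DEPT'].count()
print(dept_emp_num)

# Several aggregates per group in one call
dept_summary = employee.groupby('DEPT')['SALARY'].agg(['count', 'mean', 'max'])
print(dept_summary)
```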
Ordering and joining:
import pandas as pd
import numpy as np
class display(object):
    """Display HTML representation of multiple objects"""
    template = """<div style="float: left; padding: 10px;">
<p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
</div>"""
    def __init__(self, *args):
        self.args = args
    def _repr_html_(self):
        return '\n'.join(self.template.format(a, eval(a)._repr_html_())
                         for a in self.args)
    def __repr__(self):
        return '\n\n'.join(a + '\n' + repr(eval(a))
                           for a in self.args)
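The helper above only displays tables. This sketch shows the ordering and joining steps themselves. The two tables and their columns are hypothetical. pd.merge joins them on the shared employee column, and sort_values orders the result by hire date.

```python
import pandas as pd

# Hypothetical tables that share an 'employee' key
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# Join: rows are matched on the common 'employee' column
df3 = pd.merge(df1, df2, on='employee')

# Ordering: most recent hire first
df3_sorted = df3.sort_values('hire_date', ascending=False)
print(df3_sorted)
```

In Jupyter, once the display class above is defined, display('df1', 'df2', 'df3') renders the three tables side by side.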
Experiment 3:
Implement Python program to prepare plots such as
bar plot, histogram, distribution plot, box plot, scatter
plot.
Box plot:
# load packages
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# prepare some data
np.random.seed(42)
data1 = np.random.randn(100)
data2 = np.random.randn(100)
data3 = np.random.randn(100)
fig, ax = plt.subplots()
bp = ax.boxplot(x=[data1, data2, data3],  # sequence of arrays
                positions=[1, 5, 7],      # where to put these arrays
                patch_artist=True)        # allow filling the box with colors
Bar Plot:
fig, ax = plt.subplots()
ax.bar(x=[1, 4, 9],  # positions to put the bars at
       height=(data1.max(), data2.max(), data3.max()),  # height of each bar
       width=0.5,  # width of the bar
       edgecolor='black',  # edge color of the bar
       color=['green', 'red', 'orange'],  # fill color of the bar
       yerr=np.array([[0.1, 0.1, 0.1], [0.15, 0.15, 0.15]]),  # asymmetric error bars: row 0 = lower, row 1 = upper
       ecolor='red',  # color of the error bars
       capsize=5)  # length of the error-bar caps, in points
Histogram:
fig,ax = plt.subplots()
ax.hist(x=[data1,data2],bins=20,edgecolor='black')
Scatter Plot:
fig,ax = plt.subplots()
ax.scatter(x=[1,2,3],y=[1,2,3],s=[100,200,300],c=['r','g','b'])
Experiment 4:
1) Implement Simple Linear Regression algorithm in
Python
import matplotlib.pyplot as plt
from scipy import stats
x=[89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y=[21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
OUTPUT:
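import torch
import torch.nn as nn
import matplotlib.pyplot as plt
torch.manual_seed(42)
num_samples = 1000
x = torch.randn(num_samples, 2)
# true weights and bias of the linear model that generates the data
true_weights = torch.tensor([1.3, -1.0])  # hypothetical values: the original line was not preserved
true_bias = torch.tensor([-3.5])
# Target variable
y = x @ true_weights + true_bias
# plot the target against each input feature
fig, ax = plt.subplots(1, 2, sharey=True)
ax[0].scatter(x[:, 0], y)
ax[1].scatter(x[:, 1], y)
ax[0].set_xlabel('X1')
ax[0].set_ylabel('Y')
ax[1].set_xlabel('X2')
ax[1].set_ylabel('Y')
plt.show()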
Output:
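The slope and intercept that stats.linregress reports can also be computed directly from the least-squares formulas. This sketch reuses the x and y values from the scipy example. It uses only NumPy, and np.corrcoef supplies the correlation coefficient r.

```python
import numpy as np

# Same data as in the scipy example above
x = np.array([89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40], dtype=float)
y = np.array([21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15], dtype=float)

# Least-squares estimates:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   intercept = y_mean - slope * x_mean
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Pearson correlation coefficient r
r = np.corrcoef(x, y)[0, 1]

def predict(value):
    """Value on the fitted line for a given x."""
    return slope * value + intercept

print(slope, intercept, r)
```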
Experiment 5:
Implement Multiple linear regression algorithm using
Python.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

def generate_dataset(n):
    x = []
    y = []
    random_x1 = np.random.rand()
    random_x2 = np.random.rand()
    for i in range(n):
        x1 = i
        x2 = i/2 + np.random.rand()*n
        x.append([x1, x2])
        y.append(random_x1 * x1 + random_x2 * x2 + 1)
    return np.array(x), np.array(y)

x, y = generate_dataset(200)

mpl.rcParams['legend.fontsize'] = 12
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x[:, 0], x[:, 1], y, label='y', s=5)
ax.legend()
ax.view_init(45, 0)
plt.show()
Output:
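The listing above only generates and plots the data. This sketch fits a multiple linear regression to data of the same form, using ordinary least squares via np.linalg.lstsq. The seed and variable names are assumptions. Because y is generated without noise, the fitted coefficients should match the generating ones almost exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data in the spirit of generate_dataset(): y = a*x1 + b*x2 + 1
n = 200
a, b = rng.random(), rng.random()
x1 = np.arange(n, dtype=float)
x2 = x1 / 2 + rng.random(n) * n
y = a * x1 + b * x2 + 1

# Design matrix: a leading column of ones for the intercept, then the features
A = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: minimise ||A @ coef - y||^2
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
intercept, w1, w2 = coef
print(intercept, w1, w2)
```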
Experiment 6:
Implement a Python program to build logistic regression
and decision tree models using the statsmodels and
sklearn APIs.
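import numpy
from sklearn import linear_model
# X represents the size of a tumor. X has to be reshaped into a column (reshape(-1, 1))
# for the LogisticRegression() function to work.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
# y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
From the sklearn module we will use the LogisticRegression() method to create a logistic regression object.
This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship:
logr = linear_model.LogisticRegression()
logr.fit(X, y)
Now we have a logistic regression object that is ready to predict whether a tumor is cancerous based on the tumor size:
predicted = logr.predict(numpy.array([3.46]).reshape(-1, 1))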
Example
import numpy
from sklearn import linear_model
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X, y)
predicted = logr.predict(numpy.array([3.46]).reshape(-1, 1))
print(predicted)
Result
[0]
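To show roughly what fit() computes, here is a sketch of logistic regression trained by plain batch gradient descent on the same tumor data. Several choices here are assumptions: the feature standardization, the learning rate of 0.5 and the 5000 iterations. scikit-learn's LogisticRegression uses a different solver and L2 regularization by default, so its coefficients will not match these exactly.

```python
import numpy as np

# Tumor sizes and labels from the example above
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Standardize the feature so plain gradient descent converges quickly
mean, std = X.mean(), X.std()
Xs = (X - mean) / std

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Batch gradient descent on the average log-loss (no regularization)
w, b = 0.0, 0.0
learning_rate = 0.5
for _ in range(5000):
    p = sigmoid(w * Xs + b)                      # predicted probability of class 1
    w -= learning_rate * np.mean((p - y) * Xs)   # gradient of the loss w.r.t. w
    b -= learning_rate * np.mean(p - y)          # gradient of the loss w.r.t. b

def predict_proba(size):
    return sigmoid(w * (np.asarray(size, dtype=float) - mean) / std + b)

def predict(size):
    # class 1 when the estimated probability is at least 0.5
    return (predict_proba(size) >= 0.5).astype(int)

print(predict([3.46]), predict_proba([3.46]))
```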
Experiment 7:
Implement Python Program to perform the activities
such as
- splitting the data set into training and validation
datasets
INTRODUCTION
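Why do you need to split data?
A model has to be evaluated on data it did not see during training. Otherwise the measured performance is overly optimistic, because the model may simply have memorised the training examples.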
Train Dataset
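Set of data used for learning (by the model), that is, to fit the parameters of the machine learning model.
Valid Dataset
Set of data used to evaluate the model while it is being developed, for example to tune hyperparameters or to choose between models.
Test Dataset
Set of data used to give an unbiased estimate of the final model's performance, after training and tuning are finished.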
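A minimal sketch of a three-way split, written with NumPy only. The function name, the 70/15/15 proportions and the toy data are assumptions. The rows are shuffled once and then cut into disjoint train, validation and test parts. Calling scikit-learn's train_test_split twice achieves the same effect.

```python
import numpy as np

def train_valid_test_split(X, y, valid_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices once, then cut them into train/valid/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    n_valid = int(round(len(X) * valid_frac))
    test_idx = idx[:n_test]
    valid_idx = idx[n_test:n_test + n_valid]
    train_idx = idx[n_test + n_valid:]
    return (X[train_idx], y[train_idx],
            X[valid_idx], y[valid_idx],
            X[test_idx], y[test_idx])

# Hypothetical data: 100 samples with 4 features, 3 classes
X = np.arange(400).reshape(100, 4)
y = np.arange(100) % 3
Xtr, ytr, Xva, yva, Xte, yte = train_valid_test_split(X, y)
print(len(Xtr), len(Xva), len(Xte))   # 70 15 15
```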
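import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
dataset = load_iris(as_frame=True).frame  # assumption: scikit-learn's iris data stands in for the lab's CSV file
X = dataset.iloc[:, :-1]   # all feature columns
y = dataset.iloc[:, -1]    # last column holds the class label
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)  # hold out 10% for testing
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)
i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if (label == ypred[i]):
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")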
Output:
-------------------------------------------------------------------------
-------------------------------------------------------------------------
[[4 0 0]
 [0 4 0]
 [0 2 5]]
-------------------------------------------------------------------------
Classification Report:
-------------------------------------------------------------------------
Experiment 9:
Implement the Support Vector Machine algorithm on any
data set
Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms. It is used for classification as well as regression problems, but primarily for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily
put the new data point in the correct category in the future. This best
decision boundary is called a hyperplane.
Example: SVM can be understood with the example used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train the model with many images of cats and dogs so that it learns their different features, and then we test it with this strange creature. SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors), so it looks at the extreme cases of cat and dog. On the basis of the support vectors, it classifies the creature as a cat. Consider the diagram below:
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Types of SVM
SVM can be of two types:
- Linear SVM: used for linearly separable data. If a dataset can be classified into two classes by a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
- Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified by a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane of the SVM.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. They are called support vectors because they support (determine) the hyperplane.
Hence, the SVM algorithm finds the best line or decision boundary, called the hyperplane. The algorithm finds the points of both classes that lie closest to the boundary; these points are the support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
The hyperplane with the maximum margin is called the optimal hyperplane.
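To make the terms above concrete, this sketch checks a given hyperplane on six hypothetical 2-D points. It does not train an SVM: the hyperplane x1 + x2 = 2.5 is hand-picked as the perpendicular bisector between the closest points of the two classes, which is the maximum-margin hyperplane for this toy data. The weights are scaled so that the support vectors satisfy y(w·x + b) = 1, which makes the margin 2/||w||.

```python
import numpy as np

# Six hypothetical points: class +1 top-right, class -1 near the origin
X = np.array([[2, 2], [3, 3], [2, 3], [0, 0], [1, 0], [0, 1]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

# Hyperplane w.x + b = 0 (here x1 + x2 = 2.5), scaled so the closest points give y*(w.x + b) = 1
w = np.array([2 / 3, 2 / 3])
b = -5 / 3

functional = y * (X @ w + b)                         # >= 1 for every correctly separated point
distance = np.abs(X @ w + b) / np.linalg.norm(w)     # geometric distance to the hyperplane

support_vectors = X[np.isclose(functional, 1.0)]     # the points that lie exactly on the margin
margin = 2 / np.linalg.norm(w)                       # width of the band between the two classes

print(support_vectors)
print(round(margin, 4))                              # 2.1213
```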
Non-Linear SVM:
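For linear data, two dimensions x and y are enough, but non-linear data cannot be separated by a single straight line. To separate such data points, we add a third dimension z, calculated as:
$z = x^2 + y^2$
By adding the third dimension, the sample space becomes as shown in the image below:
So now, SVM will divide the datasets into classes in the following way. Consider the image below:
Since we are in 3-D space, the boundary looks like a plane parallel to the x-axis. If we convert it to 2-D space with z = 1, it becomes: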
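A small numeric sketch of the feature mapping just described. The ring-shaped data set is hypothetical: class 0 lies inside radius 0.5 and class 1 lies between radius 1.5 and 2. No straight line in the (x, y) plane separates them. After adding z = x^2 + y^2, the plane z = 1 does, and back in 2-D that plane is the circle of radius 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a disc (class 0) surrounded by a ring (class 1)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0, 0.5, 100), rng.uniform(1.5, 2.0, 100)])
x = radii * np.cos(angles)
y = radii * np.sin(angles)
labels = np.array([0] * 100 + [1] * 100)

# Extra dimension: the squared distance from the origin
z = x ** 2 + y ** 2

# In (x, y, z) space the plane z = 1 separates the classes;
# projected back to (x, y) it is the circle x^2 + y^2 = 1
predicted = (z > 1.0).astype(int)
print((predicted == labels).mean())   # 1.0
```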
Experiment 10:
Write a program to implement the naive Bayesian
classifier for a sample training data set stored as a .csv
file. Compute the accuracy of the classifier, considering
a few test data sets.
# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
Output:
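The listing above stops after feature scaling. This sketch shows the classification and accuracy steps with a small Gaussian naive Bayes written in NumPy. It is a stand-in for scikit-learn's GaussianNB, not the lab's own code. The class name SimpleGaussianNB and the synthetic two-feature data are hypothetical; the data plays the role of the two columns selected by iloc[:, [2, 3]].

```python
import numpy as np

class SimpleGaussianNB:
    """Minimal Gaussian naive Bayes: one normal distribution per class and feature."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # a tiny constant avoids division by zero for constant features
        self.vars = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        return self

    def predict(self, X):
        # log P(c) + sum_j log N(x_j | mean_cj, var_cj), evaluated for every class
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars)[None, :, :]
                          + (X[:, None, :] - self.means[None, :, :]) ** 2
                          / self.vars[None, :, :]).sum(axis=2)
        return self.classes[np.argmax(np.log(self.priors) + log_lik, axis=1)]

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)

# Hypothetical two-feature data with two classes
rng = np.random.default_rng(0)
X0 = rng.normal([30, 40], [5, 10], size=(100, 2))
X1 = rng.normal([50, 90], [5, 10], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# 75/25 split, then train and measure accuracy on the held-out part
idx = rng.permutation(200)
train, test = idx[:150], idx[150:]
model = SimpleGaussianNB().fit(X[train], y[train])
acc = accuracy(y[test], model.predict(X[test]))
print("Accuracy:", acc)
```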
Experiment 11:
Write a Python program to construct a Bayesian network
considering medical data. Use this model to demonstrate
the diagnosis of heart patients using standard Heart
Disease Data Set.
from pomegranate import *
})
tuberculosis = ConditionalProbabilityTable(
lung = ConditionalProbabilityTable(
['True', 'False', 0.25],
bronchitis = ConditionalProbabilityTable(
[['True', 'True', 0.92],
['True', 'False', 0.08],
['False', 'True', 0.03],
tuberculosis_or_cancer =
'True', 1.0],
dyspnea = ConditionalProbabilityTable(
s1 = State(tuberculosis, name="
smoker")
network = BayesianNetwork("asia")
network.add_nodes(s0, s1, s2)
network.add_edge(s0, s1)
network.add_edge(s1, s2)
network.bake()
print(network.predict_proba({'tuberculosis': 'True'}))
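Recent pomegranate releases (1.0 and later) were rewritten on top of PyTorch and, as far as we know, no longer provide the State/bake() interface used above. This sketch therefore shows the same idea, diagnosis by conditional probability, with exact inference by enumeration in plain Python. The three-node network Smoking -> HeartDisease -> ChestPain and all its probabilities are hypothetical illustration values, not estimates from the Heart Disease data set.

```python
from itertools import product

# Hypothetical network: Smoking -> HeartDisease -> ChestPain (made-up probabilities)
p_smoking = {True: 0.3, False: 0.7}
p_heart_given_smoking = {True: 0.2, False: 0.05}   # P(HeartDisease=True | Smoking)
p_pain_given_heart = {True: 0.8, False: 0.1}       # P(ChestPain=True | HeartDisease)

def joint(s, h, c):
    """P(Smoking=s, HeartDisease=h, ChestPain=c) via the chain-rule factorisation."""
    ph = p_heart_given_smoking[s] if h else 1 - p_heart_given_smoking[s]
    pc = p_pain_given_heart[h] if c else 1 - p_pain_given_heart[h]
    return p_smoking[s] * ph * pc

def query_heart(evidence):
    """P(HeartDisease=True | evidence), summing the joint over unobserved variables."""
    totals = {True: 0.0, False: 0.0}
    for s, h, c in product([True, False], repeat=3):
        assignment = {'smoking': s, 'heart': h, 'pain': c}
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[h] += joint(s, h, c)
    return totals[True] / (totals[True] + totals[False])

print(round(query_heart({'pain': True}), 4))                    # 0.4565
print(round(query_heart({'pain': True, 'smoking': True}), 4))
```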
Experiment 12:
Assuming a set of documents that need to be classified,
use the naive Bayesian Classifier model to perform this
task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision and recall for
your data set.
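import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
# turn each document into a vector of word counts (bag of words)
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
print(count_vect.get_feature_names_out())
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())
print(df)  # tabular representation
# train a multinomial naive Bayes classifier on the word counts
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))
# hypothetical new documents to classify
docs_new = ['I love this place', 'I am tired of this work']
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predictednew):
    print(doc, '->', 'pos' if category == 1 else 'neg')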
naivetext1.csv (excerpt):
sandwich,pos
restaurant,neg
I am tired of this stuff,neg
He is my sworn enemy,neg
My boss is horrible,neg
holiday,pos
house today,neg
OUTPUT
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal', ...]
about am amazing an and awesome beers best boss can ... today \
0 1 0 0 0 0 0 1 0 0 0 ... 0
1 0 0 0 0 0 0 0 1 0 0 ... 0
2 0 0 1 1 0 0 0 0 0 0 ... 0
3 0 0 0 0 0 0 0 0 0 0 ... 1
4 0 0 0 0 0 0 0 0 0 0 ... 0
5 0 1 0 0 1 0 0 0 0 0 ... 0
6 0 0 0 0 0 0 0 0 0 1 ... 0
7 0 0 0 0 0 0 0 0 0 0 ... 0
8 0 1 0 0 0 0 0 0 0 0 ... 0
9 0 0 0 1 0 1 0 0 0 0 ... 0
10 0 0 0 0 0 0 0 0 0 0 ... 0
11 0 0 0 0 0 0 0 0 1 0 ... 0
12 0 0 0 1 0 1 0 0 0 0 ... 0
    tomorrow  very  view  we  went  work
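The accuracy, precision and recall printed by the program can be checked by hand from the confusion-matrix counts. This sketch computes them in plain Python for binary labels (1 = pos, 0 = neg). The six test labels and predictions are hypothetical. For binary 0/1 labels these quantities should agree with scikit-learn's accuracy_score, precision_score and recall_score at their default settings.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall from true/false positive/negative counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    return accuracy, precision, recall

# Hypothetical labels for six test documents (1 = pos, 0 = neg)
ytest = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
acc, prec, rec = binary_metrics(ytest, predicted)
print(acc, prec, rec)
```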
Experiment 13:
Implement PCA on any Image dataset for dimensionality
reduction and classification of images into different
classes
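import pandas as pd
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
images = load_digits().data  # assumption: scikit-learn's 8x8 digit images serve as the image dataset
scaled_data_frame = pd.DataFrame(StandardScaler().fit_transform(images))
pca = PCA(n_components=2).fit(scaled_data_frame)
x_pca = pca.transform(scaled_data_frame)
print(x_pca.shape)
print(scaled_data_frame.shape)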
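What PCA.transform does can be sketched with a singular value decomposition in NumPy. The function name and the synthetic "images" are hypothetical: 200 flattened 8x8 images built from two hidden patterns plus a little noise. The lab code also standardizes each pixel first; this sketch only centers the data, which is the step PCA itself requires.

```python
import numpy as np

def pca_fit_transform(data, n_components):
    """Project rows of `data` onto its first principal components (via SVD)."""
    centered = data - data.mean(axis=0)                 # PCA needs zero-mean columns
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]                      # principal directions, one per row
    explained_variance = S ** 2 / (len(data) - 1)       # variance along each direction
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return centered @ components.T, components, ratio

# Hypothetical "images": 200 flattened 8x8 images (64 pixels) from 2 patterns plus noise
rng = np.random.default_rng(0)
patterns = rng.normal(size=(2, 64))
weights = rng.normal(size=(200, 2))
images = weights @ patterns + 0.01 * rng.normal(size=(200, 64))

x_pca, components, ratio = pca_fit_transform(images, n_components=2)
print(x_pca.shape)    # (200, 2)
print(ratio.sum())    # close to 1: two components capture almost all the variance
```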
Experiment 14:
Implement the non-parametric Locally Weighted
Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw
graphs.
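import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def kernel(point, xmat, k):
    # Gaussian weight for every training point, based on its distance to `point`
    m = xmat.shape[0]
    weights = np.eye(m)
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff @ diff / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    # solve the weighted normal equations (X^T W X) beta = X^T W y;
    # pinv keeps this stable when only a few points carry weight
    wei = kernel(point, xmat, k)
    return np.linalg.pinv(xmat.T @ wei @ xmat) @ (xmat.T @ wei @ ymat)

def localWeightRegression(xmat, ymat, k):
    m = xmat.shape[0]
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] @ localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv('data10.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)
# add a column of ones (intercept term) to the bill values
X = np.column_stack((np.ones(len(bill)), bill))
# set k (the bandwidth) here
ypred = localWeightRegression(X, tip, 2)
SortIndex = X[:, 1].argsort()
xsort = X[SortIndex]
# plot the data and the locally weighted fit
fig, ax = plt.subplots()
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
ax.set_xlabel('Total bill')
ax.set_ylabel('Tip')
plt.show()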
Output:
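data10.csv is not included here. This self-contained sketch therefore applies the same locally weighted linear regression to hypothetical noise-free data: one period of a sine wave. The function name lwr_predict and the bandwidth tau = 0.2 are assumptions. Each prediction fits its own weighted straight line, so the fit follows the curve, which a single global line cannot do.

```python
import numpy as np

def lwr_predict(x_query, x, y, tau):
    """Locally weighted linear regression prediction at one query point."""
    X = np.column_stack([np.ones_like(x), x])           # intercept column + feature
    w = np.exp(-(x - x_query) ** 2 / (2 * tau ** 2))    # Gaussian weights: nearby points count most
    XtW = X.T * w                                       # same as X.T @ diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)            # weighted normal equations
    return beta[0] + beta[1] * x_query

# Hypothetical noise-free data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fitted = np.array([lwr_predict(q, x, y, tau=0.2) for q in x])
print(np.max(np.abs(fitted - y)))   # small: the local lines follow the curve
```

A smaller tau follows the data more closely but becomes sensitive to noise; a larger tau approaches ordinary linear regression.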