ML Lab (2MCA)

Experiment 1:

Installation of Python and its packages (Pandas, NumPy, SciPy, Matplotlib and scikit-learn) (Install Anaconda, Jupyter Notebook; programs covering basic concepts in Python Programming)

Python comes loaded with powerful packages that make machine learning tasks easier. This is why it is the language of choice among data scientists. Of the vast collection of libraries that you can choose from, there is a set of basic libraries you should be familiar with as a beginner.

In this tutorial, we are going to install these basic libraries on our system using Python's package manager, pip.

NumPy:

NumPy (short for Numerical Python) provides useful features for operations on n-dimensional arrays and matrices in Python. It provides vectorization of mathematical operations on the NumPy array type.

Installation:
1. In the terminal type the command:
pip install numpy

Note: You may be asked to enter your password.

2. Installation will take only a few seconds.

NumPy is now installed on your system.

Testing:

1. In the terminal, start Python by typing the command:

python

2. Use the following error-handling block:

try:
    import numpy
except ImportError:
    print("numpy is not installed")

3. If NumPy is installed successfully, you will not get any message in the terminal. Otherwise, you will get the error message "numpy is not installed".

Troubleshooting:

If you get the error message, try this command:

pip install -U numpy
SciPy:

SciPy contains modules for linear algebra, optimization, integration, and statistics. It is built upon NumPy and provides efficient numerical routines, such as numerical integration and optimization, via specific submodules.

Installation:

1. In the terminal type the command:

pip install scipy

Note: You will be asked to enter your…


How to Install or Download Python Pandas
Pandas can be installed in multiple ways on Windows, Linux and macOS.
The various ways are listed below:
Install Pandas on Windows
Python Pandas can be installed on Windows in two ways:
 Using pip
 Using Anaconda
Install Pandas using pip
pip is a package management system used to install and manage software
packages/libraries written in Python. These packages are stored in a large online
repository termed the Python Package Index (PyPI).
Install Python Pandas using Command Prompt
Pandas can be installed using PIP by use of the following command in
Command Prompt.
pip install pandas

Install Pandas using Anaconda

Anaconda is open-source software that contains Jupyter, Spyder, and other tools
used for large-scale data processing, data analytics, and heavy scientific computing.
If your system is not pre-equipped with Anaconda Navigator, you can learn how
to install Anaconda Navigator on Windows or Linux.
Install and Run Pandas from Anaconda Navigator
Step 1: Search for Anaconda Navigator in Start Menu and open it.
Step 2: Click on the Environment tab and then click on the Create button to
create a new Pandas Environment.
Step 3: Give a name to your Environment, e.g. Pandas, and then choose a
Python and its version to run in the environment. Now click on
the Create button to create Pandas Environment.
Step 4: Now click on the Pandas Environment created to activate it.
Step 5: In the list above the package names, select All to filter all the packages.
Step 6: Now in the Search Bar, look for ‘Pandas‘. Select the Pandas
package for Installation.
Step 7: Now Right Click on the checkbox given before the name of the package
and then go to ‘Mark for specific version installation‘. Now select the version
that you want to install.
Step 8: Click on the Apply button to install the Pandas Package.
Step 9: Finish the Installation process by clicking on the Apply button.
Step 10: Now to open the Pandas Environment, click on the Green Arrow on
the right of the package name and select the Console with which you want to
begin your Pandas programming.
Pandas Terminal Window (screenshot omitted)

How to install matplotlib in Python?

Matplotlib is a Python library that helps to plot graphs. It is used in data visualization
and graphical plotting.
To use matplotlib, we need to install it.

Step 1 − Make sure Python and pip are preinstalled on your system

Type the following commands in the command prompt to check whether Python and pip are
installed on your system.
To check Python
python --version
If python is successfully installed, the version of python installed on your system will be
displayed.

To check pip
pip -V
The version of pip will be displayed, if it is successfully installed on your system.

Step 2 − Install Matplotlib


Matplotlib can be installed using pip. The following command is run in the command
prompt to install Matplotlib.
pip install matplotlib
This command will start downloading and installing packages related to the matplotlib
library. Once done, the message of successful installation will be displayed.

Step 3 − Check if it is installed successfully

To verify that Matplotlib is successfully installed on your system, execute the following
statements in a Python interpreter. If Matplotlib is successfully installed, its version
will be displayed.
import matplotlib
matplotlib.__version__

Installing scikit-learn
There are different ways to install scikit-learn:

 Install the latest official release. This is the best approach for most
users. It will provide a stable version and pre-built packages are
available for most platforms.
 Install the version of scikit-learn provided by your operating system or
Python distribution. This is a quick option for those who have operating
systems or Python distributions that distribute scikit-learn. It might not
provide the latest release version.
 Building the package from source. This is best for users who want the
latest-and-greatest features and aren’t afraid of running brand-new
code. This is also needed for users who wish to contribute to the
project.

Installing the latest release


Operating system: Windows, macOS or Linux. Packager: pip (conda is the alternative for Anaconda users).

Install the 64bit version of Python 3, for instance from https://www.python.org.

Then run:

pip install -U scikit-learn


In order to check your installation you can use:

python -m pip show scikit-learn  # to see which version and where scikit-learn is installed
python -m pip freeze             # to see all packages installed in the active virtualenv
python -c "import sklearn; sklearn.show_versions()"

How to Install Python Packages on Jupyter Notebook
Python packages are modules that contain code and functions that can be
used in Python programs. There are thousands of Python packages available,
and you may need to install some of them to complete your data analysis
tasks on Jupyter Notebook.

Here’s how to install Python packages on Jupyter Notebook:

1. Open Jupyter Notebook on your computer.


2. Create a new notebook or open an existing one.
3. In a code cell, type !pip install <package_name> and run the cell.
Replace <package_name> with the name of the package you want to
install.

For example, if you want to install the NumPy package, you would type !pip
install numpy.

4. Wait for the installation to complete. You should see a message similar
to the following:

Successfully installed numpy-1.20.3


This means that the package has been installed successfully.
5. To use the package in your code, import it using import <package_name>.
For example, to import NumPy, you would type import numpy.

What are the basic concepts of Python?



Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language.

Features of Python
Following are key features of Python −
 Python supports functional and structured programming methods as well as
OOP.
 It can be used as a scripting language or can be compiled to byte-code for
building large applications.
 It provides very high-level dynamic data types and supports dynamic type
checking.
 It supports automatic garbage collection.
Variables in Python
Variables are nothing but reserved memory locations to store values. This means that
when you create a variable you reserve some space in memory. Let’s create a variable.
a = 10
Above, a is a variable assigned integer value 10.

Numeric Datatypes in Python

Number data types store numeric values. They are immutable data types, meaning that
changing the value of a number data type results in a newly allocated object.
Python supports these numerical types.
 int (signed integers) − Often called just integers or ints; positive or
negative whole numbers with no decimal point. In Python 3, ints are of unlimited size.
 long (long integers) − Python 2 only: integers of unlimited size, written like
integers and followed by an uppercase or lowercase L. In Python 3 this type was merged into int.
 float (floating point real values) − Also called floats, they represent real
numbers and are written with a decimal point dividing the integer and fractional
parts. Floats may also be in scientific notation, with E or e indicating the power of
10 (2.5e2 = 2.5 × 10^2 = 250).
 complex (complex numbers) − Of the form a + bj, where a and b are floats
and j represents the square root of −1 (the imaginary unit). The
real part of the number is a, and the imaginary part is b. Complex numbers are
not used much in Python programming.
Strings in Python
Strings are among the most popular types in Python. We can create them simply by
enclosing characters in quotes. Python treats single quotes the same as double quotes.
Creating a string is as simple as assigning a value to a variable.
Let's see how to easily create a string in Python.
myStr = 'Thisisit!'

Lists in Python
The list is the most versatile datatype available in Python; it can be written as a list of
comma-separated values (items) between square brackets. Let's see how to create lists
with different types.
myList1 = ['abc', 'pq']

myList2 = [5, 10, 15, 20]

Tuples in Python
Tuples are sequences, just like lists. The differences between tuples and lists are that
tuples cannot be changed, unlike lists, and tuples use parentheses, whereas lists use
square brackets.
Creating a tuple is as simple as putting different comma-separated values. Optionally
you can put these comma-separated values between parentheses. Let's see how
to create a tuple.
myTuple1 = ('abc', 'pq')

myTuple2 = (5, 10, 15, 20)

Dictionary in Python
A dictionary is a mapping type in Python. In a dictionary, each key is separated from its
value by a colon (:), the items are separated by commas, and the whole thing is
enclosed in curly braces. Keys are unique within a dictionary while values may not be.
The values of a dictionary can be of any type, but the keys must be of an immutable
data type such as strings, numbers, or tuples.
Let's see how to create a dictionary −
# Creating two dictionaries
dict1 = {'Player': ['Jacob', 'Steve', 'David', 'John', 'Kane'],
         'Age': [29, 25, 31, 26, 27]}

dict2 = {'Rank': [1, 2, 3, 4, 5], 'Points': [100, 87, 80, 70, 50]}

Classes & Objects in Python


A class is a user-defined prototype for an object that defines a set of attributes that
characterize any object of the class. The attributes are data members and methods,
accessed via dot notation.
An object is a unique instance of a data structure that's defined by its class. An object
comprises both data members (class variables and instance variables) and methods.
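For illustration, a minimal class with data members and a method accessed via dot notation (the names Employee, name and salary are hypothetical):

class Employee:
    def __init__(self, name, salary):
        self.name = name        # instance variable (data member)
        self.salary = salary    # instance variable (data member)

    def display(self):          # method, accessed via dot notation
        print(self.name, self.salary)

# An object is a unique instance of the class
e1 = Employee("Amanda", 54000)
e1.display()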

Functions in Python
A function is a block of organized, reusable code that is used to perform a single, related
action. Functions provide better modularity for your application and a high degree of
code reuse.
Function blocks begin with the keyword def followed by the function name and
parentheses (()). Let's create a function.
def demo(s):
    print(s)
    return

# Function call
demo("Function Called")

Output
Function Called

Basics of Python:
1) Write a program to read two numbers from user and
display the result using bitwise & , | and ^ operators on
the numbers.
Answer:
&: bitwise AND operator, |: bitwise OR operator, ^: bitwise XOR operator
Program:
a = int(input("Enter first number: "))
b = int(input("Enter second number: "))
print("Bitwise AND Operation of", a, "and", b, "=", a & b)
print("Bitwise OR Operation of", a, "and", b, "=", a | b)
print("Bitwise XOR Operation of", a, "and", b, "=", a ^ b)
Output 1:
Enter first number: 12
Enter second number: 25
Bitwise AND Operation of 12 and 25 = 8
Bitwise OR Operation of 12 and 25 = 29
Bitwise XOR Operation of 12 and 25 = 21
Output 2:
Enter first number: 7
Enter second number: 9
Bitwise AND Operation of 7 and 9 = 1
Bitwise OR Operation of 7 and 9 = 15
Bitwise XOR Operation of 7 and 9 = 14

2) Write a program to calculate the sum of numbers from 1 to 20 which are not divisible by 2, 3 or 5.

We take an upper limit, check which integers from 1 up to that limit are not divisible by 2, 3 or 5 (by checking the remainder of the integer with each divisor), print them, and accumulate their sum.
Example:
Input: 10
Output: Numbers not divisible by 2, 3 or 5
1
7
Sum = 8
Method: We check that the number is not divisible by 2, 3 or 5 using the and clause, then print the number and add it to the running sum.

# input the maximum number up to which you want to check
max_num = 20

# starting from 1
n = 1
total = 0

print("Numbers not divisible by 2, 3 or 5")

# run until n reaches the maximum number
while n <= max_num:
    # check that n is not divisible by 2, 3 or 5
    if n % 2 != 0 and n % 3 != 0 and n % 5 != 0:
        print(n)
        total += n
    # incrementing the counter
    n = n + 1

print("Sum =", total)  # prints 68 for max_num = 20

3) Write a program to find the maximum of two numbers using functions. Implement slicing operation on strings and lists.

# Python3 code to demonstrate
# maximum and minimum values in two lists
# using max() + min() + "+" operator

# initializing lists
test_list1 = [1, 3, 4, 5, 2, 6]
test_list2 = [3, 4, 8, 3, 10, 1]

# printing the original lists
print("The original list 1 is : " + str(test_list1))
print("The original list 2 is : " + str(test_list2))

# maximum and minimum values across both lists
max_all = max(test_list1 + test_list2)
min_all = min(test_list1 + test_list2)

# printing result
print("The maximum of both lists is : " + str(max_all))
print("The minimum of both lists is : " + str(min_all))

Output:
The original list 1 is : [1, 3, 4, 5, 2, 6]
The original list 2 is : [3, 4, 8, 3, 10, 1]
The maximum of both lists is : 10
The minimum of both lists is : 1
Time Complexity: O(n)
Auxiliary Space: O(n)
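The listing above finds the extremes across two lists; the exercise as stated also asks for the maximum of two numbers via a function and slicing on strings and lists. A minimal sketch of both (the values are illustrative):

# maximum of two numbers using a function
def maximum(a, b):
    return a if a > b else b

print(maximum(12, 25))   # 25

# slicing on a string
s = "Hello, World!"
print(s[0:5])            # 'Hello'
print(s[-6:])            # 'World!'

# slicing on a list
nums = [5, 10, 15, 20, 25, 30]
print(nums[1:4])         # [10, 15, 20]
print(nums[::2])         # [5, 15, 25]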

Experiment 2:
1) Implement python program to load structured data onto
DataFrame and perform exploratory data analysis

Exploratory data analysis (EDA) is an especially important activity in the routine of a data analyst or scientist.

It enables an in-depth understanding of the dataset, lets us define or discard hypotheses, and lets us create predictive models on a solid basis.

It uses data manipulation techniques and several statistical tools to describe and understand the relationships between variables and how these can impact business.

In fact, it's thanks to EDA that we can ask ourselves meaningful questions that can impact business.

In this article, I will share with you a template for exploratory analysis that I have used over the years and that has proven solid for many projects and domains. This is implemented through the use of the Pandas library — an essential tool for any analyst working with Python.

The process consists of several steps:

1. Importing a dataset

2. Understanding the big picture

3. Preparation

4. Understanding of variables

5. Study of the relationships between variables

6. Brainstorming

This template is the result of many iterations and allows me to ask myself meaningful questions about the data in front of me. At the end of the process, we will be able to consolidate a business report or continue with the data modeling phase.

The brainstorming phase is connected with that of understanding the variables, and this in turn is connected back to the brainstorming phase: the process loops, letting us ask new questions until we are satisfied.
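A minimal sketch of the first steps of this template with Pandas (the file name dataset.csv is a placeholder):

import pandas as pd

# 1. Importing a dataset
df = pd.read_csv("dataset.csv")

# 2. Understanding the big picture
print(df.shape)
print(df.info())
print(df.head())

# 3. Preparation: drop duplicates, inspect missing values
df = df.drop_duplicates()
print(df.isnull().sum())

# 4. Understanding of variables
print(df.describe(include="all"))

# 5. Study of the relationships between (numeric) variables
print(df.select_dtypes("number").corr())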
2) Implement python program for data preparation activities
such as filtering, grouping, ordering and joining of datasets.
Filtering:
import pandas as pd
# information about employees
id_number = ['128', '478', '257', '299', '175', '328', '099', '457',
'144', '222']
name = ['Patrick', 'Amanda', 'Antonella', 'Eduard', 'John',
'Alejandra', 'Layton', 'Melanie', 'David', 'Lewis']
surname = ['Miller', 'Torres', 'Brown', 'Iglesias', 'Wright', 'Campos',
'Platt', 'Cavill', 'Lange', 'Bellow']
division = ['Sales', 'IT', 'IT', 'Sales', 'Marketing', 'Engineering',
'Engineering', 'Sales', 'Engineering', 'Sales']
salary = [30000, 54000, 80000, 79000, 15000, 18000, 30000, 35000,
45000, 30500]
telephone = ['7366578', '7366444', '7366120', '7366574', '7366113',
'7366117', '7366777', '7366579', '7366441', '7366440']
type_contract = ['permanent', 'temporary', 'temporary', 'permanent',
'internship', 'internship', 'permanent', 'temporary', 'permanent',
'permanent']
# data frame containing information about employees
df_employees = pd.DataFrame({'name': name, 'surname': surname,
'division': division,
'salary': salary, 'telephone': telephone,
'type_contract': type_contract}, index=id_number)
df_employees
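The frame built above can then be filtered with boolean indexing; a minimal sketch using the columns defined above:

# employees in the Sales division
sales_staff = df_employees[df_employees['division'] == 'Sales']

# permanent employees earning more than 30000
well_paid = df_employees[(df_employees['salary'] > 30000) &
                         (df_employees['type_contract'] == 'permanent')]
print(sales_staff)
print(well_paid)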

Grouping:
import pandas as pd

# import data
employee = pd.read_csv("Employees.csv")

# group by department and count the employees in each group
dept_emp_num = employee.groupby('DEPT')['DEPT'].count()
print(dept_emp_num)
Ordering and joining:
import pandas as pd
import numpy as np

class display(object):
    """Display HTML representation of multiple objects"""
    template = """<div style="float: left; padding: 10px;">
    <p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
    </div>"""

    def __init__(self, *args):
        self.args = args

    def _repr_html_(self):
        return '\n'.join(self.template.format(a, eval(a)._repr_html_())
                         for a in self.args)

    def __repr__(self):
        return '\n\n'.join(a + '\n' + repr(eval(a))
                           for a in self.args)
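The display helper above only renders frames side by side; the ordering and joining themselves can be done with sort_values and merge. A minimal sketch reusing df_employees from the filtering example (df_locations is hypothetical):

# ordering: sort employees by salary, highest first
ordered = df_employees.sort_values(by='salary', ascending=False)

# joining: merge with a second frame on the shared division column
df_locations = pd.DataFrame({'division': ['Sales', 'IT', 'Marketing', 'Engineering'],
                             'building': ['A', 'B', 'A', 'C']})
joined = df_employees.merge(df_locations, on='division', how='left')

print(ordered.head())
print(joined.head())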

Experiment 3:
Implement Python program to prepare plots such as
bar plot, histogram, distribution plot, box plot, scatter
plot.
Box plot:
# load packages
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# prepare some data
np.random.seed(42)
data1 = np.random.randn(100)
data2 = np.random.randn(100)
data3 = np.random.randn(100)

fig, ax = plt.subplots()
bp = ax.boxplot(x=[data1, data2, data3],  # sequence of arrays
                positions=[1, 5, 7],      # where to put these arrays
                patch_artist=True)        # allow filling the box with colors
Bar Plot:
fig, ax = plt.subplots()
ax.bar(x=[1, 4, 9],                                       # positions to put the bars at
       height=(data1.max(), data2.max(), data3.max()),    # height of each bar
       width=0.5,                                         # width of the bars
       edgecolor='black',                                 # edge color of the bars
       color=['green', 'red', 'orange'],                  # fill color of the bars
       yerr=np.array([[0.1, 0.1, 0.1], [0.15, 0.15, 0.15]]),  # asymmetric error bars
       ecolor='red',                                      # color of the error bars
       capsize=5)                                         # cap size of the error bars
Histogram:
fig,ax = plt.subplots()
ax.hist(x=[data1,data2],bins=20,edgecolor='black')
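The experiment also lists a distribution plot; a minimal sketch on the same data using seaborn (an assumption — the section otherwise uses bare matplotlib, and seaborn must be installed):

Distribution Plot:
import seaborn as sns
fig, ax = plt.subplots()
sns.histplot(data1, kde=True, ax=ax)  # histogram with an overlaid density estimate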

Scatter Plot:
fig,ax = plt.subplots()
ax.scatter(x=[1,2,3],y=[1,2,3],s=[100,200,300],c=['r','g','b'])
Experiment 4:
1)Implement Simple Linear regression algorithm in
Python
import matplotlib.pyplot as plt
from scipy import stats
x=[89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y=[21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
OUTPUT: a scatter plot of the data points with the fitted regression line.

2)Implement Gradient Descent algorithm for the


above linear regression model
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

Set the input and output data:

# set random seed for reproducibility
torch.manual_seed(42)

# set number of samples
num_samples = 1000

# create random features with 2 dimensions
x = torch.randn(num_samples, 2)

# create random weights and bias for the linear regression model
true_weights = torch.tensor([1.3, -1])
true_bias = torch.tensor([-3.5])

# Target variable
y = x @ true_weights + true_bias

# Plot the dataset

fig, ax = plt.subplots(1, 2, sharey=True)

ax[0].scatter(x[:,0],y)

ax[1].scatter(x[:,1],y)

ax[0].set_xlabel('X1')

ax[0].set_ylabel('Y')

ax[1].set_xlabel('X2')

ax[1].set_ylabel('Y')

plt.show()

Output: scatter plots of y against each of the two features.
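The listing above only generates and plots the data; a minimal gradient descent loop for the regression weights, continuing from the tensors x and y defined above (the learning rate and epoch count are illustrative):

# initialize learnable parameters
weights = torch.randn(2, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)

learning_rate = 0.01
num_epochs = 1000

for epoch in range(num_epochs):
    # forward pass: predictions and mean squared error loss
    y_pred = x @ weights + bias
    loss = torch.mean((y_pred - y) ** 2)

    # backward pass: compute gradients of the loss
    loss.backward()

    # gradient descent step (no autograd tracking while updating)
    with torch.no_grad():
        weights -= learning_rate * weights.grad
        bias -= learning_rate * bias.grad
        weights.grad.zero_()
        bias.grad.zero_()

print(weights, bias)  # should approach true_weights and true_bias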
Experiment 5:
Implement Multiple linear regression algorithm using
Python.

import numpy as np

import matplotlib as mpl

from mpl_toolkits.mplot3d import Axes3D

import matplotlib.pyplot as plt

def generate_dataset(n):
    x = []
    y = []

    random_x1 = np.random.rand()
    random_x2 = np.random.rand()

    for i in range(n):
        x1 = i
        x2 = i / 2 + np.random.rand() * n
        x.append([1, x1, x2])
        y.append(random_x1 * x1 + random_x2 * x2 + 1)

    return np.array(x), np.array(y)

x, y = generate_dataset(200)

mpl.rcParams['legend.fontsize'] = 12

fig = plt.figure()

ax = fig.add_subplot(projection ='3d')

ax.scatter(x[:, 1], x[:, 2], y, label ='y', s = 5)

ax.legend()
ax.view_init(45, 0)

plt.show()

Output:
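The listing above only generates and visualizes the data; a minimal sketch of the actual fit via the normal equation, continuing from the x and y generated above (x already carries a leading column of ones for the intercept):

# solve for the coefficients b0, b1, b2 of y = b0 + b1*x1 + b2*x2
coeffs = np.linalg.inv(x.T @ x) @ x.T @ y
print("Estimated coefficients:", coeffs)

# mean squared error of the fit on the training data
y_pred = x @ coeffs
print("MSE:", np.mean((y - y_pred) ** 2))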

Experiment 6:
Implement Python Program to build logistic regression
and decision tree models using the Python package
statsmodel and sklearn APIs.
import numpy

Store the independent variables in X.

Store the dependent variable in y.

Below is a sample dataset:

#X represents the size of a tumor in centimeters.


X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)

#Note: X has to be reshaped into a column from a row for the LogisticRegression() function to work.

#y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").

y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

We will use a method from the sklearn module, so we will have to import that module as well:

from sklearn import linear_model

From the sklearn module we will use the LogisticRegression() method to create a logistic regression object.

This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship:

logr = linear_model.LogisticRegression()

logr.fit(X,y)
Now we have a logistic regression object that is ready to predict whether a tumor is cancerous based on the tumor size:

#predict if a tumor is cancerous where the size is 3.46 cm:

predicted = logr.predict(numpy.array([3.46]).reshape(-1,1))

Example

See the whole example in action:

import numpy

from sklearn import linear_model

#Reshaped for Logistic function.

X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)

y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()

logr.fit(X,y)

#predict if a tumor is cancerous where the size is 3.46 cm:

predicted = logr.predict(numpy.array([3.46]).reshape(-1,1))

print(predicted)

Result
[0]
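The experiment also calls for a decision tree model and the statsmodels API; a minimal sketch reusing X and y from the example above (max_depth is illustrative):

# decision tree classifier with sklearn
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)
print(tree.predict(numpy.array([3.46]).reshape(-1, 1)))

# logistic regression with statsmodels
import statsmodels.api as sm
X_const = sm.add_constant(X)   # statsmodels needs an explicit intercept
result = sm.Logit(y, X_const).fit()
print(result.summary())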

Experiment 7:
Implement Python Program to perform the activities
such as
- splitting the data set into training and validation
datasets

INTRODUCTION
Why do you need to split data?

You don’t want your model to over-learn from training data and perform poorly after being deployed in production. You need to have a mechanism to assess how well your model is generalizing. Hence, you need to separate your input data into training, validation, and testing subsets to prevent your model from overfitting and to evaluate your model effectively.

In this post, we will cover the following things.

1. A brief definition of training, validation, and testing datasets

2. Ready-to-use code for creating these datasets (2 methods; a sketch of one appears at the end of this section)

3. The science behind the dataset split ratio

Definition of Train-Valid-Test Split

Train-valid-test split is a technique to evaluate the performance of your machine learning model — classification or regression alike. You take a given dataset and divide it into three subsets. A brief description of the role of each of these datasets is below.

Train Dataset

 The set of data used for learning (by the model), that is, to fit the parameters of the machine learning model.

Valid Dataset

 The set of data used to provide an unbiased evaluation of a model fitted on the training dataset while tuning model hyperparameters.

 It also plays a role in other forms of model preparation, such as feature selection and threshold cut-off selection.

Test Dataset

 The set of data used to provide an unbiased evaluation of a final model fitted on the training dataset.
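A minimal sketch of one of the two methods promised above — two chained calls to sklearn's train_test_split, assuming a feature matrix X and labels y (the 60/20/20 ratio is illustrative):

from sklearn.model_selection import train_test_split

# first split off the test set (20%)
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# then split the remainder into train (60%) and validation (20%)
X_train, X_valid, y_train, y_valid = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)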
Experiment 8:
Write a Python program to implement k-Nearest
Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions.
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv("9-dataset.csv", names=names)

X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)

ypred = classifier.predict(Xtest)

i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")

for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1

print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")

Output :

sepal-length sepal-width petal-length petal-width


0 5.1 3.5 1.4 0.2

1 4.9 3.0 1.4 0.2

2 4.7 3.2 1.3 0.2

3 4.6 3.1 1.5 0.2

4 5.0 3.6 1.4 0.2

-------------------------------------------------------------------------

Original Label Predicted Label Correct/Wrong

-------------------------------------------------------------------------

Iris-versicolor Iris-versicolor Correct

Iris-virginica Iris-versicolor Wrong

Iris-virginica Iris-virginica Correct

Iris-versicolor Iris-versicolor Correct

Iris-setosa Iris-setosa Correct

Iris-versicolor Iris-versicolor Correct

Iris-setosa Iris-setosa Correct

Iris-setosa Iris-setosa Correct

Iris-virginica Iris-virginica Correct

Iris-virginica Iris-versicolor Wrong

Iris-virginica Iris-virginica Correct

Iris-setosa Iris-setosa Correct

Iris-virginica Iris-virginica Correct

Iris-virginica Iris-virginica Correct

Iris-versicolor Iris-versicolor Correct


Confusion Matrix:

[[4 0 0]

[0 4 0]

[0 2 5]]

-------------------------------------------------------------------------

Classification Report:

precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 4

Iris-versicolor 0.67 1.00 0.80 4

Iris-virginica 1.00 0.71 0.83 7

avg / total 0.91 0.87 0.87 15

-------------------------------------------------------------------------

Accuracy of the classifier is 0.87

Experiment 9:
Implement Support vector Machine algorithm on any
data set
Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, primarily, it is used for Classification problems in
Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily
put the new data point in the correct category in the future. This best
decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram in which there are two different categories that are classified using a decision boundary or hyperplane:

Example: SVM can be understood with the example that we have used in
the KNN classifier. Suppose we see a strange cat that also has some features
of dogs, so if we want a model that can accurately identify whether it is a cat
or dog, so such a model can be created by using the SVM algorithm. We will
first train our model with lots of images of cats and dogs so that it can learn
about different features of cats and dogs, and then we test it with this
strange creature. So as support vector creates a decision boundary between
these two data (cat and dog) and choose extreme cases (support vectors), it
will see the extreme case of cat and dog. On the basis of the support
vectors, it will classify it as a cat. Consider the below diagram:

SVM algorithm can be used for Face detection, image classification, text
categorization, etc.

Types of SVM
SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which
means if a dataset can be classified into two classes by using a single
straight line, then such data is termed as linearly separable data, and
classifier is used called as Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separated
data, which means if a dataset cannot be classified by using a straight
line, then such data is termed as non-linear data and classifier used is
called as Non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate
the classes in n-dimensional space, but we need to find out the best decision
boundary that helps to classify the data points. This best boundary is known
as the hyperplane of SVM.

The dimensions of the hyperplane depend on the features present in the dataset: if there are 2 features (as shown in the image), the hyperplane will be a straight line, and if there are 3 features, the hyperplane will be a 2-dimensional plane.

We always create a hyperplane that has a maximum margin, which means


the maximum distance between the data points.

Support Vectors:

The data points or vectors that are closest to the hyperplane and affect its position are termed support vectors; since these vectors "support" the hyperplane, they are called support vectors.

How does SVM work?


Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has two tags (green and blue), and the dataset has two features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue. Consider the below image:

As this is 2-d space, just by using a straight line we can easily separate these two classes. But there can be multiple lines that can separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.

Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line. Consider the below image:

So to separate these data points, we need to add one more dimension. For linear data we have used the two dimensions x and y, so for non-linear data we will add a third dimension z. It can be calculated as:

z = x^2 + y^2

By adding the third dimension, the sample space will become as below
image:

So now, SVM will divide the datasets into classes in the following way.
Consider the below image:
Since we are in 3-d space, it looks like a plane parallel to the x-axis. If we convert it to 2-d space with z = 1, it becomes:
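The experiment asks for an implementation on a dataset; a minimal sketch with scikit-learn's SVC on the iris dataset (the dataset choice and parameters are illustrative):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

clf = SVC(kernel='linear')   # use kernel='rbf' for the non-linear case
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))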

Experiment 10:
Write a program to implement the naive Bayesian
classifier for a sample training data set stored as a .csv
file. Compute the accuracy of the classifier, considering
few test data sets.
# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
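The listing stops at feature scaling; a minimal completion that actually trains the naive Bayes model and reports its accuracy, continuing from x_train, x_test, y_train, y_test above (GaussianNB is an assumption for this dataset):

# Training the Naive Bayes model on the training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# Predicting the test set results and computing the accuracy
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))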
Output:
Experiment 11:
Write a Python program to construct a Bayesian network
considering medical data. Use this model to demonstrate
the diagnosis of heart patients using standard Heart
Disease Data Set.
from pomegranate import *

asia = DiscreteDistribution({'True': 0.5, 'False': 0.5})

tuberculosis = ConditionalProbabilityTable(
    [['True', 'True', 0.2],
     ['True', 'False', 0.8],
     ['False', 'True', 0.01],
     ['False', 'False', 0.99]], [asia])

smoking = DiscreteDistribution({'True': 0.5, 'False': 0.5})

lung = ConditionalProbabilityTable(
    [['True', 'True', 0.75],
     ['True', 'False', 0.25],
     ['False', 'True', 0.02],
     ['False', 'False', 0.98]], [smoking])

bronchitis = ConditionalProbabilityTable(
    [['True', 'True', 0.92],
     ['True', 'False', 0.08],
     ['False', 'True', 0.03],
     ['False', 'False', 0.97]], [smoking])

# deterministic OR of its two parents
tuberculosis_or_cancer = ConditionalProbabilityTable(
    [['True', 'True', 'True', 1.0],
     ['True', 'True', 'False', 0.0],
     ['True', 'False', 'True', 1.0],
     ['True', 'False', 'False', 0.0],
     ['False', 'True', 'True', 1.0],
     ['False', 'True', 'False', 0.0],
     ['False', 'False', 'True', 0.0],
     ['False', 'False', 'False', 1.0]], [tuberculosis, lung])

xray = ConditionalProbabilityTable(
    [['True', 'True', 0.885],
     ['True', 'False', 0.115],
     ['False', 'True', 0.04],
     ['False', 'False', 0.96]], [tuberculosis_or_cancer])

dyspnea = ConditionalProbabilityTable(
    [['True', 'True', 'True', 0.96],
     ['True', 'True', 'False', 0.04],
     ['True', 'False', 'True', 0.89],
     ['True', 'False', 'False', 0.11],
     ['False', 'True', 'True', 0.96],
     ['False', 'True', 'False', 0.04],
     ['False', 'False', 'True', 0.89],
     ['False', 'False', 'False', 0.11]], [tuberculosis_or_cancer, bronchitis])

s0 = State(asia, name="asia")
s1 = State(tuberculosis, name="tuberculosis")
s2 = State(smoking, name="smoker")

network = BayesianNetwork("asia")
network.add_nodes(s0, s1, s2)
network.add_edge(s0, s1)
network.add_edge(s1, s2)
network.bake()

print(network.predict_proba({'tuberculosis': 'True'}))

Experiment 12:
Assuming a set of documents that need to be classified,
use the naive Bayesian Classifier model to perform this
task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision and recall for
your data set.
import pandas as pd

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])

print('The dimensions of the dataset', msg.shape)

msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message
y = msg.labelnum
print(X)
print(y)

#splitting the dataset into train and test data
from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(X, y)

print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)

#output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print(count_vect.get_feature_names())

df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
print(df)           #tabular representation
print(xtrain_dtm)   #sparse matrix representation

# Training Naive Bayes (NB) classifier on training data.
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

#printing accuracy metrics
from sklearn import metrics

print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))

'''docs_new = ['I like this place', 'My boss is not my saviour']
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predictednew):
    print('%s->%s' % (doc, msg.labelnum[category]))'''

Sample contents of naivetext1.csv:

I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg

OUTPUT

['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal', 'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my', 'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']

(The feature-name list is followed by the document-term matrix: one row per training document, one column per feature, with 0/1 term counts. The original table is too wide to reproduce legibly here.)

Experiment 13:
Implement PCA on any Image dataset for dimensionality
reduction and classification of images into different
classes
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
#%matplotlib inline

from sklearn.datasets import load_breast_cancer

raw_data = load_breast_cancer()

raw_data_frame = pd.DataFrame(raw_data['data'], columns=raw_data['feature_names'])
raw_data_frame.columns

#Standardize the data


from sklearn.preprocessing import StandardScaler
data_scaler = StandardScaler()
data_scaler.fit(raw_data_frame)
scaled_data_frame = data_scaler.transform(raw_data_frame)
#Perform the principal component analysis transformation
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
pca.fit(scaled_data_frame)

x_pca = pca.transform(scaled_data_frame)

print(x_pca.shape)
print(scaled_data_frame.shape)

#Visualize the principal components


plt.scatter(x_pca[:,0],x_pca[:,1])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')

#Visualize the principal components with a color scheme


plt.scatter(x_pca[:,0],x_pca[:,1], c=raw_data['target'])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')

#Investigating at the principal components


pca.components_[0]
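The experiment also asks for classification after the reduction; a minimal sketch training a classifier on the two principal components (logistic regression is an assumption — any classifier would do):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    x_pca, raw_data['target'], test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))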

Experiment 14:
Implement the non-parametric Locally Weighted
Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw
graphs.
from numpy import *
import operator
from os import listdir
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np1
import numpy.linalg as np
from scipy.stats import pearsonr

def kernel(point, xmat, k):
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - X[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (X.T * (wei * X)).I * (X.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv('data10.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# prepare X: add a column of ones in front of bill
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))

# set k (the kernel bandwidth) here
ypred = localWeightRegression(X, mtip, 2)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]
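The program computes ypred but never draws the graph; the usual plotting step for this lab exercise (assumed, not present in the original):

# plot the data points and the locally weighted fit
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=2)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()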

Output: a scatter plot of total bill vs. tip with the locally weighted regression curve.
