0% found this document useful (0 votes)

69 views8 pages

Load Dataset: Import As

The document loads and analyzes an advertising dataset with 200 entries and 4 variables: TV, Radio, Newspaper, and Sales. It performs exploratory data analysis including calculating statistics, visualizing the distributions with histograms, and splitting the data into training and test sets for linear regression. Linear regression is performed to predict Sales with TV, Radio, and Newspaper as features. The model performance is evaluated using mean absolute error. Pairwise relationships between variables are also visualized with a scatter plot.

Uploaded by

ZESTY

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

69 views8 pages

Load Dataset: Import As

Uploaded by

ZESTY

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

5/7/2019 lab06

Load dataset

In [2]:

import seaborn as sns

In [3]:

import pandas as pd
# %matplotlib inline
import matplotlib.pyplot as plt
data=pd.read_csv("D:\8 semester\Data warehousing and data mining\Labs\LAB6\Advertising.csv"

In [4]:

data=pd.read_csv("D:\8 semester\Data warehousing and data mining\Labs\LAB6\Advertising.csv"

Task NO 1

In [5]:

data.shape

Out[5]:

(200, 4)

localhost:8888/notebooks/lab06.ipynb# 1/9
5/7/2019 lab06

In [6]:

data.describe()

Out[6]:

TV Radio Newspaper Sales

count 200.000000 200.000000 200.000000 200.000000

mean 147.042500 23.264000 30.554000 14.022500

std 85.854236 14.846809 21.778621 5.217457

min 0.700000 0.000000 0.300000 1.600000

25% 74.375000 9.975000 12.750000 10.375000

50% 149.750000 22.900000 25.750000 12.900000

75% 218.825000 36.525000 45.100000 17.400000

max 296.400000 49.600000 114.000000 27.000000

In [7]:

data.max()

Out[7]:

TV 296.4
Radio 49.6
Newspaper 114.0
Sales 27.0
dtype: float64

In [8]:

data.min()

Out[8]:

TV 0.7
Radio 0.0
Newspaper 0.3
Sales 1.6
dtype: float64

localhost:8888/notebooks/lab06.ipynb# 2/9
5/7/2019 lab06

In [9]:

data.mean()

Out[9]:

TV 147.0425
Radio 23.2640
Newspaper 30.5540
Sales 14.0225
dtype: float64

In [10]:

data.count()

Out[10]:

TV 200
Radio 200
Newspaper 200
Sales 200
dtype: int64

In [11]:

#data.count

In [12]:

data.columns.values

Out[12]:

array(['TV', 'Radio', 'Newspaper', 'Sales'], dtype=object)

In [13]:

data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 200 entries, 1 to 200
Data columns (total 4 columns):
TV 200 non-null float64
Radio 200 non-null float64
Newspaper 200 non-null float64
Sales 200 non-null float64
dtypes: float64(4)
memory usage: 7.8 KB

localhost:8888/notebooks/lab06.ipynb# 3/9
5/7/2019 lab06

In [14]:

data.dtypes

Out[14]:

TV float64
Radio float64
Newspaper float64
Sales float64
dtype: object

In [15]:

data.ndim

Out[15]:

In [16]:

data.size

Out[16]:

800

In [17]:

data.values

Out[17]:

array([[230.1, 37.8, 69.2, 22.1],

[ 44.5, 39.3, 45.1, 10.4],
[ 17.2, 45.9, 69.3, 9.3],
[151.5, 41.3, 58.5, 18.5],
[180.8, 10.8, 58.4, 12.9],
[ 8.7, 48.9, 75. , 7.2],
[ 57.5, 32.8, 23.5, 11.8],
[120.2, 19.6, 11.6, 13.2],
[ 8.6, 2.1, 1. , 4.8],
[199.8, 2.6, 21.2, 10.6],
[ 66.1, 5.8, 24.2, 8.6],
[214.7, 24. , 4. , 17.4],
[ 23.8, 35.1, 65.9, 9.2],
[ 97.5, 7.6, 7.2, 9.7],
[204.1, 32.9, 46. , 19. ],
[195.4, 47.7, 52.9, 22.4],
[ 67.8, 36.6, 114. , 12.5],

localhost:8888/notebooks/lab06.ipynb# 4/9
5/7/2019 lab06

In [18]:

data.empty

Out[18]:

False

Task no 2 compare "conda" vs "pip"

Pip is conatain only the pakages of python but conda install pakages which may contain software of any
language. pip installation is easy with one command but conda installation is complex. Conda also include pip
but pip can't include conda. Pip compiles from source and conda install libraries. conda create and manage
multiple envirement but pip can't.

Task no 3

In [19]:

fig,ax=plt.subplots(1,4,figsize=(15, 3))
data['Radio'].plot(kind="hist", ax=ax[0],color ='blue',alpha=0.6)
data['Sales'].plot(kind="hist", ax=ax[1],color='green',alpha=0.6)
data['Newspaper'].plot( kind="hist",ax=ax[2],color='cyan',alpha=0.6)
data['TV'].plot( kind="hist",ax=ax[3],color='red',alpha=0.6)

Out[19]:

<matplotlib.axes._subplots.AxesSubplot at 0x20e1e4f78d0>

In [20]:

X = data[['TV', 'Radio', 'Newspaper']]

In [21]:

y = data['Sales']

localhost:8888/notebooks/lab06.ipynb# 5/9
5/7/2019 lab06

In [22]:

from sklearn.model_selection import train_test_split

In [23]:

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

In [24]:

from sklearn.linear_model import LinearRegression

In [25]:

lr = LinearRegression()

In [26]:

lr.fit(X_train, y_train)

Out[26]:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [27]:

print(lr.intercept_)

2.8769666223179318

In [28]:

print(lr.coef_)

[0.04656457 0.17915812 0.00345046]

In [29]:

y_pred = lr.predict(X_test)

localhost:8888/notebooks/lab06.ipynb# 6/9
5/7/2019 lab06

In [30]:

# Mean Absolute Error

from sklearn import metrics
print(metrics.mean_absolute_error(y_test, y_pred))

1.0668917082595215

What occurs inside a linear regression model?

Compare it with KNN classifier.
Inside a linear regression,draw a line whose distance from all point is small and distance from the origen is
large.In dataset, multiple lines are draw during regression, by changing the value of intersept and coffecient. By
repeating this process, finally one line is selected whose distance from all points is small. In kneignbour
classifier, different classes are made according to their similarities. when new data is come, its distance is
calculated from it k number of neighbour and it consider a part of shortest distance class.

Contribution

How Sales are related with other variables using scatter plot

In [31]:

sns.pairplot(data, x_vars=['TV', 'Newspaper', 'Radio'], y_vars='Sales', height=4, aspect=1,

plt.show()

Numerical data distribution

All the types of our data from our dataset and take only the numerical ones

localhost:8888/notebooks/lab06.ipynb# 7/9
5/7/2019 lab06

In [32]:

list(set(data.dtypes.tolist()))

Out[32]:

[dtype('float64')]

In [33]:

data = data.select_dtypes(include = ['float64', 'int64'])

data.head()

Out[33]:

TV Radio Newspaper Sales

1 230.1 37.8 69.2 22.1

2 44.5 39.3 45.1 10.4

3 17.2 45.9 69.3 9.3

4 151.5 41.3 58.5 18.5

5 180.8 10.8 58.4 12.9

In [34]:

data.hist(figsize=(6, 6), bins=50, xlabelsize=8, ylabelsize=8);

# ; avoid having the matplotlib verbose informations

localhost:8888/notebooks/lab06.ipynb# 8/9

Dschx60v Help en
No ratings yet
Dschx60v Help en
172 pages
Python Lab PRG
No ratings yet
Python Lab PRG
20 pages
Machine Exercise 3
No ratings yet
Machine Exercise 3
22 pages
Vivid T9 Vet - Vivid T8 Vet Basic Service Manual - SM - 5945151-1EN - 2
No ratings yet
Vivid T9 Vet - Vivid T8 Vet Basic Service Manual - SM - 5945151-1EN - 2
433 pages
SourceInsight4UserGuide PDF
No ratings yet
SourceInsight4UserGuide PDF
436 pages
12.1 - 12.9 Introduction To Modules - Libraries For DataScience
No ratings yet
12.1 - 12.9 Introduction To Modules - Libraries For DataScience
54 pages
4.1 Data Retrieval and Preprocessing of Python
No ratings yet
4.1 Data Retrieval and Preprocessing of Python
57 pages
ML3 Data Analysis
No ratings yet
ML3 Data Analysis
80 pages
08-Huawei Cyber Security Guide For Partners
100% (1)
08-Huawei Cyber Security Guide For Partners
30 pages
PR Final File
No ratings yet
PR Final File
70 pages
Machine Learning Lab
No ratings yet
Machine Learning Lab
43 pages
Ge - Computer Science Data Analysis
No ratings yet
Ge - Computer Science Data Analysis
16 pages
Python Libraries - 2025 (1) Python Libraries - 2025 (1) Python Libraries - 2025
No ratings yet
Python Libraries - 2025 (1) Python Libraries - 2025 (1) Python Libraries - 2025
9 pages
Class Xii PDF For Practical
No ratings yet
Class Xii PDF For Practical
24 pages
Machine Learning Lab Dlihebca6sem
100% (1)
Machine Learning Lab Dlihebca6sem
25 pages
ML Final Prac
No ratings yet
ML Final Prac
47 pages
Ankit Python
No ratings yet
Ankit Python
26 pages
23bet10114 Naman Gupta Assignment-1
No ratings yet
23bet10114 Naman Gupta Assignment-1
17 pages
ML (Sudhanshu)
No ratings yet
ML (Sudhanshu)
24 pages
Microsoft Windows Shortcut Keys List: Advertisement
No ratings yet
Microsoft Windows Shortcut Keys List: Advertisement
5 pages
Nitya Practical File Class Xii 2023-2024
No ratings yet
Nitya Practical File Class Xii 2023-2024
41 pages
DXE 24gksmknvj
No ratings yet
DXE 24gksmknvj
16 pages
AD3301 DEV Lab Manual
No ratings yet
AD3301 DEV Lab Manual
26 pages
ML Lab Manual
No ratings yet
ML Lab Manual
12 pages
Batch Weighing Machine: A Report On
No ratings yet
Batch Weighing Machine: A Report On
20 pages
DSL Rough Draft
No ratings yet
DSL Rough Draft
34 pages
Electrical Part List
No ratings yet
Electrical Part List
6 pages
DLL Etech Q1 W4
No ratings yet
DLL Etech Q1 W4
3 pages
تمرین اول
No ratings yet
تمرین اول
2 pages
Question Bank2 1722502558363
No ratings yet
Question Bank2 1722502558363
6 pages
Python For Data Analysis
No ratings yet
Python For Data Analysis
96 pages
Worksheet Class Xii Ip
No ratings yet
Worksheet Class Xii Ip
59 pages
Clema Entrelec
No ratings yet
Clema Entrelec
35 pages
ML Lab File
No ratings yet
ML Lab File
43 pages
Self Study Assignment Python II
No ratings yet
Self Study Assignment Python II
4 pages
PYQ Data Analysis and Visualisation Using Python GE May 2024
No ratings yet
PYQ Data Analysis and Visualisation Using Python GE May 2024
6 pages
R12 Crossdock Training Documentation
67% (3)
R12 Crossdock Training Documentation
95 pages
Designing V-T Amplifiers
100% (4)
Designing V-T Amplifiers
318 pages
Assignment 1
No ratings yet
Assignment 1
2 pages
Case Study: Honeywell Video Technology Helps Protect Largest Privately-Funded Construction Project in U.S. History
No ratings yet
Case Study: Honeywell Video Technology Helps Protect Largest Privately-Funded Construction Project in U.S. History
2 pages
ControlLogix 5580 and GuardLogix 5580 Controllers Manual
No ratings yet
ControlLogix 5580 and GuardLogix 5580 Controllers Manual
202 pages
Introduction To Python (Part III)
No ratings yet
Introduction To Python (Part III)
29 pages
ML Manual
No ratings yet
ML Manual
21 pages
CS3362 Data Science Laboratory Manual 2022-23
No ratings yet
CS3362 Data Science Laboratory Manual 2022-23
54 pages
Assignment 01
No ratings yet
Assignment 01
7 pages
Unit 5
No ratings yet
Unit 5
28 pages
Suryadatta National School Class 12 CBSE Informatics Practices Practicals List
No ratings yet
Suryadatta National School Class 12 CBSE Informatics Practices Practicals List
19 pages
Lab Manual
No ratings yet
Lab Manual
19 pages
IP Book 12 Question Bank
No ratings yet
IP Book 12 Question Bank
20 pages
Rajasthan Geography Notes (WWW - GkNotesPDF.com)
No ratings yet
Rajasthan Geography Notes (WWW - GkNotesPDF.com)
102 pages
Exercise and Experiment 3
No ratings yet
Exercise and Experiment 3
14 pages
r22 1 9 ML Lab Manual r22 Regulations
No ratings yet
r22 1 9 ML Lab Manual r22 Regulations
24 pages
Ex. No: 1 Exploring The Features of Numpy, Scipy, Jupyter, Statsmodels and Pandas Date: 07/08/2024
No ratings yet
Ex. No: 1 Exploring The Features of Numpy, Scipy, Jupyter, Statsmodels and Pandas Date: 07/08/2024
9 pages
Ip Study
No ratings yet
Ip Study
18 pages
Sap Erp Price Hike Analysis
No ratings yet
Sap Erp Price Hike Analysis
14 pages
Software Project Planning (Ch.5)
No ratings yet
Software Project Planning (Ch.5)
25 pages
3rd Semester DDM AI DAA DEV Print Pages For Spiral Record 25-1-24 - Removed
No ratings yet
3rd Semester DDM AI DAA DEV Print Pages For Spiral Record 25-1-24 - Removed
28 pages
Miahayub MR
No ratings yet
Miahayub MR
1 page
The Art of Rigging
No ratings yet
The Art of Rigging
187 pages
Machine Learning Lab File: Submitted To: Submitted by
No ratings yet
Machine Learning Lab File: Submitted To: Submitted by
9 pages
Roll NO 2020
No ratings yet
Roll NO 2020
8 pages
DSBDA Lab Manual
No ratings yet
DSBDA Lab Manual
155 pages
EXP1-siddhant Gupta (23 - SE - 148)
No ratings yet
EXP1-siddhant Gupta (23 - SE - 148)
17 pages
12th Practical
No ratings yet
12th Practical
21 pages
QP - Ip PB19-01QP
No ratings yet
QP - Ip PB19-01QP
7 pages
Orbit Final TP
No ratings yet
Orbit Final TP
4 pages
EDA Document
No ratings yet
EDA Document
13 pages
Combining Factors
No ratings yet
Combining Factors
18 pages
Parallel Computing Platforms: Chieh-Sen (Jason) Huang
No ratings yet
Parallel Computing Platforms: Chieh-Sen (Jason) Huang
28 pages
Ch5 Perception and Individual Decision Making
No ratings yet
Ch5 Perception and Individual Decision Making
29 pages
Know Your Dataset: Season Holiday Weekday Workingday CNT 726 727 728 729 730
No ratings yet
Know Your Dataset: Season Holiday Weekday Workingday CNT 726 727 728 729 730
1 page
SPL LL90 - 3: Radial Smart Laser
No ratings yet
SPL LL90 - 3: Radial Smart Laser
12 pages
Holidays Homework - 20231204 - 195647 - 0000
No ratings yet
Holidays Homework - 20231204 - 195647 - 0000
15 pages
C 5
100% (1)
C 5
18 pages
Cisa Journal Audit Executives Review
No ratings yet
Cisa Journal Audit Executives Review
3 pages
Foundations of Engineering Economy
No ratings yet
Foundations of Engineering Economy
29 pages
Study Material For XII Computer Science On: Data Visualization Using Pyplot
No ratings yet
Study Material For XII Computer Science On: Data Visualization Using Pyplot
22 pages
DAV Guidelines
No ratings yet
DAV Guidelines
4 pages
Learning Goals: Normalization
No ratings yet
Learning Goals: Normalization
8 pages
Import Pandas As PD S1 PD - Series ( (1,2,3,4) ) S2 PD - Series ( (7,8) ) S3 S1 + S2 Print (S3.size)
No ratings yet
Import Pandas As PD S1 PD - Series ( (1,2,3,4) ) S2 PD - Series ( (7,8) ) S3 S1 + S2 Print (S3.size)
6 pages
IP MODEL 1 QST Set 2
No ratings yet
IP MODEL 1 QST Set 2
4 pages
Five Steps To Install Localhost Joomla Template
No ratings yet
Five Steps To Install Localhost Joomla Template
3 pages
مﺎﻄﻧ ماﺪﺨﺘﺳا SPSS ﺔﻴﺋﺎﺼﺣﻹا تﺎﻧﺎﻴﺒﻟا ﻞﻴﻠﺤﺗ ﻲﻓ: · April 2002
No ratings yet
مﺎﻄﻧ ماﺪﺨﺘﺳا SPSS ﺔﻴﺋﺎﺼﺣﻹا تﺎﻧﺎﻴﺒﻟا ﻞﻴﻠﺤﺗ ﻲﻓ: · April 2002
16 pages
Distributed Pervasive Systems
No ratings yet
Distributed Pervasive Systems
6 pages
Machine Learning - Manual
No ratings yet
Machine Learning - Manual
32 pages
Distributed Pervasive Systems
No ratings yet
Distributed Pervasive Systems
8 pages
Data Analysis Lab - Final - 23-24
No ratings yet
Data Analysis Lab - Final - 23-24
11 pages
Python Abstract
No ratings yet
Python Abstract
7 pages
CO-367 Machine Learning Lab File: Submitted To: Submitted by
No ratings yet
CO-367 Machine Learning Lab File: Submitted To: Submitted by
12 pages
Practical List Questions-1
No ratings yet
Practical List Questions-1
6 pages
Ccs337 - Cognitive Science Laboratory Lab Manual Record
No ratings yet
Ccs337 - Cognitive Science Laboratory Lab Manual Record
27 pages
Whats Stopping You Why Smart People Dont
No ratings yet
Whats Stopping You Why Smart People Dont
4 pages
FDD LTE Handover Optimization - 切换
No ratings yet
FDD LTE Handover Optimization - 切换
35 pages
Welcome and General Information (Mod. 1 2022)
No ratings yet
Welcome and General Information (Mod. 1 2022)
6 pages
Mahesh VM Resume New
No ratings yet
Mahesh VM Resume New
2 pages
Parallel Algorithm Models
No ratings yet
Parallel Algorithm Models
21 pages
Prof. Rajasshrie Pillai
No ratings yet
Prof. Rajasshrie Pillai
38 pages
Hyperion Essbase Interview Questions
No ratings yet
Hyperion Essbase Interview Questions
7 pages
Data Mining Models: Techniques and Applications
From Everand
Data Mining Models: Techniques and Applications
Ravi Deshpande
No ratings yet
Digital Image Processing: Fundamentals and Applications
From Everand
Digital Image Processing: Fundamentals and Applications
Fouad Sabry
No ratings yet

Load Dataset: Import As

Uploaded by

Load Dataset: Import As

Uploaded by

5/7/2019 lab06

import seaborn as sns

data=pd.read_csv("D:\8 semester\Data warehousing and data mining\Labs\LAB6\Advertising.csv"

TV Radio Newspaper Sales

count 200.000000 200.000000 200.000000 200.000000

mean 147.042500 23.264000 30.554000 14.022500

std 85.854236 14.846809 21.778621 5.217457

min 0.700000 0.000000 0.300000 1.600000

25% 74.375000 9.975000 12.750000 10.375000

50% 149.750000 22.900000 25.750000 12.900000

75% 218.825000 36.525000 45.100000 17.400000

max 296.400000 49.600000 114.000000 27.000000

array(['TV', 'Radio', 'Newspaper', 'Sales'], dtype=object)

array([[230.1, 37.8, 69.2, 22.1],

Task no 2 compare "conda" vs "pip"

X = data[['TV', 'Radio', 'Newspaper']]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

from sklearn.linear_model import LinearRegression

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

[0.04656457 0.17915812 0.00345046]

# Mean Absolute Error

What occurs inside a linear regression model?

sns.pairplot(data, x_vars=['TV', 'Newspaper', 'Radio'], y_vars='Sales', height=4, aspect=1,

Numerical data distribution

data = data.select_dtypes(include = ['float64', 'int64'])

TV Radio Newspaper Sales

1 230.1 37.8 69.2 22.1

2 44.5 39.3 45.1 10.4

3 17.2 45.9 69.3 9.3

4 151.5 41.3 58.5 18.5

5 180.8 10.8 58.4 12.9

data.hist(figsize=(6, 6), bins=50, xlabelsize=8, ylabelsize=8);

You might also like