
{

"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "DgE0o3YHBw-n"
},
"source": [
"<center> <h1 style=\"background-color:orange; color:white\"><br>Exploratory
Data Analysis<br></h1></center>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w6lzj4kjDJWu"
},
"source": [
"# `Problem Statement:`\n",
"We have used Cars dataset from kaggle with features including make, model,
year, engine, and other properties of the car used to predict its price."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JpZPe8JBBw-y"
},
"source": [
"## `Importing the necessary libraries`\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "dl9ocdwHBw-2"
},
"outputs": [],
"source": [
"# import pandas as pd\n",
"# import numpy as np\n",
"# import seaborn as sns #visualisation\n",
"# import matplotlib.pyplot as plt #visualisation\n",
"# %matplotlib inline \n",
"# sns.set(color_codes=True)\n",
"# from scipy import stats\n",
"# import warnings\n",
"# warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K5JcLAN2Bw-7"
},
"source": [
"## `Load the dataset into dataframe`"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"id": "Yc-ChymZBw_A"
},
"outputs": [],
"source": [
"## load the csv file \n",
"# df = "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "ZUd5Fl7jBw_C",
"outputId": "79c6280b-0909-4245-a805-9607cb59effa"
},
"outputs": [],
"source": [
"## print the head of the dataframe\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Gi3_9poxrSjE"
},
"source": [
"Now we observe the each features present in the dataset.<br>\n",
"\n",
" `Make:` The Make feature is the company name of the Car.<br>\n",
"`Model:` The Model feature is the model or different version of Car
models.<br>\n",
"`Year:` The year describes the model has been launched.<br>\n",
"`Engine Fuel Type:` It defines the Fuel type of the car model.<br>\n",
"`Engine HP:` It's say the Horsepower that refers to the power an engine
produces.<br>\n",
"`Engine Cylinders:` It define the nos of cylinders in present in the
engine.<br>\n",
"`Transmission Type:` It is the type of feature that describe about the car
transmission type i.e Mannual or automatic.<br>\n",
"`Driven_Wheels:` The type of wheel drive.<br>\n",
"`No of doors:` It defined nos of doors present in the car.<br>\n",
"`Market Category:` This features tells about the type of car or which category
the car belongs. <br>\n",
"`Vehicle Size:` It's say about the about car size.<br>\n",
"`Vehicle Style:` The feature is all about the style that belongs to car.<br>\
n",
"`highway MPG:` The average a car will get while driving on an open stretch of
road without stopping or starting, typically at a higher speed.<br>\n",
"`city mpg:` City MPG refers to driving with occasional stopping and
braking.<br>\n",
"`Popularity:` It can refered to rating of that car or popularity of car.<br>\
n",
"`MSRP:` The price of that car.\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VQ9qn4PaBw_i"
},
"source": [
"## `Check the datatypes`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "OPozGraJBw_l",
"outputId": "b72042d2-5913-43d8-c78a-2101feea6294"
},
"outputs": [],
"source": [
"# Get the datatypes of each columns number of records in each column.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gFyzAJLIBw_n"
},
"source": [
"## `Dropping irrevalent columns`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZZ863Z4jBw_p"
},
"source": [
"If we consider all columns present in the dataset then unneccessary columns
will impact on the model's accuracy.<br>\n",
"Not all the columns are important to us in the given dataframe, and hence we
would drop the columns that are irrevalent to us. It would reflect our model's
accucary so we need to drop them. Otherwise it will affect our model.\n",
"\n",
"\n",
"The list cols_to_drop contains the names of the cols that are irrevalent, drop
all these cols from the dataframe.\n",
"\n",
"\n",
"`cols_to_drop = [\"Engine Fuel Type\", \"Market Category\", \"Vehicle Style\",
\"Popularity\", \"Number of Doors\", \"Vehicle Size\"]`\n",
"\n",
"These features are not neccessary to obtain the model's accucary. It does not
contain any relevant information in the dataset. "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "oW5t3xE-Bw_p"
},
"outputs": [],
"source": [
"# initialise cols_to_drop\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "RJvrJS9-Bw_r",
"outputId": "69709257-f66a-41b3-f3e8-0cced7dbb28b"
},
"outputs": [],
"source": [
"# drop the irrevalent cols and print the head of the dataframe\n",
"# df = \n",
"\n",
"# print df head\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Jg4y0BS7Bw_s"
},
"source": [
"## `Renaming the columns`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aDciVmlRBw_t"
},
"source": [
"Now, Its time for renaming the feature to useful feature name. It will help to
use them in model training purpose.<br>\n",
"\n",
"We have already dropped the unneccesary columns, and now we are left with
useful columns. One extra thing that we would do is to rename the columns such that
the name clearly represents the essence of the column.\n",
"\n",
"The given dict represents (in key value pair) the previous name, and the new
name for the dataframe columns"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "LPr2b3NPBw_u"
},
"outputs": [],
"source": [
"# rename cols \n",
"# rename_cols = \n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "YpY0qGvIBw_v"
},
"outputs": [],
"source": [
"# use a pandas function to rename the current columns - \n",
"# df = \n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "3N1i99nYBw_v",
"outputId": "d4c5d762-55ef-4566-c6d3-374cc8f9160e"
},
"outputs": [],
"source": [
"# Print the head of the dataframe\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UgNExPnZBw_w"
},
"source": [
"## `Dropping the duplicate rows`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ozWzkdrSBw_x"
},
"source": [
"There are many rows in the dataframe which are duplicate, and hence they are
just repeating the information. Its better if we remove these rows as they don't
add any value to the dataframe. \n",
"\n",
"For given data, we would like to see how many rows were duplicates. For this,
we will count the number of rows, remove the dublicated rows, and again count the
number of rows."
]
},
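{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before counting rows manually, a quick sketch: pandas can also report the number of duplicated rows directly via df.duplicated(), as shown below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: count duplicated rows directly\n",
"df.duplicated().sum()"
]
},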
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "drvQvYs2Bw_x",
"outputId": "a7e6f707-fab9-47f8-86c4-9cbd9f1b110f"
},
"outputs": [],
"source": [
"# number of rows before removing duplicated rows\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "LvwZZUruBw_x",
"outputId": "617daeb0-f1e8-46dd-9623-34dd5b4d3bdf"
},
"outputs": [],
"source": [
"# drop the duplicated rows\n",
"# df = \n",
"\n",
"# print head of df\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"id": "Gg4hjGakBw_y",
"outputId": "a0f3f48c-7f23-4f2b-911b-57529b32663b"
},
"outputs": [],
"source": [
"# Count Number of rows after deleting duplicated rows\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q06o1NwrBw_z"
},
"source": [
"## `Dropping the null or missing values`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ddf1mIspBw_z"
},
"source": [
"Missing values are usually represented in the form of Nan or null or None in
the dataset.\n",
"\n",
"Finding whether we have null values in the data is by using the isnull()
function.\n",
"\n",
"There are many values which are missing, in pandas dataframe these values are
reffered to as np.nan. We want to deal with these values beause we can't use nan
values to train models. Either we can remove them to apply some strategy to replace
them with other values.\n",
"\n",
"To keep things simple we will be dropping nan values"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"id": "s0MtVaYABw_z",
"outputId": "61fbc5cc-d21a-453c-8bf5-8ba42a7f553e"
},
"outputs": [],
"source": [
"# check for nan values in each columns\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "58N8lvWRlIVT"
},
"source": [
"As we can see that the HP and Cylinders have null values of 69 and 30. As
these null values will impact on models' accuracy. So to avoid the impact we will
drop the these values. As these values are small camparing with dataset that will
not impact any major affect on model accuracy so we will drop the values."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"id": "TObFlN7xBw_0"
},
"outputs": [],
"source": [
"# drop missing values\n",
"# df = \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "q3tsOjvcBw_0",
"outputId": "067469f3-04d9-4894-f1e2-7ee4132a1d79"
},
"outputs": [],
"source": [
"# Make sure that missing values are removed\n",
"# check number of nan values in each col again\n",
"\n"
]
},
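{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted above, dropping is not the only option: missing values can instead be replaced (imputed). Below is a minimal sketch of that alternative, assuming we impute the numeric columns HP and Cylinders with their medians; it is left commented out since this notebook drops the rows instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# alternative to dropping (sketch only, not used further in this notebook):\n",
"# impute missing numeric values with the column median instead of dropping rows\n",
"# df['HP'] = df['HP'].fillna(df['HP'].median())\n",
"# df['Cylinders'] = df['Cylinders'].fillna(df['Cylinders'].median())"
]
},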
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"id": "N0Ge8_yfBw_1",
"outputId": "88459604-4bba-434c-d5fb-6e81910b4b50"
},
"outputs": [],
"source": [
"#Describe statistics of df\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qBk8SZ29Bw_1"
},
"source": [
"## `Removing outliers`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tn5lLccGBw_2"
},
"source": [
"Sometimes a dataset can contain extreme values that are outside the range of
what is expected and unlike the other data. These are called outliers and often
machine learning modeling and model skill in general can be improved by
understanding and even removing these outlier values."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"id": "2QnFqFbyBw_3",
"outputId": "b0a85d54-e5d7-4943-aec5-854695406cac"
},
"outputs": [],
"source": [
"## Plot a boxplot for 'Price' column in dataset. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qCpI41VqBci9"
},
"source": [
"### **`Observation:`**<br>\n",
"\n",
"Here as you see that we got some values near to 1.5 and 2.0 . So these values
are called outliers. Because there are away from the normal values.\n",
"Now we have detect the outliers of the feature of Price. Similarly we will
checking of anothers features."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"id": "lvDBhe4jBw_3",
"outputId": "6acf12e7-757f-4cbc-9020-d1d6a6e40564"
},
"outputs": [],
"source": [
"## PLot a boxplot for 'HP' columns in dataset\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-YWNqTn7GI-4"
},
"source": [
"### **`Observation:`**<br>\n",
"Here boxplots show the proper distribution of of 25 percentile and 75
percentile of the feature of HP."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "S9tucB8ABw_4"
},
"source": [
"print all the columns which are of int or float datatype in df. \n",
"\n",
"Hint: Use loc with condition"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"id": "4uEumv0uBw_4",
"outputId": "c0c5515e-96dc-4e40-ca4b-e83c76ce7fad"
},
"outputs": [],
"source": [
"# print all the columns which are of int or float datatype in df.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pQOOqmvEBw_5"
},
"source": [
"### `Save the column names of the above output in variable list named 'l'`\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"id": "PgJz8dtQBw_5"
},
"outputs": [],
"source": [
"# save column names of the above output in variable list\n",
"# l=\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3iAhdSFPBw_5"
},
"source": [
"## **`Outliers removal techniques - IQR Method`**\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4u67f7AzBw_6"
},
"source": [
"**Here comes cool Fact for you!**\n",
"\n",
"IQR is the first quartile subtracted from the third quartile; these quartiles
can be clearly seen on a box plot on the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eMW1PTL_Bw_6"
},
"source": [
"- Calculate IQR and give a suitable threshold to remove the outliers and save
this new dataframe into df2.\n",
"\n",
"Let us help you to decide threshold: Outliers in this case are defined as the
observations that are below (Q1 − 1.5x IQR) or above (Q3 + 1.5x IQR)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"id": "G5EHp8JxBw_6"
},
"outputs": [],
"source": [
"## define Q1 and Q2\n",
"# Q1 = \n",
"# Q3 = \n",
"\n",
"# # define IQR (interquantile range) \n",
"# IQR = \n",
"\n",
"# # define df2 after removing outliers\n",
"# df2 = \n"
]
},
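{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the threshold in action, the sketch below prints the (Q1 − 1.5 × IQR) and (Q3 + 1.5 × IQR) bounds for the Price column; values outside this range are the ones treated as outliers. (The column name Price assumes the renaming done earlier.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: show the outlier bounds for a single column\n",
"lower = Q1['Price'] - 1.5 * IQR['Price']\n",
"upper = Q3['Price'] + 1.5 * IQR['Price']\n",
"print('Price outlier bounds:', lower, 'to', upper)"
]
},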
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# find the shape of df & df2\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"id": "Ok1cLuSEBxAB",
"outputId": "40c55ded-4804-4ecb-b6ab-9795033207dd"
},
"outputs": [],
"source": [
"# find unique values and there counts in each column in df using value counts
function.\n",
"\n",
"# for i in df.columns:\n",
"# print (\"--------------- %s ----------------\" % i)\n",
"# # code here"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zQ0GaJ_kBxAB"
},
"source": [
"## `Visualising Univariate Distributions`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "H0PQlhWEBxAC"
},
"source": [
"We will use seaborn library to visualize eye catchy univariate plots. \n",
"\n",
"Do you know? you have just now already explored one univariate plot. guess
which one? Yeah its box plot.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SnzpC8JABxAC"
},
"source": [
"### `Histogram & Density Plots`\n",
"\n",
"Histograms and density plots show the frequency of a numeric variable along
the y-axis, and the value along the x-axis. The ```sns.distplot()``` function plots
a density curve. Notice that this is aesthetically better than vanilla
```matplotlib```."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"id": "-uqWiICoBxAC",
"outputId": "47e45800-1103-40e0-e407-93977635ea53"
},
"outputs": [],
"source": [
"#ploting distplot for variable HP\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1GSaLnCxiWHc"
},
"source": [
"### **`Observation:`**\n",
"We plot the Histogram of feature HP with help of distplot in seaborn.<br> \n",
"In this graph we can see that there is max values near at 200. similary we
have also the 2nd highest value near 400 and so on. <br>\n",
"It represents the overall distribution of continuous data variables.<br>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-P7Xup3vBxAD"
},
"source": [
"Since seaborn uses matplotlib behind the scenes, the usual matplotlib
functions work well with seaborn. For example, you can use subplots to plot
multiple univariate distributions.\n",
"- Hint: use matplotlib subplot function"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"id": "CdlvvfvfBxAD",
"outputId": "23484911-5553-41bd-cdf6-8bd38a526ce7"
},
"outputs": [],
"source": [
"# plot all the columns present in list l together using subplot of dimention
(2,3).\n",
"\n",
"\n",
"# c=0\n",
"# plt.figure(figsize=(15,10))\n",
"# for i in l:\n",
"# # code here\n",
"# plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ziOcNh-sBxAD"
},
"source": [
"## `Bar Chart Plots`\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lF54VPLRBxAE"
},
"source": [
"Plot a histogram depicting the make in X axis and number of cars in y axis.
<br>"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"id": "d1gpl5LxBxAE",
"outputId": "726eae7f-c413-456a-e989-960d43a9c89b"
},
"outputs": [],
"source": [
"# plt.figure(figsize = (12,8))\n",
"\n",
"# use nlargest and then .plot to get bar plot like below output\n",
"# Plot Title, X & Y label\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "N-8CXMKVkn-I"
},
"source": [
"### **`Observation:`**\n",
"In this plot we can see that we have plot the bar plot with the cars model and
nos. of cars."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xk2s0-9UBxAE"
},
"source": [
"### `Count Plot`\n",
"A count plot can be thought of as a histogram across a categorical, instead of
quantitative, variable.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OmT9X5aBBxAF"
},
"source": [
" Plot a countplot for a variable Transmission vertically with hue as Drive
mode"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"id": "UyYYXn36BxAF",
"outputId": "24b59852-4612-4065-cf6e-29b02c259565"
},
"outputs": [],
"source": [
"# plt.figure(figsize=(15,5))\n",
"\n",
"# plot countplot on transmission and drive mode\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9I0XvhdTla4h"
},
"source": [
"### **`Observation:`**\n",
"In this count plot, We have plot the feature of Transmission with help of
hue.<br>\n",
"We can see that the the nos of count and the transmission type and automated
manual is plotted. Drive mode as been given with help of hue.<br>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zDHMfUpNBxAF"
},
"source": [
"# `Visualising Bivariate Distributions`\n",
"\n",
"\n",
"Bivariate distributions are simply two univariate distributions plotted on x
and y axes respectively. They help you observe the relationship between the two
variables.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DQxcdTZsBxAG"
},
"source": [
"## `Scatter Plots`\n",
"Scatterplots are used to find the correlation between two continuos
variables.\n",
"\n",
"Using scatterplot find the correlation between 'HP' and 'Price' column of the
data. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"id": "L5zvuQD8BxAG",
"outputId": "6cc2ef16-7039-4eaa-df3f-7bdd6b4e5c80"
},
"outputs": [],
"source": [
"## Your code here - \n",
"# fig, ax = plt.subplots(figsize=(10,6))\n",
"\n",
"# plot scatterplot on hp and price\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kPLqA4B6o92w"
},
"source": [
"### **`Observation:`**<br>\n",
"It is a type of plot or mathematical diagram using Cartesian coordinates to
display values for typically two variables for a set of data.<br>\n",
"We have plot the scatter plot with x axis as HP and y axis as Price.<br>\n",
"The data points between the features should be same either wise it give
errors.<br>\n"
]
},
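{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scatter plot shows the relationship visually; to put a number on it, the correlation coefficient between the two columns can be computed directly with pandas, as sketched below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: quantify the correlation seen in the scatter plot\n",
"df['HP'].corr(df['Price'])"
]
},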
{
"cell_type": "markdown",
"metadata": {
"id": "HEUOARh5BxAN"
},
"source": [
"## `Plotting Aggregated Values across Categories`\n",
"\n",
"\n",
"### `Bar Plots - Mean, Median and Count Plots`\n",
"\n",
"\n",
"\n",
"Bar plots are used to **display aggregated values** of a variable, rather than
entire distributions. This is especially useful when you have a lot of data which
is difficult to visualise in a single figure. \n",
"\n",
"For example, say you want to visualise and *compare the Price across
Cylinders*. The ```sns.barplot()``` function can be used to do that.\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"id": "dTSOpY5jBxAN",
"outputId": "13ca613f-edab-42d8-819d-84cc5b566ee2"
},
"outputs": [],
"source": [
"# bar plot with default statistic=mean between Cylinder and Price\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rFd9QisOBxAO"
},
"source": [
"### **`Observation:`**<br>\n",
"By default, seaborn plots the mean value across categories, though you can
plot the count, median, sum etc.<br>\n",
"Also, barplot computes and shows the confidence interval of the mean as well.\
n",
"\n"
]
},
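{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of plotting a statistic other than the mean, the sketch below passes numpy's median as the estimator to sns.barplot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: bar plot with the median instead of the default mean\n",
"sns.barplot(x='Cylinders', y='Price', data=df, estimator=np.median)\n",
"plt.show()"
]
},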
{
"cell_type": "markdown",
"metadata": {
"id": "od8Fuqm_BxAO"
},
"source": [
"## `When you want to visualise having a large number of categories, it is
helpful to plot the categories across the y-axis.`\n",
"\n",
"### `Let's now drill down into Transmission sub categories.`"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"id": "lJnPU4KtBxAP",
"outputId": "2dfa446f-874f-435f-dba0-a17f30f34718"
},
"outputs": [],
"source": [
"# Plotting categorical variable Transmission across the y-axis\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q5Y7xg3ZBxAQ"
},
"source": [
"These plots looks beutiful isn't it? In Data Analyst life such charts are
there unavoidable friend.:)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QX2szH0MBxAQ"
},
"source": [
"# `Multivariate Plots`\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_wiepyZEBxAT"
},
"source": [
"## `Heatmaps`\n",
"\n",
"\n",
"A heat map is a two-dimensional representation of information with the help of
colors. Heat maps can help the user visualize simple or complex information"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VslkQJNWBxAU"
},
"source": [
"Using heatmaps plot the correlation between the features present in the
dataset."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"id": "DWpcsVJCBxAU",
"outputId": "dae92aaa-5a7f-4acf-8082-03555340ee16"
},
"outputs": [],
"source": [
"#find the correlation of features of the data \n",
"# corr = \n",
"\n",
"# print corr\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"id": "rDqYeuI1BxAW",
"outputId": "e20f0d9a-e76f-4f59-8ebb-11047156049d"
},
"outputs": [],
"source": [
"# Using the correlated df, plot the heatmap \n",
"# set cmap = 'BrBG', annot = True - to get the same graph as shown below \n",
"# set size of graph = (12,8)\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-uMl7P-DBxAX"
},
"source": [
"### **`Observation:`**<br>\n",
"A heatmap contains values representing various shades of the same colour for
each value to be plotted. Usually the darker shades of the chart represent higher
values than the lighter shade. For a very different value a completely different
colour can also be used.\n",
"\n",
"\n",
"The above heatmap plot shows correlation between various variables in the
colored scale of -1 to 1. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
