
PROCEDURE: (DOWNLOAD / INSTALLATION - Windows)

1. Python version 3.7 must be installed beforehand.


2. To check whether Python exists,
Go to Search, type cmd; the Command Prompt appears. Type the commands given below:
>>> python -V
//python --version
Python 3.7.8rc1
>>>python -m pip install numpy
(If already installed, a message will be prompted as "Requirement already satisfied"; otherwise
the installation will continue and complete with a success message.)
>>>python -m pip install scipy
>>>python -m pip install statsmodels
>>>python -m pip install jupyter
>>>python -m pip install pandas

Note:

pip - the Package Installer for Python - is the de facto and recommended package management
system for Python, used to install and manage software packages. It connects to an online
repository of public packages called the Python Package Index (PyPI).
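
After installation, a package can be verified with pip show, which prints the installed version and location; for example:
>>>python -m pip show numpy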

Package installation: NumPy, SciPy, Jupyter, StatsModel, Pandas


RESULT:
PROCEDURE:
If we have Python and PIP already installed on a system, then installation of NumPy is very easy.
Installation – NumPy package:

C:\Users\User>pip install numpy


Once NumPy is installed, import it in your applications by adding the import keyword:
>>> import numpy as np

ARRAY CREATION:

Single-dimensional NumPy Array:


>>> import numpy as np
>>> a=np.array([1,2,3])
>>> print(a)
[1 2 3]

Multi-dimensional Numpy Array:

>>> a=np.array([(1,2,3),(4,5,6)])
>>> print(a)
[[1 2 3]
 [4 5 6]]
>>>import numpy as np
>>>arr = np.array([1, 2, 3, 4, 5])
>>>print(arr)
[1 2 3 4 5]
>>>print(type(arr))
<class 'numpy.ndarray'>
>>>a = np.array(42)
>>>b = np.array([1, 2, 3, 4, 5])
>>>c = np.array([[1, 2, 3], [4, 5, 6]])
>>>d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
>>>print(a.ndim)
0
>>>print(b.ndim)
1
>>>print(c.ndim)
2
>>>print(d.ndim)
3

ARRAY INDEXING:

Array indexing is the same as accessing an array element. We can access an array element by
referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first
element has index 0, and the second has index 1 etc.

>>>import numpy as np
>>>arr = np.array([1, 2, 3, 4])
>>>print(arr[0])
1
>>>print(arr[2])
3
>>>print(arr[4])
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
print(arr[4])
IndexError: index 4 is out of bounds for axis 0 with size 4
>>>arr = np.array([1, 2, 3, 4])
>>>print(arr[2] + arr[3])
7
>>>arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
>>>print('2nd element on 1st row: ', arr[0, 1])
2nd element on 1st row: 2
>>>print('5th element on 2nd row: ', arr[1, 4])
5th element on 2nd row: 10
>>>print('Last element from 2nd dim: ', arr[1, -1])
Last element from 2nd dim: 10
>>>arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
>>>print(arr[0, 1, 2])
6

ARRAY SLICING:

Slicing in Python means retrieving elements from one given index to another given index.

• We pass a slice instead of an index, like this: [start:end].

• We can also define the step, like this: [start:end:step].

If we don't pass start, it is considered 0. If we don't pass end, it is considered the length of the array in that
dimension. If we don't pass step, it is considered 1.

>>>arr = np.array([1, 2, 3, 4, 5, 6, 7])


>>>print(arr[1:5])
[2 3 4 5]
>>>print(arr[4:])
[5 6 7]
>>>print(arr[:4])
[1 2 3 4]
>>>print(arr[-3:-1])
[5 6]
>>>print(arr[1:5:2])
[2 4]
>>>print(arr[::2])
[1 3 5 7]
>>>print(arr[1, 1:4])
Traceback (most recent call last):
File "<pyshell#38>", line 1, in <module>
print(arr[1, 1:4])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
>>>arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
>>>print(arr[0:2, 2])
[3 8]
>>>print(arr[0:2, 1:4])
[[2 3 4]
[7 8 9]]
ARRAY SHAPE / RESHAPE:
Array Shape - NumPy arrays have an attribute called shape that returns a tuple with each index
having the number of corresponding elements.

import numpy as np
>>>arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>>print(arr.shape)
(2, 4)

Array Reshape - By reshaping we can add or remove dimensions or change the number of elements
in each dimension.

#Converting a 1d array to 2d
>>>import numpy as np
>>>arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>>newarr = arr.reshape(4, 3)
>>>print(newarr)
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
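
Reshaping can also flatten an array back to one dimension; passing -1 lets NumPy infer the dimension size. A minimal sketch:

#Converting a 2d array back to 1d
>>>flat = newarr.reshape(-1)
>>>print(flat)
[ 1  2  3  4  5  6  7  8  9 10 11 12]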
ARRAY ITERATION:
Iterating means looping through the elements of an array one by one.
>>>import numpy as np
>>> arr = np.array([1, 2, 3])
>>> for x in arr:
    print(x)

1
2
3

ARRAY JOINING:
Joining is the process of combining contents of two or more arrays in a single array.
>>>import numpy as np
>>>arr1 = np.array([1, 2, 3])
>>>arr2 = np.array([4, 5, 6])
>>>arr = np.concatenate((arr1, arr2))
>>>print(arr)
[1 2 3 4 5 6]

ARRAY SPLITTING:
Splitting is the reverse operation of joining. Splitting breaks one array into multiple
subarrays.
>>>import numpy as np
>>>arr = np.array([1, 2, 3, 4, 5, 6])
>>>newarr = np.array_split(arr,3)
>>>print(newarr)
[array([1, 2]), array([3, 4]), array([5, 6])]
>>>print(np.array_split(arr,5))
[array([1, 2]), array([3]), array([4]), array([5]), array([6])]
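
Note: np.array_split() handles splits that do not divide the array evenly, as above; the stricter np.split() requires an exact division and raises an error otherwise. A minimal sketch:

>>>print(np.split(arr, 3))
[array([1, 2]), array([3, 4]), array([5, 6])]
>>>print(np.split(arr, 5))
ValueError: array split does not result in an equal division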
ARRAY SORTING:
Sorting is the process of arranging elements in an ordered sequence, either ascending or
descending.
>>>import numpy as np
#sorting numbers in ascending order
>>>arr = np.array([3, 2, 0, 1])
>>>print(np.sort(arr))
[0 1 2 3]
#sorting in alphabetical order
>>>arr = np.array(['banana', 'cherry', 'apple'])
>>>print(np.sort(arr))
['apple' 'banana' 'cherry']
SEARCHING ARRAYS:
Searching an array for a certain value returns the indexes where a match is found. To search an array, use the
where() method.
Find the indexes where the value is 4:
>>>arr = np.array([1, 2, 3, 4, 5, 4, 4])
>>>x = np.where(arr == 4)
>>>print(x)
(array([3, 5, 6], dtype=int32),)
>>>arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
>>>x = np.where(arr%2 == 0)
>>>print(x)
(array([1, 3, 5, 7], dtype=int32),)
>>>x = np.where(arr%2 == 1)
>>>print(x)
(array([0, 2, 4, 6], dtype=int32),)
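
There is also a method searchsorted(), which performs a binary search in a sorted array and returns the index where a given value should be inserted to keep the array sorted. A minimal sketch:

>>>arr = np.array([6, 7, 8, 9])
>>>print(np.searchsorted(arr, 7))
1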

DATA TYPES:
NumPy has some extra data types and refers to data types with one-character codes, like i for
integers, u for unsigned integers, etc. Below is a list of all data types in NumPy and the
characters used to represent them.
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type ( void )
>>>import numpy as np
>>>arr = np.array([1, 2, 3, 4])
>>>print(arr.dtype)
int32
>>>arr = np.array(['apple', 'banana', 'cherry'])
>>>print(arr.dtype)
<U6
>>>arr = np.array([1, 2, 3, 4], dtype='S')
>>>print(arr)
[b'1' b'2' b'3' b'4']
>>>print(arr.dtype)
|S1
>>>arr = np.array([1, 2, 3, 4], dtype='i4')
>>>print(arr)
[1 2 3 4]
>>>print(arr.dtype)
int32
>>>arr = np.array(['a', '2', '3'], dtype='i')
Traceback (most recent call last):
File "<pyshell#83>", line 1, in <module>
arr = np.array(['a', '2', '3'], dtype='i')
ValueError: invalid literal for int() with base 10: 'a'
>>>arr = np.array([1, 0, 3])
>>>newarr = arr.astype(bool)
>>>print(newarr)
[ True False True]
>>>print(newarr.dtype)
bool

RESULT:
Create a simple Pandas DataFrame

import pandas as pd
data = {"calories": [420, 380, 390],"duration": [50, 40, 45]}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)

calories duration

0 420 50
1 380 40
2 390 45

Pandas uses the loc attribute to return one or more specified row(s)

Return row 0:
#refer to the row index:
print(df.loc[0])

calories 420
duration 50

Name: 0, dtype: int64

Return rows 0 and 1:

#use a list of indexes:
print(df.loc[[0, 1]])

calories duration

0 420 50
1 380 40

Named Indexes:

With the index argument, we can name our own indexes.

import pandas as pd
data = {"calories": [420, 380, 390],"duration": [50, 40, 45]}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)

calories duration

day1 420 50
day2 380 40
day3 390 45
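
With named indexes, loc returns rows by their index label; for example:

#refer to the named index:
print(df.loc["day2"])

calories 380
duration 40

Name: day2, dtype: int64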

Load Files into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame. Load a comma separated
file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
iso_code ... excess_mortality_cumulative_per_million
0 AFG ... NaN
1 AFG ... NaN
2 AFG ... NaN
3 AFG ... NaN
4 AFG ... NaN
... ... ... ...
166321 ZWE ... NaN
166322 ZWE ... NaN
166323 ZWE ... NaN
166324 ZWE ... NaN
166325 ZWE ... NaN
[166326 rows x 67 columns]

import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13

import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
print(df)
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)

Name Age
rank1 Tom 28
rank2 Jack 34
rank3 Steve 29
rank4 Ricky 42
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
a b c
first 1 2 NaN
second 5 10 20.0

Creating a DataFrame using List:


DataFrame can be created using a single list or a list of lists.

import pandas as pd
# list of strings
lst = ['Pandas', 'SciPy', 'DataFrames', 'NumPy', 'Analytics']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
0 Pandas
1 SciPy
2 DataFrames
3 NumPy
4 Analytics

Creating a DataFrame from a dict of ndarrays/lists: To create a DataFrame from a dict of
ndarrays/lists, all the ndarrays must be of the same length. If an index is passed, then the length of the index
should be equal to the length of the arrays. If no index is passed, then by default the index will be range(n),
where n is the array length.

import pandas as pd
# intialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
print(df)
Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18

Column Selection: In order to select a column in a Pandas DataFrame, we can access the
columns by calling them by their column names.

import pandas as pd
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],'Age':[27, 24, 22, 32],'Address':['Delhi',
'Kanpur', 'Allahabad', 'Kannauj'],'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
# select two columns
print(df[['Name', 'Qualification']])
Name Qualification
0 Jai Msc
1 Princi MA
2 Gaurav MCA
3 Anuj Phd
Row Selection: Pandas provides dedicated methods to retrieve rows from a Data
frame. DataFrame.loc[] is used to retrieve rows by index label. Rows can also be
selected by passing an integer location to the iloc[] indexer, as in the sketch below.
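
A minimal iloc[] sketch on a small hypothetical DataFrame (for illustration):

import pandas as pd
df = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34]})
# iloc selects rows by integer position, regardless of index labels
print(df.iloc[0])    # first row as a Series
print(df.iloc[0:2])  # first two rows as a DataFrame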

File used: country.csv

import pandas as pd
data = pd.read_csv("country.csv", index_col ="iso_code")
first = data.loc["AFG"]
second = data.loc["NOR"]
print(first, "\n\n\n", second)

iso_code continent location date total_cases


AFG Asia Afghanistan 2/24/2020 5
AFG Asia Afghanistan 2/25/2020 5
AFG Asia Afghanistan 2/26/2020 5
AFG Asia Afghanistan 2/27/2020 5
AFG Asia Afghanistan 2/28/2020 5
AFG Asia Afghanistan 2/29/2020 5

iso_code continent location date total_cases


NOR Europe Norway 10/1/2021 189915
NOR Europe Norway 10/2/2021 190224
NOR Europe Norway 10/3/2021 190533
NOR Europe Norway 10/4/2021 191017
NOR Europe Norway 10/5/2021 191599
NOR Europe Norway 10/6/2021 192079
NOR Europe Norway 10/7/2021 192587

Indexing a DataFrame using indexing operator []:


The indexing operator refers to the square brackets following an object.
The .loc and .iloc indexers also use the indexing operator to make selections. In this section, the indexing
operator refers to df[].

Working with Missing Data:


Missing data can occur when no information is provided for one or more items or for a
whole unit. Missing data is a very big problem in real-life scenarios. Missing data is also
referred to as NA (Not Available) values in pandas.

isnull() and notnull():


Both functions help in checking whether a value is NaN or not. These functions can also be used on a Pandas
Series in order to find null values in a series.

import pandas as pd
import numpy as np
dict = {'First Score': [100, 90, np.nan, 95], 'Second Score': [30, 45, 56, np.nan], 'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(dict)
print(df.isnull())

First Score Second Score Third Score

0 False False True


1 False False False
2 True False False
3 False True False
fillna(), replace() and interpolate():

All these functions help in filling null values in a DataFrame. The interpolate() function is
basically used to fill NA values in the DataFrame, but it uses various interpolation techniques to fill the
missing values rather than hard-coding a value.

import pandas as pd
import numpy as np
dict = {'First Score': [100, 90, np.nan, 95], 'Second Score': [30, 45, 56, np.nan], 'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(dict)
print(df.fillna(0))

First Score Second Score Third Score


0 100.0 30.0 0.0
1 90.0 45.0 40.0
2 0.0 56.0 80.0
3 95.0 0.0 98.0
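
replace() and interpolate() work the same way on this DataFrame; a minimal sketch (interpolate() performs linear interpolation by default, estimating a missing value from its neighbours instead of hard-coding one):

#replace all NaN values with -99
print(df.replace(to_replace=np.nan, value=-99))
#fill NaN values by interpolating between the surrounding values
print(df.interpolate(method='linear', limit_direction='forward'))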

Iterating over rows and columns:

A Pandas DataFrame consists of rows and columns, so in order to iterate over a DataFrame we
iterate over it much like a dictionary. To iterate over rows we can use the functions
iterrows() and itertuples(); iteritems() (items() in newer pandas) iterates over columns instead. An iteration
sketch follows the example below.

import pandas as pd
dict = {'name': ["aparna", "pankaj", "sudhir", "Geeku"], 'degree': ["MBA", "BCA", "M.Tech", "MBA"], 'score': [90, 40, 80, 98]}
df = pd.DataFrame(dict)
print(df)

name degree score


0 aparna MBA 90
1 pankaj BCA 40
2 sudhir M.Tech 80
3 Geeku MBA 98
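
The rows of this DataFrame can then be iterated; a minimal sketch using iterrows(), which yields (index, row) pairs:

for index, row in df.iterrows():
    print(index, row['name'], row['score'])

0 aparna 90
1 pankaj 40
2 sudhir 80
3 Geeku 98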

RESULT:
PROCEDURE:

CASE 1: READING DATA FROM EXCEL/CSV FILE

We will use the Pandas library to load the Iris data set CSV file and convert it into a DataFrame, using the
read_csv() method, which reads CSV files.

import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
sepallength sepalwidth petallength petalwidth class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

[150 rows x 5 columns]

# Printing top 5 rows

print(df.head())

sepallength sepalwidth petallength petalwidth class


0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

#Use the shape attribute to get the shape of the dataset.


print(df.shape)
(150, 5)
#To view the columns and their data types
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149

Data columns (total 5 columns):


# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepallength 150 non-null float64
1 sepalwidth 150 non-null float64
2 petallength 150 non-null float64
3 petalwidth 150 non-null float64
4 class 150 non-null object

dtypes: float64(4), object(1)


memory usage: 6.0+ KB
None
The describe() function applies basic statistical computations to the dataset, like extreme values,
count of data points, standard deviation, etc. Any missing or NaN value is automatically skipped.
The describe() function gives a good picture of the distribution of the data.
print(df.describe())
sepallength sepalwidth petallength petalwidth
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000

CASE 2: READING DATA FROM TEXT FILE


file1 = open("/content/sample_data/Basics-Python.txt","r+")
print("Output of Read function is ")
print(file1.read())
print()
Output of Read function is
Python is a very popular general-purpose interpreted, interactive, object
oriented, and high-level programming language. Python is dynamically-typed
and garbage-collected programming language. It was created by Guido van
Rossum during 1985- 1990. Like Perl, Python source code is also available
under the GNU General Public License (GPL).

Python is consistently rated as one of the world's most popular programming


languages. Python is fairly easy to learn, so if you are starting to learn
any programming language then Python could be your great choice. Today
various Schools, Colleges and Universities are teaching Python as their
primary programming language. There are many other good reasons which makes
Python as the top choice of any programmer:

Python is Open Source which means its available free of cost.


Python is simple and so easy to learn
Python is versatile and can be used to create many different things.
Python has powerful development libraries include AI, ML etc.
Python is much in demand and ensures high salary

CASE 3: READING DATA FROM WEB

//To download a file from the web using the wget module


# wget module to be installed

pip install wget


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab
wheels/public/simple/
Collecting wget
Downloading wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
Building wheel for wget (setup.py) ... done
Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9674
sha256=c0e498fded138e8bf764bbcda6a413bfac3d6338f40f4be9b5ce9384baa4c957
Stored in directory:
/root/.cache/pip/wheels/a1/b6/7c/0e63e34eb06634181c63adacca38b79ff8f35c37e3c1
3e3c02
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
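
Once installed, a file can be downloaded with wget.download(), which saves the file and returns its local filename; a minimal sketch (the URL here is hypothetical):

import wget
url = "https://example.com/sample_data/data.csv"   # hypothetical URL, for illustration
filename = wget.download(url)
print(filename)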
Descriptive Statistics - used to understand your data by calculating various statistical
values for the given numeric variables. For any given data, our approach is to understand it and calculate
various statistical values; this helps us identify which statistical tests can be applied to the
data.
Under descriptive statistics we can calculate the following values:
1. Central tendency - mean, median, mode
2. Dispersion - variance, standard deviation, range, interquartile range (IQR)
3. Skewness - symmetry of data about the mean value
4. Kurtosis - peakedness of data at the mean value
We have system-defined functions to get these values for any given dataset.
# Changing the column headers in Iris dataset
import pandas as pd
import numpy as np
df = pd.read_csv("/content/sample_data/Iris.csv")
data=pd.DataFrame(df,columns=list("ABCDE"))
print(data)
A B C D E
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
.. .. .. .. .. ..
145 NaN NaN NaN NaN NaN
146 NaN NaN NaN NaN NaN
147 NaN NaN NaN NaN NaN
148 NaN NaN NaN NaN NaN
149 NaN NaN NaN NaN NaN

[150 rows x 5 columns]
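
Note: passing columns=list("ABCDE") to the DataFrame constructor selects columns with those names; since the Iris file has no columns named A to E, every value comes out NaN, as shown above. To actually rename the headers so that the calls below work, assign the new names to df.columns (a minimal sketch):

df.columns = list("ABCDE")   # rename the five columns in place
data = df
print(data.head())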

1. Calculating Central Tendency


data['A'].mean()
data['A'].median()
data['A'].mode()
#mean - the average value of the given numeric values
#median - the middle-most value of the given values
#mode - the most frequently occurring value of the given numeric variable
# Mean, Median, Mode on Iris dataset
print(df)

sepallength sepalwidth petallength petalwidth class


0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

[150 rows x 5 columns]


df['sepallength'].mean()
5.843333333333334
df['sepalwidth'].median()
3.0
df['petalwidth'].mode()
0
0.2
dtype: float64
df['class'].mode()
0 Iris-setosa
1 Iris-versicolor
2 Iris-virginica
dtype: object

2. Dispersion
Dispersion describes the variation present in a given variable, i.e. how close to or far from the
mean the values lie.
Variance - gives the average squared deviation from the mean value
Standard Deviation - the square root of the variance
Range - the difference between the max and min values
InterQuartile Range (IQR) - the difference between Q3 and Q1, where Q3 is the 3rd
quartile value and Q1 is the 1st quartile value.
data['A'].var()
data['A'].std()
data['A'].max()-data['A'].min()
data['A'].quantile([.25,.5,.75])
df["sepalwidth"].var()
0.1880040268456376
df["sepallength"].std()
0.4335943113621737
df["sepallength"].max()-df["sepalwidth"].min()
5.9
df["petalwidth"].quantile([.25,.5,.75])
0.50 1.3
0.75 1.8
Name: petalwidth, dtype: float64

3. Skewness
Skewness measures the symmetry of data about the mean value. Symmetry
means an equal distribution of observations above and below the mean.
skewness = 0: the data is symmetric about the mean.
skewness = Negative: the data is not symmetric and the left-side tail is longer than the right-side tail of the
density plot.
skewness = Positive: the data is not symmetric and the right-side tail is longer than the left-side tail of the
density plot.
We can find the skewness of a given variable with the function below.
data['A'].skew()
df["sepallength"].skew()
0.3149109566369728
df["sepalwidth"].skew()
0.3340526621720866
df["class"].skew()
ValueError: could not convert string to float: 'Iris-setosa'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/nanops.py in _f(*args,
**kwargs)
99 # object arrays that contain strings
100 if is_object_dtype(args[0]):
--> 101 raise TypeError(e) from e
102 raise
103
TypeError: could not convert string to float: 'Iris-setosa

4. Kurtosis
Kurtosis is used to define the peakedness (or flatness) of a density plot (normal distribution
plot). Dr. Wheeler defines kurtosis as: "The kurtosis parameter is a measure of the
combined weight of the tails relative to the rest of the distribution." This means we measure the tail
heaviness of a given distribution.

kurtosis = 0: the peakedness of the graph is equal to that of the normal distribution.
kurtosis = Negative: the peakedness of the graph is less than the normal distribution (flatter plot).
kurtosis = Positive: the peakedness of the graph is more than the normal distribution (more peaked plot).
We can find the kurtosis of a given variable with the function below.
data['A'].kurt()
df["sepalwidth"].kurt()
0.2907810623654279
df["sepallength"].kurt()
-0.5520640413156395
Let us see the graphical representation of a given variable and interpret the skewness and
peakedness of its distribution.
import seaborn as sns
sns.distplot(df["sepallength"], hist=True, kde=True)
/usr/local/lib/python3.7/dist-packages/seaborn/distributions.py:2619:
FutureWarning: `distplot` is a deprecated function and will be removed in a
future version. Please adapt your code to use either `displot` (a figure
level function with similar flexibility) or `histplot` (an axes-level
function for histograms).
warnings.warn(msg, FutureWarning)
<matplotlib.axes._subplots.AxesSubplot at 0x7fa94e2957d0>
Density plot of variable 'sepallength'

In the above graph, we can clearly see that the left and right sides of the plot are roughly equally
distributed, so the data is close to symmetric. The histogram peak sits close to the density line with a
relatively flat plot, which means the kurtosis of this distribution is near normal.

Checking Missing Values

Missing values can occur when no information is provided for one or more items or for a whole unit.
We will use the isnull() method.

df.isnull().sum()
sepallength 0
sepalwidth 0
petallength 0
petalwidth 0
class 0
dtype: int64

Checking Duplicates

Let’s see if our dataset contains any duplicates or not. The Pandas drop_duplicates() method
helps in removing duplicates from the data frame.
#interactive table view
data = df.drop_duplicates(subset ="class",)
data
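
To count duplicate rows without dropping them, duplicated() can be used; a minimal sketch:

print(df.duplicated().sum())   # number of fully duplicated rows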

df.value_counts("sepalwidth")
sepalwidth
3.0 26
2.8 14
3.2 13
3.4 12
3.1 12
2.9 10
2.7 9
2.5 8
3.3 6
3.5 6
3.8 6
2.6 5
2.3 4
2.4 3
2.2 3
3.6 3
3.7 3
3.9 2
4.1 1
4.2 1
2.0 1
4.0 1
4.4 1
dtype: int64
Data Visualization

Visualizing the target column - Our target column will be the sepalwidth column because, in the end, we
need the result according to sepalwidth only. Let’s see a countplot of it. (We will use the Matplotlib
and Seaborn libraries for the data visualization.)

import seaborn as sns


import matplotlib.pyplot as plt
sns.countplot(x="sepalwidth", data=df, )
plt.show()

Comparing Sepal Length and Sepal Width

import seaborn as sns


import matplotlib.pyplot as plt
sns.scatterplot(x="sepallength", y="sepalwidth",hue="class", data=df, )
# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()
Histograms

Histograms let us see the distribution of data for various columns. They can be used for univariate as well as
bivariate analysis.

import seaborn as sns


import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df["sepallength"], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df["sepalwidth"], bins=5);
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df["petallength"], bins=6);
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df["petalwidth"], bins=6);

Output:
• The highest bar for sepal length (a frequency between 30 and 35) falls between 5.5 and 6.
• The highest bar for sepal width (a frequency of around 70) falls between 3.0 and 3.5.
• The highest bar for petal length (a frequency of around 50) falls between 1 and 2.
• The highest bar for petal width (a frequency between 40 and 50) falls between 0.0 and 0.5.

RESULT:
PROCEDURE:

(5a) Univariate Analysis - Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and
Kurtosis

import pandas as pd
import numpy as np
df = pd.read_csv("diabetes.csv")
print(df)

Age Gender Polyuria ... Alopecia Obesity class


0 40 Male No ... Yes Yes Positive
1 58 Male No ... Yes No Positive
2 41 Male Yes ... Yes No Positive
3 45 Male No ... No No Positive
4 60 Male Yes ... Yes Yes Positive
.. ... ... ... ... ... ... ...
515 39 Female Yes ... No No Positive
516 48 Female Yes ... No No Positive
517 58 Female Yes ... No Yes Positive
518 32 Female No ... Yes No Negative
519 42 Male No ... No No Negative
[520 rows x 17 columns]

>>>print(df['Age'].mean())
48.02884615384615
>>>print(df['Age'].median())
47.5
>>>print(df['Age'].mode())
0 35
dtype: int64
>>>print(df["Age"].var())
147.65812583370388
>>>print(df["Age"].std())
12.151465995249458
>>>print(df["Age"].skew())
0.3293593578272701
>>>print(df["Age"].kurt())
-0.19170941407070163

Data-Visualization:(pima-diabetes.csv)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics as st
#Load the data
filepath = 'pima-diabetes.csv'
df = pd.read_csv(filepath)
Data_X= df.copy(deep=True)
Data_X= Data_X.drop(['Outcome'],axis=1)
plt.rcParams['figure.figsize']=[40,40]
#Plotting Histogram of Data
Data_X.hist(bins=40)
plt.show()
(5b) Bivariate Analysis – Linear and Logistic Regression

Simple Linear Regression - It is an approach for predicting a response using a single feature. It is
assumed that the two variables are linearly related, so we try to find a linear function that predicts the
response value (y) as accurately as possible as a function of the feature or independent variable (x). Let us
consider a dataset where we have a response value y for every feature x (the example arrays x and y appear in the source code below).

Now, the task is to find the line that best fits these points, so that we can predict the response
for any new feature value (i.e. a value of x not present in the dataset). This line is called the regression line.
The equation of the regression line is represented as: h(x_i) = b_0 + b_1*x_i
Here,
• h(x_i) represents the predicted response value for the ith observation.
• b_0 and b_1 are the regression coefficients and represent the y-intercept and slope of the regression line
respectively.
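
For least-squares estimation the coefficients follow from the deviations about the means; this is exactly what the estimate_coef() function below computes:

SS_xy = Σ(x_i * y_i) - n * x̄ * ȳ
SS_xx = Σ(x_i * x_i) - n * x̄ * x̄
b_1 = SS_xy / SS_xx
b_0 = ȳ - b_1 * x̄

where x̄ and ȳ are the means of x and y, and n is the number of observations.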

SOURCE CODE:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output:

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
And the graph obtained looks like this:
Logistic Regression:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
filepath = 'pima-diabetes.csv'
df = pd.read_csv(filepath)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
LR = LogisticRegression()
LR.fit(X_train, y_train)
y_pred = LR.predict(X_test)
print("Accuracy ", LR.score(X_test, y_test)*100)
sns.set(font_scale=1.5)
cm = confusion_matrix(y_pred, y_test)
sns.heatmap(cm, annot=True, fmt='g')
plt.show()
(5c) Multiple Regression Analysis
import pandas as pd
from sklearn import linear_model
df = pd.read_csv("pima-diabetes.csv")
X = df[['Glucose', 'BloodPressure']]
y = df['Age']
regr = linear_model.LinearRegression()
regr.fit(X, y)
#Predict age based on Glucose and BloodPressure
predictedage = regr.predict([[185, 145]])
print(predictedage)
Output:
[48.13025197]
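
The fitted coefficients and intercept can also be inspected to see how much the predicted Age changes per unit of each feature; a minimal sketch:

print(regr.coef_)        # one coefficient each for Glucose and BloodPressure
print(regr.intercept_)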

(5d) Comparative Analysis


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics as st
#Load the data
filepath = 'pima-diabetes.csv'
df = pd.read_csv(filepath)
plt.style.use("classic")
plt.figure(figsize=(10,10))
sns.distplot(df[df['Outcome'] == 0]["Pregnancies"], color='green') # Healthy - green
sns.distplot(df[df['Outcome'] == 1]["Pregnancies"], color='red') # Diabetic - Red
plt.title('Healthy vs Diabetic by Pregnancy', fontsize=15)
plt.xlim([-5,20])
plt.grid(linewidth = 0.7)
plt.show()

From the above graph, we can infer that pregnancy is not a likely cause of diabetes, as the distributions for
the Healthy and Diabetic groups are almost the same.
//diabetes.csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics as st
#Load the data
filepath = 'diabetes.csv'
df = pd.read_csv(filepath)
plt.style.use("classic")
plt.figure(figsize=(10,10))
sns.distplot(df[df['Gender'] == 'Male']["Age"], color='green')
sns.distplot(df[df['Polyuria'] == 'No']["Age"], color='red')
plt.title('Male vs Polyuria by Age', fontsize=15)
plt.xlim([0, 100])  # cover the Age range (the earlier [-5, 20] limit suits Pregnancies, not Age)
plt.grid(linewidth = 0.7)
plt.show()

RESULT:
SOURCE CODE:

# Normal Curve
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot
import pandas as pd
from pandas.plotting import scatter_matrix
import plotly.express as px
from scipy.stats import norm
import statistics as st
from mpl_toolkits import mplot3d
data = pd.read_csv('diabetes.csv')
x = np.sort(data.Glucose[0:50])   # sort the values so the pdf plots as a smooth curve
mean = st.mean(x)
sd = st.stdev(x)
pyplot.plot(x, norm.pdf(x, mean, sd))
pyplot.title("Normal plot")
pyplot.show()

OUTPUT:

#density plot
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot
import pandas as pd
from pandas.plotting import scatter_matrix
import plotly.express as px
from scipy.stats import norm
import statistics as st
from mpl_toolkits import mplot3d
data = pd.read_csv('diabetes.csv')
data.plot(kind='density', subplots=True, layout=(3,3), sharex=False)
pyplot.show()
OUTPUT:

#contour plot
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot
import pandas as pd
from pandas.plotting import scatter_matrix
import plotly.express as px
from scipy.stats import norm
import statistics as st
from mpl_toolkits import mplot3d
data = pd.read_csv('diabetes.csv')
x=data.BloodPressure[0:2]
y=data.Glucose[0:2]
z=((data.BMI[0:2],data.Age[0:2]))
pyplot.figure(figsize=(7,5))
pyplot.title("Contour plot")
contours=pyplot.contour(x,y,z)
pyplot.show()

OUTPUT:
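
With only two rows of data the contour above is degenerate; pyplot.contour() is normally given z values evaluated over a grid. A minimal sketch with synthetic gridded data (the surface function is hypothetical, for illustration):

x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))   # hypothetical bell-shaped surface
pyplot.figure(figsize=(7, 5))
pyplot.title("Contour plot")
pyplot.contour(X, Y, Z)
pyplot.show()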
#correlation plot
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot
import pandas as pd
from pandas.plotting import scatter_matrix
import plotly.express as px
from scipy.stats import norm
import statistics as st
from mpl_toolkits import mplot3d
data = pd.read_csv('diabetes.csv')
names=["Pregnancies", "Glucose","BloodPressure","SkinThickness","Insulin",
"BMI","DiabetesPedigreeFunction", "Age"]
correlation = data.corr()
fig = pyplot.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(correlation, vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = np.arange(0,8,1)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(names)
ax.set_yticklabels(names)
pyplot.title("Correlation")
pyplot.show()

OUTPUT:

#scatter plot
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot
import pandas as pd
from pandas.plotting import scatter_matrix
import plotly.express as px
from scipy.stats import norm
import statistics as st
from mpl_toolkits import mplot3d
data = pd.read_csv('diabetes.csv')
scatter_matrix(data)
pyplot.show()
OUTPUT:

#Histograms
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot
import pandas as pd
from pandas.plotting import scatter_matrix
import plotly.express as px
from scipy.stats import norm
import statistics as st
from mpl_toolkits import mplot3d
data = pd.read_csv('diabetes.csv')
data.hist()
pyplot.show()

OUTPUT:

#three dimensional plotting


import numpy as np
import matplotlib as mpl
from matplotlib import pyplot
import pandas as pd
from pandas.plotting import scatter_matrix
import plotly.express as px
from scipy.stats import norm
import statistics as st
from mpl_toolkits import mplot3d

data = pd.read_csv('diabetes.csv')
fig = pyplot.figure()

ax = pyplot.axes(projection='3d')
zline = np.array(data.BMI)
xline = np.sin(zline)
yline = np.cos(zline)

ax.plot3D(xline, yline, zline, 'gray')


zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Blues')
pyplot.show()

OUTPUT:

RESULT:
PROCEDURE:
#Basemap and other packages installation
!pip install basemap
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting basemap
Downloading basemap-1.3.6-cp38-cp38-manylinux1_x86_64.whl (863 kB)
863 kB 14.5 MB/s
Collecting basemap-data<1.4,>=1.3.2
Downloading basemap_data-1.3.2-py2.py3-none-any.whl (30.5 MB)
30.5 MB 1.4 MB/s
Requirement already satisfied: matplotlib<3.7,>=1.5 in /usr/local/lib/python3.8/dist-packages (from
basemap) (3.2.2)
Collecting pyproj<3.5.0,>=1.9.3
Downloading pyproj-3.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
7.8 MB 55.4 MB/s
Collecting pyshp<2.4,>=1.2
Downloading pyshp-2.3.1-py2.py3-none-any.whl (46 kB)
46 kB 3.6 MB/s
Collecting numpy<1.24,>=1.22
Downloading numpy-1.23.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
17.1 MB 46.7 MB/s
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.8/dist-packages (from
matplotlib<3.7,>=1.5->basemap) (2.8.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.8/dist-
packages (from matplotlib<3.7,>=1.5->basemap) (3.0.9)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from
matplotlib<3.7,>=1.5->basemap) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from
matplotlib<3.7,>=1.5->basemap) (1.4.4)
Requirement already satisfied: certifi in /usr/local/lib/python3.8/dist-packages (from pyproj<3.5.0,>=1.9.3->
basemap) (2022.9.24)

Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-


dateutil>=2.1->matplotlib<3.7,>=1.5->basemap) (1.15.0)
Installing collected packages: numpy, pyshp, pyproj, basemap-data, basemap
Attempting uninstall: numpy
Found existing installation: numpy 1.21.6
Uninstalling numpy-1.21.6:
Successfully uninstalled numpy-1.21.6
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
This behaviour is the source of the following dependency conflicts.
scipy 1.7.3 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.23.5 which is incompatible.
Successfully installed basemap-1.3.6 basemap-data-1.3.2 numpy-1.23.5 pyproj-3.4.0 pyshp-2.3.1
!pip install basemap-data
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: basemap-data in /usr/local/lib/python3.8/dist-packages (1.3.2)
!pip install basemap-data-hires
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting basemap-data-hires
Downloading basemap_data_hires-1.3.2-py2.py3-none-any.whl (91.1 MB)
91.1 MB 57 kB/s
Installing collected packages: basemap-data-hires
Successfully installed basemap-data-hires-1.3.2
!pip install chain
Requirement already satisfied: chain in /usr/local/lib/python3.8/dist-packages (1.0)
(Note: the chain used in the source code below is itertools.chain from the Python standard library, so no separate installation is actually required.)

SOURCE CODE:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5)
plt.show()

The useful thing is that the globe shown here is not a mere image; it is a fully functioning Matplotlib axes
that understands spherical coordinates and allows us to easily overplot data on the map.
fig = plt.figure(figsize=(8, 8))
m=Basemap(projection='lcc', resolution=None,width=8E6, height=8E6,lat_0=45, lon_0=-100)
m.etopo(scale=0.5, alpha=0.5)
# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

Map Projections:
The Basemap package implements several dozen such projections, all referenced by a short format code.
from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # keys contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')

Cylindrical projections

The simplest of map projections are cylindrical projections, in which lines of constant latitude and
longitude are mapped to horizontal and vertical lines, respectively. This type of mapping represents
equatorial regions quite well, but results in extreme distortions near the poles. The spacing of latitude
lines varies between different cylindrical projections, leading to different conservation properties, and
different distortion near the poles.
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
llcrnrlat=-90, urcrnrlat=90,
llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)


Pseudo-cylindrical projections
Pseudo-cylindrical projections relax the requirement that meridians (lines of constant longitude) remain
vertical; the Mollweide projection (projection='moll') is one common example of this, in which all
meridians are elliptical arcs. It is constructed so as to preserve area across the map: though there are
distortions near the poles, the area of small patches reflects the true area. Other pseudo-cylindrical
projections are the sinusoidal (projection='sinu') and Robinson (projection='robin') projections.
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None,lat_0=0, lon_0=0)
draw_map(m)
Perspective projections
Perspective projections are constructed using a particular choice of perspective point, similar to if you
photographed the Earth from a particular point in space (a point which, for some projections, technically
lies within the Earth!). One common example is the orthographic projection (projection='ortho'), which
shows one side of the globe as seen from a viewer at a very long distance.
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)
draw_map(m)

Conic projections
A Conic projection projects the map onto a single cone, which is then unrolled. This can lead to very good
local properties, but regions far from the focus point of the cone may become much distorted.
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50, lat_1=45, lat_2=55,
width=1.6E7, height=1.2E7)
draw_map(m)

Example – Dataset (California_cities.csv)


import pandas as pd
cities = pd.read_csv('/content/sample_data/california_cities.csv')
# Extract the data we're interested in
lat = cities['latd'].values
lon = cities['longd'].values
population = cities['population_total'].values
area = cities['area_total_km2'].values
# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m=Basemap(projection='lcc', resolution='h', lat_0=37.5, lon_0=-119, width=1E6, height=1.2E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,c=np.log10(population), s=area, cmap='Reds', alpha=0.5)
# 3. create colorbar and legend
plt.colorbar(label=r'$\log_{10}({\rm population})$')
plt.clim(3, 7)
# make legend with dummy points
for a in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.5, s=a, label=str(a) + ' km$^2$')
plt.legend(scatterpoints=1, frameon=False, labelspacing=1, loc='lower left');

This shows us where larger populations of people have settled in California: they are clustered near the coast in
the Los Angeles and San Francisco areas, stretched along the highways in the flat central valley, and avoiding
almost completely the mountainous regions along the borders of the state.

RESULT:
