Unit - V
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY, CHENNAI.
21CSS101J – Programming for Problem Solving
Unit 5
UNIT V
(TOPICS COVERED)
Creating NumPy Arrays - NumPy Indexing - NumPy Array Attributes - Slicing using NumPy - Descriptive Statistics in NumPy: Percentile - Variance in NumPy –
UNIT-5
NumPy (Numerical Python)
NumPy
Stands for Numerical Python
Is the fundamental package for high-performance computing and data analysis in Python
NumPy is so important for numerical computation in Python because it is designed for efficiency on large arrays of data.
It provides:
• ndarray, for creating multidimensional arrays. An ndarray stores its data internally in a contiguous block of memory, independent of other built-in Python objects, so it uses much less memory than built-in Python sequences.
• Standard math functions for fast operations on entire arrays of data without having to write loops.
NumPy arrays are important because they enable you to express batch operations on data without writing any for loops. This is called vectorization.
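A minimal sketch of vectorization, using illustrative values: the same element-wise computation is written first as an explicit loop over a Python list and then as a single expression on an ndarray.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Loop version: build the result one element at a time
squares_loop = []
for v in values:
    squares_loop.append(v * v + 1)

# Vectorized version: one expression applied to the whole array at once
arr = np.array(values)
squares_vec = arr * arr + 1

print(squares_loop)           # [2.0, 5.0, 10.0, 17.0]
print(squares_vec.tolist())   # [2.0, 5.0, 10.0, 17.0]
```

Both produce the same numbers; the vectorized form pushes the loop down into NumPy's compiled code.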
NumPy ndarray vs list
One of the key features of NumPy is its N-dimensional array
object, or ndarray, which is a fast, flexible container for
large datasets in Python.
Whenever you see “array,” “NumPy array,” or “ndarray” in the
text, with few exceptions they all refer to the same thing: the
ndarray object.
NumPy-based algorithms are generally 10 to 100 times faster
(or more) than their pure Python counterparts and use
significantly less memory.
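import numpy as np
import timeit
my_arr = np.arange(1000000)        # NumPy array holding 0 .. 999999
my_list = list(range(1000000))     # the equivalent built-in Python list
print(timeit.timeit(lambda: my_arr * 2, number=10))                # seconds for 10 vectorized doublings
print(timeit.timeit(lambda: [x * 2 for x in my_list], number=10))  # seconds for 10 list-comprehension doublings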
ndarray
• An ndarray stores homogeneous data: all of its elements have the same type.
• Every array has a shape (the size of each dimension) and a dtype (the data type of its elements).
• It supports convenient slicing, indexing and efficient vectorized computation.
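A short sketch of the shape and dtype attributes on an illustrative 2 x 3 array, plus what happens when ints and floats are mixed in one array:

```python
import numpy as np

data = np.array([[1.5, -0.1, 3.0], [0.0, -3.0, 6.5]])

print(data.shape)  # (2, 3): 2 rows, 3 columns
print(data.dtype)  # float64: every element has the same type

# Mixing ints and floats: NumPy converts everything to one common dtype
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64
```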
1-D Arrays
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent a matrix or a 2nd-order tensor.
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
These are often used to represent a 3rd-order tensor.
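A minimal sketch creating one array of each kind with np.array(); the values are illustrative, and the shapes show how each extra level of nesting adds a dimension:

```python
import numpy as np

a1 = np.array([1, 2, 3])                   # 1-D: a sequence of scalars
a2 = np.array([[1, 2, 3], [4, 5, 6]])      # 2-D: elements are 1-D arrays (a matrix)
a3 = np.array([[[1, 2], [3, 4]],
               [[5, 6], [7, 8]]])          # 3-D: elements are 2-D arrays (matrices)

print(a1.shape, a2.shape, a3.shape)  # (3,) (2, 3) (2, 2, 2)
```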
NumPy arrays provide the ndim attribute, which returns an integer telling us how many dimensions the array has.
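A quick check of ndim on 1-D, 2-D and 3-D arrays (illustrative values):

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array([[1, 2, 3], [4, 5, 6]])
a3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

for arr in (a1, a2, a3):
    print(arr.ndim)  # prints 1, then 2, then 3
```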
Negative Indexing
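Negative indices count backwards from the end of a dimension, so -1 refers to the last element. A short sketch on an illustrative 2-D array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[0, -1])  # 5: last element of the first row
print(arr[1, -2])  # 9: second-to-last element of the second row
print(arr[-1, 0])  # 6: first element of the last row
```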
Slicing arrays
• Slicing in Python means taking elements from one given index to another given index (the end index itself is not included).
• We pass a slice instead of an index, like this: [start:end].
• We can also define the step, like this: [start:end:step].
• If we don't pass start, it is considered 0.
• If we don't pass end, it is considered the length of the array in that dimension.
• If we don't pass step, it is considered 1.
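The slicing rules above, applied to an illustrative 1-D array (.tolist() is used only to make the printed results plain Python lists):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5].tolist())    # [2, 3, 4, 5]: index 1 up to, not including, 5
print(arr[4:].tolist())     # [5, 6, 7]: end omitted, so it runs to the end
print(arr[:4].tolist())     # [1, 2, 3, 4]: start omitted, so it starts at 0
print(arr[1:5:2].tolist())  # [2, 4]: step of 2 takes every second element
print(arr[-3:-1].tolist())  # [5, 6]: negative indices count from the end
```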
2D Array
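For a 2-D array, a slice (or a single index) can be given for each dimension, separated by a comma: rows first, then columns. A sketch with illustrative values:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Row index 1, columns 1 up to (not including) 4
print(arr[1, 1:4].tolist())    # [7, 8, 9]
# Both rows, column index 2
print(arr[0:2, 2].tolist())    # [3, 8]
# Both rows, columns 1 to 3: the result is itself a 2-D array
print(arr[0:2, 1:4].tolist())  # [[2, 3, 4], [7, 8, 9]]
```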
• A copy owns its data: changes made to the copy do not affect the original array, and changes made to the original array do not affect the copy.
• A view does not own its data: changes made to the view affect the original array, and changes made to the original array affect the view.
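A minimal sketch of the difference, using the copy() and view() methods and the base attribute (which is None when an array owns its data):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

c = arr.copy()  # owns its own data
v = arr.view()  # shares its data with arr

arr[0] = 42     # change the original array

print(c.tolist())  # [1, 2, 3, 4, 5]: the copy is unaffected
print(v.tolist())  # [42, 2, 3, 4, 5]: the view sees the change

print(c.base is None)  # True: the copy owns its data
print(v.base is arr)   # True: the view's data belongs to arr
```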
Joining NumPy Arrays
We pass a sequence of arrays that we want to join to the concatenate() function, along with
the axis. If axis is not explicitly passed, it is taken as 0.
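A short sketch of np.concatenate() on two illustrative 2 x 2 arrays, first with the default axis and then with axis=1:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 (the default): b's rows go below a's rows, shape (4, 2)
print(np.concatenate((a, b)).tolist())          # [[1, 2], [3, 4], [5, 6], [7, 8]]
# axis=1: b's columns go to the right of a's columns, shape (2, 4)
print(np.concatenate((a, b), axis=1).tolist())  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```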
Splitting NumPy Arrays
We use array_split() to split arrays: we pass it the array we want to split and the number of splits.
If the array cannot be divided evenly, array_split() adjusts from the end accordingly, so the last parts get fewer elements.
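A minimal sketch of np.array_split() on an illustrative 6-element array, once with an even split and once with an uneven one:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])  # [[1, 2], [3, 4], [5, 6]]

# 6 elements do not divide evenly into 4 parts, so the later parts are shorter
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])  # [[1, 2], [3, 4], [5], [6]]
```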
Searching Arrays
You can search an array for a certain value and return the indexes that match. To search an array, use the where() function.
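A short sketch of np.where() with a single condition argument; it returns a tuple holding one array of indexes per dimension, so for a 1-D array the indexes are in element [0]. The data is illustrative:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# Indexes where the value equals 4
x = np.where(arr == 4)
print(x[0].tolist())  # [3, 5, 6]

# Any condition works: indexes of the even values
print(np.where(arr % 2 == 0)[0].tolist())  # [1, 3, 5, 6]
```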
Sorting
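np.sort() returns a sorted copy of an array and leaves the original unchanged; it works on numbers and strings, and on a 2-D array it sorts each row by default. A sketch with illustrative data:

```python
import numpy as np

arr = np.array([3, 2, 0, 1])
print(np.sort(arr).tolist())    # [0, 1, 2, 3]
print(arr.tolist())             # [3, 2, 0, 1]: the original is unchanged

words = np.array(["banana", "cherry", "apple"])
print(np.sort(words).tolist())  # ['apple', 'banana', 'cherry']

grid = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(grid).tolist())   # [[2, 3, 4], [0, 1, 5]]: each row sorted
```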
Operations between arrays and scalars
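Arithmetic between equal-sized arrays is applied element by element, and arithmetic with a scalar applies the scalar to every element. A minimal sketch with an illustrative array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print((arr * arr).tolist())  # [[1, 4, 9], [16, 25, 36]]: element-wise product
print((arr - arr).tolist())  # [[0, 0, 0], [0, 0, 0]]
print((arr * 10).tolist())   # [[10, 20, 30], [40, 50, 60]]: scalar applied to every element
print((arr + 1).tolist())    # [[2, 3, 4], [5, 6, 7]]
```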
Array creation functions
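A sketch of a few common NumPy array creation functions; this is a selection, not a complete list:

```python
import numpy as np

print(np.zeros(3).tolist())           # [0.0, 0.0, 0.0]
print(np.ones((2, 2)).tolist())       # [[1.0, 1.0], [1.0, 1.0]]
print(np.arange(0, 10, 3).tolist())   # [0, 3, 6, 9]: like range(), but returns an ndarray
print(np.linspace(0, 1, 5).tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]: 5 evenly spaced points
print(np.eye(2).tolist())             # [[1.0, 0.0], [0.0, 1.0]]: identity matrix
print(np.full(3, 7).tolist())         # [7, 7, 7]: every element set to one value
```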
Numpy Indexing
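Variance in NumPy
The variance of a set of values $x_i$ with mean $\mu$ is
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
where N is the total number of elements or the frequency of the distribution.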
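Parameters of np.var():
a: Array containing the numbers whose variance is computed.
axis: Axis or axes along which the variance is computed.
dtype: Type to use in computing the variance.
out: Alternate output array in which to place the result.
ddof: Delta Degrees of Freedom; the divisor used in the calculation is N - ddof (ddof is 0 by default).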
keepdims: If this is set to True, the axes which are reduced are left
in the result as dimensions with size one
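A minimal sketch of np.var() with illustrative data, showing the default divisor N, the effect of ddof=1 (divisor N - 1), and the axis and keepdims parameters:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Mean is 5; squared deviations sum to 32; divisor N = 8
print(np.var(data))          # 4.0
# Divisor N - ddof = 7, giving 32/7 (about 4.571)
print(np.var(data, ddof=1))

m = np.array([[1, 2], [3, 4]])
print(np.var(m, axis=0).tolist())              # [1.0, 1.0]: variance of each column
print(np.var(m, axis=1, keepdims=True).shape)  # (2, 1): reduced axis kept with size one
```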
Pandas Series
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Output:
0    1
1    7
2    2
dtype: int64
Labels
• If nothing else is specified, the values are labeled with their index number. The first value has index 0, the second value has index 1, etc.
• This label can be used to access a specified value.
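A short sketch of accessing a value through its default label, using the same list as the Series example:

```python
import pandas as pd

myvar = pd.Series([1, 7, 2])

print(myvar.index.tolist())  # [0, 1, 2]: default labels
print(myvar[0])              # 1: the value labeled 0
```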
Create Labels
With the index argument, you can name your own labels.
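A minimal sketch of the index argument; the labels "x", "y", "z" are illustrative:

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

print(myvar.index.tolist())  # ['x', 'y', 'z']
print(myvar["y"])            # 7: access a value by its label
```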
Key/Value Objects as Series
You can also use a key/value object, like a dictionary, when
creating a Series. The keys of the dictionary become the labels.
Example
Create a simple Pandas Series from a dictionary:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}  # dictionary keys become the labels
myvar = pd.Series(calories)
print(myvar)
To select only some of the items in the dictionary, use the index
argument and specify only the items you want to include in the
Series.
Example
Create a Series using only data from "day1" and "day2":
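import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index=["day1", "day2"])  # keep only these two keys
print(myvar)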
Pandas DataFrame
It is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
Example
Create a DataFrame from a dictionary of two lists (each list becomes a column):
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)
What is a DataFrame?
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array, or a table with rows and columns.
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns. Pandas uses the loc attribute to return one or more specified rows.
Example
Return row 0:
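A minimal sketch of loc, reusing the calories/duration data from the examples above: a single label returns one row as a Series, and a list of labels returns a DataFrame.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# Row 0 as a Series
print(df.loc[0])

# Rows 0 and 1 as a DataFrame
print(df.loc[[0, 1]])
```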
Example
Add a list of names to give each row a name:
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])  # one name per row
print(df)
Locate Named Indexes
Use the named index in the loc attribute to return the specified
row(s).
Example
Return "day2":
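A short sketch of selecting a row by its named index with loc, using the same data and row names as above:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# Use the row name with loc
print(df.loc["day2"])
```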
• A simple way to store big data sets is to use CSV files (Comma-Separated Values).
• CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
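The read_csv() example needs a file named data.csv on disk. As a self-contained sketch, the snippet below passes read_csv() an in-memory string through io.StringIO instead; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data.csv
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
45,109,282.4
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)  # (3, 3): the first line became the column header
```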
max_rows
• The number of rows shown when a DataFrame is printed is defined in the Pandas option settings (display.max_rows).
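A minimal sketch of reading and changing this option through pd.options.display.max_rows and pd.get_option(); the value 9999 is an arbitrary illustrative choice:

```python
import pandas as pd

# Current limit on how many rows print() shows before truncating
print(pd.options.display.max_rows)  # 60 unless it has been changed

# Raise the limit so larger DataFrames are printed in full
pd.options.display.max_rows = 9999
print(pd.get_option("display.max_rows"))  # 9999
```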
Creating a DataFrame
Dealing with Rows and Columns
Indexing and Selecting Data
Working with Missing Data
Iterating over rows and columns
Create a Pandas DataFrame from Lists
DataFrame can be created using a single list or a list of lists.
# importing pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
# create a single-column DataFrame from the list
df = pd.DataFrame(lst)
print(df)
Working with Missing Data
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (np.nan marks a missing value)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(scores)
# isnull() returns True wherever a value is missing
print(df.isnull())
Querying from Data Frames
import pandas as pd
data = {
"name": ["Sally", "Mary", "John"],
"age": [50, 40, 30]
}
df = pd.DataFrame(data)
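Step 2:
import pandas as pd
# path to a CSV file in Google Drive (as mounted in Google Colab)
path = "/content/drive/MyDrive/CT2.csv"
df = pd.read_csv(path)
# keep only the rows whose 'mark' column is greater than 30
print(df.query('mark > 30'))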
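query() takes a boolean expression as a string. Conditions can be combined with and / or, and a local Python variable can be referenced by prefixing its name with @. A minimal sketch using the name/age data from the first query example; the variable min_age is illustrative.

```python
import pandas as pd

data = {
    "name": ["Sally", "Mary", "John"],
    "age": [50, 40, 30]
}
df = pd.DataFrame(data)

# Rows where age is over 35 (Sally and Mary)
print(df.query("age > 35"))

# Combine conditions and refer to a local variable with @ (only Sally)
min_age = 35
print(df.query("age > @min_age and name != 'Mary'"))
```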
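Applying Functions to DataFrames
The apply() method applies a function along an axis of the DataFrame: with axis=0 (the default) the function receives each column, and with axis=1 it receives each row.
Because the function is written once and then applied, we don't need to write the same code again and again for different programs.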
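A minimal sketch of apply() on an illustrative DataFrame, applying a function per column, per row, and then a reusable named function (value_range is a hypothetical helper defined here):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis=0 (default): the function receives each column in turn
print(df.apply(lambda col: col.sum()).tolist())  # [6, 60]

# axis=1: the function receives each row in turn
print(df.apply(lambda row: row["x"] + row["y"], axis=1).tolist())  # [11, 22, 33]

# A named function can be reused on any DataFrame
def value_range(col):
    """Difference between the largest and smallest value in a column."""
    return col.max() - col.min()

print(df.apply(value_range).tolist())  # [2, 20]
```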