Numpy Notes
Series: Creation of Series from – ndarray, dictionary, scalar value; mathematical operations;
Data Frames: Creation - from dictionary of Series, list of dictionaries, Text/CSV files;
display; iteration;
Purpose of plotting;
drawing and saving the following types of plots using Matplotlib – line plot, bar graph, histogram;
Customizing plots: adding label, title, and legend in plots.
Text functions: UCASE ()/UPPER (), LCASE ()/LOWER (), MID ()/SUBSTRING ()/SUBSTR (),
LENGTH (), LEFT (), RIGHT (), INSTR (), LTRIM (), RTRIM (), TRIM ().
Date Functions: NOW (), DATE (), MONTH (), MONTHNAME (), YEAR (), DAY (), DAYNAME ().
Aggregate Functions: MAX (), MIN (), AVG (), SUM (), COUNT (); using COUNT (*).
Querying and manipulating data using Group by, Having, Order by.
Introduction to networks,
Introduction to Internet, URL, WWW, and its applications- Web, email, Chat, VoIP.
Website: Introduction, difference between a website and webpage, static vs dynamic web
page, web server and hosting of a website.
Web Browsers: Introduction, commonly used browsers, browser settings, add-ons and plug-ins,
cookies.
Digital footprint, net and communication etiquettes, data protection, intellectual property
rights (IPR), plagiarism, licensing and copyright, free and open source software (FOSS),
cybercrime and cyber laws, hacking, phishing, cyber bullying, overview of Indian IT Act.
E-waste: hazards and management. Awareness about health concerns related to the usage of
technology.
2. Given a Series, print all the elements that are above the 75th percentile.
3. Create a Data Frame quarterly sales where each row contains the item category, item
name, and expenditure. Group the rows by the category and print the total expenditure per
category.
4. Create a data frame for examination results and display the row labels, column labels, data
types of each column and the dimensions.
5.2 Visualization
1. Given the school result data, analyse the performance of the students on different
parameters, e.g. subject-wise or class-wise.
2. For the data frames created above, analyse and plot appropriate charts with title and
legend.
3. Take data of your interest from an open source (e.g. data.gov.in), aggregate and summarize
it. Then plot it using different plotting functions of the Matplotlib library.
1. Create a student table with the student id, name, and marks as attributes where the
student id is the primary key.
4. Use the select command to get the details of the students with marks more than 80.
5. Find the min, max, sum, and average of the marks in a student marks table.
6. Find the total number of customers from each country in the table (customer ID, customer
Name, country) using group by.
7. Write a SQL query to order the (student ID, marks) table in descending order of the marks.
CBSE Curriculum – study material by purpose and subject study strength
Purpose | Subject Study Strength: If % > 50 | If % > 70
Board Examination | Teacher's Guide (+ Practical + Project Topics) |
Higher Studies with the Subject (Interest) | Teacher's Guide + Sumita Arora | Teacher's Guide + Sumita Arora
Unit No. | Curriculum Unit Name | NCERT Chapter No. & Name | Topics To Be Covered | Hours Allotted
– | – | – | – | 4 hours in April
– | – | iii. 2-D labelled array | iii. Pandas DataFrame | 3+3 hours in May
3 | Introduction to Networks | 5. Internet and Web | Networking / Web-World; Case-Study Based Analysis | 3 hours in August
4 | Societal Impacts | 6. Societal Impacts | – | 4 hours in September
Before we start with Chapter 2 – Data Handling using Pandas – I
Learning objectives –
* What is Python, and what are the views of its developer, Guido van Rossum?
* RAD (Rapid Application Development) projects and Python
* What are libraries and Python libraries, and what is their purpose?
* Introduction to Pandas
Text and code (with output) highlighted in blue are to be written in the IP register.
Interpreter – is a translator which converts and runs the program/code line by line. You may have
noticed that when you code in Python (for example in the interactive shell), an error is reported as
soon as the faulty line is reached, and you fix it before moving on to the next line.
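A minimal sketch of this line-by-line behaviour, assuming we hand a three-line program to Python's built-in exec(): the first line runs, execution stops at the line that uses an undefined name, and the third line never runs. The names program, steps and undefined_name are illustrative only. (A syntax error, by contrast, is reported before any line runs, because Python first compiles the whole program to bytecode.)

```python
# Three lines of code; line 2 refers to a name that was never defined.
program = (
    "steps.append('line 1')\n"
    "steps.append(undefined_name)\n"
    "steps.append('line 3')\n"
)

namespace = {"steps": []}
try:
    exec(program, namespace)          # the lines are executed one after another
except NameError as err:
    print("Execution stopped:", err)  # name 'undefined_name' is not defined

print("Lines that ran:", namespace["steps"])  # ['line 1']
```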
High-level programming language – a language that needs only an easy set-up, is independent of
the hardware platform, and is friendlier to use (for writing, understanding, supporting and
running programs) than low-level languages.
Dynamic semantics – semantics means the meaning of each construct in a program, i.e. what it
does when it runs. "Dynamic" means that much of this meaning is decided only while the program
runs, not fixed in advance. In Python, variables are not declared beforehand: a variable comes
into existence when it is first assigned, and the same variable can later be assigned values of
different types. Objects are created as instances of modules/classes, and their properties and
functions are bound to them at run time. Such features let a program, for example, update its data
automatically.
Python’s high-level built in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together.
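A small sketch of dynamic typing, one of the run-time features described above: there is no declaration step, and the same name can be rebound to values of different types. The variable name value is only illustrative.

```python
# No declaration is needed: the name is created when it is first assigned.
value = 10
print(type(value).__name__)   # int

# The same name can later refer to a value of a different type.
value = "ten"
print(type(value).__name__)   # str

value = [1, 2, 3]
print(type(value).__name__)   # list
```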
Data structures – are containers which hold data in particular patterns (linear, non-linear,
heterogeneous, homogeneous, tree-like, etc.) so that relationships can be established among the
data and operations can be performed on it to obtain a desired result.
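A few of Python's built-in data structures, as a quick sketch of the idea above: each holds data in a particular pattern, and operations are then performed on that data. The variable names and sample values are hypothetical.

```python
marks = [78, 91, 85]                      # list: linear, ordered, can be changed
point = (3, 4)                            # tuple: linear, ordered, cannot be changed
student = {"name": "Asha", "marks": 91}   # dictionary: key -> value pairs
record = ["Asha", 17, 91.5]               # a list may also be heterogeneous (mixed types)

print(max(marks))          # an operation on the list: 91
print(student["marks"])    # look up a value by its key: 91
print(len(record))         # 3 items of different types
```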
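#include <stdio.h>
int main( )
{
    int num1 = 2, num2 = 5, sum;    /* C: every variable must be declared with a type first */
    sum = num1 + num2;
    printf("%i", sum);
    return 0;
}
num1 = 2; num2 = 5                  # Python: no declarations, types are decided at run time
sum = num1 + num2
print(sum)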
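Dynamic binding – binding means connecting a name (such as a method call) to the actual function
that runs. In Python this connection is made at run time, based on the object the method is called
on (objects are instances of modules/classes).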
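A minimal sketch of dynamic binding: the same call animal.speak() runs a different function depending on the class of the object it is called on, and that choice is made while the loop runs. The classes Dog and Cat are hypothetical examples.

```python
class Dog:
    def speak(self):
        return "Woof"


class Cat:
    def speak(self):
        return "Meow"


sounds = []
for animal in [Dog(), Cat()]:
    # Which speak() runs is looked up at run time from the object's class.
    sounds.append(animal.speak())

print(sounds)  # ['Woof', 'Meow']
```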
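The four phases of Rapid Application Development (RAD) are:
1. Define Requirements
2. Prototype
3. Receive Feedback
4. Finalize Software
1. Define Requirements
At the very beginning, rapid application development sets itself apart from traditional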
software development models. It doesn’t require you to sit with end users and get a detailed
list of specifications; instead, it asks for a broad requirement.
2. Prototype
This is where the actual development takes place. Instead of following a strict set of
requirements, developers create prototypes with different features and functions as fast as
they can. These prototypes are then shown to the clients who decide what they like and what
they don’t.
3. Receive Feedback
In this stage, feedback on what’s good, what’s not, what works, and what doesn’t is shared.
Feedback isn’t limited to just pure functionality, but also visuals and interfaces.
4. Finalize Software
Here, features, functions, aesthetics, and interface of the software are finalized with the client.
Stability, usability, and maintainability are of paramount importance before delivering to the
client.
Glue Language – extension ("glue") modules are needed because Python code cannot call C/C++
functions directly; the glue extensions convert between Python data types and C/C++ data types,
do error checking, and translate error return values into Python exceptions.
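A small sketch of glue code using Python's standard ctypes module, assuming a Linux or macOS system where ctypes.util.find_library("m") can locate the C maths library (the library name differs on Windows). The argtypes/restype lines describe the conversion between Python and C types, and a wrong argument type comes back as a Python exception, which mirrors the two jobs of glue extensions described above. This is one way to glue C to Python, not the only one.

```python
import ctypes
import ctypes.util

# Load the C maths library (assumption: Linux/macOS naming).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Describe the C function so ctypes can convert values in both directions:
# Python float -> C double for the argument, C double -> Python float for the result.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

result = libm.sqrt(16.0)   # the call goes from Python into compiled C code
print(result)              # 4.0

try:
    libm.sqrt("sixteen")   # a str cannot be converted to a C double
except ctypes.ArgumentError as err:
    print("Reported as a Python exception:", err)
```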
To develop an application we may need to combine desirable qualities: the speed of C and Java
(which run faster largely because they are compiled before running) with the ease of use of Python
(highly user-friendly because of its dynamic semantics, but slower because it is interpreted). As it
turns out, executing C/Java code from Python is not that hard, so it became common practice to run
fast C/Java code through Python. The "through Python" part is why Python is called a "glue"
language.
Summary
Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost
of program maintenance. Python supports modules and packages, which encourages program
modularity and code reuse. The Python interpreter and the extensive standard library are
available in source or binary form without charge for all major platforms, and can be freely
distributed.
Python’s standard library is very extensive, offering a wide range of built-in modules (written
in C) that provide access to system functionality such as file I/O that would otherwise be
inaccessible to Python programmers.
ndarray (NumPy array)
( Just read it )
NumPy – is a module that provides a multidimensional (n-dimensional) array object for fast
operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting,
I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random
simulation and much more.
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow
dynamically). Changing the size of an ndarray will create a new array and delete the
original.
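A quick sketch of the fixed-size rule: numpy.append() does not grow an existing array in place, it returns a new, larger array. The original is only discarded once nothing refers to it; here the name arr still does, so it remains unchanged. The variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3])
print(arr.size)            # 3

# np.append() cannot resize arr; it builds and returns a brand-new array.
bigger = np.append(arr, 4)
print(bigger)              # [1 2 3 4]
print(arr)                 # [1 2 3]  (arr itself still has 3 elements)
```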
The elements in a NumPy array are all required to be of the same data type, and thus will
be the same size in memory. The exception: one can have arrays of (Python, including
NumPy) objects, thereby allowing for arrays of different sized elements.
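A short sketch of the same-data-type rule and its exception: when ints and a float are mixed, NumPy converts all elements to one type (float64, 8 bytes each); with dtype=object the array instead stores references to arbitrary Python objects. The sample values are hypothetical.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])    # two ints and one float
print(mixed)                     # [1.  2.5 3. ]  the ints were converted to floats
print(mixed.dtype)               # float64
print(mixed.itemsize)            # 8  (every element takes 8 bytes)

# The exception: an object array can hold elements of different types and sizes.
objs = np.array([1, "two", 3.0], dtype=object)
print(objs.dtype)                # object
```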
NumPy arrays facilitate advanced mathematical and other types of operations on large
amounts of data. Typically, such operations are executed more efficiently and with less
code than is possible using Python's built-in sequences.
Data which needs to be calculated and manipulated is first stored in a simple structure called an
array, and then operations are performed on it to get the desired result.
Array
i) through numpy.array(obj)
( important )
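i. numpy.array(object_name) – is a method / function of the numpy module which converts the object
given as the argument into an ndarray.
This object can be any valid data structure which holds data, like a list or a tuple. (A dictionary is
not converted into an array of its values; numpy.array() wraps the whole dictionary as one object, so
pass list(dict_name.values()) instead.)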
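A minimal sketch comparing the objects numpy.array() can take: a list or tuple becomes an array of its elements, while a dictionary becomes a single wrapped object (a 0-dimensional array of dtype object), so its values must be extracted first. The dictionary marks is hypothetical sample data.

```python
import numpy as np

from_list = np.array([1, 2, 3])
from_tuple = np.array((4, 5, 6))
print(from_list, from_tuple)                 # [1 2 3] [4 5 6]

marks = {"Maths": 90, "IP": 95}              # hypothetical sample dictionary
whole_dict = np.array(marks)                 # the dict becomes ONE object, not its values
print(whole_dict.shape, whole_dict.dtype)    # () object

values = np.array(list(marks.values()))     # extract the values first
print(values)                                # [90 95]
```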
Step 1 -- > Create a List (Here with the first 5 natural numbers)
MyList = [ 1 , 2 , 3 , 4 , 5 ]
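** Since array( ) belongs to the numpy package, numpy must be imported in the program.
2. 'import' keyword – loads a module so that the user can use the submodules / classes / built-in
functions of that particular module.
3. 'as' keyword – gives the imported module a short alias (for example np), used in place of its full name.
import numpy
MyList = [1, 2, 3, 4, 5]
print(numpy.array(MyList))      # full module name: numpy
import numpy as np
MyList = [1, 2, 3, 4, 5]
print(np.array(MyList))         # alias created with 'as': np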
** Why do we need to convert a list to an array? Why can't we use the list directly?
A list can hold any mix of values (numbers, strings, characters), so Python does not apply arithmetic to a
list element by element: MyList * 2 repeats the list, and MyList + MyList joins the two lists. An ndarray
applies mathematical operations to every element, so we convert the list into an array.
import numpy as np
MyList = [1, 2, 3, 4, 5]
arr1 = np.array(MyList)     # convert the list into an ndarray
print(arr1)                 # [1 2 3 4 5]
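A side-by-side sketch of why the conversion matters: the same operators do list operations (repeat, join) on a list but element-wise arithmetic on an ndarray. It reuses the names MyList and arr1 from the code above.

```python
import numpy as np

MyList = [1, 2, 3, 4, 5]
arr1 = np.array(MyList)

print(MyList * 2)        # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]  the list is repeated
print(arr1 * 2)          # [ 2  4  6  8 10]  every element is doubled
print(MyList + MyList)   # the two lists are joined end to end
print(arr1 + arr1)       # [ 2  4  6  8 10]  element-wise addition
print(arr1.sum(), arr1.mean())   # 15 3.0
```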
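Example 2
import numpy as np
# Creating a 1D array (sample values)
arr_1d = np.array([10, 20, 30, 40, 50])
print("1D Array:")
print(arr_1d)
# Accessing elements
print("First element:", arr_1d[0], "Last element:", arr_1d[-1])
# Slicing arrays
print("Elements from index 1 to 3:", arr_1d[1:4])
# Array attributes
print("Shape:", arr_1d.shape, "Dimensions:", arr_1d.ndim, "Size:", arr_1d.size, "Data type:", arr_1d.dtype)
# Reshaping arrays (5 elements -> 5 rows x 1 column)
print("Reshaped:\n", arr_1d.reshape(5, 1))
# Creating a 2D array (sample values)
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\n2D Array:")
print(arr_2d)
print("Slicing 2D array (first two rows, first two columns):\n", arr_2d[:2, :2])
# Creating a 3D array
arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print("\n3D Array:")
print(arr_3d)
print("Accessing element at the first 'layer', row 0, column 1 of 3D array:", arr_3d[0, 0, 1])
# Array operations (applied to every element)
print("\narr_1d + 5:", arr_1d + 5)
print("arr_2d * 2:\n", arr_2d * 2)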
Special Codes to create an ndarray using the numpy.array() construct through list object –
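# Generate the first 8 terms of the Fibonacci series in a list
a = 0
b = 1
mylist = [a, b]             # mylist = [0, 1]
for i in range(0, 6):       # i = 0, 1, 2, 3, 4, 5
    c = a + b               # c = 1, 2, 3, 5, 8, 13
    a = b                   # a = 1, 1, 2, 3, 5, 8
    b = c                   # b = 1, 2, 3, 5, 8, 13
    mylist.append(c)
print(mylist)               # [0, 1, 1, 2, 3, 5, 8, 13]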
# Task 1.1 - WAP to create an ndarray from the list generated for the above series
import numpy
myarray = numpy.array(mylist)
print("The ndarray created from the list of the first 8 Fibonacci numbers is")
print(myarray)
O/P The ndarray created from the list of the first 8 Fibonacci numbers is
[ 0  1  1  2  3  5  8 13]
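# Task 2 - identify the pattern of the series in the range 0 to 50: 0, 1, 4, 9, 16, ... (squares)
import numpy
list2 = []
a = 0
for i in range(0, 8):       # squares of 0 to 7 are all below 50
    num = a * a
    list2.append(num)
    a = a + 1
print(list2)                # [0, 1, 4, 9, 16, 25, 36, 49]
arr2 = numpy.array(list2)
print(arr2)                 # [ 0  1  4  9 16 25 36 49]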
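# Task 3 - WAP to create a list of the given series and convert it to an ndarray
import numpy
a = 1
b = 3
list3 = [[a, b]]            # every element is a pair, so all rows have the same length
for i in range(0, 8):
    a = a * 3
    b = b * 3
    list3.append([a, b])
print(list3)
arr3 = numpy.array(list3)   # a 9 x 2 ndarray
print(arr3)
O/P [[1, 3], [3, 9], [9, 27], [27, 81], [81, 243], [243, 729], [729, 2187], [2187, 6561], [6561, 19683]]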
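object_name.append(new_data) – is a method which adds the new data / element to the end of the existing data
structure / object, after the old data. (It does not replace the old data with the new data.)
Example –
L1 = [1, 2, 3, 4, 5]
print(L1)                       # [1, 2, 3, 4, 5]
# if I wish to add the next 5 natural numbers to the list L1:
L1.append([6, 7, 8, 9, 10])     # append() adds the whole list as ONE element
print(L1)                       # [1, 2, 3, 4, 5, [6, 7, 8, 9, 10]]
To add the five numbers as separate elements, use L1.extend([6, 7, 8, 9, 10]) instead.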
( Important )
ii. Creating an ndarray with numpy.fromstring(string_data, dtype=float, sep=separator) – this method /
function is used to create an array from string data.
dtype – is the keyword used to define the data type of the array; the default data type is
float.
sep – is the separator between the numbers in the string; write it as a keyword argument (sep=...).
Eg --
import numpy as np
print(np.fromstring('1234'))
Observe the output for each different argument.
When fromstring( ) is used without the 'sep' (separator) argument, it reads the string as raw
binary bytes instead of text (this binary mode is deprecated; numpy.frombuffer() replaces it).
The string '1234' is only 4 bytes, but one float element needs 8 bytes, so the output is a
ValueError: the size of the data passed as the argument is less than the required data length.
Imagine trying to put a Great Dane (dog) into a Chihuahua's kennel. Although both are of type
'dog', a Chihuahua's kennel cannot hold a dog the size of a Great Dane: the problem is the size
of the value, not its type.
When fromstring( ) is given sep=',' (comma), the string is read as text: '1234' contains no
comma, so the whole string becomes one number, shown as 1234. with a decimal point because the
default data type is float.
When sep=' ' (blank space) is given, numbers separated by blank spaces in the string become
separate elements of the array, e.g. np.fromstring('1 2 3 4', sep=' ') gives [1. 2. 3. 4.].
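A runnable sketch of the text mode of fromstring( ), where sep is given. The input strings are illustrative. The binary mode (no sep) is left out on purpose, because it is deprecated and its behaviour depends on the NumPy version.

```python
import numpy as np

one_number = np.fromstring('1234', sep=',')               # no comma, so one number
spaced = np.fromstring('1 2 3 4', sep=' ')                # blank-space separated
as_ints = np.fromstring('1,2,3,4', dtype=int, sep=',')    # dtype changes the type

print(one_number)   # [1234.]  float by default
print(spaced)       # [1. 2. 3. 4.]
print(as_ints)      # [1 2 3 4]
```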
ii. numpy.empty(shape, dtype=data_type) – is a method / function of the numpy module which creates an
array without setting its values. (The user need not specify the values; the elements hold whatever data
is already in that memory, so they look random.)
shape – a single number, or [rows, columns] to specify the total number of rows and columns of the array
dtype – is used to specify which type of data is to be generated; by default the data type is float.
Example --
import numpy
print(numpy.empty(5))                   # 5 elements of type float (default)
print(numpy.empty(5, dtype=int))        # 5 elements of type int
print(numpy.empty([3, 2], dtype=int))   # 3 rows x 2 columns of type int
** In the above program, empty( ) has generated a 3x2 matrix whose values are shown as integers. These
values are not chosen by the user; they are whatever data was already in that memory, so they look random.
Remember that these values may be different each time the program is executed.
** For numpy.empty(5), the output is such leftover values of the default type float (often long numbers in
exponential form).
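TypeError – is raised when an argument value does not match the specified syntax. A common case is passing
rows and columns as two separate arguments, e.g. numpy.empty(3, 2): the 2 is read as the dtype, which is not a
valid data type.
Corrected Code – numpy.empty([3, 2])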
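A sketch for checking empty( ) without relying on its values (they are arbitrary): we can still verify the shape and data type, and see the TypeError raised when the shape is split into two arguments. The variable names are illustrative.

```python
import numpy as np

e = np.empty([3, 2], dtype=int)   # shape given as ONE list: 3 rows, 2 columns
print(e.shape)                    # (3, 2)
print(e.dtype)                    # an integer type; the values themselves are arbitrary

raised = False
try:
    np.empty(3, 2)                # the 2 is taken as the dtype argument
except TypeError as err:
    raised = True
    print("TypeError:", err)
```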
iii. numpy.zeros(shape, dtype=data_type) – this method / function is used to create an array of the specified
rows and columns with the specified data type. The value of every element is zero.
shape – a single number, or [rows, columns] to specify the total number of rows and columns of the array
dtype – is used to specify which type of data is to be generated; by default the data type is float.
Example --
In the above example, 5 elements have been generated in a single row (a 1-D array), all with the value 0 and of
type integer (which means without the decimal dot).
In the above example, 2 rows and 3 columns have been generated for the 2-D array, all with the value 0 and of
type float (which means each zero value is suffixed with the decimal dot).
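The example code for zeros( ) is not reproduced in these notes; the following is a minimal sketch, assuming the two cases described above correspond to numpy.zeros(5, dtype=int) and numpy.zeros([2, 3]).

```python
import numpy as np

z1 = np.zeros(5, dtype=int)   # 5 integer zeros in one row (1-D)
print(z1)                     # [0 0 0 0 0]

z2 = np.zeros([2, 3])         # 2 rows x 3 columns, float zeros (default dtype)
print(z2)
# [[0. 0. 0.]
#  [0. 0. 0.]]

print(z1.ndim, z2.ndim)       # 1 2
```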
numpy.ones(shape, dtype=data_type) – works like zeros( ) but sets every element to 1.
import numpy as np
print(np.ones(5))
# In the above code there is only one argument value, 5, so a 1-D array of 5 elements is created (shown as
1 row) and each element value is 1. (float by default).
Output = [1. 1. 1. 1. 1.]
print(np.ones([2, 5], dtype=int))
# In the above code the matrix has 2 rows and 5 columns.
# The dtype argument is assigned the value int, which means every element (value 1) of this matrix is
an integer and no longer appears as a float.
Output - [[1 1 1 1 1]
 [1 1 1 1 1]]
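numpy.arange(start, stop) – returns evenly spaced values from start up to, but not including, stop.
import numpy
print(numpy.arange(0, 10))
Output - [0 1 2 3 4 5 6 7 8 9]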
( Skip this )
Q. In the above code, why is 10 not displayed?
i | i < 10? | array so far | next value of i
0 | 0 < 10 yes | [0] | 0 + 1 = 1
1 | 1 < 10 yes | [0 1] | 1 + 1 = 2
2 | 2 < 10 yes | [0 1 2] | 2 + 1 = 3
.. | .. | .. | ..
9 | 9 < 10 yes | [0 1 2 …. 9] | 9 + 1 = 10
10 | 10 < 10 no | x | x
At 10 the condition 10 < 10 becomes false, so generation stops and the loop is terminated;
that is why 10 is not displayed as the last value of the array, but 9 is.
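A quick sketch to confirm the stop rule: the stop value is never included, so to include 10 the stop must be 11. The optional third argument is the step between values.

```python
import numpy

upto_9 = numpy.arange(0, 10)      # stop value 10 is excluded
upto_10 = numpy.arange(0, 11)     # use stop + 1 to include 10
evens = numpy.arange(0, 10, 2)    # third argument is the step

print(upto_9)    # [0 1 2 3 4 5 6 7 8 9]
print(upto_10)   # [ 0  1  2  3  4  5  6  7  8  9 10]
print(evens)     # [0 2 4 6 8]
```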
Code 1
import numpy
for i in numpy.arange(13):          # i = 0, 1, ..., 12
    for j in numpy.arange(1):       # j = 0 only
        print(i, j)
Trace of the last steps:
i | i < 13? (yes: continue, no: exit loop) | j | j < 1? (yes: continue, no: exit loop) | print(i, j) | next j | next i
12 | 12 < 13 yes | 0 | 0 < 1 yes | 12 0 | 1 (1 < 1 no, inner loop ends) | 13
13 | 13 < 13 no (exit the loop) | x | x | x | x | x
In the above code, variables i and j reach the values 13 and 1 respectively, but the loop bodies do not execute for
these values (because the condition becomes false for these values).
Last line of output: 12 0
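Code 2 (arange = numpy.arange; each loop is nested inside the one to its left)
Outer loop | Middle loop | Inner loop | Output
arange(1) | – | – | 0
arange(1) | arange(1) | – | 0 0
arange(1, 2) | – | – | 1
arange(1, 2) | arange(1) | – | 1 0
arange(1, 2) | arange(1, 2) | – | 1 1
In all of the above examples every loop has only 1 element value, so the output is always one
single row.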
Code 3
Outer loop | Middle loop | Inner loop | Output (rows)
arange(1, 3) | – | – | 1, 2
arange(1, 3) | arange(1) | – | 1 0, 2 0
arange(1, 3) | arange(1) | arange(1) | 1 0 0, 2 0 0
arange(1, 3) | arange(1, 2) | – | 1 1, 2 1
arange(1, 3) | arange(1, 2) | arange(1, 2) | 1 1 1, 2 1 1
arange(1, 3) | arange(1, 2) | arange(1) | 1 1 0, 2 1 0
? | ? | ? | 1 0 1, 2 0 1
The outer loop has more than one element value, while every inner loop has only 1 element value,
so the output has 2 rows filled with the respective loops' element values.
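Code 4
Outer loop | Inner loop | Output (rows)
arange(1, 3) | arange(1, 3) | 1 1, 1 2, 2 1, 2 2
For each loop: if the condition is true, go to the next step; if it is false, exit that loop.
When a counter reaches 3, the condition 3 < 3 is false, so that loop ends.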
Code 5
1 2 0
2 1 0
2 2 0
2 1 1
2 2 1
1 2 1
1 2 2
2 1 1
2 1 2
2 2 1
2 2 2
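The nested-loop outputs traced in the codes above can be re-run directly. A minimal sketch, assuming loops over numpy.arange(1, 3) as in Code 4, plus a third level of nesting; the lists rows and rows3 are only there so the printed rows can be counted.

```python
import numpy

rows = []
for i in numpy.arange(1, 3):          # outer loop: 1, 2
    for j in numpy.arange(1, 3):      # inner loop: 1, 2
        print(i, j)
        rows.append((int(i), int(j)))
print(len(rows))                      # 4 rows = 2 x 2

rows3 = []
for i in numpy.arange(1, 3):
    for j in numpy.arange(1, 3):
        for k in numpy.arange(1, 3):  # a third loop doubles the rows again
            print(i, j, k)
            rows3.append((int(i), int(j), int(k)))
print(len(rows3))                     # 8 rows = 2 x 2 x 2
```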
Some more code examples are -
( Imp. Examples )
linspace(initial_value, final_value, no) – is used to generate the specified number ('no') of
data / elements / values in the specified range (from the initial value to the final value, both
included) at equal intervals.
Syntax – numpy.linspace(initial_value, final_value, no)
#no = total number of data in the range, including the initial and final value
Code 1
print( numpy.linspace(10, 11, 5) )
# no = 5
# equal gap / the exact difference between each two values, x = (11 - 10) / (5 - 1)
# x = 1 / 4
# x = 0.25
Final Output = [10.   10.25 10.5  10.75 11.  ]
Code 2
print( numpy.linspace(3, 1, 6) )
# total values = 6
# equal gap / the exact difference between each two values, x = (1 - 3) / (6 - 1)
# x = -2 / 5
# x = -0.4 (negative, because the values decrease from 3 to 1)
Final Output = [3.  2.6 2.2 1.8 1.4 1. ]
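The equal gap x worked out by hand above can be checked with linspace's retstep=True option, which returns the step together with the values. A small sketch reusing Code 1 and Code 2.

```python
import numpy

values, x = numpy.linspace(10, 11, 5, retstep=True)
print(values)   # [10.   10.25 10.5  10.75 11.  ]
print(x)        # 0.25 = (11 - 10) / (5 - 1)

values2, x2 = numpy.linspace(3, 1, 6, retstep=True)
print(values2)  # [3.  2.6 2.2 1.8 1.4 1. ]
print(x2)       # -0.4 = (1 - 3) / (6 - 1), negative because the values decrease
```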
When linspace( ) is given a pair of start values and a pair of stop values, the data ranges appear
as the columns of a matrix, e.g.
print( numpy.linspace([1, 2], [4, 10], 4) )
Here in the matrix to be formed, the Column 1 data range = 1 to 4 and the Column 2 data range = 2 to 10,
with 4 values each:
[ [ 1        2      ]
  [ 1+x      2+y    ]
  [ 1+x+x    2+y+y  ]
  [ 4        10     ] ]
For Column 1
# stop_value – start_value = 4 - 1 = 3
# equal gap, x = (4 - 1) / (4 - 1) = 3 / 3 = 1
# values: 1., 1.+1. = 2., 1.+1.+1. = 3., 4.
For Column 2
# stop_value – start_value = 10 - 2 = 8
# equal gap, y = (10 - 2) / (4 - 1) = 8 / 3 = 2.6667 (approx.)
# values: 2., 2.+2.6667 = 4.6667, 2.+2.6667+2.6667 = 7.3333, 10.
Final Output =
[[ 1.          2.        ]
 [ 2.          4.66666667]
 [ 3.          7.33333333]
 [ 4.         10.        ]]
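A sketch to verify the matrix form, assuming NumPy 1.16 or later (where start and stop may be array-like). Each start/stop pair becomes one column, and the columns match the hand calculation with x = 1 and y = 8/3.

```python
import numpy

m = numpy.linspace([1, 2], [4, 10], 4)   # start pair, stop pair, 4 values each
print(m)
print(m.shape)    # (4, 2): 4 rows, one column per start/stop pair
print(m[:, 0])    # column 1: [1. 2. 3. 4.]
print(m[:, 1])    # column 2: 2, 2 + 8/3, 2 + 16/3, 10
```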
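copy( )
import numpy
x = numpy.array([1, 2, 3, 4, 5])    # sample values for illustration
y = x.copy()                        # y is a separate copy of the array x
y = y + 10                          # add 10 to every element of the copy
print(y)                            # [11 12 13 14 15]
In this code example, 'y' is an array variable which is a copy of the array 'x' and is increased by the
value 10, which means that each element of the array is increased by 10. And that is reflected in the
output too.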
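A short sketch of why copy( ) matters: plain assignment only gives a second name for the same array, so changes through that name show up in the original, while a copy is independent. The names x, alias and y and their values are illustrative.

```python
import numpy

x = numpy.array([1, 2, 3])     # sample values
alias = x                      # plain assignment: both names refer to the SAME array
y = x.copy()                   # copy(): a separate array with the same values

y = y + 10                     # changes only the copy
alias[0] = 100                 # changes x too, because alias is x
print(x)   # [100   2   3]
print(y)   # [11 12 13]
```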
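reshape( )
Eg -
x = numpy.arange(6)          # x = [0 1 2 3 4 5], 6 elements (numpy imported as above)
print(x.reshape(3, 2))       # 3 rows x 2 columns: 3x2 = 6
** copy( ) and reshape( ) are methods called on an existing (predefined) array variable, e.g. x.copy() and
x.reshape(3, 2). The new shape must hold exactly the same number of elements as the original array.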
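A sketch of the element-count rule for reshape( ): 3 x 2 fits the 6 elements, -1 lets NumPy work out one missing dimension, and 4 x 2 = 8 does not fit, so NumPy raises a ValueError.

```python
import numpy

x = numpy.arange(6)          # [0 1 2 3 4 5], 6 elements
print(x.reshape(3, 2))       # 3 rows x 2 columns
auto = x.reshape(2, -1)      # -1: NumPy computes the missing size (3)
print(auto.shape)            # (2, 3)

failed = False
try:
    x.reshape(4, 2)          # 4 x 2 = 8 does not match 6 elements
except ValueError as err:
    failed = True
    print("ValueError:", err)
```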
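full( )
Syntax – numpy.full(shape, fill_value, dtype=data_type) – creates an array of the given shape with every
element set to fill_value.
shape – a single number, or [rows, columns]
Example 1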
In the above examples, when shape is specified as one value in the argument of this function, that value
becomes the total number of elements, shown in a single row (a 1-D array).
When 2 values are specified as matrix dimension in the argument of this function then the
first value is number of rows and second value is number of columns .
Example 2
In the above example the fill value is a boolean value True for the matrix of 3x3 and is shown
in the output too.
Example 3
In the above example the fill value is a string value 'madam' for a matrix of 2x3 and is
reflected in the output too.
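A runnable sketch of full( ) covering the three cases above. Example 1's original values are not shown in these notes, so numpy.full(4, 7) is a hypothetical stand-in; the other two match the 3x3 True and 2x3 'madam' examples. Note how the fill value decides the dtype.

```python
import numpy

a = numpy.full(4, 7)              # one value for shape: 4 elements in one row
b = numpy.full((3, 3), True)      # 3 x 3 matrix of the boolean True
c = numpy.full((2, 3), 'madam')   # 2 x 3 matrix of the string 'madam'

print(a)                          # [7 7 7 7]
print(b)
print(c)
print(b.dtype, c.dtype)           # bool <U5
```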