Unit 5-COURSE MATERIAL
Unit 5-COURSE MATERIAL
The main focus of businesses using big data was on building frameworks that can store a
large amount of data. Then, frameworks like Hadoop were created, which helped in storing
massive amounts of data.With the problem of storage solved, the focus then shifted to
processing the data that is stored. This is where data science came in as the future for
processing and analysing data. Now, data science has become an integral part of all the
businesses that deal with large amounts of data. Companies today hire data scientists and
professionals who take the data and turn it into a meaningful resource.
What is Data Science? Data science is all about finding and exploring data in the real world
and using that knowledge to solve business problems. Some examples of data science are:
Customer Prediction - System can be trained based on customer behavior patterns
to predict the likelihood of a customer buying a product
Service Planning - Restaurants can predict how many customers will visit on the
weekend and plan their food inventory to handle the demand
Why Python? When it comes to data science, we need some sort of programming language
or tool, like Python. Although there are other tools for data science, like R and SAS, we will
focus on Python and how it is beneficial for data science in this article.
Python as a programming language has become very popular in recent times. It has
been used in data science, IoT, AI, and other technologies, which has added to its
popularity.
Python is used as a programming language for data science because it contains costly
tools from a mathematical or statistical perspective. It is one of the significant reasons
why data scientists around the world use Python. If you track the trends over the past
few years, you will notice that Python has become the programming language of
choice, particularly for data science.
There are several other reasons why Python is one of the most used programming
languages for data science, including:
Python allows its users to create their own Data Structures enabling them to have full
control over their functionality. The most prominent Data Structures are Stack, Queue, Tree,
Linked List and so on which are also available to you in other programming languages. So
now that you know what are the types available to you, why don’t we move ahead to the Data
Structures and implement them using Python.
5.2.1. Built-in Data Structures
As the name suggests, these Data Structures are built-in with Python which makes
programming easier and helps programmers use them to obtain solutions faster.
Lists
Lists are used to store data of different data types in a sequential manner. There are addresses
assigned to every element of the list, which is called as Index. The index value starts from 0
and goes on until the last element called the positive index. There is also negative
indexing which starts from -1 enabling you to access elements from the last to first.
Dictionary
Dictionaries are used to store key-value pairs. To understand better, think of a phone
directory where hundreds and thousands of names and their corresponding numbers have
been added. Now the constant values here are Name and the Phone Numbers which are called
as the keys. And the various names and phone numbers are the values that have been fed to
the keys. If you access the values of the keys, you will obtain all the names and phone
numbers. So that is what a key-value pair is. And in Python, this structure is stored using
Dictionaries. Let us understand this better with an example program.
Tuple
Tuples are the same as lists are with the exception that the data once entered into the tuple
cannot be changed no matter what. The only exception is when the data inside the tuple is
mutable, only then the tuple data can be changed. The example program will help you
understand better.
Sets
Sets are a collection of unordered elements that are unique. Meaning that even if the data is
repeated more than one time, it would be entered into the set only once. It resembles the sets
that you have learnt in arithmetic. The operations also are the same as is with the arithmetic
sets. An example program would help you understand better.
5.2.3. User-Defined Data Structures
Stack
Stacks are linear Data Structures which are based on the principle of Last-In-First-Out
(LIFO) where data which is entered last will be the first to get accessed. It is built using the
array structure and has operations namely, pushing (adding) elements, popping (deleting)
elements and accessing elements only from one point in the stack called as the TOP. This
TOP is the pointer to the current position of the stack. Stacks are prominently used in
applications such as Recursive Programming, reversing words, undo mechanisms in word
editors and so forth.
Tree
Trees are non-linear Data Structures which have root and nodes. The root is the node from
where the data originates and the nodes are the other data points that are available to us. The
node that precedes is the parent and the node after is called the child. There are levels a tree
has to show the depth of information. The last nodes are called the leaves. Trees create a
hierarchy which can be used in a lot of real-world applications such as the HTML pages use
trees to distinguish which tag comes under which block. It is also efficient in searching
purposes and much more.
Linked List
Linked lists are linear Data Structures which are not stored consequently but are linked with
each other using pointers. The node of a linked list is composed of data and a pointer called
next. These structures are most widely used in image viewing applications, music player
applications and so forth.
Graph
Graphs are used to store data collection of points called vertices (nodes) and edges (edges).
Graphs can be called as the most accurate representation of a real-world map. They are used
to find the various cost-to-distance between the various data points called as the nodes and
hence find the least path. Many applications such as Google Maps, Uber, and many more use
Graphs to find the least distance and increase profits in the best ways.
Syntax
def functionname( parameters ):
"function_docstring"
function_suite
return [expression]
Types of Functions
Basically, we can divide functions into the following two types:
1. Built-in functions - Functions that are built into Python.
2. User-defined functions - Functions defined by the users themselves.
Creating a Function
Example
def my_function():
print("Hello from a function")
Calling a Function
To call a function, use the function name followed by parenthesis:
Example
def my_function():
print("Hello from a function")
my_function()
Function Arguments
You can call a function by using the following types of formal arguments −
Required arguments
Keyword arguments
Default arguments
Variable-length arguments
Arguments of a Function
Arguments are the values passed inside the parenthesis of the function. A function can have
any number of arguments separated by a comma.
Types of Arguments
Python supports various types of arguments that can be passed at the time of the function call.
Required arguments
Required arguments are the arguments passed to a function in correct positional order. Here,
the number of arguments in the function call should match exactly with the function
definition.
# Function definition is here
def printme( str ):
"This prints a passed string into this function"
print str
return;
Keyword arguments
Keyword arguments are related to the function calls. When you use keyword arguments in a
function call, the caller identifies the arguments by the parameter name.
Name: miki
Age 50
Default arguments
A default argument is a parameter that assumes a default value if a value is not provided in
the function call for that argument. The following example illustrates Default arguments.
Output
('x: ', 10)
('y: ', 50)
Python is a simple programming language to learn, and there is some basic stuff that you can
do with it, like adding, printing statements, and so on. However, if you want to perform data
analysis, you need to import specific libraries. Some examples include:
Pandas - Used for structured data operations
NumPy - A powerful library that helps you create n-dimensional arrays
SciPy - Provides scientific capabilities, like linear algebra and Fourier transform
Matplotlib - Primarily used for visualization purposes
Scikit-learn - Used to perform all machine learning activities
In addition to these, there are other libraries as well, like:
Networks & I graph
TensorFlow
BeautifulSoup
OS
5.5. NumPy
1 object Any object exposing the array interface method returns an array or any
(nested) sequence.
5 subok By default, returned array forced to be a base class array. If true, sub-
classes passed through
Example 1
import numpy as np
a = np.array([1,2,3])
print(a)
bool_
Boolean (True or False) stored as a byte
int_
Default integer type (same as C long; normally either int64 or int32)
intc
Identical to C int (normally int32 or int64)
intp
An integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8
Byte (-128 to 127)
int16
Integer (-32768 to 32767)
float_
Shorthand for float64
float64
Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
float64
Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_
Shorthand for complex128
complex64
Complex number, represented by two 32-bit floats (real and imaginary components)
complex128
Complex number, represented by two 64-bit floats (real and imaginary components)
NumPy numerical types are instances of dtype (data-type) objects, each having unique
characteristics. The dtypes are available as np.bool_, np.float32, etc.
Example 1
# using array-scalar type
import numpy as np
dt = np.dtype(np.int32)
print(dt)
5.5.2. ndarray.shape
This array attribute returns a tuple consisting of array dimensions. It can also be used to
resize the array.
Example 1
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print (a.shape)
Example 2
# this resizes the ndarray
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
a.shape = (3,2)
print(a)
5.5.3. ndarray.ndim
This array attribute returns the number of array dimensions.
Example 1
# an array of evenly spaced numbers
import numpy as np
a = np.arange(24)
print(a)
Example 2
# this is one dimensional array
import numpy as np
a = np.arange(24)
a.ndim
# now reshape it
b = a.reshape(2,4,3)
print(b)
# b is having three dimensions
5.5.4. numpy.itemsize
This array attribute returns the length of each element of array in bytes.
Example 1
# dtype of array is int8 (1 byte)
import numpy as np
x = np.array([1,2,3,4,5], dtype = np.int8)
print (x.itemsize)
5.5.5. numpy.flags
The ndarray object has the following attributes. Its current values are returned by this
function.
Sr.No. Attribute & Description
4 WRITEABLE (W)The data area can be written to. Setting this to False
locks the data, making it read-only
5 ALIGNED (A)The data and all elements are aligned appropriately for the
hardware
Example
The following example shows the current values of flags.
3 Order: ‘C’ for C-style row-major array, ‘F’ for FORTRAN style
column-
Example
5.5.7. numpy.zeros
Example 1
# array of five ones. Default dtype is float
import numpy as np
x = np.ones(5)
print(x)
The output is as follows −
[ 1. 1. 1. 1. 1.]
Contents of ndarray object can be accessed and modified by indexing or slicing, just like
Python’s in-built container objects. items in ndarray object follows zero-based index.
Three types of indexing methods are available − field access, basic slicing and advanced
indexing.
Example 1
Integer Indexing
This mechanism helps in selecting any arbitrary item in an array based on its N-
dimensional index. Each integer array represents the number of indexes into that
dimension. When the index consists of as many integer arrays as the dimensions
of the target ndarray, it becomes straightforward.
In the following example, one element of the specified column from each row of
ndarray object is selected. Hence, the row index contains all row numbers, and the
column index specifies the element to be selected.
Example 1
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
y = x[[0,1,2], [0,1,0]]
print(y)
The selection includes elements at (0,0), (1,1) and (2,0) from the first array.
In the following example, elements placed at corners of a 4X3 array are selected.
The row indices of selection are [0, 0] and [3,3] whereas the column indices are
[0,2] and [0,2].
Advanced and basic indexing can be combined by using one slice (:) or ellipsis
(…) with an index array. The following example uses a slice for the advanced
index for column. The result is the same when a slice is used for both. But
advanced index results in copy and may have different memory layout.
This type of advanced indexing is used when the resultant object is meant to be the result
of Boolean operations, such as comparison operators.
Example 1
In this example, items greater than 5 are returned as a result of Boolean indexing.
import numpy as np
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
print (‘Our array is:’)
print(x)
print ‘\n’
# Now we will print the items greater than 5
print (‘The items greater than 5 are:’)
print (x[x > 5])
The output of this program would be −
Our array is:
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
The items greater than 5 are:
[ 6 7 8 9 10 11]
Example 1
import numpy as np
a = np.array([1,2,3,4])
b = np.array([10,20,30,40])
c = a * b
print(c)
Transpose Operations
2 ndarray.T:Same as self.transpose()
Changing Dimensions
Splitting Arrays
3 insert:Inserts the values along the given axis before the given indices
Trigonometric Functions
NumPy has standard trigonometric functions which return trigonometric ratios for a given
angle in radians.
Example
import numpy as np
a = np.array([0,30,45,60,90])
print (‘Sine of different angles:’ )
# Convert to radians by multiplying with pi/180
Print(np.sin(a*np.pi/180))
print (‘\n’)
print(‘Cosine values for angles in array:’_
print(np.cos(a*np.pi/180))
print (‘\n’ )
print(‘Tangent values for given angles:’ )
print (np.tan(a*np.pi/180) )
arcsin, arcos, and arctan functions return the trigonometric inverse of sin, cos, and tan
of the given angle. The result of these functions can be verified by numpy.degrees()
function by converting radians to degrees.
1 a: Input data
NumPy has quite a few useful statistical functions for finding minimum, maximum,
percentile standard deviation and variance, etc. from the given elements in the array. The
functions are explained as follows −
numpy.amin() and numpy.amax()numpy.amin() and numpy.amax()
These functions return the minimum and the maximum from the elements in the given
array along the specified axis.
Example
import numpy as np
a = np.array([[3,7,5],[8,4,3],[2,4,9]])
print ‘Our array is:’
print(a)
print (‘\n’ )
print ‘Applying amin() function:’
print(np.amin(a,1))
print (‘\n’ )
print ‘Applying amin() function again:’
print (np.amin(a,0))
print (‘\n’ )
print ‘Applying amax() function:’
print (np.amax(a) )
print (‘\n’)
print ‘Applying amax() function again:’
print(np.amax(a, axis = 0))
5.5.15. numpy.ptp()
Percentile (or a centile) is a measure used in statistics indicating the value below which a
given percentage of observations in a group of observations fall. The
function numpy.percentile() takes the following arguments.
Where,
1 a Input array
A variety of sorting related functions are available in NumPy. These sorting functions
implement different sorting algorithms, each of them characterized by the speed of
execution, worst-case performance, the workspace required and the stability of
algorithms. Following table shows the comparison of three sorting algorithms.
‘quicksort’ 1 O(n^2) 0 no
‘heapsort’ 3 O(n*log(n)) 0 no
5.5.16. numpy.sort()
The sort() function returns a sorted copy of the input array. It has the following
parameters −
numpy.sort(a, axis, kind, order)
Where,
1 aArray to be sorted
2 axisThe axis along which the array is to be sorted. If none, the array is flattened,
sorting on the last axis
3 kindDefault is quicksort
5.5.19. No Copy
Simple assignments do not make the copy of array object. Instead, it uses the same id() of
the original array to access it. The id() returns a universal identifier of Python object,
similar to the pointer in C.
Furthermore, any changes in either gets reflected in the other. For example, the changing
shape of one will change the shape of the other too.
5.5.22. matlib.empty()
3 order C or F
Example
import numpy.matlib
import numpy as np
print(np.matlib.empty((2,2)))
# filled with random data
This function returns a matrix with 1 along the diagonal elements and the zeros
elsewhere. The function takes the following parameters.
numpy.matlib.eye(n, M,k, dtype)
Where,
3 k Index of diagonal
Example
import numpy.matlib
import numpy as np
print np.matlib.eye(n = 3, M = 4, k = 0, dtype = float)
It will produce the following output −
[[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]]
1 dot
# initializing matrices
x = numpy.array([[1, 2], [4, 5]])
y = numpy.array([[7, 8], [9, 10]])
# initializing matrices
x = numpy.array([[1, 2], [4, 5]])
y = numpy.array([[7, 8], [9, 10]])
Transpose
Determinant
Inverse
# Taking a 3 * 3 matrix
A = np.array([[6, 1, 1],
[4, -2, 5],
[2, 8, 7]])
Example 2
A = np.array([[4, 3, 2], [-2, 2, 3], [3, -5, 2]])
B = np.array([25, -10, -4])
X = np.linalg.inv(A).dot(B)
print(X)
Output:
[ 5. 3. -2.]
np.arange
import numpy as np
#create an array
arr = np.arange(1,10).reshape(3,3)
#finding the Eigenvalue and Eigenvectors of arr
np.linalg.eig(arr)
5.6. Matplotlib
Matplotlib is an amazing visualization library in Python for 2D plots of arrays.
Matplotlib is a multi-platform data visualization library built on NumPy arrays and
designed to work with the broader SciPy stack. It was introduced by John Hunter in
the year 2002.
One of the greatest benefits of visualization is that it allows us visual access to huge
amounts of data in easily digestible visuals. Matplotlib consists of several plots like
line, bar, scatter, histogram etc.
Installation :
Windows, Linux and macOS distributions have matplotlib and most of its dependencies as
wheel packages. Run the following command to install matplotlib package :
python -mpip install -U matplotlib
Importing matplotlib :
Output:
2. Bar plot :
Output:
3. Histogram :
A histogram is a graph showing frequency distributions.
Output:
4. Scatter Plot :
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the
same length, one for the values of the x-axis, and one for values on the y-axis:
Output:
Creating Pie Charts
With Pyplot, you can use the pie() function to draw pie charts:
import matplotlib.pyplot as plt
import numpy as np
y = np.array([35, 25, 25, 15])
plt.pie(y)
plt.show()
5.7. Pandas
Pandas is used for data manipulation, analysis and cleaning. Python pandas is well suited for
different kinds of data, such as:
Tabular data with heterogeneously-typed columns
Ordered and unordered time series data
Arbitrary matrix data with row & column labels
Unlabelled data
Any other form of observational or statistical data sets
5.7.1. Series
Create a simple Pandas Series from a list:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
5.7.2. What is a DataFrame?
It is a widely used data structure of pandas and works with a two-dimensional array with
labeled axes (rows and columns). DataFrame is defined as a standard way to store data and
has two different indexes, i.e., row index and column index. It consists of the following
properties:
o The columns can be heterogeneous types like int, bool, and so on.
o It can be seen as a dictionary of Series structure where both the rows and columns are
indexed. It is denoted as "columns" in case of columns and "index" in case of rows.
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
print(df)
Result
calories duration
0 420 50
1 380 40
2 390 45
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']
# Calling DataFrame constructor on list
df = pd.DataFrame(x)
print(df)
Output
0
0 Python
1 Pandas
# importing the pandas library
import pandas as pd
df = pd.DataFrame()
print (df)
Output
Empty DataFrame
Columns: []
Index: []
# importing the pandas library
import pandas as pd
info = {'ID' :[101, 102, 103],'Department' :['B.Sc','B.Tech','M.Tech',]}
df = pd.DataFrame(info)
print (df)
Output
ID Department
0 101 B.Sc
1 102 B.Tech
2 103 M.Tech
Create a DataFrame from Dict of Series:
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'
d1 = pd.DataFrame(info)
print (d1)
Output
one two
a 1.0 1
b 2.0 2
c 3.0 3
d 4.0 4
e 5.0 5
f 6.0 6
g NaN 7
h NaN 8
Column Selection
We can select any column from the DataFrame. Here is the code that demonstrates how to
select a column from the DataFrame.
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'}
d1 = pd.DataFrame(info)
print (d1 ['one'])
Output
a 1.0
b 2.0
c 3.0
d 4.0
e 5.0
f 6.0
g NaN
h NaN
Name: one, dtype: float64
Column Addition
We can also add any new column to an existing DataFrame. The below code demonstrates
how to add any new column to an existing DataFrame:
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
# Add a new column to an existing DataFrame object
print ("Add new column by passing series")
df['three']=pd.Series([20,40,60],index=['a','b','c'])
print (df)
print ("Add new column using existing DataFrame columns")
df['four']=df['one']+df['three']
print (df)
Output
Column Deletion:
We can also delete any column from the existing DataFrame. This code helps to demonstrate
how the column can be deleted from an existing DataFrame:
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2], index= ['a', 'b']),
'two' : pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
df = pd.DataFrame(info)
print ("The DataFrame:")
print (df)
# using del function
print ("Delete the first column:")
del df['one']
print (df)
# using pop function
print ("Delete the another column:")
df.pop('two')
print (df)
Output
The DataFrame:
one two
a 1.0 1
b 2.0 2
c NaN 3
Row Selection:
We can easily select, add, or delete any row at anytime. First of all, we will understand the
row selection. Let's see how we can select a row using different ways that are as follows:
Selection by Label:
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df.loc['b'])
Output
one 2.0
two 2.0
Name: b, dtype: float64
The rows can also be selected by passing the integer location to an ilocfunction.
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df.iloc[3])
Output
one 4.0
two 4.0
Name: d, dtype: float64
Slice Rows
# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df[2:5])
Output
one two
c 3.0 3
d 4.0 4
e 5.0 5
Addition of rows:
We can easily add new rows to the DataFrame using append function. It add the new rows at
the end.
# importing the pandas library
import pandas as pd
d = pd.DataFrame([[7, 8], [9, 10]], columns = ['x','y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns = ['x','y'])
d = d.append(d2)
print (d)
Output
x y
0 7 8
1 9 10
0 11 12
1 13 14
Deletion of rows:
We can delete or drop any rows from a DataFrame using the index label. If in case, the label
is duplicate then multiple rows will be deleted.
# importing the pandas library
import pandas as pd
a_info = pd.DataFrame([[4, 5], [6, 7]], columns = ['x','y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns = ['x','y'])
a_info = a_info.append(b_info)
# Drop rows with label 0
a_info = a_info.drop(0)
Output
x y
1 6 7
1 10 11
A simple way to store big data sets is to use CSV files (comma separated files).
CSV files contains plain text and is a well know format that can be read by everyone
including Pandas.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Viewing the Data
One of the most used method for getting a quick overview of the DataFrame, is
the head() method.
The head() method returns the headers and a specified number of rows, starting from the top.
Example
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(10))
The tail() method returns the headers and a specified number of rows, starting from the
bottom.
Example
Print the last 5 rows of the DataFrame:
print(df.tail())
Info About the Data
The DataFrames object has a method called info(), that gives you more information about the
data set.
Computational Complexity
Time Complexity in Python Now-a-days, for one problem we can write the solution in n
number of ways, but, how can we decide which type is better. We can use different types of
algorithms to solve one problem. We need to compare these algorithms and have to choose the
best one to solve the problem.
The amount of time it takes to run the program and perform the functions in it is known
as Time Complexity. By using Time Complexity we can determine whether the program is
efficient or we have to use another algorithm which take less time compared to the other one.
Reducing Time Complexity of an algorithm is often difficult in Data Science, rather than
difficult we can say its a bigger challenge.
We will tell the time complexity of a program by calculating the time taken to run the
algorithm in the worst-case scenario.
When analyzing the time complexity of an algorithm we may find three cases: best-case,
average-case and worst-case.
Example: Suppose we have the following unsorted list [1, 5, 3, 9, 2, 4, 6, 7, 8] and we need to
find the index of a value in this list using linear search.
best-case: this is the complexity of solving the problem for the best input. In our example, the
best case would be to search for the value 1. Since this is the first value of the list, it would be
found in the first iteration.
average-case: this is the average complexity of solving the problem. This complexity is
defined with respect to the distribution of the values in the input data. Maybe this is not the
best example but, based on our sample, we could say that the average-case would be when
we’re searching for some value in the “middle” of the list, for example, the value 2.
worst-case: this is the complexity of solving the problem for the worst input of size n. In our
example, the worst-case would be to search for the value 8, which is the last element from the
list.
Big-O notation is used to classify algorithms according to how their run time or space
requirements grow as the input size grows. The letter O is used because the growth rate of a
function is also referred to as the order of the function or order of the program. We will
always refer order of the function in its worst-case.
O(1) constant
O(logn) logarithmic
O(n) linear
Ο(n2) quadratic
Ο(n3) cubic
O(2n) exponential
A linear search is the most basic kind of search that is performed. A linear or
sequential search, is done when you inspect each item in a list one by one from one
end to the other to find a match for what you are searching for.
Let’s see the example I stated above once again to understand the Linear Search
We have a list which consists of integers and we have to check whether the number
given by the user is present in that list or not.
l = [1,2,3,6,4,9,10,12]
k = 12
The simple code for this is
l = [1,2,3,6,4,9,10,12]
k = 12
for i in range(0, len(l)):
if l[i] == k:
print("Yes")
break
Here, the worst-case for this algorithm is to check the number which is present in the
last element of the given list. So, if we go by the above program, first it’ll start with
index 0 and check whether that element in the list is equal to k or not, i.e, one
operation and we have to check for every element in the list for worst-case scenario.
The Time Complexity of the above program is O(n).
Example 2: O(n2)
The complexity of an algorithm is said to be quadratic when the steps required to execute an
algorithm are a quadratic function of the number of items in the input. Quadratic complexity is
denoted as O(n^2). Take a look at the following example to see a function with quadratic
complexity:
def quadratic_algo(items):
for item in items:
for item2 in items:
print(item, ' ' ,item)
quadratic_algo([4, 5, 6, 8])
In this example, we have an outer loop that iterates through all the items in the input list and
then a nested inner loop, which again iterates through all the items in the input list. The total
number of steps performed is n * n, where n is the number of items in the input array.
def mergeSort(alist):
print("Splitting ",alist)
if len(alist)>1:
mid = len(alist)//2
lefthalf = alist[:mid]
righthalf = alist[mid:]
mergeSort(lefthalf)
mergeSort(righthalf)
i=0
j=0
k=0
while i < len(lefthalf) and j < len(righthalf):
if lefthalf[i] <= righthalf[j]:
alist[k]=lefthalf[i]
i=i+1
else:
alist[k]=righthalf[j]
j=j+1
k=k+1
Output:
Enter the list of numbers: 56 48 10 2 40
Sorted list: [2, 10, 40, 48, 56]
def insertionSort(nlist):
for index in range(1,len(nlist)):
currentvalue = nlist[index]
position = index
while position>0 and nlist[position-1]>currentvalue:
nlist[position]=nlist[position-1]
position = position-1
nlist[position]=currentvalue
Output:
Enter the list of numbers: 4 5 6 3 1
Sorted list: [1, 3, 4, 5, 6]
Binary search is a searching algorithm that works efficiently with a sorted list.
If a list is already sorted, then the search for an element in the list can be made faster
by using ‘divide and conquer’ technique.
The list is divided into two halves separated by the middle element.
The binary search follows the following steps:
Step 1: The middle element is tested for the required element. If found, then
its position is reported else the following test is made.
Step 2: If search element ‘val’< ‘middle’ element, search the left half of the
list, else search the right half of the list.
Step 3: Repeat step 1 and 2 on the selected half until the entry is found
otherwise report failure.
This search is called binary because in each iteration, the given list is divided into two
parts. Then the search becomes limited to half the size of the list to be searched.
Step 1:
# Iterative Binary Search Function method Python Implementation
# It returns index of n in given list1 if present,
# else returns -1
def binary_search(list1, n):
low = 0
high = len(list1) - 1
mid = 0
# Function call
result = binary_search(list1, n)
if result != -1:
print("Element is present at index", str(result))
else:
print("Element is not present in list1")
Output:
Enter the list of numbers: 10 3 2 13 5 6
enter the search element 2
2
Element is present at index 2
Given two lists, print all the common elements of two lists.
Input : list1 = [1, 2, 3, 4, 5]
list2 = [5, 6, 7, 8, 9]
Output : {5}
Explanation: The common elements of
both the lists are 3 and 4
Output:
{5}
No common elements
Method 2:Using Set’s intersection property
Convert the list to set by conversion. Use the intersection function to check if both sets have
any elements in common. If they have many elements in common, then print the intersection
of both sets.
Below is the Python3 implementation of the above approach:
# Python program to find common elements in
# both sets using intersection function in
# sets
# function
def common_member(a, b):
a_set = set(a)
b_set = set(b)
# check length
if len(a_set.intersection(b_set)) > 0:
return(a_set.intersection(b_set))
else:
return("no common elements")
a = [1, 2, 3, 4, 5]
b = [5, 6, 7, 8, 9]
print(common_member(a, b))
a =[1, 2, 3, 4, 5]
b =[6, 7, 8, 9]
print(common_member(a, b))
Output:
{5}
No common elements
A hash table is a form of list where elements are accessed by a keyword rather
than an index number.
Figure: An ideal hash table
In Python, the Dictionary data types represent the implementation of hash tables. The Keys
in the dictionary satisfy the following requirements.
The keys of the dictionary are hashable i.e. the are generated by hashing function
which generates unique result for each unique value supplied to the hash function.
The order of data elements in a dictionary is not fixed.
5.12. DICTIONARY
As you can see from above, we can also create a dictionary using the built-in dict()function.
While indexing is used with other data types to access values, a dictionary uses keys.
Keys can be used either inside square brackets [] or with the get() method.
If we use the square brackets [], KeyError is raised in case a key is not found in the
dictionary. On the other hand, the get() method returns None if the key is not found.
# Output: Jack
print(my_dict['name'])
# Output: 26
print(my_dict.get('age'))
# KeyError
print(my_dict['address'])
Output
Jack
26
None
Traceback (most recent call last):
File "<string>", line 15, in <module>
print(my_dict['address'])
KeyError: 'address'
Dictionaries are mutable. We can add new items or change the value of existing items
using an assignment operator.
If the key is already present, then the existing value gets updated. In case the key is
not present, a new (key: value) pair is added to the dictionary.
# update value
my_dict['age'] = 27
# add item
my_dict['address'] = 'Downtown'
Output
{'name': 'Jack', 'age': 27}
{'name': 'Jack', 'age': 27, 'address': 'Downtown'}
# create a dictionary
squares = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Output: {1: 1, 2: 4, 3: 9}
print(squares)
# Output: {}
print(squares)
# Throws Error
print(squares)
Output
16
{1: 1, 2: 4, 3: 9, 5: 25}
(5, 25)
{1: 1, 2: 4, 3: 9}
{}
Traceback (most recent call last):
File "<string>", line 30, in <module>
print(squares)
NameError: name 'squares' is not defined
Method Description
fromkeys(seq[, Returns a new dictionary with keys from seq and value
v]) equal to v (defaults to None).
Returns the value of the key. If the key does not exist,
get(key[,d])
returns d (defaults to None).
# Dictionary Methods
marks = {}.fromkeys(['Math', 'English', 'Science'], 0)
Output
{'Math': 0, 'English': 0, 'Science': 0}
('Math', 0)
('English', 0)
('Science', 0)
['English', 'Math', 'Science']
print(squares)
Output
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Output
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Here are some examples to make a dictionary with only odd items.
# Dictionary Comprehension with if conditional
odd_squares = {x: x*x for x in range(11) if x % 2 == 1}
print(odd_squares)
Output
{1: 1, 3: 9, 5: 25, 7: 49, 9: 81}
We can test if a key is in a dictionary or not using the keyword in. Notice that the
membership test is only for the keys and not for the values.
# Output: True
print(1 in squares)
# Output: True
print(2 not in squares)
Functio
Description
n
Return True if all keys of the dictionary are True (or if the
all()
dictionary is empty).
# Output: False
print(all(squares))
# Output: True
print(any(squares))
# Output: 6
print(len(squares))
# Output: [0, 1, 3, 5, 7, 9]
print(sorted(squares))
Output
False
True
6
[0, 1, 3, 5, 7, 9]
Part-A
Q.No Questions Competence BT Level
BTL 1
1. Define Data structure Remember
BTL 2
2. List the types of built-in data structures Understand
BTL 1
3. Define function and write the syntax of function? Remember
BTL 1
4. Define function call Remember
Write a python program to find even or odd using function BTL 3
5. Apply
BTL 1
6. Define pandas Remember
BTL 2
7. Interpret Dataframe Understand
Elaborate numpy BTL 2
8. Understand
How to load data in pandas? BTL 2
9. Understand
Write about the libraries used for pre-processing and array BTL 2
10. operations? Understand