Python Notes
join combines the strings in a list into a single string, placing the separator string between them.
syntax: " ".join(l)
A list provides a general mechanism for storing objects, each of which is accessed
by a numeric index. The elements of a list are arbitrary: they can be numbers,
strings, functions or user-defined objects.
The asterisk * is overloaded for lists to serve as the repetition operator. The
result of applying the repetition operator to a list is a single list with the
elements of the original list repeated as many times as we specify.
>>> l = [1, 2, 3]
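A short sketch of the repetition operator and of join, using the list l from above. The names repeated and words are only illustrative.

```python
l = [1, 2, 3]

# Repeating a list three times yields one flat list, not a list of lists.
repeated = l * 3
print(repeated)         # [1, 2, 3, 1, 2, 3, 1, 2, 3]

# The original list is left unchanged.
print(l)                # [1, 2, 3]

# join works on a list of strings; the separator here is a single space.
words = ["python", "is", "fun"]
print(" ".join(words))  # python is fun
```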
Dictionaries in Python
A dictionary in Python is a data structure, more generally known as an associative
array, whose values are accessed by key rather than by position (since Python 3.7 a
dictionary also preserves insertion order). A dictionary consists of a collection of
key-value pairs. Each key-value pair maps the key to its associated value.
syntax of dictionary:
We can define a dictionary by enclosing comma-separated key-value pairs in curly braces:
d = {k1:v1, k2:v2, k3:v3, .....}
dict_roll_name = {'CS101':'Yogesh', 'CS102':'Raghav', 'CS141':'Sohan'}
>>> type(dict_roll_name)
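A minimal sketch of common dictionary operations on dict_roll_name. The key 'CS150' and the name 'Meera' are hypothetical entries added only for illustration.

```python
dict_roll_name = {'CS101': 'Yogesh', 'CS102': 'Raghav', 'CS141': 'Sohan'}

# Look up a value by its key.
print(dict_roll_name['CS102'])                   # Raghav

# get() returns a default instead of raising KeyError for a missing key.
print(dict_roll_name.get('CS999', 'not found'))  # not found

# Add a new pair (hypothetical entry), then iterate over all pairs.
dict_roll_name['CS150'] = 'Meera'
for roll, name in dict_roll_name.items():
    print(roll, name)
```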
A while loop is used to execute a block of code repeatedly until a given condition
becomes false. When the condition becomes false, the line after the loop is executed.
while expression:
    statement(s)
for i in range(n):
    statement(s)
# Nested Looping
In nested looping we have more than one loop. If we have two loops, the first one
is called the outer loop and the second one is called the inner loop.
for i in range(3):
    for j in range(2):
        print(i, j)
Break, Continue and Pass
You have to build a guess-and-win game in which the owner chooses a random number
less than or equal to 12 and gives the player 3 turns to guess it. After each
attempt, tell the player how many turns remain and whether the guess was higher or
lower than the jackpot number, to help with the next guess.
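One possible sketch of the game, showing break and the else branch of a for loop. It assumes the lowest possible number is 1, and it takes the guesses as a list rather than from the keyboard so that it runs unattended. The names play, MAX_NUMBER and TURNS are illustrative.

```python
import random

MAX_NUMBER = 12   # the jackpot is at most 12, as in the exercise
TURNS = 3         # the player gets three attempts


def play(jackpot, guesses):
    """Play one round with a fixed sequence of guesses; return the messages."""
    messages = []
    for turn, guess in enumerate(guesses[:TURNS], start=1):
        remaining = TURNS - turn
        if guess == jackpot:
            messages.append("You win!")
            break                      # stop as soon as the guess is correct
        hint = "higher" if guess > jackpot else "lower"
        messages.append(
            f"Your guess is {hint} than the jackpot; {remaining} turn(s) left."
        )
    else:
        # The loop's else branch runs only if break was never reached.
        messages.append(f"Out of turns. The jackpot was {jackpot}.")
    return messages


jackpot = random.randint(1, MAX_NUMBER)   # assumption: numbers start at 1
for line in play(jackpot, [6, 3, 9]):
    print(line)
```

In an interactive version each guess could instead be read with int(input(...)) inside the loop.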
Functions in Python:
A function is a reusable component which we can use again and again just by changing
the values passed to it.
Anonymous Function
The syntax of an anonymous function is lambda [args] : expression
A lambda function can have only one expression, but it can have any number of
arguments.
# lambda function for the calculation of area of a rectangle
rectangle = lambda l,b : l*b
area = rectangle(14,6)
reduce function
The reduce function is typically used together with a lambda function, but it
operates in a different way. It takes the first two elements, applies the lambda
function to them and stores the result. In the next turn it again takes two
arguments, but this time they are the previous result and the next element, and so
on until the sequence is exhausted. In Python 3, reduce is imported from functools.
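A small sketch of how reduce carries the previous result forward. The list numbers and the names total and product are illustrative.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# reduce first computes 1 + 2, then adds 3 to that result, and so on.
total = reduce(lambda acc, x: acc + x, numbers)
print(total)      # 15

# An optional third argument supplies the starting value.
product = reduce(lambda acc, x: acc * x, numbers, 1)
print(product)    # 120
```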
Nested Function
A nested function is a function defined inside another function.
def outer_fun():
    def inner_fun():
        return " i am from inner function"
    return " i am from outer function"
# syntax:
try:
    # some code
    # optional code
except:
    # handling of exception
else:
    # the else block will execute if there is no exception
finally:
    # The code of this block will always get executed
Q. Write a python program that prompts the user to input an integer and raises a
ValueError exception if the input is not a valid integer.
Q. Write a program that takes two values and raises a TypeError when we try to add a
string to an integer.
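One way to approach the two questions above, as a sketch. The function names read_integer and add_values are hypothetical, and the text is passed in as an argument so that the example runs without waiting for keyboard input.

```python
def read_integer(text):
    """Convert text to int; int() raises ValueError for invalid input."""
    try:
        return int(text)
    except ValueError:
        # Re-raise with a clearer message for the caller.
        raise ValueError(f"{text!r} is not a valid integer") from None


def add_values(a, b):
    """Add two values; adding a str and an int raises TypeError."""
    try:
        result = a + b
    except TypeError as exc:
        print("TypeError:", exc)
        raise
    else:
        return result                  # runs only when no exception occurred
    finally:
        print("add_values finished")   # always runs


print(read_integer("42"))
print(add_values(2, 3))
try:
    add_values("2", 3)
except TypeError:
    print("cannot add a string and an integer")
```

In an interactive program the text would come from input("Enter an integer: ").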
Inheritance in Python
The process of inheriting the properties of the parent class into a child class is
called inheritance
The existing class is called the Base Class or Parent Class, and the new class is
called derived class or Child Class.
class BaseClass:
    # body of the base class
class DerivedClass(BaseClass):
    # body of the derived class
Types of Inheritance:
1. Single Inheritance: In single inheritance a child class inherits from a single
parent class. There is one child class and one parent class.
2. Multiple Inheritance: In multiple inheritance one child class can inherit from
multiple parent classes, so we have one child class and multiple parent classes.
The child class can access the member functions of all its parent classes, apart
from its own functions.
3. Multilevel Inheritance
In multilevel inheritance, a class inherits from a child class or derived class.
Suppose there are three classes A, B and C: A is the superclass, B is the child of
A, and C is the child of B. In other words, a chain of classes is called multilevel
inheritance.
4. Hierarchical Inheritance
In hierarchical inheritance, more than one child class is derived from a single
parent class. In other words, there is one parent class and multiple child classes.
5. Hybrid Inheritance
When the inheritance consists of a combination of different types of inheritance, it
is called hybrid inheritance. Normally hybrid inheritance is a combination of
multiple inheritance and hierarchical inheritance.
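The inheritance types above can be sketched with a few small classes. The class names Animal, Dog, Puppy, Swimmer and Duck are hypothetical and chosen only for illustration.

```python
class Animal:                      # parent (base) class
    def speak(self):
        return "some sound"


class Dog(Animal):                 # single inheritance: one parent
    def speak(self):
        return "woof"


class Puppy(Dog):                  # multilevel: Animal -> Dog -> Puppy
    pass


class Swimmer:
    def swim(self):
        return "swimming"


class Duck(Animal, Swimmer):       # multiple inheritance: two parents
    pass


print(Puppy().speak())   # inherited from Dog
print(Duck().speak())    # inherited from Animal
print(Duck().swim())     # inherited from Swimmer
# Dog and Duck both derive from Animal, which is hierarchical inheritance.
```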
Polymorphism means a function with the same name but with different forms, and it
is a very important concept in OOP.
len(['this', 'is', 'an', 'era', 'of', 'python', 'programming'])
len({'Name':'John', 'Address':'Atlanta'})
File handling
File handling is a very important topic.
The syntax to open a file is
f = open(filename, mode)
r : read mode
w : write mode
rt : read text mode
wt : write text mode
a : append mode
w+ : to read and write data
a+ : to append and read data from the file
Whenever you have to read a file you have to open it, and after reading the file
always close it.
When we open a file using 'w' or 'w+' mode, the file is created if it does not
exist. If the file already exists, its contents are erased and writing starts from
an empty file of the same name.
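A short sketch of the 'w', 'a' and 'r' modes. It writes to a temporary directory so no real file is touched, and the file name notes.txt is hypothetical. The with statement is used here because it closes the file automatically.

```python
import os
import tempfile

# Use a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:        # 'w' creates the file
    f.write("first version\n")

with open(path, "w") as f:        # opening with 'w' again discards old content
    f.write("second version\n")

with open(path, "a") as f:        # 'a' keeps the content and appends
    f.write("appended line\n")

with open(path, "r") as f:        # the with block closes the file for us
    content = f.read()

print(content)
```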
NumPy stands for Numerical Python; it is the fundamental package for high-
performance computing and data analysis. NumPy provides the ndarray, a fast and
space-efficient multidimensional array offering vectorized arithmetic operations
and sophisticated broadcasting capabilities.
The N-dimensional array object, or ndarray, is a fast, flexible container for
large data sets in Python. Arrays enable you to perform mathematical operations on
whole blocks of data using syntax similar to that for single values.
string: wearegoingoworkt     pattern: a[a-zA-Z0-9]t$
string: allareworkingfinet   pattern: ^a[a-zA-Z0-9]t$
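A sketch that tries the two patterns above with Python's re module. As written, each pattern allows exactly one alphanumeric character between 'a' and 't', so neither string matches. The variants with '*' are an assumption about what may have been intended and are not taken from the notes.

```python
import re

samples = ["wearegoingoworkt", "allareworkingfinet"]

# Patterns exactly as written: 'a', ONE alphanumeric character, then 't'.
as_written = [r"a[a-zA-Z0-9]t$", r"^a[a-zA-Z0-9]t$"]

# Assumed variants with '*' (zero or more alphanumerics between 'a' and 't').
with_star = [r"a[a-zA-Z0-9]*t$", r"^a[a-zA-Z0-9]*t$"]

results = []
for text, strict, loose in zip(samples, as_written, with_star):
    row = (text, bool(re.search(strict, text)), bool(re.search(loose, text)))
    results.append(row)
    print(*row)
```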
Web Scraping:
Web scraping is an automatic method of obtaining large amounts of data from a
particular website. The data in HTML is normally unstructured; we convert it into a
structured form and feed it into a table. There are different ways to perform web
scraping, but here we have used the Beautiful Soup package.
Parsers: parsers are used to parse unstructured data. We use two types of
parser: 1. html5lib, 2. lxml
A DataFrame represents a tabular, spreadsheet-like data structure containing an
ordered collection of columns, each of which can have a different value type. The
DataFrame has both a row index and a column index.
Data Visualization
Data visualization is the discipline of trying to understand data by placing it in
a visual context, to reveal patterns, trends and correlations that might not be
detected without visual representations.
Python offers multiple great graphing libraries packed with lots of features; which
one to use depends on how customized a plot you want to make.
To get an overview, here are some popular plotting libraries.
1. Matplotlib: a low-level, lightweight library that provides a lot of freedom
2. Pandas Visualization: an easy-to-use interface built on matplotlib
3. Seaborn: a high-level interface used to build exotic graphs
4. Plotly: used to create interactive plots
Types of graphs
1. Scatter Plot: this plot is used to create a scatter representation of the data;
in matplotlib it uses the scatter method.
2. Line Plot: In matplotlib we can create a line chart by calling the plot method.
We can also plot multiple columns in one graph by looping through the columns we
want and plotting each one.
3. Histogram: Histograms are basically used to visualize the frequency distribution.
In matplotlib we can create a histogram using the hist method. If we pass
categorical data, the histogram computes the frequency of each category automatically.
4. Bar plot: this plot needs the list of frequencies to be supplied separately.
Pandas Visualization
HeatMap: A heatmap is a graphical representation of data where the values contained
in a matrix are represented as colors. Heatmaps are well suited to exploring the
correlation of features in a dataset.
Box plots:
Box plots are very efficient plots for detecting outliers.
Machine Learning
Data: The data is normally in the form of .csv files, but we can also have data in
Excel file format.
Machine Learning ==> The term machine learning is composed of two words, machine
and learning, where machine indicates an automated process which is trained on a
lot of data so that it learns to predict results.
Machine learning is an application of artificial intelligence that uses statistical
techniques to enable computers to learn and make decisions without being explicitly
programmed. In ML, computers can learn from data, spot patterns and make judgements
with little assistance from humans.
1. Model: the model is also known as a hypothesis. An ML model is a mathematical
representation of a real-life problem.
2. Features: A feature is a measurable property of the data that the model uses as
input; features are sometimes also called parameters or components.
3. Vector/Feature Vector: A feature vector is a set of multiple numeric features. We
use these feature vectors as input to the machine learning models.
4. Training:
An algorithm takes a set of data known as training data as input. The learning
algorithm finds patterns in the input data and trains the model for the expected
results, known as the target. The output of the training process is the machine
learning model.
5. Prediction: Once the machine learning model is ready after training, it can be
fed with the test data to predict the output.
6. Target (Label): The value that the machine learning model has to predict is
called the target or label.
7. Overfitting: In this case the model performs well on the training data but
fails when tested on the test data.
8. Underfitting: In this case the model performs poorly even on the training data,
for example because it is too simple or we have too little data to train it.
x = {1,2,3,4}
y = {5,8,11,14}
Supervised learning is a class of problems that uses a model to learn the mapping
between the input and target variables. Applications whose training data describes
both the input variables and the target variables are called supervised learning
tasks.
Let the set of input variables be X and the target variable be y. A supervised
learning algorithm tries to learn a model which can predict the values of y from X
correctly.
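As a sketch of learning such a mapping, the x and y values listed above can be fitted with a straight line by ordinary least squares. The choice of a linear model is an assumption, since the notes do not name a model here; lists are used instead of sets so that each x stays paired with its y.

```python
x = [1, 2, 3, 4]
y = [5, 8, 11, 14]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for a single input variable.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x


def predict(value):
    """Predict the target for a new input using the learned line."""
    return slope * value + intercept


print(slope, intercept)   # 3.0 2.0
print(predict(5))         # 17.0
```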
S = [p, c, m]
s1 = [7, 6, 8]
s2 = [4, 7, 6]
s3 = [8, 9, 7]
s4 = [6, 5, 6]
D(s1,s2) = 2
D(s1,s3) = 1.66
D(s1,s4) = 1.33
D(s2,s3) = 2.33
D(s3,s4) = 2.33
D(s2,s4) =
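The listed distances are consistent with the mean absolute difference between two score vectors, for example D(s1,s2) = (3 + 1 + 2)/3 = 2. That reading is inferred from the numbers and is an assumption. The sketch below computes every pair with it; values are rounded to two decimals, so 5/3 prints as 1.67.

```python
from itertools import combinations

# Scores per student in the order [p, c, m], copied from above.
students = {
    "s1": [7, 6, 8],
    "s2": [4, 7, 6],
    "s3": [8, 9, 7],
    "s4": [6, 5, 6],
}


def mean_abs_distance(a, b):
    """Average of the absolute component-wise differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)


for (name_a, a), (name_b, b) in combinations(students.items(), 2):
    print(f"D({name_a},{name_b}) = {mean_abs_distance(a, b):.2f}")
```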
Bi-Variate Analysis
sal = [10000,4000,2000,5000,8000,20000]
Age = [25,32,28,36,32,35]
std_scaler = (x - mu) / sigma, where mu is the mean and sigma is the standard deviation (std)
min-max = (x - min) / (max - min)
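A sketch applying both scalers to the sal and Age lists above, using only the standard library. It assumes the population standard deviation (pstdev); the function names are illustrative.

```python
from statistics import mean, pstdev

sal = [10000, 4000, 2000, 5000, 8000, 20000]
age = [25, 32, 28, 36, 32, 35]


def standard_scale(values):
    """(x - mu) / sigma, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]


def min_max_scale(values):
    """(x - min) / (max - min), mapping the data onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


print([round(v, 2) for v in standard_scale(sal)])
print([round(v, 2) for v in min_max_scale(age)])
```

After scaling, salary and age are on comparable ranges, so neither dominates a distance calculation simply because of its units.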
Important Questions:
1. What is the dimension of the data X?
2. How many components are there in one vector of X? ==> Keep in mind that each
sample is a vector in ML.
3. Why do we normalize?
4. What are the different types of normalization?
5. What is the effect of normalization?
Cross Validation
If the data set is not big, we use cross validation, where the data is split into
blocks and, cyclically, one block is used for testing while the remaining blocks
are used for training. The process continues until every block has been used for
testing.
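The block rotation can be sketched in plain Python as follows. The function name k_fold_splits is hypothetical, and the sketch does not shuffle the data, which a real workflow would usually do first.

```python
def k_fold_splits(samples, k):
    """Yield (train, test) pairs; each block is the test set exactly once."""
    fold_size, extra = divmod(len(samples), k)
    start = 0
    for fold in range(k):
        # Spread any leftover samples over the first `extra` blocks.
        stop = start + fold_size + (1 if fold < extra else 0)
        test = samples[start:stop]
        train = samples[:start] + samples[stop:]
        yield train, test
        start = stop


data = list(range(10))
for train, test in k_fold_splits(data, k=3):
    print("train:", train, "test:", test)
```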
Regularization is used to avoid overfitting. When the model predicts the wrong
class or regresses to the wrong value, we say that the model incurs a loss.
The loss is therefore the error, and the error on unseen data increases further
when the model is overfitted.
Regularization is the mechanism by which we add a penalty to the loss function; the
penalty grows with the size of the model's weights, which tend to grow the longer we
train the model. It therefore also helps to avoid over-training the model.
Lasso Regularization
loss function = 1/N * sum((pred_y[i] - test_y[i])**2 for i in range(N)) +
1/2 * sum(abs(w) for w in weight parameters)
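A sketch of this loss, reading the second term as an L1 penalty, that is, the sum of the absolute weights, which is what Lasso penalizes. Using the factor 1/2 as the default penalty strength lam follows the formula above; the sample predictions, targets and weights are hypothetical.

```python
def lasso_loss(pred_y, test_y, weights, lam=0.5):
    """Mean squared error plus lam times the sum of absolute weights."""
    n = len(test_y)
    mse = sum((p - t) ** 2 for p, t in zip(pred_y, test_y)) / n
    l1_penalty = lam * sum(abs(w) for w in weights)
    return mse + l1_penalty


# Hypothetical predictions, targets and weights, for illustration only.
print(lasso_loss([2.0, 4.0], [1.0, 5.0], weights=[0.5, -1.5]))   # 2.0
```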
Classification is the technique of assigning data to classes; in binary
classification there are two classes. In binary classification we use the sigmoid
function as the activation unit to classify data into the two different classes.
Q: TN: 50, FP: 10, FN: 5, TP: 100. Calculate the precision, recall and accuracy of
the model.
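The question above can be worked through directly from the standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN) and accuracy = (TP+TN)/total.

```python
tn, fp, fn, tp = 50, 10, 5, 100

precision = tp / (tp + fp)                    # 100 / 110
recall = tp / (tp + fn)                       # 100 / 105
accuracy = (tp + tn) / (tp + tn + fp + fn)    # 150 / 165

print(f"precision = {precision:.3f}")   # 0.909
print(f"recall    = {recall:.3f}")      # 0.952
print(f"accuracy  = {accuracy:.3f}")    # 0.909
```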
Y = w1x1 + w2x2 + b
Decision Tree: A decision tree is a non-parametric supervised learning algorithm for
classification and regression tasks. The decision tree follows a hierarchical
structure consisting of a root node, branches, internal nodes and leaf nodes.
root node: the topmost node of the tree
branches: used to connect the nodes
internal nodes: nodes which have at least one child
leaf node: a node which has no child
Q. The root node, which uses feature 1, has 8 yes and 4 no.
==> Entropy(Weather) = -(8/12*np.log(8/12))-(4/12*np.log(4/12)) = 0.6365
Impurity for feature 2, which has 5 yes and 2 no ==> N1
==> Entropy(weather, feature2) = -(5/7*np.log(5/7))-(2/7*np.log(2/7)) = 0.598
Impurity for feature 3, which has 3 yes and 2 no ==> N2
==> Entropy(weather, feature3) = -(3/5*np.log(3/5))-(2/5*np.log(2/5)) = 0.673
I.G = Entropy(Weather) - [7/12*Entropy(weather, feature2) + 5/12*Entropy(weather, feature3)]
= 0.6365 - (0.583*0.598 + 0.417*0.673), which is approximately 0.007
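The calculation above can be checked with a short sketch. It uses the natural logarithm, as np.log does, and math.log from the standard library; the function name entropy is illustrative. With unrounded intermediate values the information gain comes out at about 0.007.

```python
from math import log


def entropy(yes, no):
    """Entropy of a node with the given class counts (natural log)."""
    total = yes + no
    return -sum((c / total) * log(c / total) for c in (yes, no) if c)


parent = entropy(8, 4)      # root node: 8 yes, 4 no
child_1 = entropy(5, 2)     # N1: 5 yes, 2 no
child_2 = entropy(3, 2)     # N2: 3 yes, 2 no

# Weight each child's entropy by its share of the 12 samples.
info_gain = parent - (7 / 12 * child_1 + 5 / 12 * child_2)

print(round(parent, 4), round(child_1, 4), round(child_2, 4))
print(round(info_gain, 4))
```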
Gini Impurity
Gini impurity is a measure used in decision tree algorithms to quantify the
impurity level, or disorder, of a data set.
For two classes it ranges from 0 to 0.5, where 0 indicates a perfectly pure node,
meaning all instances belong to the same class, i.e. either Yes or No,
and 0.5 indicates a perfectly impure node with maximum impurity.
The formula is Gini Impurity = 1 - Gini, where Gini is the sum of the squared class
proportions in the node.
Gini, also called the Gini index, is used to calculate the probability of correct
classification; when we subtract it from 1, the expression gives us the impurity.
For example, we choose a feature which gives 3 yes for node 1 and 2 no for node 2:
G(node1) = 1-(3/3)**2 = 1-1 = 0
G(node2) = 1-(2/2)**2 = 1-1 = 0
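A minimal sketch of the Gini impurity calculation for the two pure nodes above, plus an evenly mixed node for comparison. The function name gini_impurity and the mixed example are illustrative.

```python
def gini_impurity(counts):
    """1 minus the sum of squared class proportions in a node."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


print(gini_impurity([3, 0]))   # node1: 3 yes, 0 no -> 0.0
print(gini_impurity([0, 2]))   # node2: 0 yes, 2 no -> 0.0
print(gini_impurity([5, 5]))   # evenly mixed node -> 0.5
```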
This data is downloaded from Kaggle. It describes patient medical record data for
Pima Indians and whether they had an onset of diabetes within five years.
Target: onset of diabetes : 1, no onset : 0
1. Number of times pregnant
2. Plasma glucose concentration
3. Blood pressure
4. Triceps skinfold thickness
5. Serum insulin
6. BMI
7. Diabetes pedigree function
8. Age
Both of these conditions are bad for the health of the model, because the model
cannot generalize properly.
5. Improve accuracy
a. Algorithm tuning
b. Ensembles
kNN: The kNN algorithm was the most accurate model that was tested. Now we want to
get an idea of the accuracy of the model on our validation data set. The distance
from a test sample to the nearest samples of each class tells us which class the
test sample belongs to.
The SVM is originally designed for binary classification, but it is used for
multiclass classification using the above two variants.
One vs One Classifier:
In OVO the number of classifiers is equal to n(n-1)/2, where n is the number of
classes.
[Red, Blue, Green, Yellow]
1. Binary classification : (Red, Blue) ==> r --> 0.60, b --> 0.40
2. Binary classification : (Red, Green) ==> r --> 0.78, g --> 0.22
3. Binary classification : (Red, Yellow) ==> r --> 0.65, y --> 0.35
4. Binary classification : (Blue, Yellow) ==> b --> 0.32, y --> 0.78