Question Bank With Answers
(5 Marks)
Features
Object-oriented language: Python supports object-oriented concepts such as classes, objects,
encapsulation, etc.
Interpreted language: Python code is executed one line at a time.
Supports dynamic typing: The type of a variable is decided at run time, not in advance. Hence, we don’t need to specify
the type of a variable (for example int, double, long, etc.).
Simple and easy to code: Python is very easy to code.
High-level language
Automatic memory management
Open source: The Python language is freely available.
Advantages & disadvantages
Advantages
Free availability (like Perl, Python is open source).
Stability (Python was at release 2.6 when this list was written and is older than Java).
Very easy to learn and use.
Good support for objects, modules, and other reusability mechanisms.
Easy integration with and extensibility using C and Java.
Disadvantages
Smaller pool of Python developers compared to other languages, such as Java.
Lack of true multiprocessor support.
Absence of a commercial support point, even for an open source project (though this situation is
changing).
Slow software performance; not suitable for high-performance applications.
def sum(numbers):
    total = 0
    for x in numbers:
        total += x
    return total
print(sum((8, 2, 3, 0, 7)))
The while loop in Python is used to iterate over a block of code as long as the test expression (condition) is true.
We generally use this loop when we don't know the number of times to iterate beforehand.
Syntax of while Loop in Python
while test_expression:
Body of while
In the while loop, test expression is checked first. The body of the loop is entered only if the test_expression evaluates
to True. After one iteration, the test expression is checked again. This process continues until
the test_expression evaluates to False.
In Python, the body of the while loop is determined through indentation.
The body starts with indentation and the first unindented line marks the end.
Python interprets any non-zero value as True. None and 0 are interpreted as False.
Flowchart of while Loop
n = 10
# initialize sum and counter
sum = 0
i = 1
while i <= n:
    sum = sum + i
    i = i + 1 # update counter
print("The sum is", sum)
Output
The sum is 55
In the above program, the test expression will be True as long as our counter variable i is less than or equal to n (10 in
our program).
We need to increase the value of the counter variable in the body of the loop. This is very important (and mostly
forgotten). Failing to do so will result in an infinite loop (never-ending loop).
Finally, the result is displayed.
While loop with else
Same as with for loops, while loops can also have an optional else block.
The else part is executed if the condition in the while loop evaluates to False.
The while loop can be terminated with a break statement. In such cases, the else part is ignored. Hence, a while
loop's else part runs if no break occurs and the condition is false.
Here is an example to illustrate this.
# Example to illustrate the use of else statement with the while loop
counter = 0
while counter < 3:
    print("Inside loop")
    counter = counter + 1
else:
    print("Inside else")
Output
Inside loop
Inside loop
Inside loop
Inside else
Here, we use a counter variable to print the string Inside loop three times.
On the fourth iteration, the condition in while becomes False. Hence, the else part is executed.
In the for loop syntax "for val in sequence:", val is the variable that takes the value of the item inside the sequence on each iteration.
The loop continues until we reach the last item in the sequence. The body of the for loop is separated from the rest of the code
using indentation.
Flowchart of for Loop
# List of numbers
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]
The sum is 48
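Only the list and the final result of this example survive above. A minimal sketch that reproduces the stated sum with a for loop is given here; the variable name total is an assumption, not taken from the original program.

```python
# List of numbers (taken from the example above)
numbers = [6, 5, 3, 8, 4, 2, 5, 4, 11]

# Variable to store the running sum (the name is an assumption)
total = 0

# Iterate over the list; val takes the value of each item in turn
for val in numbers:
    total = total + val

print("The sum is", total)
```

The loop body runs once per item, so no counter has to be updated by hand, unlike the while loop.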
RANGE()
range() generates a sequence of integers, and there are 3 ways to use it.
The function takes 1 to 3 arguments. In Python 3 it returns a range object rather than a list, so each usage below is
wrapped in a list comprehension to show the values generated.
i) range(end) : generate integers from 0 up to, but not including, the “end” integer.
[i for i in range(10)]
#=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ii) range(start, end) : generate integers from “start” up to, but not including, the “end” integer.
[i for i in range(2,10)]
#=> [2, 3, 4, 5, 6, 7, 8, 9]
iii) range(start, end, step) : generate integers from “start” up to, but not including, “end” at intervals of “step”.
[i for i in range(2,10,2)]
#=> [2, 4, 6, 8]
3. (a) What is the difference between a list and a tuple? Give an example (5 Marks)
Module
A module can contain executable statements as well as function definitions. These statements are intended to
initialize the module. They are executed only the first time the module name is encountered in an import
statement.
Package
A collection of modules organized together and kept in a directory is known as a package.
That is, a package is a directory of modules.
Inside this directory there will be an __init__.py file. This file is always recognized and run
by the interpreter when the package is imported. Packages are modules, but not all modules are packages.
Example
from sklearn import model_selection  # replaces sklearn.cross_validation, which was removed in scikit-learn 0.20
4. (a) Evaluate the following expressions (i) 5//3*2-6/3*5%3 (ii) 5%8*3+8%3*5 (5 Marks)
(i) 5//3*2-6/3*5%3
=1*2-6/3*5%3
=2-6/3*5%3
=2-2.0*5%3
=2-10.0%3
=2-1.0
Ans=1.0 (in Python 3 the / operator always returns a float, so the result is 1.0 rather than 1)
(ii) 5%8*3+8%3*5
=5*3+8%3*5
=15+8%3*5
=15+2*5
=15+10
Ans=25
Operator    Description    Syntax
%    Modulus: returns the remainder when the first operand is divided by the second    x % y
# Operand values (inferred from the output below)
a = 9
b = 4
# Addition of numbers
add = a + b
# Subtraction of numbers
sub = a - b
# Multiplication of number
mul = a * b
# Division(float) of number
div1 = a / b
# Division(floor) of number
div2 = a // b
# Modulo of both numbers
mod = a % b
# Power
p = a ** b
# print results
print(add)
print(sub)
print(mul)
print(div1)
print(div2)
print(mod)
print(p)
Output
13
5
36
2.25
2
1
6561
Comparison Operators
Comparison (relational) operators compare values. Each returns either True or False according to the condition.
Operator    Description    Syntax
>     Greater than: True if the left operand is greater than the right    x > y
<     Less than: True if the left operand is less than the right    x < y
==    Equal to: True if both operands are equal    x == y
!=    Not equal to: True if the operands are not equal    x != y
>=    Greater than or equal to: True if the left operand is greater than or equal to the right    x >= y
<=    Less than or equal to: True if the left operand is less than or equal to the right    x <= y
# Example values (assumed here; any a < b gives the output below)
a = 13
b = 33
# a > b is False
print(a > b)
# a < b is True
print(a < b)
# a == b is False
print(a == b)
# a != b is True
print(a != b)
# a >= b is False
print(a >= b)
# a <= b is True
print(a <= b)
Output
False
True
False
True
False
True
Logical Operators
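Logical operators perform Logical AND, Logical OR, and Logical NOT operations. They are used to combine conditional
statements.
Operator    Description    Syntax
and    Logical AND: True if both the operands are true    x and y
or     Logical OR: True if either of the operands is true    x or y
not    Logical NOT: True if the operand is false    not x
# Example values (assumed; chosen to match the output below)
a = True
b = False
# Print a and b is False
print(a and b)
# Print a or b is True
print(a or b)
# Print not a is False
print(not a)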
Output
False
True
False
Bitwise Operators
Bitwise operators act on bits and perform the bit-by-bit operations. These are used to operate on binary numbers.
Operator    Description    Syntax
&     Bitwise AND            x & y
|     Bitwise OR             x | y
~     Bitwise NOT            ~x
^     Bitwise XOR            x ^ y
>>    Bitwise right shift    x >> n
<<    Bitwise left shift     x << n
Output
0
14
-11
14
2
40
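The program that produced the six values above is not shown. The sketch below reproduces them under the assumption that the operands are a = 10 and b = 4 and that both shifts are by two bits; these values are inferred from the output, not taken from the original.

```python
# Assumed operands: 10 is 0b1010 and 4 is 0b0100
a = 10
b = 4

results = [
    a & b,   # bitwise AND: no bit is set in both numbers
    a | b,   # bitwise OR: 0b1110
    ~a,      # bitwise NOT: -(a + 1) for Python integers
    a ^ b,   # bitwise XOR: 0b1110
    a >> 2,  # right shift by two bits
    a << 2,  # left shift by two bits
]

for value in results:
    print(value)
```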
Assignment Operators
Assignment operators are used to assign values to variables.
Operator    Description    Syntax
+=     Add AND: add the right operand to the left operand and then assign to the left operand    a += b    (a = a + b)
-=     Subtract AND: subtract the right operand from the left operand and then assign to the left operand    a -= b    (a = a - b)
*=     Multiply AND: multiply the right operand with the left operand and then assign to the left operand    a *= b    (a = a * b)
/=     Divide AND: divide the left operand by the right operand and then assign to the left operand    a /= b    (a = a / b)
%=     Modulus AND: take the modulus using the left and right operands and assign the result to the left operand    a %= b    (a = a % b)
//=    Divide(floor) AND: divide the left operand by the right operand and then assign the floor value to the left operand    a //= b    (a = a // b)
**=    Exponent AND: calculate the exponent (raise to a power) using the operands and assign the value to the left operand    a **= b    (a = a ** b)
&=     Performs bitwise AND on the operands and assigns the value to the left operand    a &= b    (a = a & b)
|=     Performs bitwise OR on the operands and assigns the value to the left operand    a |= b    (a = a | b)
^=     Performs bitwise XOR on the operands and assigns the value to the left operand    a ^= b    (a = a ^ b)
>>=    Performs bitwise right shift on the operands and assigns the value to the left operand    a >>= b    (a = a >> b)
<<=    Performs bitwise left shift on the operands and assigns the value to the left operand    a <<= b    (a = a << b)
# Starting value (inferred from the output below)
a = 10
# Assign value
b = a
print(b)
Output
10
20
10
100
102400
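The output above lists five values, but only the first assignment survives in the listing. A sketch of one sequence of augmented assignments that yields exactly those values is shown here; the starting value a = 10 and the choice of operators are inferred from the output, not confirmed by the original.

```python
a = 10
history = []   # collects the value of b after each step

b = a          # plain assignment
history.append(b)

b += a         # add and assign: 10 + 10
history.append(b)

b -= a         # subtract and assign: 20 - 10
history.append(b)

b *= a         # multiply and assign: 10 * 10
history.append(b)

b <<= a        # left shift and assign: 100 * 2**10
history.append(b)

for value in history:
    print(value)
```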
Identity Operators
is and is not are the identity operators. Both are used to check whether two values are located at the same place in memory,
that is, whether they are the same object. Two variables that are equal are not necessarily identical.
a = 10
b = 20
c=a
print(a is not b)
print(a is c)
Output
True
True
Membership Operators
in and not in are the membership operators; used to test whether a value or variable is in a sequence.
in True if value is found in the sequence
not in True if value is not found in the sequence
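# Example values (assumed; chosen to match the output below)
x = 24
y = 20
numbers = [10, 20, 30, 40, 50]
if x not in numbers:
    print("x is NOT present in given list")
else:
    print("x is present in given list")
if y in numbers:
    print("y is present in given list")
else:
    print("y is NOT present in given list")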
Output
x is NOT present in given list
y is present in given list
Precedence and Associativity of Operators
6. (a) Write python program to illustrate variable length keyword arguments (5 Marks)
*args and **kwargs are mostly used in function definitions. *args and **kwargs allow you to pass an unspecified
number of arguments to a function, so when writing the function definition, you do not need to know how many
arguments will be passed to your function. *args is used to send a non-keyworded variable length argument list
to the function. Here’s an example to help you get a clear idea:
def test_var_args(f_arg, *argv):
    print("first normal arg:", f_arg)
    for arg in argv:
        print("another arg through *argv:", arg)
>>> greet_me(name="yasoob")
name = yasoob
So you can see how we handled a keyworded argument list in our function. This is just the basics of **kwargs and you
can see how useful it is. Now let’s talk about how you can use *args and **kwargs to call a function with a list or
dictionary of arguments.
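The definition of greet_me is not shown above, only the call and its output. A minimal sketch of a function that accepts a variable-length keyword argument list and prints the same "name = yasoob" line follows; the function body is an assumption reconstructed from that output.

```python
def greet_me(**kwargs):
    """Print each keyword argument as 'key = value' and return the printed lines."""
    lines = []
    # kwargs is an ordinary dict holding every keyword argument that was passed
    for key, value in kwargs.items():
        line = "{0} = {1}".format(key, value)
        print(line)
        lines.append(line)
    return lines


greeting = greet_me(name="yasoob")
```

Because **kwargs collects the arguments into a dictionary, the same function also accepts several keywords at once, for example greet_me(name="yasoob", language="Python").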
1.3. Using *args and **kwargs to call a function
So here we will see how to call a function using *args and **kwargs. Just consider that you have this little function:
def test_args_kwargs(arg1, arg2, arg3):
    print("arg1:", arg1)
    print("arg2:", arg2)
    print("arg3:", arg3)
Now you can use *args or **kwargs to pass arguments to this little function. Here’s how to do it:
# first with *args
>>> args = ("two", 3, 5)
>>> test_args_kwargs(*args)
arg1: two
arg2: 3
arg3: 5
# Function call
result = search(arr, n, x)
if(result == -1):
print("Element is not present in array")
else:
print("Element is present at index", result)
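The call above relies on a function search(arr, n, x) that is not shown. One plausible reading is a linear search that returns the index of x or -1; the sketch below assumes exactly that, and the sample list is hypothetical.

```python
def search(arr, n, x):
    """Return the index of x among the first n items of arr, or -1 if absent."""
    for i in range(n):
        if arr[i] == x:
            return i
    return -1


# Hypothetical sample data, not from the original text
arr = [2, 3, 4, 10, 40]
x = 10
n = len(arr)

result = search(arr, n, x)
if result == -1:
    print("Element is not present in array")
else:
    print("Element is present at index", result)
```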
swapcase(...)
| S.swapcase() -> string
|
| Return a copy of the string S with uppercase characters
| converted to lowercase and vice versa
strip(...)
| S.strip([chars]) -> string or unicode
|
| Return a copy of the string S with leading and trailing
| whitespace removed.
| If chars is given and not None, remove characters in chars instead.
startswith(...)
| S.startswith(prefix[, start[, end]]) -> bool
|
| Return True if S starts with the specified prefix, False otherwise.
| With optional start, test S beginning at that position.
| With optional end, stop comparing S at that position.
| prefix can also be a tuple of strings to try.
split(...)
| S.split([sep [,maxsplit]]) -> list of strings
|
| Return a list of the words in the string S, using sep as the
| delimiter string. If maxsplit is given, at most maxsplit
| splits are done. If sep is not specified or is None, any
| whitespace string is a separator and empty strings are removed
| from the result.
format(...)
| S.format(*args, **kwargs) -> string
|
| Return a formatted version of S, using substitutions from args and kwargs.
| The substitutions are identified by braces ('{' and '}').
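The help text above describes five string methods. A short sketch applying each of them to one sample string follows; the sample string and variable names are illustrative assumptions.

```python
s = "  Hello World  "          # hypothetical sample string

stripped = s.strip()           # leading and trailing whitespace removed
swapped = stripped.swapcase()  # upper case becomes lower case and vice versa
starts = stripped.startswith("Hello")
words = stripped.split()       # split on whitespace
message = "{} has {} words".format(stripped, len(words))

print(stripped)
print(swapped)
print(starts)
print(words)
print(message)
```

None of these methods changes s itself; strings are immutable, so each call returns a new object.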
slice(stop)
slice(start, stop, step)
Parameters:
start: Starting index where the slicing of object starts.
stop: Ending index where the slicing of object stops.
step: It is an optional argument that determines the increment between each index for slicing.
Return Type: Returns a sliced object containing elements in the given range only.
slice() Constructor
The slice() constructor creates a slice object representing the set of indices specified by range(start, stop, step).
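As a small illustration of the constructor described above, the sketch below builds slice objects and passes them inside square brackets. The sample string and list are assumptions chosen for illustration.

```python
text = "ASTRING"
nums = [10, 20, 30, 40, 50, 60, 70, 80, 90]

first_three = slice(3)         # stop only: indices 0, 1, 2
every_other = slice(1, 5, 2)   # start, stop, step: indices 1 and 3
middle = slice(2, 7)           # start and stop: indices 2 to 6

print(text[first_three])
print(text[every_other])
print(nums[middle])
```

A slice object is reusable, so the same middle slice could be applied to several sequences of the same layout.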
fruit   ' b a n a n a '
index    0 1 2 3 4 5 6
If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the
second index, the slice goes to the end of the string:
>>> fruit = 'banana'
>>> fruit[:3]
'ban'
>>> fruit[3:]
'ana'
If the first index is greater than or equal to the second the result is an empty string, represented by two
quotation marks:
>>> fruit[3:3]
''
Extending indexing
In Python, indexing syntax can be used as a substitute for the slice object. This is an easy and convenient way to
slice a string both syntax wise and execution wise.
Syntax
string[start:end:step]
start, end and step have the same mechanism as slice() constructor.
Example
# String slicing
String ='ASTRING'
Output:
AST
SR
GITA
Reverse String
GNIRTSA
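Only the first line of this example survives above. The sketch below shows index expressions that produce the listed output; the specific start, end and step values are inferred from that output and are an assumption about the original program.

```python
# String slicing
String = 'ASTRING'

first_three = String[:3]              # indices 0, 1, 2
every_second = String[1:5:2]          # indices 1 and 3
backwards_by_two = String[-1:-12:-2]  # from the last character, every second one
reversed_string = String[::-1]        # a step of -1 walks the string backwards

print(first_three)
print(every_second)
print(backwards_by_two)
print("\nReverse String")
print(reversed_string)
```

An out-of-range bound such as -12 is not an error in a slice; Python clips it to the start of the string.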
(b) nums = [10, 20, 30, 40, 50, 60, 70, 80, 90]
Write the output (i)nums[2:7] (ii) nums[:5] (iii) nums[-3:] (5 Marks)
(i) [30, 40, 50, 60, 70]
(ii) [10, 20, 30, 40, 50]
(iii) [70, 80, 90]
9. Write a python program using object oriented programming to demonstrate encapsulation, overloading
and inheritance (10 Marks)
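class Base:
    def __init__(self):
        self.a = 10    # public attribute
        self._b = 20   # leading underscore marks it as internal (encapsulation by convention)
    def display(self):
        print(" the values are :")
        print(f"a={self.a} b={self._b}")
# The Derived class definition is missing from the source; this header and
# __init__ are reconstructed, and the value of d is an assumption.
class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        self.d = 30
    def display(self):
        # redefines display() from Base and reuses the parent version
        Base.display(self)
        print(f"d={self.d}")
obj1 = Base()
obj2 = Derived()
obj3 = Derived()
obj2.display()
obj3.display()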
10. Write Python program to count words and store in dictionary for the given input text
Input Text : the clown ran after the car and the car ran into the tent and the tent fell down on the clown
and the car
Output : word count : {'and': 3, 'on': 1, 'ran': 2, 'car': 3, 'into': 1, 'after': 1, 'clown': 2, 'down': 1, 'fell': 1,
'the': 7, 'tent': 2}
Method 1
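counts = dict()
line = input('Enter a line of text:')
words = line.split()
print('Words:', words)
print('Counting...')
# add one to the count of each word, starting from 0 for a new word
for word in words:
    counts[word] = counts.get(word, 0) + 1
print('word count :', counts)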
Method 2
def word_count(text):
    counts = dict()
    words = text.split()
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts
#Driver Code
print(word_count('the clown ran after the car and the car ran into the tent and the tent fell down on the clown and the car'))
Output:
----------
S.No.    List    Array
1    A list is used to collect items that usually consist of elements of multiple data types.    An array is also a vital component that collects several items of the same data type.
2    A list cannot manage arithmetic operations.    An array can manage arithmetic operations.
4    When it comes to flexibility, the list is perfect as it allows easy modification of data.    When it comes to flexibility, the array is not suitable as it does not allow easy modification of data.
6    In a list, the complete list can be accessed without any specific looping.    In an array, a loop is mandatory to access the components of the array.
Output
[[ 1 23 78]
[98 60 75]
[79 25 48]]
1
98
Mean
Mean is the sum of the elements divided by the number of elements and is given by the following formula:
mean = (x_1 + x_2 + ... + x_n) / n
It calculates the mean by adding all the items of the arrays and then divides it by the number of
elements. We can also mention the axis along which the mean can be calculated.
import numpy as np
a = np.array([5,6,7])
print(a)
print(np.mean(a))
Output
[5 6 7]
6.0
Median
Median is the middle element of the array. The formula differs for odd and even sets.
It can calculate the median for both one-dimensional and multi-dimensional arrays. Median
separates the higher and lower range of data values.
import numpy as np
a = np.array([5,6,7])
print(a)
print(np.median(a))
Output
[5 6 7]
6.0
Standard Deviation
Standard deviation is the square root of the average of the squared deviations from the mean. The formula
for standard deviation is:
std = sqrt(((x_1 - mean)^2 + (x_2 - mean)^2 + ... + (x_n - mean)^2) / n)
import numpy as np
a = np.array([5,6,7])
print(a)
print(np.std(a))
Output
[5 6 7]
0.816496580927726
Variance
Variance is the average of the squared deviations from the mean. Following is the formula for the same:
var = ((x_1 - mean)^2 + (x_2 - mean)^2 + ... + (x_n - mean)^2) / n
import numpy as np
a = np.array([5,6,7])
print(a)
print(np.var(a))
Output
[5 6 7]
0.6666666666666666
Output
[ 2 10 20]
3.6
2 Write python code to interact with database and perform the following task:
i) Create table
ii) Insert 3 records into the table
iii) Display all records
SQL-based relational databases are widely used to store data, e.g. SQL Server, PostgreSQL,
MySQL, etc. Many alternative databases have also become quite popular.
The choice of database usually depends on the performance, data integrity and scalability needs
of the application.
Loading data from SQL into a DataFrame is straightforward. pandas has some functions to simplify
the process.
In this example, we create a SQLite database using Python's built in sqlite3 driver.
Most SQL Drivers (PyODBC, psycopg2, MySQLdb, pymssql, etc.) return a list of tuples when
selecting data from table. We can use these list of tuples for the DataFrame, but the column
names are present in the cursor's 'description' attribute.
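The code for this answer did not survive. The sketch below covers the three tasks using the built-in sqlite3 driver; the table name, column names and the three records are hypothetical, and an in-memory database is used so that no file is left behind.

```python
import sqlite3

# In-memory database; pass a file name instead to keep the data on disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# i) Create table (hypothetical schema)
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, marks REAL)")

# ii) Insert 3 records (hypothetical data)
records = [(1, "Asha", 81.5), (2, "Ravi", 74.0), (3, "Meena", 92.25)]
cur.executemany("INSERT INTO student VALUES (?, ?, ?)", records)
conn.commit()

# iii) Display all records; fetchall() returns a list of tuples
rows = cur.execute("SELECT * FROM student ORDER BY id").fetchall()
for row in rows:
    print(row)

# Column names are available from the cursor's description attribute
columns = [item[0] for item in cur.description]
print(columns)

conn.close()
```

With pandas installed, the list of tuples and the column names can be passed straight to pd.DataFrame(rows, columns=columns).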
3(a) Write a Python program to write and read the contents of text file.
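No answer is recorded for this question. A minimal sketch is given here, assuming a hypothetical file name in the system's temporary directory; the sample lines are illustrative only.

```python
import os
import tempfile

# Hypothetical file location; any writable path would do
path = os.path.join(tempfile.gettempdir(), "sample_notes.txt")

lines = ["Python is easy to learn.\n", "The with statement closes the file.\n"]

# Write: mode "w" creates the file or overwrites an existing one
with open(path, "w") as f:
    f.writelines(lines)

# Read: mode "r" opens the file for reading
with open(path, "r") as f:
    content = f.read()

print(content)

# Remove the temporary file again
os.remove(path)
```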
3(b) Write a Python program to sort integer elements using Bubble sort
def bubbleSort(array):
    for i in range(len(array)):
        # loop to compare adjacent array elements
        for j in range(0, len(array) - i - 1):
            # compare two adjacent elements; change > to < to sort in descending order
            if array[j] > array[j + 1]:
                # swap the elements if they are in the wrong order
                temp = array[j]
                array[j] = array[j + 1]
                array[j + 1] = temp
## Main code
x = [-2, 45, 0, 11, -9]
bubbleSort(x)
print('Sorted Array in Ascending Order:')
print(x)
Output
Sorted Array in Ascending Order:
[-9, -2, 0, 11, 45]
5 Discuss any five methods to handle the missing data with python code
## method -1 - Deleting the columns with missing data
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(7,3))
df.iloc[:4, 1] = np.nan
df.iloc[:2, 2] = np.nan
df
# Deleting the columns with missing data
df.dropna(axis=1)
## method -2 - Deleting the rows with missing data
df = pd.DataFrame(np.random.randn(7,3))
df.iloc[:4, 1] = np.nan
df.iloc[:2, 2] =np.nan
df
# Deleting the rows with missing data
df.dropna(axis=0)
## Filling the missing data with a constant value
df = pd.DataFrame(np.random.randn(7,3))
df.iloc[:4, 1] = np.nan
df.iloc[:2, 2] = np.nan
df
df.fillna(0, inplace=True)
## Threshold -keyword
df = pd.DataFrame(np.random.randn(7,3))
df.iloc[:4, 1] = np.nan
df.iloc[:2, 2] =np.nan
df
df.dropna(thresh=2)
## method -3 - Filling the missing data with a value - Imputation - mean
df = pd.DataFrame(np.random.randn(7,3))
df.iloc[:4, 1] = np.nan
df.iloc[:2, 2] =np.nan
df
df.fillna(df.mean(), inplace=True)
## method -3 - Filling the missing data with a value - Imputation - median
df = pd.DataFrame(np.random.randn(7,3))
df.iloc[:4, 1] = np.nan
df.iloc[:2, 2] = np.nan
df.info()
df.fillna(df.median(), inplace=True)
3. (a): Import the dataset
a. Setting current working directory
b. Import the dataset
5: Categorical Data
ML models are based on mathematical equations.
Country - France, Germany, Spain
Purchased - Yes, No
6.Splitting the dataset into train and test set
7 (a) Write a Python statement to remove duplicate rows and null values from a DataFrame
An important part of data analysis is finding duplicate values and removing them.
The pandas drop_duplicates() method removes duplicate rows from a DataFrame.
keep: controls which of the duplicated rows, if any, is kept.
If ‘first’, it considers the first value as unique and the rest of the same values as duplicates.
If ‘last’, it considers the last value as unique and the rest of the same values as duplicates.
If False, it considers all of the same values as duplicates.
inplace: Boolean value; if True, the duplicate rows are removed from the DataFrame itself instead of returning a new one.
Return type: DataFrame with duplicate rows removed, depending on the arguments passed.
Python’s pandas library provides a function to remove rows or columns from a dataframe which
contain missing values or NaN i.e.
DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
Arguments :
axis:
0 , to drop rows with missing values
1 , to drop columns with missing values
how:
‘any’ : drop if any NaN / missing value is present
‘all’ : drop if all the values are missing / NaN
thresh: the minimum number of non-NaN values a row or column must have to be kept
inplace: If True, the changes are made in the DataFrame itself
It removes rows or columns (based on arguments) with missing values / NaN
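The two methods described above can be sketched on a small DataFrame as follows. The column names and values are hypothetical; only drop_duplicates() and dropna() with the arguments listed above are used.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one duplicated row and one missing value
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Ravi", "Meena"],
    "marks": [81.0, 74.0, 74.0, np.nan],
})

# Remove duplicate rows, keeping the first occurrence
no_duplicates = df.drop_duplicates(keep="first")

# Remove rows that contain any missing value
no_nulls = df.dropna(axis=0, how="any")

# Both steps chained; df itself is unchanged because inplace defaults to False
cleaned = df.drop_duplicates().dropna()

print(cleaned)
```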
7 (b) Discuss loc and iloc functions with an example
loc is label-based, which means that you have to specify rows and columns based on their
row and column labels.
iloc is integer position-based, so you have to specify rows and columns by their integer
position values (0-based integer position).
Example :
i) Selecting via a single value : Both loc and iloc allow input to be a single value.
Syntax for data selection:
loc[row_label, column_label]
iloc[row_position, column_position]
For example, let’s say we would like to retrieve Friday’s temperature value. With loc, we can pass
the row label 'Fri' and the column label 'Temperature'.
# To get Friday's temperature using loc
df.loc['Fri', 'Temperature']
Output:
10.51
The equivalent iloc statement should take the row number 4 and the column number 1 .
# To get Friday's temperature using iloc
df.iloc[4, 1]
Output:
10.51
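# All rows of the Temperature column using loc (statement reconstructed from the output)
df.loc[:, 'Temperature']
Output:
Day
Mon 12.79
Tue 19.67
Wed 17.51
Thu 14.44
Fri 10.51
Sat 11.07
Sun 17.50
Name: Temperature, dtype: float64
# All columns of the row labelled 'Fri' using loc (statement reconstructed from the output)
df.loc['Fri', :]
Output:
Weather Shower
Temperature 10.51
Wind 26
Humidity 79
Name: Fri, dtype: object
# Multiple rows of one column using loc (statement reconstructed from the output)
df.loc[['Thu', 'Fri'], 'Temperature']
Output:
Day
Thu 14.44
Fri 10.51
Name: Temperature, dtype: float64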
Similarly, a list of integer values can be passed to iloc to select multiple rows or columns. Here are
the equivalent statements using iloc:
>>> df.iloc[[3, 4], 1]
Output:
Day
Thu 14.44
Fri 10.51
Name: Temperature, dtype: float64
All the above outputs are Series because their results are 1-dimensional data.
4) The output will be a DataFrame when the result is 2-dimensional data:
# Multiple rows and columns using loc
rows = ['Thu', 'Fri']
cols=['Temperature','Wind']
df.loc[rows, cols]
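The DataFrame used in these examples is not shown. The sketch below rebuilds one so the statements can be run: the Temperature column and Friday's row come from the outputs above, while every other Weather, Wind and Humidity entry is a hypothetical placeholder.

```python
import pandas as pd

# Temperature values and the 'Fri' row follow the outputs above;
# the remaining Weather, Wind and Humidity values are placeholders.
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Cloudy", "Cloudy", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
        "Wind": [13, 28, 16, 11, 26, 27, 20],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# Single value: label-based versus position-based
by_label = df.loc["Fri", "Temperature"]
by_position = df.iloc[4, 1]

# Multiple rows and columns: the result is a DataFrame
rows = ["Thu", "Fri"]
cols = ["Temperature", "Wind"]
frame_loc = df.loc[rows, cols]
frame_iloc = df.iloc[[3, 4], [1, 2]]

print(by_label, by_position)
print(frame_loc)
```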
8 Discuss the different types of joins that can be used with Pandas tools to implement a wide array
of functionality, with an example.
The pd.merge() function implements a number of types of joins:
the one-to-one,
many-to-one, and
many-to-many joins.
These three types of joins can be used with other Pandas tools to implement a wide array of
functionality.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
'hire_date': [2004, 2008, 2012, 2014]})
display('df1', df1 , 'df2', df2 )
#One-to-one joins
df3 = pd.merge(df1, df2)
df3
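Only the one-to-one join is shown above. The sketch below repeats it and adds many-to-one and many-to-many joins; df4 and df5, including the supervisor and skills values, are hypothetical sample data introduced for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# One-to-one join: each employee appears once in both frames
df3 = pd.merge(df1, df2)

# Many-to-one join: several employees share one group (hypothetical df4)
df4 = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],
                    'supervisor': ['Carly', 'Guido', 'Steve']})
many_to_one = pd.merge(df3, df4)

# Many-to-many join: the key repeats in both frames (hypothetical df5)
df5 = pd.DataFrame({'group': ['Accounting', 'Accounting',
                              'Engineering', 'Engineering', 'HR', 'HR'],
                    'skills': ['math', 'spreadsheets', 'coding', 'linux',
                               'spreadsheets', 'organization']})
many_to_many = pd.merge(df1, df5)

print(many_to_one)
print(many_to_many)
```

In each case pd.merge() joins on the column name the two frames have in common, 'employee' or 'group' here.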
9 What are the different ways a DataFrame can be created in python? Explain with
an example
df1 = pd.DataFrame(dictionary_of_lists)
df1
4. Create pandas DataFrame from list of dictionaries
Each dictionary represents one row and the keys are the columns names.
# Import pandas library
import pandas as pd
# Create a list of dictionaries
list_of_dictionaries = [
{'Name': 'Emma', 'Age': 29, 'Department': 'HR'},
{'Name': 'Oliver', 'Age': 25, 'Department': 'Finance'},
{'Name': 'Harry', 'Age': 33, 'Department': 'Marketing'},
{'Name': 'Sophia', 'Age': 24, 'Department': 'IT'}]
# Create the DataFrame
df4 = pd.DataFrame(list_of_dictionaries)
df4
5. Create pandas Dataframe from dictionary of pandas Series
The dictionary keys represent the columns names and each Series represents a column contents.
# Import pandas library
import pandas as pd
# Create Series
series1 = pd.Series(['Emma', 'Oliver', 'Harry', 'Sophia'])
series2 = pd.Series([29, 25, 33, 24])
series3 = pd.Series(['HR', 'Finance', 'Marketing', 'IT'])
# Create a dictionary of Series
dictionary_of_series = {'Name': series1, 'Age': series2, 'Department': series3}
# Create the DataFrame
df5 = pd.DataFrame(dictionary_of_series)
df5
o/p:
###Sorting
# importing Numpy package
import numpy as np
a = np.array([[1,4],[3,1]])
print("sorted array : ",np.sort(a)) # sort along the last axis
print("\n sorted flattened array:", np.sort(a, axis=None)) # axis=None sorts the flattened array
x = np.array([3, 1, 2])
print("\n sorting complex number :", np.sort_complex([5, 3, 6, 2, 1]))
Output:
sorted array : [[1 4]
[1 3]]
# where( ) : search an array for a certain value, and return the indexes that get a match.
Output:
(array([3, 5, 6], dtype=int64),)
0
[1 2 3]
###Splitting
# np.split: Split an array into multiple sub-arrays.
x = np.arange(9.0)
print(x)
# With a number of partitions N, the array is divided into N equal arrays along the axis.
# If such a split is not possible, an error is raised.
print(np.split(x, 3))
# With a list of indices, the array is split at those positions.
print(np.split(x, [3, 5, 6, 10]))
x = np.arange(9)
np.array_split(x, 4)
#Split an array into multiple sub-arrays of equal or near-equal size. Does not raise an exception if
an equal division cannot be made.
a = np.array([[1, 3, 5, 7, 9, 11],
[2, 4, 6, 8, 10, 12]])
# horizontal splitting
print("Splitting along horizontal axis into 2 parts:\n", np.hsplit(a, 2))
# vertical splitting
print("\nSplitting along vertical axis into 2 parts:\n", np.vsplit(a, 2))
Output:
[0. 1. 2. 3. 4. 5. 6. 7. 8.]
[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7., 8.])]
[array([0., 1., 2.]), array([3., 4.]), array([5.]), array([6., 7., 8.]), array([], dtype=float64)]
Splitting along horizontal axis into 2 parts:
[array([[1, 3, 5],
[2, 4, 6]]), array([[ 7, 9, 11],
[ 8, 10, 12]])]
plt.scatter(cars_data['Age'],cars_data['Price'],c='red')
plt.title('Scatter plot of Price vs Age of the cars')
plt.xlabel('Age(months)')
plt.ylabel('Price(Euros)')
plt.show()
## Histogram
plt.hist(cars_data['KM'])
Out[3]:
(array([ 92., 239., 331., 222., 111., 51., 25., 13., 10., 2.]),
array([1.000000e+00, 2.430090e+04, 4.860080e+04, 7.290070e+04,
9.720060e+04, 1.215005e+05, 1.458004e+05, 1.701003e+05,
1.944002e+05, 2.187001e+05, 2.430000e+05]),
<BarContainer object of 10 artists>)
2 What do you mean by Normalization and Standardization? Write any five differences
between them
Feature scaling is one of the most important data preprocessing steps in machine learning. Algorithms that
compute the distance between the features are biased towards numerically larger values if the data is not scaled.
Tree-based algorithms are fairly insensitive to the scale of the features. Also, feature scaling helps machine
learning, and deep learning algorithms train and converge faster.
There are some feature scaling techniques such as Normalization and Standardization that are the most popular
and at the same time, the most confusing ones.
Let’s resolve that confusion.
Normalization or Min-Max Scaling is used to transform features to be on a similar scale. The new point is
calculated as:
X_new = (X - X_min)/(X_max - X_min)
This scales the range to [0, 1] or sometimes [-1, 1]. Geometrically speaking, the transformation squishes the
n-dimensional data into an n-dimensional unit hypercube. Normalization is useful when there are no outliers, as it
cannot cope with them. Usually, we would scale age and not incomes, because only a few people have high
incomes but age is close to uniform.
Standardization or Z-Score Normalization is the transformation of features by subtracting the mean and
dividing by the standard deviation. The result is often called the Z-score.
X_new = (X - mean)/Std
Standardization can be helpful in cases where the data follows a Gaussian distribution. However, this does not
have to be necessarily true. Geometrically speaking, it translates the data so that the mean vector of the original
data moves to the origin, and squishes or expands the points so that the standard deviation becomes 1. We are only
changing the mean and standard deviation, so the shape of the distribution is not affected: normally distributed
data becomes standard normal.
Standardization is much less affected by outliers because there is no predefined range of transformed features.
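The two formulas above can be applied directly with NumPy. The sketch below uses a hypothetical one-dimensional feature; np.std with its default settings divides by n, which matches the population standard deviation.

```python
import numpy as np

# Hypothetical feature values
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Normalization (min-max scaling): X_new = (X - X_min) / (X_max - X_min)
X_norm = (X - X.min()) / (X.max() - X.min())

# Standardization (z-score): X_new = (X - mean) / Std
X_std = (X - X.mean()) / X.std()

print(X_norm)
print(X_std)
```

After normalization the values lie in [0, 1]; after standardization they have zero mean and unit standard deviation but no fixed range.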
Difference between Normalization and Standardization
S.NO.    Normalization    Standardization
1.    Minimum and maximum values of the features are used for scaling.    Mean and standard deviation are used for scaling.
2.    It is used when features are of different scales.    It is used when we want to ensure zero mean and unit standard deviation.
3.    Scales values between [0, 1] or [-1, 1].    It is not bounded to a certain range.
4.    It is really affected by outliers.    It is much less affected by outliers.
6.    This transformation squishes the n-dimensional data into an n-dimensional unit hypercube.    It translates the data so that the mean vector of the original data is at the origin and squishes or expands.
7.    It is useful when we don’t know about the distribution.    It is useful when the feature distribution is Normal or Gaussian.
i)Importance of ‘self’
• An explicit reference to the current object, i.e. the object which invoked the method
• Used to create and initialize instance variables, i.e. it creates the attributes of the
object
• ‘self’ must be the first parameter of every instance method of a class; methods that do
not take the instance are declared as static methods or class methods instead
• Moreover, “self” is not a keyword and has no special meaning in Python. We can use any
name in that place. However, it is recommended not to use any name other than “self”
(merely a convention and for readability)
5 Explain the terms histogram, binning and density.
6 Explain with an example “The GroupBy object “-- aggregate, filter, transform, and apply
The GroupBy object is a very flexible abstraction. In many ways, you can simply treat it as if it's a
collection of DataFrames, and it does the difficult things under the hood. Let's see some examples
using the Planets data.
The preceding discussion focused on aggregation for the combine operation, but there are more
options available. In particular, GroupBy objects have aggregate(), filter(), transform(),
and apply() methods that efficiently implement a variety of useful operations before combining
the grouped data.
For the purpose of the following subsections, we'll use this DataFrame:
Code
rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
'data1': range(6),
'data2': rng.randint(0, 10, 6)},
columns = ['key', 'data1', 'data2'])
df
Aggregation
The aggregate() method allows for even more flexibility. It can take a string, a function, or a list thereof, and compute
all the aggregates at once. Here is a quick example combining all these:
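The example itself is missing here. A sketch on the DataFrame defined above is given below; the aggregate names are passed as strings, and the transform step (centering each group on its mean) is an illustrative choice, not taken from the original.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])

# aggregate(): several aggregates computed at once for every column
aggregated = df.groupby('key').aggregate(['min', 'median', 'max'])

# transform(): returns data of the same shape as the input,
# here each value minus the mean of its own group
centered = df.groupby('key')[['data1', 'data2']].transform(lambda x: x - x.mean())

print(aggregated)
print(centered)
```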
7 When to use Static and Instance methods? Explain with an example
• Instance (regular) methods take the instance (self) as the first argument; when the method
is invoked on an instance (bound), self is passed to the method automatically.
• Static methods are functions which do not require an instance but are part of the class definition.
• Static methods are useful when the method does not need access to either the class variables or
the instance variables.
• Instance methods are useful when the method needs access to values that are specific to the
instance or needs to call other methods that have access to instance-specific values.
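The question asks for an example, which is missing above. A minimal sketch follows; the Temperature class and its method names are hypothetical and serve only to contrast the two kinds of method.

```python
class Temperature:
    """Hypothetical class contrasting instance and static methods."""

    def __init__(self, celsius):
        self.celsius = celsius

    def to_fahrenheit(self):
        # Instance method: uses data stored on this particular object
        return Temperature.c_to_f(self.celsius)

    @staticmethod
    def c_to_f(celsius):
        # Static method: a helper that needs neither self nor the class
        return celsius * 9 / 5 + 32


reading = Temperature(25)
from_instance = reading.to_fahrenheit()   # needs an object
from_static = Temperature.c_to_f(100)     # callable on the class itself

print(from_instance)
print(from_static)
```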
A pivot table is a similar operation that is commonly seen in spreadsheets and other programs that
operate on tabular data.
The pivot table takes simple column-wise data as input, and groups the entries into a two-
dimensional table that provides a multidimensional summarization of the data.
The difference between pivot tables and GroupBy: pivot tables are essentially
a multidimensional version of GroupBy aggregation. That is, you split-apply-combine, but both
the split and the combine happen across not a one-dimensional index, but across a
two-dimensional grid.
Pivot Table Syntax
Here is the equivalent to the preceding operation using the pivot_table method of DataFrames:
This is eminently more readable than the groupby approach, and produces the same result. As you
might expect of an early 20th-century transatlantic cruise, the survival gradient favors both
women and higher classes. First-class women survived with near certainty (hi, Rose!), while only
one in ten third-class men survived (sorry, Jack!).
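As a rough sketch of the comparison described above, the snippet below computes the same survival-rate table twice, once with groupby and once with pivot_table. The passengers table is a tiny made-up stand-in, not the actual Titanic data, so its numbers do not match the figures quoted in the text.

```python
import pandas as pd

# Made-up stand-in data (assumption): NOT the real Titanic dataset.
passengers = pd.DataFrame({
    'sex':      ['female', 'female', 'male', 'male', 'female', 'male'],
    'class':    ['First', 'Third', 'First', 'Third', 'First', 'Third'],
    'survived': [1, 0, 1, 0, 1, 0],
})

# GroupBy version: split on two keys, take the mean, then unstack
# the second key into columns.
by_group = passengers.groupby(['sex', 'class'])['survived'].mean().unstack()

# pivot_table version: rows and columns are named directly;
# the default aggregation is the mean.
by_pivot = passengers.pivot_table(values='survived', index='sex', columns='class')

print(by_group)
print(by_pivot)
```

Both calls produce a table with one row per sex and one column per class.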
Multi-level pivot tables
Just as in the GroupBy, the grouping in pivot tables can be specified with multiple levels, and via
a number of options. For example, we might be interested in looking at age as a third dimension.
We'll bin the age using the pd.cut function:
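A minimal sketch of this step follows, again on a small made-up table rather than the real Titanic data. The bin edges 0, 18 and 80 are an assumption chosen to separate children from adults; pd.cut turns each age into the interval it falls in, and that binned Series is then used as a second row level in pivot_table.

```python
import pandas as pd

# Made-up stand-in data (assumption): NOT the real Titanic dataset.
passengers = pd.DataFrame({
    'sex':      ['female', 'female', 'female', 'female',
                 'male', 'male', 'male', 'male'],
    'class':    ['First', 'Third', 'First', 'Third',
                 'First', 'Third', 'First', 'Third'],
    'age':      [10, 12, 30, 40, 8, 15, 35, 50],
    'survived': [1, 1, 1, 0, 1, 0, 0, 0],
})

# Bin ages into (0, 18] and (18, 80]; the edges are illustrative.
age_group = pd.cut(passengers['age'], [0, 18, 80])

# Rows are grouped by sex and then by age bin; columns by class.
table = passengers.pivot_table(values='survived',
                               index=['sex', age_group],
                               columns='class')
print(table)
```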
10 Write a Pandas program to create a data frame with the test data, split the data frame by school code, and get the
mean, min, and max values of i) age and ii) weight for each school.
Test Data:
      school  class  name     age  height  weight
S1    s001    V      Ram      12   173     35
S2    s002    V      Kiran    12   192     32
S3    s003    VI     Ryan     13   186     33
S4    s001    VI     Bhim     13   167     30
S5    s002    VI     Sita     14   151     31
S6    s004    V      Bhavana  12   159     32
Solution
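The worked solution is not reproduced in these notes; one possible sketch is given below. It builds the data frame from the test data (using S1 to S6 as the row index), groups by the school code, and asks for the mean, min and max of age and weight in a single agg() call. The variable names are illustrative.

```python
import pandas as pd

# Build the data frame from the test data in the question.
student_data = pd.DataFrame({
    'school': ['s001', 's002', 's003', 's001', 's002', 's004'],
    'class':  ['V', 'V', 'VI', 'VI', 'VI', 'V'],
    'name':   ['Ram', 'Kiran', 'Ryan', 'Bhim', 'Sita', 'Bhavana'],
    'age':    [12, 12, 13, 13, 14, 12],
    'height': [173, 192, 186, 167, 151, 159],
    'weight': [35, 32, 33, 30, 31, 32],
}, index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])

# Split by school code, then compute mean, min and max of age and weight.
summary = student_data.groupby('school')[['age', 'weight']].agg(['mean', 'min', 'max'])
print(summary)
```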
Python Revision Questions
1. Write the features of Python. Give the advantages & disadvantages of it.
2. Write a Python program to check whether a given list contains even numbers and print the result.
3. Discuss the Looping Statements Syntax and example.
4. Write any three differences between a list, a tuple and a dictionary.
5. Illustrate args and kwargs parameters in the Python programming language with an example.
6. How do you create a dictionary in Python? Explain five methods, each with a brief description and an example.
7. Explain any five string functions with syntax and example.
8. How do you create constructors and implement inheritance and operator overloading in Python?
Demonstrate with an example.
9. Discuss the usage of the following with respect to the print() function i) sep argument ii) end argument
iii) .format(arguments)
10. Differentiate between
i). pop and remove in List
ii). Aliasing & Cloning in List
iii). append () and insert () methods of list.
11. Write a python program to find the roots of a quadratic equation.
12. How do you implement call by reference or call by value in python? Explain with an example
13. Lists are heterogeneous. Support the statement with an example.
14. Write a Python program to reverse the words in a given string.
15. Write a Python program to read the contents of a text file and count the number of words in it.
16. Calculate Area of Rectangle and Triangle using Single Inheritance.
17. Write the features of Python. Give the advantages & disadvantages of it.
18. Discuss the Looping Statements with an example
19. Write a Python program that asks the user to enter a number and checks whether the entered number is
a prime number.
20. Differentiate between List and Tuple.
21. Discuss any five string functions with examples
22. Write a python program using object-oriented programming to demonstrate encapsulation, overloading
and inheritance
23. Write a program to perform sort elements using Insertion Sort.
24. Discuss any five string functions with examples. Write a Python program to swap cases of a given
string
25. Mention the key applications of Python. Why is Python called an Interpreted language?
26. Explain break, continue and pass statements with examples.
27. Write a Python program to print all prime numbers between a given range.
28. Create a student class and initialize it with name and roll number. Design methods to:
i. Display: to display all information of the student
ii. setAge: to assign the age of the student
iii. setMarks: to assign the marks of the student
29. Use the concept of inheritance to find the total marks for the student.
30. Differentiate between a List, Tuple and Dictionary.
31. Explain variable-length arguments with examples.
32. Consider the following two sets and write the Python code to perform the following operations on them.
a. Lows = {0,1,2,3,4} Odds = {1,3,5,7,9}
i) Union ii) Difference iii) Symmetric difference iv) Intersection
33. Write a Python program to read the contents of a text file and write them into another.
34. What are the Steps in Visualization? Explain with an example
35. Explain the different types of graphs with syntax and examples in matplotlib.
36. Explain the process of Binning in Histograms with examples.
37. Differentiate between Bar graph and Histogram in matplotlib.
38. What do you mean by Normalization and Standardization? Write any five differences between them
39. Explain reshaping and pivoting in Pandas with examples.
40. Discuss Data Transformation techniques in Pandas.
41. Discuss any five string manipulation operations in Pandas.
42. What are the different ways to read and write in text files using Pandas. Give methods and examples.
43. Write python code to interact with database and perform the following task
a. Create table
b. Insert 3 records into the table
c. Display all records
44. Give example to load a database table into a dataframe and find the number of missing values