Unit1 Python

The document outlines a course on Data Science, detailing its outcomes and modules, including an introduction to Python programming. It covers various programming approaches, Python features, basic elements, and operators, along with input-output functions. The course aims to equip students with skills in data collection, management, and real-world applications of data science concepts.

Uploaded by

janviiiiiii0046

Introduction to Data Science

(114AT01)
Course Outcomes :
CO1 To explain how data is collected, managed and stored for data science

CO2 To understand the key concepts in data science, including their real-world applications and the toolkit used by data scientists

CO3 To implement data collection and management scripts using MongoDB


Module 1 :
Data Science Overview

Contents:
● Introduction to Data Science
● Different Sectors using Data Science
● Purpose and Components of Python in Data Science

Module 1 (Sub Topic):
Components of Python

Python Basics

Contents:
● Different Programming Approaches
● Features of Python
● Basic elements of Python
● Basics of Python
○ Data types
○ Variables
○ Expressions
○ Objects and Functions
● Python Data Structures along with operations
○ String
○ Array
○ List
○ Tuple
○ Set
○ Dictionary
Different Programming Approaches :
● Procedure-oriented programming is a programming paradigm that follows a
step-by-step approach, breaking a task down into a collection of variables and
routines (functions).

C Style Coding: focuses more on the function aspect

//C Program to add two numbers
#include<stdio.h>
int main(void)
{
    int a,b;
    a=b=10;
    printf("Sum = %d",a+b);
    return 0;
}
Different Programming Approaches : (...)
● Object-oriented programming is a programming paradigm built on the concept of
objects that contain both data and the code that modifies the data.
○ Mimics many real-world attributes of objects.

Java Style Coding: a class is mandatory

//Java Program to add two numbers
public class MyClass {
    public static void main(String args[])
    {
        int a,b;
        a=b=10;
        System.out.println("Sum = " +(a+b));
    }
}
Python :
● Python = C Programming Feature + Java Programming Feature
○ Python is a general-purpose, interpreted, interactive, object-oriented and high-level
programming language.
○ Created by Guido van Rossum at Centrum Wiskunde & Informatica (CWI), the national
research institute for mathematics and computer science in the Netherlands, and first
released in 1991.
○ Multipurpose programming language - used, among other areas, in Artificial Intelligence
and Machine Learning.
○ Python is used to build applications, e.g. YouTube, Instagram, Spotify etc.

Python Style Coding: combined features of C and Java

#Python program to add two numbers
a=b=10
print("Sum = ",(a+b))
Features of Python :
1. Simple - clear and easily understandable syntax
2. Easy to learn - very few keywords and resembles C
3. Open source
4. High level language
5. Dynamically typed - no declarations of variables needed
6. Platform independent, portable and scalable
7. Procedure-oriented and object-oriented
8. Interpreted - Python byte code is interpreted by the interpreter in the Python Virtual Machine (PVM)
9. Huge library
10. Embeddable - Python can be inserted into C/C++ programs
11. Scripting language and database connectivity
Features of Python: (...)
12. Extensible - programs written in other languages can be integrated with Python:
Cython (C code, executed with the PVM), Jython (Java code, executed in the JVM), IronPython (.NET
programs integrated into Python and executed using the Common Language
Runtime)

13. Batteries included - a large standard library, complemented by widely used packages such as:
a. NumPy : for processing single and multidimensional arrays
b. Pandas : powerful data structures for data analysis & manipulation
c. Matplotlib : for drawing 2D graphs and other plots
d. BeautifulSoup : for parsing HTML and XML
e. SciPy : for scientific and technical computing
f. scikit-learn : for machine learning

Basic Elements of Python:
● Objects
○ Python is an object oriented programming language.
○ Almost everything in Python is an object, with its properties and methods.
○ A Class is like an object constructor, or a "blueprint" for creating objects.
● Comments
● Variables, Constants, Identifiers, Keywords
● Operators
● Expression
● Statements
● Indentation
● Function and Block Suites
Comments:
● Comments are “non-executable” statements
● They are written for human readers; the compiler and the Python
Virtual Machine (PVM) ignore them

● Two types of Comments:
○ Single Line Comments (Supported in Python)
■ Start with the symbol #

Example of Single Line Comment

Comments: (...)
● Two types of Comments: (...)
○ Multiline Comments (Not Supported in Python)
■ Text written inside """comment text""" or '''comment text''' is a regular
string that spans multiple lines
■ Such strings are internally allocated memory
■ If such a string is not assigned to any variable, it is removed by the garbage
collector
■ It therefore works as a multiline comment, because it has no effect on the program
■ If such a string is written as the first statement in a module, function, class or method,
it is called a “Documentation String” or docstring.
■ Docstrings are used to create Application Programming Interface (API) documentation
from a Python program

Example of Docstring treated as MultiLine comments

Example of Docstring
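
The slide images for the comment examples did not survive extraction. A minimal sketch of the three cases described above follows; the function `area_of_square` is a hypothetical name chosen only for illustration.

```python
# A single-line comment starts with '#' and runs to the end of the line.

def area_of_square(side):
    """Return the area of a square whose edge length is `side`."""
    return side * side  # a comment can also follow a statement

"""
This triple-quoted string is not assigned to any variable,
so it has no effect and is often used like a multiline comment.
"""

# The docstring is stored on the function object and is used by help().
print(area_of_square.__doc__)
print(area_of_square(4))
```

Only the string placed as the first statement of the function becomes its docstring; the free-standing string is simply discarded.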

Variables:
● Variables are containers for storing data values
● Programming languages like C, C++ and Java are statically typed

//Variables in C
#include<stdio.h>
int main(void)
{
    int a=5,b;
    b=a;    /* the value 5 is copied into b */
    printf("a = %d\n",a);
    printf("b = %d\n",b);
    return 0;
}

Memory allocation in RAM: a and b occupy separate memory locations, each holding its own copy of 5.
Variables: (...)
● Python is dynamically typed
● Variables do not need to be declared with any particular type
● A variable is created the moment a value is first assigned to it
● A variable can even change type after it has been set
● So, in Python, a variable is seen as a tag (or name) that is tied to some
value
○ e.g. a=5 means the value 5 is created first in memory and then the
tag name ‘a’ is attached to it

Variables: (...)
#Variables in Python

a=5
b=a
print(a)
print(b)
a="Hello"
print(a)

Memory allocation in RAM:
● After a=5 and b=a, the tags a and b both refer to the same value 5.
Only one memory location is referenced by two tags. Therefore, Python is memory efficient.
● After a="Hello", the tag a refers to the new value "Hello", while b still refers to 5.

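
The tag model above can be checked with the `is` operator. This is a small sketch using the same values as the slide; the names `same_before` and `same_after` are added only for illustration.

```python
a = 5
b = a              # b becomes a second tag for the same object
same_before = a is b

a = "Hello"        # a is re-tagged; b is left untouched
same_after = a is b

print(same_before, same_after, b)
```

Re-assigning `a` does not change `b`, because assignment moves a tag rather than overwriting the value in place.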
Constants, Identifiers and Keywords:
● Constants: Similar to a variable, but its value cannot be
changed throughout the execution of the program
○ Unlike C/Java, Python has no way to define a true constant
○ By convention, constants are written in CAPITALS, but they
remain modifiable

● Identifiers: Name given to a variable / function / class (case sensitive)

● Keywords (Reserved Words) - Words that are already reserved for some
particular purpose
○ Each keyword has a special meaning and a specific operation.
○ Keywords cannot be used as identifiers (e.g. as variable names).

Constants, Identifiers and Keywords: (...)

List of Python Keywords
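
The keyword table from the slide is missing here. Assuming a standard Python 3 interpreter, the list can be printed with the standard `keyword` module; the constant name `MAX_MARKS` is hypothetical.

```python
import keyword

MAX_MARKS = 100          # a "constant" by naming convention only
MAX_MARKS = 120          # Python does not prevent reassignment

reserved = keyword.kwlist            # list of reserved words for this interpreter
print(len(reserved), reserved[:5])
print(keyword.iskeyword("while"), keyword.iskeyword("total"))
```

The exact number of keywords depends on the Python version, so the list is best read from the interpreter in use.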


Operators:
● An operator is a symbol that performs a particular operation on one or
more operands.
● Operators are the pillars of a program, on which its logic is built.

● Classification based on number of operands:
○ Unary operator: operator acts on a single operand
○ Binary operator: operator acts on two operands
○ Ternary operator: operator acts on three operands

Operators: (...)
● Classification based on nature of operation:
○ Arithmetic operators
○ Assignment operators
○ Unary minus operators
○ Relational operators
○ Logical operators
○ Boolean operators
○ Bitwise operators
○ Membership operators
○ Identity operators
Operators: (...)
● Arithmetic Operators:
○ Arithmetic operators are used to perform arithmetic operations between two
operands.
○ They include:
1. + (addition),
2. - (subtraction),
3. * (multiplication),
4. / (division),
5. % (remainder),
6. // (floor division), and
7. ** (exponent) operators.
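
A short sketch of the seven arithmetic operators listed above, using two illustrative operands (17 and 5 are not from the slides):

```python
a, b = 17, 5

results = {
    "a + b": a + b,     # addition
    "a - b": a - b,     # subtraction
    "a * b": a * b,     # multiplication
    "a / b": a / b,     # true division, always gives a float
    "a % b": a % b,     # remainder
    "a // b": a // b,   # floor division
    "a ** b": a ** b,   # exponentiation
}
for expression, value in results.items():
    print(expression, "=", value)
```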
Operators: (...)
● Assignment Operators:
○ The assignment operators are used to assign the value of the right
expression to the left operand.
○ Operators: =, +=, -=, *=, /=, %=, //=, **=

A = A + 10 is written as A += 10

NOTE: The increment (++) and decrement (--) operators are not supported in Python

● Unary Minus Operator:
○ Used to negate a value
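
The following sketch walks one variable through several compound assignments and then negates it; the variable names are illustrative.

```python
total = 10
total += 5      # same as total = total + 5
total *= 2      # same as total = total * 2
total //= 4     # floor division, then assignment
total **= 2     # exponentiation, then assignment

count = 0
count += 1      # Python has no ++, so increments are written like this

negated = -total   # unary minus negates the value
print(total, count, negated)
```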
Operators: (...)
● Relational Operators:
○ Used to compare two quantities. Operators: ==, >=, <=, >, <, !=
○ Used to construct conditions in if statements and loops
○ Relational operators can be chained: a single expression can hold more than
one relational operator.
○ A chain of relational operators evaluates to True only if every
comparison in it is True

e.g. 1 < 2 < 3 < 4 yields True
1 < 2 > 3 < 4 yields False
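
A brief sketch of chained comparisons, reusing the two chains from the slide plus a typical range check (the variable `x` is illustrative):

```python
x = 7
in_range = 1 < x < 10            # equivalent to (1 < x) and (x < 10)
all_true = 1 < 2 < 3 < 4
one_false = 1 < 2 > 3 < 4        # fails at 2 > 3
print(in_range, all_true, one_false)
```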

Operators: (...)
● Logical Operators:
○ Useful to construct compound conditions (combinations of more than one
condition)
○ Operators used are: and, or, not
○ With logical operators, 0 (like None and empty objects such as "" or []) is
treated as False and any other value is treated as True.

Operators: (...)
● Boolean Operators:

○ These operators act on “bool” type literals and they provide “bool” type output

○ Operators used are: and, or, not
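
The difference between the logical and Boolean slides can be sketched as follows: the same three operators are applied first to plain numbers and then to bool values. All names and values are illustrative.

```python
# On non-bool operands, `and` / `or` return one of the operands themselves.
print(0 and 25)     # 0 is falsy, so `and` stops and returns 0
print(10 and 25)    # 10 is truthy, so the second operand is returned
print(0 or 25)      # 0 is falsy, so `or` returns the second operand
print(not 10)       # `not` always returns a bool

# On bool operands the result is a bool.
is_adult, has_ticket = True, False
can_enter = is_adult and has_ticket
print(can_enter, is_adult or has_ticket, not has_ticket)
```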

Operators: (...)
● Bitwise Operators:
○ These operators act on the individual bits (0 and 1) of the operands
○ They can be applied to integers, including integers written as binary literals; the
result is always an integer

○ 6 types of bitwise operators:
1. Bitwise Complement Operator (~)
2. Bitwise And Operator (&)
3. Bitwise Or Operator (|)
4. Bitwise XOR Operator (^)
5. Bitwise Left Shift Operator (<<)
6. Bitwise Right Shift Operator (>>)
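
A sketch of the six bitwise operators on two illustrative 4-bit values, written as binary literals so the bit patterns are visible:

```python
x = 0b1010   # 10
y = 0b0110   # 6

print(~x)        # complement: -(x + 1)
print(x & y)     # AND
print(x | y)     # OR
print(x ^ y)     # XOR
print(x << 2)    # shift left by 2 bits (multiply by 4)
print(x >> 1)    # shift right by 1 bit (floor-divide by 2)
print(bin(x & y), bin(x | y), bin(x ^ y))
```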
Operators: (...)
● Membership Operators:
○ Used to test for membership in a sequence or collection such as a string, list, tuple, set or dictionary
○ Two membership operators:
1. in : Returns True if the specified element is found in the given sequence, else returns False
2. not in : Returns True if the element is not found in the given sequence, else returns False
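
A minimal sketch of membership tests on a string, a list and a dictionary; the data is illustrative.

```python
vowels = "aeiou"
marks = [72, 85, 91]
capitals = {"India": "New Delhi", "Japan": "Tokyo"}

print("e" in vowels)            # substring / character test on a string
print(60 not in marks)          # element test on a list
print("India" in capitals)      # on a dictionary, `in` checks the keys
print("Tokyo" in capitals)      # values are not searched
```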

39
Operators: (...)
● Identity Operators:
○ These operators compare the memory locations (identities) of two objects.
○ id() – returns an integer, called the identity number, that
internally represents the memory location of an object
○ There are two identity operators:
1. is : Returns True if both operands refer to the same object, else returns False
2. is not : Returns True if the two operands refer to different objects, else returns False
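
The sketch below contrasts identity (`is`) with equality (`==`) using three illustrative list names:

```python
first = [1, 2, 3]
second = first          # another name for the same list object
third = [1, 2, 3]       # an equal but separate list object

print(first is second, id(first) == id(second))
print(first is third, first == third)
print(first is not third)
```

Two objects can be equal in value and still be different objects, which is why `is` should not be used as a substitute for `==`.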

Operator Precedence and Associativity:
● Precedence: The sequence in which operators are executed is called operator precedence.
● Represents the priority level of an operator
● Precedence levels, from highest to lowest:
1. Opening and closing brackets ()
2. Exponential **
3. Unary minus - and bitwise complement ~
4. *, /, //, %
5. +, -
6. Bitwise left and right shift <<, >>
7. Bitwise AND &
8. Bitwise XOR ^
9. Bitwise OR |
10. Relational operators >, <, ==, <=, >=, !=, identity operators is, is not and
membership operators in, not in (all at the same level)
11. Logical NOT not
12. Logical AND and
13. Logical OR or
14. Assignment =, %=, /=, //=, -=, +=, *=, **= (performed last, after the right-hand
expression has been evaluated)
● Associativity is the order in which an expression is evaluated when it has
multiple operators of the same precedence. Most operators associate left to right; ** associates right to left.
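
A few one-line expressions make the effect of precedence and associativity visible. The values are illustrative; each comment states how Python groups the expression.

```python
print(2 + 3 * 4)          # * before +  -> 2 + 12
print((2 + 3) * 4)        # brackets first
print(-2 ** 2)            # ** before unary minus -> -(2 ** 2)
print(2 ** 3 ** 2)        # ** associates right to left -> 2 ** 9
print(10 - 4 - 3)         # - associates left to right -> (10 - 4) - 3
print(True or False and False)   # `and` before `or`
print(not 1 == 2)         # == before `not`
```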
Input - Output:
● Output Statements:
● print() : This function is used to display output or results
○ It can be used in different formats
○ By default, it moves to the next line after displaying the output

● Inside a formatted string, we can use replacement fields, each denoted by a pair of
curly braces {}
● Names or indexes in these replacement fields select which value is substituted.
Syntax:

print("Formatted string with replacement fields".format(values))
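
The output examples on the slides were images and are missing here. The sketch below shows a few common print() forms and the three ways of filling replacement fields with format(); the name and marks are made-up values.

```python
name, marks = "Asha", 91.456   # illustrative values

print("Name:", name, "Marks:", marks)            # several values, separated by a space
print("Name", name, sep=" = ", end=" | ")        # custom separator and line ending
print("done")

by_position = "{} scored {}".format(name, marks)
by_index = "{1} was scored by {0}".format(name, marks)
by_name = "{n} scored {m:.1f}".format(n=name, m=marks)
print(by_position, by_index, by_name, sep="\n")
```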

Input - Output: (…)
● Input Statements:
● input() : This function is used to accept input from the user and return the
received value as a string

○ input() returns string values, so the + operator joins (concatenates) the strings
instead of adding numbers

Problem Statement:
Implement a Python program to compute area of circle.
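
One possible solution sketch for the problem statement. To keep the example runnable without a keyboard, a fixed string stands in for the value that input() would return; `circle_area` is a hypothetical helper name.

```python
import math

def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2

raw = "2.5"              # stands in for: raw = input("Enter radius: ")
radius = float(raw)      # input() gives a string, so convert it first
area = circle_area(radius)
print("Area of circle = {:.2f}".format(area))
```

Without the float() conversion, `raw + raw` would join the two strings rather than add numbers.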

Control Structures:
● Control statements are statements which control or change the
flow of execution.

● Following are the control statements available in Python:
1. if statement
2. if … else statement
3. if … elif … else statement
4. while loop
5. for loop

Note: Python has no switch-case statement; Python 3.10 and later provide the match … case statement for a similar purpose

Control Structures: (…)
❑ if statement: (Simple if)
Syntax:

if condition:
    statement1
    statement2

Control Structures: (…)
❑ if … else statement:
Syntax:

if condition:
    statement1
else:
    statement3

Control Structures: (…)
❑ if … elif … else statement:
Used when we want to test multiple conditions and execute statements as per the conditions
Syntax:

if condition1:
    statement1
elif condition2:
    statement2
elif condition3:
    statement3
else:
    statement4

Note: else part is not mandatory.
Control Structures: (…)
❑ if … elif … else statement: (…)
Problem Statement: Python program that display whether number is +ve, -ve or 0.
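
A possible solution sketch for this problem, written as a small function so that it can be reused; `describe_sign` is a hypothetical name and the test values are illustrative.

```python
def describe_sign(number):
    """Classify a number as positive, negative or zero."""
    if number > 0:
        return "positive"
    elif number < 0:
        return "negative"
    else:
        return "zero"

for value in (12, -3.5, 0):
    print(value, "is", describe_sign(value))
```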

Control Structures: (…)
❑ while loop:
Syntax:

while condition:
    statement1
    statement2

The while loop executes a group of statements repeatedly for as long as its
condition is True, and stops when the condition becomes False

Control Structures: (…)
❑ while loop: (…)
Problem Statement: Python program that display all even numbers between X and Y.
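
A possible solution sketch using a while loop. It assumes that "between X and Y" includes both end points, which the slide does not specify; `even_numbers_between` is a hypothetical name.

```python
def even_numbers_between(x, y):
    """Collect the even numbers from x to y, both inclusive."""
    evens = []
    current = x if x % 2 == 0 else x + 1   # first even number >= x
    while current <= y:
        evens.append(current)
        current += 2
    return evens

print(even_numbers_between(3, 12))
```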

Control Structures: (…)
❑ for loop:
Syntax:

for variable in sequence:
    statements

○ The for loop is useful to iterate over the elements of a sequence
○ It executes a group of statements repeatedly, once for each
element of a sequence such as a string, list, tuple or range.
○ The first element of the sequence is assigned to the variable and the statements
inside the for loop are executed.
○ The process repeats for all other elements in the sequence until no elements are
left.
Control Structures: (…)
❑ for loop: (…)
Problem Statement: Python program that display all characters of given string.
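
A possible solution sketch: a for loop visits the characters of a string one by one. The string "Python" is an illustrative input.

```python
text = "Python"          # illustrative input string
for character in text:
    print(character)

# enumerate() also gives the position of each character
positions = [(index, character) for index, character in enumerate(text)]
print(positions)
```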

Control Structures: (…)
❑ Infinite loop: Loop that runs forever

i=1
while i <= 10:
    print(i)    # i is never incremented, so the condition is always True

❑ Nested loop: One loop inside another loop
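
A minimal nested-loop sketch that builds a small multiplication table; the 3 x 3 size is arbitrary.

```python
# Outer loop picks the row, inner loop runs fully for every row.
table = []
for row in range(1, 4):
    line = []
    for column in range(1, 4):
        line.append(row * column)
    table.append(line)
    print(line)
```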

Branching Statements:
● Branching statements are used to change the normal flow of execution
based on some condition
● They transfer control from the current statement to another statement or
construct in the program unit

● Python provides 3 branching statements:
1. break : used to come out of the loop
2. continue : used to skip the rest of the current iteration and continue with the next repetition of the loop
3. return : used in a function to return some value from the function
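
The three branching statements can be sketched together as follows. The function name `first_multiple_of_seven` and the data are illustrative only.

```python
def first_multiple_of_seven(numbers):
    """Return the first positive multiple of 7 in numbers, or None."""
    for number in numbers:
        if number <= 0:
            continue          # skip this item, go to the next repetition
        if number % 7 == 0:
            return number     # leave the function with a value
    return None

squares = []
for n in range(1, 100):
    if n * n > 50:
        break                 # come out of the loop entirely
    squares.append(n * n)

print(first_multiple_of_seven([-7, 3, 0, 21, 28]), squares)
```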

Branching Statements: (…)
● break statement:
● continue statement:
● return statement:

Other Statements:
● pass statement:
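
The slide example for pass is missing. pass is a null statement: it does nothing and is used where the syntax requires a statement. A minimal sketch, with a hypothetical function name:

```python
def not_implemented_yet():
    pass                      # placeholder body; the function returns None

for n in range(5):
    if n % 2 == 0:
        pass                  # nothing to do for even numbers
    else:
        print(n, "is odd")

result = not_implemented_yet()
```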

Statement , Function and Block Suites:
● Statement: Instructions written in the source code for execution are called statements. There
are different types of statements in the Python programming language, like assignment
statements, conditional statements, looping statements, etc.
● Block: A block is a combination of such statements. It can be regarded as a grouping of
statements for a specific purpose.
● Block/Code or Block/Suite: A group of statements that is part of another statement or
function is called a block/code or block/suite.
● All the statements in a block are at the same indent level.

i=0
if i < 10:
    i = 10 # this is a code block
    print(i)
Functions

Functions:
● FUNCTIONS: A function is similar to a program: it consists of a group of
statements that are intended to perform a specific task.

● Two Types:
1. Built-in functions: come with Python, e.g. len(), pow(), print(); others, such as sqrt(), are provided by standard library modules like math
2. User-defined functions: Functions created by the user/programmer

Functions: (...)
● Advantages of using functions:
1. They are important in programming because they are used to process data, make
calculations or perform any task required in software development
2. They are reusable code: they increase reusability
3. They provide modularity for programming
■ A module represents a part of the program
■ The main program is divided into smaller sub tasks called modules, which are implemented
using functions
4. Code maintenance becomes easy, i.e. adding a feature or removing an existing feature
can be done easily using functions
5. Code debugging becomes easy, i.e. only the function that produces an error needs to be modified
6. They reduce the length of the program.
Functions: (...)
● Creating / Defining a function: A function is defined using the “def” keyword

def functionname( param1, param2,... ):
    """Function_docstring"""
    Function_suite
    return [expression]

○ def keyword, function name and parameter list: form the first line of the definition
○ “:” represents the beginning of the function body
○ Function_docstring: the function specification or documentation string
○ Function_suite: the set of executable statements, written at the same indentation
○ return [expression] exits a function, optionally passing back an expression to the caller.
Note: A return statement with no arguments is the same as return None.
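
The general form above can be sketched with two small functions, one that returns a value and one that does not. The function names and figures are illustrative.

```python
def simple_interest(principal, rate, years):
    """Return the simple interest for the given principal, rate (%) and years."""
    interest = principal * rate * years / 100
    return interest

def greet(name):
    """Print a greeting; there is no return statement, so None is returned."""
    print("Hello,", name)

amount = simple_interest(1000, 5, 2)     # function call with three arguments
nothing = greet("student")
print(amount, nothing, simple_interest.__doc__)
```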
Functions: (...)
● Calling a function: functionname( paramValue1, paramValue2,... )
○ Argument list: the values passed to the function in the function call

Note: The type of data is determined at run time by the Python interpreter, based on the
arguments given to the function. Thus, Python is dynamically typed.
Functions: (...)
● Returning values from function:
○ The output of a function is returned to the calling function using the “return” statement
○ In Python, functions can return single as well as multiple values.
○ Multiple values are returned as a tuple
○ The return statement can be omitted if the function is not going to return anything
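
A sketch of a function that returns several values at once; `min_max_mean` is a hypothetical name and the data is illustrative.

```python
def min_max_mean(values):
    """Return the smallest value, the largest value and the mean."""
    return min(values), max(values), sum(values) / len(values)

result = min_max_mean([4, 8, 15, 16, 23, 42])
print(type(result).__name__, result)

lowest, highest, mean = result        # tuple unpacking
print(lowest, highest, mean)
```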

Functions: (...)
● Passing a group of elements to a function: Create a list of elements and pass it to its
intended function for processing.
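
A minimal sketch of passing a list to a function; the function name and the marks are illustrative.

```python
def total_and_count(marks):
    """Receive a whole list as one argument and process its elements."""
    total = 0
    for mark in marks:
        total += mark
    return total, len(marks)

marks = [72, 85, 91, 64]              # illustrative data
total, count = total_and_count(marks)
print(total, count, total / count)
```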

Python Data Structures with Operations

Data Types in Python:
● Datatype - Represents the type of data stored in a variable or in memory
● Two Types:
○ Built-in data types - already available in the Python language
○ Derived data types - created by the programmer from existing data types, e.g. array,
class or module
● Built-in data types: (5 types)
○ None type
○ Numeric types - int, float, complex
○ Sequences - str, tuple, list, bytes, bytearray, range
○ Sets
○ Mappings (Dictionary)
NONE data type :
● Represents an object that does not contain any value.
● Only one ‘None’ object exists in a program
● Two purposes:
■ Used inside a function as a default value for arguments
■ In a Boolean expression, None represents False

Numeric data type:
● There are 3 subtypes of numbers:
■ int - Stores integer values.
● No limit on size for the int datatype in Python
■ float - Stores floating point numbers
● Can be written in scientific notation,
e.g. 22.55 × 10^10 can be represented as
22.55E10 or 22.55e10
■ complex - Represents a complex number,
which has a real and an imaginary part, e.g.
C = -1 - 5.5J
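
The three numeric subtypes can be sketched as below. The complex value is the one from the slide; the other values are illustrative.

```python
big = 2 ** 100                 # int has no fixed size limit
distance = 22.55e10            # scientific notation for 22.55 x 10^10
c = -1 - 5.5J                  # complex literal; j or J marks the imaginary part

print(type(big).__name__, big)
print(type(distance).__name__, distance)
print(type(c).__name__, c.real, c.imag)
```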
Sequence data type :
● Represents a group of elements or items.
● There are 6 types of sequences:
1. bytes
2. bytearray
3. tuple
4. list
5. str
6. range
● Common operations for all sequence data types:
○ Indexing
○ Slicing
○ Membership
○ Concatenation
○ Repetition
○ Iteration
Bytes data type :
● bytes() converts objects into bytes objects, which cannot be modified

Bytearray data type :
● Similar to bytes but are modifiable
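
The slide examples for bytes and bytearray are missing. A small sketch of the difference, using an illustrative list of values:

```python
values = [10, 20, 30, 40]          # each element must be in range(256)

frozen = bytes(values)
editable = bytearray(values)

editable[0] = 99                   # allowed: bytearray is mutable
try:
    frozen[0] = 99                 # not allowed: bytes is immutable
except TypeError as error:
    message = str(error)

print(list(frozen), list(editable), message)
```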

Tuple data type:
● A tuple is a collection of elements of different data types, enclosed in ( ), with
elements separated by commas (,)
● Tuples are immutable: a tuple cannot be modified
● Syntax : tuple1 = (1, 2, 4.5, "ht")

Tuple data type: (...)
● Operations on Tuple:
1. Indexing : Fetching an element by position
■ Syntax : object[index]
● +ve index : Traverse left to right
● -ve index : Traverse right to left
2. Slicing : Fetching elements from a given range
■ Syntax : object[start : stop : step]
3. Membership operation :
■ Syntax : element in object
4. Repetition operation :
■ Syntax : Seq1 * Number
5. Concatenation operation :
■ Syntax : Seq1 + Seq2
6. Iteration operation :
■ Syntax : for element in object
7. count(): Displays number of times the element occurred in the tuple

8. index() : Displays the position of first occurrence of the given element
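
The tuple operations listed above, sketched on one illustrative tuple:

```python
t = (10, 20, 30, 20, "ht", 4.5)      # illustrative tuple

print(t[0], t[-1])                   # indexing from the left and from the right
print(t[1:4], t[::2])                # slicing with start:stop and with a step
print(20 in t, 99 in t)              # membership
print(("a", "b") * 2)                # repetition
print(t[:2] + (1, 2))                # concatenation builds a new tuple
for element in t[:3]:                # iteration
    print(element)
print(t.count(20), t.index(20))      # occurrences and first position

try:
    t[0] = 99                        # tuples are immutable
except TypeError as error:
    print("Cannot modify:", error)
```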

List data type :
● A list is a collection of elements of different data types, enclosed in [ ]
● It is mutable.

Accessing List Elements

A range object is “iterable”, i.e. suitable as a target for functions and loops that expect
something from which they can obtain successive items
List data type : (...)
List Methods and their descriptions

Method            Description
lst.index(x)      Returns the index of the first occurrence of x in list lst
lst.append(x)     Appends x at the end of the list
lst.insert(i,x)   Inserts x into the list at the position specified by i
lst.copy()        Copies all the list elements into a new list and returns it
lst.extend(lst1)  Appends the elements of lst1 to lst
lst.count(x)      Returns the number of occurrences of x in the list
lst.remove(x)     Removes the first occurrence of x from the list lst
lst.pop()         Removes the last element from the list and returns it
lst.sort()        Sorts the elements of the list into ascending order
lst.reverse()     Reverses the sequence of elements in the list
lst.clear()       Deletes all elements from the list

List Functions and their descriptions

Function          Description
len(lst)          Returns the number of elements in the list
max(lst)          Returns the biggest element in the list
min(lst)          Returns the smallest element in the list
sum(lst)          Returns the sum of all the elements in the list
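
A short walk through the list methods and functions from the tables above; the starting list is illustrative and the comments show the list after each step.

```python
lst = [30, 10, 20]

lst.append(40)            # [30, 10, 20, 40]
lst.insert(1, 15)         # [30, 15, 10, 20, 40]
lst.extend([10, 50])      # [30, 15, 10, 20, 40, 10, 50]
print(lst.index(10), lst.count(10))

lst.remove(10)            # removes only the first 10
last = lst.pop()          # removes and returns the last element
lst.sort()
print(lst, last)

lst.reverse()
print(lst, len(lst), max(lst), min(lst), sum(lst))
```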


List data type : (...)
● Other operations are :
○ Concatenation, Repetition, Indexing, Slicing, Iteration, Membership
○ Aliasing: Giving a new name to an existing list is known as aliasing
■ Any modification made through the original name is also visible through the alias, and vice
versa
○ Cloning: Obtaining an exact, independent copy of an existing object is known as
cloning.
■ Any change in the original list is not reflected in the cloned list
■ Can be created in two ways:
● Y = X[ : ] → list X is cloned to list Y
● Y = X.copy() → Copy the elements of one list (X) into another(Y)
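
Aliasing and cloning can be compared side by side in a small sketch; the names `alias`, `clone_a` and `clone_b` are illustrative. Both cloning forms make a shallow copy, so nested lists inside the list would still be shared.

```python
x = [1, 2, 3]

alias = x            # aliasing: a second name for the same list
clone_a = x[:]       # cloning with a full slice
clone_b = x.copy()   # cloning with copy()

x.append(4)          # modify the original list
print(alias, clone_a, clone_b)
print(alias is x, clone_a is x, clone_b is x)
```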
List data type : (...)
● Nested List :
○ A list within another list is called a nested list.
○ Main use of nested list is in creation of matrix
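
A sketch of a nested list used as a 2 x 3 matrix, with a transpose built from nested loops; the values are illustrative.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

print(matrix[1][2])                  # row 1, column 2
for row in matrix:                   # print the matrix row by row
    print(row)

# Transpose with nested loops: rows become columns.
transpose = []
for column in range(len(matrix[0])):
    new_row = []
    for row in matrix:
        new_row.append(row[column])
    transpose.append(new_row)
print(transpose)
```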

List data type : (...)
● List comprehension:
○ Creates a new list from the elements of an iterable object (like a list, set, tuple,
dictionary or range) that satisfy a given condition.
○ List comprehensions are very compact, usually a single statement that
performs the task
○ A list comprehension consists of square brackets containing an expression,
followed by a for clause, which is further followed by zero or more for or if clauses.

[ expression for item1 in iterable1 if condition1
             for item2 in iterable2 if condition2
             for item3 in iterable3 if condition3 …. ]
List data type : (...)
● List comprehension: (...)

List Comprehension
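
The slide example is missing. The sketch below shows a plain comprehension, one with a condition, one with two for clauses, and the equivalent ordinary loop for comparison; all names are illustrative.

```python
numbers = range(1, 11)

squares = [n * n for n in numbers]
even_squares = [n * n for n in numbers if n % 2 == 0]
pairs = [(x, y) for x in (1, 2) for y in "ab"]

# The same result as even_squares, written as an ordinary loop.
loop_result = []
for n in numbers:
    if n % 2 == 0:
        loop_result.append(n * n)

print(squares, even_squares, pairs, sep="\n")
```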

Set data type :
● A set is an unordered collection of elements, much like a set in mathematics
● Elements may not appear in the same order as they were entered into the set
● Does not accept duplicate elements
● There are 2 subtypes:
1. Set Datatype
2. Frozen Set Datatype

Set data type : (...)
● Set Datatype
● A set can be created in two ways:
○ By entering elements in { }, with the elements
separated by commas (,)
○ By using the set() function
● As sets are not ordered, indexing and slicing operations cannot be
performed on a set.
● A set is modifiable
● A set does not contain duplicates

Set data type : (...)

Set Methods and their descriptions

Method                         Description
add()                          Adds an element to the set
clear()                        Removes all the elements from the set
copy()                         Returns a copy of the set
difference()                   Returns a set containing the difference between two or more sets
difference_update()            Removes the items in this set that are also included in another, specified set
discard()                      Removes the specified item; no error is raised if the item is absent
intersection()                 Returns a set that is the intersection of two or more sets
intersection_update()          Removes the items in this set that are not present in the other, specified set(s)
isdisjoint()                   Returns True if the two sets have no elements in common
issubset()                     Returns whether another set contains this set or not
issuperset()                   Returns whether this set contains another set or not
pop()                          Removes and returns an arbitrary element from the set
remove()                       Removes the specified element; raises KeyError if the element is absent
symmetric_difference()         Returns a set with the symmetric difference of two sets
symmetric_difference_update()  Updates this set with the symmetric difference of this set and another
union()                        Returns a set containing the union of sets
update()                       Updates the set with the union of this set and others

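
The set examples from the following slides are missing. A compact sketch of set creation, the main operations and a frozen set, using illustrative values:

```python
a = {1, 2, 3, 3, 2}              # duplicates are dropped
b = set([3, 4, 5])               # set() builds a set from any iterable
empty = set()                    # note: {} would create an empty dictionary

a.add(10)
a.discard(99)                    # no error although 99 is absent
print(a | b, a.union(b))         # union
print(a & b, a.intersection(b))  # intersection
print(a - b, a.difference(b))    # difference
print(a ^ b)                     # symmetric difference
print({1, 2}.issubset(a), a.isdisjoint({7, 8}))

fixed = frozenset(a)             # frozen set: same operations, but immutable
a.update(b)                      # a now also holds every element of b
print(sorted(a), sorted(fixed), len(empty))
```

The printed order of set elements is not guaranteed, which is why sorted() is used where a stable order matters.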
Dictionary :
● A dictionary represents a group of elements arranged in the form of key-value pairs.
● In each pair, the first element is the key and the element immediately after it is its value
● A key and its value are separated by a colon (:), pairs are separated by commas (,) and the whole dictionary is enclosed in {}
● Values can be of any data type, like string, number, list, tuple or
another dictionary

● Rules for keys: They must be unique and immutable

Dictionary : (...)

Dictionary Methods and their descriptions

Method                 Description
d.clear()              Removes all key-value pairs from dictionary ‘d’
d.copy()               Copies all elements from ‘d’ into a new dictionary
d.fromkeys(s [,v])     Creates a new dictionary with keys from sequence ‘s’ and values all set to ‘v’
d.get(k [,v])          Returns the value associated with key ‘k’. If k is not found, it returns ‘v’
d.items()              Returns a view object that contains the key-value pairs of ‘d’ as tuples
d.keys()               Returns a view of the keys of the dictionary ‘d’
d.values()             Returns a view of the values of the dictionary ‘d’
d.update(x)            Adds all elements from dictionary ‘x’ to ‘d’
d.popitem()            Removes and returns the last inserted key-value pair
d.pop(k [,v])          Removes the key ‘k’ and its value from ‘d’ and returns the value. If the key is not found, the value ‘v’ is returned. If the key is not found and ‘v’ is not given, “KeyError” is raised
d.setdefault(k [,v])   If key ‘k’ is found, its value is returned. If the key is not found, the k,v pair is stored into dictionary ‘d’ and ‘v’ is returned

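
The dictionary examples from the following slides are missing. A sketch of creating a dictionary and using the methods from the tables above; keys and values are illustrative.

```python
d = {"name": "Asha", "marks": 91}          # illustrative key-value pairs

d["city"] = "Pune"                          # add a new pair
d.update({"marks": 95, "grade": "A"})       # overwrite one key, add another
print(d.get("marks"), d.get("age", "unknown"))
print(list(d.keys()), list(d.values()))

for key, value in d.items():                # iterate over (key, value) tuples
    print(key, "->", value)

removed = d.pop("city")                     # returns the removed value
default = d.setdefault("section", "B")      # key missing, so it is stored
last_pair = d.popitem()                     # removes the most recently added pair
blank = dict.fromkeys(["x", "y"], 0)
print(removed, default, last_pair, blank, d)
```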
STRINGS IN PYTHON

Strings:
● Strings: represent a group of characters.
● In Python, the str datatype is used for strings

Strings are Immutable:
● An immutable object is an object whose content cannot be changed, e.g. numbers, strings, tuples
● A mutable object is an object whose content can be changed, e.g. lists, sets and dictionaries
● Two reasons for making string objects immutable in Python:
1. Performance:
o Space is allocated at creation time, so storage requirements are fixed and unchanging
o Less time is needed to allocate memory and access the objects
o This improves performance
2. Security:
o Any attempt to modify an existing string object creates a new object in memory
o The identity number of the new object is different, which helps the programmer to identify whether a string was
modified
o Useful to enforce security where strings are transported from one application to
another application without modifications
Strings are Immutable: (…)
● As strings are immutable, they cannot be modified
● When a new string is assigned to a name, strings are still immutable: the string “Hey” still
exists in memory, but it no longer has any tag assigned to it. It will be removed by the garbage
collector later.
● “s1” and “s2” point to the same object “Say”, and so they have the same ID.
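
The screenshots these captions referred to are missing. A sketch that reproduces the situation they describe, using the same strings "Hey" and "Say":

```python
s1 = "Hey"
old_id = id(s1)

try:
    s1[0] = "S"              # item assignment is not allowed on a string
except TypeError as error:
    print("Cannot modify:", error)

s1 = "Say"                   # the tag s1 now points to a different object
s2 = s1                      # s2 points to the same object as s1
print(id(s1) != old_id, s1 is s2, id(s1) == id(s2))
```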
Various Operations Performed on Strings:
● Length of String: len() – Gives the total number of characters (including spaces) in a string
● Indexing: Represents the position number of a character in the string

Various Operations Performed on Strings: (…)
● Slicing the string:
Syntax: stringName [ start : stop : stepsize ]
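
Length, indexing and slicing can be sketched on one illustrative string:

```python
s = "Data Science"           # illustrative string

print(len(s))                # spaces are counted too
print(s[0], s[5], s[-1])     # positive and negative indexes
print(s[0:4])                # characters 0..3
print(s[5:])                 # from index 5 to the end
print(s[::2])                # every second character
print(s[::-1])               # negative step reverses the string
```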

152
Various Operations Performed on Strings: (…)
● Repeating the string: (using ‘*’ operator)

● Concatenation of Strings: (using ‘+’ operator or join())

● Checking membership: (using ‘in’ and ‘not in’ operator)
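
A brief sketch of the three operations above, with illustrative strings:

```python
word = "ab"
print(word * 3)                          # repetition
print("Data" + " " + "Science")          # concatenation with +
print("-".join(["2024", "01", "15"]))    # join() places the separator between items
print("Sci" in "Data Science", "sci" not in "Data Science")   # case-sensitive
```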

153
Various Operations Performed on Strings: (…)
● Comparing strings: (using relational operators >, <, >=, <=, ==, !=)

● Removing spaces: (using methods rstrip(), lstrip(), strip())

o rstrip() : removes the spaces on the right side of the string

o lstrip() : removes the spaces on the left side of the string

o strip() : removes spaces from both side of the string

o Note: These methods do not remove spaces which are in the middle of the string
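The basic operations listed above can be tried out with a short sketch. The sample strings are assumptions chosen for illustration only.

```python
s = "Data Science"

# Length, indexing and slicing
print(len(s))            # 12
print(s[0], s[-1])       # D e
print(s[0:4])            # Data
print(s[5:])             # Science
print(s[::2])            # Dt cec
print(s[::-1])           # ecneicS ataD

# Repetition, concatenation and membership
s1, s2 = "Data", "Science"
print(s1 * 2)                    # DataData
print(s1 + " " + s2)             # Data Science
print(" ".join([s1, s2]))        # Data Science
print("Sci" in s2)               # True
print("sci" not in s2)           # True (the test is case sensitive)

# Comparison is character by character, based on code points
print("apple" < "banana")        # True
print("Apple" < "apple")         # True (upper case letters come first)

# Removing leading and trailing spaces
name = "  Data Science  "
print(repr(name.rstrip()))       # '  Data Science'
print(repr(name.lstrip()))       # 'Data Science  '
print(repr(name.strip()))        # 'Data Science'
```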

● Finding substrings: (using find(), rfind(), index() and rindex() methods)
● find() and index() search for the substring from the beginning of the main string and return the location of its first occurrence
● rfind() and rindex() search for the substring from right to left (i.e. in reverse) and return the location of its last occurrence
● find() and rfind() return -1 if the substring is not found in the main string.
● index() and rindex() raise a “ValueError” exception if the substring is not found in the main string
● The format of all these methods is similar:
mainstring.find(substring, beginning, ending)
● count(): Returns the number of (non-overlapping) occurrences of the substring in the main string
mainstring.count(substring)
mainstring.count(substring, beginning, ending)
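A small sketch of the search methods follows. The sample sentence is an assumption made for illustration.

```python
text = "data science uses data"

print(text.find("data"))                 # 0   (first occurrence)
print(text.rfind("data"))                # 18  (last occurrence)
print(text.find("python"))               # -1  (not found)
print(text.index("science"))             # 5
print(text.find("data", 1, len(text)))   # 18  (search starts at position 1)
print(text.count("data"))                # 2

# index() raises an exception instead of returning -1
try:
    text.index("python")
except ValueError as err:
    print("ValueError:", err)
```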
● replace() : Replaces all the occurrences of the “old” substring with the “new” substring in the main string
Format: stringname.replace(old, new)
● split() : Used to break a string into pieces; the pieces are returned as a list.
Format: str.split(',') #for elements separated by commas
str.split() #for elements separated by spaces
● join() : Joins all the strings given as a tuple or list of strings, placing the separator between them.
Format: separator.join(str)
● Changing case of a string: 4 methods are useful to change the case of a given string
1. upper() – converts all characters of the string into upper case
2. lower() – converts all characters of the string into lower case
3. swapcase() – toggles the case, i.e. capital into small and vice versa
4. title() – converts the first letter of every word in the string into a capital
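The methods above can be sketched as follows. The sample strings are assumptions used only for illustration.

```python
s = "red,green,blue"

parts = s.split(",")                      # split on commas
print(parts)                              # ['red', 'green', 'blue']
print("one two  three".split())           # ['one', 'two', 'three']
print("-".join(parts))                    # red-green-blue
print(s.replace("green", "yellow"))       # red,yellow,blue
print(s)                                  # red,green,blue (original unchanged)

t = "hello World"
print(t.upper())                          # HELLO WORLD
print(t.lower())                          # hello world
print(t.swapcase())                       # HELLO wORLD
print(t.title())                          # Hello World
```

Each method returns a new string (or list), which is consistent with strings being immutable.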
● Start and end testing methods:
1. startswith() : to know whether a string starts with a given substring or not
2. endswith() : to know whether a string ends with a given substring or not
Format: str.startswith(substring)
str.endswith(substring)
● String testing methods: Test the nature of the characters and return True or False. (Also applicable to a single character)
1. isalnum() : To check whether the string consists only of alphabets and digits
2. isalpha() : To check whether the string consists only of alphabets
3. isdigit() : To check whether the string consists only of digits
4. islower() : To check whether the letters in the string are all in lower case
5. isupper() : To check whether the letters in the string are all in upper case
6. istitle() : To check whether the string is in title case (every word starts with a capital)
7. isspace() : To check whether the string consists only of spaces
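A brief sketch of the testing methods, using sample strings assumed for illustration:

```python
s = "Data Science"

print(s.startswith("Data"))     # True
print(s.endswith("ence"))       # True

print("abc123".isalnum())       # True
print("abc 123".isalnum())      # False (the space is neither letter nor digit)
print("abc".isalpha())          # True
print("2024".isdigit())         # True
print("data".islower())         # True
print("DATA".isupper())         # True
print(s.istitle())              # True
print("   ".isspace())          # True
print("".isalpha())             # False (an empty string fails these tests)
```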
Sorting strings: Sort a group of strings (stored in a list) into alphabetical order using the sort() method or the sorted() function
● sort() : sorts the list in place; after sorting, the original order is lost and replaced with the sorted order
● sorted() : the original list’s order is not lost; instead, a new sorted list is returned
Format: lst.sort()
sorted(lst)
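The difference between the two can be seen in a minimal sketch. The list name names and its contents are assumptions for illustration; the key=str.lower argument is an optional extra, shown here because the default ordering places upper case letters before lower case ones.

```python
names = ["pandas", "numpy", "array", "Matplotlib"]

ordered = sorted(names)            # returns a new list
print(ordered)                     # ['Matplotlib', 'array', 'numpy', 'pandas']
print(names)                       # ['pandas', 'numpy', 'array', 'Matplotlib']

names.sort(key=str.lower)          # sorts in place, ignoring case
print(names)                       # ['array', 'Matplotlib', 'numpy', 'pandas']
```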
ARRAYS IN PYTHON
(Working with array module)

Arrays:
● An array is an object that stores a group of elements of the same data type
● Main advantage : Ease of storing and processing a group of elements
● Properties of Arrays in Python:
● Store only one type of data
● Can increase / decrease their size dynamically
● Store their elements more compactly than a list, which makes them more memory-efficient (and often faster) for large groups of numbers
● Methods that are useful to process the elements of any array are available in the “array” module.
Creating an Array:
● Syntax:
arrayName = array( typecode , [elements] )
where arrayName is the array name given by the user, array is the array class, typecode is the data type of the elements and [elements] are the elements themselves

e.g.
a1 = array('i', [2, 3, 4, 5]) #for creation of integer array
a1 = array('f', [2.5, 3.6, 4.6, 5.7]) #for creation of float array
a1 = array('u', ['a', 'b']) #for creation of character array
Type Code    C Type                             Minimum Size (bytes)
'b'          Signed integer                     1
'B'          Unsigned integer                   1
'i'          Signed integer                     2
'I'          Unsigned integer (Capital I)       2
'l'          Signed integer (Small L)           4
'L'          Unsigned integer                   4
'f'          Floating point                     4
'd'          Double precision floating point    8
'u'          Unicode character                  2
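Creating arrays with the “array” module can be sketched as below. The names ints and floats are assumptions for illustration; the actual item size of 'i' depends on the platform, so only the minimum from the table is guaranteed.

```python
from array import array

ints = array('i', [2, 3, 4, 5])              # signed integers
floats = array('d', [2.5, 3.6, 4.6, 5.7])    # double precision floats

print(ints)                 # array('i', [2, 3, 4, 5])
print(ints.typecode)        # i
print(ints.itemsize)        # platform dependent, at least 2
print(floats.itemsize)      # 8

# Only elements matching the type code are accepted.
try:
    ints.append(1.5)
except TypeError as err:
    print("TypeError:", err)

print(ints.tolist())        # [2, 3, 4, 5]
```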
Indexing and Slicing Operations on Arrays:
Processing the Arrays:
● “array” class of “array” module in Python offers methods to process the arrays easily

Variables of Array Class
Variable      Description
a.typecode    Represents the type code character used to create array a
a.itemsize    Represents the size of each item stored in the array (in bytes)
Methods of Array Class
Method             Description
a.append(x)        Adds an element x at the end of the existing array a
a.count(x)         Returns the number of occurrences of x in array a
a.extend(x)        Appends x at the end of array a (x can be another array or an iterable object)
a.fromfile(f, n)   Reads n items (in binary format) from the file object f and appends them at the end of array a. Raises “EOFError” if fewer than n items are read
a.fromlist(lst)    Appends the items from the list lst to the end of array a
a.frombytes(b)     Appends items from the bytes object b to the end of array a (called fromstring() in older Python versions; that name was removed in Python 3.9)
a.index(x)         Returns the position of the first occurrence of x. Raises “ValueError” if not found
a.insert(i, x)     Inserts x at position i in the array
a.pop(i)           Removes the item at position i from array a and returns it
a.pop()            Removes the last item from array a and returns it
a.remove(x)        Removes the first occurrence of x in the array a. Raises “ValueError” if not found
a.reverse()        Reverses the order of elements in the array a
a.tofile(f)        Writes all elements of array a to the file f
a.tolist()         Converts array a into a list
a.tobytes()        Converts array a into a bytes object (called tostring() in older Python versions; that name was removed in Python 3.9)
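The methods in the table, together with indexing and slicing, can be exercised with a short sketch. The array contents are assumptions chosen for illustration, and the comments track the state of the array after each step.

```python
from array import array

a = array('i', [10, 20, 30, 20])

a.append(40)             # [10, 20, 30, 20, 40]
a.extend([50, 60])       # [10, 20, 30, 20, 40, 50, 60]
a.insert(1, 15)          # [10, 15, 20, 30, 20, 40, 50, 60]
print(a.count(20))       # 2
print(a.index(30))       # 3

last = a.pop()           # removes and returns 60
second = a.pop(1)        # removes and returns the item at position 1 (15)
a.remove(20)             # removes the first 20 -> [10, 30, 20, 40, 50]
a.reverse()              # [50, 40, 20, 30, 10]
print(a.tolist())        # [50, 40, 20, 30, 10]

# Indexing and slicing work as they do for strings and lists.
print(a[0], a[-1])       # 50 10
print(a[1:3])            # array('i', [40, 20])
```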
Types of Arrays:
General Array Types
Single Dimensional Array : Represents only one row / column of elements (1D Array)
Multi-Dimensional Array : Represents more than one row / column of elements; 2D and 3D arrays are multi-dimensional arrays

● In Python, we can create and work with single dimensional arrays using the “array” module
● The “array” module does not support multi-dimensional arrays
● So, multi-dimensional arrays are constructed using third party packages like numpy (numerical python)
ARRAYS IN PYTHON
(Working with NUMPY module)

Working with Arrays using numpy:
● “numpy” is a package that contains several classes, functions, variables etc. to deal with scientific calculations in Python.
● It is useful to create and also process single as well as multi-dimensional arrays.
● In addition, numpy contains a large library of mathematical functions, such as linear algebra functions and Fourier transforms.
● To work with numpy, we should first import the “numpy” module.
● Arrays created using numpy are called n-dimensional arrays (ndarray)
● If n = 1, it represents a 1D Array
● If n = 2, it represents a 2D Array
● If n = 3, it represents a 3D Array
● Few of the ways of creating arrays in numpy :
1. Using array() function : Used to create an array
■ Format: array( [elements], dtype)
e.g. myArray = array([10,20,30], int)
■ Mentioning the datatype is optional, as numpy can infer a suitable data type from the elements of the array
■ For strings, the format of mentioning the datatype is as follows:
myArray = array(["Hey", "Say"], dtype = str )
2. Using linspace() function: Used to create an array with evenly spaced points between a starting and ending point.
■ Format: linspace (start, stop, n)
● start – represents the starting point
● stop – represents the last point
● n – represents the number of evenly spaced points to generate (including start and stop). If n is omitted, the default value of n = 50 is used
3. Using logspace() function: Used to produce points that are evenly spaced on a logarithmic scale.
■ Format: logspace (start, stop, n)
● Starts at the value 10^start and ends at 10^stop. (Default value of n = 50)
4. Using arange() function : Same as the range() function in Python, but the output is an array
■ Format: arange (start, stop, stepsize)
● This creates an array of elements from “start” to one element prior to “stop” in steps of “stepsize”
● If stepsize is omitted, default value is taken as 1
● If start is omitted, default value is taken as 0
5. Using zeros() and ones() functions
■ zeros() : Used to create an array with all zeroes
■ ones() : Used to create an array with all ones
■ Format: zeros ( n , datatype)
ones ( n , datatype)
● n – represents the number of elements
● If datatype is omitted, then the default datatype is taken as float in numpy
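The five creation functions above can be compared side by side in a minimal sketch. It assumes numpy is installed and imported under the common alias np; the variable names are chosen for illustration.

```python
import numpy as np

# 1. array(): from a list, with an explicit or an inferred data type
a = np.array([10, 20, 30], int)
b = np.array([1, 2.5, 3])                   # mixed int/float -> float64
c = np.array(["Hey", "Say"], dtype=str)
print(a.tolist(), b.dtype, c.dtype.kind)    # [10, 20, 30] float64 U

# 2. linspace(): n evenly spaced points, both end points included
lin = np.linspace(0, 10, 5)
print(lin.tolist())                         # [0.0, 2.5, 5.0, 7.5, 10.0]
print(len(np.linspace(0, 1)))               # 50 (default n)

# 3. logspace(): points from 10**0 to 10**3, evenly spaced on a log scale
log = np.logspace(0, 3, 4)
print(log)

# 4. arange(): like range(), but returns an array
print(np.arange(5).tolist())                # [0, 1, 2, 3, 4]
print(np.arange(2, 10, 2).tolist())         # [2, 4, 6, 8]

# 5. zeros() and ones(): default data type is float
z = np.zeros(4)
o = np.ones(3, int)
print(z.tolist(), z.dtype)                  # [0.0, 0.0, 0.0, 0.0] float64
print(o.tolist())                           # [1, 1, 1]
```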
Mathematical Operations on Arrays:
● Mathematical operations : +, - , * , / etc. can be done on 1D array elements
● Other functions from math module, redefined in numpy like : sin(), cos(), tan(), sqrt(),
pow(), sum(), prod(), min(), max(), sort() are available to process the 1D array elements

Vectorized Operations: The entire array (vector) is processed like a single variable. They are important for the following reasons:
1. Vectorized operations are faster
2. They are syntactically clearer
3. They provide compact code
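A minimal sketch of vectorized arithmetic on 1D arrays, assuming numpy is imported as np and using sample values chosen for illustration:

```python
import numpy as np

a = np.array([1, 4, 9, 16])
b = np.array([1, 2, 3, 4])

# Operators act element by element, with no explicit loop.
print((a + b).tolist())        # [2, 6, 12, 20]
print((a * 2).tolist())        # [2, 8, 18, 32]
print((a / b).tolist())        # [1.0, 2.0, 3.0, 4.0]

# numpy versions of the math functions also work on whole arrays.
print(np.sqrt(a).tolist())     # [1.0, 2.0, 3.0, 4.0]
print(a.sum(), a.prod(), a.min(), a.max())   # 30 576 1 16
```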
Comparing Arrays:
● Relational operators (>, <, >=, <=, ==, !=) and logical functions (logical_and(), logical_or() and logical_not()) are used to compare the elements of arrays of the same size.
● After comparison, both give a resultant array of True/False values.
● any(): Used to determine if any one element of the array is True
● all(): Used to determine whether all the elements in the array are True
● nonzero(): Gives the positions of the non-zero elements in an array
● where(): Used to create a new array based on whether a given condition is True/False
Format: array = where(condition, expression1, expression2)
(expression1 is used where the condition is True, expression2 where it is False)
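The comparison functions above can be sketched as follows, assuming numpy is imported as np. The arrays a and b are sample data for illustration.

```python
import numpy as np

a = np.array([1, 5, 3, 0])
b = np.array([2, 5, 1, 0])

gt = a > b
print(gt.tolist())                              # [False, False, True, False]
print((a == b).tolist())                        # [False, True, False, True]
print(np.logical_and(a > 0, b > 0).tolist())    # [True, True, True, False]

print(bool(np.any(gt)))                         # True
print(bool(np.all(gt)))                         # False
print(np.nonzero(a)[0].tolist())                # [0, 1, 2]

# Pick the larger element at each position.
bigger = np.where(a > b, a, b)
print(bigger.tolist())                          # [2, 5, 3, 0]
```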
Dimensions of Arrays
● The dimension of an array represents the arrangement of elements in the array
● When an array contains only one row or only one column, it is a 1D array
● When an array contains more than one row and more than one column, it is a 2D array
● A combination of several 2D arrays forms a three dimensional array or 3D array
● Both 2D and 3D arrays are multi-dimensional arrays.
Attributes of an Array:
● ndarray contains the following important attributes :
1. ndim : Represents the number of dimensions / axes of the array (also referred to as “rank”)
2. shape : Represents the shape of an array.
o It is a tuple listing the number of elements along each dimension, e.g. for a 1D array, shape gives the number of elements in the row; for a 2D array, it gives the number of rows and the number of columns.
o We can also change the shape using the shape attribute.
3. size : Gives the total number of elements in an array
4. itemsize : Gives the memory size of each array element in bytes
5. dtype : Gives the datatype of the elements in the array
6. nbytes : Gives the total number of bytes occupied by an array (i.e. size of the array * itemsize of each element in the array)
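The attributes can be inspected on a small 2D array, as in this sketch. It assumes numpy is imported as np; the dtype is fixed to int32 here so that itemsize and nbytes are the same on every platform.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(a.ndim)        # 2
print(a.shape)       # (2, 3)
print(a.size)        # 6
print(a.itemsize)    # 4
print(a.dtype)       # int32
print(a.nbytes)      # 24  (size * itemsize)
```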
reshape() Method :
● reshape() method is useful to change the shape of an array.
● The new array should have the same number of elements as the original array.
flatten() Method :
● flatten() is useful to return a copy of the array collapsed into one dimension.
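A minimal sketch of reshape() and flatten(), assuming numpy is imported as np and using a sample array of six elements:

```python
import numpy as np

a = np.arange(6)               # [0, 1, 2, 3, 4, 5]
m = a.reshape(2, 3)            # 2 rows x 3 columns, same 6 elements
print(m.tolist())              # [[0, 1, 2], [3, 4, 5]]

flat = m.flatten()             # a 1D copy of m
print(flat.tolist())           # [0, 1, 2, 3, 4, 5]

flat[0] = 99                   # changing the copy leaves m untouched
print(m[0, 0])                 # 0

# The element count must match: 6 elements cannot fill a 4 x 2 array.
try:
    a.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```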
Working with Multi-dimensional Arrays:
● Multi-dimensional arrays can be created in the following ways:
1. array() : numpy’s array() can also be used to create a multi-dimensional array
2. ones() and zeros() : Useful to create 2D arrays with 1s and 0s respectively.
Format: zeros ( (r,c) , datatype)
ones ( (r,c) , datatype)
where r and c are the number of rows and columns respectively, and the default data type is float
3. eye() function: Creates a 2D array and fills the elements on the diagonal with 1s.
Format: eye ( n , dtype = datatype)
where n = number of rows and columns (square matrix) and default dtype = float
4. reshape() function: Useful to convert a 1D array into a multi-dimensional (2D or 3D) array.
Format: reshape ( arrayName , (n, r, c))
where arrayName represents the name of the array to be converted, n indicates the number of arrays in the resultant array, r indicates the number of rows and c indicates the number of columns
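The four approaches can be sketched together as below, assuming numpy is imported as np. The variable names are chosen for illustration.

```python
import numpy as np

a2 = np.array([[1, 2], [3, 4]])              # 2D array from nested lists
z = np.zeros((2, 3))                         # 2 x 3 array of 0.0
o = np.ones((2, 2), int)                     # 2 x 2 array of integer 1s
i3 = np.eye(3)                               # 3 x 3 identity-style array
cube = np.reshape(np.arange(12), (2, 3, 2))  # 2 arrays of 3 rows x 2 columns

print(a2.shape, z.shape, o.shape)            # (2, 2) (2, 3) (2, 2)
print(i3.tolist())    # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(cube.shape)                            # (2, 3, 2)
print(cube[1].tolist())                      # [[6, 7], [8, 9], [10, 11]]
```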
Indexing Operation in ndArray :
Slicing in Multi-dimensional Arrays:
Example array (row index on the left, column index on top):
      0    1    2    3    4
0     1    2    3    4    5
1     6    7    8    9   10
2    11   12   13   14   15
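Indexing and slicing on the 3 x 5 example array can be sketched as follows. It assumes numpy is imported as np; the particular indices and slices are chosen for illustration.

```python
import numpy as np

# Rebuild the 3 x 5 example array holding the values 1 to 15.
a = np.arange(1, 16).reshape(3, 5)

print(a[1, 2])                  # 8   (row 1, column 2)
print(a[0].tolist())            # [1, 2, 3, 4, 5]          (whole row 0)
print(a[:, 1].tolist())         # [2, 7, 12]               (whole column 1)
print(a[0:2, 1:4].tolist())     # [[2, 3, 4], [7, 8, 9]]   (rows 0-1, columns 1-3)
print(a[::2, ::2].tolist())     # [[1, 3, 5], [11, 13, 15]]
```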
Matrices in numpy:
● A matrix is a rectangular array of elements arranged in rows and columns.
● If a matrix has one row, it is called a “row matrix”
● If a matrix has one column, it is called a “column matrix”
● When a matrix has m rows and n columns, it is called an “m X n matrix”
● numpy provides a special object called “matrix” to work with matrices
● In numpy, a matrix is a specialized 2D array that retains its 2D nature through operations.
Syntax for creating numpy matrix:
matrixName = matrix( 2D array/string/list)
array form : arr = [[1,2],[3,4],[5,6]]
string form : str = '1 2 ; 3 4 ; 5 6'
● diagonal() : Returns a 1D array that contains the diagonal elements of the original matrix
Format: a = diagonal( matrix)
● max(): Returns the maximum element in the matrix
● min() : Returns the minimum element in the matrix
● sum() : Returns the sum of all the elements in the matrix
● mean() : Returns the average of all the elements in the matrix
● prod():
● prod(0) returns a matrix that contains the products of the elements in each column of the original matrix.
● prod(1) returns a matrix that contains the products of the elements in each row of the original matrix
● sort() : Sorts the matrix in ascending order
Format: sort(matrixname, axis) (where axis = 0 for columns, axis = 1 for rows, default = 1)
● transpose() or getT() : returns the transpose of the given matrix
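The matrix functions above can be tried on a small 3 x 3 matrix, as in this sketch. It assumes numpy is imported as np, and the matrix values are sample data. Note that the numpy documentation no longer recommends the matrix class for new code, although it is still available.

```python
import numpy as np

m = np.matrix('3 1 2; 6 5 4; 9 7 8')     # string form

print(np.diagonal(m).tolist())           # [3, 5, 8]
print(m.max(), m.min(), m.sum())         # 9 1 45
print(m.mean())                          # 5.0
print(m.prod(0).tolist())                # [[162, 35, 64]]     (per column)
print(m.prod(1).tolist())                # [[6], [120], [504]] (per row)
print(np.sort(m, axis=1).tolist())       # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(m.transpose().tolist())            # [[3, 6, 9], [1, 5, 7], [2, 4, 8]]
```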
Matrix Addition :
Matrix Multiplication :
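Matrix addition and multiplication can be sketched as below, assuming numpy is imported as np. The matrices are sample data, and the comparison with plain ndarrays is added here for illustration: for matrix objects the * operator is the matrix product, while for ndarrays it is element-wise.

```python
import numpy as np

a = np.matrix([[1, 2], [3, 4]])
b = np.matrix([[5, 6], [7, 8]])

print((a + b).tolist())      # [[6, 8], [10, 12]]
print((a * b).tolist())      # [[19, 22], [43, 50]]  (matrix product)

# With ordinary ndarrays, * is element-wise and @ is the matrix product.
x, y = np.array(a), np.array(b)
print((x * y).tolist())      # [[5, 12], [21, 32]]
print((x @ y).tolist())      # [[19, 22], [43, 50]]
```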
Random Numbers :
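● A random number is a number that cannot be predicted in advance. (The numbers produced by numpy are pseudo-random: they are generated by an algorithm and can be reproduced by fixing a seed.)
● numpy has a sub module called random that provides the rand() function, which is used to create random numbers
● To call this function, use random.rand()
● rand(): generates random numbers between 0.0 and 1.0 (0.0 included, 1.0 excluded)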
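A minimal sketch of rand(), assuming numpy is imported as np. The seed value 0 is an arbitrary choice made so that repeated runs give the same numbers; the actual values are not shown because they depend on the generator.

```python
import numpy as np

np.random.seed(0)              # optional: makes the sequence repeatable

x = np.random.rand()           # a single float in [0.0, 1.0)
v = np.random.rand(5)          # 1D array of 5 random numbers
m = np.random.rand(2, 3)       # 2 x 3 array of random numbers

print(x)
print(v)
print(m.shape)                 # (2, 3)
```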
Pandas Library

Introduction :
● Pandas is an open-source Python library that contains data structures and data manipulation tools designed to make data cleaning and analysis fast and easy in Python.
● It has functions for analyzing, cleaning, exploring, and manipulating data.
● The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
● pandas is often used in tandem with numerical computing tools like NumPy and SciPy, analytical libraries like statsmodels and scikit-learn, and data visualization libraries like matplotlib.
● While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data.
○ NumPy, by contrast, is best suited for working with homogeneous numerical array data.
Features of Pandas :
● Allows us to analyze big data and draw conclusions based on statistical theories.
● Pandas can clean messy data sets, and make them readable and relevant.
● Fast and efficient DataFrame object with default and customized indexing.
● Tools for loading data into in-memory data objects from different file formats.
● Data alignment and integrated handling of missing data.
● Reshaping and pivoting of data sets.
● Label-based slicing, indexing and subsetting of large data sets.
● Columns from a data structure can be deleted or inserted.
● Group by data for aggregation and transformations.
● High performance merging and joining of data.
● Time Series functionality.
Data Structures used in Pandas :
● Pandas deals with the following three data structures:
1. Series : Series is a one-dimensional labeled, array-like structure
Properties: Homogeneous data (a Series has a single dtype; values of mixed types are held using the generic object dtype), data values mutable and size immutable
2. DataFrame : DataFrame is a two-dimensional tabular structure with heterogeneous data (different columns may have different types)
Properties: Heterogeneous data, size as well as data values are mutable.
3. Panel : Panel is a three-dimensional data structure with heterogeneous data. (No longer supported; it has been removed from current versions of pandas)
Series:
○ Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.).
○ The axis labels are collectively called the index.
○ A pandas Series can be created using the following constructor:
pandas.Series( data, index, dtype, copy)
○ data takes various forms like ndarray, list, constants, dictionaries
○ index values must be hashable and the index must have the same length as data (values are usually unique, although duplicates are allowed). Default np.arange(n) if no index is passed.
○ dtype is for data type. If None, data type will be inferred
○ copy: Copy input data. Default False. Only affects Series or 1D ndarray input.
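The different forms of data accepted by the constructor can be sketched as follows. It assumes pandas and numpy are installed and imported as pd and np; the values and labels are sample data.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.array([10, 20, 30]))                  # from an ndarray, default index
s2 = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])  # from a list, custom index
s3 = pd.Series({"x": 1, "y": 2}, dtype=float)           # from a dict, keys become the index
s4 = pd.Series(5, index=[0, 1, 2])                      # from a constant

print(s1.index.tolist())     # [0, 1, 2]
print(s2["b"])               # 2.5
print(s3.tolist())           # [1.0, 2.0]
print(s4.tolist())           # [5, 5, 5]
```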
Series: (...)

Note: Works only for series from array


DataFrames:
● A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array or a table with rows and columns.
● A pandas DataFrame can be created using the following constructor:
pandas.DataFrame( data, index, columns, dtype, copy)
○ data : takes various forms like ndarray, Series, lists, dict and also another DataFrame.
○ index : the row labels to be used for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
○ columns : the column labels to be used for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.
○ dtype : data type of each column.
○ copy : controls whether the input data is copied.
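A minimal sketch of creating and modifying a DataFrame, assuming pandas is imported as pd. The column names, row labels and values are sample data invented for illustration.

```python
import pandas as pd

data = {"name": ["Asha", "Ravi", "Meena"], "marks": [82, 91, 77]}
df = pd.DataFrame(data, index=["s1", "s2", "s3"])

print(df.shape)                    # (3, 2)
print(df.columns.tolist())         # ['name', 'marks']
print(df.loc["s2", "marks"])       # 91

# Columns can be inserted and deleted.
df["passed"] = df["marks"] >= 80
print(df["passed"].tolist())       # [True, True, False]
del df["name"]
print(df.columns.tolist())         # ['marks', 'passed']
```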
Accessing data in Structured Flat File Format:
● Native Python file-reading methods alone aren’t enough to read structured input intelligently:
a. They can’t distinguish column names from column data
b. They can’t select a particular column
● Pandas helps with these requirements.
● The pandas library provides parsers: code used to read individual bits of data and determine the purpose of each bit according to the format of the entire file.
● Structured Flat File Formats:
a. Text file: least formatted, elements separated by spaces, all elements are treated as strings (so, typecasting is needed when numeric computations need to be done)
b. CSV file: more formatted, more informative and elements are separated by commas
c. Excel file: contains extensive formatting, could include multiple datasets in a single file
Reading from text file:
Reading CSV delimited format:
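Reading delimited data with pandas can be sketched as below. To keep the example self-contained, the file contents are held in strings and wrapped in io.StringIO; with a real file, the path would be passed to read_csv() instead. The column names and values are sample data invented for illustration.

```python
import io
import pandas as pd

# Comma-separated data: the first line holds the column names.
csv_text = "name,marks\nAsha,82\nRavi,91\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())       # ['name', 'marks']
print(df["marks"].tolist())      # [82, 91]  (parsed as numbers, not strings)
print(df["marks"].mean())        # 86.5

# Space-separated text without a header row: column names are supplied.
txt = "Asha 82\nRavi 91\n"
df2 = pd.read_csv(io.StringIO(txt), sep=r"\s+", header=None,
                  names=["name", "marks"])
print(df2["name"].tolist())      # ['Asha', 'Ravi']
```

Excel files are read in a similar way with pandas.read_excel(), which additionally needs an Excel engine package to be installed.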
Reading Excel and other Microsoft Office files: