Unit1 Python
(114AT01)
Course Outcomes :
CO1 To explain how data is collected, managed and stored for data science
Module 1 :
Data Science Overview
Contents:
● Introduction to Data Science
● Different Sectors using Data Science
● Purpose and Components of Python in Data Science
Introduction to Data Science
Python Basics
Contents:
● Different Programming Approaches
● Features of Python
● Basic elements of Python
● Basics of Python
○ Data types
○ Variables
○ Expressions
○ Objects and Functions
● Python Data Structures along with operations
○ String
○ Array
○ List
○ Tuple
○ Set
○ Dictionary
Different Programming Approaches :
● Procedure-oriented programming is a programming approach that follows a
step-by-step method to break down a task into a collection of variables and
routines (functions). It focuses more on the function aspect.
C Style Coding:
{
    int a, b;
    a = b = 10;
    printf("Sum = %d", a + b);
}
Different Programming Approaches : (...)
● Object-oriented programming is a programming paradigm built on the concept of
objects that contain both data and code to modify the data.
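To contrast the two approaches in Python itself, here is a minimal sketch that computes the same sum twice: once with a standalone function (procedural style) and once with a class that bundles the data with the code that works on it (object-oriented style). The names add and Pair are illustrative only.

```python
# Procedural style: the data is passed to a standalone function
def add(a, b):
    return a + b

# Object-oriented style: the data (a, b) and the code (total) live together in an object
class Pair:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def total(self):
        return self.a + self.b

a = b = 10
print("Sum =", add(a, b))           # Sum = 20
print("Sum =", Pair(a, b).total())  # Sum = 20
```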
Comments: (...)
● Two types of Comments: (...)
○ Multiline Comments (Python has no separate multiline comment syntax)
■ Text written inside """comment text""" or '''comment text''' forms a regular string
that can span multiple lines
■ Such strings are ordinary string objects, not text that the interpreter ignores
■ If these strings are not assigned to any variable, they are simply discarded
■ Since they have no effect when the program runs, they work as multiline comments
■ If these strings are written as the first statement in a module, function, class or method,
they are called "Documentation Strings" or Docstrings.
■ Docstrings are used to create Application Programming Interface (API) documentation
from a Python program
Comments: (...)
Example of Docstring
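A small illustrative docstring example (the function area_of_square is hypothetical). The first string inside the function becomes its docstring and is stored in the function's __doc__ attribute, which tools such as help() read; the later unassigned string only acts as a comment.

```python
def area_of_square(side):
    """Return the area of a square whose side length is `side`."""
    return side * side

"""
This string is not assigned to any variable and is not the first
statement of a module, function or class, so it has no effect:
it simply works as a multiline comment.
"""

print(area_of_square.__doc__)   # prints the docstring text
print(area_of_square(3))        # 9
```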
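Variables:
● Variables are containers for storing data values
● Programming languages like C, C++, Java are statically typed: a variable must be declared
with a type, and it names a fixed memory location that holds the value itself
//Variables in C
#include<stdio.h>
int main()
{
    int a=5, b;
    b=a;
    printf("a = %d\n",a);
    printf("b = %d\n",b);
    return 0;
}
[Figure: Memory allocation in RAM, with separate locations a and b each holding the value 5]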
Variables: (...)
● Python is dynamically typed
● Variables do not need to be declared with any particular type
● They are created the moment we first assign a value to them
● They can even change type after they have been set
● So, in Python, a variable is seen as a tag (or name) that is tied to some
value
○ e.g. a=5 means the value 5 is created first in memory, and then the
tag name a is attached to it
Variables: (...)
#Variables in Python
a=5
b=a         # b becomes a second tag on the same value 5
print(a)
print(b)
a="Hello"   # a now refers to a new str object; b still refers to 5
print(a)
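The "tag" model can be checked directly with the is operator and the built-in id() function, which report whether two names refer to the same object. A minimal sketch:

```python
a = 5
b = a
print(a is b)            # True: both names are tags on the same object
print(id(a) == id(b))    # True: same identity number
a = "Hello"              # a is re-bound to a new str object
print(a, b)              # Hello 5
print(type(a), type(b))  # a is now a str, b is still an int
```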
Constants, Identifiers and Keywords:
● Constants: Similar to a variable, but its value cannot be
changed throughout the execution of the program
○ Unlike C/Java, Python has no way to define a true constant
○ By convention, constants in Python are written in CAPITALS, but
they can still be modified
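A sketch of the CAPITALS convention, plus a quick check of keywords and identifiers using the standard keyword module and the str.isidentifier() method. The name PI is only an example.

```python
import keyword

PI = 3.14159   # CAPITALS signal "treat this as a constant"...
PI = 3.0       # ...but Python still allows reassignment
print(PI)      # 3.0

print(keyword.iskeyword("if"))       # True: keywords cannot be used as identifiers
print("total_marks".isidentifier())  # True: a valid identifier
print("2nd_value".isidentifier())    # False: identifiers cannot start with a digit
```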
Operators: (...)
● Classification based on nature of operation:
○ Arithmetic operators
○ Assignment operators
○ Unary minus operator
○ Relational operators
○ Logical operators
○ Boolean operators
○ Bitwise operators
○ Membership operators
○ Identity operators
Operators: (...)
● Arithmetic Operators:
○ Arithmetic operators are used to perform arithmetic operations between two
operands.
○ It includes:
1. + (addition),
2. - (subtraction),
3. * (multiplication),
4. / (division),
5. % (remainder),
6. // (floor division), and
7. ** (exponent) operators.
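The seven arithmetic operators applied to the operands 7 and 2 (values chosen only for illustration). Note that / always produces a float, and // rounds down towards negative infinity.

```python
x, y = 7, 2
print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x / y)    # 3.5  (true division always gives a float)
print(x % y)    # 1    (remainder)
print(x // y)   # 3    (floor division rounds down)
print(x ** y)   # 49   (7 to the power 2)
print(-x // y)  # -4   (floor of -3.5 is -4, not -3)
```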
Operators: (...)
● Assignment Operators:
○ The assignment operators are used to assign the value of the right
expression to the left operand.
○ e.g. A = A + 10 can be written as A += 10
Operators: (...)
● Relational Operators:
○ Used to compare two quantities. Operators like ==, >=, <=, >, <, !=
○ Used to construct conditions in if statement or in loops based on conditions
○ Relational operators can be chained: a single expression can hold more than
one relational operator.
○ In a chain of relational operators, the final result is "True" only if every
individual comparison is True (e.g. 1 < x < 10 means 1 < x and x < 10)
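A short sketch of chained comparisons, assuming x = 5. Each link of the chain is checked, and the whole chain is True only if every link is True.

```python
x = 5
print(1 < x < 10)        # True: same as (1 < x) and (x < 10)
print(1 < x > 10)        # False: 1 < x holds, but x > 10 does not
print(1 < x < 10 == 10)  # True: every link in the chain holds
```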
Operators: (...)
● Logical Operators:
○ Useful to construct compound conditions (combinations of more than one simple
condition)
○ Python's logical operators are and, or and not
Operators: (...)
● Boolean Operators:
○ These operators act on "bool" type values (True and False) and produce "bool" type output
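A minimal sketch covering both slides: and, or and not first combine relational conditions into compound conditions, then act directly on bool literals. The variables age and marks are illustrative.

```python
age, marks = 20, 75

# Compound conditions built from simple relational conditions
print(age >= 18 and marks >= 60)  # True
print(age < 18 or marks >= 90)    # False
print(not (marks >= 60))          # False

# The same operators on bool literals give bool results
print(True and False)  # False
print(True or False)   # True
print(not True)        # False
```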
Operators: (...)
● Bitwise Operators:
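○ These operators act on the individual bits (0 and 1) of the operands
○ They can be used directly on integers, including binary literals such as 0b1010,
and the result is always an integer
○ Python's bitwise operators are & (AND), | (OR), ^ (XOR), ~ (complement), << (left shift) and >> (right shift)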
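Worked example with 10 (0b1010) and 6 (0b0110); the binary forms in the comments show which bits survive each operation.

```python
a, b = 10, 6     # 10 = 0b1010, 6 = 0b0110
print(a & b)     # 2   (0b0010)
print(a | b)     # 14  (0b1110)
print(a ^ b)     # 12  (0b1100)
print(~a)        # -11 (~x is always -x - 1)
print(a << 1)    # 20  (shift left by 1 = multiply by 2)
print(a >> 1)    # 5   (shift right by 1 = floor divide by 2)
print(bin(a & b))  # 0b10: the result is an int, bin() only displays it in binary
```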
Operators: (...)
● Membership Operators:
○ Used to test for membership in a sequence such as strings, lists, tuples or dictionaries
1. in : Returns True if the specified element is found in the given sequence, else returns False
2. not in : Returns True if the element is not found in the given sequence, else returns False
○ For a dictionary, membership is tested against its keys
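A sketch of in and not in on a list, a string and a dictionary (the sample data is made up). For dictionaries only the keys are searched unless .values() is used.

```python
fruits = ["apple", "banana", "mango"]
print("apple" in fruits)       # True
print("grape" not in fruits)   # True
print("an" in "banana")        # True: substring test on strings

marks = {"Asha": 85, "Ravi": 92}
print("Ravi" in marks)         # True: dictionaries test their keys
print(92 in marks)             # False: values are not checked
print(92 in marks.values())    # True
```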
Operators: (...)
● Identity Operators:
○ Used to compare the memory locations (identities) of two objects
○ There are two identity operators:
1. is : Returns True if both operands refer to the same object (same memory location), else returns False
2. is not : Returns True if the memory locations of the two compared objects are not the same
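The difference between identity and equality shows up clearly with lists: two lists can hold equal values while being different objects. A minimal sketch:

```python
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = [1, 2, 3]    # c is a different object with equal contents
print(a is b)       # True
print(a is c)       # False
print(a == c)       # True: == compares values, is compares identity
print(a is not c)   # True
print(id(a) == id(b))  # True
```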
Operator Precedence and Associativity:
● Precedence: The order in which operators are executed is called operator precedence.
● It represents the priority level of an operator
● Precedence levels, from highest to lowest, are as follows:
1. Opening and closing brackets ()
2. Exponential **
3. Unary minus - and bitwise complement ~
4. *, /, //, %
5. +, -
6. Bitwise left and right shift <<, >>
7. Bitwise AND &
8. Bitwise XOR ^
9. Bitwise OR |
10. Relational, identity and membership operators (all at the same level): >, <, ==, <=, >=, !=, is, is not, in, not in
11. Logical NOT not
12. Logical AND and
13. Logical OR or
14. Assignment operators =, %=, /=, //=, -=, +=, *=, **= (assignment is a statement, so it always happens last)
● Associativity is the order in which operators of the same precedence are evaluated in an
expression. Most Python operators are evaluated left to right; ** is evaluated right to left.
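A few expressions that make precedence and associativity visible; each comment states the grouping Python actually uses.

```python
print(2 + 3 * 4)      # 14: * before +
print((2 + 3) * 4)    # 20: brackets first
print(-2 ** 2)        # -4: ** binds tighter than unary minus, so -(2 ** 2)
print(2 ** 3 ** 2)    # 512: ** is right-associative, 2 ** (3 ** 2)
print(10 - 4 - 3)     # 3: - is left-associative, (10 - 4) - 3
print(True or True and False)    # True: and is evaluated before or
print((True or True) and False)  # False
```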
Input - Output:
● Output Statements:
● print() : This function is used to display output or results
○ It can be used in different formats
○ By default, it moves to the next line after displaying the output
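A sketch of common print() formats: the sep and end keyword arguments, old-style % formatting, str.format() and f-strings. The sample name and marks are made up.

```python
name, marks = "Asha", 92.5
print("Name:", name)                  # items are separated by a space by default
print(1, 2, 3, sep=", ")              # 1, 2, 3
print("no newline", end=" | ")        # end replaces the default newline
print("same line")
print("%s scored %.1f" % (name, marks))    # old-style formatting
print("{} scored {}".format(name, marks))  # str.format()
message = f"{name} scored {marks}"         # f-string
print(message)                             # Asha scored 92.5
```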
Input - Output:
(…)
Input Statements:
● input() : This function is used to accept input from the user and return the
input as a string
Input - Output: (…)
● Input Statements: (…)
○ input() returns string values, so the + operator joins them (string concatenation); convert them with int() or float() before doing arithmetic
Input - Output:
(…)
● Input Statements: (…)
Problem Statement:
Implement a Python program to compute the area of a circle.
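One possible solution, as a sketch: the formula is area = pi * r ** 2 with pi taken from the math module. So that it runs without a keyboard, the radius is fixed at 2.0; the commented line shows how it would be read with input() and converted with float().

```python
import math

def area_of_circle(radius):
    """Return the area of a circle: pi * r ** 2."""
    return math.pi * radius ** 2

# Interactive version: input() returns a string, so convert it first
#     r = float(input("Enter radius: "))
r = 2.0
area = area_of_circle(r)
print("Area of circle =", round(area, 2))   # Area of circle = 12.57
```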
Control Structures:
● Control Statements are statements which control or change the
flow of execution.
❑ if statement:
Syntax:
if condition:
    statement1
    statement2
Control Structures: (…)
❑ if … else statement:
Syntax:
if condition:
    statement1
else:
    statement3
Control Structures: (…)
❑ if … elif … else statement:
Syntax:
if condition1:
    statement1
elif condition2:
    statement2
else:
    statement3
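An illustrative if … elif … else chain that classifies marks; the grade names and cut-off values are assumptions made up for the example. Conditions are tested top to bottom and only the first true branch runs.

```python
def grade(marks):
    if marks >= 75:
        return "Distinction"
    elif marks >= 60:
        return "First Class"
    elif marks >= 40:
        return "Pass"
    else:
        return "Fail"

for m in (82, 65, 40, 12):
    print(m, grade(m))
```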
Control Structures: (…)
❑ while loop:
Syntax:
while condition:
    statement1
    statement2
Control Structures: (…)
❑ while loop: (…)
Problem Statement: Python program that displays all even numbers between X and Y.
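One possible while-loop solution, assuming "between X and Y" includes both ends. The values X = 3 and Y = 12 are fixed so the sketch runs non-interactively; in practice they could come from int(input(...)).

```python
def evens_between(x, y):
    """Return the even numbers from x to y (both inclusive) using a while loop."""
    result = []
    i = x
    while i <= y:
        if i % 2 == 0:
            result.append(i)
        i += 1          # without this step the loop would never end
    return result

X, Y = 3, 12
print(evens_between(X, Y))   # [4, 6, 8, 10, 12]
```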
Control Structures: (…)
❑ for loop:
Syntax:
for variable in sequence:
    statement1
    statement2
Control Structures: (…)
❑ Infinite loop: A loop that runs forever
i=1
while i <= 10:
    print(i)    # i is never incremented, so the condition stays True forever
Branching Statements:
● Branching statements are used to change the normal flow of execution
based on some condition
● Transfer of control from the current statement to another statement or
construct in the program unit
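● break statement: Terminates the loop immediately; control passes to the first statement after the loop
● continue statement: Skips the rest of the current iteration and moves on to the next iteration of the loop
● return statement: Exits a function and optionally sends a value back to the caller
Other Statements:
● pass statement: Does nothing; it is used as a placeholder where the syntax requires a statement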
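A small sketch showing break, continue, return and pass together; the function names and the limit of 100 are made up for illustration.

```python
def skip_and_stop(numbers):
    collected = []
    for n in numbers:
        if n == 0:
            continue        # continue: skip 0 and go to the next item
        if n > 100:
            break           # break: leave the loop completely
        collected.append(n)
    return collected

def first_negative(numbers):
    for n in numbers:
        if n < 0:
            return n        # return: exit the function with a value
    return None

class Placeholder:
    pass                    # pass: an empty body that does nothing

print(skip_and_stop([5, 0, 7, 200, 9]))   # [5, 7]
print(first_negative([3, 8, -2, -5]))     # -2
```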
Statement, Function and Block Suites:
● Statement: Instructions written in the source code for execution are called statements. There
are different types of statements in the Python programming language, like assignment
statements, conditional statements, looping statements, etc.
● Block: A block is a group of statements written for a specific purpose.
● Block/Code or Block/Suite: A group of statements that is part of another statement or
function is called a code block or suite. In Python, a block is marked by its indentation.
Functions:
● FUNCTIONS: A function is similar to a program that consists of a group of
statements that are intended to perform a specific task.
● Two Types:
1. Built-in functions: e.g. len(), pow(), print() etc. come with Python (sqrt() is available from the math module)
2. User-defined functions: Functions created by the user/programmer
Functions: (...)
● Advantages of using functions:
1. They are important in programming because they are used to process data, make
calculations or perform any task which is required in software development
(…)
4. Code maintenance becomes easy, i.e. adding an additional feature or removing an existing feature
5. Code debugging becomes easy, i.e. only the function that produces an error needs to be modified
6. They reduce the length of the program.
Functions: (...)
● Creating / Defining a function: Function is defined using “def” keyword
[Figure: A function definition and a function call, labelling the def keyword, the function name, the parameter list (in the definition) and the argument list (in the call)]
Functions: (...)
● Returning values from function: (...)
Functions: (...)
● Passing a group of elements to a function: Create a list of elements and pass it to its
intended function for processing.
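A sketch that ties the function slides together: def, a parameter, a call with an argument, returning more than one value (as a tuple), and passing a whole list for processing. The names stats and marks are illustrative.

```python
def stats(values):                  # values is the parameter
    """Return the total and the average of a list of numbers."""
    total = sum(values)
    return total, total / len(values)   # several values come back as a tuple

marks = [70, 85, 90]
t, avg = stats(marks)               # marks is the argument
print("Total:", t, "Average:", round(avg, 2))   # Total: 245 Average: 81.67
```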
Python Data Structures with Operations
Data Types in Python:
● Datatype - Represents the type of data stored in a variable or memory
● Two Types:
○ Built-in data types - Built into the Python language
○ Derived data types - Created by the programmer from existing data types, e.g. array,
class or module
● Built-in data types: (5 types)
○ None type
○ Numeric types - int, float, complex
○ Sequences - str, tuple, list, bytes, bytearray, range
○ Sets
○ Mappings (Dictionary)
NONE data type :
● Represents an object that does not contain any value.
● Only one 'None' object exists in a program (it is a singleton)
● Two purposes:
■ Used inside a function as a default value for arguments
■ In a Boolean expression, None is treated as False
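Both uses of None in one sketch (greet is a hypothetical function): None as a default argument value, and None behaving as False in a Boolean context. The is check also confirms there is only one None object.

```python
def greet(name=None):          # None as the default value of an argument
    if name is None:
        return "Hello, guest"
    return "Hello, " + name

print(greet())           # Hello, guest
print(greet("Ravi"))     # Hello, Ravi
print(bool(None))        # False: None acts as False in Boolean expressions
x = None
y = None
print(x is y)            # True: both names refer to the single None object
```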
Numeric data type:
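● There are 3 subtypes of numbers:
■ int - Stores integer values. There is no limit on the size of an int in Python
■ float - Stores numbers with a decimal point, e.g. 3.14
■ complex - Stores numbers of the form a + bj, e.g. C = -1 - 5.5J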
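A quick sketch of the three numeric types, including an int far larger than a C int could hold and the parts of the complex example C.

```python
big = 2 ** 100        # int has no fixed size limit
f = 3.14              # float
C = -1 - 5.5J         # complex; J (or j) marks the imaginary part
print(big)            # 1267650600228229401496703205376
print(type(big), type(f), type(C))
print(C.real, C.imag) # -1.0 -5.5
```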
Sequence data type :
● Represents a group of elements or items.
● There are 6 types of sequences:
1. Bytes
2. Bytearray
3. Tuple
4. List
5. Str
6. range
● Common operations for all sequence data types:
○ Indexing
○ Slicing
○ Membership
○ Concatenation
○ Repetition
○ Iteration
(range supports indexing, slicing, membership and iteration, but not concatenation or repetition)
Bytes data type :
● The bytes() function converts objects into bytes objects, which cannot be modified (immutable); each element is an integer from 0 to 255
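A sketch contrasting bytes with bytearray (both listed above as sequence types): a bytes object rejects item assignment, while bytearray is its mutable counterpart.

```python
b = bytes([10, 20, 255])      # each element must be in the range 0 to 255
print(b[0], b[-1])            # 10 255
try:
    b[0] = 99                 # bytes objects are immutable
except TypeError as err:
    print("Cannot modify:", err)

ba = bytearray([10, 20, 255]) # bytearray allows changes
ba[0] = 99
print(list(ba))               # [99, 20, 255]
```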
Tuple data type:
● Tuple is a collection of elements of different data types, enclosed in ( ) and
elements are separated by commas (,)
● Tuples are immutable: their elements cannot be changed after creation
● Operations on Tuple:
1. Indexing : Fetching element by position
■ Syntax : object[index]
● +ve index : Traverse left to right (the first element is at index 0)
● -ve index : Traverse right to left (the last element is at index -1)
2. Slicing : Fetching elements from given range
■ Syntax : object[start : stop : step] (the element at stop is excluded)
3. Membership operation :
■ Syntax : element in object
4. Repetition operation :
■ Syntax : Seq1 * Number
5. Concatenation operation :
■ Syntax : Seq1 + Seq2
6. Iteration operation :
■ Syntax : for element in object:
7. count(): Returns the number of times the element occurs in the tuple
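All seven tuple operations on one sample tuple (the data is made up). Repetition and concatenation build new tuples; the original is never changed.

```python
t = (10, "Asha", 3.5, 10)
print(t[0], t[-1])     # 10 10: positive and negative indexing
print(t[1:3])          # ('Asha', 3.5): the stop index is excluded
print(t[::2])          # (10, 3.5)
print("Asha" in t)     # True: membership
print(t * 2)           # repetition
print(t + (20, 30))    # concatenation creates a new tuple
for item in t:         # iteration
    print(item)
print(t.count(10))     # 2
```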
List data type :
● List is a collection of elements of different data types
● It is mutable.
A range object is "iterable", i.e. suitable as a target for functions and loops that expect
something from which they can obtain successive items (e.g. list(range(5)) builds a list)
List data type : (...)
List Methods and their descriptions
Method Description
lst.copy() Copies all the list elements into a new list and returns it
List data type : (...)
● Other operations are :
○ Concatenation, Repetition, Indexing, Slicing, Iteration, Membership
○ Aliasing: Giving a new name to an existing list is known as aliasing
■ Both names refer to the same list object, so any modification to the original list is also
reflected in the aliased list and vice versa
○ Cloning: Obtaining an exact, independent copy of an existing object is known as
cloning (e.g. using slicing, lst[:], or lst.copy()).
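Aliasing versus cloning in a few lines: after appending to the original list, the alias sees the change but the clone does not.

```python
original = [1, 2, 3]
alias = original        # aliasing: a second name for the same list
clone = original[:]     # cloning by slicing (original.copy() also works)

original.append(4)
print(alias)   # [1, 2, 3, 4]: the change is visible through the alias
print(clone)   # [1, 2, 3]: the clone is independent
print(alias is original, clone is original)   # True False
```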
List data type : (...)
● Nested List :
○ A list within another list is called a nested list.
○ The main use of nested lists is in creating matrices
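A 2 x 3 matrix stored as a nested list (sample values): matrix[row][column] picks one element, and a nested loop walks the rows and columns.

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]       # 2 rows, 3 columns
print(matrix[1][2])        # 6: row 1, column 2
for row in matrix:
    for value in row:
        print(value, end=" ")
    print()                # new line after each row
```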
List data type : (...)
● List comprehension:
○ Represents the creation of new lists from an iterable object (like a list, set, tuple,
dictionary or range) whose elements satisfy a given condition.
○ List comprehensions contain very compact code, usually a single statement that
performs the task
○ Thus, a list comprehension consists of square brackets containing an expression,
followed by a for clause, which is further followed by zero or more if clauses.
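A comprehension built exactly as described (expression, then a for clause, then an if clause), shown next to the equivalent loop for comparison.

```python
numbers = range(1, 11)
squares_of_evens = [n * n for n in numbers if n % 2 == 0]
print(squares_of_evens)   # [4, 16, 36, 64, 100]

# Equivalent loop
result = []
for n in numbers:
    if n % 2 == 0:
        result.append(n * n)
print(result == squares_of_evens)   # True
```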
Set data type :
● Set is an unordered collection of elements, much like a set in mathematics
● Elements may not appear in the same order as they are entered into the set
Set data type : (...)
● Set is modifiable: elements can be added or removed
● Sets do not contain duplicates
Set data type : (...)
intersection_update() Removes the items in this set that are not present in other,
specified set(s)
isdisjoint() Returns True if the two sets have no elements in common, else False
update() Update the set with the union of this set and others
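A sketch of set behaviour and the methods listed above, with made-up values. Because sets are unordered, the printed order is not guaranteed, so the comments describe contents rather than exact output.

```python
s = {3, 1, 3, 2, 1}
print(s)                    # duplicates removed: contents 1, 2, 3
s.add(5)                    # sets are modifiable

a, b = {1, 2, 3}, {3, 4}
print(a | b, a.union(b))            # contents 1, 2, 3, 4
print(a & b, a.intersection(b))     # contents 3
print(a - b)                        # contents 1, 2
print(a.isdisjoint({7, 8}))         # True: no common element
a.intersection_update({2, 3, 9})    # keep only items also in the other set
a.update([10, 11])                  # add the union with another iterable
print(a)                            # contents 2, 3, 10, 11
```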
Dictionary :
● A dictionary represents a group of elements arranged in the form of key-value pairs.
● The first element is considered the key and the immediately following element is considered its value
● A key and its value are separated by a colon (:), pairs are separated by commas (,), and the whole is enclosed in { }
● Values can be of any data type, like string, number, list, tuple or
another dictionary; keys must be immutable (e.g. strings, numbers or tuples)
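Basic dictionary operations on an illustrative student record: lookup by key, a list as a (mutable) value, adding and deleting pairs, safe lookup with get(), and iterating over key-value pairs.

```python
student = {"name": "Asha", "roll": 12, "marks": [85, 90]}
print(student["name"])            # Asha: access a value by its key
student["marks"].append(78)       # a value can be a mutable object such as a list
student["city"] = "Pune"          # add a new key-value pair
del student["roll"]               # remove a pair
print(student.get("roll", "not found"))   # not found
for key, value in student.items():
    print(key, ":", value)
# student[[1, 2]] = "x" would raise TypeError: a list cannot be a key
```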
STRINGS IN PYTHON
Strings:
● Strings represent a group of characters.
● In Python, the str datatype is used for strings
Strings are Immutable:
● Immutable object is an object whose content cannot be changed, e.g. numbers, strings, tuples
● Mutable object is an object whose content can be changed, e.g. lists, sets and dictionaries
● Two reasons for making string objects immutable in Python:
1. Performance:
o Space is allocated at creation time, so storage requirements are fixed and unchanging
o Less time is needed to allocate memory and access strings
o This improves performance
2. Security:
o Any operation that appears to modify an existing string actually creates a new object in memory
o The identity number (id()) of the new object is different, which helps the programmer detect a modification
o This is useful to enforce security, since strings can be passed from one application to
another without being modified
Strings are Immutable: (…)
As strings are immutable, they cannot be modified in place.
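A minimal demonstration: assigning to a character raises TypeError, while replace() builds a new string object (with a different identity) and re-binds the name to it.

```python
s = "Python"
try:
    s[0] = "J"                # strings cannot be changed in place
except TypeError as err:
    print("Error:", err)

old_id = id(s)
s = s.replace("P", "J")       # builds a NEW string object and re-binds s
print(s)                      # Jython
print(id(s) != old_id)        # True: a different object
```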
Various Operations Performed on Strings: (…)
● Slicing the string:
Syntax: stringName [ start : stop : stepsize ]
Various Operations Performed on Strings: (…)
● Repeating the string: (using the '*' operator)
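Slicing with the stringName[start : stop : stepsize] syntax and repetition with *, on a sample string.

```python
s = "Hello Python"
print(s[0:5])     # Hello
print(s[6:])      # Python
print(s[-6:])     # Python: negative indexes count from the end
print(s[::2])     # HloPto: every second character
print(s[::-1])    # nohtyP olleH: a step of -1 reverses the string
print("ab" * 3)   # ababab: repetition with *
```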
Various Operations Performed on Strings: (…)
● Comparing strings: (using relational operators >, <, >=, <=, ==, !=)
o Note: The space-removing methods strip(), lstrip() and rstrip() do not remove spaces which are in the middle of the string
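String comparison works character by character on Unicode code points, so uppercase letters sort before lowercase. The strip() lines assume the note above refers to the space-removing methods strip(), lstrip() and rstrip(); they show that inner spaces survive.

```python
print("apple" < "banana")   # True: 'a' comes before 'b'
print("Apple" < "apple")    # True: 'A' (65) comes before 'a' (97)
print("abc" == "abc")       # True

name = "  Ravi Kumar  "
print(repr(name.strip()))   # 'Ravi Kumar': the middle space is kept
print(repr(name.lstrip()), repr(name.rstrip()))
```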
Various Operations Performed on Strings: (…)
● Finding substrings: (using the find(), rfind(), index() and rindex() methods)
● find() and index() search for the substring from the beginning of the main string and return
the location of its first occurrence
● find() returns -1 if the substring is not found in the main string.
● index() raises a "ValueError" exception if the substring is not found in the
main string
● rfind() and rindex() search for the substring from right to left, i.e. they return the location of the last occurrence
● All four methods have the same format:
mainstring.find(substring, beginning, ending)
● count(): Returns the number of occurrences of the substring in the main string
mainstring.count(substring)
mainstring.count(substring, beginning, ending)
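The search methods on a sample string "banana bandana" (positions start at 0; the optional beginning and ending arguments limit the search range).

```python
text = "banana bandana"
print(text.find("ban"))          # 0: first occurrence
print(text.rfind("ban"))         # 7: last occurrence
print(text.find("xyz"))          # -1: not found
print(text.find("an", 2, 10))    # 3: search only between positions 2 and 10
print(text.count("an"))          # 4
try:
    text.index("xyz")
except ValueError:
    print("index() raises ValueError when the substring is missing")
```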
● replace() : Replaces all the occurrences of the "old" substring with the "new" substring in the main string
Format: mainstring.replace(old, new)
Various Operations Performed on Strings: (…)
● Starting testing methods:
Various Operations Performed on Strings: (…)
Sorting strings: A group of strings stored in a list can be sorted into alphabetical order using the sort() method or the sorted() function
● sort() : sorts the list in place; after sorting, the original order is lost and replaced by the sorted order
● sorted() : returns a new sorted list; the original list's order is not lost
Format: strList.sort()
sorted(strList)
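sorted() versus sort() on a sample list of names. Plain sorting compares code points, so capitalised names come first; passing key=str.lower gives a case-insensitive alphabetical order.

```python
names = ["Ravi", "asha", "Kiran", "Deepa"]
new_list = sorted(names)    # returns a new sorted list
print(names)                # original order kept
print(new_list)             # ['Deepa', 'Kiran', 'Ravi', 'asha']: uppercase first
names.sort()                # sorts in place; the original order is lost
print(names)
print(sorted(names, key=str.lower))   # ['asha', 'Deepa', 'Kiran', 'Ravi']
```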
ARRAYS IN PYTHON
(Working with array module)
Arrays:
● Array is an object that stores a group of elements of same data type
● Main advantage : Easy to store and process a group of elements
● Properties of Arrays in Python:
● Store only one type of data
● Can increase / decrease their size dynamically
● Use less memory than lists, because elements are stored as raw values of a fixed C type
● Methods that are useful to process the elements of any array are
Creating an Array:
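● Syntax:
arrayName = array( typecode , [elements] )
where arrayName is the array name given by the user, array is the class, typecode gives the data
type of the elements, and [elements] lists the elements (the class is imported with: from array import array)
e.g.
a1 = array('i', [2, 3, 4, 5]) #for creation of integer array
a1 = array('f', [2.5, 3.6, 4.6, 5.7]) #for creation of float array
a1 = array('u', ['a', 'b']) #for creation of character array ('u' is deprecated in recent Python versions)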
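Creating an Array: (…)
Type Code   C Type            Minimum Size (bytes)
'b'         signed char       1
'B'         unsigned char     1
'h'         signed short      2
'H'         unsigned short    2
'i'         signed int        2
'I'         unsigned int      2
'l'         signed long       4
'L'         unsigned long     4
'f'         float             4
'd'         double            8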
Indexing and Slicing Operations on Arrays:
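A sketch of creating, indexing and slicing arrays with the standard array module. It uses typecode 'd' (double) for the float array, so stored values print exactly as typed, and shows that an 'i' array rejects a float.

```python
from array import array

a = array('i', [2, 3, 4, 5])    # integer array
f = array('d', [2.5, 3.6])      # 'd' = double-precision float
print(a[0], a[-1])              # 2 5
print(a[1:3])                   # array('i', [3, 4]): slicing gives a new array
print(a.typecode, a.itemsize)   # i, and the size of one element in bytes
try:
    a.append(6.5)               # only one type of data is allowed
except TypeError as err:
    print("Error:", err)
```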
Processing the Arrays:
● “array” class of “array” module in Python offers methods to process the arrays easily
Processing the Arrays: (…)
Methods of Array Class
Method Description
a.extend(x) Appends the elements of x at the end of array a (x can be another array or an iterable object)
a.fromfile(f,n) Reads n items (in binary format) from the file object f and appends them at the end of
array a. Raises "EOFError" if fewer than n items are read
a.fromlist(lst) Appends items from the list lst to the end of array a (lst must be a list; use extend() for other iterables)
a.index(x) Returns the position of the first occurrence of x. Raises "ValueError" if not found
Processing the Arrays: (…)
Methods of Array Class (…)
Method Description
a.remove(x) Removes the first occurrence of x in the array a. Raises “ValueError” if not
found
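A walk-through of several array methods on one array. Alongside the methods in the table, it uses append(), insert(), pop() and tolist(), which the array class also provides.

```python
from array import array

a = array('i', [10, 20, 30])
a.append(40)                    # add one element
a.extend(array('i', [50, 60]))  # add the elements of another array
a.fromlist([70, 80])            # add the elements of a list
a.insert(0, 5)                  # insert 5 at position 0
print(a.index(30))              # 3: position of the first 30
a.remove(20)                    # remove the first occurrence of 20
print(a.pop())                  # 80: removes and returns the last element
print(a.tolist())               # [5, 10, 30, 40, 50, 60, 70]
```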
Types of Arrays:
General Array Types:
● Single Dimensional Array: Represents only one row / column of elements (1D array)
● Multi-Dimensional Array: Represents more than one row / column of elements; 2D and 3D arrays are multi-dimensional arrays
ARRAYS IN PYTHON
(Working with NUMPY module)
Working with Arrays using numpy:
● “numpy” is a package that contains several classes, functions, variables etc. to deal
with scientific calculations in Python.
● It is useful to create and also process single as well as multi-dimensional arrays.
● In addition, numpy contains a large library of mathematical functions like linear
algebra functions and Fourier transforms.
Working with Arrays using numpy: (…)
● Few of the ways of creating arrays in numpy : (...)
2. Using linspace() function: Used to create an array with evenly spaced points between a
starting and ending point.
Working with Arrays using numpy: (…)
● Few of the ways of creating arrays in numpy : (...)
3. Using logspace() function: Used to produce evenly spaced points on a logarithmically
spaced scale.
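A sketch of three ways to create a numpy array: from a Python list with array(), with linspace() (both end points included by default) and with logspace() (powers of 10 by default). Exact print formatting varies between numpy versions, so the comments give values only.

```python
import numpy as np

a = np.array([1, 2, 3, 4])     # from a Python list
b = np.linspace(0, 10, 5)      # 5 evenly spaced points: 0, 2.5, 5, 7.5, 10
c = np.logspace(1, 4, 4)       # 4 points from 10**1 to 10**4: 10, 100, 1000, 10000
print(a)
print(b)
print(c)
```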
Working with Arrays using numpy: (…)
● Few of the ways of creating arrays in numpy : (...)
4. Using arange() function : Same as range() function in Python but output will be array
■ Format: arange (start, stop, stepsize)
● This creates an array of elements from “start” to one element prior to “stop” in
steps of “stepsize”
● If stepsize is omitted, default value is taken as 1
● If start is omitted, default value is taken as 0
Working with Arrays using numpy: (…)
● Few of the ways of creating arrays in numpy : (...)
5. Using zeros() and ones() functions
■ zeros() : Used to create an array with all zeroes
■ ones() : Used to create an array with all ones
■ Format: zeros ( n , datatype)
ones ( n , datatype)
● n – represents the number of elements
● If datatype is omitted, the default datatype is float in numpy
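arange(), zeros() and ones() in one sketch, including the defaults described above (start 0, step 1, float datatype).

```python
import numpy as np

r = np.arange(1, 10, 3)   # 1, 4, 7: the stop value 10 is excluded
r2 = np.arange(5)         # start defaults to 0 and step to 1: 0..4
z = np.zeros(3)           # float by default: 0.0, 0.0, 0.0
o = np.ones(4, int)       # datatype given explicitly: 1, 1, 1, 1
print(r, r2, z, o)
print(z.dtype, o.dtype)
```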
Mathematical Operations on Arrays:
● Mathematical operations : +, - , * , / etc. can be done on 1D array elements
● numpy also provides functions like : sin(), cos(), tan(), sqrt(), power(), sum(), prod(),
min(), max(), sort() to process the 1D array elements (many of these mirror functions in the math module)
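Element-wise arithmetic and a few numpy functions on a sample 1D array; each operation applies to every element without an explicit loop.

```python
import numpy as np

arr = np.array([1, 4, 9, 16])
print(arr + 5)          # each element + 5
print(arr * 2)          # each element * 2
print(np.sqrt(arr))     # 1, 2, 3, 4 (as floats)
print(np.power(arr, 2))
print(arr.sum(), arr.prod(), arr.min(), arr.max())   # 30 576 1 16
print(np.sort(np.array([3, 1, 2])))
```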
Comparing Arrays:
● Relational operators (>,<, >=, <=, ==, !=) and logical functions(logical_and(),
logical_or() and logical_not()) are used to compare elements of arrays of the same size.
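Comparing two same-sized arrays: relational operators and the logical_* functions work element by element and return boolean arrays; np.array_equal() compares whole arrays.

```python
import numpy as np

a = np.array([1, 2, 3, 0])
b = np.array([0, 2, 3, 1])
print(a == b)                         # element-wise: False True True False
print(a > b)
print(np.logical_and(a > 0, b > 0))   # True where both conditions hold
print(np.logical_or(a > 2, b > 2))
print(np.logical_not(a > 1))
print(np.array_equal(a, b))           # False
```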
Dimensions of Arrays
● Dimension of an array represents the arrangement of elements in the array
● When an array contains only one row or only one column – 1D array
● When an array contains more than 1 row and 1 column - 2D array
● A combination of several 2D arrays forms a three-dimensional (3D) array
● Both 2D and 3D arrays are multi-dimensional arrays.
Attributes of an Array:
● ndarray contains following important attributes :
1. ndim : Represents number of dimensions / axes of the array (also referred as “rank”)
2. shape : Represents shape of an array.
o It is a tuple listing the number of elements along each dimension. e.g. For 1D, shape
provides the number of elements in the row. For 2D, it specifies the number of rows and
the number of columns.
reshape() Method :
● reshape() method is useful to change the shape of an array.
● The new array should have the same number of elements as the original array.
flatten() Method :
● flatten() is useful to return a copy of the array collapsed into one dimension.
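A sketch tying together dimensions, the ndim and shape attributes, reshape() and flatten(). The reshape target must keep the same element count (6 here); flatten() returns a copy, so changing it leaves the 2D array untouched.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5, 6])
arr2 = arr1.reshape(2, 3)       # 6 elements -> 2 rows x 3 columns
arr3 = arr1.reshape(2, 1, 3)    # 3D: 2 arrays of 1 row x 3 columns
print(arr1.ndim, arr1.shape)    # 1 (6,)
print(arr2.ndim, arr2.shape)    # 2 (2, 3)
print(arr3.ndim, arr3.shape)    # 3 (2, 1, 3)

flat = arr2.flatten()           # copy collapsed back to 1D
flat[0] = 99                    # changing the copy does not affect arr2
print(flat, arr2[0, 0])
```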
Working with Multi-dimensional Arrays:
● Multi – dimensional arrays can be created using following ways:
1. array() : Numpy’s array can also be used to create multidimensional array
2. ones() and zeros() : Useful to create 2D arrays with 1s and 0s respectively.
Format: zeros ( (r,c) , datatype) ones ( (r,c) , datatype)
where r and c are the no. of rows and columns respectively, and the default data type is float
3. eye() function: Creates a 2D array and fills the elements in the diagonal with 1s.
Format: eye ( n , dtype = datatype)
where, n = no. of rows and columns (square matrix) and default dtype = float
4. reshape() function: Useful to convert a 1D array into multidimensional(2D or 3D ) array.
Format: reshape ( arrayName , (n, r, c))
where, arrayName represents name of an array to be converted, n indicates the no. of
arrays in the resultant array, r indicates no. of rows and c indicates no. of columns
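The four ways of creating multi-dimensional arrays listed above, in one sketch; np.reshape(array, (n, r, c)) yields n arrays of r rows and c columns.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])       # 1. array() with nested lists
z = np.zeros((2, 3))                       # 2. 2 x 3 of 0.0
o = np.ones((3, 2), int)                   #    3 x 2 of 1
e = np.eye(3)                              # 3. 3 x 3, 1.0 on the diagonal
r = np.reshape(np.arange(12), (2, 2, 3))   # 4. 2 arrays of 2 rows x 3 columns
print(m.shape, z.shape, o.shape, e.shape, r.shape)
print(r[1])                                # the second 2 x 3 array
```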
Indexing Operation in ndArray :
Slicing in Multi-dimensional Arrays:
[Figure: a 3 x 5 array with row indices 0 to 2 and column indices 0 to 4, holding the values 1 to 15 row by row]
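Indexing and slicing the 3 x 5 array from the figure (values 1 to 15). A 2D index can be written as a[r][c] or a[r, c]; a slice takes a row range and a column range separated by a comma.

```python
import numpy as np

a = np.arange(1, 16).reshape(3, 5)   # the 3 x 5 array from the figure
print(a[1][2], a[1, 2])   # 8 8: row 1, column 2 (two equivalent styles)
print(a[0:2, 1:3])        # rows 0-1, columns 1-2: [[2 3] [7 8]]
print(a[:, 0])            # first column: 1, 6, 11
print(a[2, :])            # last row
print(a[::2, ::2])        # every other row and column
```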
Matrices in numpy:
● Matrix is a rectangular array of elements arranged in rows and columns.
● In numpy, matrix is a specialized 2D array that retains its 2D nature through operations.
(The numpy documentation no longer recommends the matrix class; ordinary 2D arrays are preferred for new code.)
Syntax for creating numpy matrix:
matrixName = matrix( 2D array/string/list)
array form : arr = [[1,2],[3,4],[5,6]]
string form : str = '1 2 ; 3 4 ; 5 6'
Matrices in numpy: (…)
● diagonal() : Returns a 1D array that contains the diagonal elements of the original matrix
Format: a = diagonal(matrix)
● max() : Returns the maximum element in the matrix
● min() : Returns the minimum element in the matrix
● sum() : Returns the sum of all the elements in the matrix
● mean() : Returns the average of all the elements in the matrix
● prod() :
● prod(0) returns a matrix that contains the products of elements in each column of the
original matrix.
● prod(1) returns a matrix that contains the products of elements in each row of the original
matrix
● sort() : Sorts the matrix in ascending order
Format: sort(matrixname, axis) (where axis = 0 for columns, axis = 1 for rows; default = 1)
● transpose() or getT() : returns the transpose of the given matrix
Matrix Addition :
Matrix Multiplication : (…)
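Matrix addition and multiplication, with the values chosen only for illustration. For np.matrix objects, * means matrix multiplication; for ordinary 2D arrays, * is element-wise and @ performs matrix multiplication.

```python
import numpy as np

A = np.matrix('1 2; 3 4')
B = np.matrix('5 6; 7 8')
print(A + B)       # element-wise addition: [[6 8] [10 12]]
print(A * B)       # matrix product for matrix objects: [[19 22] [43 50]]

a, b = np.array(A), np.array(B)
print(a * b)       # element-wise for arrays: [[5 12] [21 32]]
print(a @ b)       # matrix product for arrays, same as A * B
```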
Random Numbers :
● A random number is a number whose value cannot be predicted in advance (numpy actually generates pseudo-random numbers from a seed).
● numpy has a sub module called random that is equipped with the rand() function,
which is used to create random numbers between 0 and 1
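A sketch of numpy.random: rand() gives floats in the range [0, 1), and randint(low, high) gives integers up to high - 1. Fixing the seed (42 is an arbitrary choice) makes the results repeatable between runs.

```python
import numpy as np

np.random.seed(42)                        # repeatable output
r = np.random.rand(5)                     # 5 random floats in [0, 1)
m = np.random.rand(2, 3)                  # 2 x 3 array of random floats
dice = np.random.randint(1, 7, size=10)   # integers from 1 to 6
print(r)
print(m.shape, dice)
```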
Pandas Library
Introduction :
● Pandas is an open-source Python library that contains data structures and data
manipulation tools designed to make data cleaning and analysis fast and easy in Python.
○ NumPy, by contrast, is best suited for working with homogeneous numerical array data.
Features of Pandas :
● Allows us to analyze big data and draw conclusions based on statistical theories.
● Pandas can clean messy data sets, and make them readable and relevant.
● Fast and efficient DataFrame object with default and customized indexing.
● Tools for loading data into in-memory data objects from different file formats.
● Data alignment and integrated handling of missing data.
● Reshaping and pivoting of data sets.
● Label-based slicing, indexing and subsetting of large data sets.
● Columns from a data structure can be deleted or inserted.
● Group by data for aggregation and transformations.
● High performance merging and joining of data.
● Time Series functionality.
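A small sketch of several listed features: a Series with a customized index, label-based indexing, a DataFrame with one missing value, filling that value, and inserting a new column. The names, marks and the pass mark of 80 are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([85, 92, 78], index=["Asha", "Ravi", "Kiran"])   # customized index
print(s["Ravi"])                      # 92: label-based indexing

df = pd.DataFrame({"name": ["Asha", "Ravi", "Kiran"],
                   "marks": [85, np.nan, 78]})                  # one missing value
print(df.isna().sum())                # missing values per column
df["marks"] = df["marks"].fillna(df["marks"].mean())           # mean of 85 and 78
df["passed"] = df["marks"] >= 80      # insert a new column
print(df)
```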
Data Structures used in Pandas :
● Pandas deals with the following three data structures:
1. Series : Series is a one-dimensional array-like structure that can hold homogeneous or heterogeneous data
(…)
b. CSV file: more formatted, more informative, and elements are separated by commas
c. Excel file: contains extensive formatting, and could include multiple datasets in a single file
Reading from text file:
Reading CSV delimited format:
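A sketch of reading delimited text with pandas.read_csv(). To keep it self-contained, io.StringIO stands in for a file on disk; with a real file you would pass a path instead, such as the hypothetical "students.csv". Changing sep reads other delimited text, e.g. tab-separated files.

```python
import io
import pandas as pd

csv_text = "name,marks\nAsha,85\nRavi,92\nKiran,78\n"
df = pd.read_csv(io.StringIO(csv_text))   # pd.read_csv("students.csv") for a real file
print(df.shape)              # (3, 2)
print(df["marks"].mean())    # 85.0

# A tab-delimited text file is read the same way with a different separator
tsv = pd.read_csv(io.StringIO("name\tmarks\nAsha\t85\n"), sep="\t")
print(tsv)
```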
Reading Excel and other Microsoft Office files: