Pandas
Pandas Main Data Structures
• DataFrame
• Series
What is Pandas?
Pandas is an open-source library for performing data manipulation and analysis in Python.
The pandas library provides data structures and operations for manipulating numerical tables
and time series.
Pandas provides an easy way to create, manipulate, and wrangle data. It is built on top
of NumPy, which means it requires NumPy to operate.
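Because pandas is built on NumPy, the values of a Series can be pulled out as a NumPy array, and NumPy functions can be applied to a Series directly. A minimal sketch (the sample values are arbitrary):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9])

# to_numpy() returns the underlying values as a NumPy ndarray
arr = s.to_numpy()
print(type(arr))

# NumPy universal functions accept a Series and return a Series
roots = np.sqrt(s)
print(roots)
```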
Pandas was created by Wes McKinney in 2008.
What are the benefits of Pandas?
Data scientists make use of Pandas in Python for its following advantages:
• It easily handles missing data
• It uses Series for one-dimensional data and DataFrame for two-dimensional (tabular) data
• It provides an efficient way to slice data
• It provides a flexible way to merge, concatenate, or reshape data
• It includes powerful time series tools
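As an illustration of the first advantage, handling missing data, here is a small sketch using standard Series methods (the values are made up for the example):

```python
import numpy as np
import pandas as pd

# NaN marks a missing value
s = pd.Series([1.0, np.nan, 3.0, np.nan])

missing_count = s.isna().sum()   # count missing entries
filled = s.fillna(0)             # replace missing entries with 0
dropped = s.dropna()             # remove missing entries entirely

print(missing_count)
print(filled.tolist())
print(dropped.tolist())
```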
Pandas has historically provided the following three data structures for manipulating data:
•Series
•DataFrame
•Panel
What is Series in Pandas?
A Series is a one-dimensional, array-like structure with homogeneous data. For example, the
following Series is a collection of integers 10, 23, 56, …
10 23 56 17 52 61 73 90 26 72
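The integers above can be stored in a Series directly from a Python list; pandas assigns a default integer index 0 to 9. A minimal sketch:

```python
import pandas as pd

# The example integers from the text
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(s.head(3))   # first three values with their index labels
print(s.dtype)     # an integer dtype, since every value is an int
```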
What is DataFrame in Pandas?
A DataFrame is a two-dimensional, table-like structure with heterogeneous data: different columns can hold different data types. For example,
Name Age Gender Rating
Steve 32 Male 3.45
Lia 28 Female 4.6
Vin 45 Male 3.9
Katie 38 Female 2.78
The table represents data for an organization's sales team, including each member's overall
performance rating. The data is arranged in rows and columns: each column represents an attribute
and each row represents a person.
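The sales-team table above can be built from a dictionary that maps each column name to its list of values. Note how each column gets its own data type (text, integer, float). A minimal sketch:

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values, row by row
df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

print(df)
print(df.dtypes)   # columns hold different types: heterogeneous data
```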
What is Panel in pandas?
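A Panel is a three-dimensional data structure with heterogeneous data. A panel is hard to represent
graphically, but it can be thought of as a container of DataFrames. Panel was deprecated in
pandas 0.20 and removed in pandas 0.25; a DataFrame with a MultiIndex is the recommended replacement.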
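Since current pandas releases no longer include Panel, one way to get the "container of DataFrames" idea today is to stack several DataFrames with `pd.concat`, using a dictionary so that its keys become an extra outer index level. This is a sketch of that substitute, not the old Panel API; the quarter and region data are hypothetical:

```python
import pandas as pd

# Two hypothetical 2-D DataFrames that a Panel would have held as separate "items"
q1 = pd.DataFrame({"sales": [100, 150], "returns": [5, 3]}, index=["north", "south"])
q2 = pd.DataFrame({"sales": [120, 130], "returns": [4, 6]}, index=["north", "south"])

# Dictionary keys become the outer index level: quarter -> region -> column
panel_like = pd.concat({"Q1": q1, "Q2": q2}, names=["quarter", "region"])
print(panel_like)

# Selecting one outer label gives back an ordinary DataFrame
q2_again = panel_like.loc["Q2"]
print(q2_again)
```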
Creating a Series: In practice, a pandas Series is usually created by loading data from existing
storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be created from a
list, a dictionary, a scalar value, and so on.
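import pandas as pd
import numpy as np

# Creating an empty Series (pass a dtype explicitly; newer pandas versions default to object)
ser = pd.Series(dtype="float64")
print(ser)

# Creating a Series from a simple NumPy array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)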
Output:
Series([], dtype: float64)
0 g
1 e
2 e
3 k
4 s
dtype: object
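The text above also mentions creating a Series from a dictionary, a scalar value, or a plain list. A short sketch of all three (the labels and values are arbitrary):

```python
import pandas as pd

# From a dictionary: keys become the index labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})
print(from_dict)

# From a scalar: the value is repeated for every label in the given index
from_scalar = pd.Series(5, index=["x", "y", "z"])
print(from_scalar)

# From a list: a default integer index 0..n-1 is created
from_list = pd.Series([1.5, 2.5, 3.5])
print(from_list)
```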
Creating a DataFrame: In practice, a pandas DataFrame is usually created by loading data from
existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be
created from lists, a dictionary, a list of dictionaries, and so on.
import pandas as pd

# Calling the DataFrame constructor with no arguments creates an empty DataFrame
df = pd.DataFrame()
print(df)

# A list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling the DataFrame constructor on the list creates a single column labeled 0
df = pd.DataFrame(lst)
print(df)
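Two other common constructors mentioned above are a dictionary of lists (one entry per column) and a list of dictionaries (one entry per row). In the row-wise form, a key missing from a row becomes NaN. A minimal sketch with made-up names:

```python
import pandas as pd

# Dictionary of lists: each key becomes a column
df_dict = pd.DataFrame({"Name": ["Steve", "Lia"], "Age": [32, 28]})
print(df_dict)

# List of dictionaries: each dictionary becomes a row;
# "Katie" has no "Age" key, so that cell is filled with NaN
rows = [{"Name": "Vin", "Age": 45}, {"Name": "Katie"}]
df_rows = pd.DataFrame(rows)
print(df_rows)
```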
Parameters of the Pandas read_csv Function
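• nrows: number of rows of the file to read
• usecols: subset of columns to read
• skiprows: line numbers to skip, or number of lines to skip at the start of the file
• index_col: column(s) to use as the row labels of the DataFrame
• header: row number(s) to use as the column names
• prefix: prefix added to column numbers when there is no header (deprecated in pandas 1.4 and removed in 2.0)
• names: list of column names to use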
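Several of the parameters above can be tried without a file on disk by wrapping CSV text in `io.StringIO`. The CSV content here is hypothetical; `prefix` is left out because current pandas versions no longer accept it. A minimal sketch:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """id,name,age,city
1,Steve,32,Boston
2,Lia,28,Denver
3,Vin,45,Austin
4,Katie,38,Seattle
"""

# nrows: read only the first 2 data rows
# usecols: keep only these columns
# index_col: use the "id" column as the row labels
df = pd.read_csv(io.StringIO(csv_text), nrows=2,
                 usecols=["id", "name", "age"], index_col="id")
print(df)

# header=None with names: supply our own column names;
# skiprows=1 skips the file's original header line
raw = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1,
                  names=["id", "name", "age", "city"])
print(raw)
```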