Pandas Intro

Uploaded by

Mehfil
Pandas

Pandas Main Data Structures


• DataFrames
• Series
What is Pandas?
• Pandas is an open-source library for data manipulation and analysis in Python.
• It offers data structures and operations for working with numerical tables
and time series.
• Pandas provides an easy way to create, manipulate, and wrangle data. It is built on top
of NumPy, which means it needs NumPy to operate.
• Pandas was created by Wes McKinney in 2008.

What are the benefits of Pandas?

Data scientists use Pandas in Python for the following advantages:
• It handles missing data easily
• It uses Series for one-dimensional data and DataFrame for two-dimensional data
• It provides an efficient way to slice the data
• It provides a flexible way to merge, concatenate, or reshape the data
• It includes powerful tools for working with time series
Pandas provides two main data structures for manipulating data, Series and DataFrame.
Older versions also included a third, Panel, which has since been removed from the library:
• Series
• DataFrame
• Panel (older versions only)
What is Series in Pandas?
Series is a one-dimensional, array-like structure with homogeneous data. For example, the
following Series is a collection of integers 10, 23, 56, …
10 23 56 17 52 61 73 90 26 72
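The Series above can be sketched in code as follows. This is a minimal illustration that assumes the default integer index; the variable name `ser` is chosen here only for the example.

```python
import pandas as pd

# The ten integers from the example, stored as a Series.
# pandas assigns a default index 0..9 because none is given.
ser = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(ser)          # index on the left, values on the right
print(ser.dtype)    # a single dtype for the whole Series (homogeneous data)
print(ser.iloc[2])  # third element, selected by position
```

Because every element shares one dtype, operations such as `ser.sum()` or `ser.mean()` apply to the whole Series at once.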

What is DataFrame in Pandas?


DataFrame is a two-dimensional array with heterogeneous data. For example,

Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78

The table represents the data of a sales team of an organization with their overall performance
rating. The data is represented in rows and columns. Each column represents an attribute and
each row represents a person.
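As a rough sketch, the sales team table can be built from a dictionary that maps each column name to its values. The variable name `df` is an assumption made for the example.

```python
import pandas as pd

# Each key becomes a column (an attribute); each position across the
# lists becomes a row (a person).
df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

print(df)
print(df.dtypes)         # columns may have different dtypes (heterogeneous data)
print(df["Age"].mean())  # a single column is a Series
```

Selecting one column, as in `df["Age"]`, returns a Series, which is how the two data structures relate to each other.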
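What is Panel in Pandas?
Panel is a three-dimensional data structure with heterogeneous data. A panel is hard to
represent graphically, but it can be pictured as a container of DataFrames. Note that Panel
was deprecated in pandas 0.20 and is not available in current versions; the same kind of data
is now usually held in a DataFrame with a MultiIndex.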
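The "container of DataFrames" idea can be approximated in current pandas without Panel. The sketch below is one possible approach, not the Panel API itself: it keeps DataFrames in a dictionary and stacks them with `pd.concat`. The year labels and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one small DataFrame per year, same rows and columns.
frames = {
    "2022": pd.DataFrame({"Age": [32, 28], "Rating": [3.45, 4.6]},
                         index=["Steve", "Lia"]),
    "2023": pd.DataFrame({"Age": [33, 29], "Rating": [3.9, 2.78]},
                         index=["Steve", "Lia"]),
}

# Passing a dict to pd.concat uses its keys as the outer index level,
# giving a single DataFrame with a two-level (Year, Name) index.
stacked = pd.concat(frames, names=["Year", "Name"])

print(stacked)
print(stacked.loc["2023"])  # pull one "slice" back out as a DataFrame
```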
Creating a Series: In practice, a Pandas Series is usually created by loading data from
existing storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be
created from a list, a dictionary, a scalar value, and so on.
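import pandas as pd
import numpy as np

# Create an empty Series; the explicit dtype keeps the output the same across pandas versions
ser = pd.Series(dtype="float64")
print(ser)

# Create a Series from a simple NumPy array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)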

Output:
Series([], dtype: float64)
0    g
1    e
2    e
3    k
4    s
dtype: object
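The other ways of creating a Series mentioned above (a list, a dictionary, a scalar value) might look like this. The variable names and sample values are assumptions made for the example.

```python
import pandas as pd

# From a list: pandas assigns the default index 0, 1, 2.
from_list = pd.Series([10, 23, 56])

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"a": 10, "b": 23, "c": 56})

# From a scalar: the value is repeated for every label in the index.
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_list)
print(from_dict["b"])  # look up a value by its label
print(from_scalar)
```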
Creating a DataFrame: In practice, a Pandas DataFrame is usually created by loading data
from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can
also be created from a list, a dictionary, a list of dictionaries, and so on.

import pandas as pd

# Call the DataFrame constructor with no arguments to get an empty DataFrame
df = pd.DataFrame()
print(df)

# A list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Call the DataFrame constructor on the list; each string becomes one row
df = pd.DataFrame(lst)
print(df)
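A DataFrame built from a plain list gets a single column labeled 0. As a small sketch, the column can be named explicitly, and a list of dictionaries gives one row per dictionary. The names `records`, `from_records`, `words`, and `from_list` and the sample values are chosen only for illustration.

```python
import pandas as pd

# From a list of dictionaries: each dictionary is one row,
# and the keys become the column names.
records = [
    {"Name": "Steve", "Age": 32},
    {"Name": "Lia", "Age": 28},
]
from_records = pd.DataFrame(records)

# From a list, naming the single column instead of keeping the default 0.
words = ["Geeks", "For", "Geeks"]
from_list = pd.DataFrame(words, columns=["word"])

print(from_records)
print(from_list)
```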
Parameters of the Pandas read_csv Function
• nrows
• usecols
• skiprows
• index_col
• header
• prefix (deprecated in pandas 1.4 and removed in 2.0)
• names
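The parameters listed above can be tried on a small in-memory CSV. This is a minimal sketch: the CSV text reuses the sales team table, `io.StringIO` stands in for a file on disk, and `prefix` is left out because it is not accepted by current pandas versions.

```python
import io
import pandas as pd

# Stand-in for a CSV file; a file path could be passed to read_csv instead.
csv_text = """Name,Age,Gender,Rating
Steve,32,Male,3.45
Lia,28,Female,4.6
Vin,45,Male,3.9
Katie,38,Female,2.78
"""

# header=0: the first line holds the column names.
# usecols: keep only these columns.
# index_col: use the Name column as the row index.
# nrows: read only the first three data rows.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    usecols=["Name", "Age", "Rating"],
    index_col="Name",
    nrows=3,
)
print(df)

# skiprows=1 skips the original header line; header=None says the
# remaining lines are all data; names supplies new column labels.
renamed = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=1,
    header=None,
    names=["name", "age", "gender", "rating"],
)
print(renamed)
```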
