Pandas - Series and DataFrame
There is a close connection between the DataFrames and the Series of Pandas: a DataFrame
can be seen as a concatenation of Series, each of which has the same index, i.e. the index of
the DataFrame.
We will demonstrate this in the following example.
We define the following three Series:
import pandas as pd

years = range(2014, 2018)
shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years)
shop2 = pd.Series([1203.45, 3441.62, 3007.83, 3619.53], index=years)
shop3 = pd.Series([3412.12, 3491.16, 3457.19, 1963.10], index=years)

What happens if we concatenate these "shop" Series? Pandas provides a concat function
for this purpose:

pd.concat([shop1, shop2, shop3])

This stacks the Series vertically into one long Series. To arrange them side by side as the
columns of a DataFrame, we have to set the axis parameter to 1:

shops_df = pd.concat([shop1, shop2, shop3], axis=1)
shops_df
Let's do some fine-tuning by giving names to the columns:

cities = ["Zürich", "Winterthur", "Freiburg"]
shops_df.columns = cities
print(shops_df)

# alternative way: give names to the Series:
shop1.name = "Zürich"
shop2.name = "Winterthur"
shop3.name = "Freiburg"
print("------")
shops_df2 = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df2)
print(type(shops_df))
This means we can arrange or concatenate Series into DataFrames!
DataFrames from Dictionaries
A DataFrame has a row and column index; it's like a dict of Series with a
common index.
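As a quick illustration (a minimal sketch reusing the shop Series from the previous section), passing a dict of Series to the DataFrame constructor produces the same result as the concat call with axis=1:

# the dict keys become the column names, the shared index the row index:
shops_df3 = pd.DataFrame({"Zürich": shop1,
                          "Winterthur": shop2,
                          "Freiburg": shop3})
print(shops_df3)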
cities = {"name": ["London", "Berlin",
"Madrid", "Rome",
"Paris", "Vienna", "Bucharest", "Hamburg",
"Budapest", "Warsaw", "Barcelona",
"Munich", "Milan"],
"population": [8615246, 3562166, 3165235,
2874038,
2273305, 1805681, 1803425, 1760433,
1754000, 1740119, 1602386, 1493900,
1350680],
"country": ["England", "Germany", "Spain",
"Italy",
"France", "Austria", "Romania",
"Germany", "Hungary", "Poland", "Spain",
"Germany", "Italy"]}
city_frame = pd.DataFrame(cities)
city_frame
Retrieving the Column Names
It's possible to get the names of the columns as an array or as a list:

city_frame.columns.values
# or as a plain Python list:
list(city_frame.columns)
Custom Index
We can see that an index (0,1,2, ...) has been automatically assigned to
the DataFrame. We can also assign a custom index to the DataFrame
object:
ordinals = ["first", "second", "third",
"fourth",
"fifth", "sixth", "seventh", "eigth",
"ninth", "tenth", "eleventh", "twelvth",
"thirteenth"]
city_frame = pd.DataFrame(cities,
index=ordinals)
city_frame
Rearranging the Order of Columns
We can also define and rearrange the order of the columns at the time of
creation of the DataFrame. This also makes sure that we have a defined
ordering of our columns when we create the DataFrame from a dictionary,
since dictionaries (before Python 3.7) do not guarantee any ordering.
city_frame = pd.DataFrame(cities,
                          columns=["name", "country", "population"])
city_frame
But what if you want to change the ordering of the columns and the column
names of an existing DataFrame? The ordering can be changed with reindex;
note that we have to pass the new order via the columns parameter and that
reindex returns a new DataFrame:

city_frame = city_frame.reindex(columns=["country", "name", "population"])
city_frame
Now we want to rename our columns. For this purpose, we will use the
DataFrame method 'rename'. This method supports two calling
conventions:

rename(index=index_mapper, columns=columns_mapper, ...)
rename(mapper, axis={'index', 'columns'}, ...)
We will rename the columns of our DataFrame to Romanian names in
the following example. We set the parameter inplace to True so that our
DataFrame is changed in place instead of a new DataFrame being returned,
which is what happens with the default inplace=False!

city_frame.rename(columns={"name": "Nume",
                           "country": "țară",
                           "population": "populație"},
                  inplace=True)
city_frame
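The second calling convention, with a mapper and an axis, achieves the same;
here we use it to turn the Romanian column names back into English:

city_frame.rename({"Nume": "name",
                   "țară": "country",
                   "populație": "population"},
                  axis="columns",
                  inplace=True)
city_frame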
Existing Column as the Index of a DataFrame
We want to create a more useful index in the following example. We will
use the country name as the index, i.e. the list value associated with the key
"country" of our cities dictionary:
city_frame = pd.DataFrame(cities,
                          columns=["name", "population"],
                          index=cities["country"])
city_frame
Alternatively, we can change an existing DataFrame. We can use the
method set_index to turn a column into an index. "set_index" does not
work in place; it returns a new DataFrame with the chosen column as the
index:
city_frame = pd.DataFrame(cities)
city_frame2 = city_frame.set_index("country")
print(city_frame2)
We saw in the previous example that the set_index method returns a
new DataFrame object and doesn't change the original DataFrame. If we
set the optional parameter "inplace" to True, the DataFrame will be
changed in place, i.e. no new object will be created:
city_frame = pd.DataFrame(cities)
city_frame.set_index("country", inplace=True)
print(city_frame)
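The inverse operation, turning the index back into an ordinary column, is the
method reset_index. Like set_index, it returns a new DataFrame unless inplace
is set to True; a small sketch:

city_frame.reset_index(inplace=True)
print(city_frame.head())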
Label-Indexing on the Rows
So far we have indexed DataFrames via the columns. We will now
demonstrate how we can access rows of a DataFrame via the
locators 'loc' and 'iloc'. ('ix' is deprecated and will be removed in the
future.)
city_frame = pd.DataFrame(cities,
                          columns=("name", "population"),
                          index=cities["country"])
print(city_frame.loc["Germany"])
print(city_frame.loc[["Germany", "France"]])
print(city_frame.loc[city_frame.population > 2000000])
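'iloc' works analogously, but uses purely integer-based positions instead of labels:

print(city_frame.iloc[0])        # first row
print(city_frame.iloc[[0, 4]])   # first and fifth row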
Sum and Cumulative Sum
We can calculate the sum of all the columns of a DataFrame or the sum of
certain columns:
print(city_frame.sum())
city_frame["population"].sum()
x = city_frame["population"].cumsum()
print(x)
Assigning New Values to Columns
x is a Pandas Series. We can reassign the previously calculated cumulative
sums to the population column:
city_frame["population"] = x
print(city_frame)
Instead of replacing the values of the population column with the
cumulative sum, we want to add the cumulative population sum as a new
column with the name "cum_population".
city_frame = pd.DataFrame(cities,
                          columns=["country",
                                   "population",
                                   "cum_population"],
                          index=cities["name"])
city_frame
We can see that the column "cum_population" is set to NaN, as we
haven't provided any data for it.
We will assign now the cumulative sums to this column:
city_frame["cum_population"] =
city_frame["population"].cumsum()
city_frame
We can also include a column name which is not contained in the
dictionary, when we create the DataFrame from the dictionary. In this
case, all the values of this column will be set to NaN:
city_frame = pd.DataFrame(cities,
                          columns=["country",
                                   "area",
                                   "population"],
                          index=cities["name"])
print(city_frame)
Accessing the Columns of a DataFrame
There are two ways to access a column of a DataFrame. The result is in
both cases a Series:
# in a dictionary-like way:
print(city_frame["population"])
# as an attribute:
print(city_frame.population)
print(type(city_frame.population))
p = city_frame.population

From the previous example, we can see that we have not copied the
population column: "p" is a view on the data of city_frame.
Assigning New Values to a Column
The column area is still not defined. We can set all elements of the
column to the same value:
city_frame["area"] = 1572
print(city_frame)
Of course, it would be better to assign each city its actual area. The list
with the area values needs to have the same length as the number of rows
in our DataFrame.
# area in square km:
area = [1572, 891.85, 605.77, 1285,
        105.4, 414.6, 228, 755,
        525.2, 517, 101.9, 310.4,
        181.8]
# area could also have been given as a Series, an array or a scalar
city_frame["area"] = area
print(city_frame)
Sorting DataFrames
Let's sort our DataFrame according to the city area:
city_frame = city_frame.sort_values(by="area",
ascending=False)
print(city_frame)
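sort_values also accepts a list of column names; for example, we can sort by
country first and by area (descending) within each country:

city_frame = city_frame.sort_values(by=["country", "area"],
                                    ascending=[True, False])
print(city_frame)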
Let's assume we have only the areas of London, Hamburg and Milan. The
areas are in a Series with the correct indices. We can assign this Series as
well; the cities which are not contained in the Series will be set to NaN:
city_frame = pd.DataFrame(cities,
                          columns=["country",
                                   "area",
                                   "population"],
                          index=cities["name"])
some_areas = pd.Series([1572, 755, 181.8],
                       index=['London', 'Hamburg', 'Milan'])
city_frame['area'] = some_areas
print(city_frame)
Inserting new columns into existing DataFrames
In the previous example we have added the column area at creation time.
Quite often it will be necessary to add or insert columns into existing
DataFrames.
city_frame = pd.DataFrame(cities,
                          columns=["country",
                                   "population"],
                          index=cities["name"])
city_frame
idx = 1
city_frame.insert(loc=idx, column='area', value=area)
city_frame
DataFrame from Nested Dictionaries
A nested dictionary of dicts can be passed to a DataFrame as well. The
keys of the outer dictionary are taken as the columns, and the inner
keys, i.e. the keys of the nested dictionaries, are used as the row indices:
growth = {"Switzerland": {"2010": 3.0, "2011":
1.8, "2012": 1.1, "2013": 1.9},
"Germany": {"2010": 4.1, "2011": 3.6, "2012":
0.4, "2013": 0.1},
"France": {"2010":2.0, "2011":2.1, "2012":
0.3, "2013": 0.3},
"Greece": {"2010":-5.4, "2011":-8.9, "2012":-
6.6, "2013": -3.3},
"Italy": {"2010":1.7, "2011": 0.6, "2012":-
2.3, "2013":-1.9}
}
growth_frame = pd.DataFrame(growth)
growth_frame
Would you like to have the years in the columns and the countries in the rows?
No problem: you can transpose the data:
growth_frame.T
growth_frame = growth_frame.T
growth_frame2 = growth_frame.reindex(["Switzerland",
                                      "Italy",
                                      "Germany",
                                      "Greece"])
print(growth_frame2)
Filling a DataFrame with random values:
import numpy as np
names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
index = ["January", "February", "March",
         "April", "May", "June",
         "July", "August", "September",
         "October", "November", "December"]
df = pd.DataFrame((np.random.randn(12, 5) * 1000).round(2),
                  columns=names,
                  index=index)
df
Reading CSV and DSV Files
Pandas offers two ways to read in CSV or, to be precise, DSV files:

DataFrame.from_csv
read_csv

There is no big difference between those two functions, but they have
different default values in some cases and read_csv has more parameters.
We will focus on read_csv, because DataFrame.from_csv is only kept inside
Pandas for reasons of backwards compatibility and is deprecated.
import pandas as pd
exchange_rates = pd.read_csv("/home/madhu/Desktop/datasets/dollar_euro.txt",
                             sep="\t")
print(exchange_rates)
As we can see, read_csv automatically used the first line as the names for
the columns. It is possible to give the columns other names. For this
purpose, we set the parameter "header" to 0, so that the first line is
skipped, and assign a list with the column names to the parameter
"names":
import pandas as pd
exchange_rates = pd.read_csv("/home/madhu/Desktop/datasets/dollar_euro.txt",
                             sep="\t",
                             header=0,
                             names=["year", "min", "max", "days"])
print(exchange_rates)
Exercise
The file "countries_population.csv" is a csv file, containing the population
numbers of all countries (July 2014). The delimiter of the file is a space
and commas are used to separate groups of thousands in the numbers.
The method 'head(n)' of a DataFrame can be used to output only the
first n rows. Read the file into a DataFrame.
pop = pd.read_csv("/home/madhu/Desktop/datasets/countries_population.csv",
                  header=None,
                  names=["Country", "Population"],
                  index_col=0,
                  quotechar="'",
                  sep=" ",
                  thousands=",")
print(pop.head(5))
",
header=None,
names=["Country", "Population"],
index_col=0,
quotechar="'",
sep=" ",
thousands=",")
print(pop.head(5))
Writing csv Files
We can create csv (or dsv) files with the method "to_csv". Before we do
this, we will prepare some data to output, which we will write to a file. We
have two csv files with population data for various
countries: countries_male_population.csv contains the figures for the male
populations and countries_female_population.csv correspondingly the
numbers for the female populations. We will create a new csv file with the
sum:
column_names = ["Country"] + list(range(2002,
2013))
male_pop =
pd.read_csv("/home/madhu/Desktop/datasets/count
ries_male_population.csv",
header=None,
index_col=0,
names=column_names)
female_pop =
pd.read_csv("/home/madhu/Desktop/datasets/count
ries_female_population.csv",
header=None,
index_col=0,
names=column_names)
population = male_pop + female_pop
population
population.to_csv("/home/madhu/Desktop/
datasets/countries_total_population.csv")
We want to create a new DataFrame with all the information, i.e. female,
male and complete population. This means that we have to introduce a
hierarchical index. Before we apply it to our DataFrames, we will introduce
this problem with a simple example:
import pandas as pd
shop1 = {"foo":{2010:23, 2011:25}, "bar":
{2010:13, 2011:29}}
shop2 = {"foo":{2010:223, 2011:225}, "bar":
{2010:213, 2011:229}}
shop1 = pd.DataFrame(shop1)
shop2 = pd.DataFrame(shop2)
both_shops = shop1 + shop2
print("Sales of shop1:\n", shop1)
print("\nSales of both shops\n", both_shops)
shops = pd.concat([shop1, shop2], keys=["one",
"two"])
shops
We want to swap the hierarchical indices. For this we will
use 'swaplevel'. Note that swaplevel returns a new DataFrame, so we
have to assign the result:

shops = shops.swaplevel()
shops.sort_index(inplace=True)
shops
We will go back to our initial problem with the population
figures. We will apply the same steps to those
DataFrames:
pop_complete = pd.concat([population.T,
                          male_pop.T,
                          female_pop.T],
                         keys=["total", "male", "female"])
df = pop_complete.swaplevel()
df.sort_index(inplace=True)
df[["Austria", "Australia", "France"]]
df.to_csv("/home/madhu/Desktop/datasets/countries_total_population.csv")
Exercise
Read in the dsv file (csv) bundeslaender.txt. Create a new
file with the columns 'land', 'area', 'female', 'male',
'population' and 'density' (inhabitants per square
kilometre).
Print out the rows where the area is greater than
30000 and the population is greater than 10000.
Print the rows where the density is greater than 300.
lands = pd.read_csv('/home/madhu/Desktop/datasets/bundeslaender.txt', sep=" ")
print(lands.columns.values)
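A possible solution sketch, assuming the file provides the columns 'land',
'area', 'female' and 'male' (the column names printed above); the output
file name is made up for this sketch:

# derive total population and density from the assumed columns:
lands["population"] = lands["female"] + lands["male"]
lands["density"] = (lands["population"] / lands["area"]).round(2)

print(lands[(lands["area"] > 30000) & (lands["population"] > 10000)])
print(lands[lands["density"] > 300])

lands.to_csv("/home/madhu/Desktop/datasets/bundeslaender_density.txt")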
'nan' in Python
Python knows NaN values as well. We can create them with "float":
n1 = float("nan")
n2 = float("Nan")
n3 = float("NaN")
n4 = float("NAN")
print(n1, n2, n3, n4)
"nan" is also part of the math module since Python 3.5:
import math
n1 = math.nan
print(n1)
print(math.isnan(n1))
nan
True
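NumPy provides a NaN constant as well, which is what Pandas uses internally:

import numpy as np
print(np.nan)
print(np.isnan(np.nan))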
NaN in Pandas
Example without NaNs
Before we work with NaN data, we will process a file without any NaN
values. The data file temperatures.csv contains the temperature data of
six sensors, taken every 15 minutes between 6:00 and 19:15 o'clock.
Reading in the data file can be done with the read_csv function:
import pandas as pd
df = pd.read_csv("/home/madhu/Desktop/datasets/temperatures.csv",
                 sep=";",
                 decimal=",")
df.loc[:3]
We want to calculate the average temperatures per measuring point over
all the sensors. We can use the DataFrame method 'mean'. If we use
'mean' without parameters, it will average each sensor column over all
measuring points, which isn't what we want here, but it may be interesting
as well:
df.mean()
average_temp_series = df.mean(axis=1)
print(average_temp_series[:8])
sensors = df.columns.values[1:]
# all columns except the time column will be removed:
df = df.drop(sensors, axis=1)
print(df[:5])
We will now assign the average temperature values as a
new column 'temperature':
# best practice:
df = df.assign(temperature=average_temp_series)
# inplace option not available
# alternatively:
#df.loc[:,"temperature"] = average_temp_series
df[:3]
Example with NaNs
We will now use a data file similar to the previous temperature csv, but
this time we will have to cope with NaN values, which occur when sensors
malfunction.
We will create a temperature DataFrame, in which some data is not
defined, i.e. NaN.
We will use and change the data from the temperatures.csv file:
temp_df = pd.read_csv("/home/madhu/Desktop/datasets/temperatures.csv",
                      sep=";",
                      index_col=0,
                      decimal=",")
We will randomly assign some NaN values into the data frame. For this
purpose, we will use the where method of DataFrame. If we apply
where to a DataFrame object df, i.e. df.where(cond, other_df), it will
return an object of the same shape as df, whose entries are taken
from df where the corresponding element of cond is True and from
other_df otherwise.
Before we continue with our task, we will demonstrate how where works
with some simple examples:
s = pd.Series(range(5))
# the first element becomes NaN, because the condition 0 > 0 is False:
s.where(s > 0)
import numpy as np
A = np.random.randint(1, 30, (4, 2))
df = pd.DataFrame(A, columns=['Foo', 'Bar'])
m = df % 2 == 0
# keep the even values; replace the odd ones with their negation:
df.where(m, -df, inplace=True)
df
For our task, we need to create a DataFrame 'nan_df', which consists
purely of NaN values and has the same shape as our temperature
DataFrame 'temp_df'. We will use this DataFrame in 'where'. We also
need a DataFrame with the conditions, "df_bool". For this purpose we will
create a DataFrame with random values between 0 and 1; by applying
'random_df < 0.8' we get the df_bool DataFrame, in which about 80 %
of the values are True, so that roughly 20 % of the temperature values
will be replaced by NaN:
random_df = pd.DataFrame(np.random.random(size=(54, 6)),
                         columns=temp_df.columns.values,
                         index=temp_df.index)
nan_df = pd.DataFrame(np.nan,
                      columns=temp_df.columns.values,
                      index=temp_df.index)
df_bool = random_df < 0.8
df_bool[:5]
Finally, we have everything together to create our DataFrame with
disturbed measurements:

disturbed_data = temp_df.where(df_bool, nan_df)
disturbed_data.to_csv("data1/temperatures_with_NaN.csv")
disturbed_data[:10]
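As a quick sanity check, we can look at the fraction of missing values per
sensor column; it should be roughly 0.2:

print(disturbed_data.isna().mean())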
Using dropna on the DataFrame
'dropna' is a DataFrame method. If we call this method without
arguments, it will return an object in which every row is omitted in which
data are missing, i.e. some value is NaN:
df = disturbed_data.dropna()
df
'dropna' can also be used to drop all columns in which some values are
NaN. This can be achieved by assigning 1 to the axis parameter; the
default value is 0, i.e. rows, as we have seen in our previous example. As
every sensor column contains NaN values, they will all disappear:
df = disturbed_data.dropna(axis=1)
df[:5]
Let us change our task: we only want to get rid of the rows which
contain more than one NaN value. The parameter 'thresh' is ideal for this
task. It is set to an integer value, which defines the minimum number of
non-NaN values a row needs in order to be kept. We have six temperature
values in every row, so setting 'thresh' to 5 makes sure that we will have
at least 5 valid floats in every remaining row:
cleansed_df = disturbed_data.dropna(thresh=5, axis=0)
cleansed_df[:7]
Now we will calculate the mean values again, but this time on the
DataFrame 'cleansed_df', i.e. the DataFrame from which we removed all
the rows containing more than one NaN value.
average_temp_series = cleansed_df.mean(axis=1)
sensors = cleansed_df.columns.values
df = cleansed_df.drop(sensors, axis=1)
# best practice:
df = df.assign(temperature=average_temp_series)
# inplace option not available
df[:6]