Lesson 23 Notes - Pandas Reading Data

Uploaded by

3037171
Copyright
© All Rights Reserved

Lesson 23 Notes

Pandas - Reading Data


Warm Up:
Write the Python code to prompt the user for the size of a dataset, and then write a
loop that fills the dataset by appending each input to a list.

What is the potential issue with this method of creating a dataset in our programs?
The Pandas Module
The pandas module is used to analyze datasets stored in tables. These tables
are referred to as data frames. The pandas module lets us work with datasets
without hard coding them or relying on the user to type them in. It does this by
reading the data from a .csv file or an Excel (.xlsx) file. Once we have the data, we
can analyze it as we have other datasets, clean it up, and graph it.
This lesson focuses only on importing data from a file.
We have to import the pandas module in order to use it. See the syntax below:
import pandas as pd
We don’t have to give pandas an alias, but it makes it easier to work with.
What Is a CSV File?
A .csv file is a data file in which the values on each line are separated by commas.
The values in a .csv file can be used to create a two-dimensional table, or the file
can be created from a two-dimensional table. CSV stands for Comma Separated
Values. Below is an example of a .csv file in PyCharm. The colorful organization is
from a downloadable extension for PyCharm called Rainbow CSV.

Each row of the .csv file is like a row of a 2D table, and each column holds the
same kind of data for every row of the table.
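
To make the row-and-column idea concrete, here is a small sketch that uses Python's built-in csv module rather than pandas. The sample text, the names in it, and the column names Name and Score are all made up for illustration; they are not from the class library of files.

```python
import csv
import io

# Made-up sample text laid out the way a .csv file stores it:
# one row per line, values separated by commas.
csv_text = "Name,Score\nAna,90\nBen,85\n"

# csv.reader splits each line on the commas, giving one list per row.
rows = list(csv.reader(io.StringIO(csv_text)))

header = rows[0]    # the first line holds the column names
records = rows[1:]  # the remaining lines are the rows of the table

print(header)
print(records)
```

Notice that every value comes back as a string here. Converting number columns to numbers is one of the chores pandas handles for us.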
Reading From a .csv File
The first step to reading from a .csv file is to make sure it is in the same directory as
the program that will be pulling the data from it. We have compiled a library of .csv
files for your use and posted them to classroom.

Once you have a .csv file in the directory, we use the read_csv function to create a
data frame. A data frame is like a table of data that is stored by Python.

The syntax is:

data_frame = pd.read_csv("FILE_NAME.csv")

Note that the file name is a string, so it must be in quotes.
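
Here is a minimal sketch of reading a .csv file into a data frame. So that it can run on its own, it first writes a small made-up file (scores.csv, a hypothetical name) to a temporary folder; in class you would use one of the provided files instead.

```python
import os
import tempfile

import pandas as pd

# Create a small made-up .csv file so the example can run on its own.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "scores.csv")
with open(file_path, "w") as f:
    f.write("Name,Score\nAna,90\nBen,85\nCal,77\n")

# read_csv takes the file name (or path) as a string and returns a data frame.
data_frame = pd.read_csv(file_path)

print(data_frame.shape)          # (number of rows, number of columns)
print(list(data_frame.columns))  # column names taken from the first line
```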
Printing a Data Frame
Printing a data frame can be anti-climactic since the way it prints by default skips
rows and columns if there are too many. This isn’t much of an issue since we rarely
want to look at the entire data frame anyway. Below is an example of how it prints.
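
As a rough sketch of this behavior, the code below builds a made-up 100-row dataset and prints it. The column names Id and Value are invented for illustration, and io.StringIO stands in for a real file so the example is self-contained.

```python
import io

import pandas as pd

# Build made-up CSV text with 100 rows so the printed output gets shortened.
lines = ["Id,Value"] + [f"{i},{i * 2}" for i in range(100)]
df = pd.read_csv(io.StringIO("\n".join(lines)))

# Printing a long data frame shows only the first and last few rows,
# with a row of dots marking where rows were skipped.
printed = str(df)
print(printed)

# head() returns only the first rows (5 by default).
first_rows = df.head()
print(first_rows)
```

If we only want a quick look at the data, head() is usually enough.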
Reading from an Excel File
We can also create a data frame by reading data from an Excel (.xlsx) file. It’s
easiest if you have the .xlsx file in the same directory as the Python file. The syntax
to create a data frame from an Excel file is:

df = pd.read_excel("EXCEL_FILE.xlsx")
Isolating a Specific Row
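We can isolate a specific row of a data frame by using the loc attribute. The syntax
is df.loc[row#], where row# is the row's index label. By default the rows are labeled
0, 1, 2, and so on, so the label matches the row's position counting from 0.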

See the example below:
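
A minimal sketch, using a made-up three-row dataset (the names and scores are invented for illustration) and io.StringIO in place of a real file:

```python
import io

import pandas as pd

# Made-up data standing in for a .csv file.
csv_text = "Name,Score\nAna,90\nBen,85\nCal,77\n"
df = pd.read_csv(io.StringIO(csv_text))

# With the default index, rows are labeled 0, 1, 2, ...
# so df.loc[1] is the second row of the table.
row = df.loc[1]
print(row)

# Individual values in the row can be looked up by column name.
print(row["Name"])
```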


Isolating a Specific Column
Probably the most common thing we will be doing with data frames is isolating
individual columns. The syntax to retrieve a single column from a data frame is:

df[COLUMN_NAME]

See the example:
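
A minimal sketch, again with a made-up dataset (the column names Name and Score are invented for illustration):

```python
import io

import pandas as pd

# Made-up data standing in for a .csv file.
csv_text = "Name,Score\nAna,90\nBen,85\nCal,77\n"
df = pd.read_csv(io.StringIO(csv_text))

# The column name goes in quotes inside the square brackets.
scores = df["Score"]
print(scores)

# A column supports simple calculations directly.
highest = scores.max()
print(highest)
```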


Making a List From a Column:
Once we’ve isolated a column, we use the tolist( ) method to convert the column
into a list. From there we can do everything we’ve already learned to do with lists.

The syntax is:

df[COLUMN_NAME].tolist( )
Example:
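
A minimal sketch with a made-up dataset (the column names and values are invented for illustration):

```python
import io

import pandas as pd

# Made-up data standing in for a .csv file.
csv_text = "Name,Score\nAna,90\nBen,85\nCal,77\n"
df = pd.read_csv(io.StringIO(csv_text))

# Convert the Score column into an ordinary Python list.
score_list = df["Score"].tolist()
print(score_list)

# From here, the usual list tools work.
average = sum(score_list) / len(score_list)
print(average)
```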
Try It:
Which of the following could be the
missing code given the output?

A. df.row(19)
B. df.loc(19)
C. df.row[19]
D. df.loc[19]

Try It:
Which of the following could be the
missing code given the output?

A. df.column("Population")
B. df["Population"]
C. pd.getColumn("Population")
D. None of these.

Try It:
Which of the following could be the
missing code given the output?

A. df["State Name"].tolist( )
B. df["State Name"].toList( )
C. df["State Name"].to_list( )
D. None of these.

Allowing the User to Select a File by File Path
As we stated earlier, reading from a .csv or .xlsx file is easiest when the file is in the
same directory as the program reading from it. But it is possible to take the full file
path of a file and read from it. This would allow the user to create/download their
own .csv or .xlsx file and have the program work with it.

The good news is that there’s nothing new that we have to learn to do this. We just
have to make sure that we use the read function that matches the file type.
Example:
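
One way this could look is sketched below. The helper read_table is a hypothetical name, not part of pandas, and the sample file is made up so the code can run on its own. Reading .xlsx files typically also needs an extra package such as openpyxl to be installed.

```python
import os
import tempfile

import pandas as pd


def read_table(file_path):
    """Hypothetical helper: pick the read function that matches the file type."""
    lower_path = file_path.lower()
    if lower_path.endswith(".csv"):
        return pd.read_csv(file_path)
    if lower_path.endswith(".xlsx"):
        return pd.read_excel(file_path)
    raise ValueError("Expected a .csv or .xlsx file")


# Stand-in for a file the user made or downloaded (made-up data).
folder = tempfile.mkdtemp()
sample_path = os.path.join(folder, "scores.csv")
with open(sample_path, "w") as f:
    f.write("Name,Score\nAna,90\nBen,85\n")

# In a real program the path would come from the user, for example:
#     file_path = input("Enter the full path to your file: ")
df = read_table(sample_path)
print(df)
```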

From here we can get the user to choose a column, turn it into a list, and do all of
the analysis that we already know how to do. This is the power of the pandas module.
Wrap Up:
What is the major benefit of allowing the user to use their own CSV/Excel files for
data?

What would happen if the data in the file was updated?
