Automating Excel With Python - Michael Driscoll
Automating Excel With Python - Michael Driscoll
Michael Driscoll
* * * * *
This is a Leanpub book. Leanpub empowers authors and publishers with the
Lean Publishing process. Lean Publishing is the act of publishing an in-
progress ebook using lightweight tools and many iterations to get reader
feedback, pivot until you have the right book and build traction once you
do.
* * * * *
He thanks his wife for choosing to join him in life’s great adventure –
without her he would be a much poorer man.
OceanofPDF.com
Acknowledgments
Learning and writing about OpenPyXL and how to use Python to automate
Excel was a lot of fun. However, it wouldn’t have been as good a book as it
is without Ethan Furman returning as my technical reviewer and editor.
Thank you so much!
Steve Barnes is a long time reader and friend of mine who helped out by
reviewing a few chapters here and there. Steve gave me some great
feedback that helped make the book better.
Special thanks goes out to the folks who wrote OpenPyXL, XslxWriter,
Pandas and all the other great Python packages you can use to work with
Microsoft Excel.
Mike
OceanofPDF.com
Introduction
Welcome to Automating Excel with Python! Microsoft Excel is one of the
most popular spreadsheet programs in the world. Most businesses use Excel for
something in their work. Many businesses will add onto Excel using macros or
Visual Basic for Applications.
The purpose of this book is to help you learn how to use Python to work with
Excel. You will be using a package called OpenPyXL to create, read, and edit
Excel documents with Python. While the focus of this book will be on
OpenPyXL, you will also learn about other Python packages that you can use to
interact with Excel using the Python programming language.
By the time you have finished this book, you will have learned the following:
OpenPyXL is a versatile Python package that will allow you to automate Excel
using Python. You will be able to enhance your knowledge of Python and Excel
by reading this book. If you have only dabbled in programming before, then
you will find that Python can really expand your knowledge and abilities.
Conventions
This book doesn’t have a lot of conventions. However, there are a few small
ones. The first convention is how a code block is formatted:
# open_workbook.py
def open_workbook(path):
workbook = load_workbook(filename=path)
print(f'Worksheet names: {workbook.sheetnames}')
sheet = workbook.active
print(sheet)
print(f'The title of the Worksheet is: {sheet.title}')
if __name__ == '__main__':
open_workbook('books.xlsx')
You may also see tip blocks that look like this:
This is a tip block
Lorem ipsum
Other than that, there are no other conventions you need to know about.
Book Source Code
The book’s source code can be found on Github:
https://github.com/driscollis/automating_excel_with_python
The book’s code is under the MIT license, which is very permissive and allows
you to do almost anything with the code.
Reader Feedback
I welcome feedback about my writings. If you’d like to let me know what you
thought of the book, you can send comments to the following address:
comments@pythonlibrary.org
Errata
I try my best not to publish errors in my writings, but it happens from time to
time. If you happen to see an error in this book, feel free to let me know by
emailing me at the following:
errata@pythonlibrary.org
Cover Art
The cover art was created by Nisa Tokmak.
Dependencies
Installing OpenPyXL with pip
Installing OpenPyXL with conda
Using OpenPyXL in a Python Virtual Environment
Dependencies
To be able to use this book, you will need to have the Python programming
language installed. If you are on Mac or Linux, you may have Python
already installed. If you want to, you can install the latest version of Python
from their website.
You will also need some kind of spreadsheet software. You will be reading
and writing XLSX documents. These documents are a special type of XML.
They can be read by other spreadsheet programs besides Microsoft Excel.
While this book will focus on using Microsoft Excel and Python, you can
use Apple’s Numbers application, LibreOffice or any other spreadsheet
software that supports creating and reading XSLX files. However, a
spreadsheet program is technically not required at all to use OpenPyXL.
OpenPyXL can create, read and edit Excel files without needing any
spreadsheet applications installed.
The reason you may want to have a spreadsheet program installed is for you
to be able to open the spreadsheets yourself.
The other tool you will need is the OpenPyXL package. This package does
not come pre-installed in Python. There are multiple ways you can install
OpenPyXL. These will be covered in the next couple of sections.
Here is the command you should run on your machine to install OpenPyXL
to your system Python:
If you use Anaconda instead of Python, then you will want to see the next
section.
Once conda is finished, you should be ready to use OpenPyXL with your
Anaconda distribution.
This will create a folder named test in the directory that you are currently
in. You can name the folder whatever you want to. When you run this
command, venv will copy over the Python executable to the folder along
with a few other files.
To use the virtual environment on Linux or Mac, you will need to change
directories to be inside the test folder, then run this command:
source bin/activate
Your terminal or console will now update so that the prompt says “test”.
That indicates that the virtual environment is active.
On Windows, you will need to go into the Scripts folder and run the
activate.bat file to activate your virtual environment.
Once your virtual environment is running, you can use pip to install
OpenPyXL.
When you are done with your virtual environment, you can run the
deactivate command in your console or terminal. On Windows, there is a
deactivate.bat file in the Scripts folder that you will need to use to
deactivate your environment.
Wrapping Up
At this point, you should be all set up. You will be able to use OpenPyXL to
create and edit Microsoft Excel documents. You should spend some time
familiarizing yourself with pip, conda and virtualenv, if you haven’t
already. All three of these tools have multiple command line options that
allow you to tailor your installation or virtual environments.
Anaconda has its own Python virtual environment, for example. See their
documentation for details.
Before you dive into automating Excel with Python, you should understand
some of the common terminologies:
Now that you have some basic understanding of the vocabulary, you can
move on.
Open a spreadsheet
Read specific cells
Read cells from a specific row
Read cells from a specific column
Read cells from multiple rows or columns
Read cells from a range
Read all cells in all sheets
You can get started by learning how to open a workbook in the next section!
Open a Spreadsheet
The first item that you need is a Microsoft Excel file. You can use the file
that is in this book’s GitHub code repository. There is a file in the chapter 2
folder called books.xlsx that you will use here.
Note: The data in these sheets are inaccurate, but they help learn how
to use OpenPyXL.
Now you’re ready to start coding! Open up your favorite Python editor and
create a new file named open_workbook.py. Then add the following code to
your file:
# open_workbook.py
def open_workbook(path):
workbook = load_workbook(filename=path)
print(f"Worksheet names: {workbook.sheetnames}")
sheet = workbook.active
print(sheet)
print(f"The title of the Worksheet is: {sheet.title}")
if __name__ == "__main__":
open_workbook("books.xlsx")
The first step in this code is to import load_workbook() from the openpyxl
package. The load_workbook() function will load up your Excel file and
return it as a Python object. You can then interact with that Python object
like you would any other object in Python.
You can get a list of the worksheets in the Excel file by accessing the
sheetnames attribute. This list contains the titles of the worksheets from left
to right in your Excel file. Your code will print out this list.
Next, you grab the currently active sheet. If your workbook only has one
worksheet, then that sheet will be the active one. If your workbook has
multiple worksheets, as this one does, then the last worksheet will be the
active one.
The last two lines of your function print out the Worksheet object and the
title of the active worksheet.
# read_specific_sheet.py
if __name__ == "__main__":
open_workbook("books.xlsx", sheet_name="Sales")
Now you are ready to move on and learn how to read data from the cells
themselves.
# reading_specific_cells.py
def get_cell_info(path):
workbook = load_workbook(filename=path)
sheet = workbook.active
print(sheet)
print(f'The title of the Worksheet is: {sheet.title}')
print(f'The value of A2 is {sheet["A2"].value}')
print(f'The value of A3 is {sheet["A3"].value}')
cell = sheet['B3']
print(f'The variable "cell" is {cell.value}')
if __name__ == '__main__':
get_cell_info('books.xlsx')
In this example, there are three hard-coded cells: A2, A3 and B3. You can
access their values by using dictionary-like access: sheet["A2"].value.
Alternatively, you can assign sheet["A2"] to a variable and then do
something like cell.value to get the cell’s value.
You can see both of these methods demonstrated in your code above.
When you run this code, you should see the following output:
<Worksheet "Sales">
The title of the Worksheet is: Sales
The value of A2 is 'Python 101'
The value of A3 is 'wxPython Recipes'
The variable "cell" is 5
This output shows how you can easily extract specific cell values from
Excel using Python.
Now you’re ready to learn how you can read the data from a specific row of
cells!
# reading_row_cells.py
sheet = workbook[sheet_name]
for cell in sheet[row]:
print(f"{cell.column_letter}{cell.row} = {cell.value}")
if __name__ == "__main__":
iterating_row("books.xlsx", sheet_name="Sheet 1 - Books",
row=2)
In this example, you pass in the row number 2. You can iterate over the
values in the row like this:
That makes grabbing the values from a row pretty straightforward. When
you run this code, you’ll get the following output:
A2 = Title
B2 = Author
C2 = Publisher
D2 = Publishing Date
E2 = ISBN
F2 = None
G2 = None
Those last two values are both None. If you don’t want to get values that
are None, you should add some extra processing to check if the value is
None before printing it out. You can try to figure that out yourself as an
exercise.
You are now ready to learn how to get cells from a specific column!
To see how you can do that, create a new file and name it
reading_column_cells.py. Then enter this code:
# reading_column_cells.py
sheet = workbook[sheet_name]
for cell in sheet[col]:
print(f"{cell.column_letter}{cell.row} = {cell.value}")
if __name__ == "__main__":
iterating_column("books.xlsx", sheet_name="Sheet 1 - Books",
col="A")
This code is very similar to the code in the previous section. The difference
here is that you are replacing sheet[row] with sheet[col] and iterating on
that instead.
In this example, you set the column to “A”. When you run this code, you
will get the following output:
A1 = Books
A2 = Title
A3 = Python 101
A4 = wxPython Recipes
A5 = Python Interviews
A6 = None
A7 = None
A8 = None
A9 = None
A10 = None
A11 = None
A12 = None
A13 = None
A14 = None
A15 = None
A16 = None
A17 = None
A18 = None
A19 = None
A20 = None
A21 = None
A22 = None
A23 = None
Once again, some columns have no data (i.e., “None”). You can edit this
code to ignore empty cells and only process cells that have contents.
iter_rows()
iter_cols()
You use the min and max rows and column parameters to tell OpenPyXL
which rows and columns to iterate over. You can have OpenPyXL return the
data from the cells by setting values_only to True. If you set it to False,
iter_rows() and iter_cols() will return cell objects instead.
It’s always good to see how this works with actual code. With that in mind,
create a new file named iterating_over_cells_in_rows.py and add this
code to it:
# iterating_over_cells_in_rows.py
sheet = workbook[sheet_name]
for value in sheet.iter_rows(
min_row=1, max_row=3, min_col=1, max_col=3,
values_only=True):
print(value)
if __name__ == "__main__":
iterating_over_values("books.xlsx", sheet_name="Sheet 1 - Books")
Here you load up the workbook as you have in the previous examples. You
get the sheet name that you want to extract data from and then use
iter_rows() to get the rows of data. In this example, you set the minimum
row to 1 and the maximum row to 3. That means that you will grab the first
three rows in the Excel sheet you have specified.
When you run this code, you will get the following output:
Your program will print out the first three columns of the first three rows in
your Excel spreadsheet. Your program prints the rows as tuples with three
items in them. You are using iter_rows() as a quick way to iterate over
rows and columns in an Excel spreadsheet using Python.
# read_cells_from_range.py
import openpyxl
from openpyxl import load_workbook
sheet = workbook[sheet_name]
for column in sheet[cell_range]:
for cell in column:
if isinstance(cell, openpyxl.cell.cell.MergedCell):
# Skip this cell
continue
print(f"{cell.column_letter}{cell.row} = {cell.value}")
if __name__ == "__main__":
iterating_over_values("books.xlsx", sheet_name="Sheet 1 - Books",
cell_range="A1:B6")
Here you pass in your cell_range and iterate over that range using the
following nested for loop:
You check to see if the cell that you are extracting is a MergedCell. If it is,
you skip it. Otherwise, you print out the cell name and its value.
When you run this code, you should see the following output:
A1 = Books
A2 = Title
B2 = Author
A3 = Python 101
B3 = Mike Driscoll
A4 = wxPython Recipes
B4 = Mike Driscoll
A5 = Python Interviews
B5 = Mike Driscoll
A6 = None
B6 = None
That worked quite well. You should take a moment and try out a few other
range variations to see how it changes the output.
Note: while the image of “Sheet 1 - Books” looks like cell A1 is distinct
from the merged cell B1-G1, A1 is actually part of that merged cell.
The last code example that you’ll create will read all the data in your Excel
document!
The natural way to iterate through an Excel file is to read the sheets from
left to right, and within each sheet, you would read it row by row, from top
to bottom. That is what you will learn how to do in this section.
You will take what you have learned in the previous sections and apply it
here. Create a new file and name it read_all_data.py. Then enter the
following code:
# read_all_data.py
import openpyxl
from openpyxl import load_workbook
def read_all_data(path):
workbook = load_workbook(filename=path)
for sheet_name in workbook.sheetnames:
sheet = workbook[sheet_name]
print(f"Title = {sheet.title}")
for row in sheet.rows:
for cell in row:
if isinstance(cell, openpyxl.cell.cell.MergedCell):
# Skip this cell
continue
print(f"{cell.column_letter}{cell.row} = {cell.value}")
if __name__ == "__main__":
read_all_data("books.xlsx")
Here you load up the workbook as before, but this time you loop over the
sheetnames. You print out each sheet name as you select it. You use a
nested for loop to loop over the rows and cells to extract the data from your
spreadsheet.
Once again, you skip MergedCells because their value is None – the actual
value is in the normal cell that the MergedCell is merged with. If you run
this code, you will see that it prints out all the data from the two
worksheets.
You can simplify this code a bit by using iter_rows(). Open up a new file
and name it read_all_data_values.py. Then enter the following:
# read_all_data_values.py
import openpyxl
from openpyxl import load_workbook
def read_all_data(path):
workbook = load_workbook(filename=path)
for sheet_name in workbook.sheetnames:
sheet = workbook[sheet_name]
print(f"Title = {sheet.title}")
for value in sheet.iter_rows(values_only=True):
print(value)
if __name__ == "__main__":
read_all_data("books.xlsx")
In this code, you once again loop over the sheet names in the Excel
document. However, rather than looping over the rows and columns, you
use iter_rows() to loop over only the rows. You set values_only to True
which will return a tuple of values for each row. You also do not set the
minimum and maximum rows or columns for iter_rows() because you
want to get all the data.
When you run this code, you will see it print out the name of each sheet,
then all the data in that sheet, row-by-row. Give it a try on your own Excel
worksheets and see what this code can do!
Wrapping Up
OpenPyXL lets you read an Excel Worksheet and its data in many different
ways. You can extract values from your spreadsheets quickly with a
minimal amount of code.
Open a spreadsheet
Read specific cells
Read cells from a specific row
Read cells from a specific column
Read cells from multiple rows or columns
Read cells from a range
Read all cells in all sheets
Now you are ready to learn how to create an Excel spreadsheet using
OpenPyXL. That is the subject of the next chapter!
OceanofPDF.com
Chapter 3 - Creating a
Spreadsheet with OpenPyXL
Reading Excel spreadsheets is all well and good. However, you also need to
be able to create or edit a spreadsheet. The focus of this chapter will be on
learning how to do that! OpenPyXL lets you create Microsoft Excel
spreadsheets with a minimum of fuss.
Once you learn how to create Excel spreadsheets with Python, you can
leverage that knowledge to transform other data into Excel spreadsheets.
This knowledge also allows you to do the reverse, taking in an Excel
spreadsheet and output a different format, such as JSON or XML.
In this chapter, you will learn how to use OpenPyXL to do the following:
Create a spreadsheet
Write to a spreadsheet
Add and remove sheets
Insert and delete rows and columns
Edit cell data
Create merged cells
Fold rows and columns
Creating a Spreadsheet
Creating an empty spreadsheet using OpenPyXL doesn’t take much code.
Open up your Python editor and create a new file. Name it
creating_spreadsheet.py.
# creating_spreadsheet.py
def create_workbook(path):
workbook = Workbook()
workbook.save(path)
if __name__ == "__main__":
create_workbook("hello.xlsx")
The critical piece here is that you need to import the Workbook class. This
class allows you to instantiate a workbook object that you can then save. All
this code does is create the file that you pass to it and save it.
Now you’re ready to learn how to add some data to the cells in your
spreadsheet.
Writing to a Spreadsheet
When writing data in a spreadsheet, you need to get the “sheet” object. You
learned how to do that in the previous chapter using workbook.active,
which gives you the active or currently visible sheet. You could also
explicitly tell OpenPyXL which sheet you want to access by passing it a
sheet title.
For this example, you will create another new program and then use the
active sheet. Open up a new file and name it adding_data.py. Now add this
code to your file:
# adding_data.py
def create_workbook(path):
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Hello"
sheet["A2"] = "from"
sheet["A3"] = "OpenPyXL"
workbook.save(path)
if __name__ == "__main__":
create_workbook("hello.xlsx")
This code will overwrite the previous example’s Excel spreadsheet. After
you create the Workbook() object, you grab the active Worksheet. Then you
add text strings to the cells: A1, A2, and A3. The last step is to save your
new spreadsheet.
When you run this code, your new spreadsheet will look like this:
Fig. 3-2: A Spreadsheet with Data
You can use this technique to write data to any cell in your spreadsheet.
What if you need to add rows of data? You can use the worksheet’s
append() method for that! To see how that might work, create a new file
named adding_data_rows.py and add the following code:
# adding_data_rows.py
def create_workbook(path):
workbook = Workbook()
sheet = workbook.active
data = [[1, 2, 3],
["a", "b", "c"],
[44, 55, 66]]
for row in data:
sheet.append(row)
workbook.save(path)
if __name__ == "__main__":
create_workbook("write_rows.xlsx")
In this example, you create a list of lists that represents three rows of data.
Then you iterate over your list of lists, calling append() each time. The
append() method will add data to each row in your Excel spreadsheet
starting at A1.
After running this code, your new spreadsheet will look like this:
# creating_sheet_title.py
def create_sheets(path):
workbook = Workbook()
sheet = workbook.active
sheet.title = "Hello"
sheet2 = workbook.create_sheet(title="World")
workbook.save(path)
if __name__ == "__main__":
create_sheets("hello_sheets.xlsx")
Here you create the Workbook and then grab the active Worksheet. You can
then set the Worksheet’s title using the title attribute. The following line
of code adds a new worksheet to the Workbook by calling create_sheet().
The create_sheet() method takes two parameters: title and index. The
title attribute gives a title to the Worksheet. The index tells the Workbook
where to insert the Worksheet, from left to right. If you specify zero, your
Worksheet gets inserted at the beginning.
If you run this code, your new spreadsheet will look like this:
# delete_sheets.py
import openpyxl
def create_worksheets(path):
workbook = openpyxl.Workbook()
workbook.create_sheet()
print(workbook.sheetnames)
# Insert a worksheet
workbook.create_sheet(index=1, title="Second sheet")
print(workbook.sheetnames)
del workbook["Second sheet"]
print(workbook.sheetnames)
workbook.save(path)
if __name__ == "__main__":
create_worksheets("del_sheets.xlsx")
In this example, you create two new sheets. The first Worksheet has no title
specified, so it defaults to “Sheet1”. You supply a title to the second sheet,
and then you print out all the current worksheet titles.
Next, you use Python’s del keyword to delete the Worksheet’s name from
the workbook, which removes the sheet. Then you print out the current
worksheet titles again.
['Sheet', 'Sheet1']
['Sheet', 'Second sheet', 'Sheet1']
['Sheet', 'Sheet1']
The first Worksheet gets created automatically when you instantiate the
Workbook. The Worksheet is named “Sheet”. Then you make “Sheet1”.
Lastly, you create “Second sheet”, but you insert it at position 1, which tells
the Workbook to shift ‘Sheet1’ to the right by one position.
You can see from the output above how the worksheets are ordered before
and after you add and delete the “Second sheet”.
Now let’s learn about inserting and removing rows and columns!
.insert_rows()
.delete_rows()
.insert_cols()
.delete_cols()
You can use the insert methods to insert rows or columns at the specified
index.
Open up a new file and name it insert_demo.py. Then enter this code in
your new file:
# insert_demo.py
def inserting_cols_rows(path):
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Hello"
sheet["A2"] = "from"
sheet["A3"] = "OpenPyXL"
# insert a column before A
sheet.insert_cols(idx=1)
# insert 2 rows starting on the second row
sheet.insert_rows(idx=2, amount=2)
workbook.save(path)
if __name__ == "__main__":
inserting_cols_rows("inserting.xlsx")
Here you create another new Spreadsheet. In this case, you add text to the
first three cells in the “A” column. Then you insert one column at index
one. That means you inserted a single column before “A”, which causes the
cells in column “A” to shift to column “B”.
Next, you insert two new rows starting at index two. This code will insert
two rows between the first and second rows.
You can see how this changes things by taking a look at the following
screenshot:
Try changing the indexes or number of rows and columns that you want to
insert and see how it works.
You will also need to delete columns and rows from time to time. To do that
you will use .delete_rows() and .delete_cols().
Open up a new file and name it delete_demo.py. Then add this code:
# delete_demo.py
def deleting_cols_rows(path):
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Hello"
sheet["B1"] = "from"
sheet["C1"] = "OpenPyXL"
sheet["A2"] = "row 2"
sheet["A3"] = "row 3"
sheet["A4"] = "row 4"
# Delete column A
sheet.delete_cols(idx=1)
# delete 2 rows starting on the second row
sheet.delete_rows(idx=2, amount=2)
workbook.save(path)
if __name__ == "__main__":
deleting_cols_rows("deleting.xlsx")
In this example, you add text to six different cells. Four of those cells are in
column “A”. Then you use delete_cols() to delete column “A”! That
means you got rid of four values. Next, you delete two rows, starting at row
number two.
When you run this code, your result should look like this:
Fig. 3-6: Deleting Rows and Columns
Try editing the index or amount values to get familiar with deleting rows
and columns.
For this example, you will use the inserting.xlsx file you created in the
previous section. Now create a new Python file named editing_demo.py.
Then add the following code:
# editing_demo.py
if __name__ == "__main__":
data = {"B1": "Hi", "B5": "Python"}
edit("inserting.xlsx", data)
This code loads up the Excel file that you created in the previous section. It
then loops over each value in the data dictionary that you passed to the
edit() function. You get the current value for the cell using a key in the
dictionary. Then you change that cell’s value to match the value in the
dictionary.
To make it more obvious what is going on, you print out the old and new
values of the cell.
When you run this code, you will see the following output:
Open up the new version of the inserting.xlsx file, and it should now
look like this:
Fig. 3-7: Editing Cells
Here you can see how the cell values have changed to match the one
specified in the data dictionary.
Now you can move on and learn how to create merged cells!
To see how this works in practice, create a file called merged_cells.py and
add this code to it:
# merged_cells.py
if __name__ == "__main__":
create_merged_cells("merged.xlsx", "Hello World")
OpenPyXL has many ways to style cells. In this example, you import
Alignment from openpyxl.styles. You will learn more about styles and
formatting cells in a later chapter.
Here you merge the cells “A2:E2” and set the alignment to the center of the
cell. Then you set the value of “A2” to a string that you passed to the
create_merged_cells() function.
When you run this example, your new Excel spreadsheet will look like this:
To get some hands-on experience, change the range of cells you want to
merge and try it with and without the alignment set.
Now you are ready to learn about folding columns or rows!
OpenPyXL supports folding too. To see how this works, create a new file
named folding.py and enter the following code:
# folding.py
import openpyxl
if rows:
begin_row, end_row = rows
sheet.row_dimensions.group(begin_row, end_row, hidden=hidden)
if cols:
begin_col, end_col = cols
sheet.column_dimensions.group(begin_col, end_col, hidden=hidden)
workbook.save(path)
if __name__ == "__main__":
folding("folded.xlsx", rows=(1, 5), cols=("C", "F"))
When you run this code, your spreadsheet will look like this:
Fig. 3-9: Folding Cells
You can see in this spreadsheet that some of the rows and columns are
folded or hidden. There is a “+” symbol next to row 6 and another “+”
symbol above column “G”. If you click on either of those buttons, it will
expand the folded rows or columns.
Give this code a try. You can also experiment with different row or column
ranges.
Freezing Panes
Microsoft Excel allows you to freeze panes. What that means is that you
can freeze one or more columns or rows. One popular use case is to freeze a
row of headers so that the headers are always visible while scrolling
through a lot of data.
You can see how this works by writing some code. Open up a new file and
name it freezing_panes.py. Then enter the following into it:
# freezing_panes.py
if __name__ == "__main__":
freeze("freeze.xlsx", row_to_freeze="A2")
Here you create a new Workbook and set the active sheet’s title to “Freeze”.
Then you set the freeze_panes attribute to “A2”. The rest of the code in
the function adds a couple of rows of data to the Worksheet.
When you run this code, the spreadsheet that you create will look like this:
Fig. 3-10: Freeze Panes
Try scrolling down through some rows in the spreadsheet. The top row
should always remain visible because it has been “frozen”.
Wrapping Up
You can use OpenPyXL not only to create an Excel spreadsheet, but modify
a pre-existing one. In this chapter, you learned how to do the following:
Create a spreadsheet
Write to a spreadsheet
Add and remove sheets
Insert and delete rows and columns
Edit cell data
Create merged cells
Freeze panes
Give the examples in this chapter a try. Then modify them a bit to see what
else you can do on your own. When you’re ready, go on to the next chapter!
OceanofPDF.com
Chapter 4 - Styling Cells
OpenPyXL gives you the ability to style your cells in many different ways.
Styling cells will give your spreadsheets pizazz! Your spreadsheets can have
some pop and zing to them that will help differentiate them from others.
However, don’t go overboard! If every cell had a different font and color,
your spreadsheet would look like a mess.
You should use the skills that you learn in this chapter sparingly. You’ll still
have beautiful spreadsheets that you can share with your colleagues. If you
would like to learn more about what styles OpenPyXL supports, you should
check out their documentation.
Now that you know what you’re going to learn, it’s time to get started by
discovering how to work with fonts using OpenPyXL!
The Font class takes many parameters. Here is the Font class’s full list of
parameters according to OpenPyXL’s documentation:
The following list shows the parameters you are most likely to use and their
defaults:
name=’Calibri’
size=11
bold=False
italic=False
vertAlign=None
underline=’none’
strike=False
color=’FF000000’
These settings allow you to set most of the things you’ll need to make your
text look nice. Note that the color names in OpenPyXL use hexadecimal
values to represent RGB (red, green, blue) color values. You can set
whether or not the text should be bold, italic, underlined, or struck-through.
To see how you can use fonts in OpenPyXL, create a new file named
font_sizes.py and add the following code to it:
# font_sizes.py
import openpyxl
from openpyxl.styles import Font
def font_demo(path):
workbook = openpyxl.Workbook()
sheet = workbook.active
cell = sheet["A1"]
cell.font = Font(size=12)
cell.value = "Hello"
cell2 = sheet["A2"]
cell2.font = Font(name="Arial", size=14, color="00FF0000")
sheet["A2"] = "from"
cell2 = sheet["A3"]
cell2.font = Font(name="Tahoma", size=16, color="00339966")
sheet["A3"] = "OpenPyXL"
workbook.save(path)
if __name__ == "__main__":
font_demo("font_demo.xlsx")
This code uses three different fonts in three different cells. In A1, you use
the default, which is Calibri. Then in A2, you set the font size to Arial and
increase the size to 14 points. Finally, in A3, you change the font to Tahoma
and the font size to 16 points.
For the second and third fonts, you also change the text color. In A2, you
set the color to red, and in A3, you set the color to green.
When you run this code, your output will look like this:
Fig. 4-1: Using Different Fonts in OpenPyXL
Try changing the code to use other fonts or colors. If you want to get
adventurous, you should try to make your text bold or italicized.
horizontal=’general’
vertical=’bottom’
text_rotation=0
wrap_text=False
shrink_to_fit=False
indent=0
It’s time for you to get some practice in. Open up your Python editor and
create a new file named alignment.py. Then add this code to it:
# alignment.py
if __name__ == "__main__":
center_text("alignment.xlsx")
You will center the string both horizontally and vertically in A1 when you
run this code. Then you use the defaults for A2. Finally, for A3, you rotate
the text 90 degrees.
Try running this code, and you will see something like the following:
Fig. 4-2: Changing Text Alignment in OpenPyXL
That looks nice! It would be best if you took the time to try out different
text_rotation values. Then try changing the horizontal and vertical
parameters with different values. Pretty soon, you will be able to align your
text like a pro!
Adding a Border
OpenPyXL gives you the ability to style the borders on your cell. You can
specify a different border style for each of the four sides of a cell.
‘dashDot’
‘dashDotDot’
‘dashed’
‘dotted’
‘double’
‘hair’
‘medium’
‘mediumDashDot’
‘mediumDashDotDot’,
‘mediumDashed’
‘slantDashDot’
‘thick’
‘thin’
Open your Python editor and create a new file named border.py. Then
enter the following code in your file:
# border.py
def border(path):
pink = "00FF00FF"
green = "00008000"
thin = Side(border_style="thin", color=pink)
double = Side(border_style="double", color=green)
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Hello"
sheet["A1"].border = Border(top=double, left=thin, right=thin,
bottom=double)
sheet["A2"] = "from"
sheet["A3"] = "OpenPyXL"
sheet["A3"].border = Border(top=thin, left=double, right=double,
bottom=thin)
workbook.save(path)
if __name__ == "__main__":
border("border.xlsx")
This code will add a border to cell A1 and A3. The top and bottom of A1
use a “double” border style and are green, while the cell sides are using a
“thin” border style and are colored pink.
Cell A3 uses the same borders but swaps them so that the sides are now
green and the top and bottom are pink.
You get this effect by creating Side objects in the border_style and the
color to be used. Then you pass those Side objects to a Border class,
which allows you to set each of the four sides of a cell individually. To
apply the Border to a cell, you must set the cell’s border attribute.
When you run this code, you will see the following result:
This image is zoomed in a lot so that you can easily see the borders of the
cells. It would be best if you tried modifying this code with some of the
other border styles mentioned at the beginning of this section so that you
can see what else you can do.
patternType=None
fgColor=Color()
bgColor=Color()
fill_type=None
start_color=None
end_color=None
There are several different fill types you can use. Here is a list of currently
supported fill types:
‘none’
‘solid’
‘darkDown’
‘darkGray’
‘darkGrid’
‘darkHorizontal’
‘darkTrellis’
‘darkUp’
‘darkVertical’
‘gray0625’
‘gray125’
‘lightDown’
‘lightGray’
‘lightGrid’
‘lightHorizontal’
‘lightTrellis’
‘lightUp’
‘lightVertical’
‘mediumGray’
Now you have enough information to try setting the background color of a
cell using OpenPyXL. Open up a new file in your Python editor and name it
background_colors.py. Then add this code to your new file:
# background_colors.py
def background_colors(path):
workbook = Workbook()
sheet = workbook.active
yellow = "00FFFF00"
for rows in sheet.iter_rows(min_row=1, max_row=10, min_col=1, max_col=12):
for cell in rows:
if cell.row % 2:
cell.fill = PatternFill(start_color=yellow, end_color=yellow,
fill_type = "solid")
workbook.save(path)
if __name__ == "__main__":
background_colors("bg.xlsx")
This example will iterate over nine rows and 12 columns. It will set every
cell’s background color to yellow if that cell is in an odd-numbered row.
The cells with their background color changes will be from column A
through column L.
When you want to set the cell’s background color, you set the cell’s fill
attribute to an instance of PatternFill. In this example, you specify a
start_color and an end_color. You also set the fill_type to “solid”.
OpenPyXL also supports using a GradientFill for the background.
Try running this code. After it runs, you will have a new Excel document
that looks like this:
Fig. 4-4: Changing the Background Color in OpenPyXL
Here are some ideas that you can try out with this code:
Once you are done experimenting with background colors, you can learn
about inserting images in your cells!
img - The path to the image file that you are inserting
anchor - Provide a cell as a top-left anchor of the image (optional)
For this example, you will be using the Mouse vs. Python logo:
The GitHub repository for this book has the image for you to use.
Once you have the image downloaded, create a new Python file and name it
insert_image.py. Then add the following:
# insert_image.py
if __name__ == "__main__":
insert_image("logo.xlsx", "logo.png")
Here you pass in the path to the image that you wish to insert. To insert the
image, you call add_image(). In this example, you are hard-coding to use
cell B1 as the anchor cell. Then you save your Excel spreadsheet.
If you open up your spreadsheet, you will see that it looks like this:
Fig. 4-6: Inserting an Image
You probably won’t need to insert an image into an Excel spreadsheet all
that often, but it’s an excellent skill to have.
You also must use this particular cell to set the style for the merged cell as a
whole. You can use all the styles and font settings you used on an individual
cell with the merged cell. However, you must apply the style to the top-left
cell for it to apply to the entire merged cell.
You will understand how this works if you see some code. Go ahead and
create a new file named style_merged_cell.py. Now enter this code in
your file:
# style_merged_cell.py
def merge_style(path):
workbook = Workbook()
sheet = workbook.active
cell_range = "A2:G4"
sheet.merge_cells(cell_range)
top_left_cell = sheet["A2"]
light_purple = "00CC99FF"
green = "00008000"
thin = Side(border_style="thin", color=light_purple)
double = Side(border_style="double", color=green)
if __name__ == "__main__":
merge_style("merged_style.xlsx")
Here you create a merged cell that starts at A2 (the top-left cell) through
G4. Then you set the cell’s value, border, fill, font and alignment. You will
need to loop over the cell_range and apply the border to each of those
cells for the border to be applied correctly to the entirety of the merged cell.
When you run this code, your new spreadsheet will look like this:
Fig. 4-7: Styling a Merged Cell
Doesn’t that look nice? You should take some time and try out some
different styles on your merged cell. Maybe come up with a better gradient
than the gray one used here, for example.
However, it is worth noting some of the styles. For example, here are the
number format styles you can use:
‘Comma’
‘Comma [0]’
‘Currency’
‘Currency [0]’
‘Percent’
You can also apply text styles. Here is a listing of those styles:
‘Title’
‘Headline 1’
‘Headline 2’
‘Headline 3’
‘Headline 4’
‘Hyperlink’
‘Followed Hyperlink’
‘Linked Cell’
OpenPyXL has several other built-in style groups. You should check out the
documentation to learn about all the different styles that are supported.
Now that you know about some of the built-in styles you can use, it’s time
to write some code! Create a new file and name it builtin_styls.py. Then
enter the following code:
# builtin_styles.py
def builtin_styles(path):
workbook = Workbook()
sheet = workbook.active
sheet["A1"].value = "Hello"
sheet["A1"].style = "Title"
sheet["A2"].value = "from"
sheet["A2"].style = "Headline 1"
sheet["A3"].value = "OpenPyXL"
sheet["A3"].style = "Headline 2"
workbook.save(path)
if __name__ == "__main__":
builtin_styles("builtin_styles.xlsx")
Here you apply three different styles to three different cells. You use
“Title”, “Headline 1” and “Headline 2”, specifically.
When you run this code, you will end up having a spreadsheet that looks
like this:
As always, you should try out some of the other built-in styles. Trying them
out is the only way to determine what they do and if they will work for you.
But wait! What if you wanted to create your style? That’s what you will
cover in the next section!
The NamedStyle class takes the following arguments (defaults are included
too):
name=”Normal”
font=Font()
fill=PatternFill()
border=Border()
alignment=Alignment()
number_format=None
protection=Protection()
builtinId=None
hidden=False
xfId=None
You should always provide your own name to your NamedStyle to keep it
unique. Go ahead and create a new file and call it named_style.py. Then
add this code to it:
# named_style.py
def named_style(path):
workbook = Workbook()
sheet = workbook.active
red = "00FF0000"
font = Font(bold=True, size=22)
thick = Side(style="thick", color=red)
border = Border(left=thick, right=thick, top=thick, bottom=thick)
named_style = NamedStyle(name="highlight", font=font, border=border)
sheet["A1"].value = "Hello"
sheet["A1"].style = named_style
sheet["A2"].value = "from"
sheet["A3"].value = "OpenPyXL"
workbook.save(path)
if __name__ == "__main__":
named_style("named_style.xlsx")
Here you create a Font(), Side(), and Border() instance to pass to your
NamedStyle(). Once you have your custom style created, you can apply it
to a cell by setting the cell’s style attribute. Applying a custom style is
done in the same way as you applied built-in styles!
Now it’s your turn! Edit the code to use a Side style, which will change
your border. Or create multiple Side instances so you can make each side of
the cell unique. Play around with different fonts or add a custom
background color!
Wrapping Up
You can do a lot of different things with cells using OpenPyXL. The
information in this chapter gives you the ability to format your data in
beautiful ways.
You can take the information that you learned in this chapter to make
beautiful spreadsheets. You can highlight exciting data by changing the
cell’s background color or font. You can also change the cell’s format by
using a built-in style. Go ahead and give it a try.
Experiment with the code in this chapter and see how powerful and
valuable OpenPyXL is when working with cells.
OceanofPDF.com
Chapter 5 - Conditional
Formatting
Microsoft Excel has the concept of conditional formatting. You apply
conditional formatting to a range of cells. You can set up rules for the
formatting, such as change the background color of a row if it contains a
specific string. You can also set cell formatting based on a formula-type
condition. There are many different ways to set up conditional formatting
using OpenPyXL.
This chapter will focus on built-in formats. You can read more about
OpenPyXL’s support for Excel’s conditional formatting in the package’s
documentation.
Builtin formats
Working with ColorScales
Adding IconSets
Creating a DataBar
Using DifferentialStyles
Builtin Formats
OpenPyXL has three built-in formats that you can use. They are as follows:
ColorScale
IconSet
DataBar
These built-in formats allow you to format your data dynamically. You will
start by learning how to use a ColorScale first!
The FormatObject can use any of the following settings: ‘num’, ‘percent’,
‘max’, ‘min’, ‘formula’, and ‘percentile’. The FormatObject tells
OpenPyXL how to apply the ColorScale to your data. The Color objects
define the two or three colors that you will use on your data.
To get started, you will need some data. You can use the ratings.xslx
spreadsheet provided in the book’s source code repository on GitHub for
this chapter.
Now that you have some data to work with, you are ready to write some
code! Open up your Python editor and create a new file named
using_colorscale.py. Then enter the following code:
# using_colorscale.py
first = FormatObject(type='min')
last = FormatObject(type="max")
if __name__ == "__main__":
applying_colorscale("ratings.xlsx",
output_path="colorscale.xlsx")
This code imports the classes you need: ColorScale, FormatObject, Rule,
and Color. You create two instances of FormatObject to create a minimum
and maximum format objects. Then you make a list that contains two Color
objects. Finally, you instantiate ColorScale and pass in your format objects
and colors
To use your color scale, you add it to a Rule object. You need to set the
type of the Rule to a “colorScale”. Note that you need to use a lowercase
“c” here, whereas the class name uses an uppercase “C”.
When you run this code, your spreadsheet will look like this:
Fig. 5-2: Using a ColorScale
The lower the value in column B, the redder it becomes. You can see this in
the screenshot above. You can add a third color to your list of colors to get
more fine-grained control over how the colors change over a range of
values. Give it a try and see what you can come up with.
You can simplify your code a bit by using a ColorScaleRule instead of all
those other classes.
# using_colorscalerule.py
red = "AA0000"
yellow = "00FFFF00"
green = "00AA00"
color_scale_rule = ColorScaleRule(start_type="num",
start_value=1,
start_color=red,
mid_type="num",
mid_value=3,
mid_color=yellow,
end_type="num",
end_value=5,
end_color=green)
sheet.conditional_formatting.add("B1:B100", color_scale_rule)
workbook.save(output_path)
if __name__ == "__main__":
applying_colorscalerule("ratings.xlsx",
output_path="colorscalerule.xlsx")
This example demonstrates adding three colors to your color scale. The
ColorScaleRule class does not require you to add three colors. You can
pass in only the start and end values instead, and it will happily create a
spreadsheet that is the same as the previous code.
This code makes you set the start_type (a number in this case), the
start_value, and the start_color. You use the same start color that you
used in the previous example. Then you add a mid-type, color, and value.
Finally, you add an end-type, value, and color. The mid-color is a new color,
yellow, while the end-color is still green.
You apply the ColorScaleRule the same way that you applied the previous
rule.
When you run this code, you will create the following spreadsheet:
Fig. 5-3: Using a ColorScaleRule
Adding that extra color made a difference in the output. Compare this
screenshot to the one from the previous example. Try changing the colors to
something else or change the range of values in the data to see how your
spreadsheet changes. You can have a lot of fun by experimenting with these
values while getting your spreadsheet looking nice and professional.
Adding IconSets
OpenPyXL has an interesting built-in format called an IconSet. The
IconSet will replace the values in your spreadsheet with icons, which can
be more eye-catching than simple numbers.
To see how you can use an IconSet, create a new file and name it
using_iconset.py. Then enter the following:
# using_iconset.py
sheet.conditional_formatting.add("B1:B100", rule)
workbook.save(output_path)
if __name__ == "__main__":
applying_iconset("ratings.xlsx",
output_path="iconset.xlsx")
Here you create three format objects: one for the first value, one for the
middle value, and one for the last value. By setting these bounds, you are
telling OpenPyXL what to use for the icons. Next, you create an IconSet,
making sure to pass in your format objects as a list to cfvo. Then you pass
the IconSet object to your Rule. Note that you set the type of the rule to
“iconSet”, which has the same odd capitalization as the “colorScale” did.
In this example, you used the “3TrafficLights1” icon set and applied it to
column B. Try swapping out that icon set for one of the others to see how it
changes your spreadsheet.
To see how you would use an IconSetRule, create a new file and name it
using_iconsetrule.py. Now enter this code in your new file:
# using_iconsetrule.py
if __name__ == "__main__":
applying_iconsetrule("ratings.xlsx",
output_path="iconsetrule.xlsx")
This code is significantly shorter than the previous example. Here you only
need to import IconSetRule. You tell it which icons you want to use
(“5Arrows”), the format object type (“num”), and then the range of values.
You apply this rule the same way you have used the previous rules.
When you run this code, your spreadsheet will look like this:
Isn’t that an excellent little enhancement for your spreadsheet? Give both
versions of the IconSet a try and see which one you like better.
Note: If you run this code on Windows 10, the arrows will look
different then the ones shown in this screenshot.
Now let’s move on and learn about the DataBar!
Creating a DataBar
OpenPyXL supports adding a “progress bar” in your cells as well. The
progress bar is called a DataBar. The DataBar is an excellent way to
visualize data where you might be trying to reach a goal. For example, you
might have a column of weekly sales numbers, and you want to show the
progress, week-over-week, of your sales. OpenPyXL makes that
straightforward.
Open up your Python editor and create a new file. Save your file with the
name using_databar.py. Then add the following code:
# using_databar.py
if __name__ == "__main__":
applying_databar("ratings.xlsx",
output_path="databar.xlsx")
In this example, you create two format objects and set their type to “num”
and their val to the bounds you want your progress bar to use. For this case,
your bounds are 1-5. Then you create a DataBar and pass in the format
objects as a Python list to the cfvo parameter. Next, you set the color to
green.
Finally, you add the DataBar object to your Rule. You need to set the rule’s
type to “dataBar” to make the rule work correctly.
When you run this code, your spreadsheet will look like this:
The screenshot shows how you can create a column of progress bars
quickly, with only a few lines of code. However, OpenPyXL makes it even
easier by providing yet another convenience function: DataBarRule.
# using_databarrule.py
red = "AA0000"
data_bar_rule = DataBarRule(start_type="num",
start_value=1,
end_type="num",
end_value="5",
color=red)
sheet.conditional_formatting.add("B1:B100", data_bar_rule)
workbook.save(output_path)
if __name__ == "__main__":
applying_databar_rule("ratings.xlsx",
output_path="databarrule.xlsx")
This code reduces the number of imports that you need and the number of
objects you need to create. Here you make only the DataBarRule. You give
it a start type and end type, both of which are “num”. Then you set the start
and end values, which are 1 (one) and 5 (five), respectively. Finally, you set
the color to red to easily differentiate the output of this code from your
regular DataBar output.
When you run this code, your new spreadsheet will have nice, red progress
bars in it:
Now it’s time to turn your attention to this chapter’s final section:
DifferentialStyles!
Using DifferentialStyles
A DifferentialStyle is a type of NamedStyle, which you learned about in
chapter 4. You use a DifferentialStyle for packaging up multiple styles,
like fonts, border, alignments, etc. When you combine a
DifferentialStyle with a Rule, you can do some custom conditional
formatting!
# using_rules.py
yellow = PatternFill(bgColor="00FFFF00")
diff_style = DifferentialStyle(fill=yellow)
rule = Rule(type="expression", dxf=diff_style)
rule.formula = [rule_formula]
sheet.conditional_formatting.add("A1:F100", rule)
workbook.save(output_path)
if __name__ == "__main__":
applying_rules("ratings.xlsx", rule_formula="$B1<3",
output_path="rules.xlsx")
This code creates a PatternFill object, which you use to create your
background color. Then you instantiate your DifferentialStyle object and
pass in the background color that you created.
Now you can create a Rule. For this to work, you set the rule’s type to an
“expression”. To apply the style, you pass it to the dxf parameter. You also
pass in a rule formula: the value must be less than 3 for the background
color to change to yellow.
When you run this code, your spreadsheet will look like this:
This looks pretty good, but you may have noticed that the empty rows are
also yellow. This seems to occur because there are no values set, so
Microsoft Excel interprets them as less than 3.
Wrapping Up
Conditional formatting gives you the ability to use formulas and boundaries
to format your spreadsheets dynamically. After applying conditional
formatting, you can change your data and see how the formatting will
change automatically.
In this chapter, you learned about the following:
Builtin formats
Working with ColorScales
Adding IconSets
Creating a DataBar
Using DifferentialStyles
You can take the code examples in this code and use these concepts together
to do complex formatting of your spreadsheets. You need to spend some
time experimenting with different rules and formatting objects, but once
you understand the fundamentals, you will be creating unique spreadsheets
quickly and confidently!
OceanofPDF.com
Chapter 6 - Creating Charts
OpenPyXL supports many different kinds of charts. These charts are an
excellent way to convey information to the user and help them understand
the data in the spreadsheet. The most common charts include the bar chart,
pie chart, and line chart.
The focus of this chapter is to get you familiar with how to create a chart
and edit its many different parts. When you create a chart, you will need to
define the type of chart you want to use, such as BarChart or LineChart.
You will also need to provide the data that the chart uses, which is called a
Reference.
The charts don’t have any required arguments to pass to them. You can
initialize them by doing the following:
Once you have a chart instance, you can add it to your worksheet. To learn
how to do that, create a new Python file and name it first_chart.py.
# first_chart.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
workbook.save(filename)
if __name__ == "__main__":
main("bar_chart.xlsx")
Here you create a Worksheet the same way that you have in the past. Then
you make some data and add it to the worksheet using append(). Next, you
create the BarChart and a Reference object. The Reference class tells
OpenPyXL what data to use to create the chart.
In this case, you tell it to use rows one through ten and columns two
through three. That allows you to skip the first column, which is the book
number. You then add the Reference to the chart by passing it to the chart’s
add_data() method. You set titles_from_data to True, which tells the
chart to use the column labels for the chart’s legend.
If you do not set titles_from_data to True, then the legend will say
“Series 1”, “Series 2”, etcetera.
Finally, you attach the chart to the worksheet using add_chart(), which
takes the chart object and a cell name. The cell name is the position where
the chart will appear in the worksheet.
When you run this code, your spreadsheet will look like this:
Isn’t that a nice-looking little chart? But wait! It’s missing something. It
needs a chart title, and some axis titles would be good to have too.
You will learn how to add all of those items in the next section!
# first_chart_titles.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
workbook.save(filename)
if __name__ == "__main__":
main("bar_chart_titles.xlsx")
This code is a copy of the code from the last section. You need to focus on
the following new lines of code:
The bar chart instance has many methods that you can use to modify the
chart in different ways. In this case, you want to add a title to the chart.
Since this data is about the prices of various book types, you give it a
“Book Prices by Type” title.
Next, you want to add some labels or titles to each axis. The x-axis
represents the various book types (i.e. eBook and paperback). To add a title
to the x-axis, you need to set x_axis.title to a string.
Finally, you add a title to the y-axis. The y-axis in this chart represents the
price of the book. So you set y_axis.title to “Prices”.
Now that you understand what the new lines of code are doing, you should
run this program.
When you do, you will see that your chart updates to look like this:
Fig. 6-2: Creating a BarChart with Titles
Your chart has some friendly, new titles. Try changing the titles to
something different, or play around with the data to make your chart more
unique!
Now you are ready to move on and learn about how to change your chart’s
axis orientation!
Open up your Python editor and create a new file. Save the file with the
name axis_orientations.py.
# axis_orientations.py
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
from openpyxl.chart.layout import Layout, ManualLayout
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
chart.add_data(data, titles_from_data=True)
if x_orientation:
chart.x_axis.scaling.orientation = x_orientation
if y_orientation:
chart.y_axis.scaling.orientation = y_orientation
return chart
def main(filename):
workbook = Workbook()
sheet = workbook.active
workbook.save(filename)
if __name__ == "__main__":
main("axis_orientations.xlsx")
Since this code is a bit longer than the previous examples were, you will go
over each function individually.
The first half of the code contains your imports and the create_chart()
function:
# axis_orientations.py
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
chart.add_data(data, titles_from_data=True)
if x_orientation:
chart.x_axis.scaling.orientation = x_orientation
if y_orientation:
chart.y_axis.scaling.orientation = y_orientation
return chart
You use create_chart() to create multiple charts that you then insert into
the worksheet. The create_chart() function takes in the following
arguments:
def main(filename):
workbook = Workbook()
sheet = workbook.active
workbook.save(filename)
if __name__ == "__main__":
main("axis_orientations.xlsx")
Here you create the actual Workbook and grab the active worksheet. Then
you add the data that you will use for your charts. Finally, you make four
charts with different scaling orientations.
To set the x and y axis orientation, you need to set one or both of them to
either “maxMin” or “minMax”.
When you run the code, you will end up with four charts in your
spreadsheet. Here are the charts:
You can also use the ManualLayout to control the position and size of the
chart’s legend. You control the position of the chart’s legend by setting its
position to one of the following:
r - right (default)
l - left
t - top
b - bottom
tr - top right
Now that you have a basic understanding of how to position the chart and
the legend, you can write some code. Create a new file and name it
manual_chart_layout.py. Then add the following code to your file:
# manual_chart_layout.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
chart.add_data(data, titles_from_data=True)
sheet.add_chart(chart, "E2")
workbook.save(filename)
if __name__ == "__main__":
main("manual_chart_layout.xlsx")
In this example, you want to focus on the section of code that comes right
after you create your BarChart.
Here you set the legend.position to “tr” or top-right. Then you the set
chart’s layout attribute to a Layout() class, which contains a
ManualLayout instance inside of it.
When you run this code, your spreadsheet will look like this:
Fig. 6-4: Manual Chart Layout
You can see that the legend is no longer to the right of the chart like it has
been in all the previous examples. Instead, the legend is in the top-right
corner with the chart kind of underneath the legend!
If you have the time, you should play around with ManualLayout to see
what you can do with it. You can change the way your chart looks with only
some minor changes to its layout.
Now you are ready to learn about changing the size of your chart!
To change the container’s size, you need to adjust the height and width of
the chart object.
Create a new file and name it chart_size.py. Then enter the following
code:
# chart_size.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
workbook.save(filename)
if __name__ == "__main__":
main("bar_chart_size.xlsx")
Here you set the height of the chart to 20 and the width to 30. That doesn’t
sound like much, but when you run this code, you can see that the chart is
quite a bit larger:
Fig. 6-5: Changing the Chart Size
You now have a couple of different tools that you can use to adjust the size
of your charts. Give them a try and see what works best for you!
The next topic to learn about is how to change the style of your chart.
Using Styles
The chart object has forty-eight different styles. The style is a color scheme
for your chart. To apply a style, you set the style attribute of the chart
object to an integer from 1 to 48.
You can see how this works by creating a new Python script. After making
your new file, name it chart_styles.py and enter the following code:
# chart_styles.py
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
workbook.save(filename)
if __name__ == "__main__":
for style in range(1, 4):
main(f"chart_style{style}.xlsx", style)
Here you use Python’s range() function to create three spreadsheets using
the styles 1 - 3. The first style applies a gray color scheme to your chart:
Fig. 6-6: Chart Style 1
When you use style two, your color scheme looks like the default color
scheme:
You can update the code example above to create all forty-eight styles to
see a sample of each of them. You should be able to find at least one color
scheme that fits your needs.
The last subject that you will learn about are chartsheets!
Creating Chartsheets
A chartsheet is a special worksheet that only contains a chart. There is no
data whatsoever on a chartsheet. OpenPyXL supports creating a chartsheet
using the data from a different worksheet in your spreadsheet.
To make this example more interesting, you will be creating a PieChart and
putting it into a chartsheet.
# chartsheet_demo.py
rows = [
["Python", 50],
["C++", 35],
["Java", 10],
["R", 5]
]
chart = PieChart()
labels = Reference(ws, min_col=1, min_row=1, max_row=4)
data = Reference(ws, min_col=2, min_row=1, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(labels)
chart.title = "Programming Languages"
chart_sheet.add_chart(chart)
wb.save(filename)
if __name__ == "__main__":
main("chartsheet.xlsx")
chart = PieChart()
labels = Reference(ws, min_col=1, min_row=1, max_row=4)
data = Reference(ws, min_col=2, min_row=1, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(labels)
chart.title = "Programming Languages"
chart_sheet.add_chart(chart)
In the code above, you create two references. The first reference contains
the legend’s labels. The second reference is the data used to generate the
chart. You use set_categories() to apply the labels to the legend, and you
use add_data() to add the data to the chart.
When you run this code, you will create a Workbook that contains two
worksheets. The first worksheet contains the data, and the second worksheet
is the chartsheet.
Since the chartsheet only shows the chart, the chart is much bigger than if it
was in a regular worksheet. Try editing this code to change it to a BarChart
or another chart type. You can also modify some of the other settings to see
how they affect the chart’s appearance.
Wrapping Up
You can visualize your spreadsheet’s data using charts. A chart can be easier
to understand than raw numbers. Everyone is different, so it’s good to be
able to communicate the data in different ways.
You can take any of the working examples in these chapters and modify
them to help you cement what you have learned. You can take the tools you
learned in this chapter and apply most of them to any of the supported chart
types.
In the next chapter, you will learn about the many other chart types you can
create in OpenPyXL.
OceanofPDF.com
Chapter 7 - Chart Types
There are many different kinds of charts that you can create using
OpenPyXL. These different chart types give you many different ways to
visualize and display your data to the user. You can help users better
understand their data by putting it into a chart.
In this chapter, you will learn how to create the following chart types:
Area Charts
Bar Charts
Bubble Charts
Line Charts
Scatter Charts
Pie Charts
Doughnut Charts
Radar Charts
Surface Charts
You won’t learn every idiosyncrasy with every chart, but how to create
most of the supported chart types. By doing so, you will learn to visualize
your data in many different ways. Once you have learned about all the
different types, you can experiment with them and see which ones work
best for your audience.
The first chart type you will learn about is the area chart!
Area Charts
OpenPyXL supports two different types of area charts. You can create a
two-dimensional or a three-dimensional chart using OpenPyXL. The
creation process of both charts is the same except that you use a different
import.
You will get started by learning how to create a two-dimensional area chart!
2D Area Charts
You can use an area chart to show the rise and fall of multiple data series
effectively. They are also useful for demonstrating total amounts over a
time period.
To see how you can create an area chart with OpenPyXL, create a new file
and name it area_chart.py. Then enter the following code:
# area_chart.py
def create_excel_data(sheet):
data_rows = [
["Book", "Kindle", "Paperback"],
["Python 101", 15, 5],
["Python 201", 5, 1],
["ReportLab", 10, 0],
["wxPython", 2, 2],
["Jupyter", 25, 15],
]
def create_chart(sheet):
chart = AreaChart()
chart.style = 23
chart.title = "Book Prices by Type"
chart.x_axis.title = "Book Types"
chart.y_axis.title = "Prices"
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
chart.add_data(data, titles_from_data=True)
sheet.add_chart(chart, "E2")
def main():
workbook = Workbook()
sheet = workbook.active
create_excel_data(sheet)
create_chart(sheet)
workbook.save("area_chart_3d.xlsx")
if __name__ == "__main__":
main()
This code should look familiar to you if you read the last chapter. The first
function, create_excel_data(), creates a few rows of data using dummy
book price information. Then you have create_chart(), where you
instantiate AreaChart() and set the style and a couple of titles. You use
main() to create the Workbook and call the other two functions.
When you run this code, your spreadsheet will look like this:
Fig. 7-1: Creating an Area Chart
Doesn’t that look nice? You can experiment with different style values or
change the Reference object to limit what data OpenPyXL uses to generate
the chart.
Now you are ready to see what happens when you use a 3D area chart
instead!
3D Area Charts
The 3D area chart conveys the same information as a 2D area chart. The
difference is in the presentation. The chart appears to show the data in three
dimensions instead of two. How well it accomplishes that feat is up for
debate.
You can decide for yourself by creating a new file and naming it
area_chart_3d.py. Then add this code to your new file:
# area_chart_3d.py
rows = [
["Book", "Kindle", "Paperback"],
[2, 30, 40],
[3, 25, 40],
[4 ,30, 50],
[5 ,10, 30],
[6, 5, 25],
[7 ,10, 50],
]
chart = AreaChart3D()
chart.title = "Area Chart 3D"
chart.x_axis.title = "Books"
chart.y_axis.title = "Copies Sold"
chart.legend = None
sheet.add_chart(chart, "E2")
wb.save(filename)
if __name__ == "__main__":
main("area_chart_3d.xlsx")
The main difference between this code and the code in the previous
example is that you use AreaChart3D instead of AreaChart. The other
change is that you set the chart title differently and you add some
categories.
Go ahead and run the code. Your code will create the following spreadsheet:
Fig. 7-2: Creating a 3D Area Chart
Now your data is shown in 3D! You can use this type of chart to add a little
pizzazz to a presentation.
The next chart that you will learn about is the bar chart!
Bar Charts
Bar charts are one of the most common types of charts. A bar chart makes it
easy to show the difference in sales figures or populations. You can visually
compare these values and see their differences clearly in a bar chart.
Vertical
Horizontal
Stacked
The vertical version is probably the type of bar chart you will see most, and
it is the one you will learn about first.
Vertical Bar Charts
The humble vertical bar chart is the most common type of bar chart. You
have seen how to create one of these in the previous chapter. For
completeness, you will create one in this section too.
Open up your Python editor and create a new file. You can name it
bar_chart.py. Then enter the following code:
# bar_chart.py
def create_excel_data(sheet):
data_rows = [
('Number', 'Batch 1', 'Batch 2'),
(2, 10, 30),
(3, 40, 60),
(4, 50, 70),
(5, 20, 10),
(6, 10, 40),
(7, 50, 30),
]
def create_bar_chart(sheet):
bar_chart = BarChart()
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
def main():
workbook = Workbook()
sheet = workbook.active
create_excel_data(sheet)
create_bar_chart(sheet)
workbook.save("bar_chart.xlsx")
if __name__ == "__main__":
main()
In this example, you re-use the main() function from earlier in the chapter.
The create_excel_data() is similar to the earlier ones, although the data
is slightly different.
The screenshot above shows your standard vertically oriented bar chart. If
you like, you can change its style by setting the style attribute to a
different value, which you learned about in chapter 6.
Now you are ready to see how to transform this chart into a horizontally-
oriented bar chart!
A horizontal bar chart is where the bars grow from left to right instead of
from bottom to top. You will still use the BarChart class that you used in
the previous example.
However, to get it to grow horizontally instead of vertically, you must set
the bar chart’s type attribute to “bar”.
You can see how this works by creating a new file named
bar_chart_horizontal.py and entering the following code:
# bar_chart_horizontal.py
def create_excel_data(sheet):
data_rows = [
('Number', 'Batch 1', 'Batch 2'),
(2, 10, 30),
(3, 40, 60),
(4, 50, 70),
(5, 20, 10),
(6, 10, 40),
(7, 50, 30),
]
def create_bar_chart(sheet):
bar_chart = BarChart()
bar_chart.type = "bar"
bar_chart.style = 31
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
def main():
workbook = Workbook()
sheet = workbook.active
create_excel_data(sheet)
create_bar_chart(sheet)
workbook.save("bar_chart_horizontal.xlsx")
if __name__ == "__main__":
main()
This code looks almost exactly the same as the previous example. The part
you want to focus on here is as follows:
def create_bar_chart(sheet):
bar_chart = BarChart()
bar_chart.type = "bar"
bar_chart.style = 31
Here you set the bar_chart.type to “bar”. When you run this code, you
will see that the bar chart it creates is now horizontally oriented:
That looks a little different than the vertical bar chart. Both of these
versions of bar charts are easy to interpret. Give them each a try so you can
see which one works best for your data.
A stacked bar chart shows how a larger category is divided into smaller
ones and the relationship between each part and the total. OpenPyXL
supports two different types of stacked bar charts:
To create a regular stacked bar chart with OpenPyXL, you must set the
grouping attribute to “stacked”. Create a new file and name it
bar_chart_stacked.py. Then enter the following:
# bar_chart_stacked.py
def create_excel_data(sheet):
data_rows = [
('Number', 'Batch 1', 'Batch 2'),
(2, 10, 30),
(3, 40, 60),
(4, 50, 70),
(5, 20, 10),
(6, 10, 40),
(7, 50, 30),
]
def create_bar_chart(sheet):
bar_chart = BarChart()
bar_chart.type = "col"
bar_chart.style = 12
bar_chart.grouping = "stacked"
bar_chart.overlap = 100
bar_chart.title = 'Stacked Chart'
data = Reference(worksheet=sheet,
min_row=1,
max_row=7,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
def main():
workbook = Workbook()
sheet = workbook.active
create_excel_data(sheet)
create_bar_chart(sheet)
workbook.save("bar_chart_stacked.xlsx")
if __name__ == "__main__":
main()
This code is also similar to the other bar chart examples. You can focus in
on creare_bar_chart() to see what’s different:
def create_bar_chart(sheet):
bar_chart = BarChart()
bar_chart.type = "col"
bar_chart.style = 12
bar_chart.grouping = "stacked"
bar_chart.overlap = 100
bar_chart.title = 'Stacked Chart'
Here you set the bar chart type to “col” and the grouping to “stacked”.
You also set the overlap value to 100.
When you run this code, you will get the following chart:
If you want to see how a percent stacked bar chart differs from a regular
one, you will need to make some minor edits.
All you will need to do is change the following code to make a Percent
Stacked Bar Chart:
def create_bar_chart(sheet):
bar_chart = BarChart()
bar_chart.type = "bar"
bar_chart.style = 13
bar_chart.grouping = "percentStacked"
bar_chart.overlap = 100
bar_chart.title = 'Percent Stacked Chart'
data = Reference(worksheet=sheet,
min_row=1,
max_row=10,
min_col=2,
max_col=3)
bar_chart.add_data(data, titles_from_data=True)
sheet.add_chart(bar_chart, "E2")
Re-run the code, and your spreadsheet will update to look like this:
Now you know how to create stacked bar charts! Feel free to edit the data
or try them out with your own data sets.
The last bar chart you will learn about is the 3D bar chart!
3D Bar Charts
The 3D bar chart is kind of fun. You can use the 3D bar chart to make your
data stand out. While this bar chart version doesn’t add anything new to
your visualization, it is a nice effect.
# bar_chart_3d.py
def create_excel_data(sheet):
data_rows = [
["", "Kindle", "Paperback"],
["Python 101", 9.99, 25.99],
["Python 201: Intermediate Python", 9.99, 25.99],
["wxPython Cookbook", 9.99, 25.99],
["ReportLab: PDF Processing with Python", 4.99, 29.99],
["Jupyter Notebook 101", 4.99, 29.99],
["Creating GUI Applications with wxPython", 24.99, 29.99],
["Python Interviews", 24.99, 65.00],
["Pillow: Image Processing with Python", 24.99, 69.00],
["Automating Excel with Python", 24.99, 69.00],
]
def create_bar_chart(sheet):
bar_chart = BarChart3D()
bar_chart.title = "Book Prices by Type"
bar_chart.height = 20
bar_chart.width = 30
data = Reference(worksheet=sheet,
min_row=2,
max_row=10,
min_col=2,
max_col=3)
titles = Reference(sheet, min_col=1, min_row=2, max_row=10)
bar_chart.add_data(data, titles_from_data=True)
bar_chart.set_categories(titles)
sheet.add_chart(bar_chart, "E2")
def main():
workbook = Workbook()
sheet = workbook.active
create_excel_data(sheet)
create_bar_chart(sheet)
workbook.save("bar_chart_3d.xlsx")
if __name__ == "__main__":
main()
To make things a little more interesting, the data added to this chart is
slightly different from before (see create_excel_data()). The other
change is to import BarChart3D and use that instead of BarChart.
You set the height and width of the 3D bar chart to make it easier to view.
When you run this code, you will get the following spreadsheet:
Isn’t that a nice-looking chart? You should try changing the data around or
pick a different style to use.
The bubble chart requires at least three sets of values or columns of data:
The OpenPyXL documentation for the Bubble Chart has a good example.
You will use a variant of that code.
To get started, create a new file named bubble_chart.py and add the
following code:
# bubble_chart.py
def main():
workbook = Workbook()
sheet = workbook.active
rows = [
("Number of Products", "Sales in USD", "Market share"),
(14, 12200, 15),
(20, 60000, 33),
(18, 24400, 10),
(22, 32000, 42),
(),
(12, 8200, 18),
(15, 50000, 30),
(19, 22400, 15),
(25, 25000, 50),
]
chart = BubbleChart()
chart.style = 18
# add the first series of data
xvalues = Reference(sheet, min_col=1, min_row=2, max_row=5)
yvalues = Reference(sheet, min_col=2, min_row=2, max_row=5)
size = Reference(sheet, min_col=3, min_row=2, max_row=5)
series = Series(values=yvalues, xvalues=xvalues, zvalues=size,
title="2013")
chart.series.append(series)
if __name__ == "__main__":
main()
Once you finish creating a Series, you add it to the chart using
series.append(). When you run this code, your new spreadsheet will have
a bubble chart:
Fig. 7-8: Creating a Bubble Chart
Bubble charts are a fun way to display your data visually. You don’t see
bubble charts used very often, so keep them handy when you need to add a
little extra pop to your presentation or report.
Line Charts
OpenPyXL supports several different types of line charts:
A 2D line chart is a pretty common type of chart. You create line charts the
same way you do bar charts, except that you will use LineChart instead of
BarChart.
You can see how to make a line chart by creating a new file named
line_chart.py and entering this code:
# line_chart.py
def create_excel_data(sheet):
data_rows = [
["Book", "Kindle", "Paperback"],
[1, 9.99, 25.99],
[2, 9.99, 25.99],
[3, 9.99, 25.99],
[4, 4.99, 29.99],
[5, 4.99, 29.99],
[6, 24.99, 29.99],
[7, 24.99, 65.00],
[8, 24.99, 69.00],
[9, 24.99, 69.00],
]
def create_line_chart(sheet):
chart = LineChart()
chart.title = "Line Chart"
chart.style = 15
chart.y_axis.title = 'Sales'
chart.x_axis.title = 'Books'
sheet.add_chart(chart, "E2")
def main():
workbook = Workbook()
sheet = workbook.active
create_excel_data(sheet)
create_line_chart(sheet)
workbook.save("line_chart.xlsx")
if __name__ == "__main__":
main()
There isn’t anything truly new in this code. The only difference here is that
you are creating a LineChart object.
Go ahead and run this code, and you will have created your first line chart:
That looks pretty good! Now you’re ready to make your line chart three-
dimensional!
3D Line Charts
The 3D line chart is another type of line chart, only drawn such that it looks
three-dimensional. The only difference is that you need to use LineChart3D
to create one.
Open your Python editor back up and create a new file named
line_chart_3d.py. Then enter this code:
# line_chart_3d.py
def create_excel_data(sheet):
data_rows = [
["Book", "Kindle", "Paperback"],
[1, 9.99, 25.99],
[2, 9.99, 25.99],
[3, 9.99, 25.99],
[4, 4.99, 29.99],
[5, 4.99, 29.99],
[6, 24.99, 29.99],
[7, 24.99, 65.00],
[8, 24.99, 69.00],
[9, 24.99, 69.00],
]
def create_line_chart(sheet):
chart = LineChart3D()
chart.title = "Line Chart"
chart.style = 15
chart.y_axis.title = 'Sales'
chart.x_axis.title = 'Books'
sheet.add_chart(chart, "E2")
def main():
workbook = Workbook()
sheet = workbook.active
create_excel_data(sheet)
create_line_chart(sheet)
workbook.save("line_chart_3d.xlsx")
if __name__ == "__main__":
main()
This code is the same as the previous example except for the new import.
Go ahead and run this example to see what a 3D line chart looks like:
Fig. 7-11: Creating a 3D Line Chart
Now you know how to create two different kinds of line charts. The next
chart type to learn about is the scatter chart!
Scatter Charts
The scatter chart is a type of plot that uses Cartesian coordinates to display
values for two variables for a set of data. Your data is a collection of points.
The two variables define the horizontal and vertical positions of each point
on the chart.
OpenPyXL draws a line between the points by default in its scatter charts,
so they look like a line chart. You can see this happen by creating a new
Python file named scatter_chart.py and entering this code:
# scatter_chart.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
rows = [
["Book", "Kindle", "Paperback"],
[1, 9.99, 25.99],
[2, 9.99, 25.99],
[3, 9.99, 25.99],
[4, 4.99, 29.99],
[5, 4.99, 29.99],
[6, 24.99, 29.99],
[7, 24.99, 65.00],
[8, 24.99, 69.00],
[9, 24.99, 69.00],
]
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
sheet.add_chart(chart, "E2")
workbook.save(filename)
if __name__ == "__main__":
main("scatter_chart.xlsx")
This code creates the values to plot by creating a Reference that you name
xvalues. You then loop over a small range to create a couple of Series
objects that will plot out your chart using the xvalues you created.
When you run this code, your spreadsheet will look like this:
Fig. 7-12: Creating a Scatter Chart
Feel free to try a different data set with your scatter chart.
Pie Charts
A pie chart is a type of graph that is a circle divided into sections that
represent a proportion of the whole. OpenPyXL supports three different
types of pie charts:
Pie charts are a great way to show percentages in a population. You will
learn how to create a pie chart now!
Regular Pie Charts
You can create a pie chart with OpenPyXL using the PieChart class. The
data that you add to a pie chart does not have to add up to 100. The
PieChart class will take the data and determine the percentage of each
category based on the total.
Open up your Python IDE and create a new file named pie_chart.py. Now
you can add this code to create your pie chart:
# pie_chart.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = [
['Pizza', 'Sold'],
['Pepperoni', 50],
['Sausage', 30],
['Cheese', 10],
['Supreme', 40],
]
chart = PieChart()
chart.title = "Pizza Pie Chart"
labels = Reference(sheet, min_col=1, min_row=2, max_row=5)
data = Reference(sheet, min_col=2, min_row=1, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(labels)
sheet.add_chart(chart, "E2")
workbook.save(filename)
if __name__ == "__main__":
main("pie_chart.xlsx")
You don’t need to use a Series object with a pie chart. You only need a
couple of Reference objects. However, if you want to pop out a slice of the
pie chart, you need to use a DataPoint object.
To use the DataPoint object, you need to create some code like this:
When you run the full code above, you will get a spreadsheet with a pie
chart that looks like this:
Now that you have created a normal pie chart, it’s time to learn about the
projected pie chart variety!
Projected Pie Charts
Projected pie charts are a little different than a regular pie chart. According
to the documentation, a projected pie chart will “extract some slices from a
pie chart and project them into a second pie or bar chart”.
You will take the projected pie chart example from the OpenPyXL
documentation and modify it slightly for your code. Create a new file
named projected_pie_chart.py and add the following:
# projected_pie_chart.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = [
['Page', 'Views'],
['Search', 95],
['Products', 4],
['Offers', 0.5],
['Sales', 0.5],
]
projected_pie = ProjectedPieChart()
projected_pie.type = "pie"
projected_pie.splitType = "val" # split by value
labels = Reference(sheet, min_col=1, min_row=2, max_row=5)
data = Reference(sheet, min_col=2, min_row=1, max_row=5)
projected_pie.add_data(data, titles_from_data=True)
projected_pie.set_categories(labels)
sheet.add_chart(projected_pie, "E2")
projected_bar = deepcopy(projected_pie)
projected_bar.type = "bar"
projected_bar.splitType = 'pos' # split by position
sheet.add_chart(projected_bar, "E19")
workbook.save(filename)
if __name__ == "__main__":
main("project_pie_chart.xlsx")
This code will create two projected pie charts. The first projected pie chart
will project a pie chart, while the second will project a bar chart. You can
specify the project by setting the type attribute to “pie” or “bar”. You can
also set the splitType to be split by value (“val”) or split by position
(“pos”).
When you run this example, your spreadsheet will look like the following
screenshot:
The projected pie chart is unique from the other charts in that it is kind of a
chart within a chart. You should try editing the code and experimenting with
the values that you pass to the project pie chart. It’s the best way to
familiarize yourself with how it functions.
The last pie chart type to learn is the 3D pie chart.
3D Pie Charts
You can create a pie chart with a 3D effect applied by using PieChart3D.
To get started, create a new file named pie_chart_3d.py and add this code
to your file:
# pie_chart_3d.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = [
['Pizza', 'Sold'],
['Pepperoni', 50],
['Sausage', 30],
['Cheese', 10],
['Supreme', 40],
]
pie = PieChart3D()
labels = Reference(sheet, min_col=1, min_row=2, max_row=5)
data = Reference(sheet, min_col=2, min_row=1, max_row=5)
pie.add_data(data, titles_from_data=True)
pie.set_categories(labels)
pie.title = "Pizza Pies sold by type"
sheet.add_chart(pie, "E2")
workbook.save(filename)
if __name__ == "__main__":
main("pie_chart_3d.xlsx")
This code is the same as the first pie chart code you saw in this section,
except you are using PieChart3D here. When you execute this code, you
will end up with this spreadsheet:
Fig. 7-15: Creating a 3D Pie Chart
The spreadsheet above shows you what the pie chart looks like with a 3D
effect applied to it.
Doughnut Charts
A doughnut chart is closely related to pie charts. The pie chart is a circle,
while the doughnut chart is a ring or series of concentric rings.
For this example, you will take the code from OpenPyXL’s documentation
and modify it a bit to use some of the sample data that you have used in
some of your previous examples.
To get started, create a new file named doughnut_chart.py, then enter this
code:
# doughnut_chart.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = [
['Books', 2019, 2020],
['Python 101', 40, 50],
['Python 201', 2, 10],
['ReportLab', 20, 30],
['Jupyter', 30, 40],
]
chart = DoughnutChart()
labels = Reference(sheet, min_col=1, min_row=2, max_row=5)
data = Reference(sheet, min_col=2, min_row=1, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(labels)
chart.title = "Books sold by title"
chart.style = 26
sheet.add_chart(chart, "E1")
chart2 = deepcopy(chart)
chart2.title = None
data = Reference(sheet, min_col=3, min_row=1, max_row=5)
series2 = Series(data, title_from_data=True)
series2.data_points = slices
chart2.series.append(series2)
sheet.add_chart(chart2, "E17")
workbook.save(filename)
if __name__ == "__main__":
main("doughnut_chart.xlsx")
This code will create two doughnut charts. They both pull from the same
dataset, but they load it differently using different rows and columns. In the
first doughnut chart, you split out the slices to apply custom colors to each
slice. You also set the explosion value of the chocolate ring to 10. That
will separate the chocolate portion of the doughnut from the rest of the
doughnut.
The second doughnut chart doesn’t have any customization applied to it.
You don’t set the explosion either. By using the defaults, you get to see
what a normal doughnut chart looks like:
The doughnut chart is a fun way to make a pie chart look different. If you
need an alternative way to showing your data, you may want to consider
using the doughnut instead of the pie chart.
Radar charts are the next chart type that you will discover.
Radar Charts
Radar charts compare the aggregate values of multiple data series.
OpenPyXL describes them as a “projection of an area chart on a circular x-
axis”. OpenPyXL supports two versions of the radar chart:
There is a third type called “marker”, but it has no effect. You can add
markers via the Series object.
You will be taking the radar chart example from the OpenPyXL
documentation and modifying it slightly for your code. Open up a new file
and name it radar_chart.py. Then enter the following code:
# radar_chart.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = [
['Month', "Bulbs", "Seeds", "Flowers", "Trees & shrubs"],
['Jan', 0, 2500, 500, 0,],
['Feb', 0, 5500, 750, 1500],
['Mar', 0, 9000, 1500, 2500],
['Apr', 0, 6500, 2000, 4000],
['May', 0, 3500, 5500, 3500],
['Jun', 0, 0, 7500, 1500],
['Jul', 0, 0, 8500, 800],
['Aug', 1500, 0, 7000, 550],
['Sep', 5000, 0, 3500, 2500],
['Oct', 8500, 0, 2500, 6000],
['Nov', 3500, 0, 500, 5500],
['Dec', 500, 0, 100, 3000 ],
]
sheet.add_chart(chart, "G2")
workbook.save(filename)
if __name__ == "__main__":
main("radar_chart.xlsx")
When you create your radar chart, you choose to use the “filled” version.
Then you create a couple of references, one to contain the labels and one
for the data. Then you set up the chart.
Run your code, and the spreadsheet that it creates will look like this:
By default, OpenPyXL draws its surface charts in 3D. You can create 2D
surface charts by setting the rotation and perspective in the chart, if you so
desire.
Create a new Python file named surface_chart.py that contains this code:
# surface_chart.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
data = [
[None, 10, 20, 30, 40, 50,],
[0.1, 15, 65, 105, 65, 15,],
[0.2, 35, 105, 170, 105, 35,],
[0.3, 55, 135, 215, 135, 55,],
[0.4, 75, 155, 240, 155, 75,],
[0.5, 80, 190, 245, 190, 80,],
[0.6, 75, 155, 240, 155, 75,],
[0.7, 55, 135, 215, 135, 55,],
[0.8, 35, 105, 170, 105, 35,],
[0.9, 15, 65, 105, 65, 15],
]
chart1 = SurfaceChart()
ref = Reference(sheet, min_col=2, max_col=6, min_row=1, max_row=10)
labels = Reference(sheet, min_col=1, min_row=2, max_row=10)
chart1.add_data(ref, titles_from_data=True)
chart1.set_categories(labels)
chart1.title = "Contour"
sheet.add_chart(chart1, "A12")
# wireframe
chart2 = deepcopy(chart1)
chart2.wireframe = True
chart2.title = "2D Wireframe"
sheet.add_chart(chart2, "G12")
# 3D Surface
chart3 = SurfaceChart3D()
chart3.add_data(ref, titles_from_data=True)
chart3.set_categories(labels)
chart3.title = "Surface"
sheet.add_chart(chart3, "A29")
chart4 = deepcopy(chart3)
chart4.wireframe = True
chart4.title = "3D Wireframe"
sheet.add_chart(chart4, "G29")
workbook.save(filename)
if __name__ == "__main__":
main("surface_chart.xlsx")
You can set the wireframe attribute to True for the SurfaceChart and the
SurfaceChart3D objects to enable wireframe drawing.
When you run this code, you will get the following spreadsheet:
Fig. 7-18: Creating Surface Charts
Aren’t those charts fun? They are very different than the other charts in this
chapter.
Wrapping Up
You have learned about a wide variety of chart types in this chapter.
Specifically, you have learned how to create:
Area Charts
Bar Charts
Bubble Charts
Line Charts
Scatter Charts
Pie Charts
Doughnut Charts
Radar Charts
Surface Charts
You can use these charts to visualize a large number of different data types.
From sales figures to populations and beyond, you will be able to plot your
data in one or more of these fun charts. Your spreadsheets will look much
more interesting when they contain charts.
Give these charts a try. Experiment with them using the knowledge you
gained in the previous chapter. Soon you will have your amazing custom
charts!
OceanofPDF.com
Chapter 8 - Converting CSV to
Excel
There are many common file types that you will need to work with as a
software developer. One such format is the CSV file. CSV stands for
“Comma-Separated Values” and is a text file format that uses a comma as a
delimiter to separate values from one another. Each row is its own record
and each value is its own field. Most CSV files have records that are all the
same length.
Microsoft Excel opens CSV files with no problem. You can open one
yourself with Excel and then save it yourself in an Excel format. The
purpose of this chapter is to teach you the following concepts:
You will be using Python to do the conversion from one file type to the
other.
book_title,author,publisher,pub_date,isbn
Python 101,Mike Driscoll, Mike Driscoll,2020,123456789
wxPython Recipes,Mike Driscoll,Apress,2018,978-1-4842-3237-8
Python Interviews,Mike Driscoll,Packt Publishing,2018,9781788399081
Save this file as books.txt. You can also download the CSV file from this
book’s GitHub code repository.
Now that you have the CSV file, you need to create a new Python file too.
Open up your Python IDE and create a new file named csv_to_excel.py.
Then enter the following code:
# csv_to_excel.py
import csv
import openpyxl
workbook = openpyxl.Workbook()
sheet = workbook.active
for row in csv_data:
sheet.append(row)
workbook.save(excel_file)
if __name__ == "__main__":
csv_to_excel("books.csv", "converted_books.xlsx")
Your code uses Python’s csv module in addition to OpenPyXL. You create
a function, csv_to_excel(), that accepts two arguments:
You want to extract each row of data from the CSV. To extract the data, you
create a csv.reader() object and then iterate over one row at a time. For
each iteration, you append the row to csv_data. A row is a list of strings.
The next step of the process is to create the Excel spreadsheet. To add data
to your Workbook, you iterate over each row in csv_data and append()
them to your Worksheet. Finally, you save the Excel spreadsheet.
When you run this code, you will have an Excel spreadsheet that looks like
this:
Fig. 8-1: Converted CSV to Excel Spreadsheet
You are now able to convert a CSV file to an Excel spreadsheet in less than
twenty-five lines of code!
Now you are ready to learn how to convert an Excel spreadsheet to a CSV
file!
You can convert an Excel spreadsheet to a CSV file using Python. Create a
new file named excel_to_csv.py and add the following code:
# excel_to_csv.py
import csv
import openpyxl
# Write to CSV
with open(csv_file, 'w') as csv_file_obj:
writer = csv.writer(csv_file_obj, delimiter=',')
for line in csv_data:
writer.writerow(line)
if __name__ == "__main__":
excel_to_csv("books.xlsx", "new_books.csv")
Once again you only need the csv and openpyxl modules to do the
conversion. This time, you load the Excel spreadsheet first and iterate over
the Worksheet using the iter_rows method. The value you receive in each
iteration of iter_tools is a list of strings. You append the list of strings to
csv_data.
The next step is to create a csv.writer(). Then you iterate over each list of
strings in csv_data and call writerow() to add it to your CSV file.
Once your code finishes, you will have a brand new CSV file!
Wrapping Up
Converting a CSV file to an Excel spreadsheet is easy to do with Python.
It’s a useful tool that you can use to take in data from your clients or other
data sources and transform it into something that you can present to your
company.
You can apply cell styling to the data as you write it to your Worksheet too.
By applying cell styling, you can make your data stand out with different
fonts or background row colors.
Try this code out on your own Excel or CSV files and see what you can do.
OceanofPDF.com
Chapter 9 - Using Pandas with
Excel
OpenPyXL is not the only tool you can use to read and write Excel
spreadsheets. One popular package used by the scientific Python
community that can read and write Excel spreadsheets is Pandas.
If you look in the Pandas documentation, you will discover that it is using
the following Python packages for reading and writing Excel though:
Pandas doesn’t have Excel file manipulation built-in. Instead, Pandas is off-
loading that functionality to one or more external Python packages. When
you search for reading and writing Excel files, you will discover that
Pandas is a popular choice. The reason is that Pandas is incredibly popular
for data analysis, and it provides a thin wrapper around other packages.
Pandas makes working with Excel easier for some developers because they
already use Pandas and don’t need to switch to another tool.
In this chapter, you will learn how to use Pandas to do the following:
Your first step is to get Pandas installed so that you can start using it to read
and write Excel!
Install Pandas and Dependencies
Installing Pandas is nice and easy with pip. All you need to do is the
following:
If you are using Anaconda, you will want to use the conda package manager
instead. Here is the proper conda command:
Once you have Pandas installed, you should be good to go if you plan to
work with the xlsx file type. You should already have OpenPyXL installed.
If you do not have OpenPyXL installed, then the code in this chapter will
not work. If you plan to work with the xls file type, you will need to install
xlrd and xlwt. You can install these additional packages using pip or
conda.
Now that you have everything you need, you can move on and learn how to
read an Excel spreadsheet with Pandas!
Now open up your Python editor and create a new file named
read_excel.py. Then enter the following code:
# read_excel.py
import pandas as pd
df = pd.read_excel('books.xlsx')
print(df)
Here you import the pandas package as pd where pd is an alias for pandas.
Then you call read_excel() and give it the path to your Excel file. The
read_excel() function returns a DataFrame, which is a two-dimensional
data structure for holding tabular data.
Finally, you print out the DataFrame. The following shows what the result
of running this code displays:
[3 rows x 5 columns]
You can use Pandas to load up a data set that you have stored in an Excel
spreadsheet. Once you convert the spreadsheet to a Pandas DataFrame, you
can analyze the data.
Pandas will also let you choose which sheets to load. You will find out how
to do load different sheets next!
Instead of reading the entire spreadsheet as you did in the last example, this
time you will pass in the sheet_name as a parameter to read_excel(). The
sheet_name can be used to specify which sheet you want to load, either by
index or by sheet name.
To see how this works, create a new Python file named
read_excel_by_sheet.py. Then enter this code:
# read_excel_by_sheet.py
import pandas as pd
print(sheet_one_data)
print(sheet_two_data)
This code will print out two DataFrames, one for each sheet when you run
this code. The output will look like this:
[4 rows x 7 columns]
Title Amazon Leanpub Gumroad
0 Python 101 100 433 10
1 wxPython Recipes 5 0 0
2 Python Interviews 10 0 0
Now it’s your turn! Try this code with your own Excel spreadsheets and
modify it to work with your sheet names. You will quickly learn to use
Pandas to read Excel files by testing this code yourself with your data.
You are now ready to learn how to turn Pandas DataFrames into an Excel
spreadsheet!
Write DataFrames to Excel
When you are doing data analysis with Pandas, you will be working with
DataFrames. However, when you finish your analysis, you may want to
share the results with non-technical people. Pandas provides the
to_excel() method for this purpose. You can convert your DataFrames to
Excel!
Create a new file named create_excel.py and then enter the following
code:
# create_excel.py
import pandas as pd
Here you create a Pandas DataFrame. The index parameter contains the first
item in each row, starting in row two. The columns parameter contains the
column labels, starting in column B. The other lists are mapped as rows in
Excel when running the to_excel() method. You set the sheet_name
yourself here, but the sheet name would default to Sheet1 if you didn’t.
When you run this code, you will get the following Excel spreadsheet:
Fig. 9-1: Converted Pandas DataFrame to Excel Spreadsheet
If you have multiple DataFrames that you want to convert to Excel, Pandas
provides the ExcelWriter() class. The ExcelWriter() provides an object
that is accepted by the to_excel() method.
# create_excel_sheets.py
import pandas as pd
def create_multiple_sheets(path):
df = pd.DataFrame(
[[100, 433, 10], [34, 10, 0], [75, 125, 5]],
index=["Python 101", "Python 201", "wxPython"],
columns=["Amazon", "Leanpub", "Gumroad"],
)
df2 = pd.DataFrame(
[[150, 233, 5], [5, 15, 0], [10, 120, 5]],
index=[
"Jupyter Notebook",
"Python Interview",
"Pillow: Image Processing with Python",
],
columns=["Amazon", "Leanpub", "Gumroad"],
)
with pd.ExcelWriter(path) as writer:
df.to_excel(writer, sheet_name="Books")
df2.to_excel(writer, sheet_name="More Books")
if __name__ == "__main__":
create_multiple_sheets("pandas_to_excel_sheets.xlsx")
In this example, you have two DataFrames. When you turn them into an
Excel spreadsheet, you create an ExcelWriter object that you pass to each
of the DataFrame instance’s to_excel() method. You convert each
DataFrame into a Worksheet in the Excel spreadsheet.
When you run this code, you will end up with a spreadsheet that contains
two Worksheets and looks like this:
OpenPyXL has support for the Pandas DataFrame built-in. All you need to
do is import dataframe_to_rows from openpyxl.utils.dataframe and it
will convert a DataFrame to a row that is added to your OpenPyXL
Worksheet object.
To see how that works, create a new file named
add_dataframes_to_excel.py and enter the following code:
# add_dataframes_to_excel.py
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
def add_dataframes_to_excel(path):
workbook = openpyxl.Workbook()
sheet = workbook.active
df = pd.DataFrame(
[[100, 433, 10], [34, 10, 0], [75, 125, 5]],
index=["Python 101", "Python 201", "wxPython"],
columns=["Amazon", "Leanpub", "Gumroad"],
)
if __name__ == "__main__":
add_dataframes_to_excel("df_to_excel.xlsx")
This code probably looks familiar. You create a Workbook() object and
extract the active Worksheet. Then you have the Pandas DataFrame from
earlier. The for loop is the new code where you convert the DataFrame to a
series of rows that you can append() to your Worksheet.
Running this code will create an Excel spreadsheet that looks the same as
the one you created at the beginning of this section.
# csv_to_excel_pandas.py
import pandas as pd
if __name__ == "__main__":
csv_to_excel("books.csv", "pandas_csv_to_excel.xlsx", "Books")
Here you read in the CSV file using the read_csv() method, which takes
the path to the CSV file (or a file-like object) and converts the file to a
Pandas DataFrame. Next, you call to_excel() to convert the DataFrame to
an Excel spreadsheet.
Wrapping Up
Millions of users use Pandas to do data analysis and number crunching.
This chapter doesn’t even begin to scratch the surface of what you can do
with Pandas. However, the purpose of this chapter is to teach you how to
use Pandas to read and write Excel spreadsheets.
Spend some time reading your spreadsheets with Pandas. If you are already
familiar with Pandas and have some old code, try turning your existing
DataFrames into Excel. Practicing with Pandas will help cement these ideas
in your mind.
OceanofPDF.com
Chapter 10 - Python and Google
Sheets
There are alternatives to using Microsoft Excel for managing your data in
spreadsheets. One popular option is to use Google Sheets, which is an
online option. Google Sheets provides many of the same features that
Microsoft Excel does but is primarily only in the cloud. Microsoft has a
cloud version of Excel in their Office 365 offering.
There are several different Python packages you can use to connect to
Google Sheets. For this chapter, you will focus on one called gspread. You
can learn more about gspread by checking out their GitHub repository.
Install gspread
Create credentials with Google
Create a new Google Sheet
Read Google Sheets
Update Google Sheets
Delete Google Sheets
Now that you know what you will learn, it’s time to get started!
Install gspread
The gspread package does not come with the Python programming
language, so you need to install it yourself. You can do that using the pip
utility.
Now that you have gspread and oauth2client installed, you will need to
set up your Google credentials and scopes so you can access Google Sheets
with Python!
https://docs.gspread.org/en/latest/oauth2.html
https://developers.google.com/sheets/api/quickstart/python
You should follow the directions in the links above to make sure you
have everything set up that you need before proceeding.
These are the primary steps that you need to go through. These steps may
change, so refer to the links above if the following instructions don’t work
for you.
Your first step is to open up the Google Cloud Platform (GCP) site, then
create a new project. If you don’t already have any projects of your own in
GCP, it will look something like this:
After clicking New Project, you give it a name. For this tutorial, you can
call it PySpread:
Some of the websites that give instructions on this topic say that you should
Enable APIs. Make sure that you have your new project selected so that you
enable the APIs on the correct project.
Specifically, the APIs you would want to enable are Google Sheets and
Google Drive. These steps seem to be optional, depending on how you plan
to connect to Google Sheets. If you connect as a Service Account, this step
is unnecessary. But it does appear to be required if you connect to Google
without a Service Account.
For completeness, you will learn how to Enable APIs in Google. Click
Enable APIs and Services:
When you click that link / button, you will end up at a screen that looks like
this:
Search for Google Sheets, and you should be able to find it:
Fig. 10-5: Google Sheets (Optional)
)
After adding Google Sheets, go back and search for Google Drive:
Now that you have the APIs you need for your application, you need to
create credentials:
It would be best if you gave the Service Account a name as well. For
consistency, you can name it pyspread in lowercase:
Fig. 10-9: Name Service Account Credentials
You can hit Done at this point as the other configuration steps here are
optional.
You are almost done! However, you need to do one more step to make the
Service Account work for you. Click on the little pencil-like icon on the far
right. It will let you edit your Service Account:
After downloading the file, you can place it in the same folder as your
gspread application code will go in. Or you can remember to use the
absolute path to the JSON file when you call your gspread application.
Note that Google does have a warning against using the JSON file for a
Service Account as the credentials do not expire. Use at your own risk or go
back to Google’s site and learn how to connect to Google Sheets using a
more complicated process.
Now you are ready to start working with Google Sheets using Python!
# hello_gsheet.py
import gspread
from oauth2client.service_account import ServiceAccountCredentials
def authenticate(credentials):
"""
Authenticate with Google and get the client object
"""
scope = [
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/drive.file",
"https://www.googleapis.com/auth/drive",
]
creds = ServiceAccountCredentials.from_json_keyfile_name(credentials,
scope)
client = gspread.authorize(creds)
return client
def main():
"""
Create a Google Sheet
"""
# Pass in the file location for the JSON file you created
client = authenticate("pyspread.json")
try:
workbook = client.open("test")
print("Test Sheet opened")
except:
workbook = client.create("test")
print("Test Sheet created")
sheet = workbook.sheet1
sheet.update("A1", [[1, 2], [3, 4]])
workbook.share("YOUR_EMAIL_ADDRESS", perm_type="user", role="writer")
if __name__ == "__main__":
main()
Next, you add data to cells A1, A2, B1 and B2. You then use the share()
method to send your email the “test” Google Sheet.
If you open your new Google Sheet, it will look like this:
Now you know how to create a Google Sheet with Python! You are ready to
learn how to read data from Google Sheets!
Read Google Sheets
Reading a Google Sheet requires you to use the same code you wrote in the
previous section. Once you have the Google client object, you have three
different options for opening the Google Sheet.
workbook = client.open("test")
You can also open a Google Sheet by key. The key comes from the URL,
which you can extract yourself:
workbook = client.open_by_key("YOUR_HASHED_GOOGLE_KEY")
workbook = client.open_by_url('https://docs.google.com/spreadsheet/ccc?
key=0Bm...END\
_OF_KEY')
For this tutorial, you will use open() since you already know the name of
the Google Sheet that you wish to read. Open up your Python editor and
create a new file named read_gsheet.py. Then enter the following code:
# read_gsheet.py
import gspread
from oauth2client.service_account import ServiceAccountCredentials
def authenticate(credentials):
scope = [
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/drive.file",
"https://www.googleapis.com/auth/drive",
]
creds = ServiceAccountCredentials.from_json_keyfile_name(credentials,
scope)
client = gspread.authorize(creds)
return client
def main():
client = authenticate("pyspread.json")
try:
workbook = client.open("test")
print("Test Sheet opened")
except Exception:
print("Error opening test spreadsheet")
return
sheet = workbook.sheet1
print("Worksheets: " + str(workbook.worksheets()))
print(f"All values on row 1: {sheet.row_values(1)}")
print(f"All values in column 1: {sheet.col_values(1)}")
print(f"All values in worksheet: {sheet.get_all_values()}")
# Get all values from worksheet as a list of dictionaries
values = sheet.get_all_records()
print(f"All records: {values=}")
if __name__ == "__main__":
main()
This code may look a little familiar. The authenticate() function remains
the same, but you did update your main() function. In this example, you
attempt to open the “test” Google Sheet. If not found, you print an error and
exit the program.
If the worksheet is found, then you extract data from it. You use the
following methods:
When you run this code, you will see the following output:
value = sheet.acell('B1').value
Alternatively, you can also get the value by using row and column
coordinates:
If the cell contains a formula that you want to extract, you would do one of
the following:
Now that you have a good foundation in reading Google Sheets, you are
ready to learn about updating their data!
The first method uses the update() method. You pass in the name of the
cell and its new value:
You can also use row and column coordinates to set a cell’s value using the
update_cell() method:
def main():
client = authenticate("pyspread.json")
try:
workbook = client.open("test")
print("Test Sheet opened")
except:
workbook = client.create("test")
print("Test Sheet created")
sheet = workbook.sheet1
workbook.del_worksheet(sheet)
Sometimes you may only need to clear the worksheet of its data. In that
case, you can use the worksheet’s clear() method:
sheet.clear()
sheet.batch_clear(["A1:B1", "C2:E2"])
Now you know the basics of deleting and clearing a Google Sheet!
Wrapping Up
This chapter taught you how to connect Python to Google Sheets using the
gspread package. Specifically, you learned about the following:
Install gspread
Create credentials with Google
Create a new Google Sheet
Read Google Sheets
Update Google Sheets
Delete Google Sheets
This chapter does not cover everything that you can do with gspread. The
gspread package allows you to search for cells as well as format the cells.
There is also integration support for NumPy and Pandas in gspread. See the
gspread documentation for more details.
In the meantime, practice what you have learned in this chapter and see
how well you can automate Google Sheets with Python!
OceanofPDF.com
Chapter 11 - XlsxWriter
OpenPyXL isn’t the only Python package for Excel. There are many others.
One of the best, most full-featured alternatives to OpenPyXL is XlsxWriter.
This chapter aims to give you an overview of what you can do with
XlsxWriter. However, it will not be exhaustive because XlsxWriter has
many of the same features plus a few different ones that OpenPyXL does
not.
Before you can start using XlsxWriter, you need to install it.
Installation
XlsxWriter doesn’t come with Python, so you will need to install it. You can
use Python’s pip utility to get XlsxWriter on your system by using the
following command:
Now you have XlsxWriter installed on your computer. You are ready to get
started by creating your first spreadsheet with XlsxWriter!
Creating an Excel Spreadsheet
Creating an Excel Spreadsheet with XlsxWriter can be done with only a few
lines of code. To see how this works, you will need to open your Python
editor and create a new file named create_spreadsheet.py.
# create_spreadsheet.py
import xlsxwriter
def create_workbook(path):
workbook = xlsxwriter.Workbook(path)
sheet = workbook.add_worksheet(name="Hello")
data = [["Python 101", 1000],
["Jupyter Notebook 101", 400],
["ReportLab: PDF Processing", 250]
]
row = 0
col = 0
for book, sales in data:
sheet.write(row, col, book)
sheet.write(row, col + 1, sales)
row += 1
sheet2 = workbook.add_worksheet(name="World")
workbook.close()
if __name__ == "__main__":
create_workbook("hello.xlsx")
The first step in writing your new code is to import xlsxwriter itself. Then
you create a xlsxwriter.Workbook() object, which takes the file path of
the Excel spreadsheet that you plan to save.
You then call add_worksheet() with the name of the worksheet that you
are adding. The call to add_worksheet() returns a Worksheet object. Since
an empty spreadsheet is boring, you create some data and then loop over it.
In each iteration, you write() a row of data to the Worksheet object
(sheet).
At the end of your code, you create a second blank worksheet and then save
the workbook.
When you run this code, you will create an Excel spreadsheet named
hello.xlsx. If you open the spreadsheet, it will look like this:
Isn’t that a lovely spreadsheet? Good job! Now you’re ready to learn how to
add some formatting to your cells!
Formatting Cells
Excel lets you format every cell in your spreadsheet differently. The
XlsxWriter package also gives you the ability to programmatically format
each cell differently. You can make your spreadsheet more attractive
looking by adding formatting to your cells.
In this example, you will learn to make text bold and apply some number
formatting to your cells. To get started, create a new file named
formatting_spreadsheet.py and then add this code to it:
# formatting_spreadsheet.py
import xlsxwriter
def format_data(path):
workbook = xlsxwriter.Workbook(path)
sheet = workbook.add_worksheet(name="Hello")
data = [["Python 101", 1000, 9.99],
["Jupyter Notebook 101", 400, 14.99],
["ReportLab: PDF Processing", 250, 24.99]
]
# Add headers
sheet.write("A1", "Book Title", bold)
sheet.write("B1", "Copies Sold", bold)
sheet.write("C1", "Book Price", bold)
row = 1
col = 0
for book, sales, amount in data:
sheet.write(row, col, book)
sheet.write(row, col + 1, sales)
sheet.write(row, col + 2, amount, money)
row += 1
workbook.close()
if __name__ == "__main__":
format_data("formatting.xlsx")
This code starts out much like the last example. But this time around, you
add one additional piece of data per row, namely the “Book Price”.
The next step is to add header columns that describe what is in the
following rows. For the header columns, you apply the bold formatting to
make them stand out. Then you look over the data you created and add each
row to the worksheet. The last write() call that you make in this code uses
the money formatting object to apply formatting to the “Book Price” to have
a decimal place and a dollar sign.
Try running this code and then open the Excel spreadsheet that it generates.
Your new spreadsheet will look like this:
This spreadsheet looks nice and professional now. You should try to figure
out how to create a totals field and add that to your spreadsheet.
Now you are ready to learn how to add a chart using XlsxWriter!
Adding a Chart
Creating a chart is a good way to visualize your data using Microsoft Excel.
The XlsxWriter package can create charts for you as well. For this example,
you will create a line chart.
To get started, you will need to create a new Python file named
line_chart.py. Then enter the following code:
# line_chart.py
import xlsxwriter
def chart(path):
workbook = xlsxwriter.Workbook(path)
sheet = workbook.add_worksheet(name="Hello")
if __name__ == "__main__":
chart("line_chart.xlsx")
In this example, you create the Workbook and worksheet objects the same
way you did in the previous examples. Then you add some data using the
write_column() method. The write_column() method allows you to write
out data using “A1” (or named cell) syntax.
Finally, you pass the chart location and the chart object to insert_chart().
When you run this code and open the Excel spreadsheet, you will see the
following:
Fig. 11-3: Adding a Line Chart with XslxWriter
Creating Sparklines
A Sparkline is a small chart that fits into a single cell. In contrast, regular
charts are placed over multiple cells. You can add a Sparkline by using the
worksheet’s add_sparkline() method. Sparklines were added to Microsoft
Excel in the 2010 edition.
To see how this works, create a new Python file named sparklines.py and
enter this code:
# sparklines.py
import xlsxwriter
def sparklines(path):
workbook = xlsxwriter.Workbook(path)
sheet = workbook.add_worksheet(name="Sparky")
spark_styles = [{'range': 'Sparky!A1:E1',
'markers': True},
{'range': 'Sparky!A2:E2',
'type': 'column',
'style': 12},
{'range': 'Sparky!A3:E3',
'type': 'win_loss',
'negative_points': True}
]
workbook.close()
if __name__ == "__main__":
sparklines("sparklines.xlsx")
This code shows you how to create three different types of Sparklines. The
first Sparkline uses markers. The second uses style 12 and is of type
“column”. The third is of type “win_loss” and you also turn on
negative_points.
To add the Sparklines to your spreadsheet, you loop over a nested list of
lists and add each row to the spreadsheet. You also add a Sparkline and
specify which spark style to use.
When your code finishes, you should end up with a spreadsheet that looks
something like this:
Data Validation
XslxWriter has decent documentation for their data validation support. You
can use data validation to validate that the data the user enters is correct.
For this example, you will create a cell that only accepts integers between
the values of 1 and 15. Create a new file named data_validation.py and
enter the following code:
# data_validation.py
import xlsxwriter
def validate(path):
workbook = xlsxwriter.Workbook(path)
sheet = workbook.add_worksheet()
sheet.set_column('A:A', 34)
sheet.set_column('B:B', 15)
header_format = workbook.add_format(
{
"border": 1,
"bg_color": "#33f3ff",
"bold": True,
"text_wrap": True,
"valign": "vcenter",
"indent": 1,
}
)
if __name__ == "__main__":
validate("validation.xlsx")
The first new piece of code here is the set_column() method. You can use
set_column() to set a column’s width. In this example, you set the A and B
column’s to different widths.
header_format = workbook.add_format(
{
"border": 1,
"bg_color": "#33f3ff",
"bold": True,
"text_wrap": True,
"valign": "vcenter",
"indent": 1,
}
)
This code allows you to add a border, set the background color, bold the
text, and more. Study this dictionary carefully to see what all it does for
you. After you define this code, you apply it to cells A1 and B1.
When you run this code, your spreadsheet will look like this:
Fig. 11-5: Adding Data Validation with XslxWriter
Try entering an integer that is outside the range of 1-15. Here is what
happens when you enter 18:
You get an alert message telling you that you can’t do that! Check out
XlxsWriter’s documentation for many other examples of data validation.
Wrapping Up
The XlxsWriter package is very versatile. XlxsWriter is a great way to
generate Microsoft Excel spreadsheets in Python. The only downside is that
XlxsWriter does not support reading Excel spreadsheets.
XlxsWriter can do much more than what is covered here. For example, you
can add filters, cell comments, conditional formatting, worksheet tables,
and much more. Check this package out, and try creating your spreadsheets
today!
OceanofPDF.com
Appendix A - Cell Comments
Microsoft Excel can add comments to each cell in your spreadsheet. A
company may use comments to give feedback about how the spreadsheet
works or add documentation about complex formulas in the spreadsheet.
You will start by learning how to add comments and view them solely with
Microsoft Excel. Then you will learn how to use OpenPyXL and Python to
work with cell comments.
Choose New Comment from the context menu. It’s quite a ways down the
list of menu options, but you’ll find it. After selecting New Comment, you
will see the following dialog:
Your comment will save with a timestamp and appear like this:
Microsoft allows you to add another comment at this time if you want to do
so. When there are multiple comments on a single cell, they are called
threaded comments.
Save your Excel spreadsheet with the name comments.xlsx. If you don’t
have Microsoft Excel yourself, you can download the comments.xlsx file
from this book’s source repository on GitHub.
Now you are ready to learn about adding comments using OpenPyXL!
Adding Comments to Cells with OpenPyXL
The documentation for OpenPyXL on comments claims you can add and
save comments to an Excel spreadsheet. Depending on the version of Excel
that you have, when you follow the documentation, you may end up saving
a note, not a comment. Microsoft changes the terminology depending on
whether or not you are using a standalone Excel version versus Office 365.
Regardless, you will use their currently documented method for adding
comments, and perhaps it will work the way it should at some point. Go
ahead and create a file named creating_comments.py. Then enter the
following code:
# creating_comments.py
print(comment.text)
print(f"Comment author: {comment.author}")
workbook.save(filename)
if __name__ == "__main__":
main("automated_comments.xlsx", "B1")
You will need to import the Comment class from openpyxl.comments to use
it. The Comment takes four parameters:
If you open the Excel file and mouse over B1, you will see that the
comment doesn’t pop up. Instead, you’ll see a different kind of box:
This spreadsheet shows a note, not a comment. You can confirm that it is a
note by right-clicking on the cell, and you will see options for edit the note
or removing the note.
The next topic you will explore is how well OpenPyXL loads a comment
from a workbook.
You will be using the former approach where you pass in the cell(s) that
you know have comments. Create a new file named reading_comments.py.
Then enter the following code:
# reading_comments.py
if __name__ == "__main__":
main("comments.xlsx", "A1")
Here you create a main() function that takes in the name of the Excel file
and from which cell to load a comment. You use the comments.xlsx file
that you created in the first section. If you don’t have that file handy, you
can download it from GitHub. Next, you use sheet[cell].comment to
extract the comment from the cell.
When you print the comment, you will see something like this:
Your version of Excel allows you to read this threaded comment; however, any
edits
to it will get removed if the file is opened in a newer version of Excel.
Learn more: https://go.microsoft.com/fwlink/?linkid=870924
Comment:
This is a comment in A1 by tc={36BFDB51-47A0-8346-936E-835C6F72E94D}
Loading the comment appears to work correctly in this case. Although the
author appears to saved as a hash that OpenPyXL can’t read.
Wrapping Up
OpenPyXL allows you to read comments in Excel easily. In this appendix,
you learned about the following:
If you can’t get comments to work correctly, you may be using a version of
Excel that doesn’t call them “Notes” or you may be using LibreOffice or
something similar. That’s okay. The functionality is very similar.
OceanofPDF.com
Appendix B - Print Settings
Basics
OpenPyXL gives you the ability to set up your spreadsheet’s print settings.
Their documentation, such as it is, claims near-full support. The only
exception mentioned is that OpenPyXL may have only partial support for
nested headers and footers.
With that said, this chapter will look at the documented features, which are:
You will get started by learning how to set horizontal and vertical centering.
# print_options_center.py
sheet.print_options.horizontalCentered = True
sheet.print_options.verticalCentered = True
workbook.save(path)
if __name__ == "__main__":
center_data("print_options_center.xlsx")
You can tell OpenPyXL to center your data horizondally when printing by
setting the following:
print_options.horizontalCentered = True
sheet.print_options.verticalCentered = True
In this example, you set both to True, which will center the data on the
page. You can confirm this by opening up the print dialog by pressing
CTRL+P or by going to the File menu and choosing the Print option.
Either way, you will see the following dialog:
Fig. Appendix B-1: Centering Data Horizontally and Vertically
This dialog gives you a preview of how your printed sheet will appear on
paper. As you can see, your data is centered on the page.
Try enabling one or the other of these settings and then re-run the code. You
can then re-open the Print dialog and see how the document layout
changes.
Now you are ready to move on and learn about adding page headers.
Adding Headers
OpenPyXL lets you set a header for each page that you print. If you have a
large spreadsheet with many rows in it, then it can be helpful to add a
header to each page. The header is especially helpful if you include the page
numbers, so if the large document were to get dropped, you could easily re-
sort the pages.
You will need to create a new Python file to see how this works. Name your
new file print_options_header.py and then enter the following code:
# print_options_header.py
def add_header(path):
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "Hello"
sheet["A2"] = "from"
sheet["A3"] = "OpenPyXL"
workbook.save(path)
if __name__ == "__main__":
add_header("print_options.xlsx")
Fortunately, it will print out the current page and the total number of pages
in this format: “Page 1 of N”. However, you don’t have any way of
knowing if there is a way to change this. For example, some companies
don’t print a page number on the first page. There is no way of knowing if
that is supported programmatically here.
The rest of the code in this example sets the font face, its size, and its color.
When you run this code and open up the Print dialog, you will see the
following:
This code doesn’t say if you can set the header to be in the center or on the
right. However, you can replace left with center or right, and the header
will be shifted accordingly on the page.
This screenshot makes it a little easier to see the formatting of your data on
the page. Go ahead and experiment, and you’ll discover some of the
subtleties of print option formatting in OpenPyXL.
Now you can move on and learn about adding titles to the data.
You can do something similar when you go to print your data. You want to
be able to always show a particular set of columns and rows on every page.
OpenPyXL lets you set this programmatically.
# print_titles.py
workbook.save(path)
if __name__ == "__main__":
print_titles("print_titles.xlsx")
Here you use the sheet object’s print_title_cols to set which columns
you print on each page. Then you use the sheet object’s print_title_rows
attribute to set which rows you want to print first on each page. In this case,
you set the columns to A:C and the first row.
When you run this code and open the Print dialog, you can see that when
you go to print, it has three pages:
Fig. Appendix B-4: Adding Titles in Print Dialog
It’s hard to see in this screenshot, but if you click through each page in the
print preview, you can see that the first row is the same in all of them.
If you’d like to set the print area with Python, you must modify the sheet
object’s print_area attribute.
You can see this in action by writing some code! Create a new file and
name it print_area.py. Then enter this code:
# print_area.py
def print_area(path):
workbook = Workbook()
sheet = workbook.active
workbook.save(path)
if __name__ == "__main__":
print_area("print_area.xlsx")
Here you set the print_area to A1:B20. That will print the first 20 rows,
but only the data in columns A and B.
If you run this code and open up the Print dialog, you will the following:
Fig. Appendix B-5: Specifying the Print Area in Print Dialog
This screenshot shows that you are only printing one page, and that page
isn’t going to be full of data either because it is only twenty rows worth of
data! Try modifying the print area range yourself and see how it changes for
yourself!
Wrapping Up
OpenPyXL provides a fairly robust API for modifying the print settings in
Microsoft Excel. You should set any print settings that you need to,
although you may have to resort to some trial and error since the API is not
fully documented.
Try modifying the code examples in this appendix. You can learn a lot
about the print settings by trying out different settings.
OceanofPDF.com
Appendix C - Formulas
Microsoft Excel supports many different types of formulas that you may
choose to use on your data. There are several support documents on this
topic on the Microsoft website. Here are just a couple of samples:
For this example, you will use Excel’s PI() function, which returns the
value of pi (3.14159…):
=PI() * B12 ^ 2
Now that you know what makes up a formula, you are ready to write your
own!
For example, if you wanted to find the sum of the first 12 rows in column
“B” in Excel, you could type this into cell “B13”:
=SUM(B1:B12)
The formula above tells Microsoft Excel to add up all the numeric data in
cells B1 through B12.
This code shows you that OpenPyXL supports 352 formulas in Excel. Now
that you know what formulas you can create, it would be great to use that
knowledge.
Create a new file named summing.py and enter the following code:
# summing.py
def main(filename):
workbook = Workbook()
sheet = workbook.active
# Sum up columns
sheet["A7"] = "Totals"
sheet["B7"] = "=SUM(B2:B6)"
sheet["C7"] = "=SUM(C2:C6)"
workbook.save(filename)
if __name__ == "__main__":
main("summing.xlsx")
Here you create an Excel file with five lines of raw sales data. The last row
is for totals. You can total up those rows using Excel’s SUM() function.
When you run this code, the output will look like this:
You successfully applied the SUM() function on the data in the “B” and “C”
columns of your Excel spreadsheet. Try applying an average or one of the
other useful formulas that are in that list. You can use this technique to add
any of those formulas to the spreadsheet.
If you combine this knowledge with the cell styling chapter, you can
highlight certain pieces of your data very well.
Wrapping Up
OpenPyXL is a very flexible package. OpenPyXL can create over 350
different Excel formulas!
You can take this knowledge and combine it with many of the other tricks
in this book to create unique spreadsheets of your own.
OceanofPDF.com
Afterword
For the second time as an author, I started working on this book with only a
handful of knowledge of the package I was writing about. Fortunately,
OpenPyXL is put together well and I was able to figure out everything I
needed to from their documentation, the package itself, or from examples I
found online.
OpenPyXL is one of the most robust Python packages for reading and
writing Microsoft Excel spreadsheets. While there are some rough spots in
the documentation that make it hard to follow at the time I wrote this book,
I still feel like OpenPyXL is one of the best packages for processing Excel
spreadsheets.
I sincerely hope you find this book valuable for expanding your knowledge
of Python and for helping you to automate the processing of Excel
spreadsheets!
Mike
OceanofPDF.com