
Math in Society - Lippman, Clifford

MAT 142 Book

Uploaded by

Michael Rozinski

Math in Society

Mohave Edition 2.0

David Lippman
Remixed by: Laurel Clifford
Contents

Unit 1: Reasoning and Problem Solving


Reasoning: Laurel Clifford, David Lippman
Problem Solving: David Lippman

Unit 2: Patterns and Growth with Accumulation


David Lippman, Laurel Clifford

Unit 3: Finance
David Lippman, Laurel Clifford

Unit 4: Number Theory


Laurel Clifford

Unit 5: Geometry
Geometry: Laurel Clifford
Fractals portion: David Lippman, Melonie Rasmussen

Unit 6: Probability
David Lippman, Jeff Eldridge, onlinestatbook.com, Laurel Clifford

Unit 7: Statistics
David Lippman, Jeff Eldridge, onlinestatbook.com, Laurel Clifford

Unit 8: Voting and Apportionment


David Lippman, Mike Kenyon
Copyright © 2013 David Lippman, 2016 Laurel Clifford

The source book for the bulk of the material was edited by David Lippman, Pierce College Ft
Steilacoom. Added materials and editing by Laurel Clifford, Mohave Community College.
Development of this book was supported, in part, by the Transition Math Project and the
Open Course Library Project.

Statistics, Describing Data, and Probability contain portions derived from works by:
Jeff Eldridge, Edmonds Community College (used under CC-BY-SA license)
www.onlinestatbook.com (used under public domain declaration)

Apportionment is largely based on work by:


Mike Kenyon, Green River Community College (used under CC-BY-SA license)

Number Theory and Geometry sections based on work by:


Laurel Clifford, Mohave Community College
Some geometry exercises (noted in the text) were adapted from CK-12 Foundation materials,
made available using the Creative Commons Attribution-Non-Commercial 3.0 Unported (CC
BY-NC 3.0) License (http://creativecommons.org/licenses/by-nc/3.0) as amended and
updated by Creative Commons from time to time (the “CC license”). Attribution link
included per CK-12 Foundation request: http://www.ck12.org/saythanks

Front cover photo:


Jan Tik, http://www.flickr.com/photos/jantik/, CC-BY 2.0

This text is licensed under a Creative Commons Attribution-Share Alike 3.0 United
States License.

To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to
Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

You are free:


to Share — to copy, distribute, display, and perform the work
to Remix — to make derivative works

Under the following conditions:


Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any
way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only
under the same, similar or a compatible license.

With the understanding that:


Waiver. Any of the above conditions can be waived if you get permission from the copyright holder.
Other Rights. In no way are any of the following rights affected by the license:
• Your fair dealing or fair use rights;
• Apart from the remix rights granted under this license, the author's moral rights;
• Rights other persons may have either in the work itself or in how the work is used, such as
publicity or privacy rights.
• Notice: For any reuse or distribution, you must make clear to others the license terms of this
work. The best way to do this is with a link to this web page:
http://creativecommons.org/licenses/by-sa/3.0/us/
About the Author/Editor

David Lippman received his master’s degree in mathematics from
Western Washington University and has been teaching at Pierce
College since Fall 2000.

David has been a long time advocate of open learning, open materials,
and basically any idea that will reduce the cost of education for
students. It started by supporting the college’s calculator rental
program, and running a book loan scholarship program. Eventually the
frustration with the escalating costs of commercial text books and the
online homework systems that charged for access led to action.

First, David developed IMathAS, open source online math homework software that runs
WAMAP.org and MyOpenMath.com. Through this platform, he became an integral part of a
vibrant sharing and learning community of teachers from around Washington State that
support and contribute to WAMAP. These pioneering efforts, supported by dozens of other
dedicated faculty and financial support from the Transition Math Project, have led to a
system used by thousands of students every quarter, saving hundreds of thousands of dollars
over comparable commercial offerings.

David continued further and wrote the first edition of this textbook, Math in Society, after
being frustrated by students having to pay $100+ for a textbook for a terminal course.
Together with Melonie Rasmussen, he co-authored PreCalculus: An Investigation of
Functions in 2010.

Acknowledgements

David would like to thank the following for their generous support and feedback.

• Jeff Eldridge, Lawrence Morales, and Mike Kenyon, who were kind enough to
license me the use of their works.

• The community of WAMAP users and developers for creating some of the homework
content used in the online homework sets.

• Pierce College students in David’s online Math 107 classes for helping correct typos
and identifying portions of the text that needed improving, along with other users of
the text.

• The Open Course Library Project for providing the support needed to produce a full
course package for this book.
Preface

The traditional high school and college mathematics sequence leading from algebra up
through calculus could leave one with the impression that mathematics is all about algebraic
manipulations. This book is an exploration of the wide world of mathematics, of which
algebra is only one small piece. The topics were chosen because they provide glimpses into
other ways of thinking mathematically, and because they have interesting applications to
everyday life. Together, they highlight algorithmic, graphical, algebraic, statistical, and
analytic approaches to solving problems.

This book is available online for free, in both Word and PDF format. You are free to change
the wording, add materials and sections or take them away. I welcome feedback, comments
and suggestions for future development. If you add a section, chapter or problems, I would
love to hear from you and possibly add your materials so everyone can benefit.

New in This Edition

Edition 2 has been heavily revised to introduce a new layout that emphasizes core concepts,
definitions, and examples. Based on experience using the first edition for three years as
the primary learning materials in a fully online course, concepts that were causing students
confusion were clarified, and additional examples were added. New “Try it Now” problems
were introduced, which give students the opportunity to test out their understanding in a
zero-stakes format. Edition 2.0 also added four new chapters.

Edition 2.1 was a typo and clarification update on the first 14 chapters, and added 2
additional new chapters. No page or exercise numbers changed on the first 14 chapters.
Edition 2.2 was a typo revision. A couple new exploration exercises were added.
Edition 2.3 and 2.4 were typo revisions.

Supplements

The Washington Open Course Library (OCL) project helped fund the creation of a full
course package for this book, which contains the following features:

• Suggested syllabus for a fully online course
• Possible syllabi for an on-campus course
• Online homework for most chapters (algorithmically generated, free response)
• Online quizzes for most chapters (algorithmically generated, free response)
• Written assignments and discussion forum assignments for most chapters

The course shell was built for the IMathAS online homework platform, and is available for
Washington State faculty at www.wamap.org and mirrored for others at
www.myopenmath.com.

The course shell was designed to follow Quality Matters (QM) guidelines, but has not yet
been formally reviewed.
Notes from the Mohave Remixer, Laurel Clifford:

In 2014, I gratefully began the journey to build on David’s strong
foundation to adapt the Math in Society book to meet Mohave
Community College’s MAT 142 College Mathematics learning
outcomes. It seemed silly to me for a student to pay $150 - $200 for
a physical textbook where only one-third of the content applied to
the course. Along the way, I found I had to write two units from
scratch; this work has become a labor of Hercules and of love.
Constructive feedback and suggestions are welcomed!

All additions, including problems and pictures, were either created by me or come from the
source indicated. If no notation or attribution is given on an illustration, picture, or diagram,
it was either created by me or is in the public domain with no attribution required.

Materials in this edition, along with an online homework and assessment component
delivered via the http://myopenmath.com learning system, were field tested at MCC in online
8 week summer sessions in 2014, 2015, and 2015, and on ground at Lake Havasu Campus
Fall 2014, Spring 2016. Some materials were also used in MAT 142 courses at the ASU
Colleges at Lake Havasu Fall 2015 and Spring 2016 semesters in both 8 week condensed and
16 week full semester courses. I am grateful for the hard work, effort and interest my
students expressed in the content and materials, as well as their questions and suggestions.

I am also very grateful to David Lippman himself for his active involvement in the
myopenmath.com community. He is extremely helpful in answering questions, refining
online math problems, and advocating for OER materials. I have a great respect for the work
he is doing and feel blessed to be able to use it.
College Mathematics Reasoning and Problem Solving 1

“The spirit of mathematics is not captured by spending 3 hours solving 20 look-alike
homework problems. Mathematics is thinking, comparing, analyzing, inventing, and
understanding.”—Peter Doyle 1

Welcome!
Mathematics is more than numbers on a page. Mathematics is thinking. Mathematics is
reasoning. Mathematics is problem solving. We begin our course developing a broad
understanding of reasoning, logic and problem solving which will aid us in understanding the
conditions in our world as well as the rest of the content we will study.

Reasoning about everyday math

Colin is a 10-year-old child with Type 1 diabetes, so he must take insulin by injection
(shots!) with every meal. His insulin dosage is based on the number of grams of
carbohydrates in the meal. When mentally adding up his carbohydrates at lunch, he thought
aloud:

“19 carbs plus 12 carbs is 23 carbs… no, 32 carbs, I mean 31 carbs.”

When asked how he knew the answer was not 32 carbs, he confidently asserted:

“Because an odd number plus an even number is always an odd number.”

But how does he know this assertion is true? Is there a way to justify this assertion using
means other than the innate confidence a 10-year-old has in his own abilities?

You may think he already has evidence for his assertion, as 19 carbs plus 12 carbs does
indeed equal 31 carbs. But is this enough evidence to accept that an odd number plus an
even number is always an odd number? What would strengthen his argument? More
evidence!

1Doyle, P. (1994, April 12). Philosophy in Geometry and the Imagination. Retrieved May 30, 2014, from
http://www.geom.uiuc.edu/docs/education/institute91/handouts/handouts.html

© Laurel Clifford Creative Commons BY-SA



“Data! data! data!” he cried impatiently. “I can’t make bricks without clay.”—Sherlock
Holmes2

Let’s rename Colin’s assertion a conjecture, an “educated guess” that he has made about the
relationship between the sum (the result from addition) and its addends (the numbers being
added), and assume he is only working with natural numbers.

Example 1: Colin’s Conjecture


Conjecture: An odd number plus an even number is always an odd number.

To test this conjecture, we generate some examples where we are adding an odd number and
an even number, and observe the value of the result. We already have Colin’s first example,
so let’s include it. Think about what other numbers we could choose: should they be large,
small, single digit, double digit, multi-digit? Will how we choose these numbers affect our
support of the conjecture?

The table below shows twelve different examples of the sum of an odd number and an even
number.

Examples:
19 + 12 = 31    5 + 12 = 17     3 + 8 = 11      5 + 16 = 21     7 + 32 = 39     11 + 12 = 23
29 + 6 = 35     25 + 34 = 59    17 + 18 = 35    15 + 14 = 29    39 + 11 = 50    23 + 36 = 59

What do you notice about the examples in the table and Colin’s conjecture?

Consider that the examples chosen may represent special cases, as all the addends chosen
were less than 40, involving only single and two-digit numbers. Some of them were
consecutive values, such as 11 and 12. Several of them used 5s or 11s, which may not seem
remarkable in this case, but limiting examples to certain multiples may create special cases.
Some conjectures may not generalize or apply beyond the examples chosen. Choosing a
wide variety and large amount of values for our examples will help strengthen the support for
the conjecture.

Try it now 1:
Create 10 more examples of sums of odd and even numbers beyond those given in the table.
Take care to choose a wide variety of examples. Do your examples support Colin’s
conjecture? Can you find a counterexample, a case that does not support the conjecture?

With the inclusion of more examples, have we proved Colin’s conjecture? In order to prove
Colin’s conjecture, we need to show that it is true for every possible sum of an odd number
and an even number. We created 22 examples to support the conjecture, but these 22 hardly
represent all possibilities. We could create even more examples, which would increase the
likelihood of our conjecture being true, but would not prove it. Supporting our conjecture via
specific examples as evidence is an illustration of inductive reasoning.
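
The search for supporting examples can also be automated. The following is a minimal sketch in Python (a language chosen here for illustration; the text itself does not use it), and the function name and the limit of 200 are hypothetical choices. It tries every odd and even pair of natural numbers below the limit and reports the first counterexample it meets. Finding none adds to the inductive evidence, but it still does not prove the conjecture.

```python
def find_counterexample(limit):
    """Return the first (odd, even) pair below limit whose sum is even, or None."""
    for odd in range(1, limit, 2):          # 1, 3, 5, ...
        for even in range(2, limit, 2):     # 2, 4, 6, ...
            if (odd + even) % 2 == 0:       # an even sum would contradict the conjecture
                return (odd, even)
    return None                             # no counterexample among the pairs tried


# Try every odd/even pair of natural numbers below 200.
print(find_counterexample(200))  # None
```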

2 Doyle, A. C. (199). The adventures of Sherlock Holmes. Champaign, Ill.: Project Gutenberg.

Using a bit of algebra, we can show that Colin’s conjecture is true for all possibilities as we
can use variable expressions to represent all possible odd numbers and all possible even
numbers, find their sum, and show that the sum is an odd number.

An even number is a number which is divisible by 2, thus has 2 as a factor: every even
number can be thought of as “2 times something” where the “something” is another natural
number. So if we call that “something” factor n, then 2n represents any even number.

An odd number can be thought of as a number that is one less than an even number: every
odd number can be thought of as “2 times something, then minus 1” where once again the
“something” is another natural number. So if we call that “something” factor k*, then 2k – 1
represents any odd number.

Using these expressions, we can find the sum of any odd number and any even number:
(2k – 1) + (2n)
Reordered: 2k + 2n – 1
Rewritten: 2(k + n) – 1 by factoring out a common factor of 2 from the first two terms

Notice that this result takes the form of an odd number, “2 times something, then minus 1”
where the “something” is (k + n), which is another natural number**.
Thus we’ve shown that the sum of any odd number and any even number is an odd number.

Some comments on the proof above:


*We use different variables (n, k) for the “something” factor to allow for any two odd
or even numbers. If we used the same variable for both, such as 2k and 2k – 1, that
would make the numbers a special case; for example, if k = 3, 2k would be 2(3) = 6,
and 2k – 1 would be 2(3) – 1 = 5, and our two numbers would be consecutive odds
and evens rather than any odd and even.

**The “something” factor (k + n) being a natural number follows from a property
known as closure, which we will study later in the course.
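
As a quick check on the algebra, the sketch below (Python, with hypothetical names and an arbitrary range of 1 to 49 for k and n) builds odd numbers as 2k – 1 and even numbers as 2n, then confirms that each sum matches 2(k + n) – 1. It also shows why using the same variable for both numbers would be too narrow: the pairs it produces are always consecutive.

```python
def is_odd(m):
    # m is odd when it is one less than an even number, so m + 1 is divisible by 2.
    return (m + 1) % 2 == 0


# Independent variables k and n cover many different odd/even pairs.
for k in range(1, 50):
    for n in range(1, 50):
        odd, even = 2 * k - 1, 2 * n
        total = odd + even
        assert total == 2 * (k + n) - 1   # the factored form from the proof
        assert is_odd(total)

# Reusing one variable for both numbers only ever gives consecutive pairs.
same_variable_pairs = [(2 * k - 1, 2 * k) for k in range(1, 6)]
print(same_variable_pairs)  # [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
```

A finite loop like this is still inductive evidence; it is the algebra, not the loop, that covers every natural number.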

The algebraic argument above is an example of deductive reasoning. We began with
statements that are true or assumed to be true (the expressions for any even number and any
odd number) and showed that the conjecture follows from those statements (the sum of the
numbers was always odd). Using deductive reasoning we have proven the conjecture to be
true. Both our inductive examples and deductive proof had the odd number first and the even
number second in the sum. Due to the commutative property of addition (something which
Colin uses intuitively in his calculations), the result would be the same if we reversed the
order.

Before we move on from Colin to other examples of inductive and deductive reasoning,
consider why Colin might actually care that his conjecture is indeed true: he uses this
relationship among even and odd numbers to help check the reasonableness of his
calculations, and his health depends on accurately calculating his insulin dosages!

We use inductive and deductive reasoning throughout our lives. Each has its inherent
dangers. David, a former student of mine, related how every time he had been to the casino
to play roulette, he won. Then there was the day he went to the casino to play roulette,
but lost. He assumed that he should always win, so he continued to play. When he finally
quit that night, he had lost so much money that he had to work two additional jobs to pay his
bills for the month.

In David’s situation, he used inductive reasoning to conclude that he should always win
when he played roulette. He based this conclusion on the premise that every time he had
played before he had won. The night that he lost (and thus did not win) contradicted his
conclusion, and thus is a counterexample to his conclusion. As David unfortunately
experienced, all it takes is one counterexample to disprove a conjecture!

Had David used deductive reasoning, he could start with a known (or assumed) truth as his
premise: the odds in roulette favor the casino, and in the long run, the player will lose money
(later in the course, we will show mathematically why this premise is true). Since David is
the player, the conclusion must follow that in the long run, David will lose money, which is
exactly what happened.

Deductive reasoning can provide proof, but also has flaws. A deductive argument can be
valid, yet if the premises it is based on are not true, the conclusion it reaches may be
false. David could also have argued deductively, starting from the premise that when
he plays roulette, he always wins. Assuming this premise to be true, since he was playing
roulette that fateful night, he should win: a logically valid argument based on a false premise.

Let’s play with inductive and deductive reasoning using a classic number trick.

Example 2: Number Trick


Choose a natural number (a relatively small one, if you don’t have a calculator) then perform
the following operations:
Example: if we choose 7:
1. Add 4 to your number.                                    7 + 4 = 11
2. Add 7 to the result of the previous step.                11 + 7 = 18
3. Double the result of the previous step.                  (18)(2) = 36
4. Subtract 2 from the result of the previous step.         36 – 2 = 34
5. Divide the result of the previous step by 2.             34 ÷ 2 = 17
6. Add 1 to the result of the previous step.                17 + 1 = 18
7. Subtract the number you started out with from the
   result of the previous step.                             18 – 7 = 11

Try it now 2:
Choose two other numbers besides 7, and apply the procedure above to each, one at a time.
What do you observe about the final result for each?
What do you predict will happen if we apply the procedure to any number we choose?
What kind of reasoning did you use to reach this conclusion?

If we are careful with our calculations, we should observe that the procedures listed in the
number trick should always produce 11 as the final result. You may notice some patterns
and relationships among the steps in the procedure that provide clues that the results will be
predictable. We can prove deductively that the result will always be 11 if we use a variable
such as n to represent any number and show that 11 will follow as a result.

Example 2: Number Trick continued


Choose a natural number (a relatively small one, if you don’t have a calculator) then perform
the following operations:
Example: if we choose n:
1. Add 4 to your number.                                    n + 4
2. Add 7 to the result of the previous step.                n + 4 + 7 = n + 11
3. Double the result of the previous step.                  (n + 11)(2) = 2n + 22
4. Subtract 2 from the result of the previous step.         2n + 22 – 2 = 2n + 20
5. Divide the result of the previous step by 2.             (2n + 20) ÷ 2 = n + 10
6. Add 1 to the result of the previous step.                n + 10 + 1 = n + 11
7. Subtract the number you started out with from the
   result of the previous step.                             n + 11 – n = 11
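
The seven steps can also be written as a short program. This is a minimal sketch in Python with a hypothetical function name; it follows the steps exactly as listed and then runs the trick on the natural numbers 1 through 1000. Running it on many inputs is inductive evidence, while the algebra with n is what proves the result.

```python
def number_trick(start):
    """Apply the seven steps of the number trick to a natural number."""
    result = start + 4       # 1. add 4
    result = result + 7      # 2. add 7
    result = result * 2      # 3. double
    result = result - 2      # 4. subtract 2
    result = result // 2     # 5. divide by 2 (the value is always even at this point)
    result = result + 1      # 6. add 1
    return result - start    # 7. subtract the starting number


print(number_trick(7))                                # 11
print({number_trick(n) for n in range(1, 1001)})      # {11}
```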

Logic
Logic is the study of valid reasoning. When searching the internet, we use Boolean logic –
terms like “and” and “or” – to help us find specific web pages that fit in the sets we are
interested in. After exploring this form of logic, we will look at logical arguments and how
we can determine the validity of a claim.

In logic, we look at propositions, statements we can label as true or false. The sentence “It
is hot today in Lake Havasu City” is NOT a propositional statement because we cannot label
it as true or false; hot is subjective as what is hot to one person may not be hot to another.
However, the sentence “The temperature in Lake Havasu City reached 128ºF today”3 is
propositional because we can label it true or false.

Boolean Logic
We can often classify items as belonging to sets. If you went to the library to search for a book
and they asked you to express your search using unions, intersections, and complements of
sets, it would feel a little strange. Instead, we typically use words such as and, or, and not to
connect our keywords together to form a search. These words form the basis of Boolean
logic, named after English mathematician George Boole (1815 – 1864), and are known as
Boolean operators. They relate directly to set operations such as the intersection, union and
complement, and allow us to create propositions out of multiple statements. Boolean
operators have very specific mathematical meanings that may be less familiar to you.

3 Record high temperature for Lake Havasu City, AZ is 128ºF on June 29, 1994. Source: USATODAY.com.
(n.d.). USATODAY.com. Retrieved May 31, 2014, from http://usatoday30.usatoday.com/weather/wheat7.htm

Boolean Logic
Boolean logic combines multiple statements that are either true or false into an
expression that is either true or false.

Suppose M is the set of all mystery books and C is the set of all comedy books. If we search
for “mystery,” we are looking for all the books that are an element of the set M; the search is
true for books that are in the set.

When we search for “mystery and comedy,” we are looking for a book that is in both sets, in
the intersection. If we were to search for “mystery or comedy,” we are looking for a book
that is a mystery, a comedy, or both, which is the union of the sets. If we searched for “not
comedy,” we are looking for any book in the library that is not a comedy, the complement of
the set C. The set operations can be illustrated using a Venn diagram:

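[Venn diagram: two overlapping circles labeled M and C. Region (1) is the part of M outside C,
region (2) is the part of C outside M, and region (3) is the overlap of the two circles.]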

The two circles represent the mystery books, set M, and the comedy books, set C. The
yellow region, labeled (1), represents the books that are mystery only (in set M but not in set
C). The red region, labeled (2), represents the set of books that are comedy only (in set C but
not in set M). The orange region, labeled (3), represents the books that are in both mystery
and comedy (in sets M and C), the intersection of the two sets. The colored regions, (1), (2)
and (3), together represent the union of the two sets, mystery, comedy, or both.

Connection of Boolean Operators to Set Operations


M and C     elements in the intersection, labeled M ⋂ C
M or C      elements in the union, labeled M ⋃ C
not M       elements in the complement, labeled ~M or Mᶜ
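
The same three operations are available on Python's built-in sets, which makes for a small experiment. In this sketch the six-book catalog and its one-letter titles are hypothetical stand-ins, not data from the text.

```python
library = {"A", "B", "C", "D", "E", "F"}   # hypothetical catalog of six books
M = {"A", "B", "C"}                        # mystery books
C = {"C", "D"}                             # comedy books

print(sorted(M & C))          # mystery and comedy (intersection): ['C']
print(sorted(M | C))          # mystery or comedy (union): ['A', 'B', 'C', 'D']
print(sorted(library - C))    # not comedy (complement of C): ['A', 'B', 'E', 'F']
print(sorted(M - C))          # region (1), mystery only: ['A', 'B']
print(sorted(C - M))          # region (2), comedy only: ['D']
```

Note that the complement only makes sense relative to the whole library, which is why the code subtracts C from `library`.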
Soup or Salad… or both?
Be careful of the use of the word or as compared to
what you may be familiar with in conversation. When
the server at a restaurant asks you if you’d like soup or
salad, you usually think of the choice as exclusive, and
you can have one option or the other, but not both. In
Boolean logic, the operator or is inclusive: you can have
soup, salad or both. The “both, please” answer is a
critical difference between Boolean logic and everyday
language.
Picture by Oi Naengguk, from Wikimedia Commons
Available via Creative Commons Attribution 2.0 license
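
To see the difference between the two meanings of or, the sketch below (Python, with hypothetical function names) lists all four possible orders. Python's own `or` is inclusive, like the Boolean operator; the exclusive, everyday meaning is written here as "exactly one of the two".

```python
def inclusive_or(soup, salad):
    # Boolean "or": soup, salad, or both.
    return soup or salad


def exclusive_or(soup, salad):
    # Everyday "or": exactly one of the two, not both.
    return soup != salad


for soup in (False, True):
    for salad in (False, True):
        print(soup, salad, inclusive_or(soup, salad), exclusive_or(soup, salad))
```

The two versions agree on every row except the last one, where both soup and salad are ordered.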

Example 3: Library Search


Suppose we are searching a library database for Mexican universities. Express a reasonable
search using Boolean logic.

We could start with the search “Mexico and university”, but would be likely to find results
for the U.S. state New Mexico. To account for this, we could revise our search to read:
Mexico and university not “New Mexico”

In most internet search engines, it is not necessary to include the word and; the search engine
assumes that if you provide two keywords you are looking for both. In Google’s search, the
keyword or has to be capitalized as OR, and a negative sign (-) in front of a word is used to
indicate not. Quotes around a phrase indicate that the entire phrase should be looked for.
The search from the previous example on Google could be written:
Mexico university -“New Mexico”

Example 4: Number Sets


List the numbers that meet the condition:
Even and less than 10 and greater than 0

We need to find the numbers that fit all three conditions.


We could start with those that are even numbers: {…, -10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10, …},
then narrow this set to those less than 10: {…, -10, -8, -6, -4, -2, 0, 2, 4, 6, 8},
and finally narrow this set to those that are greater than 0: {2, 4, 6, 8}.
The numbers that satisfy all three requirements are {2, 4, 6, 8}.
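
The three conditions can be joined with `and` in a program as well. This is a minimal Python sketch; since a program cannot list every integer, it assumes a finite window from -20 to 20, which is wide enough to contain every number that meets the conditions.

```python
candidates = range(-20, 21)   # assumed finite window of integers to examine

# Keep only the numbers for which all three conditions are true.
result = [x for x in candidates if x % 2 == 0 and x < 10 and x > 0]
print(result)  # [2, 4, 6, 8]
```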

Sometimes statements made in English can be ambiguous. For this reason, Boolean logic
uses parentheses to show precedence, just like in algebraic order of operations.

Example 5: English Breakfast?


The English phrase “Go to the store and buy me eggs and bagels or cereal” is ambiguous; it
is not clear whether the requester wants eggs and is letting the buyer choose between bagels
and cereal, or whether they are letting the buyer choose to buy either eggs and bagels or just
cereal.

For this reason, using parentheses clarifies the intent:


Eggs and (bagels or cereal) means Option 1: Eggs and bagels, Option 2: Eggs and cereal
(Eggs and bagels) or cereal means Option 1: Eggs and bagels, Option 2: Cereal
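
One way to see that the two groupings really differ is to test every possible shopping basket. In this Python sketch (an illustration, with each item treated as True when it is in the basket), the loop collects the baskets on which the two readings disagree.

```python
from itertools import product

differences = []
for eggs, bagels, cereal in product([False, True], repeat=3):
    first_reading = eggs and (bagels or cereal)     # Eggs and (bagels or cereal)
    second_reading = (eggs and bagels) or cereal    # (Eggs and bagels) or cereal
    if first_reading != second_reading:
        differences.append((eggs, bagels, cereal))

print(differences)  # [(False, False, True), (False, True, True)]
```

The readings disagree exactly when the basket has cereal but no eggs: that basket satisfies the second reading but not the first.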

Example 6: Number Sets Redux


List the numbers that meet the condition:
Odd and less than 20 and greater than 0 and (multiple of 3 or multiple of 5)

The first three conditions limit us to the set {1, 3, 5, 7, 9, 11, 13, 15, 17, 19}.
The last group of conditions narrows this set to those that are either a multiple of 3 or a
multiple of 5, which removes 1, 7, 11, 13, 17, and 19.
This leaves us with the set {3, 5, 9, 15}.

Notice that we would have gotten a very different result if we had written
(Odd and less than 20 and greater than 0 and multiple of 3) or multiple of 5

The first grouped set of conditions would give {3, 9, 15}.


When combined with the last condition, though, this set expands without limits:
{3, 5, 9, 10, 15, 20, 25, 30, 35, 40, 45, …} since we can now include all the multiples of 5
(assuming we are working with natural numbers).

Be aware that when a string of conditions is written without grouping symbols, it is often
interpreted from the left to right, resulting in the latter interpretation.
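
The effect of the grouping can be checked directly. The Python sketch below assumes a finite window of the natural numbers 1 through 30 and evaluates the condition three ways: grouped as in Example 6, grouped the other way, and with no parentheses at all. In Python, `and` is applied before `or`, so the ungrouped version matches the second grouping; other tools may follow different rules, so parentheses remain the safe choice.

```python
numbers = range(1, 31)   # assumed finite window of natural numbers

# Odd and less than 20 and greater than 0 and (multiple of 3 or multiple of 5)
grouped_last = [x for x in numbers
                if x % 2 == 1 and x < 20 and x > 0 and (x % 3 == 0 or x % 5 == 0)]

# (Odd and less than 20 and greater than 0 and multiple of 3) or multiple of 5
grouped_first = [x for x in numbers
                 if (x % 2 == 1 and x < 20 and x > 0 and x % 3 == 0) or x % 5 == 0]

# No grouping symbols: Python evaluates the "and" chain first, then the "or".
ungrouped = [x for x in numbers
             if x % 2 == 1 and x < 20 and x > 0 and x % 3 == 0 or x % 5 == 0]

print(grouped_last)                 # [3, 5, 9, 15]
print(grouped_first)                # [3, 5, 9, 10, 15, 20, 25, 30]
print(ungrouped == grouped_first)   # True
```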

Conditionals
Beyond searching, Boolean logic is commonly used in spreadsheet applications like Excel to
do conditional calculations. Recall that a proposition is a statement that is either true or
false. A statement like 3 < 5 is true; a statement like “a rat is a fish” is false. A statement
like “x < 5” is true for some values of x and false for others. When an action is taken or not
depending on the value of a statement, it forms a conditional.

Propositions and Conditionals


A proposition is either true or false.
A conditional is a compound statement of the form
“if p then q” or “if p then q, else s”.

Example 7: Take a Hike?


In common language, an example of a conditional statement would be “If it is raining, then
we’ll go to the mall. Otherwise we’ll go for a hike.”

The statement “If it is raining” is the condition – this may be true or false for any given day.
If the condition is true, then we will follow the first course of action, and go to the mall. If
the condition is false, then we will use the alternative, and go for a hike.

Example 8: Excel Formula


Conditional statements are commonly used in spreadsheet applications like Excel. In Excel,
you can enter an expression like: =IF(A1<2000, A1+1, A1*2)

Notice that after the IF, there are three parts. The first part is the condition, and the second
two are calculations. Excel will look at the value in cell A1 and decide if it is less than 2000.
If that condition is true, then the first calculation is used, and 1 is added to the value of A1.
If the condition is false, then the second calculation is used, and the value of A1 is multiplied
by 2.

In other words, this statement is equivalent to saying “If the value of A1 is less than 2000,
then add 1 to the value in A1. Otherwise, multiply A1 by 2”
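
For comparison, the same conditional can be written outside a spreadsheet. This is a minimal Python sketch; the function name is hypothetical and the parameter `a1` stands in for the value in cell A1.

```python
def if_example(a1):
    """Mirror of =IF(A1<2000, A1+1, A1*2), with a1 playing the role of cell A1."""
    if a1 < 2000:        # the condition
        return a1 + 1    # used when the condition is true
    else:
        return a1 * 2    # used when the condition is false


print(if_example(1500))  # 1501
print(if_example(2000))  # 4000
```

Notice that 2000 itself fails the condition A1<2000, so it is doubled rather than increased by 1.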

Example 9: Rather Taxing


An accountant needs to withhold 15% of income for taxes if the income is below $30,000,
and 20% of income if the income is $30,000 or more. Write an expression that would
calculate the amount to withhold.

Our conditional needs to compare the value to 30,000. If the income is less than 30,000, we
need to calculate 15% of the income: 0.15*income. If the income is 30,000 or more, we
need to calculate 20% of the income: 0.20*income.

In words, we could write “If income < 30,000, then multiply by 0.15, otherwise multiply by
0.20”.
In Excel, we would write: =IF(A1<30000, 0.15*A1, 0.20*A1)
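
The withholding rule can be sketched as a small Python function as well. The function name is hypothetical, and results are rounded to cents because decimal rates such as 0.15 are stored only approximately by a computer.

```python
def withholding(income):
    """Withhold 15% of income below $30,000 and 20% of income of $30,000 or more."""
    if income < 30000:
        amount = 0.15 * income
    else:
        amount = 0.20 * income
    return round(amount, 2)   # round to cents


print(withholding(20000))  # 3000.0
print(withholding(30000))  # 6000.0
```

An income of exactly $30,000 falls in the 20% case, because the condition income < 30000 is false for it.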

As we did earlier, we can create more complex conditions by using the operators and, or, and
not to join simpler conditions together.

Example 10: Wrangling Children


A parent might say to their child “if you clean your room and take out the garbage, then you
can have ice cream.”

Here, there are two simpler conditions:


1) The child cleaning her room
2) The child taking out the garbage

Since these conditions are joined with and, the combined condition is true only
if both simpler conditions are true; if either chore is not completed, then the parent’s
condition is not met. If the child completes both tasks, meeting both conditions, it must
follow that they get ice cream. Heaven help the parent who doesn’t follow through!

Try it now 3:
Suppose you have a standard deck of 52 playing cards, where
J, Q, and K are face cards.

How many cards in the deck fit the condition of being red
and a face card?

How many cards in the deck fit the condition of being red or
a face card?

How many cards in the deck fit the condition of being red
and not a face card?

How many cards in the deck fit the condition of being not
red and a face card?

Implications
When we discussed conditions earlier, we discussed the type where we take an action based
on the value of the condition. We are now going to talk about a more general version of a
conditional, sometimes called an implication.

Implications
Implications are logical conditional sentences stating that a statement p, called the
antecedent, implies a consequence q.

Implications are commonly written as p → q.


A negation of the antecedent is written as ~p, meaning “not p.”
A negation of the consequence is written as ~q, meaning “not q.”

Implications are similar to the conditional statements we looked at earlier; p → q is typically
written as “if p then q”, or “p therefore q”. The difference between implications and
conditionals is that the conditionals we discussed earlier suggest an action: if the condition is
true, then we take some action as a result. Implications are logical statements that suggest
that the consequence must logically follow if the antecedent is true. If p happens, q must
follow.

Example 11: Rainy Day


The English statement “If it is raining, then there are clouds in the sky” is a logical
implication. It is a valid implication because when the antecedent “it is raining” is true, the
consequence “there are clouds in the sky” must also be true.

We can write this statement symbolically as p → q, where p represents “it is raining” and q
represents “there are clouds in the sky.” When p happens, q has to follow.

Notice that the statement tells us nothing of what to expect if it is not raining. If the
antecedent is false (~p), then the implication becomes irrelevant. We cannot logically
conclude that there are no clouds in the sky (~q) or that there are clouds in the sky (q).

In traditional logic, an implication is considered valid (true) as long as there are no cases in
which the antecedent is true (p happens) and the consequence is false (q doesn’t follow). It is
important to keep in mind that symbolic logic cannot capture all the intricacies of the English
language.
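The traditional-logic rule just described (an implication fails only when the antecedent is true and the consequence is false) can be tabulated with a short sketch. The helper name `implies` is an assumption made for this illustration; Python has no built-in implication operator, so the sketch writes p → q as "not p, or q".

```python
def implies(p, q):
    """Truth value of p -> q: false only when p is true and q is false."""
    return (not p) or q


# Print the full truth table for p -> q
for p in (True, False):
    for q in (True, False):
        print(f"p={p!s:5} q={q!s:5} p->q={implies(p, q)}")
```

Notice that both rows with a false antecedent come out true, which matches the idea that the statement tells us nothing about what to expect when it is not raining.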

For any implication, there are three related statements, the converse, the inverse, and the
contrapositive.

Related Statements
The original implication is “if p then q”: p → q
The converse is “if q then p”: q → p
The inverse is “if not p then not q”: ~p → ~q
The contrapositive is “if not q then not p”: ~q → ~p

Example 12: Rainy Day Again…


Consider again the valid implication “If it is raining, then there are clouds in the sky”.

The converse would be “If there are clouds in the sky, it is raining.” Notice how the
converse reverses the order of p and q. The converse may not always be true, and in this
case, it is not. Assuming the converse is true is not a valid argument (a fallacy), and is called
converse error.

The inverse would be “If it is not raining, then there are not clouds in the sky.” Notice how
the inverse negates both p (not p, ~p) and q (not q, ~q). Likewise, the inverse is not always
true. Assuming the inverse is true is not a valid argument (a fallacy), and is called inverse
error.

The contrapositive would be “If there are not clouds in the sky, then it is not raining.”
Notice how the contrapositive both negates and reverses the order of p and q. The
contrapositive is logically equivalent to the original implication, and thus true in this case.

Equivalence
A conditional statement and its contrapositive are logically equivalent.
The converse and inverse of a statement are logically equivalent.
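One way to check these equivalences is to compare truth tables. The sketch below does this by brute force over every combination of truth values; the helper names `implies` and `equivalent` are hypothetical and are introduced only for this illustration.

```python
from itertools import product


def implies(p, q):
    """Truth value of p -> q."""
    return (not p) or q


def equivalent(f, g):
    """True when f and g agree for every combination of truth values."""
    return all(f(p, q) == g(p, q) for p, q in product((True, False), repeat=2))


original = lambda p, q: implies(p, q)
converse = lambda p, q: implies(q, p)
inverse = lambda p, q: implies(not p, not q)
contrapositive = lambda p, q: implies(not q, not p)

print(equivalent(original, contrapositive))  # True
print(equivalent(converse, inverse))         # True
print(equivalent(original, converse))        # False
```

The last line shows why assuming the converse is an error: the original and its converse disagree when p is true and q is false.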

Try it now 4:
a. Given the true implication:
If x = 3 then x² = 9
Write the converse, inverse, and contrapositive and determine if they are true.

b. Given the true implication:


If x = 2 then 2x = 4
Write the converse, inverse, and contrapositive and determine if they are true.

Arguments
A logical argument is a claim that a set of premises supports a conclusion. As we have seen,
there are two main types of arguments: inductive and deductive arguments.

Argument types
An inductive argument uses a collection of specific examples as its premises and uses
them to propose a general conclusion.

A deductive argument uses a collection of general statements as its premises and uses
them to propose a specific situation as the conclusion.

Example 13: That’s My Purse!


The argument “when I went to the store last week I forgot my purse, and when I went today I
forgot my purse. I always forget my purse when I go to the store” is an inductive argument.

The premises are:


I forgot my purse last week when I went to the store
I forgot my purse today when I went to the store

The conclusion is:


I always forget my purse when I go to the store

Notice that the premises are specific situations, while the conclusion is a general statement.
In this case, this is a fairly weak argument, since it is based on only two instances.

Example 14: Flyover


The argument “every day for the past year, a plane flew over my house at 2:00 pm. A plane
will fly over my house every day at 2:00 pm” is a stronger inductive argument, since it is
based on a larger set of evidence.

Evaluating inductive arguments


An inductive argument is never able to prove the conclusion true, but it can provide
either weak or strong evidence to suggest it may be true.

Many scientific theories, such as the big bang theory, can never be proven. Instead, they are
inductive arguments supported by a wide variety of evidence. Usually in science, an idea is
considered a hypothesis until it has been well tested, at which point it graduates to being
considered a theory. The commonly known scientific theories, like Newton’s theory of
gravity, have all stood up to years of testing and evidence, though sometimes they need to be
adjusted based on new evidence. For gravity, this happened when Einstein proposed the
theory of general relativity.

Deductive arguments are either clearly valid or clearly not, which makes them easier to
evaluate. Be wary with deductive arguments, though: an argument can be logically valid, in
that the conclusion follows from the premises, and still reach an untrue conclusion if the
premises themselves are false.

Evaluating deductive arguments


A deductive argument is considered valid if all the premises are true, and the
conclusion follows logically from those premises.

In other words, the premises are true, and the conclusion follows necessarily from
those premises.

Example 15: Hey Tiger!


The argument “All cats are mammals and a tiger is a cat, so a tiger is a mammal” is a valid
deductive argument.

The premises are:


All cats are mammals
A tiger is a cat

The conclusion is:

A tiger is a mammal

Both the premises are true. To see that the premises must logically lead to the conclusion,
one approach would be to use a Venn diagram. From the first premise, we can conclude
that the set of cats is a subset of the set of mammals. From the second premise, we are told
that a tiger lies within the set of cats. From that, we can see in the Venn diagram that the
tiger also lies inside the set of mammals, so the conclusion is valid.

[Venn diagram: the tiger, marked x, lies inside the circle “Cats”, which lies inside the circle “Mammals”.]

Analyzing arguments with Venn diagrams 4


To analyze an argument with a Venn diagram
1) Draw a Venn diagram based on the premises of the argument
2) If the premises are insufficient to determine the location of an
element, indicate that.
3) The argument is valid if it is clear that the conclusion must be true

Example 16: Firefighters and CPR


Premise: All firefighters know CPR
Premise: Jill knows CPR
Conclusion: Jill is a firefighter
From the first premise, we know that firefighters all lie inside the set of those who know
CPR. From the second premise, we know that Jill is a member of that larger set, but we do
not have enough information to know if she also is a member of the smaller subset that is
firefighters.

[Venn diagram: the circle “Firefighters” lies inside the circle “Know CPR”; Jill is marked x? both inside and outside “Firefighters”.]

Since the conclusion does not necessarily follow from the premises, this is an invalid
argument, regardless of whether Jill actually is a firefighter.

It is important to note that whether or not Jill is actually a firefighter is not important in
evaluating the validity of the argument; we are only concerned with whether the premises are
enough to prove the conclusion.
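The diagram's reasoning can also be sketched with sets. The snippet below builds one possible world in which both premises hold and the conclusion fails, which is enough to show the argument is invalid. The names other than Jill are hypothetical and invented only for this illustration.

```python
# One possible world (the names other than Jill are made up for illustration)
cpr_trained = {"Jill", "Ana", "Raj"}   # everyone who knows CPR
firefighters = {"Ana", "Raj"}          # the firefighters in this world

# Premise 1: all firefighters know CPR (firefighters is a subset of cpr_trained)
premise_1 = firefighters <= cpr_trained
# Premise 2: Jill knows CPR
premise_2 = "Jill" in cpr_trained
# Conclusion: Jill is a firefighter
conclusion = "Jill" in firefighters

print(premise_1, premise_2, conclusion)  # True True False
```

Because a world exists where the premises are true and the conclusion is false, the premises alone cannot prove the conclusion.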

4 Technically, these are Euler circles or Euler diagrams, not Venn diagrams, but for the sake of simplicity we’ll
continue to call them Venn diagrams.

Try it Now 5:
Determine the validity of this argument:
Premise: No cows are purple
Premise: Fido is not a cow
Conclusion: Fido is purple

In addition to these categorical style premises of the form “all ___,” “some ____,” and “no
____,” it is also common to see premises that are implications.

Example 17: Marcus in Seattle?


Premise: If you live in Seattle, you live in Washington.
Premise: Marcus does not live in Seattle
Conclusion: Marcus does not live in Washington

From the first premise, we know that the set of people who live in Seattle is inside the set
of those who live in Washington. From the second premise, we know that Marcus does not
lie in the Seattle set, but we have insufficient information to know whether or not Marcus
lives in Washington. This is an invalid argument.

[Venn diagram: the circle “Seattle” lies inside the circle “Washington”; Marcus is marked x? both inside “Washington” (but outside “Seattle”) and outside “Washington”.]

Example 18: Got Bread?


Consider the argument
Premise: If you bought bread, then you went to the store
Premise: You bought bread
Conclusion: You went to the store

While this example is fairly obviously a valid argument, we can analyze it by
representing each of the premises symbolically. We can then look at whether the
premises together imply the conclusion.

We’ll let B represent “you bought bread” and S represent “you went to the store”. Then the
argument becomes:
Premise: B→S
Premise: B
Conclusion: S

Recall that if all the premises are true, and the conclusion follows logically from those
premises, then the deductive argument is valid. In this case, we are given the premise B → S,
and we know that B, the antecedent, is true, so S has to follow. We can therefore conclude
S (our conclusion is valid). This form of valid argument is known as the Law of
Detachment.

Recall that B → S does not mean that S → B. If our second premise were S instead of B, we
could not conclude that B follows from S (watch out for this converse error).

If we look back at Example 17: Marcus in Seattle, we can use logical equivalence and symbols
to determine the validity of this argument.

Premise: If you live in Seattle, you live in Washington.


Premise: Marcus does not live in Seattle
Conclusion: Marcus does not live in Washington

Consider the first premise symbolically,


Premise: If you live in Seattle, [then] you live in Washington.
It has the form: S → W
The antecedent S = you live in Seattle, and the consequence W = you live in Washington.

The second premise, Marcus does not live in Seattle, is the negation of the antecedent (“not
S” or ~S), and the conclusion, Marcus does not live in Washington, is the negation of the
consequence (“not W” or ~W).
The argument becomes:
Premise: S→W
Premise: ~S
Conclusion: ~W

The argument is asserting that when S → W is true, ~S → ~W is also true. This statement ~S
→ ~W is the inverse of the original S → W implication, and assuming it is also true is a case of
inverse error.

Try it Now 6:
Determine if the argument is valid, and explain your thinking:
Premise: If I have a shovel I can dig a hole.
Premise: I dug a hole
Conclusion: Therefore I had a shovel

Example 19: Shopping Trip.


Premise: If I go to the mall, then I’ll buy new jeans
Premise: If I buy new jeans, I’ll buy a shirt to go with them
Conclusion: If I go to the mall, I’ll buy a shirt.

Let M = I go to the mall, J = I buy jeans, and S = I buy a shirt.


The premises and conclusion can be stated as:
Premise: M→J
Premise: J→S
Conclusion: M → S

Notice how this example links together the M statement to the S statement using the J
statement as a connector (where the first premise ends, the next premise takes off). When
this happens, we can have a valid argument known as a syllogism. Sometimes this form of
argument is called the chain rule, as J acts as a link between the two premises.

Syllogism
A syllogism is an implication derived from two others, where the consequence of one
is the antecedent to the other. The general form of a syllogism is:
Premise: p→q
Premise: q→r
Conclusion: p → r

This is also called the transitive property for implication.
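A brute-force check can confirm that this form never fails. The sketch below searches all eight combinations of truth values for a case where both premises are true and the conclusion is false; the helper name `implies` is an assumption introduced for this illustration.

```python
from itertools import product


def implies(p, q):
    """Truth value of p -> q."""
    return (not p) or q


# Look for a row where both premises hold but the conclusion fails
counterexamples = [
    (p, q, r)
    for p, q, r in product((True, False), repeat=3)
    if implies(p, q) and implies(q, r) and not implies(p, r)
]
print(counterexamples)  # []
```

The empty list means no combination of truth values breaks the chain, which is what makes the syllogism a valid form.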

Example 20: Working for the Boat


Premise: If I work hard, I’ll get a raise.
Premise: If I get a raise, I’ll buy a boat.
Conclusion: If I don’t buy a boat, I must not have worked hard.

If we let H = working hard, R = getting a raise, and B = buying a boat, then we can represent
our argument symbolically:
Premise: H → R
Premise: R → B
Conclusion: ~B → ~H

We can use the notion of the contrapositive we learned earlier to recognize that the
implication ~B → ~H is logically equivalent to the implication H → B. Rewritten, we can
see that this conclusion is indeed a logical syllogism derived from the premises:
Premise: H → R
Premise: R → B
Conclusion: H → B

Try it Now 7:
Is this argument valid? Explain.
Premise: If I go to the party, I’ll be really tired tomorrow.
Premise: If I go to the party, I’ll get to see friends.
Conclusion: If I don’t see friends, I won’t be tired tomorrow.

Lewis Carroll, author of Alice in Wonderland, was a math and logic teacher, and wrote two
books on logic. In them, he would propose premises as a puzzle, to be connected using
syllogisms.

Example 21: Babies and Crocodiles


Solve the puzzle. In other words, find a logical conclusion from these premises.
All babies are illogical.
Nobody is despised who can manage a crocodile.
Illogical persons are despised.

Let B = is a baby, D = is despised, I = is illogical, and M = can manage a crocodile.



Then we can write the premises as:


All babies are illogical. B→I
Nobody is despised who can manage a crocodile. M → ~D
Illogical persons are despised. I→D

From the first and third premises, we can conclude that B → D; that babies are despised.
Using the contrapositive of the second premise, D → ~M, we can conclude that
B → ~M; that babies cannot manage crocodiles:
Premise: B→I
Premise: I→D
Premise: D → ~M
Conclusion: B → ~M

While silly, this is a logical conclusion from the given premises.

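Visually, this argument builds as follows:

[Three nested-circle diagrams, built up in stages.
B → I: “Babies” lies inside “Illogical Beings”.
B → I, I → D: “Babies” lies inside “Illogical Beings”, which lies inside “Despised Beings”.
B → I, I → D, D → ~M: “Babies” lies inside “Illogical Beings”, inside “Despised Beings”, inside “Crocodile Non-managers”.]

Babies are part of the set that cannot manage crocodiles.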

Logical Fallacies in Common Language (optional but compelling section!)


In the previous discussion, we saw that logical arguments can be invalid when the premises
are not true, when the premises are not sufficient to guarantee the conclusion, or when there
are invalid chains in logic. There are a number of other ways in which arguments can be
invalid, a sampling of which are given here.

Logical Fallacy: Ad hominem


An ad hominem argument attacks the person making the argument, ignoring the
argument itself.

Example 22: Ad hominem


“Jane says that whales aren’t fish, but she’s only in the second grade, so she can’t be right.”

Here the argument is attacking Jane, not the validity of her claim, so this is an ad hominem
argument.

Example 23: Not ad hominem, just rude!


“Jane says that whales aren’t fish, but everyone knows that they’re really mammals – she’s
so stupid.”

This certainly isn’t very nice, but it is not ad hominem since a valid counterargument is made
along with the personal insult.

Logical Fallacy: Appeal to ignorance


This type of argument assumes something is true because it hasn’t been proven false.

Example 24: Appeal to ignorance


“Nobody has proven that photo isn’t Bigfoot, so it must be Bigfoot.”

Logical Fallacy: Appeal to authority


These arguments attempt to use the authority of a person to prove a claim. While often
authority can provide strength to an argument, problems can occur when the person’s
opinion is not shared by other experts, or when the authority is irrelevant to the claim.

Example 25: Appeal to authority


“A diet high in bacon can be healthy – Doctor Atkins said so.”

Here, an appeal to the authority of a doctor is used for the argument. This generally would
provide strength to the argument, except that the opinion that a diet high in saturated
fat can be healthy runs counter to general medical opinion. More supporting evidence
would be needed to justify this claim.

Example 26: Appeal to authority (celebrity)


“Jennifer Hudson lost weight with Weight Watchers, so their program must work.”

Here, there is an appeal to the authority of a celebrity. While her experience does provide
evidence, it provides no more than any other person’s experience would.

Logical Fallacy: Appeal to consequence


An appeal to consequence concludes that a premise is true or false based on whether
the consequences are desirable or not.

Example 27: Appeal to consequence


“Humans will travel faster than light: faster-than-light travel would be beneficial for space
travel.”

Logical Fallacy: False dilemma


A false dilemma argument falsely frames an argument as an “either or” choice, without
allowing for additional options.

Example 28: False dilemma


“Either those lights in the sky were an airplane or aliens. There are no airplanes scheduled
for tonight, so it must be aliens.”

This argument ignores the possibility that the lights could be something other than an
airplane or aliens.

Logical Fallacy: Circular reasoning


Circular reasoning is an argument that relies on the conclusion being true for the
premise to be true.

Example 29: Circular reasoning


“I shouldn’t have gotten a C in that class; I’m an A student!”

In this argument, the student is claiming that because they’re an A student, they shouldn’t
have gotten a C. But because they got a C, they’re not an A student.

Logical Fallacy: Straw man


A straw man argument involves misrepresenting the argument in a less favorable way
to make it easier to attack.

Example 30: Straw man


“Senator Jones has proposed reducing military funding by 10%. Apparently he wants to
leave us defenseless against attacks by terrorists.”

Here the arguer has represented a 10% funding cut as equivalent to leaving us defenseless,
making it easier to attack.

Logical Fallacy: Post hoc (post hoc ergo propter hoc)


A post hoc argument claims that because two things happened sequentially, then the
first must have caused the second.

Example 31: Post hoc ergo propter hoc


“Today I wore a red shirt, and my football team won! I need to wear a red shirt every time
they play to make sure they keep winning.”

Logical Fallacy: Correlation implies causation


Similar to post hoc, but without the requirement of sequence, this fallacy assumes that
just because two things are related one must have caused the other. Often there is a
third variable not considered.

Example 32: Correlation vs. causation


“Months with high ice cream sales also have a high rate of deaths by drowning. Therefore
ice cream must be causing people to drown.”

This argument is implying a causal relation, when really both are more likely dependent on
the weather; that ice cream and drowning are both more likely during warm summer months.

Try it Now 8:
Identify the logical fallacy in each of the arguments
a. Only an untrustworthy person would run for office. The fact that politicians are
untrustworthy is proof of this.
b. Since the 1950s, both the atmospheric carbon dioxide level and obesity levels have
increased sharply. Hence, atmospheric carbon dioxide causes obesity.
c. The oven was working fine until you started using it, so you must have broken it.
d. You can’t give me a D in the class – I can’t afford to retake it.
e. The senator wants to increase support for food stamps. He wants to take the taxpayers’
hard-earned money and give it away to lazy people. This isn’t fair so we shouldn’t do it.

Reference: Visualizing “All”, “Some”, “None” with Venn Diagrams

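All: All p are q; p is a subset of q.
[Diagram: circle p inside circle q, with x inside p.]
All items x that are in p are also in q.

Some: Some p are q; p and q intersect.
[Diagram: overlapping circles p and q, with x in the overlap and y in p only.]
Items x in the intersection are in both p and q, but items y are in p but not q.

None: No p are q; p and q do not intersect (disjoint).
[Diagram: separate circles p and q, with x in p and y in q.]
None of the items x are in q, and none of the items y are in p.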

Reference: Visualizing Valid and Invalid Deductive Logic Forms

Implications with the structure “if p then q” are written symbolically as p → q.
“not p” is written symbolically as ~p, “not q” as ~q.

Valid: Modus Ponens (direct reasoning)
Premise: p → q
Premise: p
Conclusion: q
[Diagram: circle p inside circle q, with x inside p.]
Since p occurs, q must follow; in p means it’s also in q.

Invalid: Converse Error
Premise: p → q
Premise: q
Conclusion: p
[Diagram: circle p inside circle q, with x inside q but outside p.]
The occurrence of q doesn’t mean p happened, as q could follow from other things;
in q doesn’t mean it’s also in p.

Valid: Modus Tollens (indirect reasoning)
Premise: p → q
Premise: ~q
Conclusion: ~p
[Diagram: circle p inside circle q, with x outside q.]
Since q must follow from p, and q didn’t occur, then p couldn’t have occurred;
outside q means also outside p as well (contrapositive).

Invalid: Inverse Error
Premise: p → q
Premise: ~p
Conclusion: ~q
[Diagram: circle p inside circle q, with x inside q but outside p.]
Although p doesn’t occur, q could follow from other things, and so q could still occur;
outside of p doesn’t mean it has to be outside of q.

Valid: Syllogism (transitive, chain rule)
Premise: p → q
Premise: q → r
Premise: p
Conclusion: r
[Diagram: circle p inside circle q, inside circle r, with x inside p.]
Since q follows from p, and r follows from q, once p occurs, r will follow (like
dominoes falling); in p means also in q, and since q is in r, in p means also in r.

Invalid: Example of misuse of chain rule
Premise: p → q
Premise: r → q
Premise: p
Conclusion: r
[Diagram: separate circles p and r, both inside circle q, with x inside p.]
While q may follow from both p and r, it doesn’t mean p and r relate to each other;
in p means also in q but doesn’t mean it has to be inside of r.
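The six forms in this reference can be checked mechanically. The sketch below treats a form as valid when no combination of truth values makes every premise true and the conclusion false. The names `implies`, `is_valid`, and `forms` are hypothetical helpers written for this illustration, and the check covers only the logical form, not whether real-world premises are true.

```python
from itertools import product


def implies(p, q):
    """Truth value of p -> q."""
    return (not p) or q


def is_valid(premises, conclusion):
    """Valid when no assignment makes every premise true and the conclusion false."""
    for p, q, r in product((True, False), repeat=3):
        if all(prem(p, q, r) for prem in premises) and not conclusion(p, q, r):
            return False
    return True


p_implies_q = lambda p, q, r: implies(p, q)

# Each entry: (list of premises, conclusion)
forms = {
    "modus ponens": ([p_implies_q, lambda p, q, r: p], lambda p, q, r: q),
    "converse error": ([p_implies_q, lambda p, q, r: q], lambda p, q, r: p),
    "modus tollens": ([p_implies_q, lambda p, q, r: not q], lambda p, q, r: not p),
    "inverse error": ([p_implies_q, lambda p, q, r: not p], lambda p, q, r: not q),
    "syllogism": ([p_implies_q, lambda p, q, r: implies(q, r), lambda p, q, r: p],
                  lambda p, q, r: r),
    "chain rule misuse": ([p_implies_q, lambda p, q, r: implies(r, q), lambda p, q, r: p],
                          lambda p, q, r: r),
}

results = {name: is_valid(prems, concl) for name, (prems, concl) in forms.items()}
for name, valid in results.items():
    print(f"{name}: {'valid' if valid else 'invalid'}")
```

Each invalid form has at least one row of the truth table that plays the role of the x drawn outside the expected circle in the diagrams.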

Problem Solving
In previous math courses, you’ve no doubt run into the infamous “word problems.”
Unfortunately, these problems rarely resemble the type of problems we actually encounter in
everyday life. In math books, you usually are told exactly which formula or procedure to
use, and are given exactly the information you need to answer the question. In real life,
problem solving requires identifying an appropriate formula or procedure, and determining
what information you will need (and won’t need) to answer the question.

In this chapter, we will review several basic but powerful algebraic ideas: percent, rates, and
proportions. We will then focus on the problem solving process, and explore how to use
these ideas to solve problems where we don’t have perfect information.

Percent
In the 2004 vice-presidential debates, Edwards claimed that US forces have suffered "90%
of the coalition casualties" in Iraq. Cheney disputed this, saying that in fact Iraqi security
forces and coalition allies "have taken almost 50 percent" of the casualties 5. Who is correct?
How can we make sense of these numbers?

Percent literally means “per 100,” or “parts per hundred.” When we write 40%, this is
equivalent to the fraction \( \frac{40}{100} \) or the decimal 0.40. Notice that 80 out of 200 and 10 out of
25 are also 40%, since \( \frac{80}{200} = \frac{10}{25} = \frac{40}{100} \).

Example 33: Hot Dogs


243 people out of 400 state that they like dogs. What percent is this?

\( \frac{243}{400} = 0.6075 = \frac{60.75}{100} \). This is 60.75%.

Notice that the percent can be found from the equivalent decimal by moving the decimal
point two places to the right.

Example 34: Conversion to Percent


Write each as a percent: a) \( \frac{1}{4} \) b) 0.02 c) 2.35

a) \( \frac{1}{4} \) = 0.25 = 25% b) 0.02 = 2% c) 2.35 = 235%

5 http://www.factcheck.org/cheney_edwards_mangle_facts.html

Percent
If we have a part that is some percent of a whole, then
\( \text{percent} = \frac{\text{part}}{\text{whole}} \), or equivalently, \( \text{part} = \text{percent} \cdot \text{whole} \)
To do the calculations, we write the percent as a decimal or a fraction.

Example 35: Sales Tax


The sales tax in a town is 9.4%. How much tax will you pay on a $140 purchase?

Here, $140 is the whole, and we want to find 9.4% of $140. We start by writing the percent
as a decimal by moving the decimal point two places to the left (which is equivalent to
dividing by 100): 9.4% = 0.094. We can then compute: tax = 0.094(140) = $13.16.
Alternatively, we can use the fraction form of percent: tax = \( \frac{9.4}{100} \cdot \frac{\$140}{1} = \frac{\$1316}{100} = \$13.16 \)
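The relationship part = percent × whole can be written as a small function. This is a minimal sketch; the name `percent_of` is a hypothetical helper, and it takes the percent as a number such as 9.4 and divides by 100 itself.

```python
def percent_of(percent, whole):
    """Return the part that is `percent` percent of `whole`."""
    return percent / 100 * whole


# Sales tax from the example: 9.4% of a $140 purchase
tax = percent_of(9.4, 140)
print(f"${tax:.2f}")  # $13.16
```

Formatting to two decimal places hides the tiny rounding error that floating-point arithmetic can introduce.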

Example 36: Percent of Increase


In the news, you hear “tuition is expected to increase by 7% next year.” If tuition this year
was $1200 per quarter, what will it be next year?

The tuition next year will be the current tuition (100%) plus an additional 7%, so it will be
100% + 7% = 107% of this year’s tuition: $1200(1.07) = $1284.

Alternatively, we could have first calculated 7% of $1200: $1200(0.07) = $84.

Notice this is not the expected tuition for next year (we could only wish). Instead, this is the
expected increase, so to calculate the expected tuition, we’ll need to add this change to the
previous year’s tuition: $1200 + $84 = $1284.
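Both approaches to the tuition calculation can be compared in a few lines. The sketch below is only an illustration; the function name `apply_percent_change` is an assumption, and a negative percent would represent a decrease.

```python
def apply_percent_change(amount, percent_change):
    """Increase (positive) or decrease (negative) `amount` by a percent."""
    return amount * (1 + percent_change / 100)


# One-step method: 107% of this year's tuition
print(round(apply_percent_change(1200, 7), 2))
# Two-step method: compute the increase, then add it on
print(round(1200 + 1200 * 0.07, 2))
```

Both lines give the same tuition, since multiplying by 1.07 is the same as adding 7% of the amount to the amount.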

Try it Now 9:
A TV originally priced at $799 is on sale for 30% off. There is then a 9.2% sales tax. Find
the price after including the discount and sales tax.

Example 37: Percent of Decrease


The value of a car dropped from $7400 to $6800 over the last year. What percent decrease is
this?

To compute the percent change, we first need to find the dollar value change:
$6800 − $7400 = −$600.
Often we will take the absolute value of this amount, which is called the absolute change:
|−600| = 600.
Since we are computing the decrease relative to the starting value, we compute this percent
out of $7400:
\( \frac{600}{7400} \approx 0.081 = 8.1\% \) decrease. This is called a relative change.

Absolute and Relative Change


Given two quantities,
Absolute change = |ending quantity – starting quantity|
Relative change = \( \frac{\text{absolute change}}{\text{starting quantity}} = \frac{|\text{ending quantity} - \text{starting quantity}|}{\text{starting quantity}} \)

Absolute change has the same units as the original quantity.


Relative change gives a percent change.
The starting quantity is called the base of the percent change.
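These two definitions translate directly into code. The sketch below applies them to the car value from Example 37; the function names `absolute_change` and `relative_change` are hypothetical and chosen to match the terms in the box.

```python
def absolute_change(start, end):
    """Size of the change, in the same units as the quantities."""
    return abs(end - start)


def relative_change(start, end):
    """Absolute change as a fraction of the starting quantity (the base)."""
    return absolute_change(start, end) / start


# Car value from Example 37: $7400 down to $6800
print(absolute_change(7400, 6800))           # 600
print(f"{relative_change(7400, 6800):.1%}")  # 8.1%
```

Swapping which quantity is passed as `start` changes the base, and therefore changes the relative change even though the absolute change stays the same.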

The base (or whole) of a percent is very important. For example, while Nixon was president,
it was argued that marijuana was a “gateway” drug, claiming that 80% of marijuana smokers
went on to use harder drugs like cocaine. The problem is, this isn’t true. The true claim is
that 80% of harder drug users first smoked marijuana. The difference is one of base, or
which whole the part is compared to: 80% of marijuana smokers using hard drugs, vs. 80%
of hard drug users having smoked marijuana. These numbers are not equivalent. As it turns
out, only one in 2,400 marijuana users actually goes on to use harder drugs 6 (this doesn’t mean
you should smoke marijuana or do harder drugs!).

Example 38: Change


There are about 75 QFC supermarkets in the U.S. Albertsons has about 215 stores. Compare
the size of the two companies.

When we make comparisons, we must ask first whether an absolute or relative comparison is
appropriate. The absolute difference is 215 – 75 = 140. From this, we could say “Albertsons
has 140 more stores than QFC.” However, if you wrote this in an article or paper, that number
does not mean much. The relative difference may be more meaningful. There are two different
relative changes we could calculate, depending on which store we use as the base or whole:

Using QFC as the base, \( \frac{140}{75} \approx 1.867 \).
This tells us Albertsons is 186.7% larger than QFC.

Using Albertsons as the base, \( \frac{140}{215} \approx 0.651 \).
This tells us QFC is 65.1% smaller than Albertsons.

Notice both of these are showing percent differences. We could also calculate the size of
Albertsons relative to QFC: \( \frac{215}{75} \approx 2.867 \), which tells us Albertsons is 2.867 times the size
of QFC. Likewise, we could calculate the size of QFC relative to Albertsons: \( \frac{75}{215} \approx 0.349 \),
which tells us that QFC is 34.9% of the size of Albertsons.
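The four comparisons in this example differ only in which number is divided by which. The short sketch below recomputes them from the store counts given in the example; the variable names are illustrative choices.

```python
qfc, albertsons = 75, 215          # store counts from the example
difference = albertsons - qfc      # absolute difference

print(f"{difference / qfc:.1%}")         # difference with QFC as the base
print(f"{difference / albertsons:.1%}")  # difference with Albertsons as the base
print(f"{albertsons / qfc:.3f}")         # size of Albertsons relative to QFC
print(f"{qfc / albertsons:.1%}")         # size of QFC relative to Albertsons
```

The same 140-store difference produces very different percents depending on the base, which is why a percent should always be reported along with what it is a percent of.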

6 http://tvtropes.org/pmwiki/pmwiki.php/Main/LiesDamnedLiesAndStatistics

Example 39: Up and Down


Suppose a stock drops in value by 60% one week, then increases in value the next week by
75%. Is the value higher or lower than where it started?

To answer this question, suppose the value started at $100. After one week, the value
dropped by 60%:
$100 - $100(0.60) = $100 - $60 = $40.

Notice that this calculation can be completed in one step, if we recognize that a 60% drop
leaves 100% - 60% = 40% of the original amount left:
$100(0.40) = $40.

In the next week, notice that the base of the percent has changed to the new value, $40.
Computing the 75% increase:
$40 + $40(0.75) = $40 + $30 = $70.

Notice that this calculation can also be completed in one step, if we recognize that a 75%
increase means we have 100% + 75% = 175% of the previous amount:
$40(1.75) = $70

In the end, the stock is still valued $30 lower, or \( \frac{\$30}{\$100} = 30\% \) lower, than where it started.
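Successive percent changes can be chained by multiplying, since each change uses the latest value as its base. The sketch below assumes the same $100 starting value as the example; the function name `chain_percent_changes` is hypothetical.

```python
def chain_percent_changes(start, percent_changes):
    """Apply each percent change in turn; each uses the latest value as its base."""
    value = start
    for change in percent_changes:
        value *= 1 + change / 100
    return value


# A 60% drop followed by a 75% increase, starting from $100
final = chain_percent_changes(100, [-60, 75])
print(round(final, 2))  # 70.0
```

Note that the two changes cannot simply be added (−60% + 75% = +15%), because the 75% increase applies to the smaller base of $40.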

Try it Now 10:


The U.S. federal debt at the end of 2001 was $5.77 trillion, and grew to $6.20 trillion by the
end of 2002. At the end of 2005 it was $7.91 trillion, and grew to $8.45 trillion by the end of
2006 7. Calculate the absolute and relative increase for 2001-2002 and 2005-2006. Which
year saw a larger increase in federal debt?

Example 40: Interpreting Change


A Seattle Times article on high school graduation rates reported “The number of schools
graduating 60 percent or fewer students in four years – sometimes referred to as “dropout
factories” – decreased by 17 during that time period. The number of kids attending schools
with such low graduation rates was cut in half.”

a) Is the “decrease by 17” number a useful comparison?

This number is hard to evaluate, since we have no basis for judging whether this is a large or
small change. If the number of “dropout factories” dropped from 20 to 3, that’d be a very
significant change, but if the number dropped from 217 to 200, that’d be less of an
improvement.

7 http://www.whitehouse.gov/sites/default/files/omb/budget/fy2013/assets/hist07z1.xls

b) Considering the last sentence, can we conclude that the number of “dropout factories”
was originally 34?

The last sentence provides a relative change, which helps put the first sentence in perspective.
We can estimate that the number of “dropout factories” was probably previously around 34.
However, it’s possible that students simply moved schools rather than the school improving,
so that estimate might not be fully accurate.

Example 41: Up for Debate


In the 2004 vice-presidential debates, Edwards claimed that US forces have suffered "90%
of the coalition casualties" in Iraq. Cheney disputed this, saying that in fact Iraqi security
forces and coalition allies "have taken almost 50 percent" of the casualties. Who is correct?

Without more information, it is hard for us to judge who is correct, but we can easily
conclude that these two percentages are talking about different things, so one does not
necessarily contradict the other. Edwards’s claim was a percent with coalition forces as the
base of the percent, while Cheney’s claim was a percent with both coalition and Iraqi security
forces as the base of the percent. It turns out both statistics are in fact fairly accurate.

Try it Now 11:


In the 2012 presidential elections, one candidate argued that “the president’s plan will cut
$716 billion from Medicare, leading to fewer services for seniors,” while the other candidate
rebuts that “our plan does not cut current spending and actually expands benefits for seniors,
while implementing cost saving measures.” Are these claims in conflict, in agreement, or not
comparable because they’re talking about different things?

We’ll wrap up our review of percent with a couple of cautions. First, when talking about a
change of quantities that are already measured in percent, we have to be careful in how we
describe the change.

Example 42: Percentage Points


A politician’s support increases from 40% of voters to 50% of voters. Describe the change.

We could describe this using an absolute change: 50% − 40% = 10%. Notice that since the
original quantities were percent, this change also has the units of percent. In this case, it is
best to describe this as an increase of 10 percentage points.

In contrast, we could compute the percent change: 10%/40% = 0.25 = 25% increase. This is the
relative change, and we’d say the politician’s support has increased by 25%.
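The two descriptions can be computed side by side. The following is a minimal sketch; the function name `describe_change` is an illustrative choice, not something from the text.

```python
def describe_change(old_pct, new_pct):
    """Return (change in percentage points, relative change in percent)."""
    point_change = new_pct - old_pct                 # absolute change, in percentage points
    relative_change = point_change / old_pct * 100   # relative change, as a percent of the start
    return point_change, relative_change

# The politician's support goes from 40% to 50% of voters.
points, relative = describe_change(40, 50)
print(f"{points} percentage points, a {relative:.0f}% relative increase")
```

Keeping the two results in separate variables mirrors the caution in the text: the same change should be reported either as 10 percentage points or as a 25% increase, never as "10%" without qualification.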

Lastly, we explore a problem that illustrates a caution against averaging with percent.

Example 43: Averaging Percent


A basketball player scores on 40% of 2-point field goal attempts, and on 30% of 3-point
field goal attempts. Find the player’s overall field goal percentage.

It is very tempting to average these values, and claim the overall average is 35%, but this is
likely not correct, since most players make many more 2-point attempts than 3-point
attempts. We don’t actually have enough information to answer the question.
Suppose the player attempted 200 2-point field goals and 100 3-point field goals. Then they
made 200(0.40) = 80 2-point shots and 100(0.30) = 30 3-point shots. Overall, they made 110
shots out of 300, for a 110/300 ≈ 0.367 = 36.7% overall field goal percentage.
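The calculation above is a weighted average: each percentage is weighted by its number of attempts. Here is a small sketch, assuming the attempt counts supposed in the example (200 and 100); the function name is hypothetical.

```python
def overall_percentage(attempts_and_rates):
    """Combine (attempts, success rate) pairs into one overall percentage."""
    made = sum(attempts * rate for attempts, rate in attempts_and_rates)
    attempted = sum(attempts for attempts, _ in attempts_and_rates)
    return made / attempted * 100

# 200 two-point attempts at 40%, 100 three-point attempts at 30%
shots = [(200, 0.40), (100, 0.30)]
print(round(overall_percentage(shots), 1))
```

Only when the attempt counts are equal does this agree with the simple average of 35%.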

Proportions and Rates


If you wanted to power the city of Seattle using wind power, how many windmills would you
need to install? Questions like these can be answered using rates and proportions.

Rates
A rate is the ratio (fraction) of two quantities.
A unit rate is a rate with a denominator of one.

Example 44: Rate


Your car can drive 300 miles on a tank of 15 gallons. Express this as a rate.
Expressed as a rate, (300 miles)/(15 gallons). We can divide to find a unit rate: (20 miles)/(1 gallon), which we could
also write as 20 miles/gallon, or just 20 miles per gallon.

Proportion Equation
A proportion equation is an equation showing the equivalence of two rates or ratios.

Example 45: Proportion Equations


Solve the proportion 5/3 = x/6 for the unknown value x.

This proportion is asking us to find a fraction with denominator 6 that is equivalent to the
fraction 5/3. We can solve this by multiplying both sides of the equation by 6, giving
x = (5/3) · 6 = 10.
We can also scale the first ratio into an equivalent fraction by multiplying the numerator and
denominator by 2, recognizing that 3 · 2 = 6:
5/3 = (5 · 2)/(3 · 2) = 10/6 = x/6, thus x = 10.

Example 46: Scaling


A map scale indicates that ½ inch on the map corresponds with 3 real miles. How many
miles apart are two cities that are 2¼ inches apart on the map?

We can set up a proportion by setting equal two (map inches)/(real miles) rates, and introducing a
variable, x, to represent the unknown quantity – the mile distance between the cities.

(½ map inch)/(3 miles) = (2¼ map inches)/(x miles)    Multiply both sides by x and rewrite the mixed number
(½/3) · x = 9/4    Multiply both sides by 3
½ · x = 27/4    Multiply both sides by 2 (or divide by ½)
x = 27/2 = 13½ miles

We can also use scaling if we recognize that in the proportion we have

(½ map inch)/(3 miles) = (9/4 map inches)/(x miles), and ½ · 9/2 = 9/4, so we can multiply the numerator and
denominator by 9/2:

(½ map inch · 9/2)/(3 miles · 9/2) = (9/4 map inches)/(27/2 miles) = (9/4 map inches)/(x miles), thus x = 27/2 = 13½ miles.
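The map proportion can be checked with exact fractions. This is a sketch only; `solve_proportion` is a hypothetical helper that solves a/b = c/x for x.

```python
from fractions import Fraction

def solve_proportion(a, b, c):
    """Solve a/b = c/x for x, which gives x = b*c/a."""
    return Fraction(b) * Fraction(c) / Fraction(a)

# (1/2 map inch)/(3 miles) = (2 1/4 map inches)/(x miles)
scale_inches = Fraction(1, 2)
scale_miles = 3
map_distance = Fraction(9, 4)   # 2 1/4 inches written as an improper fraction

x = solve_proportion(scale_inches, scale_miles, map_distance)
print(x, "=", float(x), "miles")
```

Using `Fraction` rather than decimals keeps the intermediate values (9/4, 27/4, 27/2) identical to the ones in the hand calculation.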

Many proportion problems can also be solved using dimensional analysis, the process of
multiplying a quantity by rates to change the units.

Example 47: Dimensional Analysis


Your car can drive 300 miles on a tank of 15 gallons. How far can it drive on 40 gallons?

We could certainly answer this question using a proportion: (300 miles)/(15 gallons) = (x miles)/(40 gallons).
However, we earlier found that 300 miles on 15 gallons gives a rate of 20 miles per gallon.
If we multiply the given 40 gallon quantity by this rate, the gallons unit “cancels” and we’re
left with a number of miles:
40 gallons · (20 miles)/(gallon) = (40 gallons)/1 · (20 miles)/(gallon) = 800 miles

Notice if instead we were asked “how many gallons are needed to drive 50 miles?” we could
answer this question by inverting the 20 mile per gallon rate so that the miles unit cancels and
we’re left with gallons:
50 miles · (1 gallon)/(20 miles) = (50 miles)/1 · (1 gallon)/(20 miles) = (50 gallons)/20 = 2.5 gallons

Dimensional analysis can also be used to do unit conversions. Here are some unit
conversions for reference.
Unit Conversions
Length
1 foot (ft.) = 12 inches (in) 1 yard (yd.) = 3 feet (ft.)
1 mile = 5,280 feet
1000 millimeters (mm) = 1 meter (m) 100 centimeters (cm) = 1 meter
1000 meters (m) = 1 kilometer (km) 2.54 centimeters (cm) = 1 inch

Weight and Mass


1 pound (lb.) = 16 ounces (oz.) 1 ton = 2000 pounds
1000 milligrams (mg) = 1 gram (g) 1000 grams = 1 kilogram (kg)
1 kilogram = 2.2 pounds (on earth)

Capacity
1 cup = 8 fluid ounces (fl. oz.)* 1 pint = 2 cups
1 quart = 2 pints = 4 cups 1 gallon = 4 quarts = 16 cups
1000 milliliters (ml) = 1 liter (L)
*Fluid ounces are a capacity measurement for liquids. 1 fluid ounce ≈ 1 ounce (weight) for water only.

Example 48: Conversions via Dimensional Analysis


A bicycle is traveling at 15 miles per hour. How many feet will it cover in 20 seconds?

To answer this question, we need to convert 20 seconds into feet. If we know the speed of
the bicycle in feet per second, this question would be simpler. Since we don’t, we will need
to do additional unit conversions. We will need to know that 5280 ft. = 1 mile. We might
start by converting the 20 seconds into hours:
20 seconds · (1 minute)/(60 seconds) · (1 hour)/(60 minutes) = 1/180 hour    Now we can multiply by the 15 miles/hr.
1/180 hour · (15 miles)/(1 hour) = 1/12 mile    Now we can convert to feet
1/12 mile · (5280 feet)/(1 mile) = 440 feet

We could have also done this entire calculation in one long set of products:
20 seconds · (1 minute)/(60 seconds) · (1 hour)/(60 minutes) · (15 miles)/(1 hour) · (5280 feet)/(1 mile) = 440 feet
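The same chain of conversions can be written as one multiplication per step. The sketch below uses exact fractions so the intermediate values match the hand calculation (1/180 hour, 1/12 mile); the function and constant names are illustrative assumptions.

```python
from fractions import Fraction

FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 60 * 60   # 60 seconds per minute, 60 minutes per hour

def feet_traveled(speed_mph, seconds):
    """Distance in feet covered in `seconds` at `speed_mph` miles per hour."""
    hours = Fraction(seconds, SECONDS_PER_HOUR)   # seconds -> hours
    miles = hours * speed_mph                      # hours -> miles
    return miles * FEET_PER_MILE                   # miles -> feet

print(feet_traveled(15, 20))   # bicycle at 15 mph for 20 seconds
```

Each line multiplies by one rate, so the unit left at the end (feet) can be read off the last conversion factor.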

Try it Now 12:


A 1000 foot spool of bare 12-gauge copper wire weighs 19.8 pounds. How much will 18
inches of the wire weigh, in ounces?

Notice that with the miles per gallon example, if we double the miles driven, we double the
gas used. Likewise, with the map distance example, if the map distance doubles, the real-life
distance doubles. This is a key feature of proportional relationships, and one we must
confirm before assuming two things are related proportionally.

Example 49: Proportions with Area


Suppose you’re tiling the floor of a 10 ft. by 10 ft. room, and find that 100 tiles will be
needed. How many tiles will be needed to tile the floor of a 20 ft. by 20 ft. room?

In this case, while the width of the room has doubled, the area has quadrupled. Since the
number of tiles needed corresponds with the area of the floor, not the width, 400 tiles will be
needed. We could find this using a proportion based on the areas of the rooms:
(100 tiles)/(100 ft²) = (n tiles)/(400 ft²)

Other quantities just don’t scale proportionally at all.

Example 50: Limits of Scaling


Suppose a small company spends $1000 on an advertising campaign, and gains 100 new
customers from it. How many new customers should they expect if they spend $10,000?

While it is tempting to say that they will gain 1000 new customers, it is likely that additional
advertising will be less effective than the initial advertising. For example, if the company is
a hot tub store, there are likely only a fixed number of people interested in buying a hot tub,
so there might not even be 1000 people in the town who would be potential customers.

Sometimes when working with rates, proportions, and percent, the process can be made more
challenging by the magnitude of the numbers involved. Sometimes, large numbers are just
difficult to comprehend.

Example 51: Comparisons with Large Numbers


Compare the 2010 U.S. military budget of $683.7 billion to other quantities.

Here we have a very large number, about $683,700,000,000 written out. Of course,
imagining a billion dollars is very difficult, so it can help to compare it to other quantities.

If that amount of money was used to pay the salaries of the 1.4 million Walmart employees
in the U.S., each would earn over $488,000.

There are about 300 million people in the U.S. The military budget is roughly $2,300 per
person. If you were to put $683.7 billion in $100 bills, and count out 1 per second, it would
take almost 217 years to finish counting it.
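Comparisons like these are each a single division. A rough sketch follows, using the figures quoted in the example (1.4 million employees, about 300 million people, $100 bills counted at one per second); the function name and the 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # assumes a 365-day year

def budget_comparisons(budget, employees, population, bill_value=100):
    """Return (dollars per employee, dollars per person, years to count the bills)."""
    per_employee = budget / employees
    per_person = budget / population
    years_to_count = (budget / bill_value) / SECONDS_PER_YEAR   # one bill per second
    return per_employee, per_person, years_to_count

per_employee, per_person, years = budget_comparisons(683.7e9, 1.4e6, 300e6)
print(f"${per_employee:,.0f} per employee")
print(f"${per_person:,.0f} per person")
print(f"{years:.1f} years to count")
```

Because the inputs are themselves rounded estimates, the outputs are best read to two or three significant figures.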

Example 52: Comparisons per Capita


Compare the electricity consumption per capita in China to the rate in Japan.

To address this question, we will first need data. From the CIA 8 website we can find the
electricity consumption in 2011 for China was 4,693,000,000,000 KWH (kilowatt-hours), or
4.693 trillion KWH, while the consumption for Japan was 859,700,000,000, or 859.7 billion
KWH. To find the rate per capita (per person), we will also need the population of the two
countries. From the World Bank 9, we can find the population of China is 1,344,130,000, or
1.344 billion, and the population of Japan is 127,817,277, or 127.8 million.

Computing the consumption per capita for each country:


China: (4,693,000,000,000 KWH)/(1,344,130,000 people) ≈ 3491.5 KWH per person
Japan: (859,700,000,000 KWH)/(127,817,277 people) ≈ 6726 KWH per person

While China uses more than 5 times the electricity of Japan overall, because the population
of Japan is so much smaller, it turns out Japan uses almost twice the electricity per person
compared to China.
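The per capita comparison can be reproduced directly from the figures quoted above. This is a minimal sketch; the dictionary names are illustrative.

```python
# Figures as quoted in the example (2011 consumption, World Bank population)
consumption_kwh = {"China": 4_693_000_000_000, "Japan": 859_700_000_000}
population = {"China": 1_344_130_000, "Japan": 127_817_277}

# Rate per capita = total consumption divided by number of people
per_capita = {country: consumption_kwh[country] / population[country]
              for country in consumption_kwh}

for country, rate in per_capita.items():
    print(f"{country}: {rate:,.1f} KWH per person")

# Compare totals and per-person rates
print(round(consumption_kwh["China"] / consumption_kwh["Japan"], 2))
print(round(per_capita["Japan"] / per_capita["China"], 2))
```

The last two lines show both comparisons from the text: the ratio of total consumption (China to Japan) and the ratio of per-person consumption (Japan to China).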

Problem Solving and Estimating


Finally, we will bring together the mathematical tools we’ve reviewed, and use them to
approach more complex problems. In many problems, it is tempting to take the given
information, plug it into whatever formulas you have handy, and hope that the result is what
you were supposed to find. Chances are, this approach has served you well in other math
classes.

This approach does not work well with real life problems. Instead, problem solving is best
approached by first starting at the end: identifying exactly what you are looking for. From
there, you then work backwards, asking “what information and procedures will I need to find
this?” Very few interesting questions can be answered in one mathematical step; often times
you will need to chain together a solution pathway, a series of steps that will allow you to
answer the question.

Problem Solving Process


1. Identify the question you’re trying to answer.
2. Work backwards, identifying the information you will need and the relationships
you will use to answer that question.
3. Continue working backwards, creating a solution pathway.
4. If you are missing necessary information, look it up or estimate it. If you have
unnecessary information, ignore it.
5. Solve the problem, following your solution pathway.

8 https://www.cia.gov/library/publications/the-world-factbook/rankorder/2042rank.html
9 http://data.worldbank.org/indicator/SP.POP.TOTL

In most problems we work, we will be approximating a solution, because we will not have
perfect information. We will begin with a few examples where we will be able to
approximate the solution using basic knowledge from our lives.

Example 53: Heart Beats


How many times does your heart beat in a year?

This question is asking for the rate of heart beats per year. Since a year is a long time to
measure heart beats for, if we knew the rate of heart beats per minute, we could scale that
quantity up to a year. So the information we need to answer this question is heart beats per
minute. This is something you can easily measure by counting your pulse while watching a
clock for a minute.
Suppose you count 80 beats in a minute. To convert this to beats per year:

(80 beats)/(1 minute) · (60 minutes)/(1 hour) · (24 hours)/(1 day) · (365 days)/(1 year) = 42,048,000 beats per year

Example 54: Paper weight


How thick is a single sheet of paper? How much does it weigh?

While you might have a sheet of paper handy, trying to measure it would be tricky. Instead
we might imagine a stack of paper, and then scale the thickness and weight to a single sheet.
If you’ve ever bought paper for a printer or copier, you probably bought a ream, which
contains 500 sheets. We could estimate that a ream of paper is about 2 inches thick and
weighs about 5 pounds. Scaling these down,
(2 inches)/(ream) · (1 ream)/(500 sheets) = 0.004 inches per sheet
(5 pounds)/(ream) · (1 ream)/(500 sheets) = 0.01 pounds per sheet, or 0.16 ounces per sheet.

Example 55: Muffins!


A recipe for zucchini muffins states that it yields 12 muffins, with 250 calories per muffin.
You instead decide to make mini-muffins, and the recipe yields 20 muffins. If you eat 4,
how many calories will you consume?

There are several possible solution pathways to answer this question. We will explore one.

To answer the question of how many calories 4 mini-muffins will contain, we would want to
know the number of calories in each mini-muffin. To find the calories in each mini-muffin,
we could first find the total calories for the entire recipe, then divide it by the number of
mini-muffins produced. To find the total calories for the recipe, we could multiply the
calories per standard muffin by the number of standard muffins. Notice that this produces a multi-step
solution pathway. It is often easier to solve a problem in small steps, rather than trying to
find a way to jump directly from the given information to the solution.

We can now execute our plan:


12 muffins · (250 calories)/(muffin) = 3000 calories for the whole recipe
(3000 calories)/(20 mini-muffins) gives 150 calories per mini-muffin
4 mini-muffins · (150 calories)/(mini-muffin) totals 600 calories consumed.
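The three-step solution pathway translates into three lines of code, one per step. This is a sketch with hypothetical parameter names.

```python
def calories_eaten(standard_yield, calories_per_standard, mini_yield, minis_eaten):
    """Follow the solution pathway: recipe total -> per mini-muffin -> amount eaten."""
    total_calories = standard_yield * calories_per_standard   # whole recipe
    calories_per_mini = total_calories / mini_yield           # one mini-muffin
    return minis_eaten * calories_per_mini                    # what was eaten

print(calories_eaten(12, 250, 20, 4))
```

Naming each intermediate quantity makes the pathway visible and lets each step be checked against the hand calculation (3000, then 150, then 600).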

Example 56: On Deck


You need to replace the boards on your deck. About how much will the materials cost?

There are two approaches we could take to this problem: 1) estimate the number of boards
we will need and find the cost per board, or 2) estimate the area of the deck and find the
approximate cost per square foot for deck boards. We will take the latter approach.

For this solution pathway, we will be able to answer the question if we know the cost per
square foot for decking boards and the square footage of the deck. To find the cost per
square foot for decking boards, we could compute the area of a single board, and divide it
into the cost for that board. We can compute the square footage of the deck using geometric
formulas. So first we need information: the dimensions of the deck, and the cost and
dimensions of a single deck board.

Suppose that on measuring the deck you find it is rectangular, 16 ft. by 24 ft., for a total area
of 384 ft².
From a visit to the local home store, you find that an 8 foot by 4 inch cedar deck board costs
about $7.50. The area of this board, doing the necessary conversion from inches to feet, is:
8 feet · 4 inches · (1 foot)/(12 inches) ≈ 2.667 ft². The cost per square foot is then
$7.50/(2.667 ft²) = $2.8125 per ft².

This will allow us to estimate the material cost for the whole 384 ft² deck:
384 ft² · $2.8125/ft² = $1080 total cost.

Of course, this cost estimate assumes that there is no waste, which is rarely the case. It is
common to add at least 10% to the cost estimate to account for waste.
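The deck estimate, including an optional allowance for waste, can be sketched as follows. The function name, its parameters, and the default of no waste are assumptions for illustration; the measurements and price are the ones supposed in the example.

```python
def deck_material_cost(deck_length_ft, deck_width_ft,
                       board_length_ft, board_width_in, board_cost,
                       waste=0.0):
    """Estimate decking cost from deck area and cost per square foot of board."""
    deck_area = deck_length_ft * deck_width_ft             # square feet
    board_area = board_length_ft * board_width_in / 12     # inches -> feet, then area
    cost_per_sqft = board_cost / board_area
    return deck_area * cost_per_sqft * (1 + waste)

print(round(deck_material_cost(24, 16, 8, 4, 7.50), 2))              # no waste
print(round(deck_material_cost(24, 16, 8, 4, 7.50, waste=0.10), 2))  # 10% extra for waste
```

Passing `waste=0.10` applies the 10% allowance mentioned above to the whole estimate.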

Example 57: Comparing Cars


Is it worth buying a Hyundai Sonata hybrid instead of the regular Hyundai Sonata?

To make this decision, we must first decide what our basis for comparison will be. For the
purposes of this example, we’ll focus on fuel and purchase costs, but environmental impacts
and maintenance costs are other factors a buyer might consider.

It might be interesting to compare the cost of gas to run both cars for a year. To determine
this, we will need to know the miles per gallon both cars get, as well as the number of miles
we expect to drive in a year. From that information, we can find the number of gallons
required for a year. Using the price of gas per gallon, we can find the running cost.

From Hyundai’s website, the 2013 Sonata will get 24 miles per gallon (mpg) in the city, and
35 mpg on the highway. The hybrid will get 35 mpg in the city, and 40 mpg on the highway.

An average driver drives about 12,000 miles a year. Suppose that you expect to drive about
75% of that in the city, so 9,000 city miles a year, and 3,000 highway miles a year.

We can then find the number of gallons each car would require for the year.

Sonata:
9000 city miles · (1 gallon)/(24 city miles) + 3000 highway miles · (1 gallon)/(35 highway miles) = 460.7 gallons
Hybrid:
9000 city miles · (1 gallon)/(35 city miles) + 3000 highway miles · (1 gallon)/(40 highway miles) = 332.1 gallons

If gas in your area averages about $3.50 per gallon, we can use that to find the running cost:

Sonata: 460.7 gallons · $3.50/gallon = $1612.45
Hybrid: 332.1 gallons · $3.50/gallon = $1162.35

The hybrid will save $450.10 a year. The gas costs for the hybrid are about $450.10/$1612.45 =
0.279 = 27.9% lower than the costs for the standard Sonata.

While both the absolute and relative comparisons are useful here, they still make it hard to
answer the original question, since “is it worth it” implies there is some tradeoff for the gas
savings. Indeed, the hybrid Sonata costs about $25,850, compared to the base model for the
regular Sonata, at $20,895.

To better answer the “is it worth it” question, we might explore how long it will take the gas
savings to make up for the additional initial cost. The hybrid costs $4955 more. With gas
savings of $450.10 a year, it will take about 11 years for the gas savings to make up for the
higher initial costs.

We can conclude that if you expect to own the car for at least 11 years, the hybrid is indeed worth it. If
you plan to own the car for less than 11 years, it may still be worth it, since the resale value
of the hybrid may be higher, or for other non-monetary reasons. This is a case where math
can help guide your decision, but it can’t make it for you.
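The whole comparison can be sketched in a few lines, using the mileage, gas price, and purchase prices quoted in this example. The function name is hypothetical. Because the code does not round the gallons to one decimal place, its dollar figures differ from the hand calculation by a few cents.

```python
def annual_fuel_cost(city_miles, highway_miles, city_mpg, highway_mpg, price_per_gallon):
    """Yearly gas cost: gallons for city driving plus highway driving, times price."""
    gallons = city_miles / city_mpg + highway_miles / highway_mpg
    return gallons * price_per_gallon

sonata = annual_fuel_cost(9000, 3000, 24, 35, 3.50)
hybrid = annual_fuel_cost(9000, 3000, 35, 40, 3.50)

savings = sonata - hybrid                 # absolute yearly savings
relative_savings = savings / sonata       # relative savings
extra_purchase_cost = 25850 - 20895       # hybrid price minus base Sonata price
payback_years = extra_purchase_cost / savings

print(f"Yearly savings: ${savings:.2f} ({relative_savings:.1%} lower)")
print(f"Payback time: {payback_years:.1f} years")
```

Changing the assumed yearly mileage, the city and highway split, or the gas price shows how sensitive the payback time is to those estimates.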

Try it Now 13:


If traveling from Seattle, WA to Spokane, WA for a three-day conference, does it make more
sense to drive or fly?

Problem Solving Extension: Taxes


Governments collect taxes to pay for the services they provide. In the United States, federal
income taxes help fund the military, the Environmental Protection Agency, and thousands of
other programs. Property taxes help fund schools. Gasoline taxes help pay for road
improvements. While very few people enjoy paying taxes, they are necessary to pay for the
services we all depend upon.

Taxes can be computed in a variety of ways, but are typically computed as a percentage of a
sale, of one’s income, or of one’s assets.

Example 58: Sales Tax Revisited


The sales tax rate in a city is 9.3%. How much sales tax will you pay on a $140 purchase?

The sales tax will be 9.3% of $140. To compute this, we multiply $140 by the percent
written as a decimal: $140(0.093) = $13.02.

When taxes are not given as a fixed percentage rate, sometimes it is necessary to calculate
the effective rate.

Effective rate
The effective tax rate is the equivalent percent rate of the tax paid out of the dollar
amount the tax is based on.

Example 59: Effective Tax Rate


Joan paid $3,200 in property taxes on her house valued at $215,000 last year. What is the
effective tax rate?

We can compute the equivalent percentage: 3200/215000 = 0.01488, or about 1.49%
effective rate.

Taxes are often referred to as progressive, regressive, or flat.

Tax categories
A flat tax, or proportional tax, charges a constant percentage rate.
A progressive tax increases the percent rate as the base amount increases.
A regressive tax decreases the percent rate as the base amount increases.

Example 60: US Income Tax


The United States federal income tax on earned wages is an example of a progressive tax.
People with a higher wage income pay a higher percent tax on their income.

For a single person in 2011, adjusted gross income (income after deductions) under $8,500
was taxed at 10%. Income over $8,500 but under $34,500 was taxed at 15%.

A person earning $10,000 would pay 10% on the portion of their income under $8,500, and
15% on the income over $8,500, so they’d pay:
8500(0.10) = 850 10% of $8500
1500(0.15) = 225 15% of the remaining $1500 of income
Total tax: = $1075

The effective tax rate paid is 1075/10000 = 10.75%

A person earning $30,000 would also pay 10% on the portion of their income under $8,500,
and 15% on the income over $8,500, so they’d pay:
8500(0.10) = 850 10% of $8500
21500(0.15) = 3225 15% of the remaining $21500 of income
Total tax: = $4075

The effective tax rate paid is 4075/30000 = 13.58%.

Notice that the effective rate has increased with income, showing this is a progressive tax.
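The bracket calculation can be sketched as a loop over the brackets. The code below includes only the two 2011 brackets quoted in this example (10% up to $8,500, 15% up to $34,500), so it refuses incomes above $34,500 rather than guess at higher rates; the names `BRACKETS` and `income_tax` are illustrative.

```python
# (upper limit of bracket, rate) - only the two brackets quoted in the example
BRACKETS = [(8500, 0.10), (34500, 0.15)]

def income_tax(income, brackets=BRACKETS):
    """Tax each slice of income at the rate of the bracket it falls in."""
    if income > brackets[-1][0]:
        raise ValueError("income is above the brackets included in this sketch")
    tax = 0.0
    lower = 0
    for upper, rate in brackets:
        if income <= lower:
            break
        taxable_in_bracket = min(income, upper) - lower   # slice inside this bracket
        tax += taxable_in_bracket * rate
        lower = upper
    return tax

for income in (10000, 30000):
    tax = income_tax(income)
    print(f"${income}: tax ${tax:.2f}, effective rate {tax / income:.2%}")
```

The effective rate printed for each income is the total tax divided by the income, the same calculation as in the example.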

Example 61: Gasoline Tax


A gasoline tax is a flat tax when considered in terms of consumption: a tax of, say, $0.30 per
gallon is proportional to the amount of gasoline purchased. Someone buying 10 gallons of
gas at $4 a gallon would pay $3 in tax, which is $3/$40 = 7.5%. Someone buying 30 gallons
of gas at $4 a gallon would pay $9 in tax, which is $9/$120 = 7.5%, the same effective rate.

However, in terms of income, a gasoline tax is often considered a regressive tax. It is likely
that someone earning $30,000 a year and someone earning $60,000 a year will drive about
the same amount. If both pay $60 in gasoline taxes over a year, the person earning $30,000
has paid 0.2% of their income, while the person earning $60,000 has paid 0.1% of their
income in gas taxes.

Try it Now 14:


A sales tax is a fixed percentage tax on a person’s purchases. Is this a flat, progressive, or
regressive tax?

Income Taxation
Many people have proposed various revisions to the income tax collection in the United
States. Some, for example, have claimed that a flat tax would be fairer. Others call for
revisions to how different types of income are taxed, since currently investment income is
taxed at a different rate than wage income.

The following two projects will allow you to explore some of these ideas and draw your own
conclusions.

Project 1: Flat tax, Modified Flat Tax, and Progressive Tax.


Imagine the country is made up of 100 households. The federal government needs to collect
$800,000 in income taxes to be able to function. The population consists of 6 groups:

Group A: 20 households that earn $12,000 each


Group B: 20 households that earn $29,000 each
Group C: 20 households that earn $50,000 each
Group D: 20 households that earn $79,000 each
Group E: 15 households that earn $129,000 each
Group F: 5 households that earn $295,000 each

This scenario is roughly proportional to the actual United States population and tax needs.
We are going to determine new income tax rates.

The first proposal we’ll consider is a flat tax – one where every income group is taxed at the
same percentage tax rate.
1) Determine the total income for the population (all 100 households together)
2) Determine what flat tax rate would be necessary to collect enough money

The second proposal we’ll consider is a modified flat-tax plan, where everyone only pays
taxes on any income over $20,000. So, everyone in group A will pay no taxes. Everyone in
group B will pay taxes only on $9,000.
3) Determine the total taxable income for the whole population
4) Determine what flat tax rate would be necessary to collect enough money in this
modified system

5) Complete this table for both plans.

Group | Income per household | Flat Tax Plan: income tax per household | Flat Tax Plan: income after taxes | Modified Flat Tax Plan: income tax per household | Modified Flat Tax Plan: income after taxes
A | $12,000 | | | |
B | $29,000 | | | |
C | $50,000 | | | |
D | $79,000 | | | |
E | $129,000 | | | |
F | $295,000 | | | |

The third proposal we’ll consider is a progressive tax, where lower income groups are taxed
at a lower percent rate, and higher income groups are taxed at a higher percent rate. For
simplicity, we’re going to assume that a household is taxed at the same rate on all their
income.

6) Set progressive tax rates for each income group to bring in enough money. There is no
one right answer here – just make sure you bring in enough money!
Group | Income per household | Tax rate (%) | Income tax per household | Total tax collected for all households | Income after taxes per household
A | $12,000 | | | |
B | $29,000 | | | |
C | $50,000 | | | |
D | $79,000 | | | |
E | $129,000 | | | |
F | $295,000 | | | |
Total | | | | This better total to $800,000 |

7) Discretionary income is the income people have left over after paying for necessities like
rent, food, transportation, etc. The cost of basic expenses does increase with income,
since housing and car costs are higher, however usually not proportionally. For each
income group, estimate their essential expenses, and calculate their discretionary income.
Then compute the effective tax rate for each plan relative to discretionary income rather
than income.

Group | Income per household | Discretionary income (estimated) | Effective rate, flat | Effective rate, modified | Effective rate, progressive
A | $12,000 | | | |
B | $29,000 | | | |
C | $50,000 | | | |
D | $79,000 | | | |
E | $129,000 | | | |
F | $295,000 | | | |

8) Which plan seems the most fair to you? Which plan seems the least fair to you? Why?

Project 2: Calculating Taxes.


Visit www.irs.gov, and download the most recent version of Form 1040 and Schedules A, B,
C, and D.

Scenario 1: Calculate the taxes for someone who earned $60,000 in standard wage income
(W-2 income), has no dependents, and takes the standard deduction.

Scenario 2: Calculate the taxes for someone who earned $20,000 in standard wage income,
$40,000 in qualified dividends, has no dependents, and takes the standard deduction.
(Qualified dividends are earnings on certain investments such as stocks.)

Scenario 3: Calculate the taxes for someone who earned $60,000 in small business income,
has no dependents, and takes the standard deduction.

Based on these three scenarios, what are your impressions of how the income tax system
treats these different forms of income (wage, dividends, and business income)?

Scenario 4: To get a more realistic sense for calculating taxes, you’ll need to consider
itemized deductions. Calculate the income taxes for someone with the income and
expenses listed below.

Married with 2 children, filing jointly


Wage income: $50,000 combined
Paid sales tax in Washington State
Property taxes paid: $3200
Home mortgage interest paid: $4800
Charitable gifts: $1200
Patterns and Growth 1

“What’s predictable is that these measures [of technology] grow exponentially, not linearly,
though our intuition about the future is linear, which is hard-wired in our brains. This makes
a remarkable difference. Thirty steps linearly gets you to 30, whereas 30 steps exponentially
(2, 4, 8, 16. . .) gets you to a billion.”—Ray Kurzweil1

Growth Models
Ray Kurzweil is an inventor and entrepreneur who predicts a time in the future when
technology will allow humans to live forever. He bases his vision on the leaps and bounds in
the advancement of technology. Whether or not you have a desire to be immortal, you
certainly feel the impact of growth (and decay) in your life.

Let’s compare two different types of growth: linear and exponential. Assuming you start
with nothing, at 0, the linear pattern that would get you to 30 after 30 steps would look like:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, …, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30

A quick examination of the pattern shows that it is increasing by a constant difference of 1
(it adds 1 to get the next term). If we were to graph these values on an x-y coordinate plane,
where x is the step number or count, and y is the term value, it would look something like:

Term count    Term value
0             0
1             1
2             2
3             3
4             4
…             …
26            26
27            27
28            28
29            29
30            30

[Figure: “Linear Steps” – term value (0 to 30) plotted against term count (0 to 30)]

We can see the data creates a straight line pattern, hence the name linear growth. The
sequence of numbers with a linear growth pattern is also called an arithmetic sequence.

If we look at 30 steps exponentially, following the pattern that Kurzweil suggests, starting at
1 and doubling for 30 steps, it looks like:
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, …, 67108864, 134217728,
268435456, 536870912, 1073741824
That last number after the 30th step is 1,073,741,824, indeed a bit over 1 billion.

1 Kurzweil, R. (2010, December 28). Technology 25 Years Hence. The New York Times.
© David Lippman, edited by Laurel Clifford Creative Commons BY-SA

A quick examination of this pattern shows that the difference is not constant (it goes up by 1,
then 2, then 4, then 8, etc.). The growth itself is growing as each term value is doubled
(multiplied by 2) to get the next term value. If we graph these values versus their count, it
looks something like this:

Term count    Term value
0             1
1             2
2             4
3             8
4             16
5             32
6             64
…             …
26            67108864
27            134217728
28            268435456
29            536870912
30            1073741824

[Figure: “Exponential Steps” – term value (0 to 1,000,000,000) plotted against term count (0 to 30)]

The numbers grow so large that graphing them becomes difficult. What looks like a flat line
at the beginning is only such because the numbers at the end are so large in comparison. If we
graph the first 6 terms of both patterns on the same grid, we can visually compare linear
and exponential growth:

[Figure: “Linear vs. Exponential” – term value versus term count (0 to 6), showing the exponential pattern and the linear pattern on the same grid]

Notice how the exponential growth outpaces the linear growth.
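Both patterns are easy to generate, which makes the comparison concrete. This is a minimal sketch; the function names are illustrative, and the starting values (0 for the linear pattern, 1 for the doubling pattern) are the ones used above.

```python
def linear_terms(start, difference, steps):
    """Terms of a pattern that adds a constant difference at each step."""
    return [start + difference * n for n in range(steps + 1)]

def exponential_terms(start, ratio, steps):
    """Terms of a pattern that multiplies by a constant ratio at each step."""
    return [start * ratio ** n for n in range(steps + 1)]

linear = linear_terms(0, 1, 30)
exponential = exponential_terms(1, 2, 30)

print(linear[:7])        # first terms of the linear pattern
print(exponential[:7])   # first terms of the doubling pattern
print(linear[-1], exponential[-1])   # values after 30 steps
```

Printing the differences between consecutive terms of each list shows the contrast described above: constant for the linear pattern, doubling for the exponential one.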

So if technology is growing exponentially, is Kurzweil right? Is immortality around the
corner? Are there limits to growth that we cannot predict or see? Populations of people,
animals, bacteria, and technology are growing all around us. By understanding how things
change, we can better understand what to expect in the future. In this chapter, we focus on
time-dependent change, look for patterns in change, and generalize these patterns using
algebraic expressions and formulas.

Linear (Arithmetic) Growth


Marcy collects antique soda bottles. Her collection currently contains 437 bottles. Every
year, she budgets enough money to buy 32 new bottles. How many bottles will she have in 5
years, and how long will it take for her collection to reach 1000 bottles?

While you could probably solve both of these questions without an equation or formal
mathematics, we are going to formalize our approach to this problem to provide a means to
answer more complicated questions.

Working informally, we could make a list of Marcy’s bottles, and each year’s purchases:

Marcy starts with: 437 bottles, and she buys 32 bottles each year.
After 1 year, she has: 437 bottles + 32 bottles = 469 bottles
After 2 years, she has: 469 bottles + 32 bottles = 501 bottles
After 3 years, she has: 501 bottles + 32 bottles = 533 bottles
After 4 years, she has: 533 bottles + 32 bottles = 565 bottles
After 5 years, she has: 565 bottles + 32 bottles = 597 bottles

[Photo: Vintage Soda Bottles, see footnote 2]

At this point, we know Marcy would have 597 bottles after 5 years. We still need to answer
the second question, how long it will take for her collection to reach 1000 bottles. We’re
about halfway to 1000 after 2 years (501 bottles), but we can’t just double the year, as at 4
years we’re only at 565. We could continue the process of adding 32 bottles each year, but
that would take some time (and tedium). Using algebra to generalize the process we have
used can lead us to a more efficient solution.

Suppose that Pn represents the number, or population, of bottles Marcy has after n years.
Notice that the subscript or index number is n, and indicates how many years she has been
collecting bottles. So P0 would represent the number of bottles now (or to begin with),
because she has not collected any bottles yet (she will do that during the upcoming year), and
thus n = 0. P1 would represent the number of bottles after 1 year of collecting, P2 would
represent the number of bottles after 2 years, and so on. Using our previous work, we could
describe how Marcy’s bottle collection is changing using this Pn notation:

Marcy starts with: 437 bottles P0 = 437

After 1 year, she has: 437 bottles + 32 bottles = 469 bottles P1 = 437 + 32 = 469
P1 = P0 + 32 = 469
After 2 years, she has: 469 bottles + 32 bottles = 501 bottles P2 = 469 + 32 = 501
P2 = P1 + 32 = 501
After 3 years, she has: 501 bottles + 32 bottles = 533 bottles P3 = 501 + 32 = 533
P3 = P2 + 32 = 533
After 4 years, she has: 533 bottles + 32 bottles = 565 bottles P4 = 533 + 32 = 565
P4 = P3 + 32 = 565
After 5 years, she has: 565 bottles + 32 bottles = 597 bottles P5 = 565 + 32 = 597
P5 = P4 + 32 = 597

2
Photo by Andee Duncan, https://www.flickr.com/photos/windysydney/3969688528/ license CC BY-NC 2.0

Notice how the next year’s calculation relies on the previous year’s calculation: we need to
know how many bottles she had last year to figure out how many she’ll have for the next year,
and we are always adding 32 bottles to that previous year’s amount. We can generalize this
as:

P0 = 437 (P0 is the amount of bottles in the starting year)


Pn = Pn-1 + 32 (Pn is the amount of bottles in a particular year, and Pn-1 is the
amount of bottles for the previous year)

This description is called a recursive relationship. A recursive relationship is a formula which
relates the next value in a sequence to the previous values. Here, the number of bottles in
year n, indicated by Pn, can be found by adding 32 to the number of bottles in the previous
year, Pn-1.

Recursive relationships are often the most natural and intuitive way we describe a particular
pattern. For example, if you saw the sequence 437, 469, 501, 533, 565, 597, and were asked
to describe the pattern, what would you say? Most likely you would say the numbers start at
437 (P0 = 437) and increase by 32 to get the next term (Pn = Pn-1 + 32). Although natural
and intuitive, recursive thinking can have its limitations. If we asked you to predict P30,
which is 30 terms after the starting term, you would need to know P30-1 = P29, the term that is
29 terms after the starting term, and to find P29, you would need to know P29-1 = P28, the term
that is 28 terms after the starting term, and so on.
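To make this limitation concrete, here is a minimal Python sketch of the recursive rule. The function name bottles_after and its parameters are illustrative choices, not part of the text. Notice that reaching P30 requires stepping through every earlier term.

```python
def bottles_after(n, start=437, yearly_purchase=32):
    """Return Pn by applying Pn = Pn-1 + 32 one year at a time."""
    total = start  # P0
    for _ in range(n):
        total += yearly_purchase  # each pass moves from one year to the next
    return total

# The first six terms of the sequence, P0 through P5.
print([bottles_after(n) for n in range(6)])

# P30 is only reached after all 30 intermediate additions.
print(bottles_after(30))
```

The loop does the same bookkeeping we would do by hand, which is why the work grows with n.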

While recursive relationships are excellent for describing simply and cleanly how a quantity
is changing, they are not convenient for making predictions or solving problems that stretch
far into the future. Instead, a general or explicit form for the relationship is preferred. An
explicit equation allows us to calculate Pn directly from knowing n, the number of terms
after the starting term, without needing to know Pn-1, the previous term. While you may
already be able to guess the explicit equation, let us derive it from the recursive formula. We
can do so by selectively not simplifying as we go:

Marcy starts with: 437 bottles                                   P0 = 437

After 1 year, she has: 437 bottles + 32 bottles.                 P1 = 437 + 32
She has added 32 bottles 1 time since she started.               P1 = 437 + 32(1)

After 2 years, she has: 437 bottles + 32 bottles + 32 bottles.   P2 = 437 + 32 + 32
She has added 32 bottles 2 times since she started.              P2 = 437 + 32(2)

After 3 years, she has: 437 bottles + 32 bottles + 32 bottles    P3 = 437 + 32 + 32 + 32
+ 32 bottles. She has added 32 bottles 3 times since she         P3 = 437 + 32(3)
started.

After 4 years, she has: 437 bottles + 32 bottles + 32 bottles    P4 = 437 + 32 + 32 + 32 + 32
+ 32 bottles + 32 bottles. She has added 32 bottles 4 times      P4 = 437 + 32(4)
since she started.

After 5 years, she has: 437 bottles + 32 bottles + 32 bottles    P5 = 437 + 32 + 32 + 32 + 32 + 32
+ 32 bottles + 32 bottles + 32 bottles. She has added 32         P5 = 437 + 32(5)
bottles 5 times since she started.

You can probably see the pattern now, and generalize that Pn = 437 + 32(n) = 437 + 32n.

Using this equation, we can calculate how many bottles she’ll have after 5 years (n = 5):

Pn = 437 + 32n
P5 = 437 + 32(5) Substitute 5 for n
= 437 + 160 = 597 bottles. Simplify using the order of operations

We can now also solve for when the collection will reach 1000 bottles by substituting in
1000 for Pn and solving for n:

Pn = 437 + 32n Replace Pn with 1000, solve for n


1000 = 437 + 32n Subtract 437 from both sides
563 = 32n Divide both sides by 32
563/32 = n = 17.59375

Since Marcy buys her bottles one year at a time, we round up to the next whole year: after 17
years she has only 981 bottles, so Marcy will reach 1000 bottles in 18 years.

In the previous example, Marcy’s collection grew by the same number of bottles every year.
This constant change is the defining characteristic of linear growth. Plotting the
values we calculated for Marcy’s collection, we can see the values form a straight line, the
shape of linear or arithmetic growth.

[Figure: Bottles in Marcy's Collection. Total bottles (400 to 1100) plotted against years from now (0 to 18).]
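The explicit form can be checked with a short Python sketch. The names P0, d, and bottles_explicit are illustrative assumptions. The sketch evaluates the formula directly and then solves for the year in which the collection first reaches 1000 bottles, rounding up because purchases happen once per year.

```python
import math

P0 = 437  # starting number of bottles
d = 32    # bottles added each year (the common difference)

def bottles_explicit(n):
    """Explicit form: Pn = P0 + d*n."""
    return P0 + d * n

# Solve 1000 = P0 + d*n for n, then round up to a whole year.
n_exact = (1000 - P0) / d
n_years = math.ceil(n_exact)

print(bottles_explicit(5))
print(n_exact, n_years)
print(bottles_explicit(17), bottles_explicit(18))
```

Comparing the values for n = 17 and n = 18 shows why the fractional answer has to be rounded up rather than to the nearest year.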

Linear (Arithmetic) Growth


If a quantity starts at size P0 and grows by d every time period, then the quantity after n
time periods can be determined using either of these relations:

Recursive form: (predicts the next term, Pn, from the previous term, Pn-1)
Pn = Pn-1 + d In words: to get Pn, we add d to the previous term, Pn-1

Explicit form: (predicts any term, Pn, from knowing n, how many terms after the start)
Pn = P0 + dn In words: to get Pn, we start at P0 and add d a total of n times

In this equation, d represents the common difference, the amount that the population
changes each time n increases by 1.

Connection to prior learning – slope and intercept


You may recognize the common difference, d, in our linear equation as slope. In fact, the
entire explicit equation should look familiar – it is the same linear equation you learned in
algebra, probably stated as y = mx + b, the slope-intercept form of a line.

In the standard algebraic equation y = mx + b, (0, b) is the y-intercept, and b is the y-value
when x is zero. In the form of the equation we’re using, we are using P0 to represent that
initial amount, b = P0.

In the y = mx + b equation, recall that m is the slope. You might remember this as “rise over
run” or the change in y divided by the change in x. Either way, it represents the same thing
as the common difference, d, we are using – the amount the output Pn changes when the
input n increases by 1, m = d.

Reorder our equation, and you can see the correspondence:

y = mx + b
Pn = dn + P0

The equations y = mx + b and Pn = P0 + dn are equivalent in linear (arithmetic) growth.

Note: We have chosen to call our starting value P0. If we counted our starting value as the
first value, or P1, indexing from n = 1 instead of n = 0, then our formula would look like:

Pn = P1 + d(n – 1) since our “n” would be counting from 1 rather than 0.

Some texts use this alternative form or the similar Pn = P1 + (n – 1)d form. This form is
similar in usage to the point-slope form of a line, written as y = y1 + m(x – x1).

Example 1: Elk Population


The population of elk in a national forest was measured to be
12,000 in 2003, and was measured again to be 15,000 in 2007.
If the population continues to grow linearly at this rate, what
will the elk population be in 2014?

To begin, we need to define how we’re going to measure n.
Remember that P0 is the population when n = 0. Since we
already know the population in 2003, let us define n = 0 to be
the year 2003. Then P0 = 12,000, and we will need to
remember that our n values will be counting years after 2003.

[Photo: Elk in Yellowstone NP (see footnote 3)]

Next we need to find d. Remember d is the growth per time period, in this case growth per
year. Between the two measurements, the population grew by 15,000 elk – 12,000 elk =
3,000 elk, but it took 2007 – 2003 = 4 years to grow that much. To find the growth per year,
we can divide: 3000 elk /4 years = 750 elk in 1 year.

Alternatively, you can use the slope formula from algebra to determine the common
difference, noting that the population is the output of the formula, and time is the input.

$d = \text{slope} = \dfrac{\text{change in output}}{\text{change in input}} = \dfrac{15000 \text{ elk} - 12000 \text{ elk}}{2007 - 2003} = \dfrac{3000 \text{ elk}}{4 \text{ years}} = 750 \text{ elk/year}$

3
Photo by Brocken Inaglory,
http://en.wikipedia.org/wiki/Elk#mediaviewer/File:OPAL_TERRACE_with_elks.jpg, license CC BY-SA 3.0

We can now write our equation in whichever form is preferred, using P0 = 12000, and d =
750:

Recursive form:
P0 = 12,000 In words: the population starts with 12,000 elk
Pn = Pn-1 + 750 add 750 elk to get from one year to the next
Explicit form:
Pn = 12,000 + 750n In words: the population starts with 12,000 elk
and adds 750 elk per year after the start

To answer the question and predict the elk population in 2014, we notice that the year 2014 is
n = 11, since 2014 is 11 years after 2003 (2014 – 2003 = 11). Using the explicit form,

Pn = 12,000 + 750n Substitute 11 for n


P11 = 12,000 + 750(11) = 20,250 elk Simplify using the order of operations
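The same steps can be sketched in Python. The helper name linear_model_from_two_points is a hypothetical choice for illustration. It computes the common difference from two measurements and then uses the explicit form to predict the 2014 population.

```python
def linear_model_from_two_points(n1, p1, n2, p2):
    """Return (P0, d) for the line Pn = P0 + d*n through both points."""
    d = (p2 - p1) / (n2 - n1)   # common difference (slope)
    p0 = p1 - d * n1            # value of the model when n = 0
    return p0, d

# Elk example: n counts years after 2003.
p0, d = linear_model_from_two_points(0, 12000, 2007 - 2003, 15000)
elk_2014 = p0 + d * (2014 - 2003)

print(p0, d, elk_2014)  # 12000.0 750.0 20250.0
```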

Example 2: Gasoline Consumption


Gasoline consumption in the US has been increasing steadily. Consumption data from 1992
to 2004 is shown below4. Find a model for this data, and use it to predict consumption in
2016. If the trend continues, when will consumption reach 200 billion gallons?

Year                                ‘92 ‘93 ‘94 ‘95 ‘96 ‘97 ‘98 ‘99 ‘00 ‘01 ‘02 ‘03 ‘04
Consumption (billions of gallons)   110 111 113 116 118 119 123 125 126 128 131 133 136

Plotting this data, it appears to have an approximately linear relationship:

[Figure: Gas Consumption. Gallons of gas in billions (100 to 140) plotted against year (1992 to 2004).]

While there are more advanced statistical techniques that can be used to find an equation
to model the data, to get an idea of what is happening, we can find an equation by using
two pieces of the data, perhaps the data from 1993 and 2003.

Letting n = 0 correspond with 1993, as it’s the earlier of our two chosen years, makes
P0 = 111 billion gallons, and we will remember that our n counts the number of years after 1993.

To find d, we need to know how much the gas consumption increased each year, on average.
From 1993 to 2003 the gas consumption increased from 111 billion gallons to 133 billion
gallons, a total change of 133 – 111 = 22 billion gallons, over 10 years. This gives us an
average change of 22 billion gallons / 10 years = 2.2 billion gallons per year.

4
http://www.bts.gov/publications/national_transportation_statistics/2005/html/table_04_10.html

Equivalently,
$d = \text{slope} = \dfrac{\text{change in output}}{\text{change in input}} = \dfrac{133 \text{ billion} - 111 \text{ billion}}{2003 - 1993} = \dfrac{22 \text{ billion}}{10 \text{ years}} = 2.2 \text{ billion gallons/year}$

We can now write our equation in whichever form is preferred, using P0 = 111 and d = 2.2:

Recursive form:
P0 = 111
Pn = Pn-1 + 2.2

Explicit form:
Pn = 111 + 2.2n

We can calculate and plot values using this explicit form, represented by the trend line
connecting these values. Compare the trend line with the original data and notice how close
they are!

[Figure: Gasoline Consumption. Data points and trend line, gallons of gas in billions (100 to 140) plotted against year (1992 to 2004).]

We can use the explicit model to make predictions about the future, assuming that the
previous trend continues unchanged. To predict the gasoline consumption in 2016, we first
find n. Since we’re counting years after 1993, n = 23, since 2016 – 1993 = 23 years later.

P23 = 111 + 2.2(23) = 161.6 Substitute 23 for n and simplify

Our model predicts that the US will consume 161.6 billion gallons of gasoline in 2016 if the
current trend continues.

To find when the consumption will reach 200 billion gallons, we would set Pn = 200, and
solve for n:

Pn = 111 + 2.2n Replace Pn with 200 and solve for n


200 = 111 + 2.2n Subtract 111 from both sides
89 = 2.2n Divide both sides by 2.2
40.4545… = n

This tells us that consumption will reach 200 billion gallons of gas about 40 years after 1993,
which would be in the year 2033 (or more precisely, 2033.4545…).
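As a rough check of this example, the following Python sketch compares the two-point model with the full data table, then repeats the 2016 prediction and the 200-billion-gallon calculation. The variable and function names are illustrative assumptions, and the "largest gap" measure is just one simple way to judge the fit, not a method from the text.

```python
years = list(range(1992, 2005))
consumption = [110, 111, 113, 116, 118, 119, 123, 125, 126, 128, 131, 133, 136]

P0, d, base_year = 111, 2.2, 1993

def model(year):
    """Explicit form Pn = 111 + 2.2n, where n = years after 1993."""
    return P0 + d * (year - base_year)

# Largest gap (in billions of gallons) between the trend line and the data.
max_gap = max(abs(model(y) - c) for y, c in zip(years, consumption))

prediction_2016 = model(2016)
n_for_200 = (200 - P0) / d   # solve 200 = 111 + 2.2n for n

print(round(max_gap, 1))
print(round(prediction_2016, 1))
print(round(n_for_200, 2), round(base_year + n_for_200, 2))
```

Because the model was built from only two of the thirteen data points, a small largest gap is a sign that a line is a reasonable description of the whole table.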

Example 3: Gym Membership


The cost, in dollars, of a gym membership for n months can be described by the explicit
equation Pn = 70 + 30n. What does this equation tell us?

The value for P0 in this equation is 70, so the initial starting cost is $70. This tells us that
there must be an initiation or start-up fee of $70 to join the gym.
The value for d in the equation is 30, so the cost increases by $30 each month. This tells us
that the monthly membership fee for the gym is $30 a month.

Try it Now 1:
The number of stay-at-home fathers in Canada has been growing steadily5. While the trend
is not perfectly linear, it is fairly linear. Use the data from 1976 and 2010 to find an explicit
formula for the number of stay-at-home fathers, then use it to predict the number in 2020.

Year 1976 1984 1991 2000 2010


Number of stay-at-home fathers 20,610 28,725 43,530 47,665 53,555

When good models go bad


When using mathematical models to predict future behavior, it is important to keep in mind
that very few trends will continue indefinitely.

Example 4: Growing Height


Suppose a four year old boy is currently 39 inches tall, and you are told to expect him to
grow 2.5 inches a year.

We can set up a growth model, with n = 0 corresponding to 4 years old (we will be counting
age from 4 years old for n), and the boy is 39 inches tall, so P0 = 39. We also are given the
growth rate, d = 2.5.

Recursive form:
P0 = 39 In words: the 4 year old’s height starts at 39 inches
Pn = Pn-1 + 2.5 add 2.5 inches to the previous year to get the
height for the next year
Explicit form:
Pn = 39 + 2.5n In words: the height starts at 39 inches and increases
by 2.5 inches per year after he is 4 years old

At 6 years old, which is 2 years after he was 4 years old, n = 6 – 4 = 2, we would expect him
to be
P2 = 39 + 2.5(2) = 44 inches tall Substitute 2 in for n and simplify

Any mathematical model will break down eventually. Certainly, we shouldn’t expect this
boy to continue to grow at the same rate all his life. If he did, at age 50, which is 46 years
after he was 4 years old, n = 50 – 4 = 46, he would be:

P46 = 39 + 2.5(46) = 154 inches tall = 12.8 feet tall!

When using any mathematical model, we have to consider which inputs are reasonable to
use. Whenever we extrapolate, or make predictions into the future, we are assuming the
model will continue to be valid.

5
http://www.fira.ca/article.php?id=140

Linear (Arithmetic) Decay


According to MedlinePlus6, people over 40 years old lose about 1 cm of
height every 10 years, or 0.1 cm per year. Michael Jordan, basketball star
who came out of retirement twice to play the game again, turned 40 years
old in his final NBA season in 2003. At that time, he was reported to be 6
feet 6 inches tall7. How tall would we expect Michael Jordan to be in
2014?

This question is not a growth problem, as Michael Jordan’s height is not
growing. The opposite of growth is decay, and assuming that Michael’s
height follows the constant decrease described by MedlinePlus, we can use
the same thinking and processes we used with linear (arithmetic) growth.
Michael’s height should be decreasing or decaying by a constant difference
of 0.1 cm per year, and thus should follow a linear (arithmetic) model.

Notice our difference is in centimeters per year, and the height we are
given is in feet and inches. We need to complete a few conversions first,
knowing there are 12 inches in a foot, and 1 inch is 2.54 centimeters:

Michael’s height (at age 40) in inches:
$6 \text{ feet} \times \dfrac{12 \text{ inches}}{1 \text{ foot}} + 6 \text{ inches} = 78 \text{ inches}$

Michael’s height (at age 40) in centimeters:
$78 \text{ inches} \times \dfrac{2.54 \text{ cm}}{1 \text{ inch}} = 198.12 \text{ cm}$

[Photo: Michael Jordan with unnamed member of US National Guard in 2008, photo by The U.S. Army (www.Army.mil) [Public domain], via Wikimedia Commons]

Since height decreases after age 40, we’ll call n = 0 the year 2003, when he turned 40 years
old (and thus count our year n values as years after 2003), and P0 = 198.12 cm.

Using the Medline statistic that height decreases by 0.1 cm per year, our constant difference
(or slope) is d = -0.1 cm per year, the negative sign indicating that we have decay, the
opposite of growth. From these pieces of data, we can create decay equations.

Recursive form:
P0 = 198.12 In words: height in 2003 is 198.12 cm
Pn = Pn-1 + (-0.1) or Pn = Pn-1 – 0.1 and drops 0.1 cm from the previous year
to the next year
Explicit form:
Pn = 198.12 + (-0.1)n or Pn = 198.12 – 0.1n In words: height in 2003 is 198.12 cm
and drops 0.1 cm each year after

6
Aging changes in body shape: MedlinePlus Medical Encyclopedia. (n.d.). U.S National Library of Medicine.
Retrieved June 5, 2014, from http://www.nlm.nih.gov/medlineplus/ency/article/003998.htm
7
Michael Jordan. (n.d.). Basketball-Reference.com. Retrieved June 5, 2014, from http://www.basketball-
reference.com/players/j/jordami01.html

We can use the convenience of the explicit form to calculate Michael’s height in 2014, using
n = 2014 – 2003 = 11, since we’re counting years after 2003:
Pn = 198.12 + (-0.1)n
P11 = 198.12 + (-0.1)(11) = 197.02 cm. Substitute 11 for n and simplify

Michael Jordan should have lost just over a centimeter in height since 2003. I’ll let you tell
him that. I’m sure he could still beat any of us in a game of one-on-one.

If we verify our calculations and graph the information we can see that the graph is linear
with a negative slope, which makes sense if we remember that decay is the opposite of
growth.

Year   n    MJ's Height (cm)
2003   0    198.12
2004   1    198.02
2005   2    197.92
2006   3    197.82
2007   4    197.72
2008   5    197.62
2009   6    197.52
2010   7    197.42
2011   8    197.32
2012   9    197.22
2013   10   197.12
2014   11   197.02

[Figure: Michael Jordan's Height. Height in cm (196.80 to 198.20) plotted against years since 2003 (n, 0 to 12).]
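The table above can be reproduced with a short Python sketch that combines the unit conversion with the explicit decay formula. The constant and function names are illustrative assumptions. Values are rounded to two decimal places when printed because decimal fractions such as 0.1 are stored only approximately by the computer.

```python
CM_PER_INCH = 2.54
INCHES_PER_FOOT = 12

def feet_inches_to_cm(feet, inches):
    """Convert a height given in feet and inches to centimeters."""
    return (feet * INCHES_PER_FOOT + inches) * CM_PER_INCH

P0 = feet_inches_to_cm(6, 6)   # height in 2003, in cm
d = -0.1                       # cm per year; the negative sign means decay

def height(n):
    """Explicit form Pn = P0 + d*n, where n = years after 2003."""
    return P0 + d * n

for year in range(2003, 2015):
    n = year - 2003
    print(year, n, round(height(n), 2))
```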

Try it Now 2:
Suppose you want to supplement your diet with protein, and buy a kilogram bag of protein
powder (1000 g). A serving size is 170 g, but you don’t want to overdo it, so decide to take
half the normal serving. Create a recursive equation and an explicit equation showing the
amount of protein powder left after n of your servings. How many servings will you get out
of the bag?

Exponential (Geometric) Growth


Suppose that every year, only 10% of the fish in a lake have surviving offspring. If there
were 100 fish in the lake last year, there would now be 110 fish. If there were 1000 fish in
the lake last year, there would now be 1100 fish. Absent any inhibiting factors, populations
of people and animals tend to grow by a percent of the existing population each year.

Suppose our lake began with 1000 fish, and 10% of the fish have surviving offspring each
year. Since we start with 1000 fish, P0 = 1000. How do we calculate P1? The new
population will be 100% of the original population, plus an additional 10%. Symbolically:

P1 = 100%P0 + 10%P0 = 1P0 + 0.10P0



Notice this could be condensed to a shorter form:

P1 = 1P0 + 0.10P0 = (1+ 0.10)P0


P1 = 1.10P0

While 10% is the growth rate, 1.10 is the growth multiplier. Notice that 1.10 can be
thought of as “the original 100% plus an additional 10%.”

For our fish population,


P1 = 1.10P0
P1 = 1.10(1000) = 1100

We could then calculate the population in later years, recursively


The fish population begins with 1000 fish P0 = 1000
After 1 year, the population grows by 10%, so it is now P1 = 1.10P0
110% of 1000, what it was last year. P1 = 1.10(1000) = 1100
After 2 years, the population has grown another 10%, so P2 = 1.10P1
110% of 1100, what it was last year P2 = 1.10(1100) = 1210
After 3 years, the population has grown by another 10%, so P3 = 1.10P2
110% of 1210, what it was last year P3 = 1.10(1210) = 1331
After 4 years, the population has grown by another 10%, so P4 = 1.10P3
110% of 1331, what it was last year P4 = 1.10(1331) = 1464.1

Notice that we keep multiplying last year’s fish population (Pn-1) by 1.10, 110%, in order to
get this year’s fish population (Pn). A recursive generalization of this growth is:

P0 = 1000 In words: the fish population starts with 1000 fish


Pn = 1.10(Pn-1) and is 110% of the previous year’s population

If we look at the difference between each year’s fish population we see that in the first year,
the population grew by 100 fish (from 1000 to 1100), in the second year, the population grew
by 110 fish (from 1100 to 1210), and in the third year the population grew by 121 fish (from
1210 to 1331), and in the fourth year the population grew by 133.1 fish (about 133, from 1331
to 1464.1). While there is a constant percentage growth, the actual increase in number of
fish is increasing each year: the fish population is increasing at an increasing rate.
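A few lines of Python make this pattern easy to see. This is a minimal sketch with illustrative variable names. It applies the recursive rule four times and then lists how many fish were added in each year.

```python
population = [1000.0]  # P0
for _ in range(4):
    # Recursive rule: Pn = 1.10 * Pn-1
    population.append(1.10 * population[-1])

# Year-over-year increase in the number of fish.
increases = [later - earlier
             for earlier, later in zip(population, population[1:])]

print([round(p, 1) for p in population])
print([round(x, 1) for x in increases])
```

The multiplier stays fixed at 1.10, yet each entry in the list of increases is larger than the one before it.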

Graphing these values we see that this growth doesn’t quite appear linear as it is
curving slightly upward. This curvature makes sense as the increase was not a
constant difference as we saw with linear growth.

[Figure: Fish Population. Number of fish (800 to 1700) plotted against years from now (0 to 5).]

To get a better picture of how this percentage-based growth affects things, we
need an explicit form, so we can quickly calculate values further out in the future.

As we did for the linear growth model, we will start building from the recursive equation:

The fish population begins with 1000 fish                 P0 = 1000

After 1 year, the population grows by 10%, so it is now   P1 = 1.10P0
110% of 1000, what it was last year.                      P1 = 1.10(1000)

After 2 years, the population has grown another 10%,      P2 = 1.10P1
so 110% of last year, which means we’ve multiplied by     P2 = 1.10(1.10(1000))
1.10 twice.                                               P2 = (1.10)^2(1000)

After 3 years, the population has grown by another        P3 = 1.10P2
10%, so 110% of last year, which means we’ve              P3 = 1.10(1.10(1.10(1000)))
multiplied by 1.10 three times.                           P3 = (1.10)^3(1000)

After 4 years, the population has grown by another        P4 = 1.10P3
10%, so 110% of last year, which means we’ve              P4 = 1.10(1.10(1.10(1.10(1000))))
multiplied by 1.10 four times.                            P4 = (1.10)^4(1000)

Observing a pattern, we can generalize the explicit form to be:

Pn = (1.10)^n(1000), or equivalently,
Pn = 1000(1.10)^n

From this, we can quickly calculate the number of fish after n = 10, 20, or 30 years:

P10 = 1000(1.10)^10 ≈ 2594 fish
P20 = 1000(1.10)^20 ≈ 6727 fish
P30 = 1000(1.10)^30 ≈ 17449 fish

[Photo: Fish! (see footnote 8)]

Adding these values to our graph reveals a shape that is definitely not linear. If our fish
population had been growing linearly, by 100 fish each year, the population would have
only reached 4000 in 30 years compared to almost 17,500 with this percent-based growth,
called exponential or geometric growth.

[Figure: Fish Population. Number of fish (0 to 18000) plotted against years from now (0 to 30).]

In exponential growth, the population grows proportionally to the size of the existing
population, so as the population gets larger, the same percent growth will yield a larger
numeric growth. The difference between the values is a percent of a larger base value.
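The comparison between the two kinds of growth can be sketched in Python. The function names exponential and linear, and their default arguments, are illustrative assumptions based on the fish example (1000 fish to start, 10% growth, or 100 fish per year).

```python
def exponential(n, p0=1000, r=0.10):
    """Explicit exponential form: Pn = P0 * (1 + r)^n."""
    return p0 * (1 + r) ** n

def linear(n, p0=1000, d=100):
    """Explicit linear form: Pn = P0 + d*n."""
    return p0 + d * n

# Compare the two models after 10, 20, and 30 years.
for n in (10, 20, 30):
    print(n, round(exponential(n)), linear(n))
```

Both models add 100 fish in the first year, so the gap between them comes entirely from the exponential model's growth being applied to an ever larger population.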

If we reflect back to the beginning of this chapter and Kurzweil’s doubling steps (1, 2, 4, 8,
16, …), we can see that the exponential multiplier here is 2. Setting 2 = 1 + r gives r = 1, so
the percent increase is 100%. This 100% increase makes sense if you consider the
pattern as increasing by adding the current amount (100%) to itself (100%): 1 + 1 = 2, 2 + 2
= 4, 4 + 4 = 8, etc., 100% + 100% = 200% or a multiplier of 2.

8
Photo by Janice1334, http://mrg.bz/u65HB6, license morgueFile license

Exponential (Geometric) Growth


If a quantity starts at size P0 and grows by R% (written as a decimal, r = R/100) every time
period, then the quantity after n time periods can be determined using either of these
relations:

Recursive form:
Pn = (1 + r) Pn-1 In words: multiply the previous term by (1 + r)
Explicit form:
Pn = (1 + r)^n P0 or equivalently,
Pn = P0 (1 + r)^n In words: start at P0, multiply by (1 + r) a total of n times

We call r the growth rate.

The term (1 + r) is called the growth multiplier, base, or common ratio, and is
equivalent to the percent form 100% + R%.
Using b for base where b = 1 + r, the explicit form is Pn = P0(b)^n.

Example 5: Population of Olympia, WA


Between 2007 and 2008, Olympia, WA grew almost 3% to a population of 245 thousand
people. If this growth rate were to continue, what would the population of Olympia be in
2014?

We need to define what year will correspond to n = 0. Since we know the population in
2008, it would make sense to have 2008 correspond to n = 0, so P0 = 245,000, and this also
makes n the number of years after 2008.

We know the growth rate is 3%, giving r = 0.03. With this growth rate, our base multiplier is
100% + 3% = 1 + 0.03 = 1.03, and we can build an equation.
Explicit form:
Pn = (245,000)(1.03)^n In words: the population starts in 2008 at 245,000 and
multiplies by 1.03 (103%) per year after 2008

The year 2014 would then be n = 6, since 2014 – 2008 = 6 years after the year we started.
Pn = (245,000)(1.03)^n Substitute 6 for n and simplify
P6 = (245,000)(1.03)^6 ≈ (245,000)(1.19405) = 292,542.25

The model predicts that in 2014, Olympia would have a population of about 293 thousand
people.

Evaluating exponents on the calculator


To evaluate expressions like (1.03)^6, it will be easier to use a calculator than multiply
1.03 by itself six times. Most scientific calculators have a button for exponents. It is
typically labeled like:
^ , y^x , or x^y .

To evaluate (1.03)^6 we’d type 1.03 ^ 6, or 1.03 y^x 6. Try it out - you should get an
answer around 1.1940523.

Try it Now 3:
India is the second most populous country in the world, with a population in 2008 of about
1.14 billion people. The population is growing by about 1.34% each year. If this trend
continues, what will India’s population grow to by 2020?

Example 6: College Tuition


A friend is using the equation Pn = 4600(1.072)^n to predict the annual tuition at a local
college. She says the formula is based on years after 2010. What does this equation tell us?

In the equation, P0 = 4600, which is the starting value of the tuition when n = 0. This tells us
that the tuition in 2010 was $4,600.

The growth multiplier is 1.072 = 1 + 0.072, so the growth rate (percent increase) is 0.072, or
7.2%. This tells us that the tuition is expected to grow by 7.2% each year.

Putting this together, we could say that the tuition in 2010 was $4,600, and is expected to
grow by 7.2% each year.

Example 7: Carbon Dioxide


In 1990, the residential energy use in the US was responsible for 962 million metric tons of
carbon dioxide emissions. By the year 2000, that number had risen to 1182 million metric
tons9. If the emissions grow exponentially and continue at the same rate, what will the
emissions grow to by 2050?

Similar to before, we will use n = 0 for 1990, as that is the year for the first piece of data we
have, which makes P0 = 962 (million metric tons of CO2), and n is the number of years after
1990. In this problem, we are not given the growth rate, but instead are given that in 2000,
10 years after 1990, so when n = 10, there are 1182 million metric tons of CO2. We can
write this information as P10 = 1182.

We know the value for P0, P0 = 962, so we can put that into the explicit equation:
Pn = 962(1+r)^n

We also know that P10 = 1182, so substituting 1182 for Pn, and 10 for n, we get
1182 = 962(1+r)^10

We can now solve this equation for the growth rate, r.


$1182 = 962(1+r)^{10}$   Start by dividing both sides by 962

$\dfrac{1182}{962} = (1+r)^{10}$   We now need to undo the exponent of 10 (10th power)

$\sqrt[10]{\dfrac{1182}{962}} = \sqrt[10]{(1+r)^{10}}$   Take the 10th root of both sides to cancel the 10th power

$\sqrt[10]{\dfrac{1182}{962}} = 1 + r$   Subtract 1 from both sides to isolate r

$r = \sqrt[10]{\dfrac{1182}{962}} - 1 \approx 0.0208 = 2.08\%$   Simplify and convert r to %

9
http://www.eia.doe.gov/oiaf/1605/ggrpt/carbon.html
So if the emissions are growing exponentially, they are growing by about 2.08% per year.
We know the growth rate, so we can find the growth multiplier: 100% + 2.08% = 1.0208, and
our explicit equation looks like:

Pn = 962(1.0208)^n In words: Emissions start at 962 million metric tons of
CO2 in 1990 and multiply by 1.0208
(102.08%) per year after 1990.

We can now predict the emissions in 2050 by finding P60, as 2050 is 2050 – 1990 = 60 years
after 1990.

P60 = 962(1.0208)^60 ≈ 3308.4 million metric tons of CO2 in 2050
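The algebra in this example can be sketched in Python using a fractional exponent in place of the 10th root. The variable names are illustrative assumptions. The sketch also computes the 2050 prediction twice, once with the rounded rate used in the text and once keeping every digit of r, which gives a slightly larger value.

```python
p0, p10, periods = 962, 1182, 10

# Solve p10 = p0 * (1 + r)^periods for r: take the 10th root, subtract 1.
r = (p10 / p0) ** (1 / periods) - 1

p60_rounded_rate = p0 * 1.0208 ** 60   # uses the rounded rate from the text
p60_full_rate = p0 * (1 + r) ** 60     # keeps every digit of r

print(round(r, 4))
print(round(p60_rounded_rate, 1), round(p60_full_rate, 1))
```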

Rounding
As a note on rounding, notice that if we had rounded the growth rate to 2.1%, our
calculation for the emissions in 2050 would have been 3347 million metric tons of
CO2. Rounding to 2% would have changed our result to 3156 million metric tons of
CO2. A very small difference in the growth rates gets magnified greatly in exponential
growth. Thus it is recommended to round the growth rate as little as possible.

If you need to round, keep at least three significant digits - numbers after any leading
zeros. So 0.4162 could be reasonably rounded to 0.416. A growth rate of 0.001027
could be reasonably rounded to 0.00103.
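The effect of rounding can be explored with a brief Python sketch. The names predictions and spread are illustrative. It repeats the 60-year emissions prediction for the three growth rates mentioned above and measures how far apart the highest and lowest predictions are.

```python
p0, years = 962, 60

# 60-year predictions for three slightly different growth rates.
predictions = {rate: p0 * (1 + rate) ** years
               for rate in (0.02, 0.0208, 0.021)}

for rate, value in predictions.items():
    print(f"rate {rate:.2%}: about {value:.0f} million metric tons")

# Gap between the predictions made with 2.1% and with 2%.
spread = predictions[0.021] - predictions[0.02]
print(round(spread))
```

A change of only one tenth of a percentage point in the rate moves the prediction by well over a hundred million metric tons.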

Evaluating roots on the calculator


In the previous example, we had to calculate the 10th root of a number. This is
different than taking the basic square root, √. Many scientific calculators have a button
for general roots. It is typically labeled like:
$\sqrt[n]{\;}$ , $\sqrt[x]{\;}$ , or $\sqrt[y]{x}$

To evaluate the 3rd root of 8, for example, we’d either type 3 $\sqrt[x]{\;}$ 8, or 8 $\sqrt[x]{\;}$ 3,
depending on the calculator. Try it on yours to see which to use – you should get an
answer of 2.

If your calculator does not have a general root button, all is not lost. You can instead
use the property of exponents which states $\sqrt[n]{a} = a^{1/n}$. So, to compute the 3rd root of 8,
you could use your calculator’s exponent key to evaluate $8^{1/3}$. To do this, type:
8 y^x ( 1 ÷ 3 ) and be sure to include the parentheses, as the parentheses tell the
calculator to divide 1/3 before doing the exponent.
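The same property of exponents works in a programming language. As a small Python sketch (the variable names are illustrative), the ** operator plays the role of the exponent key, and the parentheses matter for the same reason they do on a calculator.

```python
# With parentheses: the exponent is 1/3, so this is the 3rd root of 8.
cube_root_of_8 = 8 ** (1 / 3)

# Without parentheses: evaluated as (8 ** 1) / 3, which is not a root at all.
without_parens = 8 ** 1 / 3

# The 10th root from the carbon dioxide example, written the same way.
tenth_root = (1182 / 962) ** (1 / 10)

print(cube_root_of_8)
print(without_parens)
print(round(tenth_root, 4))
```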

Try it Now 4:
The number of users on a social networking site was 45 thousand in February when they
officially went public, and grew to 60 thousand by October. If the site is growing
exponentially, and growth continues at the same rate, how many users should they expect
two years after they went public?

Example 8: Comparing Linear and Exponential Growth


Looking back at the last example, for the sake of comparison, what would the carbon
emissions be in 2050 if emissions grow linearly at the same rate? We used the information
that in 1990, carbon dioxide emissions were 962 million metric tons and in 2000, 1182
million metric tons.

Again we use n = 0 for 1990, giving P0 = 962.

To find d, we could take the same approach as earlier, noting that the emissions increased by
220 million metric tons in 10 years, giving a common difference of d = 22 million metric
tons each year.

We can also use an approach similar to that which we used to find the exponential equation.

We know the value for P0, P0 = 962, so we can put that into the explicit linear equation:
Pn = 962 + dn

Since we know P10 = 1182, substituting 10 for n and 1182 for Pn, we have:
1182 = 962 + 10d

We can now solve this equation for the common difference, d.

1182 = 962 + 10d Subtract 962 from both sides
220 = 10d Divide both sides by 10
22 = d

This value for d tells us that if the emissions are changing linearly, they are growing by
22 million metric tons each year. Predicting the emissions in 2050, we again use
n = 2050 – 1990 = 60

P60 = 962 + 22(60) = 2282 million metric tons.

Notice that this number is substantially smaller than the prediction from the exponential
growth model.

Calculating and plotting more values helps illustrate the differences between the two
models, and we can see that the exponential growth outpaces the linear growth.

[Figure: Carbon Dioxide Emissions, in millions of metric tons, plotted against years after 1990]
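
To generate values for both models, a short Python sketch can help. It fits each model to the same two data points (1990 and 2000). The exponential multiplier `b` is recomputed here from those points, which is an assumption for this sketch, so its values may differ slightly from a rounded multiplier used elsewhere.

```python
# Data from the example: emissions in millions of metric tons.
p0 = 962        # 1990, n = 0
p10 = 1182      # 2000, n = 10

# Linear model: solve 1182 = 962 + 10d for the common difference d.
d = (p10 - p0) / 10

# Exponential model through the same two points (an assumption for
# comparison): solve 1182 = 962 * b**10 for the multiplier b.
b = (p10 / p0) ** (1 / 10)

def linear(n):
    return p0 + d * n

def exponential(n):
    return p0 * b ** n

# Print both predictions every 10 years from 1990 to 2050.
for n in range(0, 61, 10):
    print(n, round(linear(n)), round(exponential(n)))
```

The two columns agree at n = 0 and n = 10 by construction, then separate more and more as n grows.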

How do we know which growth model, linear or exponential, to use? There are two
approaches which should be used together whenever possible:

1) Find more than two pieces of data. Plot the values, and look for a trend. Does the
data appear to be changing like a line, or do the values appear to be curving upwards?
2) Consider the factors contributing to the data. Are they things you would expect to
change linearly or exponentially? For example, in the case of carbon emissions, we
could expect that, absent other factors, they would be tied closely to population
values, which tend to change exponentially.

Exponential (geometric) Decay


Suppose we see a bowl of chocolate chips on the table. Every time we walk by, we can’t
resist and eat half of the chocolate chips in the bowl (only half: we don’t want to be greedy).
If there were 900 chocolate chips to begin with and we walk by the bowl every 10 minutes,
how long will it take us to eat them all?

This problem is an example of exponential (geometric)


decay, because the number of chocolate chips is certainly
not growing (we’re eating them!). If we look at the pattern
in the number of chocolate chips, we can see why it is
exponential:

We start with 900 chips, so P0 = 900.


We pass by, we eat half, so half of the chips will remain,
P1 = (1/2)900 = 450 chips will be left.
[Photo: Overtime at the Chocolate Chip Factory (see footnote 10)]
We pass by again,
P2 = (1/2)P1 = (1/2)450 = 225 chips left.
We pass by again,
P3 = (1/2)P2 = (1/2)225 = 112.5 chips left.
At this point, we’d probably have 112 chips left (having eaten 113 chips; who wants half a
chip?) and continue.

Recursively, we write this as:


Pn = (1/2)Pn-1 In words: the number of chips left is half the previous amount

After our first pass, the number of chips decreases by 450, then after the second pass, by 225,
so we can see that the difference between the number of chips is not constant, but the ratio
between the amount of chips left is constant:
900, 450, 225, 112.5 …
P1/P0 = 450/900 = 1/2,
P2/P1 = 225/450 = 1/2,
P3/P2 = 112.5/225 = 1/2, the ratio between successive terms is 1/2.

We can view our 1/2 factor as a decay rate, and adjust our exponential model for decay.

10
Photo by Kate Ter Haar, https://flic.kr/p/aM7D94 license CC BY 2.0

Exponential (Geometric) Decay


If a quantity starts at size P0 and decays by R% (written as a decimal, \(r = \frac{R}{100}\)) every
time period, then the quantity after n time periods can be determined using either of
these relations:

Recursive form:
Pn = (1 – r) Pn-1 In words: multiply the previous amount by (1 – r)

Explicit form:
\(P_n = (1 - r)^n P_0\) or equivalently,
\(P_n = P_0 (1 - r)^n\) In words: start at P0, multiply by (1 – r) for n times

We call r the decay rate.


The term (1 – r) is called the decay multiplier, base or common ratio and is
equivalent to the percent form 100% - R%
Using b for base, where b = 1 – r, the explicit form is \(P_n = P_0 b^n\).

Using this concept, the explicit form \(P_n = P_0 (1 - r)^n\) for our bowl of chocolate chips
problem would be:
\(P_n = 900(1 - 1/2)^n\)
\(P_n = 900(1/2)^n\) or \(P_n = 900(0.5)^n\)

Recall our original question: how long will it take to eat them all? If we eat them all, we
should have 0 left, so we want to know when Pn = 0:
\(P_n = 900(0.5)^n\)
\(0 = 900(0.5)^n\) Divide both sides by 900
\(0 = (0.5)^n\)
This last equation is difficult to solve using algebraic means you are familiar with (it requires
the use of the logarithm, which is discussed in an optional part of this unit). If you look at
the equation itself, you want to know what power of 0.5 will give you 0. In the abstract, this
equation has no solution, as \((0.5)^n > 0\), although it gets closer to 0 as n gets closer to infinity.
However, in this context, we can look at what would happen at a concrete level with the
number of chocolate chips by using our explicit equation to generate values:

Pass number   Chocolate chips left   Rounded down
0             900                    900
1             450                    450
2             225                    225
3             112.5                  112
4             56.25                  56
5             28.125                 28
6             14.0625                14
7             7.03125                7
8             3.515625               3
9             1.7578125              1
10            0.8789063              0

[Figure: “Decaying m&ms” graph of chips left (0 to 900) against pass number (0 to 10)]

We can see practically that after 10 passes, there are no chocolate chips left. Note that the
fractional amount of chocolate chips is rounded down as we’re not going to try to eat part of
a chip, we’ll eat the whole chip! If we look at the graph of the data, we can see that its shape
is similar to the exponential growth curve, just decreasing instead of increasing.

We can now answer the question that was originally asked: how long will it take to eat all
the chocolate chips? It takes 10 passes and they are all gone. Since we pass by the bowl
once every 10 minutes, it will be (10 passes)(10 minutes/pass) = 100 minutes, or 1 hour and
40 minutes until the chips are all gone.
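
The table-building process can also be sketched in Python. The helper names `chips_left` and `passes_until_empty` are our own, and the loop assumes a multiplier between 0 and 1 so that the amount eventually rounds down to zero.

```python
import math

def chips_left(start, multiplier, n):
    """Explicit form P_n = P_0 * b**n, rounded down to whole chips."""
    return math.floor(start * multiplier ** n)

def passes_until_empty(start, multiplier):
    """Count passes until the rounded-down amount reaches zero."""
    n = 0
    while chips_left(start, multiplier, n) > 0:
        n += 1
    return n

passes = passes_until_empty(900, 0.5)
minutes = passes * 10  # one pass every 10 minutes
print(passes, minutes)  # 10 100
```

Changing the starting amount or the multiplier (for example, 0.7 if we eat 30% on each pass) gives the number of passes for other versions of the problem.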

This problem is an example of a half-life problem, as it discusses how long it takes for half
of the material (chocolate chips) to decay (get eaten). This time period is called the half-life.
The half-life of chocolate chips in your house may be longer or shorter than 10 minutes,
depending on the number of chocoholics you have living there.

Decay problems often provide a percent of decrease, such as 50% in the chocolate chip
case. We need to be careful with the difference in meaning between the percent of an
amount (base multiplier) and the percent of decrease. In this problem, they would be the
same value: 50%. But if we were eating 30% of the chips, the percent of decrease would be
30%, and our base multiplier would be 100% – 30% = 70% (or 1 – 0.3 = 0.7 in decimal
form) as there would be 70% of the amount of chips left.

Try it now 5:
Suppose you have an assignment that is worth 100 points, and your instructor penalizes you
10% off the remaining score for each day the assignment is late. Create recursive and
explicit equations for the value of the assignment where n is the number of late days. If you
turned the assignment in a week late, what is the greatest possible score you could earn?

Comparing Linear, Exponential and Other Growth Models


We’ve looked at linear (arithmetic) and exponential (geometric) growth patterns. We saw
that linear patterns grow at a constant rate, the difference (d) between successive terms
constant, while exponential patterns grow at an increasing rate, the difference between
successive terms itself increasing, with the ratio (1+r) between the terms constant. These are
not the only patterns out there that illustrate growth. Consider the pattern illustrated in the
square numbers11:

We can see from the illustration or by calculating the differences between the terms (visible
as the red dots in each figure) that the square numbers grow at an increasing rate: the
differences are not constant. The square numbers pattern eventually grows faster than a linear

11
Image of square numbers by Aldoaldoz (Own work) license CC-BY-SA-3.0 via Wikimedia Commons

pattern, as that increasing difference will eventually grow larger than the constant difference
of a linear pattern, but how does it compare to an exponential one?

What about Fibonacci numbers, illustrated in the


sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55… and shown
on the power plant tower in the photograph?12 We
notice that this sequence grows by adding the two
previous terms to get the next term (1+1 = 2, 1+2 = 3,
2+3 = 5, 3+5 = 8 and so on). Looking at the
difference between the numbers, we see that they
repeat the numbers themselves (0, 1, 1, 2, 3, 5, 8, etc.)
which makes sense if we think about how the
sequence is created. The Fibonacci sequence is
growing at an increasing rate as well. We can feel
confident that it will eventually grow faster than the
constant difference of a linear pattern, but will it
outpace an exponential one?

We could put all these patterns in a footrace and see who wins, but to compare them to an
exponential function, we can compare the ratios between successive terms, as these ratios are
constant in an exponential pattern. The tables below show a comparison among the ratios
between successive terms for a linear pattern (\(P_n = 2n\)), the square numbers (\(P_n = n^2\)), an
exponential pattern (\(P_n = 2^n\)), and the Fibonacci sequence (\(P_n = P_{n-2} + P_{n-1}\)).

n    Linear 2n   Ratio       Square n^2   Ratio       Exp. 2^n   Ratio   Fibonacci   Ratio
0    0                       0                        1                  1
1    2           undefined   1            undefined   2          2       1           1.00000
2    4           2.00000     4            4.00000     4          2       2           2.00000
3    6           1.50000     9            2.25000     8          2       3           1.50000
4    8           1.33333     16           1.77778     16         2       5           1.66667
5    10          1.25000     25           1.56250     32         2       8           1.60000
6    12          1.20000     36           1.44000     64         2       13          1.62500
7    14          1.16667     49           1.36111     128        2       21          1.61538
8    16          1.14286     64           1.30612     256        2       34          1.61905
9    18          1.12500     81           1.26563     512        2       55          1.61765
10   20          1.11111     100          1.23457     1024       2       89          1.61818
11   22          1.10000     121          1.21000     2048       2       144         1.61798
12   24          1.09091     144          1.19008     4096       2       233         1.61806
13   26          1.08333     169          1.17361     8192       2       377         1.61803
14   28          1.07692     196          1.15976     16384      2       610         1.61804
15   30          1.07143     225          1.14796     32768      2       987         1.61803

(Each ratio is a term divided by the previous term. The ratio at n = 1 is undefined for the
linear and square patterns because the previous term is 0.)

12
Photo of tower in Turku, Finland, by Kalajoki (Own work) [Public domain], via Wikimedia Commons

We can see that although all four patterns are increasing, the ratios between the successive
terms for the linear and square number patterns are getting smaller (if we continued, they
would be getting closer to 1). The exponential pattern by its nature has a constant ratio
related to its multiplier (2 in this case). The ratios for the Fibonacci pattern appear to be
“settling down” to the same number: they seem to be approaching 1.618 or so, suggesting
that if we go out far enough in the sequence, the Fibonacci pattern appears to behave like an
exponential one, with multiplier 1.618. This multiplier is smaller than the example
exponential one (2) but larger than the ratios (getting closer to 1) for the linear and squaring
patterns.

So if we put all four of these patterns in a long foot race, we would expect that the
exponential pattern would come in first, followed by the Fibonacci pattern, then the squaring
pattern, and in last, the linear pattern. Had the exponential function a smaller multiplier than
the 1.618 the Fibonacci appears to have, then the Fibonacci pattern would pass the
exponential one at some point in the race.
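
The ratio comparison behind this footrace can be reproduced with a short Python sketch. The variable and function names are our own, and the Fibonacci list starts from 1, 1 as in the table.

```python
def ratios(seq):
    """Ratios between successive terms, skipping division by zero."""
    return [b / a for a, b in zip(seq, seq[1:]) if a != 0]

n_values = range(16)
linear = [2 * n for n in n_values]
squares = [n ** 2 for n in n_values]
exponential = [2 ** n for n in n_values]

# Build the Fibonacci pattern by adding the two previous terms.
fibonacci = [1, 1]
while len(fibonacci) < 16:
    fibonacci.append(fibonacci[-2] + fibonacci[-1])

# Show the last ratio (term 15 divided by term 14) for each pattern.
for name, seq in [("linear", linear), ("squares", squares),
                  ("exponential", exponential), ("fibonacci", fibonacci)]:
    print(name, round(ratios(seq)[-1], 5))
```

Extending `n_values` past 16 shows the linear and square ratios continuing to drift toward 1 while the Fibonacci ratio stays near 1.618.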
13
So what is so special about the Fibonacci sequence and the number 1.618?
Fibonacci was an Italian mathematician whose name was Leonardo of
Pisa, and who is famous for being one of the first to bring the Hindu-Arabic
numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) to Europe, a far more efficient means of
representing quantities and calculations than Roman numerals14. He
proposed the Fibonacci sequence as a means to model the number of rabbits
produced by a single pair over time. He didn’t call them the Fibonacci
sequence; that name was given 600 years later by another mathematician,
Edouard Lucas.

The value that the ratios of successive terms in the Fibonacci sequence approach, 1.618ish, is
known as the Golden Ratio, and is also known as Φ, the Greek letter Phi (pronounced “fee”).
The precise value of Φ is defined as: \(\Phi = \frac{1 + \sqrt{5}}{2}\).
The Golden Ratio was studied by Euclid, about 1500 years before Fibonacci. A “golden
rectangle” is a rectangle where the ratio between the sides is equal to Φ.

The Fibonacci sequence and the Golden Ratio can be found in art, music, poetry, and nature.
An interesting website for exploring ideas and popular opinion about the Golden Ratio is
Gary Meisner’s http://www.goldennumber.net/golden-ratio/.
For exploring the Golden Ratio and the Fibonacci sequence, see Dr. Ron Knott’s website
http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/fib.html.

Try it now 6:
Choose two natural numbers. Create a Fibonacci-like sequence of at least 12 terms by
adding the previous two terms to get the next term and so forth. Find the ratios between
successive terms. Does it appear that the ratios approach the Golden Ratio?

13
Photo: Monument of Leonardo da Pisa by Hans-Peter Postel (Own work) [CC BY 2.5], Wikimedia Commons
14
Information on Fibonacci and the Golden Ratio are from Dr. Ron Knott’s webpage:
http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/fib.html

Accumulation (adding stuff up)


All of our calculations up to this point have dealt with patterns and predicting single values
or terms within those patterns. Next, we consider what happens when we accumulate these
values. Suppose instead of trusting a bank with your money, you decide to save money by
stuffing it into your mattress. You start with $5 and each month you increase the amount you
stuff in your mattress by $2 (so the next month, you stuff in $7). How much money will you
have accumulated in your mattress at the end of a year?

Your mattress deposit amounts look like:


Month 0 1 2 3 4 5 6 7 8 9 10 11
Deposit $5 $7 $9 $11 $13 $15 $17 $19 $21 $23 $25 $27

This sequence of amounts looks familiar: it is increasing by a constant amount ($2) and
represents linear (arithmetic) growth. We want to know how much is in the mattress at the
end of the year, which would be the sum of this sequence of numbers:
$5 + $7 + $9 + $11 + $13 + $15 + $17 + $19 + $21 + $23 + $25 + $27
This sum is called a series. We could calculate this series by adding them in our calculator,
but that would become cumbersome if we continued our mattress savings plan for several
years.

If we list the series least to greatest and then backwards, greatest to least, we’ll see a pattern:
$5 + $7 + $9 + $11 + $13 + $15 + $17 + $19 + $21 + $23 + $25 + $27
$27+ $25+ $23+ $21 + $19 + $17 + $15 + $13 + $11 + $9 + $7 + $5
$32+ $32+ $32+ $32 + $32 + $32 + $32 + $32 + $32 + $32 + $32 + $32

The terms pair up vertically to make constant sums of $32. Why are the sums constant?
When listed least to greatest, the sequence terms increase by $2, and written backwards, they
decrease by $2 and these changes balance each other so the sums remain constant.

We can find the total sum by taking 12 pairs of terms that sum to $32: 12($32) = $384, but
then we need to divide this total by 2, as it came from writing the series twice, forwards and
backwards: $384/2 = $192. We have $192 total in the mattress at the end of the year.

Notice that if we just looked at the initial deposit ($5) and the final deposit ($27), they add up
to the $32 constant sum value. So we could view our calculations as:

Sum of the series = 12 terms × ($5 initial deposit + $27 final deposit) / 2 = 12($32) / 2 = $192

This process generalizes to a formula that calculates the sum of any linear (arithmetic) series:
Sum of a linear (arithmetic) series, \(S_N\):
\[ S_N = \frac{N(\text{initial term} + \text{final term})}{2} \quad\text{or}\quad S_N = \frac{N(P_0 + P_n)}{2} \]
N is how many terms there are in the series (number of terms you are adding)
P0 is the initial term in the series
Pn is the final term in the series, using n = N -1, as the first term P0 is n = 0, and recall
that we used index n as the number of terms after the starting (initial) term.
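
As a quick check of this shortcut, the mattress deposits can be added both ways in Python. The function name `arithmetic_series_sum` is our own label for the formula above.

```python
def arithmetic_series_sum(num_terms, first, last):
    """S_N = N * (initial term + final term) / 2."""
    return num_terms * (first + last) / 2

# Monthly deposits: $5, $7, $9, ..., $27 (months n = 0 through 11).
deposits = [5 + 2 * month for month in range(12)]

shortcut = arithmetic_series_sum(len(deposits), deposits[0], deposits[-1])
print(shortcut, sum(deposits))  # 192.0 192
```

Both approaches give $192, but the shortcut needs only the first term, the last term, and the number of terms.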

Example 9: Accumulating Sit-ups


Suppose you signed up for an online sit-up challenge, where you start with 15 sit-ups, then
increase the number of sit-ups you do each day by 5. How many sit-ups will you have done
by the end of a 31-day month?

From the information given, we recognize the sit-ups are growing linearly (arithmetic)
because they increase by a constant 5 each day (d = 5). We know that we start with 15 sit-
ups, so P0 = 15. We can visualize the daily sit-up pattern as 15, 20, 25, 30, 35, 40, and so
forth for 31 days, so we know N = 31 as we’ll have 31 days of sit-ups, so 31 terms.

We also know we want the sum of this sit-up pattern because we want to know how many
sit-ups we’ve done at the end of the month, which would be the total sit-ups for the month.

We know that we start with 15, so P0 = 15. To find the sum, we need to know how many
sit-ups we do on the final day. We have 31 days in the month, so we need to know P30, since
from first day to the 31st day, we will have increased our sit-ups 30 times; we’ve done n = 30
days of sit-ups after the day we started. Notice this value for n relates to the concept that n =
N – 1: 31 days of sit-ups, the final day is n = 31 – 1 = 30 days after the initial (start) day.

Since we aren’t given P30, we can use the explicit equation for linear growth to find it:
Pn = P0 + dn
Pn = 15 + 5n Using P0 = 15, d = 5, and substituting in 30 for n
P30 = 15 + 5(30) = 165, so we did 165 sit-ups on the 31st day

Now we can calculate the total sit-ups using our sum shortcut with N = 31, P0 = 15 for our
initial (start) term, P30 = 165 for our final (end) term,
3115  165 
S31   2790 sit-ups for the month.
2
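
The two steps of this example (find the final term, then apply the sum shortcut) can be sketched in Python and cross-checked by adding every day directly. The variable names are our own.

```python
p0, d, num_days = 15, 5, 31

# Final term uses n = N - 1, since the first day is n = 0.
final_day = p0 + d * (num_days - 1)          # P_30
total = num_days * (p0 + final_day) // 2     # sum shortcut

# Cross-check by adding every day's sit-ups directly.
brute_force = sum(p0 + d * n for n in range(num_days))
print(final_day, total, brute_force)  # 165 2790 2790
```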

Try it now 7:
A theater has 30 seats in the first row, 32 seats in the second row, 34 seats in the third row,
and so on. There are 125 rows in the theater. How many seats are there in the entire theater?

Looking back to our mattress savings problem, consider what would happen if you started
stuffing $5 into the mattress then increased your monthly deposit by 20% each month. How
much money would you have in your mattress by the end of the year? Your mattress deposits
would now look like (rounded to the nearest cent):

$5.00, $6.00, $7.20, $8.64, $10.37, $12.44, $14.93, $17.92, $21.50, $25.80, $30.96, $37.15

These amounts are not increasing by a constant difference. Instead, they represent
exponential (geometric) growth, as they were calculated using the multiplier 1.20 (100% +
20% = 120% = 1.20). We can still add these values up “the long way” by entering each
individually, but it would be helpful to have a shortcut. Unfortunately, we can’t use the
shortcut we created for linear (arithmetic) growth, so we’ll need to create a new one, and it’s
a bit complicated.

In their exponential form, the deposits are:


\[ \$5.00(1.2)^0,\ \$5.00(1.2)^1,\ \$5.00(1.2)^2,\ \$5.00(1.2)^3,\ \$5.00(1.2)^4,\ \$5.00(1.2)^5, \]
\[ \$5.00(1.2)^6,\ \$5.00(1.2)^7,\ \$5.00(1.2)^8,\ \$5.00(1.2)^9,\ \$5.00(1.2)^{10},\ \$5.00(1.2)^{11} \]

Their sum, an exponential (geometric) series, looks like:

\[ S_{12} = \$5.00(1.2)^0 + \$5.00(1.2)^1 + \$5.00(1.2)^2 + \cdots + \$5.00(1.2)^{10} + \$5.00(1.2)^{11} \]

If we multiply the whole series by 1.2, it will increase the exponent by 1 as there will be one
more 1.2 factor in each expression:

\[ 1.2S_{12} = \$5.00(1.2)^1 + \$5.00(1.2)^2 + \$5.00(1.2)^3 + \cdots + \$5.00(1.2)^{11} + \$5.00(1.2)^{12} \]

Now consider what happens if we subtract the two sums (which changes the signs on the
second sum; for convenience, we’ll line up like terms):

\[
\begin{aligned}
1.2S_{12} &= \$5.00(1.2)^1 + \$5.00(1.2)^2 + \cdots + \$5.00(1.2)^{11} + \$5.00(1.2)^{12} \\
-S_{12} &= -\$5.00(1.2)^0 - \$5.00(1.2)^1 - \$5.00(1.2)^2 - \cdots - \$5.00(1.2)^{10} - \$5.00(1.2)^{11} \\
1.2S_{12} - S_{12} &= \$5.00(1.2)^{12} - \$5.00(1.2)^0
\end{aligned}
\]
as all the “middle terms” subtract out

Notice how the left side of the resulting equation has S12 in both terms and the right side has
$5.00 in both terms—these are common factors. If we factor out these common factors on
either side, we have:

\(S_{12}(1.2 - 1) = \$5.00(1.2^{12} - 1.2^0)\) Reminder: \(1.2^0 = 1\)

We can then divide both sides of the equation by (1.2 – 1) to solve for S12:
\[ \frac{S_{12}(1.2 - 1)}{1.2 - 1} = \frac{\$5.00(1.2^{12} - 1)}{1.2 - 1} \]
\[ S_{12} = \frac{\$5.00(1.2^{12} - 1)}{1.2 - 1} = \$197.9025112\ldots \]
So we’d have about $197.90 ($197.91 if you round up) in the mattress at the end of 12
months.

That was a bit complicated, but if you look at the final calculation, we can analyze it and
generalize it:

\[ S_{12} = \frac{\$5.00(1.2^{12} - 1)}{1.2 - 1} \]
$5.00 was P0, 1.2 is our multiplier or base, b = (1 + r), and the exponent in the numerator,
12, was N, the total number of deposits made over the year (12 months).

We can now state a general “shortcut” for finding the sum of a series with exponential
(geometric) growth:

Sum of an exponential (geometric) series:

\[ S_N = \frac{P_0\left((1+r)^N - 1\right)}{(1+r) - 1} \quad\text{which simplifies to}\quad S_N = \frac{P_0\left((1+r)^N - 1\right)}{r} \]
N is how many terms there are in the series
P0 is the initial (first) term
(1 + r) is the growth multiplier or base, r the growth rate

Alternatively, using the growth multiplier b = (1 + r),
\[ S_N = \frac{P_0\left(b^N - 1\right)}{b - 1} = \frac{P_0\left(1 - b^N\right)}{1 - b} \]

Example 10: Sit-up Challenge Revisited


Suppose you signed up for an online sit-up challenge, where you start with 15 sit-ups, then
increase the number of sit-ups you do each day by 10%. How many sit-ups will you have
done by the end of a 31-day month?

In this scenario, we recognize the growth is exponential, as we’re increasing by 10%, thus
r = 0.10, making 1 + r = 1 + 0.10 = 1.10 for the growth multiplier.

We also know P0 = 15, since we are starting out with 15 sit-ups, r = 0.10, and N = 31 as we
have 31 days. Applying our sum formula for exponential (geometric) growth, we have:

\[ S_N = \frac{P_0\left((1+r)^N - 1\right)}{r} \]
\[ S_{31} = \frac{15\left((1.10)^{31} - 1\right)}{0.10} = \frac{15(18.1943425)}{0.10} = 2729.151374 \text{ sit-ups} \]

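We’ll have done about 2729 sit-ups (or 2730 if we round up for part of a sit-up) by the end of
the month. Notice that this total is close to the 2790 sit-ups in Example 9, even though the
daily count started out growing more slowly. By day 31 we are doing about 262 sit-ups a day
(\(15(1.10)^{30} \approx 261.7\)), compared with 165 in Example 9, so if the challenge continued,
the exponential pattern would quickly outpace the linear one.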
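
The two sit-up challenges can be put side by side with a short Python sketch that lists each day’s count and totals the month. The list names are our own.

```python
# Daily sit-up counts for a 31-day month (day 1 is n = 0).
linear_days = [15 + 5 * n for n in range(31)]            # Example 9: add 5 each day
exponential_days = [15 * 1.10 ** n for n in range(31)]   # Example 10: add 10% each day

print(sum(linear_days))                   # 2790
print(round(sum(exponential_days), 2))    # about 2729.15

# Compare the final day's count under each plan.
print(linear_days[-1], round(exponential_days[-1], 1))
```

Printing the two lists in full shows the 10% plan starting below the linear plan and then climbing past it late in the month.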

Example 11: Crazy Sit-Up Challenge!


Suppose you joined a gym where your trainer challenges you to start with 15 sit-ups, then
double the number of sit-ups you do each day. How many sit-ups will you have done by the
end of a 31-day month?

Once again, we recognize the growth is exponential. We’re increasing the number of sit-ups
by doubling, which can be thought of as adding each amount to itself: 15 + 15 = 30 for the
next day, 30 + 30 = 60 for the day after. Each day we’re doing 100% + 100% = 200% of the
previous day’s amount, so r = 1, and 1 + r = 1 + 1 = 2.
Notice how the multiplier or base is 2, which makes sense if we think of doubling as
multiplying by 2.

We also know P0 = 15, since we are starting out with 15 sit-ups, r = 1 (so 1 + r = 2), and
N = 31 as we have 31 days. Applying our sum formula for exponential (geometric) growth, we have:

\[ S_N = \frac{P_0\left((1+r)^N - 1\right)}{r} \]
\[ S_{31} = \frac{15\left(2^{31} - 1\right)}{1} = 15(2{,}147{,}483{,}647) \approx 3.221225471 \times 10^{10} \text{ sit-ups} \]

Notice that this answer is in scientific notation! On some calculators, this answer appears as
3.221225471E10, where the capitalized E indicates the answer is in scientific notation, and
the value (10) after the E is the exponent on the base ten. This exponent (10) indicates that
we would move the decimal point 10 place values to the right, and the number of sit-ups in
standard notation is: 32,212,254,710 sit-ups. The actual value is 32,212,254,705 total
sit-ups. The difference arises because most calculators can display only about 10
significant digits, so the final digits are rounded.

Either way, I think we’re going to fire our trainer for unrealistic expectations!
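
Software that works with exact whole numbers avoids the calculator’s rounding. As a minimal sketch, Python integers keep every digit of the total:

```python
# S_31 with P0 = 15, multiplier b = 2, and N = 31 terms.
# Dividing by (b - 1) = 1 changes nothing, so it is left out.
total = 15 * (2 ** 31 - 1)
print(f"{total:,}")  # 32,212,254,705
```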

A famous fable about doubling, often attributed to Chinese, Indian or Islamic writers,
involves the man who does the king a favor, and the king grants the man any reward he asks.
The man asks the king to put one grain of rice (or wheat, or penny, depending on the version
of the story) on the first square of a chessboard, then double that amount for the next square,
and so on, for all 64 squares on the chessboard.

The pattern of grains (or pennies) looks like:


1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024 + 2048 + 4096 + 8192…

Recognizing that the initial term (starting square) has one grain on it, thus P0 = 1, and the
base multiplier b = (1 + r) = 2 as it is doubling, and N = 64, since there are 64 total squares
on the chessboard, we can calculate the total number of grains (or pennies) on the chessboard
as:

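\[ S_N = \frac{P_0\left((1+r)^N - 1\right)}{r} \quad\text{or alternatively,}\quad S_N = \frac{P_0\left(b^N - 1\right)}{b - 1} \]
\[ S_{64} = \frac{1\left(2^{64} - 1\right)}{1} \approx 1.844674407 \times 10^{19} \qquad S_{64} = \frac{1\left(2^{64} - 1\right)}{2 - 1} \approx 1.844674407 \times 10^{19} \]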

Either method produces an astronomical answer of approximately 18 quintillion (that’s 18


million trillion) grains (or pennies), more than can be produced by the entire planet. Usually
at this part of the story, the king decides to behead the man (or some other sort of trick that
saves the king from having to actually pay up).
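
For the curious, the exact chessboard total can be computed with whole-number arithmetic. This Python sketch adds the 64 squares one at a time and compares the result with the sum formula; the variable names are our own.

```python
# Square 1 holds 2**0 = 1 grain, square 64 holds 2**63 grains.
grains = sum(2 ** square for square in range(64))

# Sum formula with P0 = 1, b = 2, N = 64.
shortcut = 1 * (2 ** 64 - 1) // (2 - 1)

print(f"{grains:,}")  # 18,446,744,073,709,551,615
```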

We began our discussion of exponential and linear growth patterns with Ray Kurzweil, and
we end with him as well. In Kurzweil’s telling of the chessboard fable, it is an emperor
rewarding an inventor:

“It should be pointed out that as the emperor and the inventor went through the first
half of the chess board, things were fairly uneventful. The inventor was given
spoonfuls of rice, then bowls of rice, then barrels. By the end of the first half of the
chess board, the inventor had accumulated one large field’s worth (4 billion grains),
and the emperor did start to take notice. It was as they progressed through the second
half of the chessboard that the situation quickly deteriorated.”15

Kurzweil suggests we have experienced the first half of the chessboard in our technological
advancement, and that the interesting challenges and outcomes will come as we progress
through the second half of the chessboard and deal with the resulting changes in society.
He suggests we stick around to see what happens!

Try it now 8:
A person with an ear infection takes a 200 mg dose of ampicillin once every 4 hours. At the
end of the 4 hour period, about 12% of the drug at the start of the 4 hours still remains
accumulated in the body. How much ampicillin is in the body when the person takes their 6th
dose?
Hint: as this problem involves decay, and 12% is the percent of the drug remaining (rather
than percent of decrease, r), use 0.12 for (1 + r).

Solving Exponentials for Time: Logarithms (optional)


Earlier, we found that since Olympia, WA had a population of 245 thousand in 2008 and had
been growing at 3% per year, the population could be modeled by the equation

\(P_n = 245{,}000(1.03)^n\), where n is the number of years after 2008.

Suppose we wanted to know when the population of Olympia would reach 400 thousand.
Since we are looking for the year n when the population will be 400 thousand, we would
need to solve the equation

\(400{,}000 = 245{,}000(1.03)^n\) Divide both sides by 245,000

\(1.6327 = 1.03^n\)

One approach to this problem would be to create a table of values, or to use technology to
draw a graph to estimate the solution:

15
Kurzweil, R. (2018). The Law of Accelerating Returns | Kurzweil. [online] Kurzweilai.net. Available at:
http://www.kurzweilai.net/the-law-of-accelerating-returns

From the graph, we can estimate that the solution will be around 16 to 17 years after 2008
(2024 to 2025). This is pretty good, but we’d really like to have an algebraic tool to answer
this question. To do that, we need to introduce a new function that will undo exponentials,
similar to how a square root undoes a square. For exponentials, the function we need is
called a logarithm. It is the inverse of the exponential, meaning it undoes the exponential.
While there is a whole family of logarithms with different bases, we will focus on the
common log, which is based on the exponential \(10^x\).

Common Logarithm
The common logarithm, written log(x), undoes the exponential \(10^x\)
This means that \(\log(10^x) = x\), and likewise \(10^{\log(x)} = x\)
This also means the statement \(10^a = b\) is equivalent to the statement log(b) = a

log(x) is read as “log of x”, and means “the logarithm of the value x”. It is important to
note that this is not multiplication – the log doesn’t mean anything by itself, just like √
doesn’t mean anything by itself; it has to be applied to a number.

Example 12: Evaluating Logarithms


Evaluate each of the following
a) log(100) b) log(1000) c) log(10000) d) log(1/100) e) log(1)

a) log(100) can be written as \(\log(10^2)\). Since the log undoes the exponential, \(\log(10^2) = 2\)
b) \(\log(1000) = \log(10^3) = 3\)
c) \(\log(10000) = \log(10^4) = 4\)
d) Recall: \(x^{-n} = \frac{1}{x^n}\). \(\log\left(\frac{1}{100}\right) = \log(10^{-2}) = -2\)
e) Recall: \(x^0 = 1\). \(\log(1) = \log(10^0) = 0\)

It is helpful to note from the first three parts of the previous example that the number
we’re taking the log of has to get 10 times bigger for the log to increase in value by 1.
Of course, most numbers cannot be written as a nice simple power of 10. For those numbers,
we can evaluate the log using a scientific calculator with a log button.

Example 13: The Common Logarithm


Evaluate log(300)

Using a calculator, log(300) is approximately 2.477121
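
If you have Python available instead of a scientific calculator, the standard library’s `math.log10` function plays the role of the log button. The sketch below re-evaluates the values from Example 12 and then log(300).

```python
import math

# Values from Example 12: each is a power of 10.
for x in [100, 1000, 10000, 1 / 100, 1]:
    print(x, round(math.log10(x), 6))

# Example 13: a number that is not a simple power of 10.
print(round(math.log10(300), 6))  # 2.477121
```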

With an equation, just like we can add a number to both sides, multiply both sides by a
number, or square both sides, we can also take the logarithm of both sides of the equation
and end up with an equivalent equation. This will allow us to solve some simple equations.

Example 14: Solving Equations with Common Logarithms


a) Solve \(10^x = 1000\)
\(\log(10^x) = \log(1000)\)   Take the log of both sides
\(x = 3\)   The log undoes the exponential, \(\log(10^x) = x\)
            Similarly \(\log(1000) = \log(10^3) = 3\)

b) Solve \(10^x = 3\)
\(\log(10^x) = \log(3)\)   Take the log of both sides
\(x = \log(3)\)   The log undoes the exponential, \(\log(10^x) = x\)
We can approximate this value with a calculator. x ≈ 0.477

c) Solve \(2(10^x) = 8\)
\(10^x = 4\)   Isolate the exponential by dividing both sides of the equation by 2
\(\log(10^x) = \log(4)\)   Take the log of both sides
\(x = \log(4)\)   The log undoes the exponential, \(\log(10^x) = x\)
We can approximate this value with a calculator. x ≈ 0.602

This approach allows us to solve exponential equations with powers of 10, but what about
problems like \(1.6327 = 1.03^n\) from earlier, which have a base of 1.03? For that, we need the
exponent property for logs.

Properties of Logs: Exponent Property


 
\[ \log(A^r) = r \log(A) \]

To show why this is true, we offer a proof.

Since the logarithm and exponential undo each other, \(10^{\log A} = A\).

So \(A^r = \left(10^{\log A}\right)^r\).

Utilizing the exponential rule that states \(\left(x^a\right)^b = x^{ab}\),

\(A^r = \left(10^{\log A}\right)^r = 10^{r \log A}\)

So then \(\log(A^r) = \log\left(10^{r \log A}\right)\)
Again utilizing the property that the log undoes the exponential on the right side yields the
result
\(\log(A^r) = r \log A\)

Example 15: Exponent Property for Logarithms


Rewrite log(25) using the exponent property for logs
\(\log(25) = \log(5^2) = 2\log(5)\)
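
The exponent property can also be checked numerically. This short Python sketch compares both sides for A = 5 and r = 2; `math.isclose` is used because decimal approximations of logarithms can differ in the last digit.

```python
import math

A, r = 5, 2
left = math.log10(A ** r)    # log(25)
right = r * math.log10(A)    # 2 log(5)
print(math.isclose(left, right))  # True
```

A numerical check like this illustrates the property for one pair of values; the proof above is what shows it holds in general.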

This property will finally allow us to answer our original question.



Solving exponential equations with logarithms


1. Isolate the exponential. In other words, get it by itself on one side of the equation.
This usually involves dividing by a number multiplying it.
2. Take the log of both sides of the equation.
3. Use the exponent property of logs to rewrite the exponential with the variable
exponent multiplying the logarithm.
4. Divide as needed to solve for the variable.

Example 16: Population


Suppose Olympia is growing according to the equation \(P_n = 245(1.03)^n\), where n is years
after 2008 and the population is measured in thousands. Find when the population will be 400
thousand.

We need to solve the equation


\(400 = 245(1.03)^n\)   Begin by dividing both sides by 245 to isolate the exponential
\(1.633 = 1.03^n\)   Now take the log of both sides
\(\log(1.633) = \log(1.03^n)\)   Use the exponent property of logs on the right side
\(\log(1.633) = n \log(1.03)\)   Now we can divide by log(1.03)
\(n = \frac{\log(1.633)}{\log(1.03)}\)   We can approximate this value on a calculator
\(n \approx 16.591\)
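
The four solving steps can be collapsed into one small Python function. This is a sketch under our own naming (`periods_to_reach` is not from the text); it isolates the exponential, takes the common log of both sides, and divides.

```python
import math

def periods_to_reach(p0, target, multiplier):
    """Solve target = p0 * multiplier**n for n using common logs."""
    # Isolate the exponential, then apply log(b**n) = n log(b).
    return math.log10(target / p0) / math.log10(multiplier)

n = periods_to_reach(245, 400, 1.03)
print(round(n, 2))  # about 16.58
```

The result here is slightly smaller than 16.591 because the code keeps the full value of 400/245 instead of rounding it to 1.633 first.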

Alternatively, after applying the exponent property of logs on the right side, we could have
evaluated the logarithms to decimal approximations and completed our calculations using
those approximations, as you’ll see in the next example. While the final answer may come
out slightly differently, as long as we keep enough significant digits during calculation, our
answer will be close enough for most purposes.

Example 17: Water Filtration


Polluted water is passed through a series of filters. Each filter removes 90% of the remaining
impurities from the water. If you have 10 million particles of pollutant per gallon originally,
how many filters would the water need to be passed through to reduce the pollutant to 500
particles per gallon?

In this problem, our “population” is the number of particles of pollutant per gallon. The
initial pollutant is 10 million particles per gallon, so P0 = 10,000,000. Instead of changing
with time, the pollutant changes with the number of filters, so n will represent the number of
filters the water passes through.

Also, since the amount of pollutant is decreasing with each filter instead of increasing, our
“growth” rate will be negative, indicating that the population is decreasing instead of
increasing, so r = -0.90.

We can then write the explicit equation for the pollutant:


\( P_n = 10{,}000{,}000(1 - 0.90)^n = 10{,}000{,}000(0.10)^n \)

To solve the question of how many filters are needed to lower the pollutant to 500 particles
per gallon, we can set Pn equal to 500, and solve for n.

\( 500 = 10{,}000{,}000(0.10)^n \)    Divide both sides by 10,000,000
\( 0.00005 = 0.10^n \)    Take the log of both sides
\( \log(0.00005) = \log(0.10^n) \)    Use the exponent property of logs on the right side
\( \log(0.00005) = n \log(0.10) \)    Evaluate the logarithms to a decimal approximation
\( -4.301 = n(-1) \)    Divide by -1, the value multiplying n (isolate n)
\( 4.301 = n \)

It would take about 4.301 filters. Of course, since we probably can’t install 0.3 filters, we
would need to use 5 filters to bring the pollutant below the desired level.
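As a rough check of this example, the same calculation can be sketched in Python, rounding up because a fraction of a filter cannot be installed. The variable names are illustrative assumptions.

```python
import math

initial = 10_000_000   # particles per gallon before filtering
target = 500           # desired particles per gallon
remaining = 0.10       # each filter leaves 10% of the impurities

# Solve target = initial * remaining**n for n using logs
n_exact = math.log10(target / initial) / math.log10(remaining)
n_filters = math.ceil(n_exact)  # round up to a whole number of filters

print(n_exact, n_filters)
print(initial * remaining ** n_filters)  # pollutant left after n_filters filters
```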

Try it Now 9:
India had a population in 2008 of about 1.14 billion people. The population is growing by
about 1.34% each year. If this trend continues, when will India’s population reach 1.2
billion?

Reference: General formulas

Linear (arithmetic) growth/decay
\[ P_n = P_0 + dn \]
P0 = starting value
d = common difference
n = number of terms after the starting value

Linear (arithmetic) accumulation
\[ S_N = \frac{N(P_0 + P_n)}{2} \]
P0 = first term
Pn = last term
N = total number of terms accumulated (note: n = N - 1)

Exponential (geometric) growth/decay
\[ P_n = P_0(1 + r)^n \quad \text{or} \quad P_n = P_0(b)^n \]
P0 = starting value
b = (1 + r) = base or multiplier
n = number of terms after the starting value

Exponential (geometric) accumulation
\[ S_N = \frac{P_0\left((1 + r)^N - 1\right)}{r} \quad \text{or} \quad S_N = \frac{P_0\left(b^N - 1\right)}{b - 1} \]
P0 = starting value
b = (1 + r) = base or multiplier
N = total number of terms accumulated

Finance 1

“While money can't buy happiness, it certainly lets you choose your own form of misery.”
–Julius Henry "Groucho" Marx, quoted on http://www.goodreads.com

Finance
We have to work with money every day. While balancing your checkbook or calculating
your monthly expenditures on espresso requires only arithmetic, when we start saving,
planning for retirement, or need a loan, we need more mathematics.

Simple Interest
When we borrow or loan money, usually we have to calculate interest, which can be thought
of as “rent” charged for use of the money. Discussing interest starts with the principal, the
amount of money at the start of an investment or loan. Interest, in its most simple form, is
calculated as a percent of the principal. For example, if you borrow $100 from a friend
and agree to repay it with 5% interest, then the amount of interest you would pay would just
be 5% of $100: $100(0.05) = $5. The total amount you would repay would be $105, the
principal plus the interest.

One-time Simple Interest

\[ I = P_0 r \]
\[ A = P_0 + I = P_0 + P_0 r = P_0(1 + r) \]

I is the interest
A is the end amount: principal plus interest
P0 is the principal (starting amount)
r is the interest rate (in decimal form, e.g. 5% = 0.05)

Example 1: Friendly Loan


A friend asks to borrow $300 and agrees to repay it in 30 days with 3% interest. How much
interest will you earn?

P0 = $300 Principal, or starting amount of the loan


r = 0.03 3% rate
I = $300(0.03) = $9. Values substituted into I = P0r, and simplified

You will earn $9 in interest.

One-time simple interest is only common for extremely short-term loans. For longer term
loans, it is common for interest to be paid on a daily, monthly, quarterly, or annual basis. In
that case, interest would be earned regularly. For example, bonds are essentially a loan made
to the bond issuer (a company or government) by you, the bond holder. In return for the
loan, the issuer agrees to pay interest, often annually. Bonds have a maturity date, at which
time the issuer pays back the original bond value.

© David Lippman, edited by Laurel Clifford Creative Commons BY-SA



Example 2: Bond Issue


Suppose your city is building a new park, and issues bonds to raise the money to build it.
You obtain a $1,000 bond that pays 5% interest annually that matures in 5 years. How much
interest will you earn?

Each year, you would earn 5% interest: $1000(0.05) = $50 in interest. So over the course of
five years, you would earn a total of $250 in interest. When the bond matures, you would
receive back the $1,000 you originally paid in principal, leaving you with a total of $1,250.

We can generalize this idea of simple interest over time.

Simple Interest over Time

\[ I = P_0 r t \]
\[ A = P_0 + I = P_0 + P_0 r t = P_0(1 + rt) \]

I is the interest
A is the end amount: principal plus interest
P0 is the principal (starting amount)
r is the interest rate in decimal form
t is time

The units of measurement (years, months, etc.) for the time should match the time
period for the interest rate. Interest is most often quoted annually (per year), so
time should reflect the appropriate fraction of a year.

APR – Annual Percentage Rate


Interest rates are usually given as an annual percentage rate (APR) – the total interest
that will be paid in the year. If the interest is paid in smaller time increments, the APR
will be divided up.

For example, a 6% APR paid monthly would be divided into twelve 0.5% payments.
A 4% annual rate paid quarterly would be divided into four 1% payments.

Example 3: T-note Interest


Treasury Notes (T-notes) are bonds issued by the federal government to cover its expenses.
Suppose you obtain a $1,000 T-note with a 4% annual rate, paid semi-annually, with a
maturity in 4 years. How much interest will you earn?

Since interest is being paid semi-annually (twice a year), the 4% interest will be divided into
two 2% payments, and there will be (4)(2) half-year time periods.

P0 = $1000 Principal (starting amount of the investment)


r = 0.02 2% rate per half-year
t=8 4 years = 8 half-years
I = $1000(0.02)(8) = $160.
You will earn $160 interest total over the four years.
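The bond and T-note examples both apply I = P0·r·t with the rate and time expressed per payment period. A minimal Python sketch follows; the function name and its `payments_per_year` parameter are assumptions made for illustration.

```python
def simple_interest(principal, annual_rate, years, payments_per_year=1):
    """Total simple interest, with the APR split evenly across payment periods."""
    rate_per_period = annual_rate / payments_per_year
    periods = years * payments_per_year
    return principal * rate_per_period * periods

bond = simple_interest(1000, 0.05, 5)                          # Example 2: annual payments
t_note = simple_interest(1000, 0.04, 4, payments_per_year=2)   # Example 3: semi-annual
print(round(bond, 2), round(t_note, 2))
```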

Try it Now 1:
A loan company charges $30 interest for a one month loan of $500. Find the annual interest
rate (r) they charge.

Example 4: Short-term Loan


You take out a $500 loan with a 60 day term at an annual interest rate of 12%. When the
loan comes due, what is the total amount you will have to pay back?

The interest rate quoted, 12%, is an annual rate, but you borrow the money for less than a
year, so you won’t pay the entire 12%. Because r = 0.12 is a yearly rate, we need to
measure time in years as well. A 60 day term out of 365 days gives us 60/365 of a year.

P0 = $500
r = 0.12 per year
t = 60/365 of a year
\[ I = (\$500)\left(\frac{0.12}{1\ \text{year}}\right)\left(\frac{60}{365}\ \text{year}\right) \approx \$9.86 \text{ in interest} \]
So the total amount to pay back would be A = P0 + I = $500 + $9.86 = $509.86

In the United States, banks may use what is known as Banker’s rule, viewing a year as 360
days, so our 60 day term would now be out of 360 days, 60/360 = 1/6 of a year:

P0 = $500
r = 0.12 per year
t = 60/360 or 1/6 of a year
\[ I = (\$500)\left(\frac{0.12}{1\ \text{year}}\right)\left(\frac{60}{360}\ \text{year}\right) = \$10.00 \text{ in interest} \]
So the total amount to pay back would be
A = P0 + I = $500 + $10.00 = $510.00

Looking at the results of our two calculations, we can see who the
Banker’s rule favors: the lender!

[Figure: Money Flower (photo credit in footnote 1)]
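The two day-count conventions can be compared side by side. This sketch assumes a hypothetical helper, `short_term_interest`, whose `days_in_year` parameter switches between the 365-day year and the 360-day Banker's rule.

```python
def short_term_interest(principal, annual_rate, days, days_in_year=365):
    """Simple interest for a term given in days."""
    return principal * annual_rate * days / days_in_year

exact = short_term_interest(500, 0.12, 60)                      # 365-day year
bankers = short_term_interest(500, 0.12, 60, days_in_year=360)  # Banker's rule

print(round(exact, 2), round(bankers, 2))
print(round(bankers - exact, 2))  # extra interest collected under Banker's rule
```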

Some lenders allow borrowers to make partial payments on the loan before it is due. The
payment is first applied to the interest accrued to that date, and then the remainder of the
payment applies to principal. When the final payment is made, the remaining interest due is
calculated on the remaining principal for the remaining time.

Example 5: Partial Payments


On June 1st, you take out a $500 loan with a 60 day term at an annual interest rate of 12%.
You decide to make a partial payment of $200 on June 20th. How much is due on the
maturity date of the loan?

1 Photo by Evan-Amos, public domain via http://commons.wikimedia.org



We need to be careful with dates and time. June 20th is 19 days after the date the loan was
taken out. The loan has a 60 day term, and 60 days from June 1st is July 31st (not July 30th, as
we need to count days after June 1st). After June 20th, there will be 60 – 19 = 41 days left in
the term of the loan.

Since you’re making a payment on June 20th, we need to calculate the interest accrued up to
that date:

P0 = $500
r = 0.12 per year
t = 19/360 year (using 19 days out of 360 in a year)
\[ I = (\$500)\left(\frac{0.12}{1\ \text{year}}\right)\left(\frac{19}{360}\ \text{year}\right) \approx \$3.17 \text{ in interest} \]
Next, we need to apply part of the partial payment to pay the interest accrued:
$200 – $3.17 = $196.83
This remaining amount gets applied to the original principal:
$500 – $196.83 = $303.17

You still owe $303.17 on the loan. So when you repay the remaining principal and interest
on July 31st, you will only pay interest on the $303.17:

P0 = $303.17
r = 0.12 per year
t = 41/360 year (using 41 days out of 360 in a year)
\[ I = (\$303.17)\left(\frac{0.12}{1\ \text{year}}\right)\left(\frac{41}{360}\ \text{year}\right) \approx \$4.14 \text{ in interest.} \]
With this calculation, your final payment on July 31st, including principal and interest is:
$303.17 + $4.14 = $307.31.

Note that the total interest you paid on this loan is now $3.17 + $4.14 = $7.31.
Without the partial payment (see Example 4), you would pay $10.00 in
interest. Making a partial payment saved you $2.69.
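The bookkeeping in this example can be traced step by step in code. The sketch below assumes the Banker's rule (360-day year) and rounds to the cent at each step, as the worked example does; the names are illustrative.

```python
def interest(principal, annual_rate, days, days_in_year=360):
    """Simple interest under the Banker's rule (360-day year)."""
    return principal * annual_rate * days / days_in_year

principal, rate = 500.00, 0.12

first_interest = round(interest(principal, rate, 19), 2)     # accrued by June 20th
remaining = round(principal - (200.00 - first_interest), 2)  # payment covers interest first
final_interest = round(interest(remaining, rate, 41), 2)     # accrued over the last 41 days
final_payment = round(remaining + final_interest, 2)

print(first_interest, remaining, final_interest, final_payment)
```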

Compound Interest
With simple interest, we were assuming that we pocketed or paid the interest only at the
point when we received it, usually at the end of the term of the loan or investment. In a
standard bank account, any interest we earn is automatically added to our balance, and we
earn interest on that interest in future years. This reinvestment of interest is called
compounding.

Suppose that we deposit $1000 in a bank account offering 3% interest, compounded monthly.
How will our money grow?

The 3% interest is an annual percentage rate (APR) – the total interest to be paid during the
year. Since interest is being paid monthly, each month we will earn 3%/12 = 0.25% per month.

In the first month,


P0 = $1000 Principal deposit
r = 0.0025 (0.25%) Be careful in converting to decimal form: divide 0.25 by 100
I = $1000 (0.0025) = $2.50 Interest earned in the first month
A = $1000 + $2.50 = $1002.50 New account balance, with interest added to principal

In the first month, we will earn $2.50 in interest, raising our account balance to $1002.50.
In the second month,
P0 = $1002.50 Principal is now the account balance with the accrued interest
I = $1002.50 (0.0025) = $2.51 (rounded) Interest is calculated using the new balance.
A = $1002.50 + $2.51 = $1005.01 New account balance: previous balance plus interest

Notice that in the second month we earned more interest than we did in the first month! We
earned interest not only on the original $1000 we deposited, but we also earned interest on
the $2.50 of interest we earned the first month. This is the key advantage that compounding
of interest gives us.

Calculating out a few more months:


Month Starting balance Interest earned Ending Balance
1 1000.00 2.50 1002.50
2 1002.50 2.51 1005.01
3 1005.01 2.51 1007.52
4 1007.52 2.52 1010.04
5 1010.04 2.53 1012.57
6 1012.57 2.53 1015.10
7 1015.10 2.54 1017.64
8 1017.64 2.54 1020.18
9 1020.18 2.55 1022.73
10 1022.73 2.56 1025.29
11 1025.29 2.56 1027.85
12 1027.85 2.57 1030.42
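A table like this can be generated with a short loop. The sketch assumes interest is rounded to the nearest cent each month, which reproduces the rounded figures shown; the variable names are illustrative.

```python
balance = 1000.00
monthly_rate = 0.03 / 12  # 3% APR split into 12 monthly periods

rows = []
for month in range(1, 13):
    earned = round(balance * monthly_rate, 2)  # interest on the current balance
    balance = round(balance + earned, 2)       # interest is added to the balance
    rows.append((month, earned, balance))

for month, earned, ending in rows:
    print(f"{month:>2}  {earned:>5.2f}  {ending:>8.2f}")
```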

To find an equation to represent this, if Pm represents the amount of money after m months,
then we could write the recursive equation:

P0 = $1000
\( P_m = (1 + 0.0025)P_{m-1} \)    Note the growth rate here: 100% + 0.25% = 100.25% = 1.0025

You probably recognize this as the recursive form of exponential growth. If not, we could
go through the steps to build an explicit equation for the growth:
P0 = $1000
\( P_1 = 1.0025P_0 = 1.0025(1000) \)
\( P_2 = 1.0025P_1 = 1.0025(1.0025(1000)) = 1.0025^2(1000) \)
\( P_3 = 1.0025P_2 = 1.0025(1.0025^2(1000)) = 1.0025^3(1000) \)
\( P_4 = 1.0025P_3 = 1.0025(1.0025^3(1000)) = 1.0025^4(1000) \)

Observing a pattern, we could conclude


\( P_m = (1.0025)^m(\$1000) \)
which can be rearranged as
\( P_m = (\$1000)(1.0025)^m \)
which is the same as the explicit form of exponential (geometric) growth
\( P_m = P_0(1 + r)^m \)

Notice that the $1000 in the equation was P0, the starting amount. We found 1.0025 by
adding one to the growth rate of 3% divided by 12, since we were compounding 12 times per
year:
\( P_m = (\$1000)(1 + 0.03/12)^m \)
In general, we can think of this equation as:
\( P_m = P_0(1 + r/k)^m \)
where k = the number of times we are compounding per year and the exponent m represents
how many times we have compounded (added in and recalculated) the interest.

While this formula works fine, it is more common to use a formula that involves the number
of years. If t is the number of years, then m = kt. Making this change gives us the standard
formula for compound interest. Some texts use A for Pt, which gives us a mnemonic for
what the output of this formula is: the account balance after time.

Compound Interest
\[ A = P_t = P_0\left(1 + \frac{r}{k}\right)^{kt} \]
A = Pt is the balance in the account after t years.
P0 is the starting balance of the account (also called initial deposit, or principal)
r is the annual interest rate in decimal form
k is the number of compounding periods in one year (k = 12 for monthly, k = 4 for
quarterly, k = 1 for annually, etc.)

An important thing to remember about using this formula is that it assumes that we put
money in the account once and let it sit there earning interest.

Example 6: Certificate of Deposit


A certificate of deposit (CD) is a savings instrument that many banks offer. It usually gives a
higher interest rate, but you cannot access your investment for a specified length of time.
Suppose you deposit $3000 in a CD paying 6% interest, compounded monthly. How much
will you have in the account after 20 years?

P0 = $3000 Principal, or the initial deposit


r = 0.06 6% annual rate, converted to decimal form (divide by 100)
k = 12 Compounded monthly, 12 times per year
t = 20 We’re looking for how much we’ll have after 20 years

Thus \( A = P_{20} = 3000\left(1 + \frac{0.06}{12}\right)^{12 \cdot 20} \approx \$9930.61 \) (round your answer to the nearest penny)

Compare the amount of money earned from compounding against the amount you would
earn from simple interest:

Years    Simple Interest      6% compounded monthly
         ($15 per month)      (0.5% each month)
5        $3900                $4046.55
10       $4800                $5458.19
15       $5700                $7362.28
20       $6600                $9930.61
25       $7500                $13394.91
30       $8400                $18067.73
35       $9300                $24370.65

[Figure: Account Balance ($) versus Years]

As you can see, over a long period of time, compounding makes a large difference in the
account balance. You may recognize this as the difference between linear growth and
exponential growth.
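The comparison can be reproduced with two small functions, one linear and one exponential. This is a sketch under the example's assumptions ($3000 at 6%); the function names are illustrative.

```python
def simple_balance(principal, annual_rate, years):
    """Linear growth: A = P0 * (1 + r*t)."""
    return principal * (1 + annual_rate * years)

def compound_balance(principal, annual_rate, periods_per_year, years):
    """Exponential growth: A = P0 * (1 + r/k)**(k*t)."""
    return principal * (1 + annual_rate / periods_per_year) ** (periods_per_year * years)

for years in range(5, 40, 5):
    s = simple_balance(3000, 0.06, years)
    c = compound_balance(3000, 0.06, 12, years)
    print(f"{years:>2}  {s:>10.2f}  {c:>10.2f}")
```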

REMINDER: Evaluating exponents on the calculator


When we need to calculate something like \( 5^3 \) it is easy enough to just multiply
5⋅5⋅5=125. But when we need to calculate something like \( 1.005^{240} \), it would be very
tedious to calculate this by multiplying 1.005 by itself 240 times! So to make things
easier, we can harness the power of our scientific calculators.

Most scientific calculators have a button for exponents. It is typically labeled
^ , y^x , or x^y .

To evaluate \( 1.005^{240} \) we'd type 1.005 ^ 240, or 1.005 y^x 240. Try it out - you should
get something around 3.3102044758.

BE VERY CAREFUL with the exponent on compound interest as it is a product of k


times t, and if you just do (1+0.06/12)^12*20, your calculator will not recognize that
20 should be part of the exponent.

You have two options:


1) Do the exponent product separately: 20*12 = 240, then do (1+0.06/12)^240
2) Use parentheses around the exponent: (1+0.06/12)^(20*12)

If all else fails, follow the order of operations step by step!

\[ P_{20} = 3000\left(1 + \frac{0.06}{12}\right)^{12 \cdot 20} \approx \$9930.61 \]
Perform the group 20*12 = 240 first. Then complete the operations inside the
parentheses: 0.06/12 and add 1. Raise that result to the 240th power, and lastly,
multiply the result by 3000. It should equal about $9930.61.
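The same pitfall shows up in programming languages. In Python, for instance, `**` binds more tightly than `*`, so leaving out the parentheses around the exponent gives a very different number. A brief sketch:

```python
# Without parentheses, only 12 is the exponent; the result is then multiplied by 20
wrong = (1 + 0.06 / 12) ** 12 * 20

# Parentheses keep the whole product 12*20 = 240 in the exponent
right = (1 + 0.06 / 12) ** (12 * 20)

print(wrong, right)
print(round(3000 * right, 2))
```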

Example 7: Saving for Education (Present Value)


You know that you will need $40,000 for your child’s education in 18 years. If your account
earns 4% compounded quarterly, how much would you need to deposit now to reach your
goal?

Since we’re looking for what we need to start with, we’re looking for P0.
r = 0.04 4% interest rate
k=4 4 quarters in 1 year
t = 18 since we know the balance in 18 years
A = P18 = $40,000 the amount we have in 18 years

In this case, we’re going to have to set up the equation, and solve for P0.
\( 40000 = P_0\left(1 + \frac{0.04}{4}\right)^{4 \cdot 18} \)    Substitute the values in for each variable
\( 40000 = P_0(2.047099312) \)    Simplify the expression on the right using PEMDAS
\( \frac{40000}{2.047099312} = P_0 \)    Divide both sides by 2.047099312 to isolate P0.
\( 19539.84341 = P_0 \)

So you would need to deposit $19,539.85 now to have $40,000 in 18 years. It may seem
strange to round the final answer up, but if we deposit $19,539.84 for P0, we’ll only have
$39,999.99302 in our account, so we won’t quite hit $40,000. Rounding up will get us just
over $40,000.
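Solving for the starting deposit amounts to dividing the goal by the growth factor. The sketch below uses a hypothetical `present_value` helper and rounds up to the next cent, for the reason given above.

```python
import math

def present_value(future_value, annual_rate, periods_per_year, years):
    """Deposit needed now to reach future_value with compound interest."""
    growth = (1 + annual_rate / periods_per_year) ** (periods_per_year * years)
    return future_value / growth

exact = present_value(40_000, 0.04, 4, 18)
deposit = math.ceil(exact * 100) / 100  # round up so the goal is actually reached

print(exact, deposit)
```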

Rounding
It is important to be very careful about rounding when calculating things with
exponents. In general, you want to keep as many decimals during calculations as you
can. Be sure to keep at least 3 significant digits (numbers after any leading zeros).
Rounding 0.00012345 to 0.000123 will usually give you a “close enough” answer, but
keeping more digits is always better. Use your calculator wisely to bring the answers
along step by step.

Example 8: The Effect of Rounding


To see why not over-rounding is so important, suppose you were investing $1000 at 5%
interest compounded monthly for 30 years.

P0 = $1000 the initial deposit


r = 0.05 5% interest rate
k = 12 12 months in 1 year
t = 30 since we’re looking for the amount after 30 years

If we first compute r/k, we find 0.05/12 = 0.00416666666667



The table shows the effect of rounding 0.00416666666667 to different values:


r/k rounded to:    Gives P30 to be:    Error
0.004 $4208.59 $259.15
0.0042 $4521.45 $53.71
0.00417 $4473.09 $5.35
0.004167 $4468.28 $0.54
0.0041667 $4467.80 $0.06
no rounding $4467.74

If you’re working in a bank, of course you wouldn’t round at all. For our purposes, the
answer we got by rounding to 0.00417, three significant digits, is close enough - $5 off of
$4500 isn’t too bad. Certainly keeping that fourth decimal place wouldn’t have hurt.
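The rounding experiment is easy to repeat in code. This sketch rounds r/k to a given number of decimal places with Python's built-in `round` and compares each result to the unrounded balance; the helper name `balance` is an assumption.

```python
def balance(rate_per_month, months=360, principal=1000):
    """Compound-interest balance for a given monthly rate."""
    return principal * (1 + rate_per_month) ** months

exact_rate = 0.05 / 12
exact = balance(exact_rate)

for digits in (3, 4, 5, 6, 7):
    rounded_rate = round(exact_rate, digits)  # 0.004, 0.0042, 0.00417, ...
    value = balance(rounded_rate)
    print(f"{rounded_rate:<10} {value:10.2f} {abs(value - exact):8.2f}")
```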

Using your calculator


In many cases, you can avoid rounding completely by how you enter things in your
calculator. For example, in the example above, we needed to calculate
\[ A = P_{30} = 1000\left(1 + \frac{0.05}{12}\right)^{12 \cdot 30} \]
We can quickly calculate 12×30 = 360, giving \( A = P_{30} = 1000\left(1 + \frac{0.05}{12}\right)^{360} \).
Now we can use the calculator.

Type this Calculator shows


0.05 ÷ 12 = . 0.00416666666667
+ 1 = . 1.00416666666667
^ or y^x 360 = . 4.46774431400613
× 1000 = . 4467.74431400613

The previous steps were assuming you have a “one operation at a time” calculator; a
more advanced calculator will often allow you to type in the entire expression to be
evaluated. If you have a calculator like this, you will probably just need to enter:

1000 × ( 1 + 0.05 ÷ 12 ) y^x 360 = .

On the TI calculators, you can type: 1000(1 + 0.05/12)^(12*30) and get the correct
answer if you use all the parentheses as shown.

Try it now 2:
Suppose you have $2,500 to invest. Which investment would give you a greater return after
10 years, investing the money at 5.75% compounded monthly or 6.00% compounded
annually?

Annuities
For most of us, we save for the future by making payments into our savings, depositing a
smaller amount of money from each paycheck into the bank. This idea is called a savings
annuity. Most retirement plans like 401k plans or IRA plans are examples of savings
annuities, accumulating deposits and interest on these deposits.

Suppose we will deposit $100 each month into an account paying 6% interest. We assume
that the account is compounded with the same frequency as we make deposits unless stated
otherwise. In this example:
r = 0.06 since our interest rate is 6%
k = 12 since we’re making 12 deposits per year, it gets compounded 12 times
P = $100 since $100 is our monthly deposit

We’ll use the variable S to represent the total amount in the annuity, as it is a sum of the
deposits and interest.

Assume we start with no money, then deposit $100 into the account. When we make our
next $100 deposit, we will earn interest on the previous account balance.
\[ S_0 = 0 \]
\[ S_1 = S_0\left(1 + \frac{0.06}{12}\right) + 100 = 0(1.005) + 100 = 100 \]
\[ S_2 = S_1\left(1 + \frac{0.06}{12}\right) + 100 = 100(1.005) + 100 \]
\[ S_3 = S_2\left(1 + \frac{0.06}{12}\right) + 100 = \left(100(1.005) + 100\right)(1.005) + 100 = 100(1.005)^2 + 100(1.005) + 100 \]
A pattern emerges where we can see the (1.005) multiplier and recognize that we have an
exponential (geometric) series:
\[ S_3 = 100(1.005)^2 + 100(1.005)^1 + 100 \]

And at the end of 12 months, we’d have:

\[ S_{12} = 100(1.005)^{11} + 100(1.005)^{10} + \dots + 100(1.005)^2 + 100(1.005)^1 + 100 \]

Looking at this series, we can see P0 = 100 (the first term), the multiplier (1 + r) = 1.005,
with r = 0.005, and n = 12 since we have 12 terms. Applying our formula for the sum of
exponential (geometric) growth,
\[ S_n = \frac{P_0\left((1 + r)^n - 1\right)}{r} \]
\[ S_{12} = \frac{\$100\left((1.005)^{12} - 1\right)}{0.005} \approx \$1233.56 \]
We have accumulated $1233.56 in our annuity after 1 year (12 months). We made 12
deposits of $100, so we’ve deposited $1200, and thus $33.56 is interest earned.
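One way to convince yourself that the sum formula matches the month-by-month recursion is to compute the balance both ways. In this sketch the two function names are assumptions; the results should agree to within floating-point error.

```python
def annuity_by_recursion(deposit, rate_per_period, periods):
    """Apply S_next = S * (1 + r) + deposit once per period, starting from 0."""
    balance = 0.0
    for _ in range(periods):
        balance = balance * (1 + rate_per_period) + deposit
    return balance

def annuity_closed_form(deposit, rate_per_period, periods):
    """Geometric sum: deposit * ((1 + r)**n - 1) / r."""
    return deposit * ((1 + rate_per_period) ** periods - 1) / rate_per_period

step_by_step = annuity_by_recursion(100, 0.06 / 12, 12)
formula = annuity_closed_form(100, 0.06 / 12, 12)
print(step_by_step, formula)
```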

We can generalize this process to a formula. Since our “r” came from 0.06/12 = r/k, our
annual interest rate divided by the number of compounding periods per year, we can rewrite
our exponential (geometric) sum formula specifically for calculating the balance in an
annuity. We also replace n with kt, since kt is the total number of compounding periods, which
matches the total number of terms:

Annuity Formula
\[ S = \frac{P\left(\left(1 + \frac{r}{k}\right)^{kt} - 1\right)}{\frac{r}{k}} \]
S is the balance in the annuity after t years
P is the regular deposit (the amount you deposit each year, each month, etc.)
r is the annual interest rate in decimal form.
k is the number of compounding periods in one year.
t is the total number of years

If the compounding frequency is not explicitly stated, assume there are the same number of
compounds in a year as there are deposits made in a year. For example, if the compounding
frequency isn’t stated:
If you make your deposits every month, use monthly compounding, k = 12.
If you make your deposits every year, use yearly compounding, k = 1.
If you make your deposits every quarter, use quarterly compounding, k = 4, etc.

When do you use this formula?


Annuities assume that you put money in the account on a regular schedule (every
month, year, quarter, etc.) and let it sit there earning interest.

Compound interest assumes that you put money in the account once and let it sit there
earning interest.

Compound interest: One deposit


Annuity: Many deposits.

Example 9: IRA Account


A traditional individual retirement account (IRA) is a special type of retirement account in
which the money you invest is exempt from income taxes until you withdraw it. If you
deposit $100 each month into an IRA earning 6% interest, how much will you have in the
account after 20 years?

P = $100 the monthly deposit


r = 0.06 6% annual interest rate
k = 12 since we’re doing monthly deposits, we’ll compound monthly
t = 20 we want the amount after 20 years

Putting these values into the equation:

\[ S = \frac{P\left(\left(1 + \frac{r}{k}\right)^{kt} - 1\right)}{\frac{r}{k}} \]
we have:
\[ S = \frac{\$100\left(\left(1 + \frac{0.06}{12}\right)^{12 \cdot 20} - 1\right)}{\frac{0.06}{12}} \]
Following the order of operations carefully,

\( S = \frac{\$100\left((1.005)^{240} - 1\right)}{0.005} \)    Simplify the operations inside parentheses and in the exponent
\( S = \frac{\$100(2.310204476)}{0.005} \)    Apply the exponent, then subtract 1
\( S = \$46204.08952 \)    Multiply, then divide by the denominator

The account will grow to $46,204.09 after 20 years.


Notice that you deposited into the account a total of $24,000 ($100 a month for 240 months).
The difference between what you end up with and how much you put in is the interest
earned. In this case it is $46,204 - $24,000 = $22,204 (and some pennies).
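The annuity formula translates directly into a small function. This is a sketch using the example's values; the name `annuity_balance` is an assumption.

```python
def annuity_balance(deposit, annual_rate, periods_per_year, years):
    """S = P * ((1 + r/k)**(k*t) - 1) / (r/k)."""
    i = annual_rate / periods_per_year  # rate per compounding period
    n = periods_per_year * years        # total number of deposits
    return deposit * ((1 + i) ** n - 1) / i

balance = annuity_balance(100, 0.06, 12, 20)
deposited = 100 * 12 * 20
interest = balance - deposited

print(round(balance, 2), deposited, round(interest, 2))
```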

Example 10: Annuity Payments


You want to have $200,000 in your account when you retire in 30 years. Your retirement
account earns 8% interest. How much do you need to deposit each month to meet your
retirement goal?

In this example,
We’re looking for P.
r = 0.08 8% annual interest rate
k = 12 since we’re depositing monthly
t = 30 we retire in 30 years
S = $200,000 the amount we want to have in 30 years

In this case, we’re going to have to set up the equation, and solve for P.
\[ S = \frac{P\left(\left(1 + \frac{r}{k}\right)^{kt} - 1\right)}{\frac{r}{k}} \]
Substitute in the given values, and we have:
\[ \$200{,}000 = \frac{P\left(\left(1 + \frac{0.08}{12}\right)^{12 \cdot 30} - 1\right)}{\frac{0.08}{12}} \]
Simplify the right side very carefully using the order of operations:

\( \$200{,}000 = \frac{P\left((1.006666\ldots)^{360} - 1\right)}{0.00666666\ldots} \)    Simplify the operations inside parentheses and in the exponent
\( \$200{,}000 = \frac{P(9.935729658)}{0.00666666\ldots} \)    Apply the exponent, then subtract 1
\( \$200{,}000 = P(1490.35945) \)    Divide the numerator by the denominator

Divide both sides by 1490.35945:

\( \$134.1958\ldots = P \)

So you would need to deposit about $134.20 each month to have $200,000 at the end of 30
years.
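The same algebra can be carried out in code by rearranging the annuity formula for the deposit P. The function name `annuity_payment` in this sketch is an assumption.

```python
def annuity_payment(goal, annual_rate, periods_per_year, years):
    """Regular deposit needed: P = S * (r/k) / ((1 + r/k)**(k*t) - 1)."""
    i = annual_rate / periods_per_year
    n = periods_per_year * years
    return goal * i / ((1 + i) ** n - 1)

payment = annuity_payment(200_000, 0.08, 12, 30)
print(round(payment, 2))
```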

We solved for P by applying algebra skills after simplifying the expressions created in the
equation. We can also apply those same algebra skills to solve for P in the general formula
\[ S = \frac{P\left(\left(1 + \frac{r}{k}\right)^{kt} - 1\right)}{\frac{r}{k}} \]
and we have
\[ P = \frac{S \cdot \frac{r}{k}}{\left(1 + \frac{r}{k}\right)^{kt} - 1} \]

Try it Now 3:
A more conservative investment account pays 3% interest. If you deposit $5 a day into this
account, how much will you have after 10 years? How much is from interest?

Payout Annuities
In the last section you learned about annuities. In an annuity, you start with nothing, put
money into an account on a regular basis, and end up with money in your account.

In this section, we will learn about a variation called a Payout Annuity. With a payout
annuity, you start with money in the account, and pull money out of the account on a regular
basis. Any remaining money in the account earns interest. After a fixed amount of time, the
account will end up empty.

Payout annuities are typically used after retirement. Perhaps you have saved $500,000 for
retirement, and want to take money out of the account each month to live on. You want the
money to last you 20 years. This is a payout annuity. The formula is derived in a similar
way as we did for savings annuities. The details are omitted here, but you may observe that
the exponent is negative, which makes sense as the account is decaying rather than growing:
money is leaving the account rather than accumulating in it.

Payout Annuity Formula

\[ P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}} \]
P0 is the balance in the account at the beginning (starting amount, or principal).
P is the regular withdrawal (the amount you take out each year, each month, etc.)
r is the annual interest rate (in decimal form. Example: 5% = 0.05)
k is the number of compounding periods in one year.
t is the number of years we plan to take withdrawals

Like with annuities, the compounding frequency is not always explicitly given, but is
determined by how often you take the withdrawals.

When do you use this formula?


Payout annuities assume that you take money from the account on a regular schedule
(every month, year, quarter, etc.) and let the rest sit there earning interest.
• Compound interest: One deposit
• Annuity: Many deposits.
• Payout Annuity: Many withdrawals

Example 11: Payout annuities


After retiring, you want to be able to take $1000 every month for a total of 20 years from
your retirement account. The account earns 6% interest. How much will you need in your
account when you retire?

P = $1000 the monthly withdrawal


r = 0.06 6% annual rate
k = 12 since we’re doing monthly withdrawals, we’ll compound monthly
t = 20 since we’re taking withdrawals for 20 years

We’re looking for P0; how much money needs to be in the account at the beginning.
Putting this into the equation:

\[ P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}} \]
we have:
\[ P_0 = \frac{\$1000\left(1 - \left(1 + \frac{0.06}{12}\right)^{-12 \cdot 20}\right)}{\frac{0.06}{12}} \]

\( P_0 = \frac{\$1000\left(1 - (1.005)^{-240}\right)}{0.005} \)    Simplify the operations inside parentheses and in the exponent
\( P_0 = \frac{\$1000(1 - 0.3020961416)}{0.005} \)    Apply the exponent, then subtract the result from 1
\( P_0 = \frac{\$1000(0.6979038584)}{0.005} \)    Multiply within the numerator, and divide the numerator by the denominator
\( P_0 = \$139{,}580.7717 \)

You will need $139,580.77 in your account when you retire.

Notice that you withdrew a total of $240,000 ($1000 a month for 240 months). The
difference between what you pulled out and what you started with is the interest earned. In
this case it is $240,000 - $139,580.77 = $100,419.23 in interest.
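The payout annuity formula can be sketched the same way. Python handles the negative exponent directly with `**`; the function name `payout_principal` is an assumption.

```python
def payout_principal(withdrawal, annual_rate, periods_per_year, years):
    """P0 = P * (1 - (1 + r/k)**(-k*t)) / (r/k)."""
    i = annual_rate / periods_per_year
    n = periods_per_year * years
    return withdrawal * (1 - (1 + i) ** (-n)) / i

needed = payout_principal(1000, 0.06, 12, 20)
withdrawn = 1000 * 12 * 20
print(round(needed, 2), round(withdrawn - needed, 2))  # starting balance, interest earned
```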

Evaluating negative exponents on your calculator


With these problems, you need to raise numbers to negative powers. Most calculators
have a separate button for negating a number that is different than the subtraction
button. Some calculators label this (-) , some with +/- . The button is often near the
= key or the decimal point.

If your calculator displays operations on it (typically a calculator with multiline
display), to calculate \( 1.005^{-240} \) you'd type something like: 1.005 ^ (-) 240

If your calculator only shows one value at a time, then usually you hit the (-) key after
a number to negate it, so you'd hit: 1.005 y^x 240 (-) =

Give it a try - you should get \( 1.005^{-240} \approx 0.302096 \)

Example 12: Withdrawal from Retirement Fund


You know you will have $500,000 in your account when you retire. You want to be able to
take monthly withdrawals from the account for a total of 30 years. Your retirement account
earns 8% interest. How much will you be able to withdraw each month?

r = 0.08 8% annual rate


k = 12 since we’re withdrawing monthly
t = 30 30 years
P0 = $500,000 we are beginning with $500,000

In this case, we’re looking for P, thus we’re going to have to set up the equation, and solve
for P.

\[ P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}} \]
Substituting in the given values leads to:
\( \$500{,}000 = \frac{P\left(1 - \left(1 + \frac{0.08}{12}\right)^{-12 \cdot 30}\right)}{\frac{0.08}{12}} \)    Simplify the operations inside parentheses and in the exponent, simplify the denominator
\( \$500{,}000 = \frac{P\left(1 - (1.0066666666\ldots)^{-360}\right)}{0.00666666\ldots} \)    Apply the exponent, then subtract the result from 1
\( \$500{,}000 = \frac{P(0.9085566276)}{0.00666666\ldots} \)    Divide the numerator by the denominator
\( \$500{,}000 = P(136.2834941) \)    Divide both sides by 136.2834941 to isolate P
$3,668.82 = P

You would be able to withdraw $3,668.82 each month for 30 years.
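Solving the payout formula for the withdrawal P can also be sketched as a function. The name `payout_withdrawal` is an assumption; the inputs are the values from this example.

```python
def payout_withdrawal(principal, annual_rate, periods_per_year, years):
    """P = P0 * (r/k) / (1 - (1 + r/k)**(-k*t))."""
    i = annual_rate / periods_per_year
    n = periods_per_year * years
    return principal * i / (1 - (1 + i) ** (-n))

monthly = payout_withdrawal(500_000, 0.08, 12, 30)
print(round(monthly, 2))
```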

Once again, we used algebra to solve for P after simplifying the expressions in the equation.
As we did with savings annuities, we can also solve for P in the general formula, so that
\[ P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}} \]
becomes
\[ P = \frac{P_0 \cdot \frac{r}{k}}{1 - \left(1 + \frac{r}{k}\right)^{-kt}} \]

Try it Now 4:
A donor gives $100,000 to a university, and specifies that it is to be used to give annual
scholarships for the next 20 years. If the university can earn 4% interest, how much can they
withdraw to give in scholarships each year?

Loans and amortization


In the last section, you learned about payout annuities.

In this section, you will learn about conventional loans (also called amortized loans or
installment loans). Examples include auto loans and home mortgages. These techniques do
not apply to payday loans, add-on loans, or other loan types where the interest is calculated
up front.

One great thing about loans is that they use exactly the same formula as a payout annuity. To
see why, imagine that you had $10,000 invested at a bank, and started taking out payments
while earning interest as part of a payout annuity, and after 5 years your balance was zero.

Flip that around, and imagine that you are acting as the bank, and a car lender is acting as
you. The car lender invests $10,000 in you. Since you’re acting as the bank, you pay
interest. The car lender takes payments until the balance is zero.

Loans Formula

\[
P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}}
\]

P0 is the balance in the account at the beginning (the principal, or amount of the loan).
P is your loan payment (your monthly payment, annual payment, etc.).
r is the annual interest rate in decimal form.
k is the number of compounding periods in one year.
t is the length of the loan, in years.

Like before, the compounding frequency is not always explicitly given, but is determined by
how often you make payments.

When do you use this formula?


The loan formula assumes that you make loan payments on a regular schedule (every
month, year, quarter, etc.) and are paying interest on the loan.
• Compound interest: One deposit
• Annuity: Many deposits
• Payout Annuity: Many withdrawals
• Loans: Many payments

Example 13: How Much Car?


You can afford $200 per month as a car payment. If you can get an auto loan at 3% interest
for 60 months (5 years), how expensive of a car can you afford? In other words, what
amount loan can you pay off with $200 per month?

In this example,
P = $200 the monthly loan payment
r = 0.03 3% annual rate
k = 12 since we’re doing monthly payments, we’ll compound monthly
t=5 since we’re making monthly payments for 5 years

We’re looking for P0, the starting amount of the loan.

\[
P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}}
\]

Substituting the given values into the equation, we have:

\[
P_0 = \frac{\$200\left(1 - \left(1 + \frac{0.03}{12}\right)^{-12(5)}\right)}{\frac{0.03}{12}}
\]

Simplify the operations inside parentheses and in the exponent, and simplify the denominator:

\[
P_0 = \frac{\$200\left(1 - (1 + 0.0025)^{-60}\right)}{0.0025}
\]

Apply the exponent, then subtract the result from 1:

\[
P_0 = \frac{\$200(1 - 0.8608691058)}{0.0025} = \frac{\$200(0.1391308942)}{0.0025}
\]

Multiply the numerator terms and divide by the denominator:

\[
P_0 = \$11{,}130.47154
\]

You can afford an $11,130 loan.

You will pay a total of $12,000 ($200 per month for 60 months) to the loan company. The
difference between the amount you pay and the amount of the loan is the interest paid. In
this case, you’re paying $12,000 - $11,130 = $870 interest total.
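
As a rough check on Example 13, the loan formula can be evaluated directly. The sketch below assumes the loans formula exactly as stated in this section; `loan_principal` is a hypothetical helper name.

```python
def loan_principal(payment, r, k, t):
    """Loan amount P0 that a regular payment can pay off.

    payment: P, r: annual rate (decimal), k: payments per year, t: years.
    """
    periodic_rate = r / k
    return payment * (1 - (1 + periodic_rate) ** (-k * t)) / periodic_rate


p0 = loan_principal(200, 0.03, 12, 5)
total_paid = 200 * 12 * 5          # $200 per month for 60 months
interest = total_paid - p0         # what is paid beyond the loan amount

print(f"Loan amount: ${p0:,.2f}")
print(f"Total paid:  ${total_paid:,.2f}")
print(f"Interest:    ${interest:,.2f}")
```

Working with the unrounded loan amount gives interest of roughly $869.53; rounding the loan to $11,130 first gives the $870 figure.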

Example 14: Mortgage Payments


You want to take out a $140,000 mortgage (home loan). The interest rate on the loan is 6%,
and the loan is for 30 years. How much will your monthly payments be?

In this example,
We’re looking for P.
r = 0.06 6% annual interest rate
k = 12 since we’re paying monthly
t = 30 30 years
P0 = $140,000 the starting loan amount

In this case, we’re going to have to set up the equation, and solve for P.

\[
P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}}
\]

Substituting in the given values results in:

\[
\$140{,}000 = \frac{P\left(1 - \left(1 + \frac{0.06}{12}\right)^{-12(30)}\right)}{\frac{0.06}{12}}
\]

Simplify carefully, following the order of operations:

\[
\$140{,}000 = \frac{P\left(1 - (1.005)^{-360}\right)}{0.005}
\]

\[
\$140{,}000 = \frac{P(0.833958072)}{0.005}
\]

\[
\$140{,}000 = P(166.7916144)
\]

Divide both sides by 166.7916144, and we have:

\[
\$839.3707352 = P
\]

You will make payments of $839.37 per month for 30 years.

You’re paying a total of $302,173.20 to the loan company: $839.37 per month for 360
months. You are paying a total of $302,173.20 - $140,000 = $162,173.20 in interest over the
life of the loan.
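
The mortgage payment and total interest in Example 14 can be reproduced with a short Python sketch. It assumes the loans formula solved for P; the name `loan_payment` is illustrative only.

```python
def loan_payment(p0, r, k, t):
    """Regular payment P needed to pay off a loan of p0.

    p0: loan amount, r: annual rate (decimal), k: payments per year, t: years.
    """
    periodic_rate = r / k
    return p0 * periodic_rate / (1 - (1 + periodic_rate) ** (-k * t))


payment = loan_payment(140_000, 0.06, 12, 30)
monthly = round(payment, 2)          # payments are made in whole cents
total_paid = monthly * 12 * 30       # 360 payments
total_interest = total_paid - 140_000

print(f"Monthly payment: ${monthly:,.2f}")
print(f"Total paid:      ${total_paid:,.2f}")
print(f"Total interest:  ${total_interest:,.2f}")
```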

Try it Now 5:
Janine bought $3,000 of new furniture on credit. Because her credit score isn’t very good,
the store is charging her a fairly high interest rate on the loan: 16%. If she agreed to pay off
the furniture over 2 years, how much will she have to pay each month?

Qualifying for a Mortgage Loan


Lenders need to know whether or not you are capable of paying back a loan. They take a risk
in lending you a large amount of money to purchase a house. Lenders look at your credit
history, your monthly income, and debt payments in comparison to your mortgage payment
to decide whether to take the risk. One guideline is “your monthly mortgage payment,
including principal, interest, real estate taxes and homeowners insurance, should not exceed
28 percent of your gross monthly income.”² The Federal Housing Administration (FHA) uses 31%
as its ratio.³

Suppose you make $36,000 a year, which means you make $36000/12 = $3000 per month
gross income (income before taxes or any other payments are removed). Using the 28%
guideline, you would be able to afford $3000(0.28) = $840 per month for your monthly
housing payment (including taxes and insurance). If the monthly insurance and taxes on the
house are estimated to be $100, that leaves $740 for the mortgage payment. How much
“house” can you afford, assuming your credit is good enough to qualify for a 30 year
conventional mortgage at 4.5% interest rate?

Once again, we are looking for P0, the starting loan amount.

2 Mortgage Basics, Ch. 1: Can you afford that house? Know debt-to-income ratios. (n.d.).
Retrieved June 9, 2014, from
http://www.bankrate.com/finance/mortgages/how-much-house-can-you-buy--1.aspx
3 http://portalapps.hud.gov/FHAFAQ/controllerServlet?method=showPopup&faqId=1-6KT-1040

We know P = $740, r = 0.045, t = 30, and since these are monthly payments, k = 12. Using
our amortization formula,

\[
P_0 = \frac{P\left(1 - \left(1 + \frac{r}{k}\right)^{-kt}\right)}{\frac{r}{k}}
\]

We have:

\[
P_0 = \frac{\$740\left(1 - \left(1 + \frac{0.045}{12}\right)^{-12(30)}\right)}{\frac{0.045}{12}}
\]
Simplifying carefully using the order of operations, we have: P0 = $146047.2577, so you
could finance $146,047 for your mortgage.

Is this the total you can pay for your house? Many lenders require a down payment of 5% to
20% of the purchase price, so the loan amount would represent 95% to 80% of the purchase
price of the house, not including closing costs. Assuming you have already saved the 5% down
payment, then we know 146047 = 0.95×(actual house price), so divide 146047 by 0.95, and you
end up with about $153,733 for the purchase price of the house.
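
The chain of steps in this scenario (gross income, 28% guideline, mortgage budget, loan amount, purchase price) can be sketched in Python. The variable names are illustrative, and the 28% ratio, $100 tax and insurance estimate, and 5% down payment are the assumptions stated in the scenario, not general rules.

```python
def loan_principal(payment, r, k, t):
    """Loan amount that a regular payment can pay off (loans formula)."""
    periodic_rate = r / k
    return payment * (1 - (1 + periodic_rate) ** (-k * t)) / periodic_rate


annual_income = 36_000
gross_monthly = annual_income / 12                 # $3,000 per month
housing_budget = gross_monthly * 28 / 100          # 28% guideline
mortgage_budget = housing_budget - 100             # minus taxes and insurance

max_loan = loan_principal(mortgage_budget, 0.045, 12, 30)
purchase_price = max_loan / 0.95                   # loan is 95% of the price

print(f"Housing budget:  ${housing_budget:,.2f}")
print(f"Mortgage budget: ${mortgage_budget:,.2f}")
print(f"Maximum loan:    ${max_loan:,.2f}")
print(f"Purchase price:  ${purchase_price:,.2f}")
```

Dividing the unrounded loan amount by 0.95 gives a price a few cents different from dividing the rounded figure, which is why the printed price may not match a hand calculation to the dollar.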

Lenders are also concerned about the debt you already have, and also calculate your debt-to-
income ratio. Add up your mortgage payment (including taxes, insurance) and any long-term
obligations such as car payments, student loan payments, credit card payments, and
installment loan payments that are recurring (lasting longer than 10 months). These debts
cannot be more than 36% of your income (43% for FHA).

Suppose you still have a monthly gross income of $3000, and an $840 payment for mortgage,
taxes and insurance. If you also have a $300 car payment and $250 student loan payment,
plus $150 monthly credit card payment, will you still qualify for the mortgage? Using the
36% guideline, your total debt payments can be at most $3000(0.36) = $1080. Your debts add
to $840 + $300 + $250 + $150 = $1540, which means you are over the debt limit, and would not
qualify for the mortgage. Pay off the debts before buying the house.
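
The two guideline checks described in this section can be written as one small function. This is a sketch only: `check_ratios` is a hypothetical name, and the default 28% and 36% limits are the conventional guidelines quoted above (an FHA loan would use 31% and 43%).

```python
def check_ratios(gross_monthly, housing_payment, other_debts,
                 housing_pct=28, total_debt_pct=36):
    """Return (housing_ok, total_debt_ok) for the two lender guidelines.

    Percentages are whole numbers so the comparison uses exact arithmetic.
    """
    housing_ok = housing_payment * 100 <= housing_pct * gross_monthly
    total_debt = housing_payment + sum(other_debts)
    total_debt_ok = total_debt * 100 <= total_debt_pct * gross_monthly
    return housing_ok, total_debt_ok


# The scenario above: $3000 income, $840 housing, car + student loan + credit card
housing_ok, total_debt_ok = check_ratios(3000, 840, [300, 250, 150])
print(housing_ok, total_debt_ok)  # True False
```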

Try it now 6:
If your annual income is $45,000, and you have a $175 car payment and $125 student loan
payment with no other revolving debt, would you qualify for a $140,000 mortgage with a 30
year loan at 5%? Assume insurance and taxes on the house to be $180 per month.

Remaining Loan Balance


With loans, it is often desirable to determine what the remaining loan balance will be after
some number of years. For example, if you purchase a home and plan to sell it in five years,
you might want to know how much of the loan balance you will have paid off and how much
you have to pay from the sale.

To determine the remaining loan balance after some number of years, we first need to know
the loan payments, if we don’t already know them. Remember that only a portion of your
loan payments go towards the loan balance; a portion is going to go towards interest. For
example, if your payments were $1,000 a month, after a year you will not have paid off
$12,000 of the loan balance.

To determine the remaining loan balance, we can think “how much loan will these loan
payments be able to pay off in the remaining time on the loan?”

Example 15: Remaining Loan Balance


If a mortgage at a 6% interest rate has payments of $1,000 a month, how much will the loan
balance be 10 years from the end of the loan?

To determine this, we are looking for the amount of the loan that can be paid off by $1,000 a
month payments in 10 years. In other words, we’re looking for P0 when
P = $1,000 the monthly loan payment
r = 0.06 6% annual rate
k = 12 since we’re doing monthly payments, we’ll compound monthly
t = 10 since we’re making monthly payments for 10 more years

\[
P_0 = \frac{1000\left(1 - \left(1 + \frac{0.06}{12}\right)^{-10(12)}\right)}{\frac{0.06}{12}}
\]

\[
P_0 = \frac{1000\left(1 - (1.005)^{-120}\right)}{0.005}
\]

\[
P_0 = \frac{1000(1 - 0.5496)}{0.005} \approx \$90{,}073.45
\]

The loan balance with 10 years remaining on the loan will be $90,073.45.
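
The same idea works for any point in a loan: the remaining balance is the loan amount that the remaining payments could pay off. A minimal Python sketch, assuming payments and rate stay fixed (`remaining_balance` is an illustrative name):

```python
def remaining_balance(payment, r, k, years_left):
    """Balance still owed when years_left years of payments remain."""
    periodic_rate = r / k
    payments_left = k * years_left
    return payment * (1 - (1 + periodic_rate) ** (-payments_left)) / periodic_rate


balance = remaining_balance(1000, 0.06, 12, 10)
print(f"Balance with 10 years left: ${balance:,.2f}")
```

Note that the hand calculation shows the intermediate value rounded to 0.5496; the code keeps full precision, which is where the final cents come from.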

Which equation to use?


When presented with a finance problem (on an exam or in real life), you're usually not told
what type of problem it is or which equation to use. Here are some hints on deciding which
equation to use based on the wording of the problem.

The easiest types of problem to identify are loans. Loan problems almost always include
words like: "loan", "amortize" (the fancy word for loans), "finance (a car)", or "mortgage" (a
home loan). Look for these words. If they're there, you're probably looking at a loan
problem. To make sure, see if you're given what your monthly (or annual) payment is, or if
you're trying to find a monthly payment.

If the problem is not a loan, the next question you want to ask is: "Am I putting money in an
account and letting it sit, or am I making regular (monthly/annually/quarterly) payments or
withdrawals?" If you're letting the money sit in the account with nothing but interest
changing the balance, then you're looking at a compound interest problem. The exception
would be bonds and other investments where the interest is not reinvested; in those cases
you’re looking at simple interest.

If you're making regular payments or withdrawals, the next question is: "Am I putting
money into the account, or am I pulling money out?" If you're putting money into the
account on a regular basis (monthly/annually/quarterly) then you're looking at a
basic Annuity problem. Basic annuities are when you are saving money. Usually in an
annuity problem, your account starts empty, and has money in the future.

If you're pulling money out of the account on a regular basis, then you're looking at a Payout
Annuity problem. Payout annuities are used for things like retirement income, where you
start with money in your account, pull money out on a regular basis, and your account ends
up empty in the future.

Remember, the most important part of answering any kind of question, money or otherwise,
is first to correctly identify what the question is really asking, and to determine what
approach will best allow you to solve the problem.
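
The questions in this section amount to a short decision procedure, which can be sketched as a Python function. The function name, parameter names, and returned labels are all illustrative; the function only encodes the hints above and cannot read a word problem for you.

```python
def choose_formula(is_loan, regular_payments=False, deposits=False,
                   interest_reinvested=True):
    """Suggest a formula type from yes/no answers about the problem."""
    if is_loan:                       # "loan", "amortize", "finance", "mortgage"
        return "loan"
    if not regular_payments:          # money just sits in the account
        return "compound interest" if interest_reinvested else "simple interest"
    # regular payments: into the account, or out of it?
    return "savings annuity" if deposits else "payout annuity"


# Example 12: monthly withdrawals from a retirement account
print(choose_formula(is_loan=False, regular_payments=True, deposits=False))
# Example 14: a 30 year mortgage
print(choose_formula(is_loan=True))
```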

Try it Now 7:
For each of the following scenarios, determine if it is a compound interest problem, a savings
annuity problem, a payout annuity problem, or a loans problem.

a. Marcy received an inheritance of $20,000, and invested it at 6% interest. She is
going to use it for college, withdrawing money for tuition and expenses each
quarter. How much can she take out each quarter if she has 3 years of school
left?
b. Paul wants to buy a new car. Rather than take out a loan, he decides to save
$200 a month in an account earning 3% interest compounded monthly. How
much will he have saved up after 3 years?
c. Keisha is managing investments for a non-profit company. They want to invest
some money in an account earning 5% interest compounded annually with the
goal to have $30,000 in the account in 6 years. How much should Keisha
deposit into the account?
d. Miao is going to finance new office equipment at a 2% rate over a 4 year term.
If she can afford monthly payments of $100, how much new equipment can she
buy?
e. How much would you need to save every month in an account earning 4%
interest to have $5,000 saved up in two years?

Reference: Flow Chart for Choosing Financial Formulas

Does it involve simple or compound interest?
  • Simple interest: use Simple Interest.
  • Compound interest: Is there a single deposit or periodic payments?
      • Single deposit: use Compound Interest.
      • Periodic payments: Do payments go into an account or pay out of/pay off an account?
          • Payments into an account: use Savings Annuity.
          • Payments out: use Payout Annuity or Amortization of Debt.

Number Theory 1

“The concept of number is the obvious distinction between the beast and man. Thanks to
number, the cry becomes a song, noise acquires rhythm, the spring is transformed into a
dance, force becomes dynamic, and outlines figures.”—Joseph Marie de Maistre

Number Theory
From the day we become aware of the world around us, we begin recognizing quantity and
number. Whether it be the number of toys in our room or cereal bites on our high chair tray,
we learn to count. Ancient peoples used pebbles, sticks, knots in string, tally marks in clay,
then formal symbols and numeration systems to record the quantities around them. As the
quantities we deal with become more complicated, we develop new numbers to record them.
Our modern number system is a product of millennia of thought and theory. In this chapter,
we examine the numbers we work with and what they mean.

Natural Numbers
A sheepherder looks out at their flocks, and notes how many sheep they have, but how to
record this quantity? A set of numbers is required, and some sort of symbol to represent
these numbers. Our society uses the Hindu-Arabic numerals you have seen since you were a
child, with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. But what do these numbers mean? How do they
behave? What operations can be performed with them, and what do the results look like?
Once the sheepherder had leisure time to think, they pondered the meaning of the numbers
they used.

Around 300 BC, the Greek mathematician Euclid summarized the known mathematics in his
work The Elements. Normally thought of as a work of geometry, The Elements also includes
sections on number and number theory. Euclid defined concepts about Natural Numbers, a
set with which you are very familiar, N = {1, 2, 3, 4, …}. Euclid called the first
number, 1, the “unit:”

A unit is that by virtue of which each of the things that exist is called one.¹
A number is a multitude composed of units.

We can think of the number 2 as composed of units, where the unit is 1, simply by recalling
that 1 + 1 = 2. Euclid also defined even and odd numbers using definitions that will seem
very familiar to you as well, where an even number can be divided by 2 and an odd number
cannot be divided by 2, and differs from an even number by… a unit! Surprised? Just in
these first few definitions in The Elements, you can see the effect of Greek mathematics on
your own mathematical education.

Every natural number greater than 1 is either prime or composite. Euclid defined prime
numbers as being “measured” only by 1, meaning the only factors of the number are 1 and
itself. He defined natural numbers that were not prime as composite. Another way to define
prime numbers is to state that prime numbers have only two unique factors, and thus
composite numbers have more than two unique factors. With these definitions, we can

1 Definition 1. (n.d.). Euclid's Elements, Book VII, Definitions 1 and 2. Retrieved June 16, 2014, from
http://aleph0.clarku.edu/~djoyce/java/elements/bookVII/defVII1.html
© Laurel Clifford Creative Commons BY-SA

answer the question: Is 1 a prime number? Ask yourself: how many factors does 1 have? It
has only one factor, so it is not a prime number, because prime numbers have two factors. It
is not composite either, as it doesn’t have more than two factors. Mathematicians view the
number 1 as a special number, giving it the same title Euclid did: 1 is the unit. Another way
to view this unit concept is to think that the 1 item you have represents whatever units you
are using to count (ounces, milligrams, feet, pickles, cats, whatever noun you are counting).

The number 1 is neither prime nor composite, but what about other numbers, like 57? Is 57
prime or composite? You may be pondering ideas such as: it is not an even number, so it is
not divisible by 2; could it be divisible by 3? How can you tell without digging out your
calculator? Perhaps you know the divisibility test for determining if numbers are divisible
by 3: Add the digits of the number: 5 + 7 = 12. If the result is divisible by 3, then so is the
original number: since 12 is divisible by 3, so is 57. If you grab your calculator, you can see
that 57 = 3 × 19. Arabic mathematicians of the Middle Ages proved divisibility tests, as did
Fibonacci. The table below summarizes several divisibility tests:

A number is divisible by:    if…
2     the ones’ digit is even (divisible by 2).
3     the sum of the digits is divisible by 3.
4     the last two digits form a number divisible by 4.
5     the ones’ digit is 0 or 5 (divisible by 5).
8     the last three digits form a number divisible by 8.
9     the sum of the digits is divisible by 9.
10    the ones’ digit is 0.

The divisibility test for 7 is not given here. The work involved in determining divisibility by
7 is complicated, and arguably we’re better off dividing the number by 7 to test it! What
about a divisibility test for 6? Consider that 6 is the product of 2 and 3, 6 = 2 × 3, so to be
divisible by 6, a number would need to pass the divisibility tests for both 2 and 3.

Look back at the table, and notice that some of the tests focus just on the last digit, while
some use the sum of the digits. Our numeration system is based on sets of 10, with place
values 1, 10, 100, 1000, and so forth. The number 10 is a product of 2 and 5 (10 = 2 × 5), so
any place value other than 1 is divisible by 2 and 5. Thus the divisibility tests for 2 and 5
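only look at the digit in the ones’ place. The number 10 is 9 + 1, so every place value (10,
100, 1000, and so forth) is 1 more than a multiple of 9. A digit in any place therefore
contributes a multiple of 9 plus the digit itself, so only the sum of the digits needs to be
tested for divisibility by 9 (and, for the same reason, by 3).
Understanding the reasoning behind the test will help you remember the test.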

Example 1: Divisibility Tests


Use the divisibility tests to determine if 1,158,962,874,003 is a composite number.

This number is too big to put in a basic 4-function calculator to divide! We can reject
divisibility by 2, as the last digit (3) is odd. Similarly, it is not divisible by 5 or 10. To test
divisibility by 3, add the digits: 1+1+5+8+9+6+2+8+7+4+0+0+3 = 54. Since 54 is divisible by 3,
1,158,962,874,003 is divisible by 3 as well, and thus is a composite number (it has more than
2 factors).
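
The digit-based tests from the table can also be written out in Python. This is a sketch for illustration: `digit_tests` is a hypothetical name, and the function deliberately works from the digits (rather than simply dividing) so that it mirrors the table, with the test for 6 built from the tests for 2 and 3.

```python
def digit_tests(n):
    """Apply the divisibility tests from the table to a positive integer n."""
    text = str(n)
    digits = [int(ch) for ch in text]
    ones = digits[-1]
    digit_sum = sum(digits)
    return {
        2: ones % 2 == 0,                          # ones' digit is even
        3: digit_sum % 3 == 0,                     # digit sum divisible by 3
        4: int(text[-2:]) % 4 == 0,                # last two digits
        5: ones in (0, 5),                         # ones' digit is 0 or 5
        6: ones % 2 == 0 and digit_sum % 3 == 0,   # passes tests for 2 and 3
        8: int(text[-3:]) % 8 == 0,                # last three digits
        9: digit_sum % 9 == 0,                     # digit sum divisible by 9
        10: ones == 0,                             # ones' digit is 0
    }


results = digit_tests(1_158_962_874_003)
print([d for d, passes in results.items() if passes])  # [3, 9]
```

The output agrees with the example: the digit sum 54 is divisible by 3 (and also by 9), while the final digit 3 rules out 2, 5, and 10.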

Try it now 1:
Use the divisibility tests to determine whether 2, 3, 4, 5, 6, 8, 9, or 10 divide the following:
a. 1,256,957,844,024
b. 3,984,670,912,570

The Greek mathematician Eratosthenes (275-194 BC) devised a 'sieve' to discover prime
numbers. A sieve² is like a strainer that you drain spaghetti through when it is done
cooking. The water drains out, leaving your spaghetti behind. Eratosthenes's sieve
drains out composite numbers and leaves prime numbers behind.

To use the sieve of Eratosthenes to find the prime numbers up to 100, make a chart of the
first one hundred natural numbers (1-100):

1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100

Cross out 1, because it is not prime; it is a unit. Circle 2, because it is the smallest prime
(and the only even prime). Now cross out every other multiple of 2. Circle 3, the next prime.
Then cross out all of the other multiples of 3; some multiples of 3, like 6, may have already
been crossed out because they are even. Circle the next open number, 5. Now cross out all of
the other multiples of 5. Circle the next open number, 7. Now cross out all of the other
multiples of 7. Circle any number that is left, and you have circled all the prime numbers
from 1 to 100.

Why didn’t we have to cross out multiples of primes higher than 7? You may have noticed that
the first multiple of 7 you had left to cross out was 49, and 49 = 7 × 7. Every multiple of 7
that was less than 49 was already crossed out because it had a smaller prime factor whose
multiples were already removed. For example, 35 was already removed when we removed multiples
of 5. You may also notice that 49 = \(7^2\), and 7 is the square root of 49. The next prime,
11, has \(11^2\) = 121, which is past the end of our chart, so every multiple of 11 up to 100
has already been crossed out. In general, the largest prime that we test when
looking for factors of a number will be less than or equal to the square root of the number.
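
The crossing-out procedure translates almost line for line into Python. The sketch below is one common way to write the sieve, with `sieve_of_eratosthenes` as an illustrative name; it stops at the square root of the limit, for the reason just described.

```python
def sieve_of_eratosthenes(limit):
    """Return a list of all primes less than or equal to limit."""
    if limit < 2:
        return []
    is_open = [True] * (limit + 1)      # True means "not crossed out yet"
    is_open[0] = is_open[1] = False     # 0 and 1 are not prime
    p = 2
    while p * p <= limit:               # only test up to the square root
        if is_open[p]:                  # p is "circled": it is prime
            # smaller multiples of p were removed by smaller primes,
            # so start crossing out at p * p
            for multiple in range(p * p, limit + 1, p):
                is_open[multiple] = False
        p += 1
    return [n for n, still_open in enumerate(is_open) if still_open]


primes = sieve_of_eratosthenes(100)
print(len(primes), primes[:10])  # 25 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```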

Try it now 2: Sieve of Eratosthenes


Go to the website
http://nlvm.usu.edu/en/nav/frames_asid_158_g_1_t_1.html?open=instructions&from=topic_t_1.html
and use the applet to find the primes less than 100 via the Sieve of Eratosthenes by
setting the rows to 10, and clicking on 2, then 3, then 5, then 7. Notice how the number of
multiples removed gets smaller as the factors get larger.

2 Sieve picture by Donovan Govan. CC-BY-SA-3.0, via Wikimedia Commons



After completing the exercise above, you should see a table listing the prime numbers less
than 100:

2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97

You may notice patterns and pairs of primes. Twin primes are pairs of prime numbers that
differ by 2, such as 11 and 13, and 41 and 43. It has been conjectured that there are an
infinite number of twin primes, but this has never been proven. Mersenne primes are prime
numbers with the form \(2^n - 1\), 1 less than a power of 2. How many Mersenne primes are
there in the table? The largest prime number known to date, discovered in January 2013, is a
Mersenne prime, \(2^{57{,}885{,}161} - 1\), which has 17,425,170 digits, and is also known as
a titanic prime.³
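
If you would like to check your search of the table, the two patterns can be tested in Python. This is a sketch with illustrative names; it treats twin primes as pairs of primes that differ by 2 and Mersenne primes as primes that are 1 less than a power of 2.

```python
def is_prime(n):
    """Trial division: test divisors up to the square root of n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


primes = [n for n in range(2, 100) if is_prime(n)]

# Pairs (p, p + 2) where both numbers are prime
twin_pairs = [(p, p + 2) for p in primes if p + 2 < 100 and is_prime(p + 2)]

# p + 1 is a power of 2 exactly when (p + 1) and p share no binary digits
mersenne = [p for p in primes if ((p + 1) & p) == 0]

print("Twin prime pairs:", twin_pairs)
print("Mersenne primes below 100:", mersenne)
```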

The Fundamental Theorem of Arithmetic states that every composite number can be
expressed as a unique product of prime numbers, which means that there is only one way to
factor the number as primes (reordering of factors does not count as a different way). For
example, the composite number 100 can be written as 5×5×2×2 or \((5^2)(2^2)\), but there is no other
list of prime factors for 100; they all will include two 5s and two 2s.

You may recall from previous math courses using factor trees to determine prime factors of
a number. Consider the number 240. We can recognize that it is divisible by 10, and factor
it as 24×10, but we haven’t completed the prime factorization until we have factored the 24
and 10 into their respective prime factors:

Factor trees created using the applet at:
http://nlvm.usu.edu/en/nav/frames_asid_202_g_2_t_1.html?from=topic_t_1.html

3
Weisstein, Eric W. "Titanic Prime." From MathWorld--A Wolfram Web Resource.
http://mathworld.wolfram.com/TitanicPrime.html

You can also recognize that 240 is even, thus divisible by 2, and divide by 2 until you reach a
factor that is no longer divisible by 2, then divide by 3, and so forth, similar to the process
used in the Sieve of Eratosthenes. This process is also known as casting out 2s:

Both factoring methods show us that the prime factorization is the same: 240 = \((2^4)(3)(5)\).
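
The "divide by 2 until you can't, then by 3, and so forth" process is easy to express in Python. The sketch below uses trial division; `prime_factorization` is an illustrative name.

```python
def prime_factorization(n):
    """Return the prime factors of n (n >= 2) in increasing order."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:     # cast out this divisor as often as possible
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:                       # whatever is left over is itself prime
        factors.append(n)
    return factors


print(prime_factorization(240))  # [2, 2, 2, 2, 3, 5]
print(prime_factorization(100))  # [2, 2, 5, 5]
```

Composite divisors such as 4 or 6 are tried too, but they never divide evenly because their prime factors have already been cast out.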

Try it now 3:
Find the prime factorization of the following numbers:
a. 38
b. 21
c. 360

Prime factorization helps us determine the greatest common divisor (GCD), sometimes
known as the greatest common factor (GCF) of two or more numbers. The greatest
common divisor (GCD) is the largest natural number that divides (“goes into” evenly, with
no remainder) the given numbers. Since the GCD looks for common divisors, it is useful in
problem solving when breaking larger amounts into smaller subsets: the GCD will be the size
of the largest common subset.

If we consider the numbers 240 and 360, we can find their prime factorizations and express
their factorizations using a Venn diagram:
240 = \((2^4)(3)(5)\)
360 = \((2^3)(3^2)(5)\)

[Venn diagram: the 240 circle alone holds one 2; the overlap holds 2, 2, 2, 3, and 5; the 360
circle alone holds one 3.]

Notice that 240 = (2)(2)(2)(2)(3)(5)
and 360 = (2)(2)(2)(3)(3)(5)
so they have (2)(2)(2), a 3, and a 5 in common.
The intersection of the prime factorizations is
the GCD: (2)(2)(2)(3)(5) = 120 and is noted as
GCD(240, 360) = 120.

The same Venn diagram can be used to find the Least Common Multiple (LCM) of the two
numbers. The Least Common Multiple (LCM) is the smallest natural number that is a
multiple of each of the given numbers (a multiple is the result of multiplying a number by a
natural number). The LCM is useful in problem solving for predicting when repetitions of both
values will coincide. The LCM can be located from the Venn diagram by listing all the factors
shown in the sets, the union of the prime factorizations. The LCM of 240 and 360 would be
(2)(2)(2)(2)(3)(3)(5) = \((2^4)(3^2)(5)\) = 720, and is noted as LCM(240, 360) = 720. The LCM
includes every prime factor from either factorization, using the highest exponent for each
repeated common factor.

Compare the product of the LCM and GCD of 240 and 360 with the product of 240 and 360:
LCM(240, 360) × GCD(240, 360) = (2)(2)(2)(2)(3)(3)(5) × (2)(2)(2)(3)(5) = \((2^7)(3^3)(5^2)\)
while 240 × 360 = (2)(2)(2)(2)(3)(5) × (2)(2)(2)(3)(3)(5) = \((2^7)(3^3)(5^2)\).
They both equal the same result, \((2^7)(3^3)(5^2)\) = 86400. The product of the two values is the
same as the product of their LCM and GCD. This relationship is helpful for checking
accuracy of results as well as finding either the LCM or GCD if you know one of them. For
example, if you know the GCD(240, 360) = 120, then take the product of 240 and 360, which
is 86400, and divide it by 120: 86400 ÷ 120 = 720, which is the LCM(240, 360).
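
The Venn diagram method can be mimicked in Python with `collections.Counter`, which stores how many times each prime appears. For counters, `&` keeps the smaller count of each prime (the intersection) and `|` keeps the larger count (the union). The function names here are illustrative, not from the text.

```python
from collections import Counter
from math import prod


def prime_factor_counts(n):
    """Map each prime factor of n to its exponent, e.g. 240 -> {2: 4, 3: 1, 5: 1}."""
    counts = Counter()
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            counts[divisor] += 1
            n //= divisor
        divisor += 1
    if n > 1:
        counts[n] += 1
    return counts


def gcd_and_lcm(a, b):
    """Find GCD and LCM from the intersection and union of prime factors."""
    fa, fb = prime_factor_counts(a), prime_factor_counts(b)
    shared = fa & fb        # intersection: lowest exponent of each common prime
    combined = fa | fb      # union: highest exponent of every prime
    gcd = prod(p ** e for p, e in shared.items())     # empty product is 1
    lcm = prod(p ** e for p, e in combined.items())
    return gcd, lcm


gcd, lcm = gcd_and_lcm(240, 360)
print(gcd, lcm)                    # 120 720
print(gcd * lcm == 240 * 360)      # True
```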

Try it now 4:
a. Find the GCD(144, 15) and the LCM(144, 15)
b. Verify the results by finding the product of 144×15 and the product of the
LCM×GCD.

Consider the values 38 and 21, and their prime factorizations: 38 = (2)(19) and 21 = (3)(7).
Organizing their prime factors into a Venn diagram gives us:

In the intersection of the two sets, where we would normally locate the GCD, there are no
values. This empty intersection tells us the GCD(38, 21) = 1, as every number has a factor
of 1. When the GCD of two values is 1, we say the values are relatively prime. Notice that
the LCM, the union of the sets, is the product of the factors, LCM(38, 21) =
(2)(19)(3)(7) = 798, which is the same as the product of 38 and 21: 38×21 = 798.

[Venn diagram: the 38 circle holds 2 and 19; the 21 circle holds 3 and 7; the overlap is empty.]

When solving problems involving the LCM or GCD, we determine if we are looking for
subsets (smaller sets) of the values, which would suggest a divisor or the GCD, or larger
multiples (larger sets) of the values, which would suggest a multiple or the LCM.

Example 2: Hot Dogs vs. Buns


Suppose that you like a particular specialty kind of hot dog that comes in packages of 10, and
you buy buns in packages of 8. How many whole packages of each should you purchase so
that each hot dog has a bun?

Notice that we are buying whole packages, and more than one package, so we will end up
with multiples of hot dogs and buns (we are not breaking packages up; grocery stores take
issue with that sort of thing!). Since we don’t necessarily need to feed an army, we are
looking for the least common multiple:
LCM(10, 8) = (5)(2)(2)(2) = 40
We need 40 hot dogs and 40 buns, so 4 packages of hot dogs and 5 packages of buns.

[Venn diagram: the 10 circle alone holds 5; the overlap holds one 2; the 8 circle alone holds
2 and 2.]

The Venn diagram may seem overkill here, but did you catch how the number of packages relates
to the non-common factor(s)?

Example 3: Garden Plots


A large field measures 70 feet by 525 feet. If you divide it up into equal square garden plots,
what size is the largest possible plot if the side lengths are natural numbers?

The clue word in this problem is “divide.” We’re dividing the larger dimensions into smaller
sizes, so we are looking for the GCD.

GCD(70, 525) = (7)(5) = 35

[Venn diagram: the 70 circle alone holds 2; the overlap holds 5 and 7; the 525 circle alone
holds 3 and 5.]

So the plots should be 35 feet by 35 feet in size. From the Venn diagram, we can see the
non-common factors tell us how many plots will fit in the field: there will be 2 plots by
15 plots, or 30 total plots in the field.
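
Both examples can be checked with Python's built-in `math.gcd`. The sketch below is only a check on the arithmetic; the small `lcm` helper is written out here using the LCM × GCD relationship from earlier in the chapter.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple, using LCM x GCD = a x b."""
    return a * b // gcd(a, b)


# Example 2: hot dogs in packages of 10, buns in packages of 8
servings = lcm(10, 8)
print(servings, servings // 10, servings // 8)   # 40 4 5

# Example 3: a 70 ft by 525 ft field cut into the largest equal squares
side = gcd(70, 525)
plots_wide, plots_long = 70 // side, 525 // side
print(side, plots_wide, plots_long, plots_wide * plots_long)  # 35 2 15 30
```

The quotients (4 and 5 packages, 2 by 15 plots) are the products of the non-common factors in each Venn diagram.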

Try it now 5:
a. Kris and Mickey are running laps around the same track. Kris can run one lap in 8
minutes but Mickey takes 12 minutes. If they both start at the same place, at the same
time, and run in the same direction, at what time will they first pass each other?

b. What is the largest size of equal square tiles that could be used to make a
checkerboard pattern on a floor measuring 128 inches by 96 inches?

Beyond Natural Numbers


When counting sheep, natural numbers work quite well for the sheepherder as there is no
meaning to part of a sheep; sheep are whole numbers. If you’re keeping track of your sheep
by making a tally mark on a clay tablet, and you have no sheep, then you make no tally mark.
A picture counting system needs no symbol for 0. However, when working with larger and
larger numbers, making tally marks becomes cumbersome, so you end up creating a positional
system, as the Babylonians did, and 0 becomes important, not just to mean 0 sheep, but as a
placeholder in place value. The addition of the symbol of 0 to the natural numbers creates
the set of Whole Numbers, W = {0, 1, 2, 3, 4, …}. But will whole numbers be sufficient for
all the counting and mathematical operations we need to do?

An important concept for operations with number sets is the idea of closure. A set of
numbers is closed under an operation if you take two numbers from the set, perform the
operation, and the result is also part of the set. Consider the set of Whole Numbers and the
operation of addition. If you add two whole numbers, will you always get a whole number?
That is, is whole number + whole number = whole number? For example, 3 + 5 = 8, a
whole number. Hopefully you intuitively say the whole numbers are closed under the
operation of addition, although we have not proven it.

But are the whole numbers closed under the operation of subtraction? Is whole number –
whole number = whole number? For example, 3 – 5 = -2, but wait! The result here is not a
whole number, but is instead a negative number. We have a counterexample. There is no
whole number that results from 3 – 5. If we consider a number line, showing whole
numbers, view 3 – 5 as starting at 3 and moving left 5, there is nowhere to move to as the
number line ends at 0:

Lack of closure for whole numbers suggests there’s another number system out there that
includes both the whole numbers and their opposites: Integers:

Extending the number line to the left past 0, and using a (-) sign to show direction leftward,
we can use integers to illustrate 3 – 5 = -2, 2 units to the left of 0. Integers are closed under
the operation of subtraction.

Negative numbers allow us to show distance in a direction opposite from what we call
positive numbers, as we show on the left side of the number line. We use negative numbers
to model debt: money is going in the opposite direction from us! The Chinese (in 200 BC)
and Indians (in 620 AD) used negative numbers to model debt, although modern western
society avoided their formal use until the 19th century⁴. We use negative numbers to indicate
direction in temperature (below 0) as well as in altitude (below sea level).

Are the integers closed under the operation of multiplication? Does an integer times an
integer always produce another integer? We can try an example: (-8) × 3 = -24, which is an
integer. One example is not proof, but intuitively, we can argue that as multiplication can be
thought of as repeated addition (add 3 sets of -8, or -8 + -8 + -8 = -24), then as the integers
are closed under addition, they should be closed under multiplication.

4 Rogers, L. (n.d.). The History of Negative Numbers. nrich.maths.org. Retrieved June 16, 2014, from
http://nrich.maths.org/5961

But what about division? Does an integer divided by an integer always produce an integer?
Consider our previous example and change the operation: (-8) ÷ 3. There is no integer result
here, as there will be a remainder. The integers are not closed under division.

So how do we split a debt of $8 among 3 people? We can have each person pay $2, but that
leaves debt remaining, while each person paying $3 pays too much. We need a value
between $2 and $3. There are no integers in between consecutive integers, so we
need a new number set. We need the answer to -8/3, a ratio between two integers. Ratios of
two integers (with a nonzero denominator) make up the Rational Numbers.

Each Whole Number and each Integer is a Rational Number as well, as it can be
expressed as the ratio between two integers. The following are rational numbers:
-2 as it can be written as -2/1
0 as it can be written as 0/5
1.4 as it can be written as 14/10
1/2 as it can be written as 2/4

You may notice that 1/2 is already a ratio between two integers, but it could also be
expressed as 2/4, as well as 3/6, 100/200, -5/-10, 11/22, and so forth. Rational numbers do
not have unique representations as a particular rational number, such as 1/2, has numerous
equivalent rational forms.

Another property that rational numbers have: they are dense. A number set is considered to
be dense if between any two numbers you can find another number that is also a member of
that set. Based on this concept, are integers dense? Consider the number line: there are no
integers in between any two integers. For example, there is no integer between 4 and 5.

Consider the rational numbers: is there a rational number between 4/7 and 5/7? If we
consider equivalent forms of 4/7 and 5/7:

4/7 = 8/14 and 5/7 = 10/14, and in between them is 9/14

These forms came from multiplying the numerator and denominator by a common factor, 2.

A student suggested that midway between 4/7 and 5/7 should be 4.5/7, using a decimal form.
However, this number, 4.5/7 is not a ratio of two integers, to which the student said,
“Multiply the numerator and denominator by 10 to convert the decimal to an integer:”

(4.5/7) × (10/10) = 45/70, a rational number, which can be reduced to 9/14

We could apply this student’s technique with 4.3/7 or similar to locate another value, 43/70,
in between 4/7 and 5/7.

Try it now 6:
Locate a rational number between 7/15 and 8/15.

Rational numbers can be expressed in decimal form. Recall that our place value system uses
base 10 place values. A rational number can be viewed as a quotient: the numerator (top)
divided by the denominator (bottom). Use a calculator to find the decimal forms of the
following fractions and see if you can find a pattern and connection to the place value
system:
1/2   4/5   1/3   1/11   3/8   2/15   5/6
Every rational number in decimal form will be either a terminating (finite) decimal or a
non-terminating, repeating decimal. The rational numbers that terminate are those whose
denominators (the divisor in the ratio), once the fraction is written in lowest terms, have
only 2 and 5 as their prime factors. Our place value system is based on powers of 10, and
the prime factors of 10 are 2 and 5, so it makes sense that to terminate, the denominator
needs to be a factor of a power of 10 (10, 100, 1000, and so on).
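
The prime-factor test for terminating decimals can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function name `terminates` is made up for this example, and the standard library's `Fraction` type is assumed as a convenient way to reduce each fraction to lowest terms first.

```python
from fractions import Fraction

def terminates(numerator: int, denominator: int) -> bool:
    """Return True if numerator/denominator has a terminating decimal form."""
    # Fraction reduces to lowest terms, so 3/6 is treated as 1/2.
    reduced = Fraction(numerator, denominator).denominator
    # Strip out every factor of 2 and 5; anything left over forces repetition.
    for prime in (2, 5):
        while reduced % prime == 0:
            reduced //= prime
    return reduced == 1

# The fractions listed above, with a decimal approximation of each.
for n, d in [(1, 2), (4, 5), (1, 3), (1, 11), (3, 8), (2, 15), (5, 6)]:
    kind = "terminates" if terminates(n, d) else "repeats"
    print(f"{n}/{d} is about {n / d:.6f} and {kind}")
```

Comparing the printed labels with your calculator results is one way to check the pattern for yourself.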

A simple division process takes a rational number from fraction to decimal form. How do
we go backwards from decimal to fraction form?

If the decimal terminates, it is straight-forward: use the place value of the terminating
digit. For example, 0.875 terminates in the thousandths place, so it is 875/1000, which
reduces:
0.875 = 875/1000 = (875 ÷ 125)/(1000 ÷ 125) = 7/8

If the decimal does not terminate, a little bit of algebra can help. Consider the repeating
decimal 0.88888888…. which can be written as 0. 8̅ (the bar over the value indicates that
value repeats). We recognize that it is a rational number because it repeats. We know that
number exists, so we call it “n,”

n = 0.8888888….

If we multiply this number by 10 (and multiply both sides of the equation by 10), we have:

10n = 8.888888….

Notice how this moves the decimal point one place, and there are still an infinite number of
repeating digits following. Writing the two equations together, we have:

10n = 8.8888888…..
n = 0.8888888….. subtract the two (left side – left side, right side – right side)
9n =8 as all the repeating digits will subtract out infinitely.

We now have an equation we can solve for our unknown number, n:


9n = 8 divide both sides by 9,
n = 8/9 which is a ratio of two integers.

We can check it by using our calculator and dividing 8 by 9. Your calculator may round the
last decimal place it gives you, but it should still be 0.88888…. repeating infinitely.

Example 4: Converting Repeating Decimals


Convert 0.8787878787… to rational number form.

We recognize that it is a rational number as it is a repeating, nonterminating decimal.


We call this number “n”: n = 0.8787878787…
We also notice that two digits are repeating, so multiply this equation by 100:
100n = 87.87878787…, which moves the decimal place two places.

100n = 87.87878787…
n = 0.8787878787… Subtract and all the repeating decimal values will cancel out
99n = 87 Solve for n (divide by 99).
n = 87/99 Check with your calculator: do 87÷99.

Note that 87/99 = 29/33 since both the numerator and denominator are divisible by 3.

Example 5: Converting Repeating Decimals


Convert 0.62525252525… to rational number form.

We recognize that it is a rational number as it is a repeating, nonterminating decimal.


We call this number “n”: n = 0.62525252525…
We also notice that two digits are repeating (be careful, as the 6 is not part of the repeating
portion), so multiply this equation by 100:
100n = 62.525252525… which moves the decimal place two places.

100n = 62.525252525…
n = 0.62525252525… Subtract and the repeating decimal values will cancel out
99n = 61.9 Solve for n (divide by 99).
n = 61.9/99 WAIT! That’s not done, as there’s a decimal on top.

Note that (61.9/99) × (10/10) = 619/990 and we now have a rational number (check it with your
calculator).

Converting Repeating Decimals to Rational Form:


1. Use “n” to represent the unknown rational form of the number.
2. Create a second equation by multiplying by a power of 10 based on the number of
repeating digits.
3. Subtract the two equations to cancel out the repeating digits (make sure the digits
align in order to do so)
4. Solve for n, reducing as necessary.
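
The steps above can be sketched in Python. This is a hedged illustration rather than the book's exact procedure: the function name `repeating_to_fraction` and its two string arguments are assumptions made for this example, and it multiplies by two powers of 10 (one to move past the non-repeating digits, one to move past a full repeating block) so that the subtraction leaves a whole number instead of a decimal such as 61.9.

```python
from fractions import Fraction

def repeating_to_fraction(non_repeating: str, repeating: str) -> Fraction:
    """Convert 0.<non_repeating><repeating block repeated forever> to a fraction."""
    k = len(non_repeating)   # digits before the repeating block starts
    r = len(repeating)       # length of the repeating block
    # n * 10^(k + r) and n * 10^k have the same digits after the decimal point,
    # so subtracting one from the other leaves a whole number.
    big = int(non_repeating + repeating)
    small = int(non_repeating) if non_repeating else 0
    # Fraction reduces the result to lowest terms automatically.
    return Fraction(big - small, 10 ** (k + r) - 10 ** k)

print(repeating_to_fraction("", "8"))     # 0.8888...
print(repeating_to_fraction("", "87"))    # 0.878787...
print(repeating_to_fraction("6", "25"))   # 0.6252525...
```

The three calls correspond to the worked examples above, so the printed fractions can be compared with 8/9, 29/33, and 619/990.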

Try it now 7:
Convert each decimal to rational form:
a. 0.742 b. 0.7777777…. c. 0.7474747474… d. 0.742742742742...

Beyond Rational Numbers


Ancient Greek mathematicians were very fond of rational numbers. When they discovered
that there were other numbers which were not rational, they swore that "terrible" discovery to
secrecy. One story (most likely just a story, but dramatically exciting anyway) suggests they
murdered the man who let the secret out! Rather irrational of them.

We created rational numbers to attempt to find closure under the operation of division. Are
rational numbers closed under division? Is a rational number divided by a rational number
always another rational number? Almost… there is one number that creates havoc for
division: division by 0. Rational numbers will never truly be closed under division because
of division by 0.

Although rational numbers allow ratios to be expressed easily, they can't express every number.
The most obvious examples can be found in geometry.

Consider a square whose sides are all one unit long:

[Figure: a square with sides of length 1 and a diagonal of unknown length, marked "?"]

Then the distance along the diagonal can be determined by the
Pythagorean Theorem, a² + b² = c²:
1² + 1² = c²
1 + 1 = c²
2 = c²
We can use the square root operation to undo the squaring and solve for c:
√2 = c
So what does “c” equal? Is “c” a rational number? What is this number √2?
We know that √2 is bigger than 1, as 1² = 1. We also know it is smaller than 2, as 2² = 4. So
we need a rational number between 1 and 2.
If √2 is a rational number, then we can write it as a ratio between two integers, x and y:
√2 = x/y
But it still has the square root in it, so let’s square both sides to get rid of the square root:
(√2)² = (x/y)²
2 = x²/y²
Which is still kind of yucky, so let’s cross-multiply to get rid of the fraction:
2y² = x²

At this point, it still looks strange, so let’s remind ourselves about that idea earlier that every
number has a unique prime factorization. So whatever x is, it has a unique prime
factorization. Squaring x doubles the number of prime factors. We don’t know what they
are, but we know there are an even number of them.

2y² = x²
Left side (2y²): odd number of prime factors, because of the extra 2
Right side (x²): even number of prime factors

The same can be said for y and y²: y² has an even number of prime factors. But if you look at the
left side of the equation, there is a 2 as well, an extra prime factor, so the left side has an odd
number of prime factors, while the right side has an even number of prime factors. Since an
even number can’t equal an odd number, and a number has only one prime factorization, this
situation is impossible. There is no rational number for √2 and so √2 has to be irrational.
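
The counting at the heart of this argument can be checked by experiment. The Python sketch below is an illustration only; the helper name `count_prime_factors` is made up for this example. It counts prime factors with repetition (so 12 = 2 × 2 × 3 counts as three) and compares x² with 2y² for some small values. A check over a few values is evidence for the pattern, not a proof; the proof is the reasoning above.

```python
def count_prime_factors(n: int) -> int:
    """Count the prime factors of n with repetition, e.g. 12 = 2*2*3 gives 3."""
    count = 0
    p = 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    if n > 1:          # whatever remains is itself a prime factor
        count += 1
    return count

for value in range(2, 8):
    squared = count_prime_factors(value * value)          # plays the role of x^2
    doubled = count_prime_factors(2 * value * value)      # plays the role of 2y^2
    print(f"{value}: {value}^2 has {squared} prime factors, "
          f"2*{value}^2 has {doubled}")
```

In every row the first count is even and the second is odd, which is why the two sides of 2y² = x² can never match.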

Challenge! 8
Apply the same argument to show that √5 is irrational and √4 is rational.

Another famous irrational number (it is also a transcendental number) is the ratio of the
circumference of a circle to its diameter, 3.1415926…, or π. It is a ratio, but cannot be
expressed as the ratio of two integers.

If we consider π or √2 = 1.41421356…, we notice they are non-terminating, non-repeating
decimals, so they are not rational numbers.

The only way to express these numbers is to expand our number set beyond the rational
number set to include these new numbers, known as Irrational Numbers. The
Real Number set includes both rational and irrational numbers. We can recognize
irrational numbers because they will be decimals that DO NOT repeat or terminate.

As we’ve seen, roots are one common place that irrational numbers show up. Consider the
value of the following roots:
√0   √1   √36   √38   √80

We can estimate the values of the last two roots, but cannot express them exactly using
decimals. We can find exact values for the roots of perfect squares.

1²  2²  3²  4²  5²  6²  7²  8²  9²  10²  11²  12²  13²  14²  15²
1   4   9   16  25  36  49  64  81  100  121  144  169  196  225

We know that √80 is close to 9, since 80 is close to 81. We know that √38 is close to 6, as
38 is close to 36.
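
As a rough sketch of this estimating idea, the Python snippet below finds the two whole numbers whose squares surround a given value. The function name `bracket_root` is made up for this example; `math.isqrt` is a standard library function that returns the whole-number part of a square root.

```python
import math

def bracket_root(n: int) -> tuple[int, int]:
    """Return whole numbers (low, high) with low <= sqrt(n) <= high."""
    low = math.isqrt(n)                       # largest whole number with low*low <= n
    high = low if low * low == n else low + 1
    return low, high

for n in (36, 38, 80):
    low, high = bracket_root(n)
    print(f"sqrt({n}) lies between {low} and {high}; calculator value {math.sqrt(n):.4f}")
```

When the two bounds are equal, the number is a perfect square and its root is exact.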

Sometimes we can use perfect squares to simplify roots, if we recognize that the values have
perfect square factors.
For example, 80 = 16 × 5 and since 16 is the perfect square of 4, we can simplify it:
√80 = √(16 × 5) = √16 × √5 = 4√5. Similarly, 120 = 4 × 30 and since 4 is the perfect square of
2, we can simplify it: √120 = √(4 × 30) = 2√30. Is there more we could do? Does 30 have any
perfect square factors? No (we’re done).
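
The same simplification can be sketched as a short Python function. This is an illustration under stated assumptions: the name `simplify_root` and its return format (a pair of whole numbers) are choices made for this example, and it simply pulls out perfect square factors one at a time until none remain.

```python
def simplify_root(n: int) -> tuple[int, int]:
    """Return (outside, inside) so that sqrt(n) = outside * sqrt(inside)."""
    if n < 1:
        raise ValueError("this sketch only handles positive whole numbers")
    outside, inside = 1, n
    factor = 2
    while factor * factor <= inside:
        # Each time factor^2 divides the value under the root, move one
        # copy of factor outside the root.
        while inside % (factor * factor) == 0:
            inside //= factor * factor
            outside *= factor
        factor += 1
    return outside, inside

for n in (80, 120, 36):
    outside, inside = simplify_root(n)
    print(f"sqrt({n}) = {outside} * sqrt({inside})")
```

An inside value of 1 means the original number was a perfect square.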

Real Numbers and Number Properties


At this point, we’ve looked at several number sets. Each number set enlarges the previous.
If we drew a Venn diagram⁵ of the number sets, it would look like: (there’s a small error on
this diagram: what is labeled Natural includes 0, so it’s actually the Whole number set)

We’ve also looked at CLOSURE, whether the result of an operation is still a member of the
set. Let’s revisit the closure of our number sets. Think about how the issue of closure drives
the creation of new number sets.

Number Set         Addition   Subtraction   Multiplication   Division
Natural Numbers    Closed     Not Closed    Closed           Not Closed
Whole Numbers      Closed     Not Closed    Closed           Not Closed
Integers           Closed     Closed        Closed           Not Closed
Rational Numbers   Closed     Closed        Closed           Closed except for division by 0
Real Numbers       Closed     Closed        Closed           Closed except for division by 0

5 Image copyright by Keith Enevoldson, http://thinkzone.wlonk.com/Numbers/NumberSets.htm



In addition to the closure properties, all real numbers illustrate the commutative properties of
addition and multiplication. Reversing the order of addition or multiplication gives the same result:

Commutative Property: a+b=b+a ab = ba

All real numbers also have the associative property of addition and multiplication. Regrouping
the terms added or multiplied gives the same result:

Associative Property: a + (b + c) = (a + b) + c a(bc) = (ab)c

The property of real numbers that ties multiplication and addition together is the distributive
property:

Distributive Property: a(b + c) = ab + ac
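
One way to get a feel for these three properties is to test them on sample values. The Python sketch below is only an illustration: the function name `properties_hold` and the sample values are choices made for this example, and exact fractions are used in place of general real numbers so that rounding does not interfere. Passing on sample values supports the properties but does not prove them.

```python
from fractions import Fraction
from itertools import product

def properties_hold(a: Fraction, b: Fraction, c: Fraction) -> bool:
    """Check the commutative, associative, and distributive properties for a, b, c."""
    commutative = (a + b == b + a) and (a * b == b * a)
    associative = (a + (b + c) == (a + b) + c) and (a * (b * c) == (a * b) * c)
    distributive = a * (b + c) == a * b + a * c
    return commutative and associative and distributive

# A few sample values, including a negative number, zero, and fractions.
samples = [Fraction(-8), Fraction(3), Fraction(0), Fraction(1, 2), Fraction(14, 10)]

all_ok = all(properties_hold(a, b, c) for a, b, c in product(samples, repeat=3))
print("Every sampled combination satisfied all three properties:", all_ok)
```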

Try it now 9:
We have commutative, associative and distributive properties for addition and multiplication.
Do they extend to other operations? Choose values for a, b, c, etc. and test the following
properties to see if they are true:

a. Distributive Property for Subtraction: a(b – c) = ab – ac

b. Distributive Property for Roots: √(a + b) = √a + √b

c. Distributive Property for Roots (II): √(a² + b²) = √(a²) + √(b²)

d. Commutative Property for Subtraction: a – b = b – a

e. Associative Property for Subtraction: a – (b – c) = (a – b) – c


Geometry 1

“Where there is matter, there is geometry.”—Johannes Kepler (1571-1630)


“Geometry is the Art of measuring well.”—Peter Ramus (1515-1572)

Geometry
The word GEOMETRY means Earth (GEO) Measure (METRY), a means of measuring our
world. Geometry is one tool we use to view our world, and much of the daily problem
solving we do has some geometric aspect. Geometry has application in many fields,
including practical fields such as carpentry and construction, as well as artistic endeavors
such as sculpture and painting. The Greek mathematician Euclid is famous for the Elements,
beginning with a few basic assumptions (postulates) and developing from these assumptions
the principles and theory of what is now known as Euclidean geometry.

Euclid used deductive logic in the Elements, but geometry is not limited to deductive
reasoning. We can inductively look for patterns in relationships among measurements and
characteristics of geometric figures, and prove these relationships deductively. We can also
look beyond the limitations of Euclidean geometry to other geometries and analyses of the
world around us.

Terms and Notation


Geometry has its own language and symbols. We begin our survey of geometry as Euclid
did, by considering some simple geometric figures: points, lines and planes, then create
more figures using these as building blocks. Our first figures are called undefined terms, as
we develop an intuitive understanding of them without precise mathematical definitions. The
notation we use is critical for efficient communication. Consider how much easier it is to
write the symbols ⃡AB rather than write the words, “the line which passes through the two
points A and B and continues on forever in either direction!”

Term    Figure and Notation   Description
Point   [figure]              A location in space, with no dimension (not measurable).
                              Indicated with an uppercase letter.
Line    ⃡AB                    A collection of points that continues forever in two
                              directions, has one dimension, is straight (not
                              measurable). Indicated with two points and a line “hat”.
Plane   [figure]              A collection of infinite points that goes on forever in two
                              dimensions, flat surface with no depth/thickness (not
                              measurable). Indicated with three points.

Notice that uppercase letters are used to indicate points. A line contains an infinite number
of points, but its notation uses only two points. This notation reflects two of Euclid’s five
postulates on which he built his geometric theory:
1. A straight line segment can be drawn joining any two points.
2. Any straight line segment can be extended indefinitely in a straight line. 1
Thus two points are enough to describe a line as between any two points there is one and
only one line that passes through them and extends forever. One point, such as A would not

1Weisstein, Eric W. "Euclid's Postulates." From MathWorld--A Wolfram Web Resource.


http://mathworld.wolfram.com/EuclidsPostulates.html

© Laurel Clifford Creative Commons BY-SA



be sufficient, as there are an infinite number of directions where the line could go, and we
would not know which direction is indicated.

Using these figures, we can create definitions of other figures. Be especially aware of those
figures which are “pieces” of lines, as they use the same two point notation, but will have a
different “hat” on the points, indicating what type of figure they are; the “hat” is like a rank
insignia on a uniform as it tells us exactly what we’re talking about.

Term      Figure and Notation           Description
Segment   A̅B̅ (AB with a bar above)      A finite subset (piece) of a line with two endpoints;
                                        has measurable length (distance). AB refers to the
                                        distance between points A and B, while A̅B̅ refers to
                                        the segment itself. A lowercase letter next to the
                                        segment can also refer to length.
Ray       AB with an arrow (→) above    A piece of a line with one endpoint that continues
                                        on forever in one direction (not measurable).
                                        The notation uses two points to indicate direction.
Angle     ∠A, ∠BAC or ∠1                Two rays with a common endpoint; can be two
                                        segments with a common endpoint, or created by
                                        intersecting lines or line segments. The notation
                                        uses three points or a number to clarify which angle
                                        is discussed. Greek letters are sometimes used to
                                        indicate the measure of the angle.

Be careful when discussing angles that you use notation to indicate clearly the angle you
reference. In the angle figures above, the angle indicated as ∠A may seem unambiguous, but
the figure could illustrate two different angles: the angle indicated with the blue arc, or
the angle indicated by the orange arc.

[Figure: two copies of ∠A, one marked with a blue arc and one marked with an orange arc]

Drawing the arc on the angle helps clarify which angle is discussed. The orange angle is also
known as a reflex angle.

Other figures are much more ambiguous. If we looked at the figure with two intersecting lines
and referred to ∠A, it would be unclear exactly which angle we are talking about, as there are
multiple angles at point A. We use three point notation, using a point on each side of the
angle and the vertex; the vertex (corner point) of the angle is the center point in the
notation. Thus the angle with the blue arc is ∠BAC or ∠CAB. Using the arc in combination with
a number (1) also indicates the angle discussed.

[Figure: two lines intersecting at A, with points B, C, D, and E and an arc labeled 1]

We can build geometric figures using the set operations intersection (∩) and union (∪).
Recall that these operations relate to the Boolean operators AND (intersection) and OR
(union). Visually, the intersection is where the figures cross each other or overlap, what they
have in common. The union includes all the pieces of the figures involved.

Consider the figure at the left. We can see illustrated one of Euclid’s postulates about
intersecting lines: two lines intersect in a single point. For example, the result of
⃡GC ∩ ⃡HD is the point G, since the line through G and C and the line through H and D cross
only at the point G. The point G is the only point on ⃡GC AND ⃡HD.

If we consider ray CB ∪ ray CF, the result would be an angle, ∠BCF, because the union would
include all the points on either of the two rays. We have two rays with a common endpoint, C,
which creates an angle.

Try it now 1:
Use the figure above to identify the results of the following:
a. G̅F̅ ∪ FD
b. G̅D̅ ∩ H̅F̅
c. G̅D̅ ∪ H̅F̅
d. C̅G̅ ∪ C̅F̅ ∪ G̅F̅
e. BI ∩ G̅C̅

Measurement
Throughout your life you quantify things by assigning numerical values to them: your height as
you grow up, the time that passes during the day, or the memory you’ve used up storing
pictures on your cell phone. Ancient records as far back as 3000 BC show the Egyptians
using careful measurements and geometry in the construction of the pyramids.

A line segment is a piece of a line between two endpoints, thus linear measurement
measures distance between two points. We need some sort of tool with a standardized unit to
measure this distance, such as a ruler with centimeter or inch marks, or the scale on a map.
The ruler below² illustrates the idea that the distance is measured between two endpoints, and
the segment length is 5 units. Even though the segment does not start at the zero mark, we
can see it lies between the 3 unit mark and 8 unit mark, and 8 units – 3 units = 5 units.

2 Image from CK-12 Geometry, license CC-BY-NC-SA



Angles and Measurement


We measure segments by measuring the distance between the endpoints, but how do we
measure an angle? When we measure an angle, we are not interested in distance, as the
distance between the sides of the angle varies. Instead, we measure the amount of rotation
(turn) between the sides of the angle. A full turn, like a full circle, is defined as 360º. Why
360? Possibly we inherited 360 from the Babylonian calendar, with 12 months of 30 days.³

A half turn creates a straight line, and thus we call this angle a straight angle. As it is
a half turn, a straight angle measures 180º.

A quarter turn creates a right angle, which measures 90º. The square “box” in the corner
of the angle indicates that the angle is a right angle.

Other angles can be classified in relation to these two angles.


Acute angles are angles that measure less than 90º.

Obtuse angles are angles that measure more than 90º but less than 180º.

The previously mentioned reflex angles measure more than 180º but less than 360º.

Can an angle measure more than 360º, and what does that mean? If 360º is a full circle, then
an angle larger than 360º has rotated at least one full circle and beyond. If an angle has
rotated 405º, it has rotated 360º and then 45º more. It would end or terminate at the same
place as a 45º angle. Such angles are called coterminal.
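
The angle categories and the idea of coterminal angles can be sketched in Python. This is an illustration only; the function names `classify_angle` and `coterminal` are made up for this example, and the remainder operator `%` is used to strip away full 360º turns.

```python
def classify_angle(degrees: float) -> str:
    """Name an angle between 0 and 360 degrees using the categories above."""
    if not 0 < degrees <= 360:
        raise ValueError("this sketch expects an angle greater than 0 and at most 360")
    if degrees < 90:
        return "acute"
    if degrees == 90:
        return "right"
    if degrees < 180:
        return "obtuse"
    if degrees == 180:
        return "straight"
    if degrees < 360:
        return "reflex"
    return "full turn"

def coterminal(degrees: float) -> float:
    """Remove full turns, leaving the angle between 0 and 360 that ends in the same place."""
    return degrees % 360

print(coterminal(405))                   # the 405-degree example from the text
print(classify_angle(coterminal(405)))
print(classify_angle(200))
```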

3Weisstein, Eric W. "Degree." From MathWorld--A Wolfram Web Resource.


http://mathworld.wolfram.com/Degree.html

Adjacent angles share a vertex and a side (it helps to remember that “adjacent” means
“next to”). The total measure of an angle created by two adjacent angles is the sum of the
measures of each individual angle. In the figure at the right, m∠DAB = m∠DAC + m∠CAB (the “m”
indicates the measure of the angles), thus m∠DAB = 25.59º + 25.94º = 51.53º.

Two special cases of angle pairs that interest us are supplementary and complementary
angles. Supplementary angles are two angles whose sum equals 180º. If one angle measures 60º
then its supplement measures 120º. In the illustration, we can see if two supplementary
angles are also adjacent angles, they form a straight angle.

Complementary angles are two angles whose sum equals 90º. If one angle measures 60º then its
complement measures 30º. From the illustration, we can see that if two complementary angles
are also adjacent angles, they form a right angle.
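
As a small sketch, the two definitions translate directly into subtraction. The function names `supplement` and `complement` below are made up for this example, and the range checks are an added assumption so that both angles in a pair stay positive.

```python
def supplement(degrees: float) -> float:
    """Return the angle that adds to the given angle to make 180 degrees."""
    if not 0 < degrees < 180:
        raise ValueError("a supplement exists only for angles between 0 and 180")
    return 180 - degrees

def complement(degrees: float) -> float:
    """Return the angle that adds to the given angle to make 90 degrees."""
    if not 0 < degrees < 90:
        raise ValueError("a complement exists only for angles between 0 and 90")
    return 90 - degrees

print(supplement(60))   # the 60-degree example from the text
print(complement(60))
```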

Intersecting lines form adjacent angles and opposite angles. In the figure at the right, we
can see that ∠1 and ∠2 are adjacent angles, while ∠1 and ∠3 are opposite each other, called
vertically opposing angles or vertical angles. If you examine the figure closely, you may
notice that these angles’ measures relate to each other in some interesting ways. We can draw
and measure multiple examples or view a computer generated example which we can manipulate at
http://www.mathopenref.com/anglesvertical.html and use inductive reasoning to conclude that
m∠1 = m∠3, and m∠2 = m∠4. In general, we can state that vertical angles have equal measures.

[Figure: two intersecting lines forming angles 1, 2, 3, and 4]

We can also deductively prove this concept without using specific examples or
measurements. We know that ∠1 and ∠2 are adjacent angles and form a straight line. Thus
we know that m∠1 + m∠2 = 180º. Similarly, ∠2 and ∠3 are adjacent angles and form a
straight line. Thus we know that m∠2 + m∠3 = 180º. Since both angle sums equal 180º,
then they must equal each other:
m∠1 + m∠2 = m∠2 + m∠3

Using a little algebra, if we subtract m∠2 from both sides of this equation, we have:
m∠1 = m∠3

We can use a similar argument to show that m∠2 = m∠4.

We can apply these angle relationships to find unknown angles created by intersecting lines.

Try it now 2:
Determine the missing angles a., b., and c. in the given figure:

[Figure: two intersecting lines forming four angles, one marked 108º and the others labeled
a., b., and c.]

We can build on this knowledge to explore the angle relationships created by two parallel
lines, coplanar lines that do not intersect, and a third line that intersects both, called a
transversal (it “transverses” both lines).

[Figure: two parallel lines crossed by a transversal, forming angles 1 through 4 at one
intersection and angles 5 through 8 at the other]

Examining the figure, we can see the vertical angles we are familiar with, and conclude that
m∠1 = m∠3, and m∠2 = m∠4. Similarly, we can see that m∠5 = m∠7, and m∠6 = m∠8. But how does
m∠1 relate to m∠5?

We can again use inductive reasoning and examples drawn or a computer animation such as
the one at http://www.mathopenref.com/transversal.html to determine the relationship. You
may notice that ∠1 lies in the same location as ∠5, above the parallel line, and to the right of
the transversal. If we slid the two parallel lines together, ∠1 and ∠5 would match up; they
are examples of corresponding angles. From inductive investigation, we can conjecture that
corresponding angles to parallel lines have equal measures.

Applying this thinking to our illustration, we state that m∠1 = m∠5, m∠4 = m∠8, m∠2 =
m∠6, and m∠3 = m∠7. Using the “chain rule” of logic (transitive property) we can say:
m∠1 = m∠5 = m∠3 = m∠7, and m∠2 = m∠6 = m∠4 = m∠8. We call ∠4 and ∠6 alternate
interior angles (as well as ∠3 and ∠5) and can state that alternate interior angles to
parallel lines have equal measures. We call ∠2 and ∠8 alternate exterior angles (as well
as ∠1 and ∠7) and can state that alternate exterior angles to parallel lines have equal
measures.
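
These relationships mean that one known angle determines all eight. The Python sketch below is only an illustration: the function name `transversal_angles` is made up, the angle numbering follows the figure described in the text, and the sample value of 70º is an arbitrary choice.

```python
def transversal_angles(angle_1: float) -> dict[int, float]:
    """Given the measure of angle 1, return the measures of angles 1 through 8."""
    if not 0 < angle_1 < 180:
        raise ValueError("angle 1 must be between 0 and 180 degrees")
    other = 180 - angle_1   # angles 1 and 2 are adjacent and form a straight line
    return {
        1: angle_1, 3: angle_1,   # vertical angles at the first intersection
        5: angle_1, 7: angle_1,   # corresponding to angles 1 and 3
        2: other, 4: other,       # vertical angles at the first intersection
        6: other, 8: other,       # corresponding to angles 2 and 4
    }

angles = transversal_angles(70)
for number in sorted(angles):
    print(f"angle {number}: {angles[number]} degrees")
```

In the output, the alternate interior pairs (4 and 6, 3 and 5) and the alternate exterior pairs (2 and 8, 1 and 7) each share a measure.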

Putting all our angle relationship ideas together allows us to solve for the missing angles in
more complicated figures.

Try it now 3:
Determine the missing angles a. – g. in the figure given, assuming that the lines that look
parallel are indeed parallel.

[Figure: two parallel lines crossed by a transversal, with one angle marked 62º and the other
angles labeled a. through g.]

We can also use these angle relationships to prove that the sum of the interior angles in a
triangle is 180º. In the figure below, we need to show that m∠1 + m∠2 + m∠3 = 180º.

[Figure: a triangle with interior angles 1, 2, and 3, and a line through the vertex of ∠3
parallel to the opposite side, forming ∠5 and ∠4 on either side of ∠3]

Assuming the two lines are parallel, and using the sides of the triangle as transversals, we
can conclude that m∠1 = m∠5 since they are alternate interior angles. Similarly, m∠2 = m∠4.

Notice that m∠5 + m∠3 + m∠4 = 180º since they form a straight angle. Using substitution,
we can take this equation, m∠5 + m∠3 + m∠4 = 180º, replace ∠5 with ∠1 and ∠4 with ∠2,
and we have m∠1 + m∠3 + m∠2 = 180º, and so the three angles in the triangle add up to 180º.

Knowing that the sum of the angles of any triangle is 180º allows us to problem solve with
triangle and other polygon angle sums.

Try it now 4:
Solve for the measures of x, y, and z in the triangles below.

[Figure: three triangles with angle labels 30, 20, and 40 and unknown angles x, y, and z]

Polygons
In combining intersecting lines we created triangles, the simplest polygon. We can build
other polygons using segments and angles. The word polygon comes from the Greek poly-
meaning many and –gon meaning angles. A polygon has the same number of sides as angles.
We name polygons based on the number of sides:

Number of sides   Name of polygon
3                 Triangle
4                 Quadrilateral
5                 Pentagon
6                 Hexagon
7                 Heptagon
8                 Octagon
9                 Nonagon
10                Decagon
12                Dodecagon
n                 n-gon

We previously proved that the sum of the interior angles of any triangle is 180º. We can use
this angle sum for a triangle to find the total interior angle sum for any polygon by dividing
the polygon into triangles. One way to do this process is to draw all the diagonals of the
polygon from a single vertex:

Number of Sides               3      4                5                6                7                8                 …   n
Number of Triangles Created   1      2                3                4                5                6                 …   n – 2
Total Interior Angle Sum      180º   180º(2) = 360º   180º(3) = 540º   180º(4) = 720º   180º(5) = 900º   180º(6) = 1080º   …   180º(n – 2)

We can see from the table that the number of triangles created by drawing the diagonals from
one vertex is always two less than the number of sides n. We can find the total interior angle
sum by multiplying the number of triangles (n – 2) by 180º, which gives 180º(n – 2).

Extending this idea, if we had a decagon, which is a 10-sided polygon, we know that there
would be 8 triangles created, and the total interior angle sum is 180º(8) = 1440º. If the
decagon happened to be a regular polygon, which is a polygon where all the angles and all
the sides are equal, then we could find the measure of each individual angle by dividing the
total measure 1440º by 10 angles, and the result is 144º per angle.
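
The formula 180º(n – 2) can be sketched as a pair of small Python functions. The names `interior_angle_sum` and `regular_interior_angle` are made up for this example; the loop reproduces the values in the table above, followed by the decagon from the text.

```python
def interior_angle_sum(sides: int) -> int:
    """Total of the interior angles, in degrees, of a polygon with the given number of sides."""
    if sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    triangles = sides - 2          # diagonals from one vertex create n - 2 triangles
    return 180 * triangles

def regular_interior_angle(sides: int) -> float:
    """Measure of each interior angle when all angles and sides are equal."""
    return interior_angle_sum(sides) / sides

for n in range(3, 9):
    print(f"{n} sides: total {interior_angle_sum(n)} degrees")

print(f"Regular decagon: {regular_interior_angle(10)} degrees per angle")
```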

Try it now 5:
Calculate the total interior angle sum of an icosagon, which has 20 sides. If the
icosagon is a regular polygon, what is the measure of each interior angle?

Classifying Triangles
We classify triangles and quadrilaterals according to the features they have, such as angles:

Right triangle Acute triangle Obtuse triangle



And sides (the tick marks indicate equal sides, if any):

(no equal sides) (two equal sides) (three equal sides)


Scalene triangle Isosceles triangle Equilateral triangle

Or by both angles and sides:

Right scalene triangle Acute isosceles triangle Obtuse isosceles triangle

The following pair of triangles are congruent triangles, which means they are the same
shape and size. As a consequence, their angles have the same measures, and their sides have
the same length.

[Figure: congruent triangles ABC and A’B’C’]

So if m∠ABC = 32º, then m∠A’B’C’ = 32º, and if AB = 10 cm, then A’B’ = 10 cm.

The triangles below are not congruent, but are similar triangles, which means they are the
same shape, but different sizes. One of them is an enlargement of the other. You may
notice that the angles are equal, but the sides are not. The side lengths are proportional.

So if m∠ABC = 46º, then m∠A’B’C = 46º, and if AB = 10 cm, RT = 8 cm, and RS = 7 cm, we can
find the length of AC using proportional reasoning:

10 cm / 8 cm = x cm / 7 cm

Solve for x either using cross-multiplication or scaling (10/8 multiplied by 7 cm) and AC =
8.75 cm.

Similar triangles show up in application problems where you may not expect them. If you
are standing next to a lamppost, and your shadow is 3 ft long, while you are 5.5 ft tall, and
the lamppost casts a shadow that is 7.5 ft long, how tall is the lamppost?

If you sketch this situation, and visualize a sunbeam creating the shadow, you can see the
triangles involved. Using proportional reasoning, we have:

5.5 ft tall / 3 ft shadow = x ft tall / 7.5 ft shadow

Solving for x via cross-multiplication or scaling, we have x = 13.75 feet, so the lamppost is
13.75 feet tall.
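
Both proportion problems above follow the same cross-multiplication pattern, which can be sketched in Python. The function name `solve_proportion` and its parameter names are made up for this example.

```python
def solve_proportion(a: float, b: float, d: float) -> float:
    """Solve a / b = x / d for x by cross-multiplying: x = a * d / b."""
    if b == 0:
        raise ValueError("the known ratio cannot have a zero denominator")
    return a * d / b

# Similar triangles: 10 cm / 8 cm = x cm / 7 cm
print(solve_proportion(10, 8, 7))

# Lamppost: 5.5 ft tall / 3 ft shadow = x ft tall / 7.5 ft shadow
print(solve_proportion(5.5, 3, 7.5))
```

The two printed values can be compared with the 8.75 cm and 13.75 ft found above.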

Try it now 6:
A forest service truck is 6 feet tall and casts a 9 foot shadow. It is parked next to a fire
lookout tower that casts a 240 foot long shadow. How tall is the lookout tower?

Classifying Quadrilaterals
We classify quadrilaterals by their angle size and side length characteristics as well as
whether they have any parallel sides. A quadrilateral tree helps illustrate these
interrelationships. As we proceed higher in the tree, the quadrilaterals get more specialized,
and every figure higher on the tree has the same features as the figures below it. For
example, a square is a specialized quadrilateral that is both a rhombus and a rectangle.
[Figure: quadrilateral tree, listed here from the top level to the bottom level]

Square: Quadrilateral with all equal angles and all equal sides

Rhombus: Quadrilateral with all equal sides
Rectangle: Quadrilateral with all equal angles
Isosceles Trapezoid: Trapezoid with nonparallel sides equal

Parallelogram: Quadrilateral with two pairs of parallel sides
Kite: Quadrilateral with two pairs of adjacent equal sides
Trapezoid: Quadrilateral with one pair of parallel sides

Quadrilateral: Polygon with 4 sides

Measuring Polygons
We’ve measured side lengths and angle rotation. When working with polygons, we can still
measure their side lengths and their angles. We can also measure other aspects of polygons.

Suppose you are building your dream house, and have designed a room that will be your
office/study as shown below. You can assume that since you are a meticulous builder, all
angles that are supposed to be right angles are actually right angles. You decide to carpet
the room, and need to purchase base board trim as well, so you have two questions to deal
with (besides what color to choose):
1. How much carpet will you need to buy?
2. How much trim?

[Figure: floor plan of an irregularly shaped room with sides labeled 8 ft, 12 ft, 5 ft, and
12 ft, and a blue dashed line dividing it into two rectangles]

In order to carpet the room, you need to measure the interior of the room, the floor area.
This room is irregularly shaped, but if you recall how to find the area of a rectangle, we can
divide the room into rectangles and find the area of each rectangle then add up the
individual areas.

If we use the blue dashed line to separate the room into two rectangles, the lower rectangle
measures 5 ft by 12 ft, while the upper rectangle measures 7 ft by 8 ft (subtracting the 5 ft
from the 12 ft to get the remaining 7 ft). Thus the two areas are: (5 ft)(12 ft) = 60 ft², and
(7 ft)(8 ft) = 56 ft², and 60 ft² + 56 ft² = 116 ft². Thus we need 116 ft² of carpeting.

The baseboard trim goes around the edge of the room, so we need to find the perimeter by
adding up each of the distances around the outside. Using the dimensions given, and finding
the unknown dimensions from the given dimensions with which they are parallel, the
perimeter is 8 ft + 12 ft + 12 ft + 5 ft + 4 ft + 7 ft = 48 ft. Thus we need 48 ft of baseboard trim.
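
The two measurements for this room can be checked with a short calculation. The sketch below is only an illustration: the variable names are made up for this example, and the dimensions are the ones read from the floor plan above.

```python
# Illustrative names; all dimensions are in feet and come from the example.
lower_rectangle = (5, 12)   # lower rectangle: 5 ft by 12 ft
upper_rectangle = (7, 8)    # upper rectangle: 7 ft by 8 ft

# Area: add the areas of the two rectangles (square feet).
carpet_area = (lower_rectangle[0] * lower_rectangle[1]
               + upper_rectangle[0] * upper_rectangle[1])

# Perimeter: walk around the outside of the room, one wall at a time (feet).
walls = [8, 12, 12, 5, 4, 7]
trim_length = sum(walls)

print(carpet_area, "square feet of carpet")
print(trim_length, "feet of baseboard trim")
```

Notice that the area calculation multiplies lengths, while the perimeter calculation only adds them, which is why the units differ.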

Looking back, how are the two questions different? How are the answers different? We are
measuring two very different things: the interior plane area and the exterior linear border.
As a result, the units we use are also different. When we found area, our units were ft²,
square feet, while perimeter units were ft, linear feet.

In geometry, we often use linear measurement (measuring pieces of lines) to find the
perimeter of figures. The perimeter can be found by adding up the distances along the
outside of the figure. With some polygons, we can create formulas for perimeter:
Rectangle (side lengths x and y): P = 2x + 2y
Regular Octagon (side length x): P = 8x
Equilateral Triangle (side length s): P = 3s

There is a danger in memorizing a formula without understanding the concept of perimeter:
you may apply the wrong formula for the information given.

Polygons vs. Circles


When looking at polygons, notice what happens to
the shape of the polygon as you increase the number
of sides. The image at the left demonstrates the
pattern when you increase the number of sides in
regular polygons (equal sides and equal angles). We
know from our previous work that the total interior
angle sum gets larger as you increase the number of
sides. We can see from this visual image that the
polygons themselves get rounder and rounder,
approaching the shape of a circle.

A circle can be thought of as a regular polygon with an infinite number of sides. We can
measure the perimeter and area of circles using some of the same concepts as polygons.

The perimeter of a circle is called its circumference. The radius of a circle is the distance
from the center of the circle to a point on the circle itself. The diameter of a circle is the
distance from a point on the circle through the center to the opposite side. The diameter is
twice the radius in length.

Consider the circle below, created by the geometry program, Geometer’s Sketchpad:

[Figure: circle with center B and point A on the circle]
Radius BA = 2.64 cm
Circumference = 16.61 cm
Diameter = 5.29 cm
Circumference/Diameter = 3.14

With this particular program, we can drag the circle and change its size. The radius,
diameter, and circumference will change, but the ratio between the circumference and
diameter always stays the same, about 3.14, which you may recognize as an approximation
for π.

For any circle,

Circumference/diameter = C/d = π

If we solve this equation for C, we have: C = πd,
and given that d = 2r, C = π(2r) = 2πr

Example 1: Problem Solving with Circles


With the circumference equation we can solve for linear measurements involving circles. If
we know that a circle has radius 7 cm, we can find the circumference around the circle:
C = 2π(7 cm) = 14π cm, or approximately 43.982 cm.

If we know a circle has circumference 86.8 cm, we can find its diameter:
86.8 cm = πd, so d = 86.8 cm/π, or approximately 27.629 cm.

The ratio π has been studied for millennia. The Hebrews used 3 as an approximation for π.
The Babylonians also used 3, but created more precise estimations of its value. An
approximation for π is shown on the Rhind Papyrus (1650 BC) of the Egyptians. The Greek
mathematician Archimedes (287–212 BC) and the Chinese mathematician Zu Chongzhi
(429–501 AD) both calculated approximations for π using regular polygons.4 Today,
supercomputers calculate π to trillions of digits.5

Area
We used linear measurement to calculate the amount of baseboard trim we needed for our
new office. When we calculated the amount of carpet we needed, we were working with area
measurement: the space enclosed in the interior of a polygon, measured in two dimensions
(length and width) and in square units.

1 linear unit looks like a piece of a line:


1 square unit looks like a section of a plane:

We need to be careful when relating between linear units and square units. We know
conversion factors for linear units, such as 1 foot = 12 inches, but these do not translate
directly to area units: 1 square foot does NOT equal 12 square inches. How many square
inches are in 1 square foot?

If we take a 1 foot by 1 foot square, and divide each side of the square into 12 one-inch
units, we have a square that is 12 inches by 12 inches, and 144 square inches fit inside this
area, as shown by the 144 squares visible.

We see this idea by multiplying the conversion factors:

(1 foot/12 inches) × (1 foot/12 inches) = 1 foot²/144 inches²

If carpet is sold by the square yard, and we need 116 ft² of carpet, we must take care with our
conversion factors to convert square feet into square yards. We need to cancel square feet:

(116 ft²/1) × (1 yard/3 ft) × (1 yard/3 ft) or (116 ft²/1) × (1 yard²/9 ft²) = 12.8̅ yard² (about 12.89 yard²)
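
The idea that an area conversion uses the square of the linear conversion factor can be sketched in a few lines. The function names below are hypothetical, chosen only for this illustration.

```python
def square_feet_to_square_inches(square_feet):
    """Convert an area by squaring the linear factor (12 inches per foot)."""
    inches_per_foot = 12
    return square_feet * inches_per_foot ** 2


def square_feet_to_square_yards(square_feet):
    """Convert an area by squaring the linear factor (3 feet per yard)."""
    feet_per_yard = 3
    return square_feet / feet_per_yard ** 2


print(square_feet_to_square_inches(1))             # 144 square inches
print(round(square_feet_to_square_yards(116), 2))  # about 12.89 square yards
```

Using the unsquared factor (dividing 116 by 3 instead of 9) would overstate the amount of carpet by a factor of three.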

4 Pi Day: History of Pi | Exploratorium. (n.d.). Pi Day: History of Pi | Exploratorium. Retrieved June 23, 2014,
from http://www.exploratorium.edu/pi/history_of_pi/
5 Yes, Trillions! Check out: http://www.numberworld.org/misc_runs/pi-5t/details.html

When applying proportional reasoning with area, we must also make sure to be comparing
appropriate units and quantities.

Example 2: Proportional Reasoning with Area


If a 12 inch diameter pizza requires 10 ounces of dough, how much dough is needed for a 16
inch pizza?

To answer this question, we need to consider how the weight of the dough will scale. The
weight will be based on the volume of the dough. However, since both pizzas will be about
the same thickness, the weight will scale with the area of the top of the pizza. We can find
the area of each pizza using the formula for area of a circle, A = πr²:

A 12” pizza has radius 6 inches, so the area will be π(6)² = about 113 square inches.
A 16” pizza has radius 8 inches, so the area will be π(8)² = about 201 square inches.

Notice that if both pizzas were 1 inch thick, the volumes would be 113 in³ and 201 in³
respectively, which are at the same ratio as the areas. As mentioned earlier, since the
thickness is the same for both pizzas, we can safely ignore it.

We can now set up a proportion to find the weight of the dough for a 16” pizza:

(10 ounces)/(113 in²) = (x ounces)/(201 in²)     Multiply both sides by 201

x = 201 · (10/113) = about 17.8 ounces of dough for a 16” pizza.

It is interesting to note that while the diameter is 16/12 ≈ 1.33 times larger, the dough
required, which scales with area, is (16/12)² ≈ 1.78 times larger.
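
As a rough check of this proportion, the sketch below scales the dough by the ratio of the two circle areas. The function names are invented for this illustration, and the code assumes, as the example does, that both pizzas have the same thickness.

```python
import math


def circle_area(diameter):
    """Area of a circle from its diameter, using A = pi * r^2."""
    return math.pi * (diameter / 2) ** 2


def scaled_dough(known_ounces, known_diameter, new_diameter):
    """Scale the dough weight in proportion to the area of the pizza."""
    return known_ounces * circle_area(new_diameter) / circle_area(known_diameter)


dough = scaled_dough(10, 12, 16)
print(round(dough, 1))  # about 17.8 ounces
```

Because π cancels in the ratio of the areas, the result depends only on the square of the ratio of the diameters.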

There are many formulas for finding areas of polygons. It’s better to develop conceptual
understanding than to just memorize formulas. We can build many area formulas from the
area of a rectangle. You probably can easily recall the formula for the area of a rectangle as
A = lw. What are “l” and “w” and what kind of measurement do they represent, linear or
area? Notice that this formula uses linear measurements to find area. Why does it “work” to
use linear measurements (the dimensions of the rectangle) to find area?

If we consider that we are counting the number of squares that fit inside the rectangle to
find the area, we can see how the dimensions can count these squares for us.

The rectangle has rows of 9 squares. These rows of squares are stacked 5 high, so if we
multiply the dimensions (9×5) we really are multiplying:

(9 squares/row) × (5 rows) = 45 squares total

When we multiply the base length by the height, we are counting the number of square units
in the figure!

Area of a rectangle
Because we will be moving on to other polygons, we consider the area of a rectangle
as:
A = (base)(height),

where the base and height are perpendicular (at a right angle) to each other.
This will be our “master formula” for creating formulas for other polygons.

We can apply the same kind of thinking, and “create rectangles” for other polygons. This
will allow us to create more formulas from the “master formula,” A = (base)(height), we
created for the area of a rectangle.

A parallelogram can be thought of as a sheared or tilted rectangle. If we “cut and paste” a
triangle from one side of the parallelogram to the other side, we create a rectangle, and
our area formula remains the same: A = (base)(height), as long as we’re careful to measure
the height at a right angle to our base:

A = (6 units)(5 units) = 30 units²

A triangle can be thought of as half of a rectangle. If we copy and paste the triangle, we
can create a complete rectangle. So our area formula, A = (base)(height), will need to be cut
in half:

A = (7 units)(6 units)/2 = 21 units²

So the area formula for a triangle is:

A = ½(base)(height)

A trapezoid can be cut up into triangles or other shapes to find its area. One method
illustrated here is to copy and paste the trapezoid rotated 180° next to itself to create a
parallelogram, and then apply the parallelogram area formula. Since we use two trapezoids to
create the parallelogram, we will cut the area in half:

A = (6 units + 4 units)(3 units)/2 = 15 units²

Notice that to create the base length, the two parallel sides (or bases) of the trapezoids
are added together, so the area formula for a trapezoid is: A = ½ (sum of the bases)(height)

What about the area of a circle? We can “cut up” a circle to make it approximate a
parallelogram:
(Nice animation of this at
http://www.wku.edu/~tom.richmond/Pir2.html )

Using our area formula A = (base)(height), where the base of the parallelogram we have
created is πr (half of the circumference) and the height is r,

A = (πr)(r) = πr²

Example 3: Problem Solving with Circle Area


Suppose we have a circle with radius 3.5 cm. We can calculate its area:
A = πr²
A = π(3.5 cm)²
A = 12.25π cm², or approximately 38.48 cm².

If we know the area is 100 cm² and want to find the diameter, we’ll have to work a little
harder, as our formula only relates the radius to the area.
A = πr²
100 cm² = πr², divide by π and then square root to undo the square,
5.64 cm ≈ r, but we want the diameter, and knowing d = 2r, d ≈ 2(5.64 cm) = 11.28 cm.
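
Both directions of this example, from radius to area and from area back to diameter, can be sketched as small functions. The names are hypothetical and exist only for this illustration.

```python
import math


def circle_area(radius):
    """A = pi * r^2."""
    return math.pi * radius ** 2


def diameter_from_area(area):
    """Undo A = pi * r^2: divide by pi, take the square root, then double."""
    radius = math.sqrt(area / math.pi)
    return 2 * radius


print(round(circle_area(3.5), 2))         # about 38.48 (cm^2)
print(round(diameter_from_area(100), 2))  # about 11.28 (cm)
```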

One more triangle concept…


Consider the triangle drawn on the grid. We can find its area:

A = ½ (5 units)(5 units) = 12.5 units²,

using the concept that a triangle is half of a rectangle.

But what happens when we calculate the perimeter?

Adding up the side lengths is usually a straightforward process. In this case,
P = 5 units + 5 units + …
Here’s where the problem arises: the units along the diagonal side (the hypotenuse) of the
triangle are not the same size as the units along the two legs. Just eyeballing it suggests
that the diagonal (oblique) units are a bit longer than the horizontal and vertical units. If
we are using the grid as our units, we need a method to calculate the diagonal side.

Pythagorean Theorem
To find the length of the oblique side, which is the hypotenuse in this right triangle,
use the Pythagorean Theorem, which says if a right triangle has legs of lengths a and
b, and hypotenuse c, then a² + b² = c².

In the case of the previous triangle, we can find the hypotenuse, and then the perimeter:
a = 5 units, b = 5 units, so a² + b² = c² is:
(5 units)² + (5 units)² = c²
25 units² + 25 units² = c²
50 units² = c² (square root to undo the square)
√50 units = c
7.071 units ≈ c

And the perimeter of the triangle is: P = 5 units + 5 units + 7.071 units ≈ 17.071 units.

We can look at this result we found for “c” and say it is a little more than 7 units (since the
square root of 49 is 7). We can also simplify the square root by considering that:
√50 = √25 · √2 = 5√2 units
There’s an interesting pattern that appears here because this triangle is a special case: it is an
isosceles right triangle. Since the legs are equal, we can expect this pattern to appear again.
If a right triangle has equal legs “n” then n² + n² = c², and we have 2n² = c².
When we solve for “c” by taking the square root, we have:
√(2n²) = n√2 = c
The hypotenuse of an isosceles right triangle will always be the leg length times the square
root of 2.
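
The hypotenuse and perimeter calculation, along with the leg-times-√2 pattern, can be verified numerically. This is only a sketch; the function name is an assumption made for the illustration.

```python
import math


def hypotenuse(a, b):
    """Solve a^2 + b^2 = c^2 for c, given the two legs of a right triangle."""
    return math.sqrt(a ** 2 + b ** 2)


c = hypotenuse(5, 5)
perimeter = 5 + 5 + c

print(round(c, 3))          # about 7.071 units
print(round(perimeter, 3))  # about 17.071 units

# For equal legs n, the hypotenuse should match n * sqrt(2).
print(math.isclose(c, 5 * math.sqrt(2)))
```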

Try it now 7:
Solve for “x” in each triangle below:
The first two triangles are examples of Pythagorean Triples. The third is an isosceles
triangle.

[Figure: three triangles. a. sides labeled 8, 17, and x; b. sides labeled 24, 25, and x;
c. sides labeled 11, 11, and x]

Example 4: And… back to our dream house…


If you have the room shown below, and you want to put down parquet flooring, which comes
in 1 ft by 1 ft squares, how many squares do you need to buy? How much baseboard for trim?

[Figure: floor plan of a 12 ft by 12 ft room with one corner cut off diagonally; sides
labeled 12 ft, 12 ft, 8 ft, and 8 ft]

The flooring is area. We can view this room as a square with a triangle cut off the corner:
A = (12 ft)(12 ft) – ½(4 ft)(4 ft) = 136 ft².
Since flooring is (1 ft)(1 ft) = 1 ft² squares, we need 136 squares.

Baseboard is perimeter. The oblique edge is the hypotenuse of an isosceles right triangle,
and is 4√2 ft long, so
P = 12 ft + 12 ft + 8 ft + 8 ft + 4√2 ft ≈ 45.66 ft of baseboard.

Surface Area
Suppose you own the detached “Tiny House Backyard
Studio”6 shown, and want to paint the exterior. How
much paint will you need? The paint manufacturer
suggests that a gallon of exterior paint covers 375 to 400
square feet of area. In order to decide how many
gallons of paint to buy, you need to calculate the areas of
each side of the studio you choose to paint, then add up
all of these areas.

A simplified view of the studio is shown below:

The perspective drawing of the angle shown in the photograph distorts the shapes of the sides
of the studio, known as faces. To better visualize the areas you need to paint, it helps to
draw a net, a two dimensional diagram of the studio that shows what it would look like
unfolded. Since you are only painting the sides of the studio and not the roof and floor, the
net will show just the faces to be painted.

From this net, you can see that the sides of the studio are two congruent trapezoids and two
different rectangles. You know the area of a trapezoid is A = ½ (sum of the bases)(height)
and the area of a rectangle is A = (base)(height). Calculating these areas gives you:
A = 2(½(90 in. + 111 in.)(76 in.)) + (111 in.)(144 in.) + (90 in.)(144 in.) = 44,220 in²

Since the area covered by a gallon of paint is measured in square feet, you’ll need to convert
this result from square inches to square feet:

(44,220 in²/1) × (1 ft/12 in.) × (1 ft/12 in.) = 307.083̅ ft² or (44,220 in²/1) × (1 ft²/144 in²) = 307.083̅ ft²

A gallon of paint covers 375 to 400 ft², so one gallon should be enough for this studio,
especially considering you didn’t subtract out the area of the windows.
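
The sum of the four face areas and the conversion to square feet can be reproduced with a short script. The helper names are hypothetical; the dimensions (in inches) are the ones given for the studio.

```python
def trapezoid_area(base1, base2, height):
    """A = 1/2 (sum of the bases)(height)."""
    return 0.5 * (base1 + base2) * height


def rectangle_area(base, height):
    """A = (base)(height)."""
    return base * height


# Two congruent trapezoids plus two different rectangles (square inches).
total_sq_in = (2 * trapezoid_area(90, 111, 76)
               + rectangle_area(111, 144)
               + rectangle_area(90, 144))

# 1 square foot = 12 in * 12 in = 144 square inches.
total_sq_ft = total_sq_in / 144

print(total_sq_in)            # 44220.0
print(round(total_sq_ft, 2))  # 307.08
print(total_sq_ft <= 375)     # True: within the low-end coverage of one gallon
```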

6 Photo by Earthworm, https://www.flickr.com/photos/earthworm/4524084357/, license CC-BY-SA 2.0



The process of viewing the faces of a three-dimensional object through a net, finding the area
of each face, and adding these areas to determine the total area is the process we use when
finding surface area.

Surface Area:
The sum of the areas of the exterior faces of a three-dimensional object is the Surface
Area, abbreviated SA throughout this text.

The studio we looked at is an example of a polyhedron, a three-dimensional shape made of
polygons whose sides meet at edges and edges meet at vertices with no gaps between them. The
representation of the studio, rotated so that the trapezoidal sides are the top and bottom
faces, shows the faces, edges and vertices of a polyhedron. This polyhedron is also a prism,
a polyhedron with congruent parallel polygons, called bases, connected to each other by
parallelograms (in this case, rectangles). These parallelograms are called lateral faces.
Prisms are named by base shape. The studio is an example of a trapezoidal prism as its
bases are trapezoids.

The Platonic solids are examples of regular polyhedra with faces that are all congruent
polygons. Euclid proved that there are only five regular polyhedra in existence. 7 In 350 BC,
the Greek philosopher Plato thought that these five solids correlated with elements as they
were viewed at the time, the cube with earth, the dodecahedron with the heavens, the
icosahedron with water, the octahedron with air, and the tetrahedron with fire. 8

Platonic Solids with Nets: Cube, Dodecahedron, Icosahedron, Octahedron, Tetrahedron

If you are a fan of role-playing games, you may have rolled a few Platonic dice 9:

7 In Book XIII, http://aleph0.clarku.edu/~djoyce/java/elements/bookXIII/propXIII18.html
8 Image and information from Weisstein, Eric W. "Platonic Solid." From MathWorld--A Wolfram Web
Resource. http://mathworld.wolfram.com/PlatonicSolid.html
9 See page for author, http://commons.wikimedia.org/wiki/File:BluePlatonicDice.jpg, license CC-BY-SA-3.0,
via Wikimedia Commons



Euclid did not list the tetrahedron as one of the regular polyhedra because he had already
defined it as a pyramid, a polyhedron that has one polygon base with vertices that connect
to a single point called the apex. Like prisms, pyramids are named by their bases. Plato’s
tetrahedron is a triangular pyramid. The pyramids the Egyptians built are square pyramids.

Triangular Pyramid Square Pyramid


Example 5: The Pyramid of Khafre
Khafre was the son of Khufu, the builder of the Great Pyramid of Khufu for
which Egypt is famous. Khafre built his square pyramid next to his dad’s.
The base of the pyramid is 705 feet long, and it is 471 feet tall. 10 What is
the surface area of Khafre’s pyramid?11

Since we’re after the surface area, we create the net of the pyramid by mentally unfolding it
into its base, a square, and 4 congruent triangles as its lateral faces. We know the square
has side lengths 705 feet by 705 feet. We know the triangles have base length 705 feet. To
find the area of a triangle, we need the height. The 471 feet given is the height of the
pyramid (h), not the height of the triangle on the side (s).

This length s is called the slant height of the pyramid. If we visualize a right triangle
formed inside the pyramid with the height h = 471 ft as one of its legs, s would be the
hypotenuse. The other leg is half the distance across the square base, so its length is half
of 705 ft, or 352.5 ft.

[Figure: Pyramid of Khafre, with height h = 471 ft, slant height s, base side a = 705 ft,
and half-base b = 352.5 ft]

Using the Pythagorean Theorem, we have:
(471 ft)² + (352.5 ft)² = s²
346097.25 ft² = s² (square root to undo the square)
588.3 ft ≈ s

Now that we have s, we can find the areas of the 4 congruent triangles:
A = ½(base)(height) for each triangle,
A = 4(½(705 ft)(588.3 ft)) ≈ 829503.4314 ft²
The area of the square is:
A = (705 ft)(705 ft) = 497025 ft²
Total surface area ≈ 829503.4314 ft² + 497025 ft² = 1326528.4314 ft²
This area is about 30.45 acres, or approximately 23 football fields.

The lateral area alone (the triangles along the side) is about 19 acres or 14.4 football fields.
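
The steps of this example (slant height, lateral area, base area, total) can be sketched as one function. The function name and the acre constant (43,560 square feet per acre) are additions for this illustration and are not part of the original example.

```python
import math

SQ_FT_PER_ACRE = 43560  # standard conversion, assumed here for the acre estimate


def square_pyramid_surface_area(base_side, height):
    """Return (slant height, lateral area, total surface area) of a square pyramid."""
    half_base = base_side / 2
    # The slant height is the hypotenuse of the right triangle inside the pyramid.
    slant_height = math.sqrt(height ** 2 + half_base ** 2)
    lateral_area = 4 * (0.5 * base_side * slant_height)  # four congruent triangles
    base_area = base_side ** 2                           # the square base
    return slant_height, lateral_area, base_area + lateral_area


slant, lateral, total = square_pyramid_surface_area(705, 471)
print(round(slant, 1))                    # about 588.3 ft
print(round(total / SQ_FT_PER_ACRE, 2))   # about 30.45 acres
print(round(lateral / SQ_FT_PER_ACRE))    # about 19 acres
```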

10The Pyramid of Khafre. (n.d.). The Pyramid of Khafre. Retrieved June 26, 2014, from
http://puffin.creighton.edu/museums/cohagan/giza_khafre.htm
11 Image of Khafre’s Pyramid by Ankur P from Pune, India (IMG_1271) CC-BY-SA-2.0 via Wikimedia Commons

Euclid defined other solids by revolving two dimensional shapes around a central axis.
Revolve a rectangle around a central axis and the solid formed is a cylinder. A cylinder
resembles a prism in that its bases are parallel and congruent. These bases are circles
instead of polygons.

[Figure: cylinder with r = radius and h = height]

Unfolding a cylinder creates a net with two circles and one large rectangle for the lateral
area. Like a label on a soup can, this rectangle must fold around the circular bases, and
thus its length is equal to the circumference of the circular bases.

[Figure: net of a cylinder; the rectangle has b = base length = C = 2πr and h = height]

Surface Area of a Cylinder

SA = 2(circle area) + lateral rectangle area
SA = 2(πr²) + 2πrh

Revolve a right triangle around a central axis and the solid formed is a cone. A cone
resembles a pyramid in that it has one base and points on the base connect to a single apex
point. The base is a circle instead of a polygon.

[Figure: cone with r = radius, h = height, and slant height s]

Unfolding a cone creates a net with a circle and a part (sector) of a circle. Depending on
the height of the cone, the sector can resemble a large piece of pizza or the video game icon
Pac-Man.

Surface Area of a Cone

SA = circle area + sector area
SA = πr² + πrs

Notice that the slant height s is the hypotenuse of the right triangle with legs h, the
height of the cone, and r, the radius of the circular base.
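
The cylinder and cone formulas can be written as two small functions. This is a sketch with hypothetical function names and made-up sample dimensions; the cone function finds the slant height from the radius and height before applying the formula.

```python
import math


def cylinder_surface_area(radius, height):
    """SA = 2(pi r^2) + 2 pi r h: two circular bases plus the lateral rectangle."""
    return 2 * math.pi * radius ** 2 + 2 * math.pi * radius * height


def cone_surface_area(radius, height):
    """SA = pi r^2 + pi r s, where the slant height s is the hypotenuse of r and h."""
    slant_height = math.hypot(radius, height)
    return math.pi * radius ** 2 + math.pi * radius * slant_height


# Sample values chosen for easy checking: r = 3 and h = 4 give slant height 5.
print(round(cone_surface_area(3, 4), 3))      # 24*pi, about 75.398
print(round(cylinder_surface_area(1, 2), 3))  # 6*pi, about 18.85
```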

Try it now 8:
Find the surface area of the snow cone wrapper12 if the rim diameter is 3.25
inches and the cone height is 4.5 inches.13

12 Photo by bunchofpants on Flickr, http://www.fotopedia.com/items/flickr-140986466, license CC-BY-NC-SA-2.0
13 Size of snow cone cup illustrated at http://www.concessionstands.com/SNO-KONE-6-OZ-CUPS-1000-CS

Revolve a semi-circle around a central axis and the solid formed is a sphere.

[Figure: sphere with r = radius]

Unfolding a sphere is a challenge map makers face. The Lambert's Cylindrical Equal-Area
Projection maintains the areas of the continents although the shapes distort toward the poles.

This projection is made by visualizing the sphere circumscribed by a cylinder with height the
same as the diameter of the sphere (height h = 2r), projecting the sphere onto the cylinder
and unrolling the cylinder. The areas are maintained because the surface area of the sphere
is the same as the lateral surface area of the cylinder.

Recall that the lateral surface of a cylinder is a rectangle that has base length 2πr and
height h, so A = 2πrh. In this cylinder, h = 2r, so the lateral area of the cylinder (and the
surface area of the sphere) is SA = 2πr(2r) = 4πr².

Surface Area of a Sphere

SA = 4πr², where r = radius of the sphere (distance from center point to edge of sphere)

Measuring Solids: Volume


We create the units we use to measure features of our world. Recall that linear measurement,
measurement of one dimensional figures, is measured in unit segments. Area, measurement
of the interior of a two-dimensional figure, is measured in unit squares. Volume,
measurement of the amount of space contained inside a three-dimensional figure, is
measured in unit cubes.

A unit segment: 1 inch
A unit square: 1 square inch = 1 in²
A representation of a unit cube: 1 cubic inch = 1 in³

We can find the volume of the rectangular prism below by counting the unit cubes that would
fit inside of it. If we look at the top layer, we can see 9 cubes in the top layer, and then
notice that the prism is 3 layers high. Because the solid is a prism, the layers have to be
uniform in size, so the volume will be:

V = (9 cubes/1 layer) × (3 layers/1) = 27 cubes

So the volume is 27 cubic units.

One way to visualize a volume of any rectangular prism that will extend to other solids (and
is a fundamental concept in calculus) is viewing the prism as a stack of infinitesimally thin
rectangular layers, so thin that their volume can be approximated by their area:

Since the figure is a prism, all of the rectangles are congruent. If we can find the area of
one of the rectangle layers, we can use the idea that these rectangles are stacked up “h”
high, where h = height of the prism, and find the volume by finding: V = (layer area)(height).
Since all the layers have the same area, we can use the area of the base to find the
layer area.

Volume of a Prism
V = (Base Area)(height of prism)
V = Bh

Note that an uppercase “B” is used to indicate this is the area of the base and not a
linear measurement, such as the length of a base.

This “master formula” and the layer concept can be used to find all volumes of prisms and
cylinders.

Example 6: Volume of a rectangular prism


Suppose a rectangular prism (“box”) has a base with dimensions 3 inches by 4 inches, and is
10 inches high.

Using the formula V = Bh, we recognize that the Base Area is the area of the rectangle:
B = (3 inches)(4 inches) = 12 in²,
And so the volume is:
V = (12 in²)(10 in) = 120 in³.

Notice how the results have cubic units, which makes sense since we found the volume. If
the resultant units were not cubic units, it would suggest we made some sort of mistake along
the way and we should check our work.

Example 7: Volume of a Cylinder


A 10 m high cylinder has a base which is a circle with radius 20 cm.
The volume of the cylinder can be found using V = Bh, where the Base Area is the area of a
circle:
B = πr² = π(20 cm)²
B = 400π cm²
In calculating the volume, we have to be mindful of the units. The Base Area uses square
centimeters, so we need to convert the height of the cylinder to centimeters:

h = (10 m/1) × (100 cm/1 m) = 1000 cm

So the volume will be:
V = (400π cm²)(1000 cm) = 400000π cm³ ≈ 1256637.061 cm³.

We can also convert this result from cm³ to m³ using dimensional analysis:

(1256637.061 cm³/1) × ((1 m)³/(100 cm)³) = 1.256637061 m³
Note that the unit conversion needs to match cubic units, hence the cubing of the conversion
factor.
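
The unit handling in this example can be sketched as follows. The variable names are illustrative only; the key steps are converting the height to centimeters before multiplying, and cubing the conversion factor when moving from cm³ to m³.

```python
import math

radius_cm = 20
height_m = 10

# Match units first: the base area is in cm^2, so the height must be in cm.
cm_per_m = 100
height_cm = height_m * cm_per_m

base_area_cm2 = math.pi * radius_cm ** 2   # B = pi r^2
volume_cm3 = base_area_cm2 * height_cm     # V = Bh

# Cubic units need the cubed conversion factor: (100 cm)^3 per (1 m)^3.
volume_m3 = volume_cm3 / cm_per_m ** 3

print(round(volume_cm3, 3))  # about 1256637.061 cm^3
print(round(volume_m3, 6))   # about 1.256637 m^3
```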

Try it now 9:
Swiss chocolate Toblerone bars 14 come in a box shaped like an
equilateral triangular prism. The giant 4.5 kg bar (almost 10 pounds!)
comes in a box that is 79.4 cm in height. The equilateral triangle bases
have sides that are 14.6 cm long 15.
a. Calculate the surface area of the box.
b. Calculate the volume of the box.

If we consider a pyramid and compare it to the prism with the same height
and base, we know its volume should be less than the prism. But how
much less? If you fill the pyramid with water and pour it into the prism, it
will take three pyramids to fill the prism. So the volume of the pyramid
is 1/3 of the prism. This same principle extends to cones as 1/3 of the
related cylinder. We can adjust our master formula accordingly:

Volume of a Pyramid or Cone

The volume of a pyramid or cone is one-third of the volume of the prism or cylinder
with the same base and height, thus:
V = ⅓(Base Area)(height)
V = ⅓Bh

14 Photo by WestportWiki (Own work) license CC-BY-SA-3.0, via Wikimedia Commons
15 Dimensions given for 4.5 kg bar from http://www.amazon.co.uk/Kraft-Toblerone-Jumbo-4-5-Kg/dp/B004INT01A

Try it now 10:


Remember Khafre and Khufu, Egyptian Pharaohs with competing square pyramids? Khafre’s
pyramid has a base length of 705 feet and height of 471 feet. His father Khufu’s pyramid has
a base length of 756 feet and a height of 481 feet. 16
a. Calculate the volume of Khafre’s pyramid and Khufu’s pyramid.
b. How much larger is Khufu’s pyramid compared to Khafre’s pyramid? Express your answer as a
percent of increase.

[Figure: Pyramids at Giza. Khafre's pyramid is the center of the three larger pyramids;
Khufu's pyramid is to its right.]

The Greek mathematician Archimedes (287–212 BC) proved that the volume of a sphere is 2/3 of
the volume of the cylinder circumscribing it (he also proved that the lateral surface area of
the cylinder and the surface area of the sphere are equal, as discussed earlier). Using
V = Bh and h = 2r, the volume of this circumscribing cylinder is V = πr²h = πr²(2r) = 2πr³.
The volume of the sphere is 2/3 of this volume, V = (2/3)(2πr³) = (4/3)πr³.

Volume of a Sphere
V = (4/3)πr³, where r = the radius of the sphere

If we consider a hemisphere, half of a sphere, with a circular base and compare it to a
circumscribing cylinder with height r and a cone with the same circular base and height r, we
can see that the volume of a hemisphere should be larger than the volume of the cone
(V = ⅓πr³) but smaller than the cylinder (V = πr²h = πr³ since h = r). The volume of a
hemisphere is ⅔πr³, which fits nicely in between the two: ⅓πr³ < ⅔πr³ < πr³ (that’s rather
neat, don’t you think?).

Example 8: Snow Cone Volume


We can find the volume of the snow cone with rim diameter 3.25 inches and the cone height
4.5 inches17 if we consider it as two solids: a cone and a hemisphere. Note that the radius
for each is r = 3.25/2 = 1.625 inches.
Cone volume: V = ⅓πr²h = ⅓π(1.625 in)²(4.5 in) ≈ 12.444 in³
Hemisphere volume: V = ⅔πr³ = ⅔π(1.625 in)³ ≈ 8.987 in³
Total volume of the snow cone is: V ≈ 12.444 in³ + 8.987 in³ = 21.431 in³.
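
The two-solid approach can be sketched with one function per solid, using the radius r = 1.625 inches for both. The function names are hypothetical and serve only this illustration.

```python
import math


def cone_volume(radius, height):
    """V = (1/3) pi r^2 h."""
    return math.pi * radius ** 2 * height / 3


def hemisphere_volume(radius):
    """V = (2/3) pi r^3, half of the sphere volume (4/3) pi r^3."""
    return 2 * math.pi * radius ** 3 / 3


radius = 3.25 / 2  # rim diameter 3.25 inches
cone = cone_volume(radius, 4.5)
dome = hemisphere_volume(radius)
total = cone + dome

print(round(cone, 3), round(dome, 3), round(total, 3))
```

Keeping the radius in a single variable helps avoid using slightly different values in the two formulas.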

16The Great Pyramid of Khufu. (n.d.). The Great Pyramid of Khufu. Retrieved June 26, 2014, from
http://puffin.creighton.edu/museums/cohagan/giza_great.htm
17 Size of snow cone cup illustrated at http://www.concessionstands.com/SNO-KONE-6-OZ-CUPS-1000-CS

Example 9: Comparing Volumes


A company makes regular and jumbo marshmallows. The regular marshmallow has 25 calories.
How many calories will the jumbo marshmallow have?

We would expect the calories to scale with volume. Since the marshmallows have cylindrical
shapes, we can use that formula to find the volume. From the grid in the image, we can
estimate the radius and height of each marshmallow.

Photo courtesy Christopher Danielson

The regular marshmallow appears to have a diameter of about 3.5 units, giving a radius of
1.75 units, and a height of about 3.5 units.
The volume is V = π(1.75 units)²(3.5 units) ≈ 33.7 units³.

The jumbo marshmallow appears to have a diameter of about 5.5 units, giving a radius of
2.75 units, and a height of about 5 units.
The volume is about V = π(2.75 units)²(5 units) ≈ 118.8 units³.

We could now set up a proportion, or use rates and dimensional analysis. The regular
marshmallow has 25 calories for 33.7 cubic units of volume. The jumbo marshmallow will
have:

(118.8 units³/1) × (25 calories/33.7 units³) ≈ 88.1 calories

It is interesting to note that while the diameter and height are about 1.5 times larger for the
jumbo marshmallow, the volume and calories are about 1.5³ = 3.375 times larger.
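
The calorie estimate can be sketched by scaling with the ratio of the two cylinder volumes. The function name is hypothetical, and the dimensions are the grid estimates from the example. Because this version does not round the volumes before dividing, it gives about 88.2 calories, slightly different from the figure obtained with rounded volumes.

```python
import math


def cylinder_volume(diameter, height):
    """V = Bh with a circular base: pi r^2 h."""
    return math.pi * (diameter / 2) ** 2 * height


regular_volume = cylinder_volume(3.5, 3.5)  # estimated from the grid
jumbo_volume = cylinder_volume(5.5, 5)      # estimated from the grid

# Calories are assumed to scale in proportion to volume.
jumbo_calories = 25 * jumbo_volume / regular_volume

print(round(regular_volume, 1))  # about 33.7 cubic units
print(round(jumbo_volume, 1))    # about 118.8 cubic units
print(round(jumbo_calories, 1))  # about 88.2 calories
```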

Try it Now 11:


A website says that you’ll need 48 fifty-pound bags of sand to fill a sandbox that measures 8 ft
by 8 ft by 1 ft. How many bags would you need for a sandbox 6 ft by 4 ft by 1 ft?

Scaling and Measurement


In Example 9, we compared the volume of two marshmallows, assuming they were cylindrical in
shape. We noticed that their linear dimensions (diameter, height) were proportional, having a
ratio of 1.5. These cylinders are examples of similar solids, the larger marshmallow an
enlargement/scaled up version of the smaller one. The ratio of their volumes was not 1.5, but
1.5³ = 3.375. If we know the ratio of the linear measurements of similar solids, what can we
predict for the ratios of their surface areas and volumes?

The mineral pyrite often forms crystals in the shape of cubes.


The sample in the picture 18 shows three pyrite crystal cubes on
host rock. These crystals are similar solids as they are the same
shape, just different sizes: all their angles are the same (90)
and all their side lengths are proportional. If the length of the
side of the smallest cube is 12 mm, the “medium” cube is 15
mm, and the largest cube is 26 mm, what are their surface areas
and volumes?

A cube unfolded creates a net of 6 squares, as each face of a cube is a square and
it has 6 faces. Each face has area A = (base)(height) = (side length)(side length)
= (side length)², and the surface area for each is SA = 6(side length)². Listing the
surface areas in order from smallest crystal to largest crystal, we have:
SA = 6(12 mm)² = 864 mm²
SA = 6(15 mm)² = 1350 mm²
SA = 6(26 mm)² = 4056 mm²

The volume of a cube is V = (Base Area)(height), and since the Base Area B = (side length)²
and the height h = side length, then V = (side length)²(side length) = (side length)³. Listing
the volumes in order from smallest crystal to largest crystal, we have:
V = (12 mm)³ = 1728 mm³
V = (15 mm)³ = 3375 mm³
V = (26 mm)³ = 17576 mm³

The table summarizes the ratios between the linear measurements (side lengths), the surface
areas, and the volumes of the crystals in fraction form.

                      medium/smallest                 largest/smallest                   largest/medium
side length ratios    15 mm / 12 mm = 5/4             26 mm / 12 mm = 13/6               26 mm / 15 mm = 26/15
surface area ratios   1350 mm² / 864 mm² = 25/16      4056 mm² / 864 mm² = 169/36        4056 mm² / 1350 mm² = 676/225
volume ratios         3375 mm³ / 1728 mm³ = 125/64    17576 mm³ / 1728 mm³ = 2197/216    17576 mm³ / 3375 mm³ = 17576/3375

These examples, especially the familiar values in the medium/smallest ratios, show a pattern
in the ratios of the measurements of similar solids.
Scale Factors for Similar Solids
If the ratio between two linear measurements of similar solids is k,
the ratio between the surface areas is k²,
and the ratio between the volumes is k³.
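
As a check on the scale factor rule, the following Python sketch compares the ratios computed directly from the cube formulas with k² and k³ for the pyrite crystals. The helper names (`cube_surface_area`, `cube_volume`) are illustrative; exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

def cube_surface_area(side):
    """Surface area of a cube: 6 square faces."""
    return 6 * side ** 2

def cube_volume(side):
    """Volume of a cube: side length cubed."""
    return side ** 3

# Side lengths of the crystals in mm, from the example
small, medium, large = 12, 15, 26

for name, (bigger, smaller) in {
    "medium/smallest": (medium, small),
    "largest/smallest": (large, small),
    "largest/medium": (large, medium),
}.items():
    k = Fraction(bigger, smaller)  # linear scale factor
    area_ratio = Fraction(cube_surface_area(bigger), cube_surface_area(smaller))
    volume_ratio = Fraction(cube_volume(bigger), cube_volume(smaller))
    # The rule predicts area_ratio == k^2 and volume_ratio == k^3
    print(name, k, area_ratio == k ** 2, volume_ratio == k ** 3)
```

Each line of output should report `True` twice, since the surface area and volume ratios are the square and cube of the side length ratio.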

18 Photo by Teravolt at English Wikipedia, license CC-BY-3.0, via Wikimedia Commons



So many formulas! When problem solving with measurement, we need to ask ourselves
which formula to use. In order to determine which formula to use, we need to understand
what we are calculating. Is it a linear measurement, such as a distance? Is it an area
measurement, such as the interior of a flat shape such as floor space in a room? Is it a
volume measurement, such as the capacity of a solid? Clearly understanding what kind of
measurement we are working with as well as the shapes involved helps us solve problems
appropriately and arrive at meaningful answers.

Try it now 12:


Determine if the following contexts involve Distance, Area, or Volume.
You do not have to solve the problems.
a. How much Jell-O can a 10” by 12” by 2” pan hold?
b. How much crown molding do I need to trim out a 10 foot by 12 foot room?
c. How much carpet do I need for a 5 meter by 3 meter room?
d. How much fence do I need for a dog run that is 20 feet by 5 feet?
e. How much more soda fits in a large cup compared to a small cup?
f. How big an air conditioner do I need for my garage that is 20 ft by 25 ft with 10 ft tall
ceilings?
g. How much water will I need to fill my octagonal prism fish tank if the base area is 400
square inches and it is 18 inches tall?
h. How much paint do I need to buy to paint a room 10 ft by 12 ft with 8 ft ceilings? How
much will it cost me to have my lawn mowed if my yard is about 100 ft long by 30 ft
wide?
i. How much weather-stripping do I need to buy to help insulate a 7 ft by 3 ft door?

Transformational Geometry
French mathematician and philosopher René Descartes (1596–1650)
assigned coordinates to points (locations), creating the Cartesian coordinate
system, which connects arithmetic, algebra, and geometry. A German
mathematician, Felix Klein (1849–1925), applied the transformations
studied in algebra to geometry, creating a unified geometry that
included Euclidean and non-Euclidean geometries. Considered a part
of Euclidean geometry, although not studied by Euclid himself,
transformational geometry looks at how we can move (transform)
shapes, providing new methods to view and problem solve in geometry, art, architecture, and
anthropology. Cultures throughout history found imagery involving transformations visually
appealing, and archeological finds can be dated in part by how they use transformations.19

In the image at the right, the L-shaped polygon in one corner can be
transformed by shifting it, flipping it, turning it, or doing a combination of
these transformations to match another L-shaped polygon in the pattern. A
transformation is a procedure that maps a figure (called the preimage) onto
an image of itself.

19 Peil, T., & Martin, N. (n.d.). Historical. Historical. Retrieved June 26, 2014, from
http://web.mnstate.edu/peil/geometry/C3Transform/0historical.htm

Rigid Transformations
The slides, flips and turns described previously are known as rigid transformations. Each
resulting image is the same size and shape as the preimage; the images are congruent to the
preimage. An example of a non-rigid transformation is a dilation, where the image is an
enlargement or reduction of the preimage, and is similar to but not congruent to the
preimage. The rigid transformations we explore in this course are translations, reflections,
rotations, and glide reflections.

Dawn, Seavey Pass, Yosemite National Park 20: Which is the image and which is the preimage?

A translation shifts a figure in a specific direction for a specific distance. Vectors are often
used to denote translation, because the vector shows both a distance (its length) and a
direction (the way it is pointing). Vectors may look like rays, but are finite (they end). In the
illustration at the left, the red vector 𝐷𝐸 indicates that the preimage triangle ABC has been
shifted down 1 unit and right 5 units to the image triangle A’B’C’. The tick marks on the
points identify A’B’C’ as the image. The location of the vector doesn’t matter, only its length
and direction. It doesn’t have to “touch” the figure and you don’t move the figure to the
vector’s position. Visualize what would happen if you connected each vertex of the preimage
triangle with the corresponding vertex on the image triangle: the segments you create would
look like the translation vector.
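
With coordinates, a translation simply adds the vector to every vertex. The sketch below is one way to express that in Python; the triangle's coordinates are hypothetical (the illustration's actual coordinates are not given in the text), while the vector "right 5, down 1" matches the description above.

```python
def translate(points, vector):
    """Shift every point by the same vector (dx, dy)."""
    dx, dy = vector
    return [(x + dx, y + dy) for x, y in points]

# Hypothetical preimage triangle ABC
triangle = [(1, 4), (3, 4), (1, 1)]

# Right 5 units and down 1 unit, as in the description above
vector = (5, -1)

image = translate(triangle, vector)
print(image)  # [(6, 3), (8, 3), (6, 0)]

# Every vertex-to-image segment matches the translation vector
differences = [(xi - x, yi - y) for (x, y), (xi, yi) in zip(triangle, image)]
print(differences)  # [(5, -1), (5, -1), (5, -1)]
```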

A reflection flips a figure over a line (called the mirror line or line of reflection) so that the
resulting image is a mirror image of the original preimage. In a reflection, each point on
one side of the mirror line is the same distance away from the line as its image on the other
side. Visualize what would happen if you connected the preimage points with their image
points: the mirror line is the perpendicular bisector of the segments you create (it intersects
them at a right angle through the midpoint of the segment).
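
A minimal coordinate sketch of a reflection, assuming for simplicity that the mirror line is the vertical line x = c (the function name and the sample triangle are illustrative, not taken from the illustration):

```python
def reflect_over_vertical_line(points, c):
    """Reflect each point over the vertical mirror line x = c."""
    return [(2 * c - x, y) for x, y in points]

# Hypothetical preimage triangle and mirror line x = 4
triangle = [(1, 4), (3, 4), (1, 1)]
mirror_x = 4

image = reflect_over_vertical_line(triangle, mirror_x)
print(image)  # [(7, 4), (5, 4), (7, 1)]

# The midpoint of each preimage-image segment lies on the mirror line,
# and each segment is horizontal, so it is perpendicular to the vertical line.
midpoints = [((x + xi) / 2, (y + yi) / 2) for (x, y), (xi, yi) in zip(triangle, image)]
print(midpoints)
```

Because every midpoint has x-coordinate 4, the mirror line bisects each connecting segment, which is the perpendicular bisector property described above.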

20 Photo by SteveD. on Flickr, http://www.fotopedia.com/items/flickr-5001437231 license CC-BY 2.0



A rotation turns a figure by a specific angle about a center point, called the center of
rotation. This center point can be on the shape, so that the shape turns on part of itself, or
off the shape, so that the shape turns around this point like hands moving around a clock
(“clockwise” is considered a negative rotation, while “counterclockwise” is positive). In the
illustration at the left, the center of rotation is point C with ΔABC rotated 90º to ΔA’B’C’.
If you measure the angle formed by the segments connecting the preimage vertex to the
center of rotation, and the center of rotation to the image vertex, such as ∠ACA’ shown, it
will measure 90º.

The illustrations below show 90º rotations of ΔABC about a point F inside the triangle and
outside the triangle respectively.
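
A rotation can also be sketched with coordinates. The code below assumes the standard rotation formulas from trigonometry, which this chapter does not develop, and uses a hypothetical triangle whose vertex C is the center of rotation. Positive angles turn counterclockwise, matching the sign convention above.

```python
import math

def rotate(points, center, degrees):
    """Rotate points about a center; positive angles are counterclockwise."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    image = []
    for x, y in points:
        dx, dy = x - cx, y - cy  # position relative to the center
        new_x = cx + dx * cos_t - dy * sin_t
        new_y = cy + dx * sin_t + dy * cos_t
        # Round away tiny floating-point error
        image.append((round(new_x, 9), round(new_y, 9)))
    return image

# Hypothetical triangle ABC with C at the origin as the center of rotation
A, B, C = (3, 0), (3, 2), (0, 0)

image = rotate([A, B, C], center=C, degrees=90)
print(image)
```

The center of rotation maps to itself, and A = (3, 0) lands on the positive y-axis at distance 3 from C, so angle ACA’ is a right angle.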

A glide reflection is a combination of two transformations: a translation by a vector followed
by reflection over a mirror line. In the illustration at the right, 𝐷𝐸 is the translation vector
with the red mirror line through the center. The yellow triangle, ABC, is the preimage. The
blue triangles, both labeled A’B’C’, are the image after one of the two transformations, while
the green triangle, A”B”C”, is the image after completing the glide reflection. Notice that
when the translation vector is parallel to the mirror line, as it is here, the glide reflection
processes are commutative; whether we glide then reflect or reflect then glide, the result is
the same.
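
The commutative property of a glide reflection can be tested numerically. This sketch assumes a horizontal mirror line y = c and a horizontal translation vector (so the vector is parallel to the mirror line); the coordinates are hypothetical.

```python
def translate(points, vector):
    """Shift every point by the vector (dx, dy)."""
    dx, dy = vector
    return [(x + dx, y + dy) for x, y in points]

def reflect_over_horizontal_line(points, c):
    """Reflect each point over the horizontal mirror line y = c."""
    return [(x, 2 * c - y) for x, y in points]

# Hypothetical preimage, glide vector parallel to the mirror line y = 0
triangle = [(0, 1), (2, 1), (0, 3)]
glide_vector = (4, 0)
mirror_y = 0

glide_then_reflect = reflect_over_horizontal_line(translate(triangle, glide_vector), mirror_y)
reflect_then_glide = translate(reflect_over_horizontal_line(triangle, mirror_y), glide_vector)

print(glide_then_reflect)  # [(4, -1), (6, -1), (4, -3)]
print(glide_then_reflect == reflect_then_glide)  # True
```

If the vector were not parallel to the mirror line, for example (4, 1), the two orders would generally give different images.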

Each of the four rigid transformations we’ve looked at is called rigid because the shape and
size of the preimage are unchanged during the transformation; the image is congruent to the
preimage. Rigid transformations are examples of isometries. An isometry is a
transformation that maintains the distance between the points; side lengths on the image are
equal to side lengths on the preimage.
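
To see what "maintains the distance between the points" means in practice, this sketch compares side lengths before and after a translation (an isometry) and a dilation (not an isometry). The triangle is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def side_lengths(triangle):
    """Lengths of the three sides of a triangle given as three (x, y) vertices."""
    a, b, c = triangle
    return [math.dist(a, b), math.dist(b, c), math.dist(c, a)]

# Hypothetical 3-4-5 right triangle
preimage = [(0, 0), (4, 0), (0, 3)]

translated = [(x + 5, y - 1) for x, y in preimage]  # rigid: a translation
dilated = [(2 * x, 2 * y) for x, y in preimage]     # non-rigid: scale factor 2

print(side_lengths(preimage))
print(side_lengths(translated))  # same lengths, so the image is congruent
print(side_lengths(dilated))     # every length doubled, so similar but not congruent
```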

Try it now 13:


What kind of transformations have occurred in each illustration? How can you tell?
a. Siena Cathedral floor 21 b. Shark Cove tessellation 22

Symmetry
When a figure remains unchanged after a transformation we say it has symmetry. We see
symmetry in art and the world around us, as well as in physics, where Noether's Symmetry
Theorem asserts that each symmetry of a system corresponds to a physically conserved
quantity, such as symmetry under translation corresponding to conservation of momentum
and symmetry under rotation corresponding to conservation of angular momentum 23.

A shape has reflectional or line symmetry if you can fold it in half along a line so the two
halves match exactly. The figure is reflecting onto itself. The "folding line" is called the line
of symmetry.

If you can rotate a figure around a center point by less than 360º and the image matches the
preimage, then the figure has rotational (radial) or point symmetry. The smallest angle
you need to rotate the preimage to match the image is the angle of rotation. The flower 24
can be rotated onto itself five times in one full turn, so it has five-point symmetry with
rotation angle 360º ÷ 5 = 72º.
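
The relationship between the number of matching positions and the angle of rotation can be written as a one-line function. This is a small sketch; the function name is illustrative.

```python
def angle_of_rotation(matching_positions):
    """Smallest rotation angle (degrees) for a figure that matches itself
    the given number of times in one full 360-degree turn."""
    return 360 / matching_positions

print(angle_of_rotation(5))  # 72.0, the flower with five-point symmetry
print(angle_of_rotation(4))  # 90.0, for example a square
```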

21Photo by Clay Shonkwiler from Athens, GA, USA (Tessellation Uploaded by David Eppstein) license CC-
BY-2.0, via Wikimedia Commons
22Art by Sethness, http://sethness.deviantart.com/art/Sharks-Cove-Tessellation-388307173 license CC-BY-30
23 Information about Noether's Symmetry Theorem from Weisstein, Eric W. "Noether's Symmetry Theorem."

From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/NoethersSymmetryTheorem.html


24 Photo by Wildfeuer (Own work (own photo)) license CC-BY-SA-3.0.0, via Wikimedia Commons

Try it now 14:


Identify the symmetries of each figure. If the figure has line symmetry, identify how many
lines of symmetry exist. If the figure has point symmetry, identify the angle of rotation. If a
figure has both, indicate both.

a. b. c.

Tessellations
A tessellation is a pattern in which a shape is repeated over and over again, covering a plane
without any gaps or overlaps between figures. Another word for tessellation is tiling, and if a
figure tessellates, it “tiles the plane.”

Some polygons tessellate by themselves, like this right triangle:

Some polygons need to be paired with another polygon to tessellate, like a regular octagon and
square:

What is the key to whether a polygon will tessellate? Let’s explore tessellation attempts with
regular polygons, using one type of regular polygon at a time:

Equilateral triangle Square Regular Pentagon Regular Hexagon Regular Octagon

We can see that the triangle, square, and regular hexagon tessellate with no additional
polygon needed. However, the regular pentagon and octagon do not. Consider the angles of
the polygons that meet at one vertex in each tessellation (shown in the image at the right).
Four squares meet at a point, and each square has angles that measure 90º: 4(90º) = 360º.
Three hexagons meet at a point, and each hexagon has angles that measure 120º:
3(120º) = 360º. But regular pentagons have angles
that measure 108º, and 3(108º) = 324º, which doesn’t leave

room for another pentagon. In general, if a polygon tessellates, the angles that meet at each
vertex must add to 360º. If we consider any triangle, with total interior angle sum 180º, or
any quadrilateral, with total interior angle sum 360º, they will tessellate the plane as their
angles can be arranged to meet at a point and add to 360º:
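
The 360º test can be automated for regular polygons. The sketch below assumes the interior angle formula (n − 2)·180º/n for a regular polygon with n sides, which is consistent with the 90º, 108º, and 120º values above, and checks whether a whole number of copies fits around a vertex.

```python
from fractions import Fraction

def interior_angle(n):
    """Interior angle, in degrees, of a regular polygon with n sides."""
    return Fraction((n - 2) * 180, n)

def tessellates_alone(n):
    """True if copies of one regular n-gon can meet at a vertex with no gap:
    360 degrees must be a whole-number multiple of the interior angle."""
    copies_at_vertex = 360 / interior_angle(n)
    return copies_at_vertex.denominator == 1

for n in range(3, 13):
    print(n, float(interior_angle(n)), tessellates_alone(n))

print([n for n in range(3, 13) if tessellates_alone(n)])  # [3, 4, 6]
```

Only the equilateral triangle, square, and regular hexagon pass the test, matching the tessellation attempts shown above.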

Dutch artist M.C. Escher (1898-1972) created intriguing tessellation-based artworks which
are displayed online at http://www.mcescher.com/gallery/symmetry/. Students and other
artists have created artwork inspired by Escher, such as the Shark Cove tessellation shown on
the previous page (Try it now 13). Be inspired by checking out http://tessellations.org/. If
you want a simple online tessellation maker (your imagination will decide what your
tessellation shows!), try http://www.shodor.org/interactivate/activities/Tessellate/.

To create a tessellation using translations, follow these steps:


1. Begin with a polygon you know will tessellate (tile) the plane such as a rectangle:

2. Cut off a piece on one side, and slide this piece to the other side (this is the translation):

3. Repeat this process with the other side (you can cut the shape through the sides as well):

You can use a glide reflection, too:

4. Trace this figure repeatedly to “tile” the plane, and color as appropriate.

Another tessellation to enjoy:


25

25 Photo by Sharon Drummond on http://www.fotopedia.com/items/flickr-5728579339, license CC BY-ND 3.0



Non-Euclidean Geometry
Remember how the Greek mathematician Euclid presented geometry from a few basic
postulates, statements assumed to be true? The fifth postulate, a rather complicated one, is
rewritten here as Playfair’s axiom (based on work by Scottish mathematician John Playfair,
1748–1819) 26. This axiom says that given a line and a point not on the line, there is one and
only one line that can be drawn through the point parallel to the given line.
One important consequence of this postulate is that we were able to deductively prove that
the sum of the angles in any triangle is 180º.

Euclid assumed it was true, but suppose it is not true. What are the two other possibilities?
Each of these possibilities leads to alternative geometries!

1. Given a line and a point not on the line, no line can be drawn through the point
parallel to the given line.

Spherical (or elliptical) geometry arises from this alternative. Lines on a sphere are great
circles, circles with their center at the center of the sphere. Great circles intersect each other
at two points, so no two lines in this geometry are parallel. Longitudinal lines on the Earth
are special cases of great circles, and they intersect each other at the poles.

2. Given a line and a point not on the line, more than one line can be drawn through the
point parallel to the given line.

Hyperbolic geometry arises from this alternative. Hyperbolic space is curved like a saddle,
and a little more difficult to visualize than spherical geometry. Like spherical geometry, the
lines are curves.

So which geometry system is correct? They all are, in their own system! Some theorems
hold true for all systems, some only in the system they are in, due to the postulates or axioms
that are accepted as true in their systems.

Try it now 15:


Given the illustrations of triangles in the descriptions above for 1) spherical geometry and 2)
hyperbolic geometry, make a conjecture about whether the sum of the angles for each
geometry is more than 180º, less than 180º or equal to 180º and explain your choice.

26 Proposition 30. (n.d.). Euclid's Elements, Book I,. Retrieved June 26, 2014, from
http://aleph0.clarku.edu/~djoyce/java/elements/bookI/propI30.html

Fractal Geometry (optional)


Fractals are mathematical sets, usually obtained through recursion, that exhibit interesting
dimensional properties. We’ll explore what that sentence means through the rest of the
chapter. For now, we can begin with the idea of self-similarity, a characteristic of most
fractals.

Self-similarity
A shape is self-similar when it looks essentially the same from a distance as it does
closer up.

Self-similarity can often be found in nature. In the Romanesco broccoli pictured below 27, if
we zoom in on part of the image, the piece remaining looks similar to the whole.

Likewise, in the fern frond below28, one piece of the frond looks similar to the whole.

Similarly, if we zoom in on the coastline of Portugal 29, each zoom reveals previously hidden
detail, and the coastline, while not identical to the view from farther away, does exhibit
similar characteristics.

27 http://en.wikipedia.org/wiki/File:Cauliflower_Fractal_AVM.JPG
28 http://www.flickr.com/photos/cjewel/3261398909/
29 Openstreetmap.org, CC-BY-SA

Iterated Fractals
This self-similar behavior can be replicated through recursion: repeating a process over and
over.

Example 10: Sierpinski’s gasket


Suppose that we start with a filled-in triangle. We connect the midpoints of each side and
remove the middle triangle. We then repeat this process.

Initial Step 1 Step 2 Step 3

If we repeat this process, the shape that emerges is called the Sierpinski gasket. Notice that it
exhibits self-similarity – any piece of the gasket will look identical to the whole. In fact, we
can say that the Sierpinski gasket contains three copies of itself, each half as tall and wide as
the original. Of course, each of those copies also contains three copies of itself.
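
The "three copies, each half as tall and wide" description can be tracked step by step. In this sketch the function name is illustrative, and the original triangle is treated as having width 1.

```python
from fractions import Fraction

def gasket_step(n):
    """Return (number of filled triangles, width of each relative to the original)
    after n steps of the Sierpinski gasket construction."""
    copies = 3 ** n               # each triangle is replaced by 3 smaller ones
    scale = Fraction(1, 2 ** n)   # each new triangle is half as tall and wide
    return copies, scale

for n in range(4):
    copies, scale = gasket_step(n)
    print(f"Step {n}: {copies} triangles, each {scale} the width of the original")
```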

We can construct other fractals using a similar approach. To formalize this a bit, we’re going
to introduce the idea of initiators and generators.

Initiators and Generators


An initiator is a starting shape.
A generator is an arranged collection of scaled copies of the initiator.

To generate fractals from initiators and generators, we follow a simple rule:

Fractal Generation Rule


At each step, replace every copy of the initiator with a scaled copy of the generator,
rotating as necessary

This process is easiest to understand through example.



Example 11: Koch’s Curve


Use the initiator and generator shown to create the iterated fractal.

initiator generator

This tells us to, at each step, replace each line segment with the spiked shape shown in the
generator. Notice that the generator itself is made up of 4 copies of the initiator. In step 1,
the single line segment in the initiator is replaced with the generator. For step 2, each of the
four line segments of step 1 is replaced with a scaled copy of the generator:

[Figure: Step 1; scaled copy of generator; a scaled copy replaces each line segment of Step 1; Step 2]

This process is repeated to form Step 3. Again, each line segment is replaced with a scaled
copy of the generator.

[Figure: Step 2; scaled copy of generator; Step 3]

Notice that since Step 0 only had 1 line segment, Step 1 only required one copy of the generator.
Since Step 1 had 4 line segments, Step 2 required 4 copies of the generator.
Step 2 then had 16 line segments, so Step 3 required 16 copies of the generator.
Step 3 has 16*4 = 64 line segments, so Step 4 would require 64 copies of the generator.
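
The counting pattern above can be sketched in Python. The total-length function additionally assumes the standard Koch generator, in which each of the 4 segments is one third the length of the segment it replaces; that scale is shown in the figure rather than stated in the text.

```python
from fractions import Fraction

def koch_segments(step):
    """Number of line segments in the Koch curve at a given step (Step 0 has 1)."""
    return 4 ** step

def generator_copies_needed(step):
    """Copies of the generator needed to build a step: one per segment of the previous step."""
    if step < 1:
        raise ValueError("Step 0 is the initiator and uses no generator copies")
    return koch_segments(step - 1)

def koch_length(step, initiator_length=1):
    """Total length, assuming each generator segment is 1/3 of the segment it replaces."""
    return initiator_length * Fraction(4, 3) ** step

for step in range(1, 5):
    print(step, generator_copies_needed(step), koch_segments(step), koch_length(step))
```

Under that assumption the total length is multiplied by 4/3 at every step, so it grows without bound as the process is repeated.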

The shape resulting from iterating this process is called the Koch curve, named for Helge von
Koch who first explored it in 1904.

[Figure: Koch curve]

Notice that the Sierpinski gasket can also be described using the initiator-generator approach:

Initiator Generator

Example 12: Fractal Tree


Use the initiator and generator below, but iterate only on the “branches.” Sketch
several steps of the iteration.

initiator generator

We begin by replacing the initiator with the generator. We then replace each “branch” of
Step 1 with a scaled copy of the generator to create Step 2.

Step 1 Step 2
We can repeat this process to create later steps. Repeating this process can create intricate
tree shapes30.

Step 3 Step 4 Final shape

Try it Now 16:


Use the initiator and generator shown to produce the next two stages

Initiator Generator

30 http://www.flickr.com/photos/visualarts/5436068969/

Using iteration processes like those above can create a variety of beautiful images evocative
of nature 31, 32.

More natural shapes can be created by adding in randomness to the steps.

Example 13: Sierpinski’s Gasket with Randomness


Create a variation on the Sierpinski gasket by randomly skewing the corner points each time
an iteration is made.

Suppose we start with the triangle below. We begin, as before, by removing the middle
triangle. We then add in some randomness.

[Figure: Step 0; Step 1; Step 1 with randomness]

We then repeat this process.

[Figure: Step 1 with randomness; Step 2; Step 2 with randomness]

Continuing this process can create mountain-like structures.

The landscape to the right 33 was created using fractals, then colored and textured.

31 http://en.wikipedia.org/wiki/File:Fractal_tree_%28Plate_b_-_2%29.jpg
32 http://en.wikipedia.org/wiki/File:Barnsley_Fern_fractals_-_4_states.PNG
33 http://en.wikipedia.org/wiki/File:FractalLandscape.jpg

Fractal Dimension
In addition to visual self-similarity, fractals exhibit other interesting properties. For example,
notice that each step of the Sierpinski gasket iteration removes one quarter of the remaining
area. If this process is continued indefinitely, we would end up essentially removing all the
area, meaning we started with a 2-dimensional area, and somehow end up with something
less than that, but seemingly more than just a 1-dimensional line.
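
Removing one quarter of the remaining area at each step means three quarters of the area survives each step. A short sketch (with the starting area taken to be 1 for convenience) shows how quickly the area shrinks toward zero:

```python
from fractions import Fraction

def remaining_area(steps, initial_area=1):
    """Area left in the Sierpinski gasket after the given number of steps,
    where each step keeps 3/4 of the area from the step before."""
    return initial_area * Fraction(3, 4) ** steps

for steps in (0, 1, 2, 3, 10, 50):
    area = remaining_area(steps)
    print(steps, float(area))
```

After 1 step the remaining area is 3/4, after 2 steps it is 9/16, and by 50 steps less than one millionth of the original area is left.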

To explore this idea, we need to discuss dimension. Something like a line is 1-dimensional;
it only has length. Any curve is 1-dimensional. Flat shapes like rectangles and circles are
2-dimensional, since they have length and width, describing an area. Objects like boxes and
cylinders have length, width, and height, describing a volume, and are 3-dimensional.

1-dimensional 2-dimensional 3-dimensional

Certain rules apply for scaling objects, related to their dimension.

If I had a line with length 1, and wanted to scale its length by 2, I would need two copies of the
original line. If I had a line of length 1, and wanted to scale its length by 3, I would need
three copies of the original.
[Figure: lines of length 1, 2, and 3]

If I had a rectangle with length 2 and height 1, and wanted to scale its length and height by 2,
I would need four copies of the original rectangle. If I wanted to scale the length and height
by 3, I would need nine copies of the original rectangle.
[Figure: rectangles measuring 2 by 1, 4 by 2, and 6 by 3]

If I had a cubical box with sides of length 1, and wanted to scale its length, width, and height
by 2, I would need eight copies of the original cube. If I wanted to scale the length, width,
and height by 3, I would need 27 copies of the original cube.
[Figure: cubes with side lengths 1, 2, and 3]

Notice that in the 1-dimensional case, copies needed = scale.
In the 2-dimensional case, copies needed = scale².
In the 3-dimensional case, copies needed = scale³.

From these examples, we might infer a pattern.

Scaling-Dimension Relation
To scale a D-dimensional shape by a scaling factor S, the number of copies C of the
original shape needed will be given by:

Copies = Scale^Dimension, or C = S^D
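
The relation C = S^D can be checked against the line, rectangle, and cube examples, and it can also be solved for D using logarithms. The second function below goes one step beyond the text shown here: it applies the relation to the Sierpinski gasket, which contains 3 copies of itself at scale factor 2, as noted in Example 10. Function names are illustrative.

```python
import math

def copies_needed(scale, dimension):
    """Scaling-dimension relation: C = S ** D."""
    return scale ** dimension

def dimension_from_copies(copies, scale):
    """Solve C = S ** D for D using logarithms: D = log(C) / log(S)."""
    return math.log(copies) / math.log(scale)

# The familiar cases: line, rectangle, cube
print(copies_needed(3, 1), copies_needed(3, 2), copies_needed(3, 3))  # 3 9 27

# Recovering the dimension of a cube from "27 copies at scale 3"
print(round(dimension_from_copies(27, 3), 6))  # 3.0

# Sierpinski gasket: 3 copies of itself when scaled by 2
print(dimension_from_copies(3, 2))  # between 1 and 2
```

The gasket's value falls strictly between 1 and 2, which fits the earlier observation that it seems to be more than a 1-dimensional line but less than a 2-dimensional area.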

Probability
“Life is a school of probability.”—Walter Bagehot

Introduction
The probability of a specified event is the chance or likelihood that it will occur. There are
several ways of viewing probability. One view is experimental or empirical, where we
repeatedly conduct an experiment. Suppose we flipped a coin over and over and over again
and it came up heads about half of the time; we would expect that in the future whenever we
flipped the coin it would turn up heads about half of the time. When a weather reporter says
“there is a 10% chance of rain tomorrow,” she is basing that on prior evidence; out of all
days with similar weather patterns, it has rained on 1 out of 10 of those days.

Another view would be subjective in nature, an educated guess. If someone asked you the
probability that the Arizona Diamondbacks would win their next baseball game, it would be
impossible to conduct an experiment where the same two teams played each other repeatedly,
each time with the same starting lineup and starting pitchers, each starting at the same time of
day on the same field under precisely the same conditions. Since there are so many
variables to take into account, someone familiar with baseball and with the two teams
involved might make an educated guess that there is a 75% chance they will win the game;
that is, if the same two teams were to play each other repeatedly under identical conditions,
the Diamondbacks would win about three out of every four games. But this is just a guess,
with no way to verify its accuracy, and depending upon how educated the educated guesser
is, a subjective probability may not be worth very much.

We will return to the experimental and subjective probabilities from time to time, but in this
course we will mostly be concerned with theoretical probability, which is defined as
follows: Suppose there is a situation with n equally likely possible outcomes and that m of
those n outcomes correspond to a particular event; then the probability of that event is
defined as m/n.

Basic Concepts
If you roll a die, pick a card from a deck of playing cards, or randomly select a person and
observe their hair color, you are executing an experiment or procedure. In probability, we
look at the likelihood of different outcomes. We begin with some terminology.

Events and Outcomes


The result of an experiment is called an outcome.

An event E is any particular outcome or group of outcomes; n(E) is the number of
outcomes in the event E.

A simple event is an event that cannot be broken down further.

The sample space S is the set of all possible simple events; n(S) is the number of
outcomes in the sample space S.

© David Lippman, ed. by Laurel Clifford Creative Commons BY-SA



Example 1: Die Toss


If we roll a standard 6-sided die, describe the sample space and some simple events.

The sample space is the set of all possible simple events: {1,2,3,4,5,6}

Some examples of simple events:


We roll a 1
We roll a 5

Some compound events:

We roll a number bigger than 4
We roll an even number

Basic Probability
Given that all outcomes are equally likely, we can compute the probability of an event
E using this formula:
P(E) = (Number of outcomes corresponding to the event E) / (Total number of equally likely outcomes in the sample space S) = n(E) / n(S)

Notice how this ratio represents a “part to whole” relationship.
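
The basic probability formula translates directly into code when events and sample spaces are written as sets. This is a minimal sketch for equally likely outcomes; the function name `probability` is chosen here for illustration.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = n(E) / n(S), assuming all outcomes in S are equally likely."""
    outcomes_in_event = set(event) & set(sample_space)  # ignore outcomes not in S
    return Fraction(len(outcomes_in_event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}

p_one = probability({1}, die)
p_bigger_than_4 = probability({x for x in die if x > 4}, die)
p_even = probability({x for x in die if x % 2 == 0}, die)

print(p_one)            # 1/6
print(p_bigger_than_4)  # 1/3
print(p_even)           # 1/2
```

`Fraction` reduces each ratio to lowest terms automatically, so 2/6 is reported as 1/3.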

Example 2: Die Roll Probabilities


If we roll a 6-sided die, calculate
a. P(rolling a 1)
b. P(rolling a number bigger than 4)

Recall that the sample space is {1,2,3,4,5,6}

a. There is one outcome corresponding to “rolling a 1”, so the probability is 1/6.
b. There are two outcomes bigger than a 4, so the probability is 2/6 = 1/3.

Probability ratios are part-to-whole fractions, and can be reduced to lower terms like
fractions. Probabilities can also be expressed in decimal or percent form. We used
theoretical probability to create the ratio of 1/6 for the probability of rolling a 1. We could
conduct an experiment by rolling a die 10 times, recording the number of times a 1 appeared.
Suppose a 1 appeared 2 times out of the 10 rolls, indicating a probability of 2/10. Converting
to percent form creates a common denominator (n% = n/100) allowing us to easily compare
the two probabilities. Theoretically, the probability of rolling a 1 is 1/6 ≈ 16.7%, while
experimentally, we found the probability of rolling a 1 to be 2/10 = 20%.

Why the difference in the two probabilities? Does it indicate that our theoretical probability
is incorrect, or that our die is loaded? Theoretical probability predicts the “long run” results,
the likelihood of an event based on numerous trials. Ten die tosses is not very many. If we
toss the die 100 times, 1 appears 19 times, for a 19% chance. Toss it 200 times, 1 appears 33

times for a 16.5% chance. Toss it 500 times, 1 appears 88 times for a 17.6% chance. The
graph at the right
shows the results from a computer-generated die toss
experiment. As the number of trials (tosses) increases,
the probability of tossing a particular number (like a 1)
gets closer to 16.67%, an approximation of 1/6. As we
toss the die more often, the experimental probability
gets closer to the theoretical probability. This concept
is known as the Law of Large Numbers:

LAW OF LARGE NUMBERS:


Probability applies to repeated trials, not to single events. It predicts for the long run,
large numbers of trials. As we conduct more trials, the experimental probability
approaches the theoretical probability; the difference between the two probabilities gets
closer to 0.
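
A computer-generated die toss experiment like the one described can be sketched in a few lines of Python. The seed value is an arbitrary choice made here so the run is repeatable; this is not the program used to produce the book's graph, and the exact counts will differ from those quoted above.

```python
import random

def experimental_probability(target, trials, seed=0):
    """Toss a simulated fair die `trials` times and return the fraction
    of tosses that show `target`."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == target)
    return hits / trials

theoretical = 1 / 6
for trials in (10, 100, 1000, 100000):
    observed = experimental_probability(1, trials)
    print(trials, round(observed, 4), "difference:", round(abs(observed - theoretical), 4))
```

With only 10 tosses the experimental probability can be far from 1/6, while with 100,000 tosses it is typically within a fraction of a percentage point.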

Example 3: Bag of Cherries


Let's say you have a bag with 20 cherries, 14 sweet and 6 sour. If you pick a cherry at
random, what is the probability that it will be sweet?

There are 20 possible cherries that could be picked, so the number of possible outcomes is
20. Of these 20 possible outcomes, 14 are favorable (sweet), so the probability that the cherry
will be sweet is 14/20 = 7/10.

There is one potential complication to this example. It must be assumed that the probability
of picking any of the cherries is the same as the probability of picking any other. This
wouldn't be true if (let us imagine) the sweet cherries are smaller than the sour ones. The sour
cherries would come to hand more readily when you sampled from the bag. Keep in mind
that when we assess probabilities in terms of the ratio of favorable to all potential cases, we
rely heavily on the assumption of equal probability for all outcomes. We use the word
random in probability to indicate this concept: each cherry in the bag is assumed to have an
equal chance of being chosen. Random does not mean haphazard or unrelated, but rather
this assumption that each element of the sample space has an equal chance of being chosen.

Try it Now 1:
At some random moment, you look at your clock and note the minutes reading.
a. What is the probability the minutes reading is 15?
b. What is the probability the minutes reading is 15 or less?

Cards
A standard deck of 52 playing cards consists of four suits (hearts, spades, diamonds
and clubs). Spades and clubs are black while hearts and diamonds are red. Each suit
contains 13 cards, each of a different rank: an Ace (which in many games functions as
both a low card and a high card), cards numbered 2 through 10, a Jack, a Queen and a
King.

Example 4: Card Deck Probabilities


Calculate the probability of randomly drawing one card from a deck and getting an Ace.

There are 52 cards in the deck and 4 Aces, so P(Ace) = 4/52 = 1/13 ≈ 0.0769
There is a 7.69% chance that a randomly selected card will be an Ace.

Notice that the smallest possible probability is 0 – if there are no outcomes that correspond
with the event. The largest possible probability is 1 – if all possible outcomes correspond
with the event.

Certain and Impossible events


An impossible event has a probability of 0.
A certain event has a probability of 1.
The probability of any event must be 0 ≤ P(E) ≤ 1

In the course of this chapter, if you compute a probability and get an answer that is negative
or greater than 1, you have made a mistake and should check your work.

Try it now 2:
If you are rolling a single die, list an event that would have a probability of:
a. 0 b. 0.5 c. 1 d. between 0.5 and 1
and explain why.

Working with Events

Complementary Events
Now let us examine the probability that an event does not happen. As in the previous section,
consider the situation of rolling a six-sided die and first compute the probability of rolling a
six: the answer is P(six) =1/6. Now consider the probability that we do not roll a six: there
are 5 outcomes that are not a six, so the answer is P(not a six) = 5/6. Notice that
P(six) + P(not a six) = 1/6 + 5/6 = 6/6 = 1

This is not a coincidence. Consider a generic situation with n possible outcomes and an
event E that corresponds to m of these outcomes. Then the remaining n - m outcomes
correspond to E not happening, thus
P(not E) = (n − m)/n = n/n − m/n = 1 − m/n = 1 − P(E)

Complement of an Event
The complement of an event E is the event “E doesn’t happen.”
The notation Ē is used for the complement of event E.
We can compute the probability of the complement using P(Ē) = 1 − P(E).
Notice also that P(E) = 1 − P(Ē).

Example 5: Heart vs. Not Heart


If you pull a random card from a deck of playing cards, what is the probability it is not a
heart?

There are 13 hearts in the deck, so P(heart) = 13/52 = 1/4.
The probability of not drawing a heart is the complement:
P(not heart) = 1 − P(heart) = 1 − 1/4 = 3/4
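
The complement rule can be confirmed by building the deck and counting both ways. The deck construction below (rank and suit names, the `probability` helper) is an illustrative sketch of the standard 52-card deck described earlier.

```python
from fractions import Fraction
from itertools import product

RANKS = ["Ace"] + [str(n) for n in range(2, 11)] + ["Jack", "Queen", "King"]
SUITS = ["hearts", "spades", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def probability(cards):
    """P(E) = n(E) / n(S) for one card drawn at random from the deck."""
    return Fraction(len(cards), len(DECK))

hearts = [card for card in DECK if card[1] == "hearts"]
not_hearts = [card for card in DECK if card[1] != "hearts"]

p_heart = probability(hearts)
p_not_heart_by_complement = 1 - p_heart          # complement rule
p_not_heart_by_counting = probability(not_hearts)  # direct count

print(p_heart)                    # 1/4
print(p_not_heart_by_complement)  # 3/4
print(p_not_heart_by_complement == p_not_heart_by_counting)  # True
```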

Understanding the wording we use in describing an event is critical to ensuring that we
calculate the probability correctly. Recall the Boolean operators “AND” and “OR” and how
they define the sets on which we focus. When conditions are connected with AND, both
conditions must be true for the outcome to be part of the event. When conditions are connected
with OR, if either condition is true (including both at the same time), then the outcomes are
part of the event.

Example 6: Decks of Cards with AND and OR


Assuming you pull a random card from a standard deck of playing cards, find the following
probabilities:

a. P(Queen OR seven):
There are 4 Queens in the deck and 4 sevens in the deck, so there are 8 cards that meet either
condition of being a Queen or a seven out of 52 total cards,
P(Queen OR seven) = 8/52 = 2/13 = 0.153846... ≈ 15.4%

b. P(Queen AND seven):


There is no card that meets the conditions of being a Queen and a seven at the same time,
P(Queen AND seven) = 0/52 = 0%

c. P(heart OR Ace):
There are 13 hearts in the deck and 4 aces, one of which is the Ace of hearts, so there are 16
cards that meet either condition of being a heart or an Ace out of 52 total cards,
P(heart OR Ace) = 16/52 = 4/13 = 0.307692... ≈ 30.8%

d. P(heart AND Ace):


There is only one card in the deck that meets both conditions of being a heart and an Ace at
the same time, the Ace of hearts,
P(heart AND Ace) = 1/52 = 0.019230... ≈ 1.9%

A Venn diagram illustrates the sets involved in the probabilities and how the conditions
define the outcomes. Recall that AND is the intersection of the sets and OR is the union of
the sets.

[Figure: two Venn diagrams. Left: the set of Queens and the set of sevens, which do not overlap. Right: the set of hearts and the set of Aces, which overlap at the Ace of hearts.]

When we found the probability of a Queen or a seven, we counted the cards. We could also
find the probability using the probability of each set individually. The probability of drawing
a Queen from a standard deck of 52 cards is 4/52 since there are 4 Queens in the deck. The
probability of drawing a seven is also 4/52 since there are 4 sevens in the deck. If we add these
two probabilities, 4/52 + 4/52 = 8/52, we get the same probability we found by counting the cards
themselves. In the Venn diagram, we can see that there is no danger of overcounting in
adding the probabilities since the intersection between the sets is empty. A card cannot be
both a Queen and a seven at the same time. These two events are mutually exclusive.

If we try the addition technique for finding the probability of a heart or an Ace, we have to be
more careful. The probability of drawing a heart from a standard deck of 52 cards is 13/52 since
there are 13 hearts in the deck. The probability of drawing an Ace is 4/52 since there are 4 Aces in
the deck. If we add these two probabilities, we get 13/52 + 4/52 = 17/52, which is not the same
probability we found by counting the cards. If we look at the Venn diagram, we can see that there is a
card in the intersection between the two sets: the Ace of hearts. An Ace and a heart are not
mutually exclusive events, so adding the individual probabilities double counts this card in
the intersection. We can correct for this overcount by subtracting the probability of getting
the Ace of hearts, the card in the intersection: 13/52 + 4/52 - 1/52 = 16/52, and we have the same
result we found from counting the individual cards.

We can generalize this addition technique to find probabilities involving “OR”:

P(A or B)
The probability of either A or B occurring (or both) is
P(A or B) = P(A) + P(B) – P(A and B),
subtracting out the over-counted event which is in the intersection of the two sets A and
B.

If A and B are mutually exclusive events, then P(A and B) = 0, so
P(A or B) = P(A) + P(B)
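
As a rough check of the rule for P(A or B), the sketch below builds a 52-card deck in Python and compares direct counting with the formula for the heart-or-Ace case. The names `DECK`, `probability`, `is_heart`, and `is_ace` are illustrative choices, not part of the text.

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]


def probability(event):
    """P(event) = matching cards / total cards, for one random draw."""
    matches = [card for card in DECK if event(card)]
    return Fraction(len(matches), len(DECK))


def is_heart(card):
    return card[1] == "hearts"


def is_ace(card):
    return card[0] == "A"


# Method 1: count the cards that are a heart or an Ace directly.
p_or_counted = probability(lambda c: is_heart(c) or is_ace(c))

# Method 2: use P(A or B) = P(A) + P(B) - P(A and B).
p_and = probability(lambda c: is_heart(c) and is_ace(c))
p_or_rule = probability(is_heart) + probability(is_ace) - p_and

print(p_or_counted, p_or_rule)  # both print 4/13, which equals 16/52
```

Both methods agree because subtracting P(A and B) removes the Ace of hearts, which would otherwise be counted twice.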

Example 7: Kings or Queens


Suppose we randomly draw one card from a standard deck. What is the probability that we
get a Queen or a King?

There are 4 Queens and 4 Kings in the deck, hence 8 outcomes corresponding to a Queen or
King out of 52 possible outcomes. Thus the probability of drawing a Queen or a King is:
P(King or Queen) = 8/52
Note that in this case, there are no cards that are both a Queen and a King,
so P(King and Queen) = 0. Using our probability rule, we could have said:
P(King or Queen) = P(King) + P(Queen) - P(King and Queen) = 4/52 + 4/52 - 0 = 8/52

In the last example, the events King and Queen were mutually exclusive: a card cannot be
both a King and a Queen at the same time, so P(A or B) = P(A) + P(B).

Example 8: Red or Kings


Suppose we draw one card from a standard deck. What is the probability that we get a red
card or a King? Notice that these are not mutually exclusive: A card can be both red and a
King at the same time: the King of Hearts and the King of Diamonds.

Half the cards are red, so P(Red) = 26/52.
There are four kings, so P(King) = 4/52.
There are two red kings, so P(Red and King) = 2/52.
We can then calculate
P(Red or King) = P(Red) + P(King) - P(Red and King) = 26/52 + 4/52 - 2/52 = 28/52

Try it Now 3:
Suppose you are drawing a single card out of a standard 52 card deck. Find the following
probabilities:
a. P(spade or Jack)
b. P(spade and Jack)
c. P(2 or Jack)
d. P(2 and Jack)

Example 9: Empirical Probabilities


The table below shows the number of survey subjects who have received and not received a
speeding ticket in the last year, and the color of their car.
                Speeding ticket   No speeding ticket   Total
Red car                15                135            150
Not red car            45                470            515
Total                  60                605            665

Find the probability that a randomly chosen person:
a. Has a red car and got a speeding ticket
We can see that 15 people of the 665 surveyed had both a red car and got a speeding ticket,
P(red car and speeding ticket) = 15/665 ≈ 0.0226 ≈ 2.3%

b. Has a red car or got a speeding ticket.


We could answer this question by simply adding up the numbers: 15 people with red cars
and speeding tickets + 135 with red cars but no ticket + 45 with a ticket but no red car = 195
people,
P(red car or speeding ticket) = 195/665 ≈ 0.2932 ≈ 29.3%
We also could have found this probability by:
P(had a red car) + P(got a speeding ticket) - P(had a red car and got a speeding ticket)
= 150/665 + 60/665 - 15/665 = 195/665.

Conditional Probability
Often it is required to compute the probability of an event given that another event has
occurred. Consider the data from the previous example:

                Speeding ticket   No speeding ticket   Total
Red car                15                135            150
Not red car            45                470            515
Total                  60                605            665

Based on this data, the probability of a randomly chosen car getting a speeding ticket is
60/665 ≈ 9%, as there are 60 cars with speeding tickets out of 665 total cars. What is the
probability of a randomly chosen car getting a speeding ticket if we already know the car is
red? This condition (red car) changes the sample space to just the 150 red cars, and our
event will be the 15 cars that are both red and have speeding tickets, narrowing the data we
need to the first row of red car data in the table:

                Speeding ticket   No speeding ticket   Total
Red car                15                135            150
Not red car            45                470            515
Total                  60                605            665

P(speeding ticket, given that it’s a red car) = (15 red cars AND have speeding tickets) / (150 total red cars) = 15/150 = 10%

Notice how this ratio has the form P(B given A) = n(A and B) / n(A).

Conditional Probability
The probability that event B occurs, given that event A has happened, is represented as
P(B | A)
This is read as “the probability of B given A” and can be found by calculating the ratio:
P(B | A) = n(A and B) / n(A)
Notice how the sample space (denominator or “whole” of the part-whole relationship)
is defined by the condition, the event A that has already occurred.

Example 10: More Conditional Tickets


The table below shows the number of survey subjects who have received and not received a
speeding ticket in the last year, and the color of their car.
                Speeding ticket   No speeding ticket   Total
Red car                15                135            150
Not red car            45                470            515
Total                  60                605            665

Find the probability that a randomly chosen person:


a. Has a speeding ticket given they have a red car

We calculated this one in our discussion of conditional probability. Since we know the
person has a red car, we are only considering the 150 people in the first row of the table. Of
those, 15 have a speeding ticket, so
P(ticket | red car) = 15/150 = 1/10 = 0.1 = 10%
b. Has a red car given they have a speeding ticket
Since we know the person has a speeding ticket, we are only considering the 60 people in the
first column of the table. Of those, 15 have a red car, so
P(red car | ticket) = 15/60 = 1/4 = 0.25 = 25%

We can see from the last example that P(B | A) is not equal to P(A | B).
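
To see why the order of the condition matters, here is a minimal Python sketch of P(B | A) = n(A and B) / n(A) applied to the speeding-ticket counts. The helper `conditional` and the predicate names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# Survey counts from the table: (car color, ticket status) -> number of people.
counts = {
    ("red", "ticket"): 15,
    ("red", "no ticket"): 135,
    ("not red", "ticket"): 45,
    ("not red", "no ticket"): 470,
}


def conditional(event, given):
    """P(event | given) = n(given and event) / n(given)."""
    n_given = sum(n for key, n in counts.items() if given(key))
    n_both = sum(n for key, n in counts.items() if given(key) and event(key))
    return Fraction(n_both, n_given)


def is_red(key):
    return key[0] == "red"


def has_ticket(key):
    return key[1] == "ticket"


p_ticket_given_red = conditional(has_ticket, is_red)  # sample space: 150 red cars
p_red_given_ticket = conditional(is_red, has_ticket)  # sample space: 60 tickets
print(p_ticket_given_red, p_red_given_ticket)  # 1/10 1/4
```

The numerator is the same 15 people in both cases; only the denominator, the sample space set by the condition, changes.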

Example 11: Test Results


A home pregnancy test was given to women, then pregnancy was verified through blood
tests. The following table shows the home pregnancy test results.
                Positive test   Negative test   Total
Pregnant              70               4          74
Not Pregnant           5              14          19
Total                 75              18          93

Find
a. P(not pregnant | positive test result):
Since we know the test result was positive, we’re limited to the 75 women in the first
column, of which 5 were not pregnant.
P(not pregnant | positive test result) = 5/75 ≈ 0.067 ≈ 6.7%
b. P(positive test result | not pregnant):
Since we know the woman is not pregnant, we are limited to the 19 women in the second
row, of which 5 had a positive test.
P(positive test result | not pregnant) = 5/19 ≈ 0.263 ≈ 26.3%
The second result is what is usually called a false positive: a positive result when the woman
is not actually pregnant.

Try it now 4:
Consider the data from a survey about preferred ice cream flavors:
           Chocolate   Strawberry   Total
Male           38          62        100
Female         56          44        100
Total          94         106        200
Based on the data, if we randomly choose a person, what is:
a. P(they prefer chocolate | male) b. P(they are female | prefer chocolate)

Odds
The probability ratio has a part-whole form, where the numerator is the count of the “part”
that matches the event E we’re investigating and the denominator is the count of the “whole”
sample space S. Another way to express the likelihood of an event is odds. Odds use a
part-part format rather than part-whole, where the parts relate to the event E and its
complement, Ē, or “not E.”

Odds against an event


The odds against an event are the number of outcomes for the complement of the
event compared to the number of outcomes for the event:
n(Ē) : n(E), which can be thought of as “how many are not E to how many are E.”

The odds against an event can also be found using probabilities: P(Ē) / P(E)

Odds in favor of an event


The odds in favor of an event are the number of outcomes for the event compared to
the number of outcomes for the complement of the event:
n(E) : n(Ē), which can be thought of as “how many are E to how many are not E.”

The odds in favor of an event can also be found using probabilities: P(E) / P(Ē)

Notice that the number of outcomes for E and the number of outcomes for Ē (or not E) add up
to the number of outcomes in the entire sample space: n(E) + n(Ē) = n(S).

Applying odds to a die toss, we know that P(rolling a one) = 1/6, as there is 1 way to roll a
one out of 6 total outcomes. We can conclude there are 5 ways to not roll a one, so the odds
against rolling a one are 5 to 1. The odds in favor of rolling a one are 1 to 5. The odds in
favor are the reciprocal of the odds against, and the sum of the parts of the odds (1 + 5 = 6)
matches the total number of outcomes.

We can use the probabilities to calculate the odds by finding the ratio of the probabilities.
We know P(rolling a one) = 1/6 and P(not rolling a one) = 5/6. To find the odds against
rolling a one, we find the ratio of probabilities
P(not rolling a one) / P(rolling a one) = (5/6) / (1/6).
To simplify this ratio we multiply the numerator by the reciprocal of the denominator:
5/6 × 6/1 = 5/1, giving us the odds 5 to 1, the same result we found from counting outcomes.
The “whole” part of the probability ratios, the 6, cancels out, leaving us with the ratio of
the “parts.”
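
The ratio-of-probabilities approach can be sketched in a few lines of Python. The function names `odds_against` and `odds_in_favor` are illustrative, and the sketch assumes 0 < P(E) < 1 so that neither ratio divides by zero.

```python
from fractions import Fraction


def odds_against(p_event):
    """Odds against E as (not-E part, E part), computed as P(not E) / P(E)."""
    ratio = (1 - p_event) / p_event
    return ratio.numerator, ratio.denominator


def odds_in_favor(p_event):
    """Odds in favor of E as (E part, not-E part), computed as P(E) / P(not E)."""
    ratio = p_event / (1 - p_event)
    return ratio.numerator, ratio.denominator


p_one = Fraction(1, 6)       # P(rolling a one) on a six-sided die
print(odds_against(p_one))   # (5, 1), read as "5 to 1"
print(odds_in_favor(p_one))  # (1, 5), read as "1 to 5"
```

Because `Fraction` always reduces, the parts come back in lowest terms, which mirrors how the common “whole” cancels in the hand calculation.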

Example 12: Odds in a Bowl


Consider the bowl of poker chips. There are 5 white chips and
3 blue chips. If we randomly draw out a chip, what are the:

a. odds against drawing out a white chip


The odds against drawing out a white chip are the number of
ways you can draw a chip that is not white compared to the number of ways you can draw a chip
that is white. Since there are 3 chips that are not white and 5 chips that are white, the odds
against drawing out a white chip are 3 to 5, or 3:5.

b. odds in favor of drawing out a white chip


The odds in favor of drawing out a white chip are the reverse of the odds against. There are 5
chips that are white and 3 chips that are not white, so the odds in favor of drawing a white
chip are 5 to 3, or 5:3.

If we know the odds, we can create the probabilities. The odds tell us the “parts” in the
situation, and we can find the sum of the parts to determine the whole. Suppose the odds
against successfully navigating an asteroid field are 7020 to 1. These numbers tell us that
there are 7020 unsuccessful navigations for every 1 successful navigation of the asteroid field.
The sum of these parts, 7020 + 1 = 7021, gives us the number of outcomes in the “whole”
sample space. Thus the probability of successfully navigating an asteroid field is 1/7021, or
about 0.014%. Note that these numbers may actually represent a reduced fraction (the actual
probability may have been 2/14042 or some other equivalent fraction), but since probabilities
are ratios, they reduce to the same ratio.

Example 13: Odds to Probability


If we know that the odds in favor of drawing out a white chip from a bowl are 5:3, what is
P(drawing out a white chip)?

We have the “parts” 5:3. Since these are odds in favor, they start with n(E), so the white chip
“part” is 5 and the not-white chip “part” is 3. The sum of the parts gives us the
whole, so the total number of chips in the bowl is 5 + 3 = 8. We can conclude:
P(drawing out a white chip) = 5/8 = 0.625 = 62.5%.
We are assuming that the actual number of chips in the bowl is expressed in the odds, when
the odds could be a reduced ratio and the actual numbers of chips some multiple of the odds, such
as 5n and 3n. The sum of the parts is then 5n + 3n = 8n, and the probability ratio is 5n/8n, with
the common factor n canceling out, leaving us the same result of 5/8.

How do you tell odds from probability? The way we word them is often helpful. If we say
there is a “1 in 5 chance,” we are comparing “part to whole” and P(E) = 1/5. If we say we
have a “1 to 5 chance,” we are comparing “part to part” and these are odds (assumed in favor
for this example) and P(E) = 1/6.
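
Going from odds back to probability is just “part over sum of parts.” The sketch below uses two hypothetical helper functions to keep the two orderings (in favor versus against) from being mixed up.

```python
from fractions import Fraction


def probability_from_odds_in_favor(for_part, against_part):
    """Odds in favor are 'E part : not-E part'; P(E) = E part / whole."""
    return Fraction(for_part, for_part + against_part)


def probability_from_odds_against(against_part, for_part):
    """Odds against are 'not-E part : E part'; P(E) = E part / whole."""
    return Fraction(for_part, for_part + against_part)


print(probability_from_odds_in_favor(5, 3))    # 5/8  (white chip example)
print(probability_from_odds_in_favor(10, 6))   # 5/8  (same odds, not reduced)
print(probability_from_odds_against(7020, 1))  # 1/7021 (asteroid field)
print(probability_from_odds_in_favor(1, 5))    # 1/6  ("1 to 5" is not "1 in 5")
```

The second call shows that a multiple of the odds, such as 10:6, gives the same probability as 5:3.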

Try it now 5:
A roulette wheel has 38 spaces, divided into:
18 RED, 18 BLACK, numbered 1 – 36, and 2 GREEN, numbered 0, 00
Assuming a ball spun on the wheel has an equal chance of landing on any
of the spaces, determine the following:
a. P(lands on black, if there were no greens on the wheel)
b. P(lands on red, if there were no greens on the wheel)
c. Odds against landing on black, if there were no greens on the wheel
d. P(lands on black)
e. Odds against landing on black
Note that d and e assume the wheel has the greens included on it.

If you completed the Try it now successfully, you determined the odds against landing on
Black without greens on the wheel to be 18:18 which reduces to 1:1, and the odds against
landing on Black with greens on the wheel to be 20:18 which reduces to 10:9. Notice how
having the greens on the wheel skews the odds to favor the house (increases the odds against
the player). Without the greens, you have an even chance of winning or losing by betting on
black.

Payoff (or Payout) odds are used by gamblers, bettors, and casinos to express the ratio of
how much you win compared to how much you bet. If a particular horse has odds 20 to 1, it
means that should that horse win, you profit $20 for every $1 you bet. If you bet $100, you
would profit $2000 (you get your $100 back and $2000 more). Before you rush off to the
track, be aware that these odds usually reflect the actual odds of the horse winning, so a 20 to
1 horse has only a 1/21 chance of winning.

Looking at the game of roulette, the payoff odds for black are 1:1, so if you win, you profit
$1 for each $1 you bet. Without the greens on the wheel, the odds of winning on black would be
1:1, matching the payoff odds. With the greens, the odds of actually winning on black are 9:10,
not even but skewed toward the house. The house pays you off as if the game were even, but it is
not, so the house will make money. After all, a casino is a business, not a charity. How much
money can the casino expect to make? We can calculate this expected value.

Expected Value
Expected value, or mathematical expectation, is perhaps the most useful probability concept
we will discuss. It has many applications, from insurance policies to making financial
decisions. Casinos and government agencies that run gambling operations and lotteries hope
most people never learn about it!

Example 14: Expected Value of Roulette


[Photo; credit in footnote 1]

Recall the casino game roulette, in which a wheel with 38 spaces (18 red,
18 black, and 2 green) is spun. Suppose a player bets $1 on a
single number. If that number is spun on the wheel, then they
receive $36 (their original $1 + $35, as the payoff odds for a
single number are 35 to 1). Otherwise, they lose their $1. On
average, how much money should a player expect to win or lose
if they play this game repeatedly?

Suppose you bet $1 on each of the 38 spaces on the wheel, for a total of $38 bet. When the
winning number is spun, you are paid $36 on that number. While you won on that one
number, overall you’ve lost $2. On a per-space basis, you have “won” -$2/$38 ≈ -$0.053. In
other words, on average you lose 5.3 cents per space you bet on.

We call this average gain or loss the expected value of playing roulette. Notice that no one
ever loses exactly 5.3 cents: most people (in fact, about 37 out of every 38) lose $1 and a
very few people (about 1 person out of every 38) gain $35 (the $36 they win minus the $1
they spent to play the game).

There is another way to compute expected value without imagining what would happen if we
play every possible space. There are 38 possible outcomes when the wheel spins, so the
probability of winning is 1/38. The complement, the probability of losing, is 37/38.

Summarizing these along with the values, we get this table:


Outcome                          We win (ball lands on our space)   We lose (ball lands elsewhere)
Probability of outcome                        1/38                              37/38
Payoff of outcome (per $1 bet)                $35                               -$1

1 Photo CC-BY-SA http://www.flickr.com/photos/stoneflower/



Notice that if we multiply each payoff by its corresponding probability we get
$35 · 1/38 ≈ $0.9211 and -$1 · 37/38 ≈ -$0.9737, and if we add these numbers we get
$0.9211 + (-$0.9737) ≈ -$0.053, which is the expected value we computed above. This value
tells us we can expect to lose on average $0.053, or 5.3 cents, per game played. Once again, we
can’t actually lose 5.3 cents in a game (we lose our $1 bet or we win $35), but if we
continued to play roulette for 1000 games, betting $1 per game, we can expect to lose 5.3
cents per game on average, or ($0.053 per game) · (1000 games) = $53 overall.

Expected Value or Mathematical Expectation


Expected Value is the average gain or loss of an event if the procedure is repeated
many times. We can compute the expected value by multiplying the payoff of each outcome by
the probability of that outcome, then adding up the products.

Expected Value (EV) = P₁A₁ + P₂A₂ + P₃A₃ + …

where Pₙ is the probability of an outcome and Aₙ is the payoff for the outcome.
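
The expected value formula translates almost directly into code. The following is a minimal sketch, assuming each outcome is given as a (probability, payoff) pair; the function name `expected_value` and the check that the probabilities total 1 are additions made for illustration.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Sum of probability * payoff over a list of (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if total_probability != 1:
        raise ValueError("probabilities must add up to 1")
    return sum(p * payoff for p, payoff in outcomes)


# $1 bet on a single roulette number: win $35 with probability 1/38,
# lose $1 with probability 37/38.
roulette_single_number = [
    (Fraction(1, 38), 35),
    (Fraction(37, 38), -1),
]

ev = expected_value(roulette_single_number)
print(ev, round(float(ev), 3))  # -1/19 -0.053
```

The exact value -1/19 is the same as -2/38, matching the “bet on all 38 spaces and lose $2” reasoning in Example 14.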

Example 15: Dart at a Target


Suppose we’re playing a game where we randomly toss a dart at the target
shown. If we hit part A, we win $10. If we hit part B, we lose $5. What is
the expected value (average winnings/losses) over the long run?

[Figure: target divided into two regions labeled A and B]

We use the area of each region to determine the probability of hitting that
region by comparing what part of the whole target area it represents. Part A appears to be
about ¼ of the entire area, and part B appears to be about ¾ of the entire area. If we had
more data/dimensions, we could better determine the exact areas.

Summarizing the information in a table, we have:

Outcomes                 Hit A   Hit B
Probability of outcome    1/4     3/4
Payoff                    $10     -$5

Notice that since we lose $5 if we hit B, we express this loss as negative (-$5).
So the expected value is EV = (1/4)($10) + (3/4)(-$5) = -$1.25.
We would lose an average of $1.25 per game if we played this game a lot. Not a good idea.

Try it Now 6:
You purchase a raffle ticket to help out a charity. The raffle ticket costs $5. The charity is
selling 2000 tickets. One of them will be drawn and the person holding the ticket will be
given a prize worth $4000. Compute the expected value for this raffle.

In general, if the expected value of a game is negative, it is not a good idea to play the game,
since on average you will lose money. It would be better to play a game with a positive
expected value (good luck trying to find one!), although keep in mind that even if the
average winnings are positive it could be the case that most people lose money and one very
fortunate individual wins a great deal of money. If the expected value of a game is 0, we call
it a fair game, since neither side has an advantage.

Not surprisingly, the expected value for casino games is negative for the player, which is
positive for the casino. It must be positive or they would go out of business. Players just
need to keep in mind that when they play a game repeatedly, their expected value is negative.
That is fine so long as you enjoy playing the game and think it is worth the cost. But it
would be wrong to expect to come out ahead.

Expected value also has applications outside of gambling. Expected value is very common
in making insurance decisions.

Example 16: Insurance


A 40-year-old man in the U.S. has a 0.242% risk of dying during the next year 2. An
insurance company charges $275 for a life-insurance policy that pays a $100,000 death
benefit. What is the expected value for the person buying the insurance?

The probabilities and outcomes are summarized in the table. Notice how we used the
concept of complementary events to find the probability the man does not die for the other
outcome.
Outcomes                 Dies in the next year          Does not die in the next year
Probability of outcome   0.00242                        1 - 0.00242 = 0.99758
Payoff                   $100,000 - $275 = $99,725      -$275

The expected value is EV = (0.00242) ($99,725) + (0.99758) (-$275) = -$33.

Note that the expected value is negative; the insurance company can only afford to offer
policies if they, on average, make money on each policy. They can afford to pay out the
occasional benefit because they offer enough policies that those benefit payouts are balanced
by the rest of the insured people. For people buying the insurance, there is a negative
expected value, but there is a security that comes from insurance that is worth that cost.
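
The insurance calculation can be checked with a short Python sketch. It uses `Fraction` so the decimals are handled exactly; the variable names are illustrative, and the figures are the ones given in Example 16.

```python
from fractions import Fraction

p_dies = Fraction("0.00242")  # 0.242% risk of dying during the next year
premium = 275                 # cost of the policy in dollars
benefit = 100_000             # death benefit in dollars

# Each entry is (probability, payoff to the buyer).
outcomes = [
    (p_dies, benefit - premium),  # dies: receives the benefit, minus the premium paid
    (1 - p_dies, -premium),       # does not die: out the premium
]

ev_buyer = sum(p * payoff for p, payoff in outcomes)
print(ev_buyer)  # -33
```

From the company’s side the sign flips: under these numbers it expects to make about $33 per policy on average.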

Try it now 7:
Asphalt needs the right temperature for compaction.
Temperatures about 70ºF are more favorable than cooler
temperatures3. Suppose a crew can repair 50 potholes a
day when the temperature is above 70ºF but only 30 when
the temperature is below 70ºF. The forecast for the next
day has a 40% chance of being below 70ºF. What is the
expected number of potholes the crew can repair? Hint:
organize your data in a table.

2 According to the estimator at http://www.numericalexample.com/index.php?view=article&id=91


3 According to http://www.asphaltpavement.org/driveways

Multi-stage Events
Up to this point, our work with probability focused on single events, such as rolling a die or
drawing a card. Our next section looks at multi-stage events, such as rolling two dice or
drawing a flush in a five-card hand of poker. Calculating the probability of these events
becomes more complex, and we will explore different problem solving tools that will aid us
in these calculations, such as tree diagrams and counting principles.

Consider the game of Monopoly, where rolling two dice determines
your movement around the board. Rolling “doubles” (the same number
on both dice) can be a good thing or a bad thing. If you roll “doubles,”
you move the number of spaces indicated by the sum of the two dice,
and you also get to roll again. However, rolling doubles three times in
a row sends you to the “In Jail” space. You can get out of Jail by
rolling doubles when your turn comes around again.4 How likely is it
to actually roll doubles?

We know there is a 1/6 chance of rolling a particular number on a single die. How does the
1/6 chance come into play when we look at rolling a particular number on both dice? Listing
the sample space from tossing two dice can let us count the possible outcomes:

        1        2        3        4        5        6
1    (1, 1)   (1, 2)   (1, 3)   (1, 4)   (1, 5)   (1, 6)
2    (2, 1)   (2, 2)   (2, 3)   (2, 4)   (2, 5)   (2, 6)
3    (3, 1)   (3, 2)   (3, 3)   (3, 4)   (3, 5)   (3, 6)
4    (4, 1)   (4, 2)   (4, 3)   (4, 4)   (4, 5)   (4, 6)
5    (5, 1)   (5, 2)   (5, 3)   (5, 4)   (5, 5)   (5, 6)
6    (6, 1)   (6, 2)   (6, 3)   (6, 4)   (6, 5)   (6, 6)

From the chart above, we can see that there are 36 outcomes in the sample space, of which
“doubles” account for 6 of the outcomes, so the probability of rolling doubles on two dice is
6/36 = 1/6.
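
Listing the sample space by hand works for two dice, and the same count can be sketched in Python with `itertools.product`. The variable names here are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair (first die, second die), as in the chart above.
sample_space = list(product(range(1, 7), repeat=2))

# "Doubles" means both dice show the same number.
doubles = [(a, b) for a, b in sample_space if a == b]

p_doubles = Fraction(len(doubles), len(sample_space))
print(len(sample_space), len(doubles), p_doubles)  # 36 6 1/6
```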

Rolling a particular set of doubles, such as double fives, has a probability of 1/36, since there
is one way to roll double fives (5, 5) out of 36 total possible outcomes. If we compare the
probability of rolling a five on a single die to the probability of rolling double fives, or five
on one die AND five on the other die, we can conjecture a rule for finding the probability
that two events both occur:

P(5 on one die) = 1/6


P(5 on one die and 5 on the other die) = 1/36

Notice that 1/36 = (1/6)(1/6), which suggests that we multiply the probabilities. Throughout
this section, we explore the multiplicative nature of multi-stage or compound probability,
the probability of two or more events occurring.

4 From official Monopoly rules (2007 edition) at http://www.hasbro.com/common/instruct/00009.pdf



Probability of two independent events


Critical to our work with compound probability is the concept of independent events. When
you toss two dice, the outcome on one die does not influence the outcome of the other die. If
we roll a five on the first die, the chance of rolling a five on the second die does not change
(it is still 1/6). It’s not like one die knows what the other die is doing. These events are
independent.

Example 17: Flip a Coin and Roll a Die


Suppose we flipped a coin and rolled a die, and wanted to know the probability of getting a
head on the coin and a 6 on the die.

We could list all possible outcomes: {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}.
Notice there are 2 · 6 = 12 total outcomes. Out of these, only 1 is the desired outcome, {H6}, so
P(Heads on the coin and 6 on the die) = P(H6) = 1/12.
We can see the multiplicative aspect of these two events in this example as well, since
P(Heads on the coin) = 1/2 and P(6 on the die) = 1/6,
with P(Heads on the coin and 6 on the die) = 1/2 · 1/6 = 1/12, the same result as when we counted
the outcomes.

Like tossing two dice, flipping a coin and tossing a die are independent events. The flip of
the coin does not have an effect on the toss of the die.

Independent Events
Events A and B are independent events if the probability of Event B occurring is the
same whether or not Event A occurs.

Example 18: Independent vs. Dependent Events


Are these events independent?
a) A fair coin is tossed two times. The two events are (1) first toss is a head and (2) second
toss is a head.
The probability that a head comes up on the second toss is 1/2 regardless of whether or not a
head came up on the first toss, so these events are independent.

b) The two events (1) "It will rain tomorrow in Houston" and (2) "It will rain tomorrow in
Galveston” (a city near Houston).
These events are not independent because it is more likely that it will rain in Galveston on
days it rains in Houston than on days it does not.

c) You draw a card from a deck, then draw a second card without replacing the first.
The probability of the second card being red depends on whether the first card is red or not,
so these events are not independent.

As we’ve noticed in our previous examples, when two events are independent, the
probability of both occurring is the product of the probabilities of the individual events.

P(A and B) for independent events


If events A and B are independent, then the probability of both A and B occurring is

P(A and B) = P(A) · P(B)

where P(A and B) is the probability of events A and B both occurring, P(A) is the
probability of event A occurring, and P(B) is the probability of event B occurring
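
As a sanity check on the multiplication rule, the sketch below enumerates the coin-and-die sample space in Python and compares the counted probability with the product P(A) · P(B). The helper `probability` and the variable names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# All 12 equally likely (coin, die) outcomes, such as ("H", 6).
sample_space = list(product(coin, die))


def probability(event):
    """Fraction of equally likely outcomes for which event(outcome) is true."""
    matches = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(matches, len(sample_space))


p_heads = probability(lambda o: o[0] == "H")
p_six = probability(lambda o: o[1] == 6)
p_both = probability(lambda o: o[0] == "H" and o[1] == 6)

print(p_both, p_heads * p_six)  # 1/12 1/12
```

The two values agree because the coin flip and the die roll are independent; for dependent events the product of the individual probabilities would generally not match the counted result.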

Example 19: White Socks and Shirt?


In your drawer you have 10 pairs of socks, 6 of which are white, and 7 tee shirts, 3 of which
are white. If you randomly reach in and pull out a pair of socks and a tee shirt, what is the
probability both are white?

This question is asking us to calculate P(white socks and white tee shirt). Since socks and
tee shirts are two very different items and you are drawing one from each group, the
probability of getting a white shirt should not be changed by the type of socks you draw;
these events are independent.
The probability of choosing a white pair of socks: P(white socks) = 6/10.
The probability of choosing a white tee shirt: P(white tee shirt) = 3/7.
The probability of both being white: P(white socks and white tee shirt) = 6/10 · 3/7 = 18/70 = 9/35

Try it Now 8:
A card is pulled from a standard deck of playing cards and noted. The card is then replaced, the
deck is shuffled, and a second card is removed and noted. What is the probability that both
cards are Aces?

Tree diagrams and Compound Probability


When we considered the outcomes from tossing two dice, we listed all the outcomes for
rolling two dice in chart form, known as a Cartesian product. This chart method, where one
event’s outcomes are listed in columns and the other’s in rows, with the cells of the table
listing the ordered pairs from each column and row, is useful when working with two
independent events. We could also list all the outcomes using a tree diagram, another tool
we explore in this section. Tree diagrams provide us a graphic organizer to help account for
all the outcomes of our events, their probabilities, including independence and dependence,
and the final outcomes for compound events with their probabilities.

Suppose our experiment is a simple coin toss. What are the outcomes? H or T. We can
illustrate these outcomes with a tree diagram:
• Each event is a “level” or layer in the tree.
• Each “branch” in the tree level leads to a possible outcome for the event.
• We label the end of the “branch” with the outcome.
• We label the branches with their probabilities.
Notice how the branches here are labeled with 1/2, as the
probabilities of tossing heads and tails on a coin are both 1/2.
If we add the probabilities on the branches, they should total 1, since every outcome for the
event is represented.

With a simple event, a tree diagram may seem like overkill. With compound events, tree
diagrams are helpful and demystify calculating probabilities. Consider if we are tossing
TWO coins (or tossing a coin twice). We could list all the outcomes by thinking through
them in our head, or we could use the tree to build the list and make sure we have accounted
for all the possibilities.

Since there are two events, there are two levels in the tree, one for each coin. Notice how
the first level looks like the tree created for the simple event above. The second level has
the same outcomes and probabilities as the first level because the events are independent.
The second level repeats these outcomes as they follow from the outcomes on the first coin:
if you get H on the first coin, you could get H or T on the second coin, and if you get T on
the first coin, you could still get H or T on the second coin.

At the end of each branch at the bottom of the tree are the final outcomes for the sample
space of tossing two coins, which are combinations of the first level and the second level,
along with their probabilities, which are products of the probabilities of each level. Notice
how the probabilities along the bottom of the tree all add up to 1, since all outcomes are
represented. From this tree diagram we can see that the probability of tossing two coins and
both coming up tails is 1/4.

In our tree we can see there were four final outcomes, {HH,
HT, TH, TT} and each has a probability of 1/4. This makes
sense since when we toss coins, each outcome is equally
likely. We could also use a Cartesian product chart to list the
outcomes as shown at the right, and each outcome would still
be 1/4. The tree diagram becomes more useful than the chart
when the outcomes are not equally likely, and labeling the
probabilities accurately is critical.
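
The two-coin tree can also be checked by brute force. The short Python sketch below is only an illustration (the names `coin` and `outcomes` are our own choices, not part of the text): it multiplies down the branches to build each final outcome and confirms that the four probabilities total 1.

```python
from fractions import Fraction
from itertools import product

# One level of the tree: each outcome of a single toss and its branch probability.
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Two levels: pair every first-toss branch with every second-toss branch,
# multiplying the probabilities down the branches.
outcomes = {
    first + second: coin[first] * coin[second]
    for first, second in product(coin, repeat=2)
}

for outcome, probability in outcomes.items():
    print(outcome, probability)

print("Total:", sum(outcomes.values()))
```

Changing the values stored in `coin` (for an unfair coin, say) is enough to rebuild the whole tree, which is the situation where the tree is more useful than the chart.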

Instead of tossing two coins, suppose we spin the spinner below two times. There are three
outcomes, but the probability of each outcome is not 1/3, as they are
not equally likely. Using geometric probability, we can see that
P(A) = 1/4 since the region for A represents 1/4 of the total area.
Similarly, P(B) = 1/4 and P(C) = 2/4 or 1/2.

If we spin the spinner twice, is each spin independent? The outcome
of one spin has no effect on the outcome of the next spin, so the
spins are independent. When we build our tree, we need two levels,
one for each spin, and the probabilities are the same on each level.

Our first level represents the
first spin of the spinner, and
we have three outcomes, A, B
or C, with their probabilities.
The second level begins at the
end of each branch of the first
level with the same three
outcomes and their respective
probabilities. Notice how
there are nine final outcomes,
but their probabilities are not
1/9, as they are not equally
likely. The probabilities of
each of the final outcomes are found by multiplying down the branches of the tree that led to
that outcome. For example, the probability of spinning an A on both spins, P(AA), is found
by multiplying (1/4)(1/4) = 1/16. The sum of the probabilities of the final outcomes is 1.

With the tree, we can answer all sorts of probability questions about the events. We can find
the probability of spinning B on both spins, P(BB) = 1/16 by multiplying down the branches
that lead to BB. We can also find the probability that we spin the same letter on both spins.
This probability is more complicated, as it encompasses several outcomes. To get the same
letter on both spins, we can spin AA or BB or CC. Notice the use of the Boolean operator
“OR.” We apply the same rule we applied earlier in probability to calculate P(AA or BB or
CC): we add the probabilities:
P(AA or BB or CC) = 1/16 + 1/16 + 1/4 = 6/16 = 3/8
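
The spinner tree can be reproduced in a few lines of Python. This is a sketch under the probabilities stated above (1/4, 1/4 and 1/2); the variable names are our own. It multiplies down the branches for each of the nine final outcomes, then adds the ones where both spins show the same letter.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities for one spin, taken from the spinner's areas.
spin = {"A": Fraction(1, 4), "B": Fraction(1, 4), "C": Fraction(1, 2)}

# Multiply down the branches to get each of the nine final outcomes.
two_spins = {
    first + second: spin[first] * spin[second]
    for first, second in product(spin, repeat=2)
}

# Add the outcomes that satisfy "same letter on both spins" (AA or BB or CC).
p_same_letter = sum(p for outcome, p in two_spins.items() if outcome[0] == outcome[1])

print(two_spins["AA"])          # 1/16
print(p_same_letter)            # 3/8
print(sum(two_spins.values()))  # 1
```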

This example leads us to the following guidelines for working with tree diagrams:

Tree Diagrams: AND, OR:


When working with tree diagrams,
• we multiply the probabilities down the branches to find P(A and B)
• we add the probabilities across the levels to find P(A or B)

Tree diagrams can get complicated very quickly if the events have many
different outcomes. Remember the question about tossing two dice, and the
chart we created for it? If we draw a tree diagram, there will be two levels,
one for each die, with six branches on the first level, and six branches off the
ends of each of the first level’s branches, for a total of 36 final outcomes, as
shown in the margin. The tree is so complex that it was difficult to label the
branches with their probabilities! Luckily for us, the outcomes are equally
likely: each branch would be labeled 1/6, and each final outcome has
probability (1/6)(1/6) = 1/36.

[Margin figure: tree diagram for tossing two dice, with first-level branches 1 through 6 and final outcomes (1, 1) through (6, 6)]

If we are only looking for specific outcomes and their probabilities as opposed
to all the outcomes, we can make the tree simpler.

Example 20: Natural Blackjack
Suppose you are playing blackjack and are dealt two cards from a well-
shuffled deck without replacement. What is the probability you are dealt a
“natural blackjack,” which is a 10-value card and an Ace in any order?

We have two cards dealt, so our tree will have two levels. There are 52
outcomes possible for the first card, but we don’t need to draw 52 branches as
we are only interested in getting a natural blackjack, so we need either a 10-
card or an Ace for the first card (any other card and we won’t have a natural
blackjack). We only need branches for the outcomes of our event.

For the first card, P(10-card) = 16/52,
since each suit has four cards worth 10 in
blackjack (10, J, Q, K) and there are 4 suits,
giving 16 ten-value cards. P(Ace) =
4/52, as there are 4 aces out of 52 cards.

For the second card, the second level only has one
option we are interested in: the card that finishes the
blackjack. If we have a 10-card first, we need an Ace to
get the blackjack. There would still be 4 aces left in the
deck, but only 51 cards, as we have been dealt one
already, so P(Ace given that the first card is a 10-card) =
4/51.

If we have an Ace first, we need a 10-card to get the blackjack. There would still be 16 of
the 10-cards left, and 51 cards left, so P(10-card given that the first card is an Ace) = 16/51.

The final outcomes, which represent the ways to get a natural blackjack, are listed and we
multiply down the branches to get their probabilities. To answer the question we asked, what
is the probability of getting a 10-card and an Ace in any order, we need to find P(10A or
A10) and thus add the probabilities:
P(10A or A10) = 64/2652 + 64/2652 = 128/2652 = 32/663 ≈ 4.8%
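
The arithmetic on the simplified blackjack tree can be confirmed with exact fractions. The following Python sketch is only a check of the numbers in this example; the variable names are hypothetical.

```python
from fractions import Fraction

# First level: only the two branches that can lead to a natural blackjack.
p_ten_first = Fraction(16, 52)  # 16 ten-value cards out of 52
p_ace_first = Fraction(4, 52)   # 4 aces out of 52

# Second level: one card has been dealt, so the denominators drop to 51.
p_ace_given_ten = Fraction(4, 51)
p_ten_given_ace = Fraction(16, 51)

# Multiply down each branch (AND), then add the two branches (OR).
p_ten_then_ace = p_ten_first * p_ace_given_ten
p_ace_then_ten = p_ace_first * p_ten_given_ace
p_blackjack = p_ten_then_ace + p_ace_then_ten

print(p_blackjack)                   # 32/663
print(round(float(p_blackjack), 3))  # 0.048
```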

Tree Diagrams and Conditional Probability


The blackjack problem reminds us that sometimes we need to compute the probability of an
event given that another event has occurred. In multi-stage probability, conditional
probability often appears as we consider the second event (level on the tree) having occurred
after the first. If the events are dependent events, the probabilities on the second level of the
tree will change.

Example 21: Cards Without Replacement


What is the probability that two cards drawn at random from a deck of playing cards will
both be aces?

It might seem that you could use the formula for the
probability of two independent events and simply
multiply 4/52 · 4/52 = 1/169. This would be incorrect,
however, because the two events are not independent. If
the first card drawn is an ace, then the probability that the
second card is also an ace would be lower because there
would only be three aces left in the deck.

We can use a simplified tree diagram to illustrate the
events and their probabilities. Once the first card chosen
is an ace, the probability that the second card chosen is also an ace is called the conditional
probability of drawing an ace. In this case the "condition" is that the first card is an ace.
Symbolically, we write this as: P(ace on second draw | an ace on the first draw).

What is this probability? After an ace is drawn on the first draw, there are 3 aces out of 51
total cards left. This means that the conditional probability of drawing an ace after one ace
has already been drawn is 3/51 = 1/17.
Notice how the tree diagram’s second level has this probability labeled for the Ace outcome.
All of the outcomes in the second level of the tree have probabilities out of 51 for the same
reason: one card out of the 52 has already been drawn.
The probability of both cards being aces is P(AA) = 4/52 · 3/51 = 12/2652 = 1/221.
Notice that although our tree is simplified, all the possible outcomes are accounted for in the
final outcomes at the end of the branches, as you will get either both Aces, an Ace and
something else, or no Aces. The sum of the probabilities along the bottom of the tree is 1.
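
To see that the simplified tree really does account for every outcome, we can write out its four final outcomes and add them. This Python sketch assumes the tree has just two branches per level, “Ace” and “other” (our labels), with the second-level probabilities taken out of 51 cards.

```python
from fractions import Fraction

p_ace = Fraction(4, 52)
p_other = 1 - p_ace  # 48/52

# Second-level probabilities depend on what the first card was.
tree = {
    ("Ace", "Ace"): p_ace * Fraction(3, 51),
    ("Ace", "other"): p_ace * Fraction(48, 51),
    ("other", "Ace"): p_other * Fraction(4, 51),
    ("other", "other"): p_other * Fraction(47, 51),
}

print(tree[("Ace", "Ace")])  # 1/221
print(sum(tree.values()))    # 1
```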

Although we worked with dependent events in these examples, we still use the multiplicative
concept of multi-stage events; we just need to account for the dependence in our
probabilities. We’re still multiplying down the branches of the tree.

Conditional Probability Formula


If Events A and B are dependent events, then
P(A and B) = P(A) · P(B | A)

Example 22: In Spades


If you pull 2 cards out of a deck, what is the probability that both are spades?

The probability that the first card is a spade is 13/52.
The probability that the second card is a spade, given the first was a spade, is 12/51,
since there is one fewer spade in the deck and one fewer card overall.

The probability that both cards are spades
is 13/52 · 12/51 = 156/2652 ≈ 0.0588

Example 23: Diet Soda


Suppose that two-thirds of the population are on a diet at least occasionally. Of this group,
4/5 drink diet soda, while 1/2 of the rest of the population drink diet soda. Find the following
probabilities:
a. P(a person drinks diet soda)
b. P(a person diets, but doesn’t drink diet soda)

A tree diagram helps keep track of all the
information in this problem! Our first event is
whether or not a person diets. Since 2/3 of the
population diets, that makes the complement
1/3 for the population that is not on a diet.
Our second event is whether they drink diet
soda, which depends on whether or not they
are dieting. Since 4/5 of the dieters drink diet
soda, that leaves 1/5 for the complement. Similarly, 1/2 of the not dieting population drinks
diet soda, leaving 1/2 for the complement. We complete the tree by finding the final
outcomes and multiplying down the branches for their probabilities.

a. P(a person drinks diet soda): we need all the outcomes that fit this event, so any
outcomes with soda (S) in them: P(D,S or not,S) = 8/15 + 1/6 = 7/10 = 70%
b. P(a person diets, but doesn’t drink diet soda): we need all the outcomes that fit this event,
which is P(D, not) = 2/15 ≈ 13.3%
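
The diet soda tree translates directly into code. In this Python sketch (labels such as "diets" and "no soda" are our own), each final outcome is the product of the probabilities along its branches, and each question is answered by adding the outcomes that fit the event.

```python
from fractions import Fraction

p_diet = Fraction(2, 3)
p_soda_given_diet = Fraction(4, 5)
p_soda_given_no_diet = Fraction(1, 2)

# Multiply down the branches; complements fill in the remaining branches.
outcomes = {
    ("diets", "soda"): p_diet * p_soda_given_diet,
    ("diets", "no soda"): p_diet * (1 - p_soda_given_diet),
    ("no diet", "soda"): (1 - p_diet) * p_soda_given_no_diet,
    ("no diet", "no soda"): (1 - p_diet) * (1 - p_soda_given_no_diet),
}

# a. Add every outcome in which the person drinks diet soda.
p_drinks_soda = sum(p for (_, drink), p in outcomes.items() if drink == "soda")

# b. Only one outcome fits "diets, but doesn't drink diet soda".
p_diets_no_soda = outcomes[("diets", "no soda")]

print(p_drinks_soda)    # 7/10
print(p_diets_no_soda)  # 2/15
```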
15
Try it Now 9:
In your drawer you have 10 pairs of socks, 6 of which are white. If you reach in and
randomly grab two pairs of socks, what is the probability that both are white?

Counting
Counting? You already know how to count or you wouldn't be taking a college-level math
class, right? Well yes, but what we'll really be investigating here are ways of counting
efficiently. When we get to the probability situations a bit later in this chapter we will need
to count some very large numbers, like the number of possible winning lottery tickets. One
way to do this would be to write down every possible set of numbers that might show up on a
lottery ticket, but believe me: you don't want to do this.

Basic Counting
We start with some more reasonable sorts of counting problems in order to develop the ideas
that we will soon need.

Example 24: Eating Out


Suppose at a particular restaurant you have three choices for an appetizer (soup, salad or
breadsticks) and five choices for a main course (hamburger, sandwich, quiche, fajita or
pizza). If you are allowed to choose exactly one item from each category for your meal, how
many different meal options do you have?

Solution 1: One way to solve this problem would be to systematically list each possible
meal:
soup + hamburger soup + sandwich soup + quiche
soup + fajita soup + pizza salad + hamburger
salad + sandwich salad + quiche salad + fajita
salad + pizza breadsticks + hamburger breadsticks + sandwich
breadsticks + quiche breadsticks + fajita breadsticks + pizza

Assuming that we did this systematically and that we neither missed any possibilities nor
listed any possibility more than once, the answer would be 15. Thus you could go to the
restaurant 15 nights in a row and have a different meal each night.

Solution 2: Another way to solve this problem would be to list all the possibilities in a
Cartesian product table:
         hamburger      sandwich   quiche   fajita   pizza
soup     soup+burger
salad    salad+burger
bread    etc.

In each of the cells in the table we could list the corresponding meal: soup + hamburger in
the upper left corner, salad + hamburger below it, etc. But if we didn't really care what the
possible meals are, only how many possible meals there are, we could just count the number
of cells and arrive at an answer of 15, which matches our answer from the first solution. (It's
always good when you solve a problem two different ways and get the same answer!)

Solution 3: We already have two perfectly good solutions. Why do we need a third? The
first method was not very systematic, and we might easily have made an omission. The
second method was better, but suppose that in addition to the appetizer and the main course
we further complicated the problem by adding desserts to the menu: we've used the rows of
the table for the appetizers and the columns for the main courses—where will the desserts
go? We would need a third dimension, and since drawing 3-D tables on a 2-D page or
computer screen isn't terribly easy, we need a better way in case we have three categories to
choose from instead of just two.

So, back to the problem in the example. What else can we do? Let's draw a tree diagram:

In this case, we first drew five branches (one for each main course) and then for each of those
branches we drew three more branches (one for each appetizer). We count the number of
branches at the final level and get (surprise, surprise!) 15.

If we wanted, we could instead draw three branches at the first stage for the three appetizers
and then five branches (one for each main course) branching out of each of those three
branches.
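
Listing every path through the tree is exactly what a Cartesian product does, so the count can be checked by computer. This Python sketch uses the menu from the example; the variable names are our own.

```python
from itertools import product

appetizers = ["soup", "salad", "breadsticks"]
main_courses = ["hamburger", "sandwich", "quiche", "fajita", "pizza"]

# Each pair (appetizer, main course) is one path through the tree.
meals = list(product(appetizers, main_courses))

for appetizer, main_course in meals:
    print(appetizer, "+", main_course)

print(len(meals))  # 15
```

Swapping the order of the two lists corresponds to drawing the main courses on the first level instead; the number of final branches is the same either way.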

We know how to count possibilities using tables and tree diagrams. Imagine a game where
you have two decks of cards (with 52 cards in each deck) and you select one card from each
deck. Would you really want to draw a table or tree diagram to determine the number of
outcomes of this game?

Let's go back to the table (Cartesian product) solution of the meal problem. Notice that one
way to count the number of possible meals is simply to number each of the appropriate cells
in the table, as we have done above. Another way to count the number of cells in the table
would be to multiply the number of rows (3) by the number of columns (5) to get 15. We could
arrive at the same result without making a table at all: simply multiply the number of choices
for the appetizer (3) by the number of choices for the main course (5). We can also organize
our information using a “slot” diagram, drawing a “slot” for each category available
(appetizer, main course) and listing the amount in each category in the “slots,” then multiply
the values in the slots.
      3       *        5         = 15 possible meals
# Appetizers     # Main Courses

We generalize this technique as the basic counting rule:

Basic Counting Rule


If we are asked to choose one item from each of two separate categories where there
are m items in the first category and n items in the second category, then the total
number of available choices is m · n.
m * n
First Category Second Category

Example 25: Reading List


There are 21 novels and 18 volumes of poetry on a reading list for a college English course.
How many different ways can a student select one novel and one volume of poetry to read
during the quarter?

Using a “slot diagram:”
    21     *     18
 # Novels     # Poetry

There are 21 choices from the first category and 18 for the second, so there are 21 · 18 = 378
possibilities.

The Basic Counting Rule can be extended when there are more than two categories by
applying it repeatedly, as we see in the next example.

Example 26: More Restaurant Meals.


Suppose at a particular restaurant you have three choices for an appetizer (soup, salad or
breadsticks), five choices for a main course (hamburger, sandwich, quiche, fajita or pasta)
and two choices for dessert (pie or ice cream). If you are allowed to choose exactly one item
from each category for your meal, how many different meal options do you have?

Using a “slot diagram:”
      3       *       5        *      2
# Appetizers    # M. Courses     # Desserts

There are 3 choices for an appetizer, 5 for the main course and 2 for dessert, so there are
3 · 5 · 2 = 30 possibilities.
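
A slot diagram is just a product of the slot values, which makes it easy to sketch in Python. The helper `count_choices` below is a hypothetical name of our own; it multiplies any number of category sizes, and the last lines cross-check the three-course answer by listing every meal.

```python
from itertools import product
from math import prod

def count_choices(*category_sizes):
    """Basic Counting Rule: multiply the number of options in each category."""
    return prod(category_sizes)

print(count_choices(21, 18))   # 378 novel-and-poetry selections
print(count_choices(3, 5, 2))  # 30 three-course meals

# Cross-check the meal count by listing every appetizer/main/dessert combination.
appetizers = ["soup", "salad", "breadsticks"]
main_courses = ["hamburger", "sandwich", "quiche", "fajita", "pasta"]
desserts = ["pie", "ice cream"]
all_meals = list(product(appetizers, main_courses, desserts))
print(len(all_meals))          # 30
```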

Example 27: Quizzes


A quiz consists of 3 true-or-false questions. In how many ways can a student answer the
quiz?

Using a “slot diagram:”
      2       *       2       *       2
#Q1 Options     #Q2 Options     #Q3 Options

There are 3 questions. Each question has 2 possible answers (true or false), so the quiz may
be answered in 2 · 2 · 2 = 8 different ways. Recall that another way to write 2 · 2 · 2 is 2³,
which is much more compact.

Try it Now 10:


Suppose at a particular restaurant you have eight choices for an appetizer, eleven choices for
a main course and five choices for dessert. If you are allowed to choose exactly one item
from each category for your meal, how many different meal options do you have?

Permutations
In this section we will develop an even faster way to solve some of the problems we have
already learned to solve by other means. Let's start with a couple examples.

Example 28: MATH


How many different ways can the letters of the word MATH be rearranged to form a four-
letter code word?

This problem is a bit different. Instead of choosing one item from each of several different
categories, we are repeatedly choosing items from the same category (the category is: the
letters of the word MATH) and each time we choose an item we do not replace it, so there is
one fewer choice at the next stage: we have 4 choices for the first letter (say we choose A),
then 3 choices for the second (M, T and H; say we choose H), then 2 choices for the next
letter (M and T; say we choose M) and only one choice at the last stage (T).

Using a “slot diagram:”
       4        *        3        *        2        *        1
# 1st Letters     # 2nd Letters     # 3rd Letters     # 4th Letters

Thus there are 4 · 3 · 2 · 1 = 24 ways to spell a code word with the letters MATH.

In this example, we needed to calculate n · (n – 1) · (n – 2) ··· 3 · 2 · 1. This calculation
shows up often in mathematics; it is called the factorial of n and is written n!

Factorial
n! = n · (n – 1) · (n – 2) ··· 3 · 2 · 1

Example 29: Door Prizes


How many ways can five different door prizes be distributed among five people?

Using a “slot diagram:”
     5       *       4       *       3       *       2       *       1
#1st Person     #2nd Person     #3rd Person     #4th Person     #5th Person

There are 5 choices of prize for the first person, 4 choices for the second, and so on. The
number of ways the prizes can be distributed will be 5! = 5 · 4 · 3 · 2 · 1 = 120 ways.
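
Both factorial examples can be checked in Python. This is a small sketch using the standard library: `math.factorial` computes n!, and `itertools.permutations` lists the arrangements so they can be counted directly.

```python
from itertools import permutations
from math import factorial

# Every rearrangement of the letters M, A, T, H.
code_words = ["".join(letters) for letters in permutations("MATH")]

print(code_words[:6])   # the first few arrangements
print(len(code_words))  # 24
print(factorial(4))     # 24, the same count from 4 * 3 * 2 * 1
print(factorial(5))     # 120 ways to hand out the five door prizes
```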

Next we will consider some slightly different examples involving arrangements with more
objects than places to put them. As we explore them, consider how each situation relates to
the factorial concept.

Example 30: Charity Raffle


A charity benefit is attended by 25 people and three gift certificates are given away as door
prizes: one gift certificate is in the amount of $100, the second is worth $25 and the third is
worth $10. Assuming that no person receives more than one prize, how many different ways
can the three gift certificates be awarded?

Using a “slot diagram:”
    25      *     24      *     23
 1st Prize     2nd Prize     3rd Prize

Using the Basic Counting Rule, there are 25 choices for the person who receives the $100
certificate, 24 remaining choices for the $25 certificate and 23 choices for the $10 certificate,
so there are 25 · 24 · 23 = 13,800 ways in which the prizes can be awarded.

Example 31: Olympic Medals


Eight sprinters have made it to the Olympic finals in the 100-meter race. In how many
different ways can the gold, silver and bronze medals be awarded?

Using a “slot diagram:”
     8       *       7        *        6
Gold Medal     Silver Medal     Bronze Medal

Using the Basic Counting Rule, there are 8 choices for the gold medal winner, 7 remaining
choices for the silver, and 6 for the bronze, so there are 8 · 7 · 6 = 336 ways the three medals
can be awarded to the 8 runners.

Note that in these preceding examples, the gift certificates and the Olympic medals were
awarded without replacement; that is, once we have chosen a winner of the first door prize or
the gold medal, they are not eligible for the other prizes. Thus, at each succeeding stage of
the solution there is one fewer choice (25, then 24, then 23 in the first example; 8, then 7,
then 6 in the second). Contrast this with the situation of a multiple choice test, where there
might be five possible answers (A, B, C, D or E) for each question on the test, and every
answer remains available on every question.

In addition, the order of selection was important in each example: for the three door prizes,
being chosen first means that you receive substantially more money; in the Olympics
example, coming in first means that you get the gold medal instead of the silver or bronze. In
each case, if we had chosen the same three people in a different order there might have been
a different person who received the $100 prize, or a different gold medalist. Contrast this
with the situation where we might draw three names out of a hat to each receive a $10 gift
certificate; in this case the order of selection is not important since each of the three people
receives the same prize. Situations where the order is not important will be discussed in the
next section.

We can use factorials to generalize the situation in the examples above to any problem
without replacement where the order of selection is important. Consider 8 Olympic sprinters
vying for 3 medals. If we use 8!, that’s like giving 8 medals to the sprinters:

8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 8!
1st 2nd 3rd 4th 5th 6th 7th 8th

But we don’t have 8 medals, only 3, so our slot diagram has (8 – 3) = 5 extra slots that we
don’t have medals for:
 8  *  7  *  6  *  5  *  4  *  3  *  2  *  1
1st   2nd   3rd   4th   5th   6th   7th   8th

Notice what values are in those extra slots: 5*4*3*2*1 = 5!, which matches the 5 extra slots
we don’t have medals for! So we could calculate the answer by dividing out these extra
slots, that is, by dividing by 5!:
(8*7*6*5*4*3*2*1)/(5*4*3*2*1) = 8!/5! = 8*7*6

If we are arranging in order r items out of n possibilities (instead of 3 out of 8 as in the
previous example), the number of possible arrangements can be found by finding n!, which is
the number of possible arrangements using all n items, and dividing by (n – r)!, which divides
out the extra slots we don’t have since we are only ordering r of the items: n!/(n – r)!

After doing so, we’ll be left with:
n · (n – 1) · (n – 2) ··· (n – r + 1)
If you don't see why (n – r + 1) is the right number to use for the last factor, just think back
to Example 30, where we calculated 25 · 24 · 23 to get 13,800. In this
case n = 25 and r = 3, so n – r + 1 = 25 – 3 + 1 = 23, which is exactly the right number for
the final factor.

Now, why would we want to use this complicated formula when it's actually easier to use the
Basic Counting Rule, as we did in the first two examples? We won't actually use this
formula all that often; we only developed it so that we could attach a special notation and a
special definition to this situation where we are choosing r items out of n possibilities without
replacement and where the order of selection is important. In this situation we write:

Permutations
We say that there are nPr permutations of size r that may be selected from among n
choices without replacement when order matters.
nPr = n!/(n – r)! = n · (n – 1) · (n – 2) ··· (n – r + 1)

We usually use technology rather than factorials or repeated multiplication to compute
permutations.

Example 32: Displaying Paintings


I have nine paintings and have room to display only four of them at a time on my wall. How
many different ways could I do this?

We can use a “slot diagram” to solve this problem, or recognize we are choosing 4 paintings
out of 9 without replacement where the order of selection is important, and use a
permutation. We have 9 total paintings (n = 9), and are arranging only 4 of them (4 “slots,”
so r = 4):
9P4 = 9 · 8 · 7 · 6 = 3,024 permutations.
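
As one example of using technology, Python's standard library includes `math.perm` (available in Python 3.8 and later). The sketch below compares it with the factorial formula; `n_p_r` is a hypothetical helper name of our own.

```python
from math import factorial, perm

def n_p_r(n, r):
    """Permutations of r items chosen from n, using n!/(n - r)!."""
    return factorial(n) // factorial(n - r)

print(n_p_r(9, 4), perm(9, 4))  # 3024 3024   (displaying paintings)
print(n_p_r(25, 3))             # 13800       (charity raffle)
print(n_p_r(8, 3))              # 336         (Olympic medals)
```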

Example 33: Executive Committee… with positions


How many ways can a four-person executive committee (president, vice-president, secretary,
treasurer) be selected from a 16-member board of directors of a non-profit organization?

We want to choose 4 people out of 16 without replacement and where the order of selection
is important. So the answer is 16P4 = 16 · 15 · 14 · 13 = 43,680.

Try it Now 11:


How many 5 character passwords can be made using the letters A through Z
a. if letters can repeat
b. if letters cannot repeat

Combinations
In the previous section we considered the situation where we chose r items out of n
possibilities without replacement and where the order of selection was important. We now
consider a similar situation in which the order of selection is not important.

Example 34: Charity Prizes


A charity benefit attended by 25 people gives away three $50 gift certificates
as door prizes. Assuming no person receives more than one prize, how many different ways
can the gift certificates be awarded?

Using the Basic Counting Rule, there are 25 choices for the first person, 24 remaining
choices for the second person and 23 for the third, so there are 25 · 24 · 23 = 13,800 ways to
choose three people. Suppose for a moment that Abe is chosen first, Bea second and Cindy
third; this is one of the 13,800 possible outcomes. Another way to award the prizes would be
to choose Abe first, Cindy second and Bea third; this is another of the 13,800 possible
outcomes. But either way Abe, Bea and Cindy each get $50, so it doesn't really matter the
order in which we select them. In how many different orders can Abe, Bea and Cindy be
selected? It turns out there are 6:

ABC ACB BAC BCA CAB CBA

How can we be sure that we have counted them all? We are really just choosing 3 people out
of 3, so there are 3 · 2 · 1 = 6 ways to do this; we didn't really need to list them all, we can
just use permutations!

So, out of the 13,800 ways to select 3 people out of 25, six of them involve Abe, Bea and
Cindy. The same argument works for any other group of three people (say Abe, Bea and
David or Frank, Gloria and Hildy) so each three-person group is counted six times. Thus the
13,800 figure is six times too big. The number of distinct three-person groups will be
13,800/6 = 2300.
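
The “six times too big” argument can be checked by counting both ways. In this Python sketch the 25 attendees are represented by the numbers 0 through 24 (an arbitrary stand-in of our own): ordered selections are counted with `itertools.permutations` and unordered groups with `itertools.combinations`.

```python
from itertools import combinations, permutations
from math import factorial

people = range(25)  # stand-ins for the 25 attendees

ordered_selections = sum(1 for _ in permutations(people, 3))
orders_per_group = factorial(3)  # ABC, ACB, BAC, BCA, CAB, CBA
distinct_groups = sum(1 for _ in combinations(people, 3))

print(ordered_selections)                      # 13800
print(orders_per_group)                        # 6
print(ordered_selections // orders_per_group)  # 2300
print(distinct_groups)                         # 2300
```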

We can generalize the situation in the example above to any problem of choosing a
collection of items without replacement where the order of selection is not important. If we
are choosing r items out of n possibilities (instead of 3 out of 25 as in the previous
example), the number of possible choices will be given by nPr / rPr, and we could use this
formula for computation; dividing by rPr divides out the repeated arrangements of the
same r items, since order doesn’t matter. This situation arises so frequently
that we attach a special notation and a special definition to it.

Combinations
nCr = nPr / rPr

We say that there are nCr combinations of size r that may be selected from among n
choices without replacement where order doesn’t matter.

We can also write the combinations formula in terms of factorials:


nCr = n! / ((n – r)! · r!)

Example 35: Student Council


A group of four students is to be chosen from a 35-member class to represent the class on the
student council. How many ways can this be done?

Since we are choosing 4 people out of 35 without replacement where the order of selection is
not important, there are 35C4 = (35 · 34 · 33 · 32)/(4 · 3 · 2 · 1) = 52,360 combinations.

Try it Now 12:


The United States Senate Appropriations Committee consists of 29 members; the Defense
Subcommittee of the Appropriations Committee consists of 19 members. Disregarding party
affiliation or any special seats on the Subcommittee, how many different 19-member
subcommittees may be chosen from among the 29 Senators on the Appropriations
Committee?

In the preceding Try it Now problem we assumed that the 19 members of the Defense
Subcommittee were chosen without regard to party affiliation. In reality this would never
happen: if one party is in the majority they would never let a majority of the other party sit on
(and thus control) any subcommittee. So let's consider the problem again, in a slightly more
complicated form:

Example 36: Partisan Politics


The United States Senate Appropriations Committee consists of 29 members, 15 Republicans
and 14 Democrats. The Defense Subcommittee consists of 19 members, 10 Republicans and
9 Democrats. How many different ways can the members of the Defense Subcommittee be
chosen from among the 29 Senators on the Appropriations Committee?

In this case we need to choose 10 of the 15 Republicans and 9 of the 14 Democrats. There
are 15C10 = 3003 ways to choose the 10 Republicans and 14C9 = 2002 ways to choose the 9
Democrats. But now what? How do we finish the problem?

Suppose we listed all of the possible 10-member Republican groups on 3003 slips of red
paper and all of the possible 9-member Democratic groups on 2002 slips of blue
paper. How many ways can we choose one red slip and one blue slip? This is a job for the
Basic Counting Rule! We are simply making one choice from the first category and one
choice from the second category, just like in the restaurant menu problems from earlier.

Using a “slot diagram:”
      15C10       *       14C9
# for Republicans     # for Democrats

There must be 3003 · 2002 = 6,012,006 possible ways of selecting the members of the
Defense Subcommittee.
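
The two combinations and their product can be computed with `math.comb` (Python 3.8 and later). The sketch below also defines `n_c_r`, a hypothetical helper of our own that applies the factorial form of the combinations formula, to show the two approaches agree.

```python
from math import comb, factorial

def n_c_r(n, r):
    """Combinations of r items chosen from n, using n!/((n - r)! r!)."""
    return factorial(n) // (factorial(n - r) * factorial(r))

republican_groups = comb(15, 10)
democratic_groups = comb(14, 9)

# Basic Counting Rule: one choice from each category.
subcommittees = republican_groups * democratic_groups

print(republican_groups, n_c_r(15, 10))  # 3003 3003
print(democratic_groups, n_c_r(14, 9))   # 2002 2002
print(subcommittees)                     # 6012006
```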

Probability using Permutations and Combinations


We can use permutations and combinations to help us answer more complex probability
questions. It’s helpful to recall that probability expresses a “part to whole” relationship.

Example 37: PIN Probability


A 4-digit PIN number is selected. What is the probability that there are no repeated digits?

Consider the “whole:” all possible 4-digit PIN numbers.


There are 10 possible values for each digit of the PIN (namely: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9), so
there are 10 · 10 · 10 · 10 = 10⁴ = 10000 total possible PIN numbers.

Consider the “part” we’re interested in: 4-digit PIN numbers with no repeated digits.
To have no repeated digits, all four digits would have to be different, which is selecting
without replacement. We could either compute 10 · 9 · 8 · 7, or notice that this is the same
as the permutation 10P4 = 5040.

The probability of no repeated digits is the number of 4-digit PIN numbers with no repeated
digits (the “part”) divided by the total number of 4-digit PIN numbers (the “whole”).

P(no repeated digits in 4-digit PIN numbers) = “part”/“whole” = 10P4/10⁴ = 5040/10000 = 0.504 = 50.4%.
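
Because there are only 10,000 PINs, the “part” and the “whole” can both be counted by brute force. This Python sketch (variable names are our own) lists every 4-digit PIN and keeps those whose four digits are all different.

```python
from fractions import Fraction
from itertools import product

digits = "0123456789"

# The "whole": every 4-digit PIN, with repeated digits allowed.
all_pins = ["".join(p) for p in product(digits, repeat=4)]

# The "part": PINs whose four digits are all different.
no_repeats = [pin for pin in all_pins if len(set(pin)) == 4]

probability = Fraction(len(no_repeats), len(all_pins))

print(len(all_pins))       # 10000
print(len(no_repeats))     # 5040
print(float(probability))  # 0.504
```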

Example 38: Lottery


In a certain state's lottery, 48 balls numbered 1 through 48 are placed in a machine and six of
them are drawn at random. If the six numbers drawn match the numbers that a player had
chosen, the player wins $1,000,000. In this lottery, the order the numbers are drawn in
doesn’t matter. Compute the probability that you win the million-dollar prize if you purchase
a single lottery ticket.

Consider the “whole:” the total number of ways any six numbers can be drawn.
Since there is no stipulation that the 6 numbers (out of 48 total) be in any particular order, the
number of possible outcomes of the lottery drawing is 48C6 = 12,271,512.

Consider the “part:” the number of ways the six numbers on the player’s ticket could match
the six numbers drawn from the machine (the winning number).
Of all the possible outcomes, only one matches all six numbers on the player’s ticket, because
all 6 numbers need to come from the set of 6 winning numbers (6C6 = 1). So the probability
of winning the grand prize is:
P(winning the big prize) = “part”/“whole” = 6C6/48C6 = 1/12,271,512 ≈ 0.0000000815

Example 39: More Lottery


In the state lottery from the previous example, if five of the six numbers drawn match the
numbers that a player has chosen, the player wins a second prize of $1,000. Compute the
probability that you win the second prize if you purchase a single lottery ticket.

Consider the “whole:” the total number of ways any six numbers can be drawn.
As above, the number of possible outcomes of the lottery drawing is 48C6 = 12,271,512.

Consider the “part:” the total number of ways five out of six winning numbers can be drawn.
In order to win the second prize, five of the six numbers on the ticket must match five of the
six winning numbers; in other words, we must have five of the six winning numbers and one
of the 42 losing numbers. The number of ways to choose 5 out of the 6 winning numbers is
given by 6C5 = 6 and the number of ways to choose 1 out of the 42 losing numbers is given
by 42C1 = 42. The number of favorable outcomes is then given by the Basic Counting
Rule: 6C5 · 42C1 = 6 · 42 = 252. So the probability of winning the second prize is:

$$P(\text{winning the second prize}) = \frac{\text{part}}{\text{whole}} = \frac{({}_{6}C_5)({}_{42}C_1)}{{}_{48}C_6} = \frac{252}{12271512} \approx 0.0000205$$

Try it Now 13:


To play the Arizona Lottery Fantasy 5, you choose 5 numbers from 1 to 41. If you match all
5 numbers, you win the jackpot. If you match 4 out of 5 numbers, you win $500.
Find the probability of each.

Example 40: Card Hand


Compute the probability of randomly drawing five cards from a deck and getting exactly one
Ace.

Consider the “whole:” the total number of ways any five cards can be drawn.
In many card games (such as poker) the order in which the cards are drawn is not important
(since the player may rearrange the cards in his hand any way he chooses); in the problems
that follow, we will assume that this is the case unless otherwise stated. Thus we use
combinations to compute the possible number of 5-card hands, 52C5.

Consider the “part:” the total number of ways to draw one Ace and four other cards (none of
them Aces) from the deck.
Since there are four Aces and we want exactly one of them, there will be 4C1 ways to select
one Ace; since there are 48 non-Aces and we want 4 of them, there will be 48C4 ways to
select the four non-Aces. Now we use the Basic Counting Rule to calculate that there will be
4C1 · 48C4 ways to choose one ace and four non-Aces.

Putting this all together, we have:

$$P(\text{one Ace}) = \frac{\text{part}}{\text{whole}} = \frac{({}_{4}C_1)({}_{48}C_4)}{{}_{52}C_5} = \frac{778320}{2598960} \approx 0.299 = 29.9\%$$

Example 41: Pair of Aces


Compute the probability of randomly drawing five cards from a deck and getting exactly two
Aces.

The solution is similar to the previous example, except now we are choosing 2 Aces out of 4
and 3 non-Aces out of 48; the denominator remains the same:

$$P(\text{two Aces}) = \frac{({}_{4}C_2)({}_{48}C_3)}{{}_{52}C_5} = \frac{103776}{2598960} \approx 0.0399 = 3.99\%$$

It is useful to note that these card problems are remarkably similar to the lottery problems
discussed earlier.
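
To see how similar the card and lottery problems are, here is a small Python sketch in which one helper handles both. The function name p_exact_matches and its parameter names are our own, chosen only for illustration; it assumes we draw without replacement and that order does not matter.

```python
import math

def p_exact_matches(k, special, total, drawn):
    """P(exactly k of the `drawn` items come from the `special` group),
    drawing without replacement and ignoring order."""
    part = math.comb(special, k) * math.comb(total - special, drawn - k)
    whole = math.comb(total, drawn)
    return part / whole

# Card hands: 4 Aces in a 52-card deck, 5 cards drawn.
p_one_ace = p_exact_matches(1, special=4, total=52, drawn=5)
p_two_aces = p_exact_matches(2, special=4, total=52, drawn=5)

# Lottery: 6 winning numbers among 48 balls, 6 numbers on the ticket.
p_jackpot = p_exact_matches(6, special=6, total=48, drawn=6)
p_second_prize = p_exact_matches(5, special=6, total=48, drawn=6)

print(f"{p_one_ace:.3f}  {p_two_aces:.4f}")
print(f"{p_jackpot:.10f}  {p_second_prize:.7f}")
```

In both settings the "special" group (the Aces, or the winning numbers) plays the same role, so only the numbers passed in change.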

Try it Now 14:


Compute the probability of randomly drawing five cards from a deck of cards and getting
three Aces and two Kings.

Birthday Problem
Let's take a pause to consider a famous problem in probability theory:

Suppose you have a room full of 30 people. What is the probability that there is at
least one shared birthday?

Take a guess at the answer to the above problem. Was your guess fairly low, like around
10%? That seems to be the intuitive answer (30/365, perhaps?). Let's see if we should listen
to our intuition. Let's start with a simpler problem, however.

Example 42: Birthdays with Three


Suppose three people are in a room. What is the probability that there is at least one shared
birthday among these three people?

There are a lot of ways there could be at least one shared birthday. Fortunately there is an
easier way. We ask ourselves “What is the alternative to having at least one shared
birthday?” In this case, the alternative is that there are no shared birthdays. In other words,
the alternative to “at least one” is having none. Since these are complementary
events,

P(at least one) = 1 – P(none)

We will start, then, by computing the probability that there is no shared birthday. Let's
imagine that you are one of these three people. Your birthday can be anything without
conflict, so there are 365 choices out of 365 for your birthday. What is the probability that
the second person does not share your birthday? There are 365 days in the year (let's ignore
leap years) and removing your birthday from contention, there are 364 choices that will
guarantee that you do not share a birthday with this person, so the probability that the second
person does not share your birthday is 364/365. Now we move to the third person. What is
the probability that this third person does not have the same birthday as either you or the
second person? There are 363 days that will not duplicate your birthday or the second
person's, so the probability that the third person does not share a birthday with the first two is
363/365.

We want the second person not to share a birthday with you and the third person not to share
a birthday with the first two people, so we use the multiplication rule:
$$P(\text{no shared birthday}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \approx 0.9918$$

and then subtract from 1 to get

P(shared birthday) = 1 – P(no shared birthday) = 1 – 0.9918 = 0.0082.

This is a pretty small number, so maybe it makes sense that the answer to our original
problem will be small. Let's make our group a bit bigger.

Example 43: Birthdays with Five


Suppose five people are in a room. What is the probability that there is at least one shared
birthday among these five people?

Continuing the pattern of the previous example, the answer should be


$$P(\text{shared birthday}) = 1 - \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \cdot \frac{361}{365} \approx 0.0271$$

Note that we could rewrite this more compactly as


$$P(\text{shared birthday}) = 1 - \frac{{}_{365}P_5}{365^5} \approx 0.0271$$

which makes it a bit easier to type into a calculator or computer, and which suggests a nice
formula as we continue to expand the population of our group.

Example 44: Birthdays with Thirty


Suppose 30 people are in a room. What is the probability that there is at least one shared
birthday among these 30 people?

Here we can calculate


$$P(\text{shared birthday}) = 1 - \frac{{}_{365}P_{30}}{365^{30}} \approx 0.706$$

which gives us the surprising result that when you are in a room with 30 people there is a
70% chance that there will be at least one shared birthday!
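
The compact formula is easy to evaluate on a computer for any group size. The following Python sketch assumes, as above, 365 equally likely birthdays and no leap years; the function name p_shared_birthday is our own.

```python
import math

def p_shared_birthday(n, days=365):
    """P(at least one shared birthday among n people) = 1 - (days P n) / days^n."""
    return 1 - math.perm(days, n) / days ** n

for group_size in (3, 5, 30):
    print(group_size, round(p_shared_birthday(group_size), 4))
```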

If you like to bet, and if you can convince 30 people to reveal their birthdays, you might be
able to win some money by betting a friend that there will be at least two people with the
same birthday in the room anytime you are in a room of 30 or more people. (Of course, you
would need to make sure your friend hasn't studied probability!) You wouldn't be guaranteed
to win, but you should win more than half the time.

This is one of many results in probability theory that is counterintuitive; that is, it goes
against our gut instincts. If you still don't believe the math, you can carry out a
simulation. Just so you won't have to go around rounding up groups of 30 people, someone
has kindly developed a Java applet so that you can conduct a computer simulation. Go to this
web page: http://www-stat.stanford.edu/~susan/surprise/Birthday.html, and once the applet
has loaded, select 30 birthdays and then keep clicking Start and Reset. If you keep track of
the number of times that there is a repeated birthday, you should get a repeated birthday
about 7 out of every 10 times you run the simulation.
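
If the applet is not available, a simulation along the same lines can be sketched in a few lines of Python. The function name, the number of trials, and the seed below are our own choices, and birthdays are assumed to be equally likely over 365 days.

```python
import random

def simulate_shared_birthday(n_people=30, trials=10_000, seed=0):
    """Estimate P(at least one shared birthday) by repeated random trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Give each person a random birthday, numbered 0 through 364.
        birthdays = [rng.randrange(365) for _ in range(n_people)]
        # A repeated birthday means fewer distinct values than people.
        if len(set(birthdays)) < n_people:
            hits += 1
    return hits / trials

estimate = simulate_shared_birthday()
print(estimate)
```

The estimate will vary a little with the seed and the number of trials, but it should land close to the 0.706 computed above.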

Try it Now 15:


Suppose 10 people are in a room. What is the probability that there is at least one shared
birthday among these 10 people?

Example 45: Expected Value Revisited


In a certain state's lottery, 48 balls numbered 1 through 48 are placed in a machine and six of
them are drawn at random. If the six numbers drawn match the numbers that a player had
chosen, the player wins $1,000,000. If they match 5 numbers, they win $1,000. It costs $1
to buy a ticket. Find the expected value.

Earlier, we calculated the probability of matching all 6 numbers and the probability of
matching 5 numbers:
$$\frac{{}_{6}C_6}{{}_{48}C_6} = \frac{1}{12271512} \approx 0.0000000815 \quad \text{for all 6 numbers,}$$

$$\frac{({}_{6}C_5)({}_{42}C_1)}{{}_{48}C_6} = \frac{252}{12271512} \approx 0.0000205 \quad \text{for 5 numbers.}$$

Our probabilities and outcome values (payoffs are net of the $1 ticket cost) are:


Outcome        Match 6 Numbers    Match 5 Numbers    Lose! (all others)
Probability    1/12271512         252/12271512       12271259/12271512
Payoff         $999,999           $999               -$1

The expected value, then, is:

$$\$999{,}999 \cdot \frac{1}{12271512} + \$999 \cdot \frac{252}{12271512} + (-\$1) \cdot \frac{12271259}{12271512} \approx -\$0.898$$

On average, one can expect to lose about 90 cents on a lottery ticket. Of course, most players
will lose $1.
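
The expected value is just a probability-weighted sum, which we can verify with exact fractions. This is a sketch using the numbers from the table; the variable names are our own.

```python
from fractions import Fraction

total = 12_271_512  # 48C6, the number of possible drawings

# (probability, net payoff in dollars) for each outcome
outcomes = [
    (Fraction(1, total), 999_999),         # match 6 numbers
    (Fraction(252, total), 999),           # match 5 numbers
    (Fraction(total - 253, total), -1),    # all other outcomes: lose the $1
]

# The probabilities must cover every outcome exactly once.
assert sum(p for p, _ in outcomes) == 1

expected_value = sum(p * payoff for p, payoff in outcomes)
print(round(float(expected_value), 3))
```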

Binomial Probability
Let’s consider a 10-question multiple choice quiz. If you randomly guess on each question,
what is the probability that you get 50% on the quiz? A score of 50% means 5 questions
correct (C) and 5 questions wrong (W). One possible graded quiz might look like:
CCCCCWWWWW.

What is the probability of getting this particular sequence? Assuming that each multiple choice
question has 4 options (a, b, c, d) and only one of the options is correct, each question has a 1/4
chance of being correct, and a 3/4 chance of being wrong. For this sequence, the probability would be:

P(first 5 C and last 5 W) = (1/4)(1/4)(1/4)(1/4)(1/4)(3/4)(3/4)(3/4)(3/4)(3/4) = (1/4)^5 (3/4)^5

But this is only one possible way of getting 50%. There are other ways, such as…
CWCWCWCWCW WCCWWCCWCW WCWWCWCCWC
CWWCCWWCCW WWWWWCCCCC WCWCWCWCWC

How many ways are there to get 5 questions correct out of 10 total? That’s a combinations
question, and can be found by calculating 10C5 = 252 different ways. Putting these pieces
together, the probability of getting 50% on a multiple choice test is: (252)(1/4)^5 (3/4)^5 ≈ 5.8%.

Our solution to this problem fit a particular pattern:


(# of different ways to get 5 out of 10 correct)(probability of C)^5 (probability of W)^5

This pattern generalizes for “x” out of “n” successes:


(# of different ways to get x out of n successes)(P(success))^x (P(not success))^(n – x)

which is called the binomial probability formula:

Binomial Probability
Probability of “x” successes in “n” trials = (nCx)(p^x)(q^(n – x))

where p = probability of success, and q = 1 – p = probability of failure


and applies when there are a finite number of independent trials for your event, each with the
same probability of success.

What is the probability that we could get an “A” on the 10-question multiple choice test,
assuming we randomly guess? We need a score of 9 out of 10 or 10 out of 10. Using the
binomial probability formula we have:
P(9 out of 10 correct) = (10C9)(1/4)^9 (3/4)^1 = 2.8610… × 10^-5 = 0.00002861…
P(10 out of 10 correct) = (10C10)(1/4)^10 (3/4)^0 = 9.5367… × 10^-7 = 0.00000095367…
Since either of these scores (9 correct OR 10 correct) gets us an “A,” we add the two
probabilities, and get: P(an “A” on the quiz) = 2.95639… × 10^-5 = 0.0000295639… or
0.00295639…%, which you could say is slim to none. The moral of the story: you’re not
likely to get an “A” by randomly guessing on a multiple choice quiz.
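
The binomial probability formula translates directly into code. Below is a minimal Python sketch that reproduces the quiz results; the function name binomial_probability is our own.

```python
import math

def binomial_probability(x, n, p):
    """P(exactly x successes in n independent trials) = (nCx)(p^x)(q^(n - x))."""
    q = 1 - p  # probability of failure
    return math.comb(n, x) * p ** x * q ** (n - x)

# Guessing on a 10-question quiz with 4 options per question: p = 1/4.
p_half_right = binomial_probability(5, 10, 1 / 4)

# An "A" means 9 correct OR 10 correct, so we add the two probabilities.
p_grade_a = binomial_probability(9, 10, 1 / 4) + binomial_probability(10, 10, 1 / 4)

print(round(p_half_right, 3), p_grade_a)
```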

Example 46: Free Throw Shots


The probability of a basketball player’s making a free throw successfully in a game is 2/3. If
the player attempts 10 free throws in a game, what is the probability that exactly 6 are made?

We could draw a tree for this problem, but since the player is taking 10 shots, that would be a
tree with 10 levels in it! Binomial probability can save us the tree here as there are a finite
number of shots (10 total) and we’ll assume they are independent.

We have 10 shots, we want to make 6, so 6 successes out of 10 total, leaving 4 failures. The
probability of making the shot is 2/3, so the probability of success = 2/3, while the
probability of failure = 1/3 (the complement).

So the probability that exactly 6 shots are made is:


P(6 out of 10 free throws) = (10C6)(2/3)^6 (1/3)^4 = 0.2276… or about 22.8%.

It may seem strange that the player makes 2/3 of his free throws but has only a 22.8% chance
of making 6 out of 10. That is because we are looking specifically at exactly 6 out of 10. A more
natural question would be the probability that the player makes 6 or more free throws
out of 10 attempts. We would find this by adding P(6 out of 10) + P(7 out of 10) + P(8
out of 10) + P(9 out of 10) + P(10 out of 10), as each of these would meet the event “6 or
more.” Calculating these would be similar to 6 out of 10; only the exponents and subscripts
would change. It turns out that the probability of making 6 or more is about 78.7%.
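
Adding up several binomial probabilities by hand is tedious, so here is a short Python sketch of the "6 or more" sum. The helper names are our own, and the shots are assumed to be independent with success probability 2/3.

```python
import math

def binomial_probability(x, n, p):
    """P(exactly x successes in n independent trials)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def at_least(k, n, p):
    """P(k or more successes): add P(k) + P(k + 1) + ... + P(n)."""
    return sum(binomial_probability(x, n, p) for x in range(k, n + 1))

p_exactly_6 = binomial_probability(6, 10, 2 / 3)
p_6_or_more = at_least(6, 10, 2 / 3)
print(round(p_exactly_6, 3), round(p_6_or_more, 3))
```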

Try it Now 16:


Two fair dice are rolled 5 times and the sum is recorded. What is the probability that a sum
of 7 occurred in exactly 2 of the 5 rolls? (Hint: to find the probability of a sum of 7, use the
chart from the rolling two dice example)

Reference: Probability Concepts Flow Chart

PART to WHOLE: Single event or multi-stage event?

Single Event: “AND” vs. “OR”
    AND (INTERSECTION): All conditions must be true
    OR (UNION): Add the probabilities, subtract any “overlap”
    “GIVEN THAT”: redefines the “whole” to match the condition

Multi-stage Event: How many events?
    2 Events: Draw a tree diagram; be careful with dependent events
    More than 2 events: with or without replacement? Are the events independent
    (with replacement) or dependent (without)?
        Without Replacement (Dependent Events): Use counting principles
        With Replacement (Independent Events): Use Binomial Probability

Reference: Counting Principles Flow Chart

Are you selecting from the same set or separate sets?

The same set: Does order matter?
    YES: it’s a PERMUTATION (use nPr)
    NO: it’s a COMBINATION (use nCr)

Separate sets: Use the counting principle: MULTIPLY the number of possibilities in each set
Statistics 1

“Florence Nightingale believed—and in all the actions of her life acted upon that belief—
that the administrator could only be successful if he were guided by statistical knowledge.
The legislator—to say nothing of the politician—too often failed for want of this
knowledge.”—K. Pearson1

Statistics
Like most people, you probably feel that it is important to "take control of your life." But
what does this mean? Partly it means being able to properly evaluate the data and claims that
bombard you every day. If you cannot distinguish sound from faulty reasoning, then you are
vulnerable to manipulation and to decisions that are not in your best interest. By choosing to
remain ignorant of methods of analysis, you grant control of your life to someone else, a
grave danger in the current information age. Statistics provides tools that you need in order
to react intelligently to information you hear or read. In this sense, Statistics is one of the
most important things that you can study.

Here are some claims that we have heard on several occasions. We are not saying that each
one of these claims is true!
• 4 out of 5 dentists recommend Trident.
• Almost 85% of lung cancers in men and 45% in women are tobacco-related.
• Condoms are effective 94% of the time.
• Native Americans are significantly more likely to be hit crossing the streets than are
  people of other ethnicities.
• People tend to be more persuasive when they look others directly in the eye and speak
  loudly and quickly.
• Women make 75 cents to every dollar a man makes when they work the same job.
• A surprising new study shows that eating egg whites can increase one's life span.
• People predict that it is very unlikely there will ever be another baseball player with a
  batting average over .400.
• There is an 80% chance that in a room full of 30 people at least two people will
  share the same birthday.
• 79.48% of all statistics are made up on the spot.

All of these claims are statistical in character. We suspect that some of them sound familiar;
if not, we bet that you have heard other claims like them. Notice how diverse the examples
are; they come from psychology, health, law, sports, business, etc. Indeed, data and data-
interpretation show up in discourse from virtually every facet of contemporary life.

Statistics are often presented in an effort to add credibility to an argument or advice. You can
see this by paying attention to television advertisements. Many of the numbers thrown about
in this way do not represent careful statistical analysis. They can be misleading, and push you
into decisions that you might find cause to regret. For these reasons, learning about statistics
is a long step towards taking control of your life. It is not, of course, the only step needed for
this purpose. These sections will help you learn statistical essentials and help you become an
intelligent consumer of statistical claims.

1 K. Pearson, The Life, Letters and Labours of Francis Galton, vol. 2, 1924, quoted at
http://math.furman.edu/~mwoodard/mquot.html (Mathematical Quotation Server, Furman University)
© David Lippman, Jeff Eldridge, and www.onlinestatbook.com, Laurel Clifford Creative Commons BY-SA

You can take the first step right away. To be an intelligent consumer of statistics, your first
reflex must be to question the statistics that you encounter. The British Prime Minister
Benjamin Disraeli is often credited with saying, "There are three kinds of lies: lies, damned lies, and
statistics." This quote reminds us why it is so important to understand statistics. So let us
invite you to reform your statistical habits from now on. No longer will you blindly accept
numbers or findings. Instead, you will begin to think about the numbers, their sources, and
most importantly, the procedures used to generate them.

We have put the emphasis on defending ourselves against fraudulent claims wrapped up as
statistics. Just as important as detecting the deceptive use of statistics is the appreciation of
the proper use of statistics. You must also learn to recognize statistical evidence that
supports a stated conclusion. When a research team is testing a new treatment for a disease,
statistics allows them to conclude based on a relatively small trial that there is good evidence
their drug is effective. Statistics allowed prosecutors in the 1950s and 60s to demonstrate
that racial bias existed in jury panels. Statistics are all around you, sometimes used well,
sometimes not. We must learn how to distinguish the two cases.

Populations and samples


Before we begin gathering and analyzing data we need to characterize the population we are
studying. If we want to study the amount of money spent on textbooks by a typical first-year
college student, our population might be all first-year students at your college. Or it might
be:
All first-year community college students in the state of Arizona.
All first-year students at public colleges and universities in the state of Arizona.
All first-year students at all colleges and universities in the state of Arizona.
All first-year students at all colleges and universities in the entire United States.
And so on.

Population
The population of a study is the group the collected data is intended to describe.

Sometimes the intended population is called the target population, since if we design our
study badly, the collected data might not actually be representative of the intended
population.

Why is it important to specify the population? We might get different answers to our
question as we vary the population we are studying. First-year students at the University of
Arizona might take slightly more diverse courses than those at your college, and some of
these courses may require less popular textbooks that cost more; or, on the other hand, the
University Bookstore might have a larger pool of used textbooks, reducing the cost of these
books to the students. Whichever the case (and it is likely that some combination of these
and other factors is in play), the data we gather from your college will probably not be the
same as that from the University of Arizona. Particularly when conveying our results to
others, we want to be clear about the population we are describing with our data.

Example 1: Identifying the Population


A newspaper website contains a poll asking people their opinion on a recent news article.
What is the population?

While the target (intended) population may have been all people, the real population of the
survey is readers of the website.

If we were able to gather data on every member of our population, say the average (we will
define "average" more carefully in a subsequent section) amount of money spent on
textbooks by each first-year student at your college during the 2009-2010 academic year, the
resulting number would be called a parameter.

Parameter
A parameter is a value (average, percentage, etc.) calculated using all the data from a
population.

We seldom see parameters, however, since surveying an entire population is usually very
time-consuming and expensive, unless the population is very small or we already have the
data collected.

Census
A survey of an entire population is called a census.

You are probably familiar with two common censuses: the official government Census that
attempts to count the population of the U.S. every ten years, and voting, which asks the
opinion of all eligible voters in a district. The first of these demonstrates one additional
problem with a census: the difficulty in finding and getting participation from everyone in a
large population, which can bias, or skew, the results.

There are occasionally times when a census is appropriate, usually when the population is
fairly small. For example, if the manager of Starbucks wanted to know the average number
of hours her employees worked last week, she should be able to pull up payroll records or ask
each employee directly.

Since surveying an entire population is often impractical, we usually select a sample to
study.

Sample
A sample is a smaller subset of the entire population, ideally one that is fairly
representative of the whole population.

We will discuss sampling methods in greater detail in a later section. For now, let us assume
that samples are chosen in an appropriate manner. If we survey a sample, say 100 first-year
students at your college, and find the average amount of money spent by these students on
textbooks, the resulting number is called a statistic.

Statistic
A statistic is a value (average, percentage, etc.) calculated using the data from a
sample.
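
To make the distinction concrete, here is a small Python sketch. The spending figures are randomly generated stand-ins, not real data, and the population size of 2,000 students is our own assumption.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical textbook spending (in dollars) for every first-year student.
population = [round(rng.uniform(200, 900), 2) for _ in range(2000)]

# A parameter is calculated from ALL the data in the population.
parameter = mean(population)

# A statistic is calculated from a sample, here 100 randomly chosen students.
sample = rng.sample(population, 100)
statistic = mean(sample)

print(round(parameter, 2), round(statistic, 2))
```

The two averages are usually close but not identical, which is why we keep track of whether a number describes the whole population or only a sample.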

Example 2: Statistics vs. Parameters


A researcher wanted to know how citizens of Tacoma felt about a voter initiative. To study
this, she went to the Tacoma Mall, randomly selected 500 shoppers, and asked them their
opinion. 60% indicated they were supportive of the initiative. What are the sample and
population? Is the 60% value a parameter or a statistic?

The sample is the 500 shoppers questioned. The population is less clear. While the intended
population of this survey was Tacoma citizens, the effective population was mall shoppers.
There is no reason to assume that the 500 shoppers questioned would be representative of all
Tacoma citizens.

The 60% value was based on the sample, so it is a statistic.

Try it Now 1:
To determine the average length of trout in a lake, researchers catch 20 fish and measure
them. What is the sample and population in this study?

Try it Now 2:
A college reports that the average age of their students is 28 years old. Is this a statistic or a
parameter?

Categorizing data
Once we have gathered data, we might wish to classify it. Roughly speaking, data can be
classified as categorical data or quantitative data. Understanding what type of data we are
working with is important as the analysis methods we use depend on the type of data we
have.

Quantitative and categorical data


Categorical (qualitative) data are pieces of information that allow us to classify the
objects under investigation into various categories.

Quantitative data are responses that are numerical measurements and with which it
makes sense to perform meaningful arithmetic calculations.

Example 3: Movie Survey


We might conduct a survey to determine the name of the favorite movie that each person in a
math class saw in a movie theater.

When we conduct such a survey, the responses would look like: Finding Nemo, The Hulk, or
Terminator 3: Rise of the Machines. We might count the number of people who give each
answer, but the answers themselves do not have any numerical values: we cannot perform
computations with an answer like "Finding Nemo." This would be categorical data.

Example 4: Movie Survey Part II


A survey could ask the number of movies you have seen in a movie theater in the past 12
months (0, 1, 2, 3, 4, ...)

This would be quantitative data.

Other examples of quantitative data would be the running time of the movie you saw most
recently (104 minutes, 137 minutes, 104 minutes, ...) or the amount of money you paid for a
movie ticket the last time you went to a movie theater ($5.50, $7.75, $9, ...).

Sometimes, determining whether data is categorical or quantitative can be a bit
trickier. Data may be expressed as numbers where those numbers are used as labels rather
than measurements, such as a baseball player’s jersey number.

Example 5: Zip Codes


Suppose we gather respondents' ZIP codes in a survey to track their geographical location.

ZIP codes are numbers, but we can't do any meaningful mathematical calculations with them
(it doesn't make sense to say that 98036 is "twice" 49018 — that's like saying that
Lynnwood, WA is "twice" Battle Creek, MI, which doesn't make sense at all), so ZIP codes
are really categorical data. ZIP codes are numbers used as labels.

Example 6: Movie Survey Part III


A survey about the movie you most recently attended includes the question "How would you
rate the movie you just saw?" with these possible answers:

1 - it was awful
2 - it was just OK
3 - I liked it
4 - it was great
5 - best movie ever!

Again, there are numbers associated with the responses, but we can't really do any
calculations with them: a movie that rates a 4 is not necessarily twice as good as a movie that
rates a 2, whatever that means; if two people see the movie and one of them thinks it stinks
and the other thinks it's the best ever it doesn't necessarily make sense to say that "on average
they liked it."

As we study movie-going habits and preferences, we shouldn't forget to specify the
population under consideration. If we survey 3-7 year-olds the runaway favorite might be
Finding Nemo. 13-17 year-olds might prefer Terminator 3. And 33-37 year-olds might
prefer...well, Finding Nemo.

Try it Now 3:
Classify each measurement as categorical or quantitative.
a. Eye color of a group of people
b. Daily high temperature of a city over several weeks
c. Annual income

Sampling methods
The first thing we should do before conducting a survey is to identify the population that we
want to study. Suppose we are hired by a politician to determine the amount of support they
have among the electorate should they decide to run for another term. What population
should we study? Every person in the district? Not every person is eligible to vote, and
regardless of how strongly someone likes or dislikes the candidate, they don't have much to
do with the politician being re-elected if they are not able to vote.

What about eligible voters in the district? That might be better, but if someone is eligible to
vote but does not register by the deadline, they won't have any say in the election
either. What about registered voters? Many people are registered but choose not to
vote. What about "likely voters?"

This “likely voter” criterion is used in political polling, but it is sometimes difficult to define a
"likely voter." Is it someone who voted in the last election? In the last general election? In
the last presidential election? Should we consider someone who just turned 18 a "likely
voter?" They weren't eligible to vote in the past, so how do we judge the likelihood that they
will vote in the next election?

In November 1998, former professional wrestler Jesse "The Body" Ventura was elected
governor of Minnesota. Up until right before the election, most polls showed he had little
chance of winning. There were several contributing factors to the polls not reflecting the
actual intent of the electorate:

• Ventura was running on a third-party ticket and most polling methods are better suited to a
  two-candidate race.
• Many respondents to polls may have been embarrassed to tell pollsters that they were
  planning to vote for a professional wrestler.
• The mere fact that the polls showed Ventura had little chance of winning might have
  prompted some people to vote for him in protest to send a message to the major-party
  candidates.

But one of the major contributing factors was that Ventura recruited a substantial amount of
support from young people, particularly college students, who had never voted before and
who registered specifically to vote in the gubernatorial election. The polls did not deem
these young people likely voters (since in most cases young people have a lower rate of voter
registration and a lower turnout rate for elections) and so the polling samples were subject to
sampling bias: they omitted a portion of the electorate that was weighted in favor of the
winning candidate.

Sampling bias
A sampling method is biased if not every member of the population has an equal
likelihood of being in the sample.

Even identifying the population can be a difficult job, but once we have identified the
population, how do we choose an appropriate sample? Remember, although we would prefer
to survey all members of the population, this is usually impractical unless the population is
very small, so we choose a sample. There are many ways to sample a population, but there is
one goal we need to keep in mind: we would like the sample to be representative of the
population.

Returning to our hypothetical job as a political pollster, we would not anticipate very
accurate results if we drew all of our samples from among the customers at a Starbucks, nor
would we expect that a sample drawn entirely from the membership list of the local Elks club
would provide a useful picture of district-wide support for our candidate.

One way to ensure that the sample has a reasonable chance of mirroring the population is to
employ randomness. The most basic random method is simple random sampling.
Remember that “random” does not mean haphazard. In mathematics, random has a very
specific meaning.

Simple random sample


A random sample is one in which each member of the population has an equal
probability of being chosen.

A simple random sample is one in which every member of the population, and every
group of members of a given size, has an equal probability of being chosen.

Example 7: Simple Random Sample


If we could somehow identify all likely voters in the state, put each of their names on a piece
of paper, toss the slips into a (very large) hat and draw 1000 slips out of the hat, we would
have a simple random sample.

In practice, computers are better suited for this sort of endeavor than millions of slips of
paper and extremely large headgear.
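
As a rough illustration of letting the computer play the role of the hat, here is a Python sketch. The list of 50,000 voter labels is made up for the example; in practice it would come from an actual list of likely voters.

```python
import random

rng = random.Random(42)

# Hypothetical list of likely voters, one label per person.
likely_voters = [f"voter_{i}" for i in range(1, 50_001)]

# Draw 1000 names without replacement; every group of 1000 is equally likely.
sample = rng.sample(likely_voters, 1000)

print(len(sample), sample[:3])
```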

It is always possible, however, that even a random sample might end up not being totally
representative of the population. If we repeatedly take samples of 1000 people from among
the population of likely voters in the state of Arizona, some of these samples might tend to
have a slightly higher percentage of one political party than does the general population;
some samples might include more older people and some samples might include more
younger people; etc. In most cases, this sampling variability is not significant.

Sampling variability
The natural variation of samples is called sampling variability.
This is unavoidable and expected in random sampling, and in most cases is not an
issue.
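
Sampling variability can be seen by drawing several samples from the same population and
comparing them. The sketch below assumes a made-up population of 100,000 voters in which 39%
carry the label "D"; the population, the label, and the sample count are illustrative
assumptions only.

```python
import random

def sample_share(population, size, rng):
    """Fraction of a simple random sample that carries the label 'D'."""
    sample = rng.sample(population, size)
    return sample.count("D") / size

# Assumed population: 100,000 voters, 39% labeled 'D' and the rest 'other'.
population = ["D"] * 39000 + ["other"] * 61000
rng = random.Random(2024)
shares = [sample_share(population, 1000, rng) for _ in range(5)]
print(shares)
```

Each printed share should land near 0.39 without matching it exactly, which is the natural
variation described above.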

To help account for variability, pollsters might instead use a stratified sample.

Stratified sampling
In stratified sampling, a population is divided into a number of subgroups (or strata).
Random samples are then taken from each subgroup with sample sizes proportional to
the size of the subgroup in the population.

Example 8: Stratified Sampling


Suppose that previous data in a particular state indicated that the electorate was composed of
39% Democrats, 37% Republicans and 24% independents. In a sample of 1000 people, pollsters
would then expect to get about 390 Democrats, 370 Republicans and 240 independents. To
accomplish this, they could randomly select 390 people from among those voters known to
be Democrats, 370 from those known to be Republicans, and 240 from those with no party
affiliation.
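
The proportional allocation in this example can be sketched in Python as follows. The
electorate of 100,000 voters and the function name are hypothetical; only the 39%, 37% and 24%
shares and the sample size of 1000 come from the example.

```python
import random

def stratified_sample(strata, total_size, seed=None):
    """Randomly sample from each stratum in proportion to its population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical electorate of 100,000 voters with the shares from the example.
strata = {
    "Democrat": [f"D{i}" for i in range(39000)],
    "Republican": [f"R{i}" for i in range(37000)],
    "Independent": [f"I{i}" for i in range(24000)],
}
sample = stratified_sample(strata, 1000, seed=1)
counts = {name: len(chosen) for name, chosen in sample.items()}
print(counts)
```

With other shares, rounding each stratum separately may leave the total one or two away from
the target, so a real design would need a rule for the leftover.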

Stratified sampling can also be used to select a sample with people in desired age groups, a
specified mix ratio of males and females, etc. A variation on this technique is called quota
sampling.

Quota sampling
Quota sampling is a variation on stratified sampling, wherein samples are collected in
each subgroup until the desired quota is met.

Example 9: Quota Sampling


Suppose the pollsters call people at random, but once they have met their quota of 390
Democrats, they only survey people who do not identify themselves as Democrats.

You may have been called by a telephone pollster who started by asking you your age,
income, etc. and then thanked you for your time and hung up before asking any "real"
questions. Most likely, they already had contacted enough people in your demographic
group and were looking for people who were older or younger, richer or poorer, etc. Quota
sampling is usually a bit easier than stratified sampling, but also does not ensure the same
level of randomness.
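
A rough sketch of the quota idea in Python: respondents are accepted in the order they are
reached until each group is full, and later callers from a full group are turned away. The
stream of calls, the equal chance of each party, and the function name are assumptions for
illustration.

```python
import random

def quota_sample(call_stream, quotas):
    """Accept respondents in the order reached until every group's quota is full."""
    accepted = {group: [] for group in quotas}
    for person, group in call_stream:
        if group in accepted and len(accepted[group]) < quotas[group]:
            accepted[group].append(person)
        if all(len(accepted[g]) == quotas[g] for g in quotas):
            break
    return accepted

# Hypothetical stream of calls: (person id, self-reported party).
rng = random.Random(3)
parties = ["Democrat", "Republican", "Independent"]
calls = [(i, rng.choice(parties)) for i in range(5000)]
quotas = {"Democrat": 390, "Republican": 370, "Independent": 240}
result = quota_sample(calls, quotas)
print({group: len(people) for group, people in result.items()})
```

Because who gets accepted depends on the order of the calls, the people inside each group are
not a random sample of that group, which is the weakness noted above.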

Another sampling method is cluster sampling, in which the population is divided into
groups, and one or more groups are randomly selected to be in the sample.

Cluster sampling
In cluster sampling, the population is divided into subgroups (clusters), and a set of
subgroups is randomly selected to be in the sample.

Example 10: Cluster Sampling


If the college wanted to survey students, since students are already divided into classes, they
could randomly select 10 classes and give the survey to all the students in those classes. This
would be cluster sampling.
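
The class survey can be sketched as below. The college with 40 classes of 25 students is a
made-up stand-in, and the function name is hypothetical; the point is that whole classes are
chosen at random and then everyone in them is surveyed.

```python
import random

def cluster_sample(clusters, number_of_clusters, seed=None):
    """Randomly pick whole clusters and survey every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical college: 40 classes of 25 students each.
classes = {
    f"Class {c:02d}": [f"Student {c:02d}-{s:02d}" for s in range(25)]
    for c in range(40)
}
chosen_classes, surveyed = cluster_sample(classes, 10, seed=7)
print(chosen_classes)
print(len(surveyed))
```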

Other sampling methods include systematic sampling.

Systematic sampling
In systematic sampling, every nth member of the population is selected to be in the
sample.

Example 11: Systematic Sampling


To select a sample using systematic sampling, a pollster calls every 100th name in the phone
book.

Systematic sampling is not as random as a simple random sample (if your name is Albert
Aardvark and your sister Alexis Aardvark is right after you in the phone book, there is no
way you could both end up in the sample) but it can yield acceptable samples.
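
A short Python sketch of systematic sampling follows. The phone book of 10,000 entries is
hypothetical, and picking a random starting position is an added assumption (the example above
only says every 100th name is called).

```python
import random

def systematic_sample(ordered_population, step, seed=None):
    """Pick a random starting position, then take every `step`-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return ordered_population[start::step]

# Hypothetical phone book with 10,000 alphabetized entries.
phone_book = [f"Name {i:05d}" for i in range(10000)]
sample = systematic_sample(phone_book, 100, seed=5)
print(len(sample))

# Adjacent entries can never both be chosen when step > 1.
positions = [phone_book.index(name) for name in sample]
gaps = {b - a for a, b in zip(positions, positions[1:])}
print(gaps)
```

The set of gaps contains only the step size, which is why two neighbors in the book cannot
both be selected.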

Perhaps the worst types of sampling methods are convenience samples and voluntary
response samples.

Convenience sampling and voluntary response sampling


In convenience sampling, the sample is chosen by selecting whoever is convenient.
In voluntary response sampling, the members of the sample volunteer to participate.

Example 12: Convenience Sampling


A pollster stands on a street corner and interviews the first 100 people who agree to speak to
him. This is a convenience sample.

Example 13: Voluntary Response Sampling


A website has a survey asking readers to give their opinion on a tax proposal. This is a self-
selected sample, or voluntary response sample, in which respondents volunteer to
participate.

Usually voluntary response samples are skewed towards people who have a particularly
strong opinion about the subject of the survey or who just have way too much time on their
hands and enjoy taking surveys.

Try it Now 4:
In each case, indicate what sampling method was used.
a. Every 4th person in the class was selected.
b. A sample was selected to contain 25 men and 35 women.
c. Viewers of a new show are asked to vote on the show’s website.
d. A website randomly selects 50 of their customers to send a satisfaction survey to.
e. To survey voters in a town, a polling company randomly selects 10 city blocks, and
interviews everyone who lives on those blocks.

How to mess things up before you start


There are a number of ways that a study can be ruined before you even start collecting data.
The first we have already explored – sampling or selection bias, which is when the sample
is not representative of the population. One example of this is voluntary response bias,
which is bias introduced by only collecting data from those who volunteer to participate.
This is not the only potential source of bias.

Sources of bias
Sampling bias – when the sample is not representative of the population
Voluntary response bias – the sampling bias that often occurs when the sample is
made up of volunteers
Self-interest study – bias that can occur when the researchers have an interest in the
outcome
Response bias – when the responder gives inaccurate responses for any reason
Perceived lack of anonymity – when the responder fears giving an honest answer
might negatively affect them
Loaded questions – when the question wording influences the responses
Non-response bias – when the refusal of some people to participate in the study
affects the validity of the outcome

Example 14: Self-Interest


Consider a recent study which found that chewing gum may raise math grades in teenagers.²
This study was conducted by the Wrigley Science Institute, a branch of the Wrigley chewing
gum company. This is an example of a self-interest study: one in which the researchers have
a vested interest in the outcome of the study. While this does not necessarily mean that the
study was biased, it certainly suggests that we should subject the study to extra scrutiny.

2 Reuters. http://news.yahoo.com/s/nm/20090423/od_uk_nm/oukoe_uk_gum_learning. Retrieved 4/27/09



Example 15: Response Bias


A survey asks people “when was the last time you visited your doctor?” This might suffer
from response bias, since many people might not remember exactly when they last saw a
doctor and give inaccurate responses.

Sources of response bias may be innocent, such as bad memory, or as intentional as
pressuring by the pollster. Consider, for example, how many voting initiative petitions
people sign without even reading them.

Example 16: Anonymous?


A survey asks participants a question about their interactions with members of other races.
Here, a perceived lack of anonymity could influence the outcome. The respondent might
not want to be perceived as racist even if they are, and give an untruthful answer.

Example 17: Lies...


An employer puts out a survey asking their employees if they have a drug abuse problem and
need treatment help. Here, answering truthfully might have consequences; responses might
not be accurate if the employees do not feel their responses are anonymous or fear retribution
from their employer.

Example 18: Lead the Way


A survey asks “do you support funding research of alternative energy sources to reduce our
reliance on high-polluting fossil fuels?” This is an example of a loaded or leading question
– questions whose wording leads the respondent towards an answer.

Loaded questions can occur intentionally by pollsters with an agenda, or accidentally through
poor question wording. Also a concern is question order, where the order of questions
changes the results. A psychology researcher provides an example³:

“My favorite finding is this: we did a study where we asked students, 'How satisfied
are you with your life? How often do you have a date?' The two answers were not
statistically related - you would conclude that there is no relationship between dating
frequency and life satisfaction. But when we reversed the order and asked, 'How often
do you have a date? How satisfied are you with your life?' the statistical relationship
was a strong one. You would now conclude that there is nothing as important in a
student's life as dating frequency.”

Example 19: Just Hang Up?


A telephone poll asks the question “Do you often have time to relax and read a book?”, and
50% of the people called refused to answer the survey. It is unlikely that the results will be
representative of the entire population. This is an example of non-response bias, introduced
by people refusing to participate in a study or dropping out of an experiment. When people
refuse to participate, we can no longer be so certain that our sample is representative of the
population.

3 Swartz, Norbert. http://www.umich.edu/~newsinfo/MT/01/Fal01/mt6f01.html. Retrieved 3/31/2009



Try it Now 5:
In each situation, identify a potential source of bias.
a. A survey asks how many sexual partners a person has had in the last year.
b. A radio station asks listeners to phone in their choice in a daily poll.
c. A substitute teacher wants to know how students in the class did on their last test. The
teacher asks the 10 students sitting in the front row to state their latest test score.
d. High school students are asked if they have consumed alcohol in the last two weeks.
e. The Beef Council releases a study stating that consuming red meat poses little
cardiovascular risk.
f. A poll asks “Do you support a new transportation tax, or would you prefer to see our
public transportation system fall apart?”

Experiments
So far, we have primarily discussed observational studies – studies in which conclusions
would be drawn from observations of a sample or the population. In some cases these
observations might be unsolicited, such as studying the percentage of cars that turn right at a
red light even when there is a “no turn on red” sign. In other cases the observations are
solicited, like in a survey or a poll.

In contrast, it is common to use experiments when exploring how subjects react to an
outside influence. In an experiment, some kind of treatment is applied to the subjects and
the results are measured and recorded.

Observational studies and experiments


An observational study is a study based on observations or measurements.
An experiment is a study in which the effects of a treatment are measured.

Example 20: Examples of Experiments


A pharmaceutical company tests a new medicine for treating Alzheimer’s disease by
administering the drug to 50 elderly patients with recent diagnoses. The treatment here is the
new drug.
A gym tests out a new weight loss program by enlisting 30 volunteers to try out the program.
The treatment here is the new program.
You test a new kitchen cleaner by buying a bottle and cleaning your kitchen. The new
cleaner is the treatment.
A psychology researcher explores the effect of music on temperament by measuring people’s
temperament while listening to different types of music. The music is the treatment.

Try it Now 6:
Is each scenario describing an observational study or an experiment?
a. The weights of 30 randomly selected people are measured
b. Subjects are asked to do 20 jumping jacks, and then their heart rates are measured
c. Twenty coffee drinkers and twenty tea drinkers are given a concentration test

When conducting experiments, it is essential to isolate the treatment being tested.

Example 21: Confounding Variables


Suppose a middle school (junior high) finds that their students are not scoring well on the
state’s standardized math test. They decide to run an experiment to see if an alternate
curriculum would improve scores. To run the test, they hire a math specialist to come in and
teach a class using the new curriculum. To their delight, they see an improvement in test
scores.

The difficulty with this scenario is that it is not clear whether the curriculum is responsible
for the improvement, or whether the improvement is due to a math specialist teaching the
class. This is called confounding – when it is not clear which factor or factors caused the
observed effect. Confounding is the downfall of many experiments, though sometimes it is
hidden.

Confounding
Confounding occurs when there are two or more potential variables that could have caused
the outcome and it is not possible to determine which actually caused the result.

Example 22: Confounding II


A drug company study about a weight loss pill might report that people lost an average of 8
pounds while using their new drug. However, in the fine print you find a statement saying
that participants were encouraged to also diet and exercise. It is not clear in this case whether
the weight loss is due to the pill, to diet and exercise, or a combination of both. In this case
confounding has occurred.

Example 23: Confounding III


Researchers conduct an experiment to determine whether students will perform better on an
arithmetic test if they listen to music during the test. They first give the student a test without
music, then give a similar test while the student listens to music. In this case, the student
might perform better on the second test, regardless of the music, simply because it was the
second test and they were warmed up.

There are a number of measures that can be introduced to help reduce the likelihood of
confounding. The primary measure is to use a control group.

Control group
When using a control group, the participants are divided into two or more groups,
typically a control group and a treatment group. The treatment group receives the
treatment being tested; the control group does not receive the treatment.

Ideally, the groups are otherwise as similar as possible, isolating the treatment as the only
potential source of difference between the groups. For this reason, the method of dividing
groups is important. Some researchers attempt to ensure that the groups have similar
characteristics (same number of females, same number of people over 50, etc.), but it is
nearly impossible to control for every characteristic. Because of this, random assignment is
very commonly used.

Example 24: Control Group


To determine if a two-day prep course would help high school students improve their scores
on the SAT test, a group of students was randomly divided into two subgroups. The first
group, the treatment group, was given a two-day prep course. The second group, the control
group, was not given the prep course. Afterwards, both groups were given the SAT.
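
Random assignment like this can be sketched in a few lines of Python. The roster of 60
students and the function name are assumptions for illustration; the example does not say how
many students took part.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical roster of 60 students.
students = [f"Student {i}" for i in range(1, 61)]
treatment, control = randomly_assign(students, seed=11)
print(len(treatment), len(control))
```

Because chance alone decides the groups, characteristics that nobody thought to measure tend
to be spread across both groups rather than concentrated in one.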

Example 25: Control Group II


A company testing a new plant food grows two crops of plants in adjacent fields, the
treatment group receiving the new plant food and the control group not. The crop yield
would then be compared. By growing them at the same time in adjacent fields, they are
controlling for weather and other confounding factors.

Sometimes not giving the control group anything does not completely control for
confounding variables. For example, suppose a medicine study is testing a new headache pill
by giving the treatment group the pill and the control group nothing. If the treatment group
showed improvement, we would not know whether it was due to the medicine in the pill, or a
response to having taken any pill. This is called the placebo effect.

Placebo effect
The placebo effect is when the effectiveness of a treatment is influenced by the
patient’s perception of how effective they think the treatment will be, so a result might
be seen even if the treatment is ineffectual.

Example 26: Placebo Effect


A study of painful tooth extractions found that patients told they were receiving a strong
painkiller while actually receiving a saltwater injection reported as much pain relief as
patients receiving a dose of morphine.⁴

To control for the placebo effect, a placebo, or dummy treatment, is often given to the
control group. This way, both groups are truly identical except for the specific treatment
given.

Placebo and Placebo controlled experiments


A placebo is a dummy treatment given to control for the placebo effect.
An experiment that gives the control group a placebo is called a placebo controlled
experiment.

Example 27: Types of Placebos


In a study for a new medicine that is dispensed in a pill form, a sugar pill could be used as a
placebo.
In a study on the effect of alcohol on memory, a non-alcoholic beer might be given to the
control group as a placebo.

4 Levine JD, Gordon NC, Smith R, Fields HL. (1981) Analgesic responses to morphine and placebo in
individuals with postoperative pain. Pain. 10:379-89.

In a study of a frozen meal diet plan, the treatment group would receive the diet food, and the
control could be given standard frozen meals stripped of their original packaging.

In some cases, it is more appropriate to compare to a conventional treatment than a placebo.
For example, in a cancer research study, it would not be ethical to deny any treatment to the
control group or to give a placebo treatment. In this case, the currently accepted medicine
would be given to the second group, called a comparison group. In our SAT
test example, the non-treatment group would most likely be encouraged to study on their
own, rather than be asked to not study at all, to provide a meaningful comparison.

When using a placebo, it would defeat the purpose if the participant knew they were
receiving the placebo.

Blind studies
A blind study is one in which the participant does not know whether they are
receiving the treatment or a placebo.

A double-blind study is a blind study in which those interacting with the participants
also don’t know who is in the treatment group and who is in the control group.

Example 28: Double-blind Study


In a study about anti-depression medicine, you would not want the psychological evaluator to
know whether the patient is in the treatment or control group either, as it might influence
their evaluation, so the experiment should be conducted as a double-blind study.

It should be noted that not every experiment needs a control group.

Example 29: No Control


If a researcher is testing whether a new fabric can withstand fire, she simply needs to torch
multiple samples of the fabric – there is no need for a control group.

Try it Now 7:
To test a new lie detector, two groups of subjects are given the new test. One group is asked
to answer all the questions truthfully, and the second group is asked to lie on one set of
questions. The person administering the lie detector test does not know what group each
subject is in.

Does this experiment have a control group? Is it blind, double-blind, or neither?



Describing Data
Once we have collected data from surveys or experiments, we need to summarize and present
the data in a way that will be meaningful to the reader. We will begin with graphical
presentations of data then explore numerical summaries of data.

Presenting Categorical Data Graphically


Categorical, or qualitative, data are pieces of information that allow us to classify the objects
under investigation into various categories. We usually begin working with categorical data
by summarizing the data into a frequency table.

Frequency Table
A frequency table is a table with two columns. One column lists the categories, and
the other lists the frequencies with which the items in the categories occur (how many
items fit into each category).

Example 30: Car Color and Accident Frequency


An insurance company determines vehicle insurance premiums based on known risk factors.
If a person is considered a higher risk, their premiums will be higher. One potential factor is
the color of your car. The insurance company believes that people with cars of certain colors
are more likely to get in accidents. To research this, they examine police reports for recent
total-loss collisions. The data is summarized in the frequency table below.

Color Frequency
Blue 25
Green 52
Red 41
White 36
Black 39
Grey 23

Sometimes we need an even more intuitive way of displaying data. This is where charts and
graphs come in. There are many, many ways of displaying data graphically, but we will concentrate
on one very useful type of graph called a bar graph. In this section we will work with bar
graphs that display categorical data; the next section will be devoted to bar graphs that
display quantitative data.

Bar graph
A bar graph is a graph that displays a bar for each category with the length of each bar
indicating the frequency of that category.

To construct a bar graph, we need to draw a vertical axis and a horizontal axis. The vertical
axis measures the frequency of each category, so it has a numerical scale. The horizontal
axis labels the categories of the data. The construction of a bar chart is most easily described
by use of an example.

Example 31: Bar Graph of Car Data from Example 30


Using our car data, we note the highest frequency is 52, so our vertical axis needs to go from
0 to 52, but we might as well use 0 to 55, so that we can put a hash mark every 5 units:

[Bar graph: Frequency (vertical axis, 0 to 55 in steps of 5) by vehicle color involved in
total-loss collision (Blue, Green, Red, White, Black, Grey)]

Notice that the height of each bar is determined by the frequency of the corresponding
color. The horizontal gridlines are a nice touch, but not necessary. In practice, you will find
it useful to draw bar graphs using graph paper, so the gridlines will already be in place, or
using technology. Instead of gridlines, we might also list the frequencies at the top of each
bar, like this:

[Bar graph: Frequency by vehicle color involved in total-loss collision, with the frequency
written above each bar (Blue 25, Green 52, Red 41, White 36, Black 39, Grey 23)]

In this case, our chart might benefit from being reordered from largest to smallest frequency.
This arrangement can make it easier to compare similar values in the chart, even without
gridlines. When we arrange the categories in decreasing frequency order like this, it is called
a Pareto chart, named after the Italian economist Vilfredo Pareto.

Pareto chart
A Pareto chart is a bar graph ordered from highest to lowest frequency.

Example 32: Pareto Chart of Car Data


Transforming our bar graph from earlier into a Pareto chart, we get:

[Pareto chart: Frequency by vehicle color involved in total-loss collision, bars ordered
Green 52, Red 41, Black 39, White 36, Blue 25, Grey 23]

With the Pareto chart, we can see that the green cars have the highest total-loss accident
frequency while the blue and grey cars have the lowest frequency.

Notice that we have not changed the data or size of the bars between the two graphs: they are
the same, just reordered. We can reorder the bars in a bar graph of qualitative data as the
categories themselves have no particular order. We need to be mindful of the type of data we
work with, as not all data can be reordered in this manner.

Since they are organized from greatest to least frequency, Pareto charts identify visually the
category of data with the greatest impact on the situation being analyzed. The Pareto chart at
the right shows the top ten causes of death in the United States in 2010⁵ and includes a line
representing the cumulative effect (running total of the percentages) of each category. From
this chart, we can see that heart disease and cancer cause a large percent of the deaths in
comparison to the other categories. The cumulative line is steep for these two categories then
levels off more dramatically for the other categories as their proportion of the total deaths
is much less.

[Pareto chart: Cause of Death in US (2010). Horizontal axis: cause of death (Heart Disease,
Cancer, L. Respiratory, Stroke, Accidents, Alzheimer's, Diabetes, Nephritis, Flu/Pneumonia,
Suicide). Left vertical axis: percent of deaths (0% to 35%). Right vertical axis: cumulative
percent (0% to 100%).]
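
The ordering and the running total behind a Pareto chart can be sketched in Python. Since the
death percentages are not listed in the text, this sketch uses the car color frequencies from
earlier in this section instead; the function names are hypothetical.

```python
# Frequencies from the car color table in this section.
car_colors = {"Blue": 25, "Green": 52, "Red": 41, "White": 36, "Black": 39, "Grey": 23}

def pareto_order(frequencies):
    """Return (category, frequency) pairs sorted from highest to lowest frequency."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

def cumulative_percents(ordered):
    """Running total of each category's share of the whole, in percent."""
    total = sum(freq for _, freq in ordered)
    running, result = 0, []
    for category, freq in ordered:
        running += freq
        result.append((category, round(100 * running / total, 1)))
    return result

ordered = pareto_order(car_colors)
print(ordered)
print(cumulative_percents(ordered))
```

The last cumulative value is always 100%, and the faster the running total climbs, the more
the first few categories dominate the whole.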

What do we do with this information? Do we use it to argue that the majority of our
resources should be aimed at preventing and treating heart disease and cancer, and less
toward preventing and treating diabetes and suicide? A Pareto chart tells us which categories
have the greatest effect on the whole; if this was a case of quality control in a factory, it

5 Chart created from CDC data, http://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm



certainly makes sense to aim resources at the greatest source of loss. Does the same idea
apply to human life? That is an ethical and moral question we have to answer ourselves.

Bar graphs and Pareto charts provide us a visual way to compare sizes of categories to each
other. To show category size in relation to the whole data set, it is common to use a pie
chart.

Pie Chart
A pie chart is a circle divided into wedges of varying sizes, marked out like slices of pie or
pizza. The relative sizes of the wedges correspond to the relative frequencies of the
categories.

The central angle of each wedge is found by multiplying the proportion (part to whole
ratio) of each category by 360º.

Example 33: Pie Chart with Car Data from Example 30


For our vehicle color data, a pie chart might look like this:

[Pie chart of vehicle colors involved in total-loss collisions: Blue 25, Green 52, Red 41,
White 36, Black 39, Grey 23]

Note that the category with the largest frequency is green, as we observed in the bar chart
form. In the pie chart, we can see that green cars represent just a little bit less than a quarter
of all the total-loss car collisions. We can also see that red, black, and white are next in size,
although it is difficult to see the differences among them, which a bar graph shows more
clearly.

Often having the category names next to the pie slices also makes the chart clearer. Pie
charts can be clearer if we include frequencies or relative frequencies (in percent form) in
the chart next to the pie slices as opposed to a separate scale.

[Pie chart of the vehicle color data with labeled slices; the green wedge has a central angle
of about 87º]

Note that the green category is 24.1% of the whole data. The central angle of the green pie
wedge is 24.1% of 360º, (0.241)(360º) = 86.76º, or about 87º.
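
The same calculation can be carried out for every wedge at once. This is a small Python
sketch using the car color frequencies; the function name is hypothetical.

```python
car_colors = {"Blue": 25, "Green": 52, "Red": 41, "White": 36, "Black": 39, "Grey": 23}

def central_angles(frequencies):
    """Central angle in degrees for each category: proportion of the whole times 360."""
    total = sum(frequencies.values())
    return {category: 360 * freq / total for category, freq in frequencies.items()}

angles = central_angles(car_colors)
for category, angle in angles.items():
    share = 100 * car_colors[category] / sum(car_colors.values())
    print(f"{category}: {share:.1f}% of the data, {angle:.1f} degrees")
```

Working from the unrounded proportion 52/216 gives about 86.7º for green rather than 86.76º;
both round to 87º, and the angles for all six wedges add up to 360º.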

Example 34: Voting Preferences


The pie chart to the right shows the percentage of voters supporting each candidate running
for a local senate seat.

[Pie chart: Voter preferences. Ellison 46%, Douglas 43%, Reeves 11%]

If there are 20,000 voters in the district, the pie chart shows that about 11% of those,
about 2,200 voters, support Reeves.

Beware: since pie charts represent their categories’ proportion of the whole data set,
we need to make sure the entire data set is represented to avoid making a misleading chart.
If there was a fourth candidate in this election, they should be included. If some voters are
undecided on a candidate, an “undecided” category should be included. A missing category will
change the shape of the pie chart by exaggerating the included categories’ proportion of the
whole.

Pie charts look nice, but are harder to draw by hand than bar charts as accurate drawings
require computing the central angle of each wedge and measuring the angle with a protractor.
Computers are much better suited to drawing pie charts. Common software programs like
Microsoft Word or Excel, OpenOffice.org Writer or Calc, or Google Docs are able to create
bar graphs, pie charts, and other graph types. There are also numerous online tools that can
create graphs⁶.

Try it Now 8:
Create a bar graph and a pie chart to illustrate the grades on a history exam below.
A: 12 students, B: 19 students, C: 14 students, D: 4 students, F: 5 students

Don’t get fancy with graphs! People sometimes add features to graphs that don’t help to
convey their information. For example, three-dimensional bar and pie charts like the ones
shown are usually not as effective as their two-dimensional counterparts. Notice how the three
dimensional effect diminishes the bar graph’s power in comparing sizes of the categories to
each other.

[Three-dimensional bar graph: Frequency (0 to 60) by Car Color (Blue, Green, Red, White, Grey,
Black)]

Notice how the three dimensional effect distorts the relative size of each pie wedge compared
to the whole, making the red wedge appear largest. If a third dimension has no meaning, don’t
add it; it distorts the visual comparison these charts provide.

[Three-dimensional pie chart of the car color data]

6 For example: http://nces.ed.gov/nceskids/createAgraph/ or http://docs.google.com



Another way that fanciness can lead to trouble: instead of plain bars, it is tempting to
substitute meaningful images. This type of graph is called a pictogram.

Pictogram
A pictogram is a statistical graphic in which the size of the picture is intended to
represent the frequencies or size of the values being represented.

Example 35: Pictogram Distortion


A labor union might produce the graph to the right to show the difference between the
average manager salary and the average worker salary.

[Pictogram: a large bag labeled Manager Salaries next to a smaller bag labeled Worker Salaries]

Looking at the picture, it would be reasonable to guess that the manager salaries are 4 times
as large as the worker salaries – the area of the bag looks about 4 times as large. However,
the manager salaries are in fact only twice as large as worker salaries, which was reflected
in the picture by making the manager bag twice as tall.
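
The arithmetic behind this distortion is short enough to sketch in Python. It assumes the
picture is enlarged without changing its shape, so the width grows by the same factor as the
height; the function name is hypothetical.

```python
def area_ratio(height_ratio):
    """Area ratio of two similar pictures whose heights differ by `height_ratio`.

    Scaling a picture keeps its shape, so the width grows by the same factor
    as the height and the area grows by the square of that factor.
    """
    return height_ratio ** 2

print(area_ratio(2))  # prints 4
```

A bag drawn twice as tall therefore covers four times the area, which is why the eye reads
the salaries as four times larger rather than twice as large.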

Another distortion in bar charts results from setting the baseline to a value other than zero.
The baseline is the bottom of the vertical axis, representing the least number of cases that
could have occurred in a category. Normally, this number should be zero.

Example 36: Scaling Distortion


The three graphs shown represent the same data set: the percent of high school seniors who
smoke for the years 1990, 2000, and 2010⁷.

Notice how different the graphs look and yet there is no change in the data itself, just the
scale. Do you see how changing the scale can visually exaggerate or minimize differences
between data values?

A scale with a small range, shown on the top right graph, makes the differences between the
bars look large.

[Bar graph: High School Seniors Who Smoke, by Year (1990, 2000, 2010), vertical scale from
16.0% to 31.0%]

A scale with a large range, such as the graph on the lower right, makes the differences
between the bars appear small. Even though the scale starts appropriately at 0%, its maximum
is far beyond the actual data maximum and the empty vertical space compresses the visual
differences among the bars.

[Bar graph: High School Seniors who Smoke, by Year (1990, 2000, 2010), vertical scale from
0.0% to 100.0%]

A more appropriate scale could begin at 0% and have a maximum around 40%.

7 Graphs created from data provided by the CDC at http://www.cdc.gov/nchs/hus/healthrisk.htm



Try it Now 9:
A poll was taken asking people if they agreed with the positions of the 4 candidates for a
county office. Does the pie chart present a good representation of this data? Explain.

[Pie chart: Nguyen, 42%; McKee, 35%; Jones, 64%; Brown, 52%]

Presenting Quantitative Data Graphically


Quantitative, or numerical, data can also be summarized into frequency tables.

Example 37: Quiz Scores


A teacher records scores on a 20-point quiz for the 30 students in the class. The scores are:

19, 20, 18, 18, 17, 18, 19, 17, 20, 18, 20, 16, 20, 15, 17, 12, 18, 19, 18, 19, 17, 20, 18, 16, 15,
18, 20, 5, 0, 0

These scores could be summarized into a frequency table by grouping like values:

Score Frequency
0 2
5 1
12 1
15 2
16 2
17 4
18 8
19 4
20 6
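
Grouping like values by hand is error-prone for longer lists, so here is a minimal Python
sketch that tallies the 30 quiz scores above with the standard library's Counter. The variable
names are illustrative.

```python
from collections import Counter

scores = [19, 20, 18, 18, 17, 18, 19, 17, 20, 18, 20, 16, 20, 15, 17,
          12, 18, 19, 18, 19, 17, 20, 18, 16, 15, 18, 20, 5, 0, 0]

# Count how many times each score occurs, then list the scores in order.
frequency = Counter(scores)
print("Score  Frequency")
for score in sorted(frequency):
    print(f"{score:>5}  {frequency[score]:>9}")
```

The frequencies should add up to 30, the number of students, which is a quick check that no
score was missed.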

Using this table, it would be possible to create a standard bar chart from this summary, like
we did for categorical data, as shown here:

[Bar chart: Frequency (0 to 8) by Score, with one evenly spaced bar for each of the scores
0, 5, 12, 15, 16, 17, 18, 19, 20]

However, since the scores are numerical values, this chart doesn’t really make sense; the
first and second bars are five values apart, while the later bars are only one value apart. It
would be more correct to treat the horizontal axis as a number line. This type of graph is
called a histogram.

Histogram
A histogram is like a bar graph, except that both the vertical and horizontal axes are
number lines with consistent, consecutive increments.

Histograms display the distribution of the data.

The height of each bar represents the frequency for the value or interval of values along
the horizontal axis. Bars are drawn adjacent to each other with no gaps between them.
A gap between the bars indicates the value on the horizontal axis has 0 frequency.

Example 38: Histogram of Quiz Scores


A histogram of the quiz score data looks like:

[Figure: histogram of the quiz scores; vertical axis Frequency (0 to 9); horizontal axis Score (0 to 21); shown beside the frequency table from Example 37]

Notice that in the histogram, a bar represents values on the horizontal axis from the value on
the left-hand side of the bar up to but not including the value on the right-hand side of the
bar. Thus the first bar represents quiz scores from 0 up to but not including 1. Some people
choose to have bars start at ½ values to avoid this ambiguity, centering the bar over the
whole number value.

Unfortunately, not a lot of common software packages can correctly graph a histogram.
About the best you can do in Excel or Word is a bar graph with no gap between the bars and
spacing added to simulate a numerical horizontal axis. Like many software packages, the
TI83/84 calculators can draw histograms but do not allow the user to choose the interval
size/scale for the horizontal axis. For a JAVA applet that draws histograms from data and
allows the user to control the interval size on the horizontal axis, try the histogram applet
at http://nlvm.usu.edu/en/nav/topic_t_5.html.

[Figure: bar graph simulating a histogram; vertical axis Frequency (0 to 8); horizontal axis Score (0 to 20)]

If we have a large number of widely varying data values, creating a frequency table that lists
every possible value as a category would lead to an exceptionally long frequency table, and
probably would not reveal any patterns. For this reason, it is common with quantitative data
to group data into class intervals.

Class Intervals
Class intervals are groupings of the data. In general, we define class intervals so that:
• Each interval is equal in size. For example, if the first class contains values from
120-129, the second class should include values from 130-139.
• We have somewhere between 5 and 20 classes, typically, depending upon the
number of data values we’re working with.
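
As a rough illustration of equal-width class intervals, the sketch below groups the quiz scores from Example 37 into intervals of width 5. The width, the starting value of 0, and the variable names are assumptions chosen for this illustration; with so few intervals it is a demonstration of the mechanics rather than a recommended grouping.

```python
from collections import Counter

scores = [19, 20, 18, 18, 17, 18, 19, 17, 20, 18, 20, 16, 20, 15, 17,
          12, 18, 19, 18, 19, 17, 20, 18, 16, 15, 18, 20, 5, 0, 0]

start = 0   # assumed lower edge of the first interval
width = 5   # assumed interval width; every interval has the same size

# Integer division assigns each score to the index of its interval
counts = Counter((s - start) // width for s in scores)

for k in range(max(counts) + 1):
    low = start + k * width
    print(f"{low}-{low + width - 1}: {counts.get(k, 0)}")
```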

Example 39: Grouped Frequency Table and Histogram


Suppose that we have collected weights from 100 male subjects as part of a nutrition study.
For our weight data, we have values ranging from a low of 121 pounds to a high of 263
pounds, giving a total span of 263-121 = 142. We could create 7 intervals with a width of
around 20, 14 intervals with a width of around 10, or somewhere in between. Often we
have to experiment with a few possibilities to find something that represents the data well.
Let us try using an interval width of 15. We could start at 121, or at 120 since it is a nice
round number. From the frequency table we can create the histogram:

Interval Frequency
120 – 134 4
135 – 149 14
150 – 164 16
165 – 179 28
180 – 194 12
195 – 209 8
210 – 224 7
225 – 239 6
240 – 254 2
255 – 269 3

[Figure: histogram; vertical axis Frequency (0 to 30); horizontal axis Weights (pounds), marked at 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270]

In many software packages, such as EXCEL, you can create a graph similar to a histogram by
putting the class intervals as the labels on a bar chart.

[Figure: bar chart; vertical axis Frequency (0 to 30); horizontal axis Weights (pounds), with bars labeled 120-134, 135-149, 150-164, 165-179, 180-194, 195-209, 210-224, 225-239, 240-254, 255-269]

Try it Now 10:


The total cost of textbooks for the term was collected from 36 students. Create a histogram
for this data.

$140 $160 $160 $165 $180 $220 $235 $240 $250 $260 $280 $285
$285 $285 $290 $300 $300 $305 $310 $310 $315 $315 $320 $320
$330 $340 $345 $350 $355 $360 $360 $380 $395 $420 $460 $460

When collecting data to compare two groups, it is desirable to create a graph that compares
quantities.

Example 40: Computer Target Practice


The data below came from a task in which the goal is to move a computer mouse to a target
on the screen as fast as possible. On 20 of the trials, the target was a small rectangle; on the
other 20, the target was a large rectangle. Time to reach the target was recorded on each trial.

Interval (milliseconds) | Frequency, small target | Frequency, large target
300-399 0 0
400-499 1 5
500-599 3 10
600-699 6 5
700-799 5 0
800-899 4 0
900-999 0 0
1000-1099 1 0
1100-1199 0 0

One option to represent this data would be a comparative histogram or bar chart, in which
bars for the small target group and large target group are placed next to each other.
[Figure: comparative bar chart with series Small Target and Large Target; vertical axis Frequency (0 to 10); horizontal axis Reaction time (milliseconds), with intervals 300-399 through 1100-1199]

Another option is to create a frequency polygon by marking the midpoints of the top of the
histogram bars and connecting these dots.

This graph makes it easier to see that reaction times were generally shorter for the larger
target, and that the reaction times for the smaller target were more spread out.
[Figure: frequency polygon with series Small Target and Large Target; vertical axis Frequency (0 to 10); horizontal axis Reaction time (milliseconds), with interval midpoints 350 through 1150]

Frequency polygon
An alternative representation is a frequency polygon. A frequency polygon starts out
like a histogram, but instead of drawing a bar, a point is placed in the midpoint of each
interval at height equal to the frequency. Typically the points are connected with
straight lines to emphasize the distribution of the data.

Other plots that create a visual display of quantitative data that is similar to that of a
histogram include stem and leaf plots and dot plots. These plots have the advantage of
being easy to do quickly using just pencil and paper and providing a display of the
distribution of the data.

Stem and leaf plots use the right-most digit of the data to represent the data value itself, and
the remaining digits as an interval indicator. The data set given below shows the number of
times the top rookies of the National League were walked during the 2013 baseball season 8:
23 23 15 11 11 17 21 10 37 33 2 30 34 20
4 13 33 36 31 33 3 30 25 3 20 1 31 12
The stem and leaf plot of this data is created by using the tens’ digit of each value as the
“stem,” drawn on the left side of the vertical bar, and the ones’ digit of each value as the
“leaf,” drawn on the right side of the vertical bar, thus 0|1 indicates the value of 1 while 2|1
indicates the value of 21. Values with the same tens’ digit do not repeat the tens’ digit but
list the ones’ digit adjacent to each other. Each leaf value represents a data value.

Stem values | Leaf values
0|1 2 3 3 4
1|0 1 1 2 3 5 7
2|0 0 1 3 3 5
3|0 0 1 1 3 3 3 4 6 7

Thus the line 2|0 0 1 3 3 5 represents the data values 20, 20, 21, 23, 23, 25. Visually we can
see the frequency of each interval (there are 6 values in the 20-29 interval) as well as the

8 Data from http://mlb.mlb.com/stats/sortable_rookie.jsp



individual data values themselves; we could easily recreate the data set from the plot. With
consistent font size and spacing, we also see the shape of the distribution, creating a sideways
histogram-like display:

0|1 2 3 3 4
1|0 1 1 2 3 5 7
2|0 0 1 3 3 5
3|0 0 1 1 3 3 3 4 6 7

Looking at the display, we can see that the interval with the greatest frequency is 30 to 39
walks.

Stem and Leaf Plot


A stem and leaf plot shows the distribution of quantitative data by using the digits as
the display, separating the right-most digit from the rest of the digits.

The left-most digit(s) are stem values, indicating the class interval, while the right-most
digit is a leaf value, representing one data value in the interval. The number of leaves on a
stem shows the frequency of that interval.
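
A stem and leaf plot is straightforward to build by machine as well as by hand. The sketch below is one possible approach, assuming Python and non-negative whole-number data; the function name `stem_and_leaf` is made up for this illustration.

```python
from collections import defaultdict

# Walks for the 2013 National League rookies, from the text
walks = [23, 23, 15, 11, 11, 17, 21, 10, 37, 33, 2, 30, 34, 20,
         4, 13, 33, 36, 31, 33, 3, 30, 25, 3, 20, 1, 31, 12]

def stem_and_leaf(values):
    """Return one text row per stem: tens digit(s), a bar, then sorted ones digits."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)   # stem = tens digit(s), leaf = ones digit
    # Include every stem in the range so that empty intervals still appear
    return [f"{stem}|" + " ".join(str(d) for d in leaves.get(stem, []))
            for stem in range(min(leaves), max(leaves) + 1)]

for row in stem_and_leaf(walks):
    print(row)
```

Listing every stem between the smallest and largest, even when it has no leaves, keeps gaps in the data visible.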

We can see more detail in the distribution if we break the intervals into two parts, listing each
stem value twice, the first stem value with leaf values from 0 to 4, and the second stem value
with leaf values 5 to 9:

0| 1 2 3 3 4
0|
1| 0 1 1 2 3
1| 5 7
2| 0 0 1 3 3
2| 5
3| 0 0 1 1 3 3 3 4
3| 6 7

In this revised plot, gaps between values, clusters of values, or other patterns in the
distribution may become more apparent. This data shows a gap on the second 0 row: no
rookies were walked 5-9 times.

A back-to-back stem and leaf plot can compare two distributions by using the same stem
values for both, but listing one distribution’s leaf values to the right of the stem, and the other
to the left. The plot below shows the number of walks for top rookies of the National League
in the 2012 season on the left side of the stem, with the 2013 season on the right side:

8 7 7 5|0|1 2 3 3 4
9 7 7 0|1|0 1 1 2 3 5 7
7 5 5 3 2 2|2|0 0 1 3 3 5
6 4 1|3|0 0 1 1 3 3 3 4 6 7
3|4|
6|5|
2|6|

From this display, we can see that the most frequent walk interval of 2012 was 20 to 29,
lower than 2013’s 30 to 39. We can also see that the data in 2012 is more spread out, with
three values higher than all of 2013’s values.

A dot plot, sometimes called a line plot, is another visual summary of the distribution of the
quantitative data that can be viewed like a histogram. Dot plots work well with a relatively
small range of data, and like a stem and leaf plot, can be created quickly by hand. The data
below shows the number of green M&Ms in 19 “fun size” packages:
4, 9, 7, 6, 7, 6, 1, 3, 1, 2, 3, 3, 3, 4, 4, 8, 8, 4, 1

To create a dot plot of the data, we mark a symbol (dot or “x”) over the value on a number
line. When values are repeated, the symbol is stacked vertically, creating a bar-like shape
with height that matches the frequency of the value:

From the shape of the dot plot above we can observe that the two most frequent counts of
green M&Ms in a fun size package are 3 and 4, the data is distributed with counts from 1 to
9, with a gap at 5, and less frequent amounts greater than 5.

Dot Plot
A dot plot shows the distribution of the data by plotting each data value as a dot or
other symbol above its value on a number line, with repeated values stacked vertically.
The height of each stack of dots shows the frequency of that data value.
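
For readers who want to reproduce a dot plot in plain text, here is a small sketch using the green M&M counts from above. It assumes Python and single-digit data values so that the columns line up; the variable names are illustrative.

```python
from collections import Counter

# Number of green M&Ms in 19 "fun size" packages, from the text
green_counts = [4, 9, 7, 6, 7, 6, 1, 3, 1, 2, 3, 3, 3, 4, 4, 8, 8, 4, 1]
freq = Counter(green_counts)
values = range(min(green_counts), max(green_counts) + 1)

rows = []
# Build the plot from the top row down; a value gets an "x" at a given
# level only if it occurs at least that many times
for level in range(max(freq.values()), 0, -1):
    rows.append(" ".join("x" if freq[v] >= level else " " for v in values).rstrip())
rows.append(" ".join(str(v) for v in values))   # the number line

print("\n".join(rows))
```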

What to Look for in a Distribution


When we create a visual display of the distribution of quantitative data, we look for certain
features it may or may not have, and what these features may tell us about the data. We look
at its overall shape, if it has symmetry,
whether there are any gaps or outliers,
where any peaks or clusters of data lie, and
how spread out the data is.

The histogram shown at the right illustrates FICO (credit) scores 9. Notice that it peaks
toward the right (750 to 799 interval) and tapers toward the left. This distribution is
skewed by the left side “tail,” and we describe it as skewed to the left. There are
no visible gaps or outliers.

9 By Vikjam (Own work) [CC-BY-SA-3.0], via Wikimedia Commons



The histogram at the left, showing the number of incoming links with articles on
Wikipedia 10, peaks on the left, with 1 link having the greatest frequency, then tapers
off to the right. This distribution is skewed by the right “tail” and we describe it as
skewed to the right. It has no gaps or outliers visible.

The next two histograms come from a study of Markov chains 11 (a topic a bit beyond our
course; we’re just viewing the histograms). The histogram at the right shows almost even
bars, with little variation among them, no gaps, and no outliers. It represents a uniform
distribution. A perfectly uniform distribution would look like a rectangle.

This histogram on the left is almost symmetrical, with a peak in the middle and tapering off
on both the left and right sides. This shape is often referred to as a “bell shape” and is
commonly known as a normal distribution. Many natural processes and behaviors show a
normal distribution. We examine the normal distribution more closely later in this text.

This last histogram (an example histogram not representing real data) has
two visible peaks in the data. This type of graph is called a bimodal
distribution.

Try it now 11:


Classify each of the following distributions as: normal, uniform, skewed to the left, skewed
to the right, or bimodal, and provide a reason:
b. c. d.

10By Ragesoss (Own work) [CC-BY-SA-3.0] via Wikimedia Commons


11Histograms from http://statmechalgcomp.wikispaces.com/Coupling+of+Markov+chains+-+Perfect+sampling
[CC-BY-SA-3.0]

Numerical Summaries of Data


In addition to graphically displaying data we can use numbers to summarize a distribution.
One important aspect of a distribution is where its center is located. Measures of central
tendency are discussed first. A second aspect of a distribution is how spread out it is. In other
words, how much the data in the distribution vary from one another. The second section
describes measures of variability.

Measures of Central Tendency


Let's begin by trying to find the most "typical" value of a data set.

Note that we just used the word "typical" although in many cases you might think of using
the word "average." We need to be careful with the word "average" as it means different
things to different people in different contexts. One of the most common uses of the word
"average" is what mathematicians and statisticians call the arithmetic mean, or just plain old
mean for short. "Arithmetic mean" sounds rather fancy, but you have likely calculated a
mean many times without realizing it; the mean is what most people think of when they use
the word "average".

Mean
The mean of a set of data is the sum of the data values divided by the number of
values.

The formula for the mean is:

\[ \bar{x} = \frac{\sum x}{n} \]

The symbol 𝑥̅ , an x with a “bar” on the top, stands for the mean, the Greek letter ∑
means the sum, the variable x stands for the data values and n is the number of values;
this formula is the symbolic representation of the definition of the mean.

Example 41: Exam Average


Marci’s exam scores for her last math class were: 79, 86, 82, 94. The mean of these values
would be:

\[ \frac{79 + 86 + 82 + 94}{4} = 85.25 \]

Typically we round means to one more decimal place than the original data had. In this
case, we would round 85.25 to 85.3.

Example 42: Touchdown Average


The numbers of touchdown (TD) passes thrown by each of the 31 teams in the National
Football League in the 2000 season are shown below.
37 33 33 32 29 28 28 23 22 22 22 21 21 21 20
20 19 19 18 18 18 18 16 15 14 14 14 12 12 9 6

Adding these values, we get 634 total TDs. Dividing by 31, the number of data values, we
get 634/31 = 20.4516. It would be appropriate to round this to 20.5.
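
The two mean calculations above can be checked with a few lines of code. This is a minimal sketch assuming Python; `statistics.mean` is part of the standard library, and the variable names are illustrative.

```python
from statistics import mean

exam_scores = [79, 86, 82, 94]
touchdowns = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20,
              20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

# Mean = sum of the data values divided by the number of values
print(mean(exam_scores))

td_mean = sum(touchdowns) / len(touchdowns)
print(round(td_mean, 1))   # rounded to one more decimal place than the data
```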

It would be most correct for us to report that “The mean number of touchdown passes thrown
in the NFL in the 2000 season was 20.5 passes,” but it is not uncommon to see the more
casual word “average” used in place of “mean.”

Try it Now 12:


The price of a jar of peanut butter at 5 stores was: $3.29, $3.59, $3.79, $3.75, and $3.99.
Find the mean price.

Example 43: Mean from a Frequency Table


One hundred families in a particular neighborhood are asked their annual household income,
to the nearest $5 thousand. The results are summarized in a frequency table below.

Income (thousands of dollars) Frequency


15 6
20 8
25 11
30 17
35 19
40 20
45 12
50 7

Caution! A common error that students make with summarized data is to add the incomes
themselves (15 + 20 + 25 + 30 + 35 + 40 + 45 + 50), then divide those by how many there
are. This method assumes there is only one income of $15,000, but in reality there are six
incomes of $15,000. We need to make sure to include all the data values.

Calculating the mean by hand could get tricky if we try to type in all 100 values:

\[ \frac{\overbrace{15 + \cdots + 15}^{6\text{ terms}} + \overbrace{20 + \cdots + 20}^{8\text{ terms}} + \overbrace{25 + \cdots + 25}^{11\text{ terms}} + \cdots}{100} \]

We could calculate this more easily by noticing that adding 15 to itself six times is the same
as 15 · 6 = 90. Using this simplification, we get

\[ \frac{15 \cdot 6 + 20 \cdot 8 + 25 \cdot 11 + 30 \cdot 17 + 35 \cdot 19 + 40 \cdot 20 + 45 \cdot 12 + 50 \cdot 7}{100} = \frac{3390}{100} = 33.9 \]

The mean household income of our sample is 33.9 thousand dollars ($33,900).
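
The same "value times frequency" shortcut translates directly into code. The sketch below assumes Python and stores the frequency table as a dictionary; the name `income_freq` is chosen for this illustration.

```python
# Income (thousands of dollars) -> number of families, from the table above
income_freq = {15: 6, 20: 8, 25: 11, 30: 17, 35: 19, 40: 20, 45: 12, 50: 7}

n = sum(income_freq.values())                        # number of data values
total = sum(income * count for income, count in income_freq.items())
mean_income = total / n

print(n, total, mean_income)
```

Multiplying each income by its frequency is what prevents the common error described in the caution above, where each income is counted only once.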

Example 44: Impact of an Extreme Value on the Mean


Extending the last example, suppose a new family with a household income of $5 million
($5000 thousand) moves into the neighborhood. Adding this to our sample, our mean is now:

\[ \frac{15 \cdot 6 + 20 \cdot 8 + 25 \cdot 11 + 30 \cdot 17 + 35 \cdot 19 + 40 \cdot 20 + 45 \cdot 12 + 50 \cdot 7 + 5000 \cdot 1}{101} = \frac{8390}{101} \approx 83.069 \]

While 83.1 thousand dollars ($83,069) is the correct mean household income, it no longer
represents a “typical” value.

Imagine the data values on a see-saw or balance scale. The mean is the value that keeps the
data in balance, like in the picture below.

If we graph our household data, the $5 million data value is so far out to the right that the
mean has to adjust up to keep things in balance.

For this reason, when working with data that have outliers – values far outside the primary
grouping – it is common to use a different measure of center, the median.

Median
The median of a set of data is the value in the middle when the data is in order.

To find the median, begin by listing the data in order from smallest to largest, or largest
to smallest.

If the number of data values, n, is odd, then the median is the middle data value.
If the number of data values is even, there is no one middle value, so we find the mean
of the two middle values.

Example 45: Touchdown Median


Returning to the football touchdown data, we would start by listing the data in order.
Luckily, it was already in decreasing order, so we can work with it without needing to
reorder it first.
37 33 33 32 29 28 28 23 22 22 22 21 21 21 20
20 19 19 18 18 18 18 16 15 14 14 14 12 12 9 6

Since there are 31 data values, an odd number, the median will be the middle number, the
16th data value (31/2 = 15.5, round up to 16, leaving 15 values below and 15 above). The
16th data value is 20, so the median number of touchdown passes in the 2000 season was 20
passes. Notice that for this data, the median is fairly close to the mean we calculated earlier,
20.5.

Example 46: Quiz Score Median


Find the median of these quiz scores: 5 10 8 6 4 8 2 5 7 7

We start by listing the data in order: 2 4 5 5 6 7 7 8 8 10

Since there are 10 data values, an even number, there is no one middle number. So we find
the mean of the two middle numbers, 6 and 7, and get (6+7)/2 = 6.5.

The median quiz score was 6.5.
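
Both median examples can be verified with the standard library. This sketch assumes Python; `statistics.median` sorts the data itself and averages the two middle values when the count is even.

```python
from statistics import median

quiz_scores = [5, 10, 8, 6, 4, 8, 2, 5, 7, 7]          # 10 values (even)
touchdowns = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20,
              20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]  # 31 values (odd)

print(sorted(quiz_scores))
print(median(quiz_scores))   # mean of the two middle values
print(median(touchdowns))    # the single middle (16th) value
```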

Try it Now 13:


The price of a jar of peanut butter at 5 stores was: $3.29, $3.59, $3.79, $3.75, and $3.99.
Find the median price.

Example 47: Median Income


Let us return now to our original household income data:

Income (thousands of dollars) Frequency


15 6
20 8
25 11
30 17
35 19
40 20
45 12
50 7

Here we have 100 data values. If we didn’t already know that, we could find it by adding the
frequencies. Since 100 is an even number, we need to find the mean of the middle two data
values - the 50th and 51st data values. To find these, we start counting up from the bottom:
There are 6 data values of $15, so Values 1 to 6 are $15 thousand
The next 8 data values are $20, so Values 7 to (6+8) =14 are $20 thousand
The next 11 data values are $25, so Values 15 to (14+11) = 25 are $25 thousand
The next 17 data values are $30, so Values 26 to (25+17) = 42 are $30 thousand
The next 19 data values are $35, so Values 43 to (42+19) = 61 are $35 thousand

From this we can tell that values 50 and 51 will be $35 thousand, and the mean of these two
values is $35 thousand. The median income in this neighborhood is $35 thousand.
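
The counting-up process can be sketched in code by keeping a running total of the frequencies. This assumes Python; the helper name `value_at` is hypothetical and exists only for this illustration.

```python
income_freq = {15: 6, 20: 8, 25: 11, 30: 17, 35: 19, 40: 20, 45: 12, 50: 7}

def value_at(position, freq_table):
    """Return the data value at a 1-based position in the sorted data."""
    running = 0
    for value in sorted(freq_table):
        running += freq_table[value]      # cumulative count so far
        if position <= running:
            return value
    raise ValueError("position is larger than the number of data values")

n = sum(income_freq.values())
# n is even here, so the median is the mean of the 50th and 51st values
median_income = (value_at(n // 2, income_freq) + value_at(n // 2 + 1, income_freq)) / 2
print(median_income)
```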

Example 48: Effect of Extreme Value on the Median


If we add in the new neighbor with a $5 million household income, then there will be 101
data values, and the 51st value will be the median. As we discovered in the last example, the
51st value is $35 thousand. Notice that the new neighbor did not affect the median in this
case. The median is not swayed as much by outliers as the mean.
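
To see the contrast side by side, the following sketch expands the frequency table into a full list of 100 incomes, adds the $5 million household, and compares the mean and median before and after. It assumes Python; the variable names are illustrative.

```python
from statistics import mean, median

income_freq = {15: 6, 20: 8, 25: 11, 30: 17, 35: 19, 40: 20, 45: 12, 50: 7}

# Repeat each income as many times as its frequency
incomes = [income for income, count in income_freq.items() for _ in range(count)]
with_outlier = incomes + [5000]   # the new $5 million household, in thousands

print(mean(incomes), median(incomes))
print(round(mean(with_outlier), 3), median(with_outlier))
```

The mean more than doubles while the median stays where it was.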

In addition to the mean and the median, there is one other common measurement of the
"typical" value of a data set: the mode.

Mode
The mode is the element of the data set that occurs most frequently.

The mode is fairly useless with data like weights or heights where there are a large number of
possible values. The mode is most commonly used for categorical (qualitative) data, for
which median and mean cannot be computed.

Example 49: Car Color Mode


In our vehicle color survey, we collected the data

Color Frequency
Blue 3
Green 5
Red 4
White 3
Black 2
Grey 3

For this data, Green is the mode, since it is the data value that occurred the most frequently.

It is possible for a data set to have more than one mode if several categories have the same
highest frequency, or no mode if every category occurs only once.
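
A short sketch of finding the mode from a frequency table follows, assuming Python. It returns a list so that ties (more than one mode) are handled naturally; the variable names are illustrative.

```python
# Vehicle color survey from Example 49
color_freq = {"Blue": 3, "Green": 5, "Red": 4, "White": 3, "Black": 2, "Grey": 3}

highest = max(color_freq.values())
# Every category that reaches the highest frequency is a mode
modes = [color for color, count in color_freq.items() if count == highest]
print(modes)
```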

Try it Now 14:


Reviewers were asked to rate a product on a scale of 1 to 5. Find
a. The mean rating
b. The median rating
c. The mode rating

Rating Frequency
1 4
2 8
3 7
4 3
5 1

Measures of Variation
Consider these three sets of quiz scores:

Section A: 5, 5, 5, 5, 5, 5, 5, 5, 5, 5

Section B: 0, 0, 0, 0, 0, 10, 10, 10, 10, 10

Section C: 4, 4, 4, 5, 5, 5, 5, 6, 6, 6

All three of these sets of data have a mean of 5 and median of 5, yet the sets of scores are
clearly quite different. In section A, everyone had the same score; in section B half the class
got no points and the other half got a perfect score, assuming this was a 10-point
quiz. Section C was not as consistent as section A, but not as widely varied as section B.

In addition to the mean and median, which are measures of the "typical" or "middle" value,
we also need a measure of how "spread out" or varied each data set is.

There are several ways to measure this "spread" of the data. The first is the simplest and is
called the range.

Range
The range is the difference between the maximum value and the minimum value of the
data set.

Be careful! We commonly think of a “range” as two different numbers (high and low
values), but the statistical range is a single value, the difference between these high and low
values.

Example 50: Quiz Score Range


Using the quiz scores from above,
Section A: 5, 5, 5, 5, 5, 5, 5, 5, 5, 5

Section B: 0, 0, 0, 0, 0, 10, 10, 10, 10, 10

Section C: 4, 4, 4, 5, 5, 5, 5, 6, 6, 6

For section A, the range is 0 since both maximum and minimum are 5 and 5 – 5 = 0
For section B, the range is 10 since 10 – 0 = 10
For section C, the range is 2 since 6 – 4 = 2
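
The range is a one-line calculation in most languages. A minimal Python sketch, with an illustrative dictionary of the three sections:

```python
sections = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "C": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
}

# Range = maximum value minus minimum value (a single number)
ranges = {name: max(data) - min(data) for name, data in sections.items()}
print(ranges)
```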

In the last example, the range seems to be revealing how spread out the data is. However,
suppose we add a fourth section, Section D, with scores 0, 5, 5, 5, 5, 5, 5, 5, 5, 10.

This section also has a mean and median of 5. The range is 10, yet this data set is quite
different from Section B. To better illuminate the differences, we’ll have to turn to more
sophisticated measures of variation.

Standard deviation
The standard deviation is a measure of variation based on measuring how far each data
value deviates, or is different, from the mean. A few important characteristics:
• Standard deviation is never negative. It will be zero if all the
data values are equal, and will get larger as the data spreads out.
• Standard deviation has the same units as the original data.
• Standard deviation, like the mean, can be highly influenced by outliers.

Using the data from section D, we could compute for each data value the deviation or
difference between the data value and the mean, 𝑥 − 𝑥̅ :

data value | deviation 𝑥 − 𝑥̅ (data value - mean)
0 0-5 = -5
5 5-5 = 0
5 5-5 = 0
5 5-5 = 0
5 5-5 = 0
5 5-5 = 0
5 5-5 = 0
5 5-5 = 0
5 5-5 = 0
10 10-5 = 5

We would like to get an idea of the "average" deviation from the mean, but if we find the
average of the values in the second column the negative and positive values cancel each other
out (this will always happen), so to prevent this we square every value in the second column:

data value | deviation 𝑥 − 𝑥̅ (data value – mean) | deviation squared (𝑥 − 𝑥̅ )^2
0 | 0-5 = -5 | (-5)^2 = 25
5 | 5-5 = 0 | 0^2 = 0
5 | 5-5 = 0 | 0^2 = 0
5 | 5-5 = 0 | 0^2 = 0
5 | 5-5 = 0 | 0^2 = 0
5 | 5-5 = 0 | 0^2 = 0
5 | 5-5 = 0 | 0^2 = 0
5 | 5-5 = 0 | 0^2 = 0
5 | 5-5 = 0 | 0^2 = 0
10 | 10-5 = 5 | (5)^2 = 25
Sum (∑) | ∑(𝑥 − 𝑥̅ ) = 0 | ∑(𝑥 − 𝑥̅ )^2 = 50

We then add the squared deviations up to get 25 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 25 =
50. Ordinarily we would then divide by the number of scores, n, (in this case, 10) to find the
mean of the squared deviations. But we only do this if the data set represents a population;
if the data set represents a sample (as it almost always does), we instead divide by n - 1
(in this case, 10 - 1 = 9). 12

So in our example, we would have 50/10 = 5 if section D represents a population and 50/9 =
about 5.56 if section D represents a sample. These values (5 and 5.56) are called,
respectively, the population variance and the sample variance for section D.

Variance can be a useful statistical concept, but note that the units of variance in this instance
would be points-squared since we squared all of the deviations. What are points-squared?
Good question. We would rather deal with the units we started with (points in this
case), so to convert back we take the square root and get:

\[ \text{population standard deviation} = \sqrt{\frac{50}{10}} = \sqrt{5} \approx 2.2 \]

or

\[ \text{sample standard deviation} = \sqrt{\frac{50}{9}} \approx 2.4 \]

If we are unsure whether the data set is a sample or a population, we will usually assume it is
a sample, and we will round answers to one more decimal place than the original data, as we
have done above.

Standard Deviation: (find the mean first)


1. Find the deviation of each data value from the mean. In other words, subtract the
mean from the data value, (𝑥 − 𝑥̅ ).
2. Square each deviation, (𝑥 − 𝑥̅ )2 .
3. Add the squared deviations, ∑(𝑥 − 𝑥̅ )2 .
4. Divide by n, the number of data values, if the data represents a whole population;
divide by n – 1 if the data is from a sample.
5. Compute the square root of the result.

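Population Variance:

\[ \frac{\sum (x - \bar{x})^2}{n} \]

Sample Variance:

\[ \frac{\sum (x - \bar{x})^2}{n - 1} \]

Population Standard Deviation:

\[ \sigma = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \]

Sample Standard Deviation:

\[ s = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]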
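
The five steps can be followed literally in code. The sketch below assumes Python; the function name `standard_deviation` and its `sample` flag are made up for this illustration. Python's standard library offers `statistics.pstdev` (population) and `statistics.stdev` (sample) for the same calculations.

```python
from math import sqrt

def standard_deviation(data, sample=True):
    """Follow the five steps: deviations, squares, sum, divide, square root."""
    n = len(data)
    mean = sum(data) / n                               # find the mean first
    squared = [(x - mean) ** 2 for x in data]          # steps 1 and 2
    total = sum(squared)                               # step 3
    divisor = n - 1 if sample else n                   # step 4
    return sqrt(total / divisor)                       # step 5

section_d = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]
print(round(standard_deviation(section_d, sample=False), 1))  # population
print(round(standard_deviation(section_d, sample=True), 1))   # sample
```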

Example 51: Quiz Score Standard Deviation


Computing the standard deviation for Section B above, we first calculate that the mean is 5.
Using a table can help keep track of your computations for the standard deviation:

12The reason we do this is highly technical, but we can see how it might be useful by considering the case of a
small sample from a population that contains an outlier, which would increase the average deviation: the outlier
very likely won't be included in the sample, so the mean deviation of the sample would underestimate the mean
deviation of the population; thus we divide by a slightly smaller number to get a slightly bigger average
deviation.

data value | deviation: data value - mean | deviation squared
0 | 0-5 = -5 | (-5)^2 = 25
0 | 0-5 = -5 | (-5)^2 = 25
0 | 0-5 = -5 | (-5)^2 = 25
0 | 0-5 = -5 | (-5)^2 = 25
0 | 0-5 = -5 | (-5)^2 = 25
10 | 10-5 = 5 | (5)^2 = 25
10 | 10-5 = 5 | (5)^2 = 25
10 | 10-5 = 5 | (5)^2 = 25
10 | 10-5 = 5 | (5)^2 = 25
10 | 10-5 = 5 | (5)^2 = 25

Assuming this data represents a population, we will add the squared deviations, divide by 10,
the number of data values, and compute the square root:

\[ \sqrt{\frac{25 + 25 + 25 + 25 + 25 + 25 + 25 + 25 + 25 + 25}{10}} = \sqrt{\frac{250}{10}} = \sqrt{25} = 5 \]

Notice that the standard deviation of this data set is much larger than that of section D since
the data in this set is more spread out.

For comparison, the standard deviations of all four sections are:


Section A: 5 5 5 5 5 5 5 5 5 5 Standard deviation: 0
Section B: 0 0 0 0 0 10 10 10 10 10 Standard deviation: 5
Section C: 4 4 4 5 5 5 5 6 6 6 Standard deviation: 0.8
Section D: 0 5 5 5 5 5 5 5 5 10 Standard deviation: 2.2
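
The comparison above treats each section as a population. As a quick check, this sketch uses `statistics.pstdev` from the Python standard library; the dictionary of sections is an illustrative arrangement of the data.

```python
from statistics import pstdev

sections = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "C": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    "D": [0, 5, 5, 5, 5, 5, 5, 5, 5, 10],
}

# Population standard deviation of each section, rounded to one decimal place
deviations = {name: round(pstdev(data), 1) for name, data in sections.items()}
print(deviations)
```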

Try it Now 15:


The price of a jar of peanut butter at 5 stores was: $3.29, $3.59, $3.79, $3.75, and $3.99.
Find the standard deviation of the prices.

Where standard deviation is a measure of variation based on the mean, quartiles are based
on the median. The median divides the data into two halves. Dividing each of the halves in
half again divides the data in quarters. The values where we make the “slices” to divide the
data in quarters are called quartiles.

Quartiles
Quartiles are values that divide the data in quarters.

The first quartile (Q1) is the value so that 25% of the data values are below it; the third
quartile (Q3) is the value so that 75% of the data values are below it. You may have
guessed that the second quartile is the same as the median, since the median is the
value so that 50% of the data values are below it. This divides the data into quarters;
25% of the data is between the minimum and Q1, 25% is between Q1 and the median,
25% is between the median and Q3, and 25% is between Q3 and the maximum value.

Visually, if the bar below represents the entire data set, we have:

|--- 25% ---|--- 25% ---|--- 25% ---|--- 25% ---|
Min.        Q1          Q2          Q3          Max
                     (Median)

Quartiles are not a single-number summary of variation like standard deviation. The quartiles
are used with the median, minimum, and maximum values to form a 5 number summary of
the data.

Five number summary


The five number summary of a data set takes this form:
Minimum, Q1, Median, Q3, Maximum

If we have a small number of data values we can locate the quartiles by finding the median
first, then find each remaining quartile by finding the median of each half of the data, like
cutting the data in half, then cutting each half in half. Statistician John Tukey advocated this
method. Not everyone agrees on a specific method for finding the quartiles, thus there are
other methods that different software packages use (such as EXCEL). 13

Locating Quartiles
Divide the data in half to locate the median (Q2)
• If there are an odd number of data values, the median will be the middle data value.
• If there are an even number of data values, the median will fall between two data
values.
Divide each half of the data in half to locate the first and third quartiles (Q1, Q3)
• If there are an odd number of data values in each half, each quartile will be the middle
data value of its half.
• If there are an even number of data values in each half, the quartiles will fall
between two data values.

There are patterns with even and odd number of values (n):
• If n is a multiple of 4, all three quartiles (Q1, median, Q3) fall in between data
values.
• If n is even but not a multiple of 4, the median falls between values, but Q1 and Q3 are
data values.
• If n is odd and n – 1 is a multiple of 4, the median is a data value, and Q1, Q3 fall in
between data values.
• If n is odd and n – 1 is not a multiple of 4, all three quartiles are data values.

Example 52: Height (Odd Number of Values)


Suppose we have measured the heights (in inches) of 9 females. Sorted from smallest to
largest, they are: 59, 60, 62, 64, 66, 67, 69, 70, 72

13 For a discussion of various methods and more information about quartiles, see
http://mathforum.org/library/drmath/view/60969.html

We can “cut it in half and in half again” to find the quartiles. If we cut it in half, the first
slice is at 66:
59, 60, 62, 64, 66, 67, 69, 70, 72

We have 4 data values above 66 and 4 below. We then cut each of these halves in half:
59, 60, (slice here), 62, 64, 66, 67, 69, (slice here), 70, 72

Since there is an even number of values in each half, the “cuts” fall between two data values.
These quartiles are shown in parentheses in this example to show they are not actual data
values in the set:
59, 60, (61), 62, 64, 66, 67, 69, (69.5), 70, 72

We have: Q1 = 61, median = 66, Q3 = 69.5

Example 53: Height (Even Number of Values)


Suppose we had measured the heights (in inches) of 8 females. Sorted from smallest to
largest, they are:

59, 60, 62, 64, 66, 67, 69, 70

If we cut it in half, the first cut falls between 64 and 66:


59, 60, 62, 64, (slice here), 66, 67, 69, 70

We have 4 data values above the cut and 4 below. We then cut each of these halves in half:
59, 60, (slice here), 62, 64, (65), 66, 67, (slice here), 69, 70

Since there is an even number of values in each half, the “cuts” fall between two data values.
These quartiles are shown in parentheses in this example to show they are not actual data
values in the set:
59, 60, (61), 62, 64, (65), 66, 67, (68), 69, 70
We have: Q1 = 61, median = 65, Q3 = 68.

As mentioned previously, the 5-number summary includes the first and third quartile with the
minimum, median, and maximum values.

Example 54: 5-Number Summary


For the 9 female sample, the median is 66, the minimum is 59, and the maximum is 72.
59, 60, (61), 62, 64, 66, 67, 69, (69.5), 70, 72

The 5 number summary is: Min = 59, Q1 = 61, Median = 66, Q3 = 69.5, Max = 72.

For the 8 female sample, the median is 65, the minimum is 59, and the maximum is 70.
59, 60, (61), 62, 64, (65), 66, 67, (68), 69, 70

The 5 number summary would be: 59, 61, 65, 68, and 70.
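The "cut it in half and in half again" method used in Examples 52 through 54 can be sketched in Python. This is a minimal sketch, not the only accepted way to compute quartiles; the function names `median` and `five_number_summary` are our own, and the sketch assumes the middle value is left out of both halves when n is odd, as in Example 52.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]                          # middle data value
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2  # between two values


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'cut in half' method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # values below the median cut
    upper = data[n - half:]    # values above the median cut
    return (data[0], median(lower), median(data), median(upper), data[-1])


nine_heights = [59, 60, 62, 64, 66, 67, 69, 70, 72]
eight_heights = [59, 60, 62, 64, 66, 67, 69, 70]
print(five_number_summary(nine_heights))   # (59, 61.0, 66, 69.5, 72)
print(five_number_summary(eight_heights))  # (59, 61.0, 65.0, 68.0, 70)
```

Statistical software often uses slightly different quartile rules, so results from a calculator or spreadsheet may not match this method exactly.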

Example 55: Quiz Score Quartiles


Five-number summaries for the quiz scores are shown below:

Section and data                               5-number summary

Section A: 5, 5, 5, 5, 5, 5, 5, 5, 5, 5        5, 5, 5, 5, 5
Section B: 0, 0, 0, 0, 0, 10, 10, 10, 10, 10   0, 0, 5, 10, 10
Section C: 4, 4, 4, 5, 5, 5, 5, 6, 6, 6        4, 4, 5, 6, 6
Section D: 0, 5, 5, 5, 5, 5, 5, 5, 5, 10       0, 5, 5, 5, 10

Of course, with a relatively small data set, finding a five-number summary is a bit silly, since
the summary contains almost as many values as the original data.

Try it Now 16:


The total cost of textbooks for the term was collected from 36 students. Find the 5 number
summary of this data.
$140 $160 $160 $165 $180 $220 $235 $240 $250 $260 $280 $285
$285 $285 $290 $300 $300 $305 $310 $310 $315 $315 $320 $320
$330 $340 $345 $350 $355 $360 $360 $380 $395 $420 $460 $460

Example 56: 5-number Summary from a Frequency Table


Returning to the household income data from earlier, create the five-number summary.

Income (thousands of dollars) Frequency


15 6
20 8
25 11
30 17
35 19
40 20
45 12
50 7

By adding the frequencies, we can see there are 100 data values represented in the table. In
Example 47, we found the median was $35 thousand. We can see in the table that the
minimum income is $15 thousand, and the maximum is $50 thousand.

To find Q1, we need to find halfway through the bottom 50 incomes, which would be
between the 25th and 26th income.
Counting up in the data as we did before,

There are 6 data values of $15, so Values 1 to 6 are $15 thousand


The next 8 data values are $20, so Values 7 to (6+8) = 14 are $20 thousand
The next 11 data values are $25, so Values 15 to (14+11) = 25 are $25 thousand
The next 17 data values are $30, so Values 26 to (25+17) = 42 are $30 thousand

The 25th data value is $25 thousand, and the 26th data value is $30 thousand, so Q1 will be the
mean of these: (25 + 30)/2 = $27.5 thousand.

To find Q3, we need to find halfway through the top 50 incomes, which would be between
the 75th and 76th income.

Continuing our counting from earlier,


The next 19 data values are $35, so Values 43 to (42+19) = 61 are $35 thousand
The next 20 data values are $40, so Values 62 to (61+20) = 81 are $40 thousand

Both the 75th and 76th data values lie in this group, so Q3 will be $40 thousand.

Putting these values together into a five-number summary, we get: 15, 27.5, 35, 40, 50
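The counting-up process in this example can be sketched in Python by walking the running total of the frequencies. This is only an illustration; the helper names `value_at` and `between` are our own, and the positions (25th and 26th, 50th and 51st, 75th and 76th) are the ones identified in the example for 100 data values.

```python
def value_at(position, table):
    """Data value at a 1-based position in the sorted data,
    found by walking the cumulative frequencies."""
    running = 0
    for value, freq in table:
        running += freq
        if position <= running:
            return value
    raise ValueError("position is beyond the end of the data")


def between(pos_a, pos_b, table):
    """Mean of the data values at two positions."""
    return (value_at(pos_a, table) + value_at(pos_b, table)) / 2


# (income in thousands of dollars, frequency)
income = [(15, 6), (20, 8), (25, 11), (30, 17),
          (35, 19), (40, 20), (45, 12), (50, 7)]

n = sum(freq for _, freq in income)   # 100 data values
q1 = between(25, 26, income)          # halfway through the bottom 50
med = between(50, 51, income)         # halfway through all 100
q3 = between(75, 76, income)          # halfway through the top 50
summary = (income[0][0], q1, med, q3, income[-1][0])
print(n, summary)                     # 100 (15, 27.5, 35.0, 40.0, 50)
```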

Note that the 5 number summary divides the data into four intervals, each of which will
contain about 25% of the data. In the previous example, that means about 25% of
households have income between $40 thousand and $50 thousand.

For visualizing data, there is a graphical representation of a 5-number summary called a box
plot, or box and whisker graph.

Box plot
A box plot is a graphical representation of a five-number summary.

To create a box plot or box-whisker plot:

1. Draw a number line.


2. Locate the 5-number summary values on the number line and make a mark above
each value; the plot is located above the number line not on the number line itself.
3. Draw a box from the first quartile to the third quartile, and a line through the box at
the median.
4. Draw “whiskers” extended from Q1, the first quartile, out to the minimum value
and from Q3, the third quartile, out to the maximum value.

Example 57: Household Income Box Plot


The box plot below is based on the household income data with 5 number summary:
15, 27.5, 35, 40, 50

Try it Now 17:


Create a box plot based on the textbook price data from the last Try it Now.

Box plots are particularly useful for comparing data from two populations.

Example 58: Comparing Box Plots


The box plot of service times for two fast-food restaurants is shown below.

While store 2 had a slightly shorter median service time (2.1 minutes vs. 2.3 minutes), store 2
is less consistent, with a wider spread of the data.

At store 1, 75% of customers were served within 2.9 minutes, while at store 2, 75% of
customers were served within 5.7 minutes.

Which store should you go to in a hurry? That depends upon your opinions about luck –
25% of customers at store 2 had to wait between 5.7 and 9.6 minutes.

Example 59: Comparing Box Plots II


The boxplot below is based on the birth weights of infants with severe idiopathic respiratory
distress syndrome (SIRDS)14. The boxplot is separated to show the birth weights of infants
who survived and those that did not.

Comparing the two groups, the boxplot reveals that the birth weights of the infants that died
appear to be, overall, smaller than the weights of infants that survived. In fact, we can see
that the median birth weight of infants that survived is the same as the third quartile of the
infants that died.

Similarly, we can see that the first quartile of the survivors is larger than the median weight
of those that died, meaning that over 75% of the survivors had a birth weight larger than the
median birth weight of those that died.

Looking at the maximum value for those that died and the third quartile of the survivors, we
can see that over 25% of the survivors had birth weights higher than the heaviest infant that
died.

The box plot gives us a quick, albeit informal, way to determine that birth weight is quite
likely linked to survival of infants with SIRDS.

Measures of Position
When we calculate the median and quartiles, we describe the position of the data values in
relation to the entire data set. If a data value is above the third quartile, then it is in the top
25% of the data values. If we divide our data sets into 100 sections, using 99 “slices” instead
of the three we use with quartiles, these 99 “slices” are known as percentiles. If a data value
lies at the 65th percentile, then it is higher than 65% of the data values in the set and lower
than 35% of the data values.
14 van Vliet, P.K. and Gupta, J.M. (1973) Sodium bicarbonate in idiopathic respiratory distress syndrome. Arch.
Disease in Childhood, 48, 249–255. As quoted on
http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398296&section=1.1.3

We can also measure the position of the data using the mean and standard deviation. Recall
that the standard deviation is a form of average distance or deviation from the mean. If we
compare a data value’s distance from the mean to the overall average distance from the
mean, we create a z-score.

Z-score
A z-score measures how many standard deviations a particular data value is from the
mean, and is found by calculating:
$$z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}}$$

Symbolically: $$z = \frac{x - \bar{x}}{s}$$

Using z-scores, you can compare data values. Larger z-scores indicate greater distance from
the mean. The further out from the mean a data value is, the more atypical or unusual the
data value.

Example 60: Strange Men?


Suppose the mean height of adult males is 69.0 inches, with a standard deviation of 2.8
inches. Find the z-scores for:

a. Actor Danny DeVito, 5 ft. tall15


b. Basketball Player Shaquille O’Neal, 7 ft. 1 inch 16
c. MCC Lab Tech Brian Clifford, 5 ft. 11 inches

Danny DeVito: 5 feet tall converted to inches is 60 inches, so Danny’s z-score is:
$$z = \frac{60.0 - 69.0}{2.8} \approx -3.21$$
Notice how Danny’s z-score is negative. This indicates that he is 3.21 standard deviations
below the mean, or below average.

Shaquille O’Neal: 7 feet 1 inch tall converted to inches is 85 inches, so Shaquille’s z-score
is:
$$z = \frac{85.0 - 69.0}{2.8} \approx 5.71$$
Shaq is 5.71 standard deviations above the mean, or above average. Notice that Shaq is
further above the mean than Danny is below it.

Brian Clifford: 5 feet 11 inches tall converted to inches is 71 inches, so Brian’s z-score is:
$$z = \frac{71.0 - 69.0}{2.8} \approx 0.71$$
Brian is only slightly above average.
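The three calculations above follow the same pattern, so they can be sketched as one small Python function. The function name `z_score` is our own; the mean of 69.0 inches and standard deviation of 2.8 inches are the values supposed in this example.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev


# Heights in inches, converted from feet and inches as in the example
heights = {"Danny DeVito": 60, "Shaquille O'Neal": 85, "Brian Clifford": 71}

for name, inches in heights.items():
    z = z_score(inches, mean=69.0, std_dev=2.8)
    side = "below" if z < 0 else "above"
    print(f"{name}: z = {z:.2f} ({side} the mean)")
```

A negative result means the value is below the mean, and a positive result means it is above the mean.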

15 http://www.imdb.com/name/nm0000362/
16 http://shaq.com/facts

We can see that z-scores give us distance from the mean for particular data values. From the
previous example we can see that both Danny DeVito and Shaquille O’Neal vary greatly
from the mean, with Shaquille more standard deviations above the mean than Danny is
below. Brian is closer to the mean and his height is more “average.”

If data is normally distributed, with a perfectly symmetrical bell shape, then the mean has a
z-score of 0. Using calculus and the area under the curve, we can calculate what percentage
of the population is above or below a given z-score. The graph 17 below shows a standard
normal curve.

In a standard normal distribution, the mean lies in the middle (indicated with μ on the
graph). Most of the data lies close to the mean (the peak). Within one standard
deviation above and below the mean (z = ±1) lies 68.2% of the data. As we move further out
from the mean, the data tapers off. Within two standard deviations of the mean (z = ±2) lies
95.4% of the data. Within three standard deviations (z = ±3) lies 99.7% of the data. A very
small percent of the data lies beyond three standard deviations.

Usual Range
If we consider that 95.4% of the data lies within 2 standard deviations of the mean,
then z-scores between -2 and 2 are considered within a usual range (-2 ≤ z ≤ 2), and
z-scores outside this range (z > 2 or z < -2) are considered unusual.

Using this distribution and our previously calculated z-scores, we can see that both Danny
DeVito (z = -3.21) and Shaquille O’Neal (z = 5.71) are very unusual for their height in
comparison to the rest of the population. Shaq is more unusually tall than Danny is
unusually short, as his z-score is further away from the mean (z = 0) than Danny’s is. Brian,
who has a z-score of 0.71, is within 2 standard deviations of the mean, and within the “usual
range.”

Z-scores also give us a way to compare values from different sets with different means and
standard deviations.

17 By Mwtoews [CC-BY-2.5], via Wikimedia Commons



Example 61: M&Ms and Skittles


Suppose we are comparing standard packages of M&Ms and Skittles, and find that the
number of green M&Ms in a package has a mean of 6.1 and standard deviation of 2.0, while
the number of green Skittles in a package has a mean of 7.2 and a standard deviation of 1.6.

Which would be more unusual, a package with 3 green M&Ms or a package with 3 green
skittles?

Calculate the z-scores for each, using the mean and standard deviation:

M&Ms: $$z = \frac{3 - 6.1}{2.0} = -1.55$$

Skittles: $$z = \frac{3 - 7.2}{1.6} \approx -2.63$$

We can see that 3 green candies would be below average for both of them, but it is not
unusual for M&Ms, as it is within 2 standard deviations of the mean, while it is unusual for
Skittles, as it is more than 2 standard deviations from the mean.
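The comparison in this example combines a z-score with the usual-range rule of 2 standard deviations. A minimal Python sketch of that check follows; the names `z_score` and `is_unusual` are our own, and the cutoff of 2 comes from the Usual Range definition in this section.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev


def is_unusual(z, cutoff=2.0):
    """True when a z-score falls outside the usual range -cutoff to cutoff."""
    return abs(z) > cutoff


z_mms = z_score(3, mean=6.1, std_dev=2.0)       # 3 green M&Ms
z_skittles = z_score(3, mean=7.2, std_dev=1.6)  # 3 green Skittles

print("M&Ms unusual?", is_unusual(z_mms))
print("Skittles unusual?", is_unusual(z_skittles))
```

Because z-scores are measured in standard deviations rather than in candies, the same cutoff works for both brands even though their means and standard deviations differ.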

Try it now 18:


Suppose the mean number of M&Ms in a fun-size package is listed as 18 with a standard
deviation of 1.8. Would it be more unusual to have a package with 16 or a package with 21
M&Ms? Explain.

We can use z-scores and the area under the normal curve to find percentages of data values
above or below a particular value and use these percentages to predict the probability of a
data value falling in a particular range of values.

From our previous example, we know that 3 green M&Ms in a standard size package is not
unusual, as it had a z-score of -1.55, still within two standard deviations of the mean. What
percent of packages of M&Ms can we expect to have less than 3 green M&Ms?

Visualizing the normal curve, recall that the peak (middle) of the graph is z = 0. We can
mark where z = -1.55 on the number line below the curve, and where this z-score “slices” the
curve. The percent of packages of M&Ms with less than 3 green M&Ms is the area shown
below this line.

Statistical tables list the area by z-score. Different tables refer to the area in different
ways. The table we use in this text always lists area to the left of the particular z-score.
Since we want to know the percent of packages of M&Ms less than 3, we want the area below
the z-score -1.55, which is the area to the left of the z-score as we sketched on the normal
curve. Since this matches the table area, we just need to read the table to find this percent.

The table has z-scores to the tenths place in rows, and the hundredths place is indicated by
column. To find the z-score -1.55, we find the row with -1.5 and read that row over until
we’re under the column .05. The percent (in decimal form) for a z-score of -1.55 is 0.0606, or
about 6.06% of the packages of M&Ms will have less than 3 green M&Ms. We could also
say the probability of getting a package with less than 3 green M&Ms is 6.06%.

We can also use the complement to predict the percent of packages that have more than 3
green M&Ms: 100% - 6.06% = 93.94%.
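If a printed table is not at hand, the same area can be found with software. The sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later), whose `cdf` method gives the area to the left of a value, just like the table used in this text. The variable names are our own.

```python
from statistics import NormalDist

standard_normal = NormalDist()       # mean 0, standard deviation 1

below = standard_normal.cdf(-1.55)   # area to the left of z = -1.55
above = 1 - below                    # complement: area to the right

print(f"less than 3 green M&Ms: {below:.4f}")
print(f"more than 3 green M&Ms: {above:.4f}")
```

Rounded to four decimal places, these agree with the table values 0.0606 and 0.9394.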

To find the percent of data related to a z-score (between any two, or above or
below a particular one):
1. Find the z-score for the values you are working with
2. Draw a picture and shade where you are looking on the normal curve
3. Look up the percent that matches the z-score and relate that to the area you’re
looking for, remembering that our table shows the area that is to the left of the z-score.

Example 62: IQ Scores


Suppose IQ scores are normally distributed with a mean of 100 and a standard deviation of
15. Find the percent of IQ scores that are:
a. Greater than 128
b. Between 128 and 146
c. Less than 146

For each case, we need to find z-scores.

The z-score for 128 is: $z = \frac{128 - 100}{15} \approx 1.87$. The z-score for 146 is: $z = \frac{146 - 100}{15} \approx 3.07$.

For a. IQ scores greater than 128, we need the area above the z-score for 128, which is the
area to the right of the z-score. The table gives the area to the left of the z-score, so we
need to find the complement of the table percent.

Using the table, we find 1.87 with the value 0.9693. The complement is 1 – 0.9693 = 0.0307,
so 3.07% of IQ scores are greater than 128.

For b., between 128 and 146, we want the area between the z-scores
of 1.87 and 3.07. Subtracting the two areas will give us the area
between them. The area for z = 1.87 is 0.9693, while the area for z
= 3.07 is 0.9989. The difference is 0.9989 – 0.9693 = 0.0296 or
2.96% of IQs are between 128 and 146.

For part c. IQ scores less than 146, the area for z = 3.07 matches this question (less than 146),
so 0.9989 or 99.89% of the IQ scores are less than 146.
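The three parts of this example can be sketched in Python as a check on the table work. This is a sketch under stated assumptions: `NormalDist` comes from Python's standard `statistics` module, the helper names `table_area` and `z_score` are our own, and both helpers round their results the way a printed z-table does (z to hundredths, area to four places) so the numbers line up with the text.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def table_area(z):
    """Area to the left of z, rounded to 4 places like a printed z-table."""
    return round(STANDARD.cdf(z), 4)


def z_score(value, mean, std_dev):
    """z-score rounded to hundredths, the precision a z-table supports."""
    return round((value - mean) / std_dev, 2)


z_128 = z_score(128, 100, 15)   # 1.87
z_146 = z_score(146, 100, 15)   # 3.07

above_128 = 1 - table_area(z_128)                 # part a: complement
between = table_area(z_146) - table_area(z_128)   # part b: difference of areas
below_146 = table_area(z_146)                     # part c: table area as is

print(f"a. greater than 128:    {above_128:.4f}")
print(f"b. between 128 and 146: {between:.4f}")
print(f"c. less than 146:       {below_146:.4f}")
```

Without the rounding steps, software would report slightly different final digits, since it carries more decimal places than the table.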

We can also work backwards and solve for a particular data value from a given percent.

Example 63: Top 10% IQ Scores


Using the mean of 100 and standard deviation of 15 for IQ scores, what score is the cutoff for
the top 10% of all IQ scores?

We know how to calculate a z-score, but this question asks for a data value (x).
We know $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}}$, and with our given information, $z = \frac{x - 100}{15}$.

Since the table gives us percents and we know the percent needs to be the top 10% = 0.1000,
we look in the body of the table for the value closest to the complement, 1 – 0.1000 = 0.9000,
since the table gives areas below rather than above z-scores.

The z-score is in the 1.2 row, between the 0.08 and 0.09 columns. Since 0.9000 is closer
to 0.8997 than to 0.9015, we use z = 1.28:

$$1.28 = \frac{x - 100}{15}$$

And solve for x (multiply both sides by 15):

19.2 = x – 100 (add 100 to both sides)
119.2 = x

So if your IQ score is higher than 119.2, then you’re in the top 10% of IQ scores (you’re not
unusual, though, as your z-score is still 1.28).
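Working backwards from a percent to a data value can also be sketched in Python. The `inv_cdf` method of `NormalDist` (Python's standard `statistics` module) plays the role of reading the table body first and the z-score second. The variable names are our own, and the mean of 100 and standard deviation of 15 are the values given in this example.

```python
from statistics import NormalDist

# Top 10% means 90% of scores fall below the cutoff
z = NormalDist().inv_cdf(0.90)    # z-score with area 0.9000 to its left
cutoff = 100 + z * 15             # undo z = (x - 100) / 15

# The same cutoff, asking the IQ distribution directly
direct = NormalDist(mu=100, sigma=15).inv_cdf(0.90)

print(f"z = {z:.2f}, cutoff = {cutoff:.1f}, direct = {direct:.1f}")
```

Software carries more decimal places for z than the table does, so its cutoff differs from 119.2 only in later decimal places.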

If you are feeling clever, you can use the symmetry of the curve to solve this problem. If you
look in the table for where 0.1000 lies, it is closest to z = -1.28, same z-score, but opposite
sign.

Try it now 19:


Using the mean height of men to be 69 inches with a standard deviation of 2.8 inches, find:
The percent of men under 6 feet tall
The percent of men over 6 feet tall
The percent of men between 5.5 feet and 6 feet tall
The cutoff height for the top 5% of men
Voting and Apportionment 1

“A vote is like a rifle: its usefulness depends upon the character of the user.”
—Theodore Roosevelt 1

Voting Theory
A group of friends decides which movie to watch. A company decides which product
design to manufacture. A democratic country elects its leader. In many decision making
situations, it is necessary to gather the group consensus.

While the basic idea of voting is fairly universal, the method by which those votes are used to
determine a winner can vary. Amongst a group of friends, you may decide upon a movie by
voting for all the movies you’re willing to watch, with the winner being the one with the
greatest approval. A company might eliminate unpopular designs then revote on the
remaining. A country might look for the candidate with the most votes.

In deciding upon a winner, there is always one main goal: to reflect the preferences of the
people in the most fair way possible.

Preference Schedules
A traditional ballot usually asks you to pick your favorite from a list of choices. This ballot
fails to provide any information on how a voter would rank the alternatives if their first
choice was unsuccessful. A preference ballot provides more information than a traditional
ballot.

Preference ballot
A preference ballot is a ballot in which the voter ranks the choices in order of
preference.

Example 1: Preference Ballot and Schedule


A vacation club is trying to decide which destination to visit this year: Hawaii (H), Orlando
(O), or Anaheim (A). Their votes are shown below:

             Bob   Ann   Marv  Alice  Eve   Omar  Lupe  Dave  Tish  Jim
1st choice   A     A     O     H      A     O     H     O     H     A
2nd choice   O     H     H     A      H     H     A     H     A     H
3rd choice   H     O     A     O      O     A     O     A     O     O

These individual ballots are typically combined into one preference schedule, which shows
the number of voters in the top row that voted for each option:

             1   3   3   3
1st choice   A   A   O   H
2nd choice   O   H   H   A
3rd choice   H   O   A   O

Notice that by totaling the vote counts across the top of the preference schedule
we can recover the total number of votes cast: 1+3+3+3 = 10 total votes.
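Combining individual ballots into a preference schedule is a counting task, which can be sketched in Python with `Counter` from the standard `collections` module. In this sketch each ballot is written as a three-letter string such as "AOH" (Anaheim first, Orlando second, Hawaii third); that encoding and the variable names are our own.

```python
from collections import Counter

# Each voter's ranking, first choice to last choice
ballots = {
    "Bob": "AOH", "Ann": "AHO", "Marv": "OHA", "Alice": "HAO", "Eve": "AHO",
    "Omar": "OHA", "Lupe": "HAO", "Dave": "OHA", "Tish": "HAO", "Jim": "AHO",
}

# The preference schedule counts how many voters cast each distinct ranking
schedule = Counter(ballots.values())

for ranking, count in schedule.items():
    print(count, "voter(s):", " > ".join(ranking))
print("total votes:", sum(schedule.values()))
```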

1 http://www.goodreads.com/quotes/4662-a-vote-is-like-a-rifle-its-usefulness-depends-upon
© David Lippman, edited LC Creative Commons BY-SA

Plurality
The voting method we’re most familiar with in the United States is the plurality method.

Plurality Method
In this method, the choice with the most first-preference votes is declared the winner.
Ties are possible, and would have to be settled through some sort of run-off vote.

This method is sometimes mistakenly called the majority method, or “majority rules”, but it
is not necessary for a choice to have gained a majority of votes to win. A majority is over
50%; it is possible for a winner to have a plurality without having a majority.

Example 2: Plurality Method


In our election from above, we had the preference table:

1 3 3 3
1st choice A A O H
2nd choice O H H A
3rd choice H O A O

For the plurality method, we only care about the first choice options. Totaling them up:
Anaheim: 1+3 = 4 first-choice votes
Orlando: 3 first-choice votes
Hawaii: 3 first-choice votes

Anaheim is the winner using the plurality voting method.

Notice that Anaheim won with 4 out of 10 votes, 40% of the votes, which is a plurality of the
votes, but not a majority.
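The plurality count can be sketched in Python by totaling only the first choice of each column in the preference schedule. The function name `plurality` and the string encoding of rankings ("AOH" for Anaheim, Orlando, Hawaii) are our own. The sketch returns every choice tied for the most first-place votes, since ties are possible.

```python
from collections import Counter


def plurality(schedule):
    """Return (winners, first_place_totals) for a preference schedule.

    schedule maps a ranking (first choice listed first) to a vote count.
    """
    firsts = Counter()
    for ranking, count in schedule.items():
        firsts[ranking[0]] += count          # only first choices matter
    top = max(firsts.values())
    winners = sorted(c for c, votes in firsts.items() if votes == top)
    return winners, firsts


schedule = {"AOH": 1, "AHO": 3, "OHA": 3, "HAO": 3}
winners, firsts = plurality(schedule)
total = sum(schedule.values())

print("first-place totals:", dict(firsts))
print("winner(s):", winners)
print("majority?", firsts[winners[0]] > total / 2)
```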

Try it Now 1:
Three candidates are running in an election for County Executive: Goings (G), McCarthy
(M), and Bunney (B) 2. The voting schedule is shown below. Which candidate wins under
the plurality method?

             44   14   20   70   22   80   39
1st choice   G    G    G    M    M    B    B
2nd choice   M    B         G    B    M
3rd choice   B    M         B    G    G

Note: In the third column and last column, those voters only recorded a first-place vote, so
we don’t know who their second and third choices would have been.

2 This data is loosely based on the 2008 County Executive election in Pierce County, Washington. See
http://www.co.pierce.wa.us/xml/abtus/ourorg/aud/Elections/RCV/ranked/exec/summary.pdf

What’s Wrong with Plurality?


The election from Example 2 may seem totally clean, but there is a problem lurking that
arises whenever there are three or more choices. Looking back at our preference table, how
would our members vote if they only had two choices?

Anaheim vs Orlando: 7 out of the 10 would prefer Anaheim over Orlando


             1   3   3   3
1st choice   A   A   O   H
2nd choice   O   H   H   A
3rd choice   H   O   A   O

Anaheim vs Hawaii: 6 out of 10 would prefer Hawaii over Anaheim


             1   3   3   3
1st choice   A   A   O   H
2nd choice   O   H   H   A
3rd choice   H   O   A   O

This doesn’t seem right, does it? Anaheim just won the election, yet 6 out of 10 voters, 60%
of them, would have preferred Hawaii! That hardly seems fair. Marquis de Condorcet, a
French philosopher, mathematician, and political scientist wrote about how this could happen
in 1785, and for him we name our first fairness criterion.

Fairness Criteria
The fairness criteria are statements that seem like they should be true in a fair election.

Condorcet Criterion
If there is a choice that is preferred in every one-to-one comparison with the other
choices, that choice should be the winner. We call this winner the Condorcet Winner,
or Condorcet Candidate.

Example 3: Condorcet Winner


In the election from Example 2, what choice is the Condorcet Winner?

We see above that Hawaii is preferred over Anaheim. Comparing Hawaii to Orlando, we can
see 6 out of 10 would prefer Hawaii to Orlando.

             1   3   3   3
1st choice   A   A   O   H
2nd choice   O   H   H   A
3rd choice   H   O   A   O

Since Hawaii is preferred in a one-to-one comparison to both other choices, Hawaii is the
Condorcet Winner.

Example 4: Plurality vs. Condorcet


Consider a city council election in a district that is historically 60% Democratic voters and
40% Republican voters. Even though city council is technically a nonpartisan office, people
generally know the affiliations of the candidates. In this election there are three candidates:
Don and Key, both Democrats, and Elle, a Republican. A preference schedule for the votes
looks as follows:

342 214 298


1st choice Elle Don Key
2nd choice Don Key Don
3rd choice Key Elle Elle

We can see a total of 342 + 214 + 298 = 854 voters participated in this election. Computing
percentage of first place votes:

Don: 214/854 = 25.1%


Key: 298/854 = 34.9%
Elle: 342/854 = 40.0%

So in this election, the Democratic voters split their vote over the two Democratic candidates,
allowing the Republican candidate Elle to win under the plurality method with 40% of the
vote.

Looking at this election more closely, we see that the plurality result violates the Condorcet Criterion. Analyzing the
one-to-one comparisons:

Elle vs Don: 342 prefer Elle; 512 prefer Don: Don is preferred
Elle vs Key: 342 prefer Elle; 512 prefer Key: Key is preferred
Don vs Key: 556 prefer Don; 298 prefer Key: Don is preferred

So even though Don had the smallest number of first-place votes in the election, he is the
Condorcet winner, being preferred in every one-to-one comparison with the other candidates.
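The one-to-one comparisons used to find a Condorcet winner can be sketched in Python. This is a minimal sketch that assumes every ballot ranks every choice; the function names `prefers` and `condorcet_winner` are our own. It returns `None` when no choice wins every comparison.

```python
def prefers(schedule, x, y):
    """Number of voters who rank x above y."""
    return sum(count for ranking, count in schedule.items()
               if ranking.index(x) < ranking.index(y))


def condorcet_winner(schedule):
    """Choice preferred in every one-to-one comparison, or None."""
    candidates = next(iter(schedule))        # any complete ranking lists them all
    for x in candidates:
        if all(prefers(schedule, x, y) > prefers(schedule, y, x)
               for y in candidates if y != x):
            return x
    return None


vacation = {("A", "O", "H"): 1, ("A", "H", "O"): 3,
            ("O", "H", "A"): 3, ("H", "A", "O"): 3}
council = {("Elle", "Don", "Key"): 342,
           ("Don", "Key", "Elle"): 214,
           ("Key", "Don", "Elle"): 298}

print(condorcet_winner(vacation))                                       # H
print(condorcet_winner(council))                                        # Don
print(prefers(council, "Don", "Key"), prefers(council, "Key", "Don"))   # 556 298
```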

Try it Now 2:
Consider the election from Try it Now 1. Is there a Condorcet winner in this election?

             44   14   20   70   22   80   39
1st choice   G    G    G    M    M    B    B
2nd choice   M    B         G    B    M
3rd choice   B    M         B    G    G

Insincere Voting
Situations like the one in Example 4 above, when more than one candidate shares
somewhat similar points of view, can lead to insincere voting. Insincere voting is
when a person casts a ballot counter to their actual preference for strategic purposes. In the
case above, the Democratic leadership might realize that Don and Key will split the vote, and
encourage voters to vote for Key by officially endorsing him. Not wanting to see their party
lose the election, as happened in the scenario above, Don’s supporters might insincerely vote
for Key, effectively voting against Elle.

Instant Runoff Voting


Instant Runoff Voting (IRV), also called Plurality with Elimination, is a modification of the
plurality method that attempts to address the issue of insincere voting.

Instant Runoff Voting (IRV) or Plurality with Elimination


In IRV, voting is done with preference ballots, and a preference schedule is generated.
The choice with the fewest first-place votes is then eliminated from the election, and any
votes for that candidate are redistributed to the voters’ next choice. This continues
until a choice has a majority (over 50%).

This is similar to the idea of holding runoff elections, but since every voter’s order of
preference is recorded on the ballot, the runoff can be computed without requiring a second
costly election.

This voting method is used in several political elections around the world, including election
of members of the Australian House of Representatives. A version of IRV is used by the
International Olympic Committee to select host nations.

Example 5: Plurality with Elimination


Consider the preference schedule below, in which a company’s advertising team is voting on
five different advertising slogans, called A, B, C, D, and E here for simplicity.

Initial votes
             3   4   4   6   2   1
1st choice   B   C   B   D   B   E
2nd choice   C   A   D   C   E   A
3rd choice   A   D   C   A   A   D
4th choice   D   B   A   E   C   B
5th choice   E   E   E   B   D   C

If this were a plurality election, note that B would be the winner with 9 first-choice votes,
compared to 6 for D, 4 for C, and 1 for E.

There are a total of 3+4+4+6+2+1 = 20 votes. A majority would be 11 votes, and B has 9/20 =
45%, C has 4/20 = 20%, D has 6/20 = 30%, E has 1/20 = 5%. No one yet has a majority, so
we proceed to elimination rounds.

Round 1: We make our first elimination. Choice A has the fewest first-place votes, so we
remove that choice:
             3   4   4   6   2   1
1st choice   B   C   B   D   B   E
2nd choice   C       D   C   E
3rd choice       D   C           D
4th choice   D   B       E   C   B
5th choice   E   E   E   B   D   C

We then shift everyone’s choices up to fill the gaps. There is still no choice with a majority
(B still has 9/20 = 45%, C has 4/20 = 20%, D has 6/20 = 30%, and E has 1/20 = 5%), so we
eliminate again.
             3   4   4   6   2   1
1st choice   B   C   B   D   B   E
2nd choice   C   D   D   C   E   D
3rd choice   D   B   C   E   C   B
4th choice   E   E   E   B   D   C

Round 2: We make our second elimination. Choice E has the fewest first-place votes, so we
remove that choice, shifting everyone’s options to fill the gaps.
             3   4   4   6   2   1
1st choice   B   C   B   D   B   D
2nd choice   C   D   D   C   C   B
3rd choice   D   B   C   B   D   C

Notice that the first and fifth columns now have the same preferences, so we can condense
those down to one column.
             5   4   4   6   1
1st choice   B   C   B   D   D
2nd choice   C   D   D   C   B
3rd choice   D   B   C   B   C

Now B has 9/20 = 45%, C has 4/20 = 20% and D has 7/20 = 35%. Still no majority, so we
eliminate again.
Round 3: We make our third elimination. C has the fewest votes.
             5   4   4   6   1
1st choice   B   D   B   D   D
2nd choice   D   B   D   B   B

Condensing this down:


             9   11
1st choice   B   D
2nd choice   D   B

D now has 11/20 = 55%, while B is still 9/20 = 45%. D is declared the winner under IRV.
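The elimination rounds in this example can be sketched in Python. This is a minimal sketch, not an official counting procedure: the function name `irv` is our own, every ballot is assumed to rank every choice, and ties for fewest first-place votes are broken alphabetically, which is an assumption of this sketch rather than a rule from the text.

```python
from collections import Counter


def irv(schedule):
    """Instant runoff on a complete preference schedule.

    Returns (winner, eliminated), where eliminated lists the removed
    choices in the order they were removed.
    """
    remaining = set(next(iter(schedule)))
    total = sum(schedule.values())
    eliminated = []
    while True:
        tally = Counter({c: 0 for c in remaining})
        for ranking, count in schedule.items():
            # each ballot counts for its highest-ranked remaining choice
            top = next(c for c in ranking if c in remaining)
            tally[top] += count
        leader, votes = tally.most_common(1)[0]
        if votes > total / 2:                      # majority reached
            return leader, eliminated
        # fewest first-place votes; alphabetical tie-break (assumption)
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)
        eliminated.append(loser)


slogans = {"BCADE": 3, "CADBE": 4, "BDCAE": 4,
           "DCAEB": 6, "BEACD": 2, "EADBC": 1}
winner, eliminated = irv(slogans)
print(winner, eliminated)   # D ['A', 'E', 'C']
```

Rather than physically shifting choices up to fill gaps, the sketch skips over eliminated choices when reading each ballot, which has the same effect.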

Try it Now 3:
Consider again the election from Try it Now 1. Find the winner using IRV (Plurality with
Elimination).

             44   14   20   70   22   80   39
1st choice   G    G    G    M    M    B    B
2nd choice   M    B         G    B    M
3rd choice   B    M         B    G    G

What’s Wrong with IRV?


Let’s return to our City Council Election, where Elle won using plurality but did not get the
majority of first choice votes.

342 214 298


1st choice Elle Don Key
2nd choice Don Key Don
3rd choice Key Elle Elle

In this election, Don has the smallest number of first place votes, so if we use plurality with
elimination, Don is eliminated in the first round. The 214 people who voted for Don have
their votes transferred to their second choice, Key.

             342    512
1st choice   Elle   Key
2nd choice   Key    Elle

So Key is the winner under the IRV method, with 512/854 ≈ 60% of the vote.

Notice that in this election, IRV violates the Condorcet Criterion, since we determined earlier
that Don was the Condorcet winner. On the other hand, the temptation has been removed for
Don’s supporters to vote for Key; they now know their vote will be transferred to Key, not
simply discarded.

Example 6: Changing Votes


Consider the voting system below, using IRV/plurality with elimination:
             37       22       12       29
1st choice   Adams    Brown    Brown    Carter
2nd choice   Brown    Carter   Adams    Adams
3rd choice   Carter   Adams    Carter   Brown

No candidate has a majority. Using elimination, Carter would be eliminated in the first
round, and Adams receives Carter’s votes to be the winner with 66 votes to 34 for Brown.

Now suppose that the results were announced, but election officials accidentally destroyed
the ballots before they could be certified, and the votes had to be recast. Wanting to “jump
on the bandwagon,” 10 of the voters who had originally voted in the order Brown, Adams,
Carter change their vote to favor the presumed winner, changing those votes to Adams,
Brown, Carter.

             47       22       2        29
1st choice   Adams    Brown    Brown    Carter
2nd choice   Brown    Carter   Adams    Adams
3rd choice   Carter   Adams    Carter   Brown

In this re-vote, Brown will be eliminated in the first round, having the fewest first-place
votes. After transferring Brown’s votes to Carter (22) and Adams (2), we find that Carter
will win this election with 51 votes to Adams’ 49 votes! Even though the only vote changes
made favored Adams, the change ended up costing Adams the election. This doesn’t seem
right, and introduces our second fairness criterion:

Monotonicity Criterion
If voters change their votes to increase the preference for a candidate, it should not
harm that candidate’s chances of winning.

This criterion is violated by this election. Note that even though the criterion is violated in
this particular election, it does not mean that IRV always violates the criterion; just that IRV
has the potential to violate the criterion in certain elections.
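The monotonicity failure in Example 6 can be checked with a short Python sketch that runs instant runoff on both the original vote and the re-vote. The function name `irv_winner` is our own, complete ballots are assumed, and ties for last place are broken alphabetically, which is an assumption of this sketch and does not arise in this election.

```python
from collections import Counter


def irv_winner(schedule):
    """Winner under instant runoff for a complete preference schedule."""
    remaining = set(next(iter(schedule)))
    total = sum(schedule.values())
    while True:
        tally = Counter({c: 0 for c in remaining})
        for ranking, count in schedule.items():
            tally[next(c for c in ranking if c in remaining)] += count
        leader, votes = tally.most_common(1)[0]
        if votes > total / 2:
            return leader
        # drop the choice with the fewest first-place votes
        remaining.remove(min(sorted(remaining), key=lambda c: tally[c]))


original = {("Adams", "Brown", "Carter"): 37, ("Brown", "Carter", "Adams"): 22,
            ("Brown", "Adams", "Carter"): 12, ("Carter", "Adams", "Brown"): 29}

# 10 voters move Adams from second place up to first place
revote = dict(original)
revote[("Brown", "Adams", "Carter")] -= 10
revote[("Adams", "Brown", "Carter")] += 10

print(irv_winner(original))   # Adams
print(irv_winner(revote))     # Carter
```

The only change between the two schedules favors Adams, yet the winner changes from Adams to Carter.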

Borda Count
Borda Count is another voting method, named for Jean-Charles de Borda, who developed the
system in 1770.

Borda Count
In this method, points are assigned to candidates based on their ranking; 1 point for last
choice, 2 points for second-to-last choice, and so on. The point values for all ballots
are totaled, and the candidate with the largest point total is the winner.

Example 7: Borda Count


A group of mathematicians are getting together for a conference.
The members are coming from four cities: Seattle, Tacoma,
Puyallup, and Olympia. Their approximate locations on a map
are shown to the right.

The votes for where to hold the conference were:


51 25 10 14
1st choice Seattle Tacoma Puyallup Olympia
2nd choice Tacoma Puyallup Tacoma Tacoma
3rd choice Olympia Olympia Olympia Puyallup
4th choice Puyallup Seattle Seattle Seattle

In each of the 51 ballots ranking Seattle first, Puyallup will be given 1 point, Olympia 2
points, Tacoma 3 points, and Seattle 4 points. Multiplying the points per vote times the
number of votes allows us to calculate points awarded:

             51            25            10            14
1st choice   Seattle       Tacoma        Puyallup      Olympia
4 points     4·51 = 204    4·25 = 100    4·10 = 40     4·14 = 56
2nd choice   Tacoma        Puyallup      Tacoma        Tacoma
3 points     3·51 = 153    3·25 = 75     3·10 = 30     3·14 = 42
3rd choice   Olympia       Olympia       Olympia       Puyallup
2 points     2·51 = 102    2·25 = 50     2·10 = 20     2·14 = 28
4th choice   Puyallup      Seattle       Seattle       Seattle
1 point      1·51 = 51     1·25 = 25     1·10 = 10     1·14 = 14

Adding up the points:


Seattle: 204 + 25 + 10 + 14 = 253 points
Tacoma: 153 + 100 + 30 + 42 = 325 points
Puyallup: 51 + 75 + 40 + 28 = 194 points
Olympia: 102 + 50 + 20 + 56 = 228 points

Under the Borda Count method, Tacoma is the winner of this vote.
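
The tally in Example 7 can also be reproduced with a short program. The following is a minimal Python sketch, not part of the original method description; the function name borda_count and the (number of ballots, ranking) layout are assumptions chosen only for illustration.

```python
def borda_count(schedule):
    """Total Borda points for each choice.

    schedule is a list of (number_of_ballots, ranking) pairs, where each
    ranking lists every choice from first place to last place.
    """
    totals = {}
    for count, ranking in schedule:
        n = len(ranking)
        for place, choice in enumerate(ranking):
            points = n - place  # first place earns n points, last place earns 1
            totals[choice] = totals.get(choice, 0) + points * count
    return totals


# Preference schedule from Example 7
schedule = [
    (51, ["Seattle", "Tacoma", "Olympia", "Puyallup"]),
    (25, ["Tacoma", "Puyallup", "Olympia", "Seattle"]),
    (10, ["Puyallup", "Tacoma", "Olympia", "Seattle"]),
    (14, ["Olympia", "Tacoma", "Puyallup", "Seattle"]),
]

totals = borda_count(schedule)
winner = max(totals, key=totals.get)
print(winner)  # Tacoma
```

The totals match the hand computation: Seattle 253, Tacoma 325, Puyallup 194, Olympia 228. The sketch assumes every ballot ranks every choice, so it would need adjusting for incomplete ballots.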

Try it Now 4:
Consider again the election from Try it Now 1. Find the winner using Borda Count. Since
we have some incomplete preference ballots, for simplicity, give every unranked candidate 1
point, the points they would normally get for last place.

            44  14  20  70  22  80  39
1st choice  G   G   G   M   M   B   B
2nd choice  M   B   -   G   B   M   -
3rd choice  B   M   -   B   G   G   -
(A dash marks a position the ballot left unranked.)

What’s Wrong with Borda Count?


You might have already noticed one potential flaw of the Borda Count from the previous
example. In that example, Seattle had a majority of first-choice votes, yet lost the election!
This seems odd, and prompts our next fairness criterion:

Majority Criterion
If a choice has a majority of first-place votes, that choice should be the winner.

The election from Example 7 using the Borda Count violates the Majority Criterion. Notice
also that this automatically means that the Condorcet Criterion will also be violated, as
Seattle would have been preferred by 51% of voters in any head-to-head comparison.

Borda count is sometimes described as a consensus-based voting system, since it can
sometimes choose a more broadly acceptable option over the one with majority support. In
the example above, Tacoma is probably the best compromise location. This is a different
approach than plurality and instant runoff voting that focus on first-choice votes; Borda
Count considers every voter’s entire ranking to determine the outcome.

Because of this consensus behavior, Borda Count, or some variation of it, is commonly used
in awarding sports awards. Variations are used to determine FIFA World Player of the Year
in soccer, to rank teams in NCAA sports, and to award the Heisman trophy.

Major League Baseball uses a modified Borda count to
determine the Most Valuable Player (MVP) award. Of the 10
votes each voter is allotted, 10th place votes receive 1 point, 9th
receive 2 points, and so on, with 2nd receiving 9 points, but 1st
receives 14 points instead of 10 points. 3
Detroit Tigers’ first baseman Miguel Cabrera 4 won the
American League MVP award for the 2012 and 2013 seasons.

Copeland’s Method (Pairwise Comparisons)


So far none of our voting methods have satisfied the Condorcet Criterion. The Copeland
Method specifically attempts to satisfy the Condorcet Criterion by looking at pairwise (one-
to-one) comparisons.

Copeland’s Method or Pairwise Comparisons


In this method, each pair of candidates is compared, using all preferences to determine
which of the two is more preferred. The more preferred candidate is awarded 1 point.
If there is a tie, each candidate is awarded ½ point. After all pairwise comparisons are
made, the candidate with the most points, and hence the most pairwise wins, is
declared the winner.

3 http://www.fairsportsrules.com/fair-sports-rules-blog/should-baseball-change-its-mvp-voting-system
4 Photo by Keith Allison (Flickr: Miguel Cabrera) CC-BY-SA-2.0, via Wikimedia Commons

Variations of Copeland’s Method are used in many professional organizations, including
election of the Board of Trustees for the Wikimedia Foundation that runs Wikipedia.

Example 8: Copeland’s Method (Pairwise Comparisons)


Consider our vacation group example from the beginning of the chapter. Determine the
winner using Copeland’s Method.

1 3 3 3
1st choice A A O H
2nd choice O H H A
3rd choice H O A O

We need to look at each pair of choices, and see which choice would win in a one-to-one
comparison. You may recall we did this earlier when determining the Condorcet Winner.
For example, comparing Hawaii vs Orlando, we see that 6 voters, those in the second and
fourth columns of the first table below, would prefer Hawaii to Orlando. Note that Hawaii
doesn’t have to be the voter’s first choice – we’re imagining that Anaheim wasn’t an option.
If it helps, you can imagine removing Anaheim, as in the second table below.

            1  3  3  3
1st choice  A  A  O  H
2nd choice  O  H  H  A
3rd choice  H  O  A  O

With Anaheim removed (a dash marks each removed entry):

            1  3  3  3
1st choice  -  -  O  H
2nd choice  O  H  H  -
3rd choice  H  O  -  O

Based on this, in the comparison of Hawaii vs Orlando, Hawaii wins, and receives 1 point.

Comparing Anaheim to Orlando, the 1 voter in the first column clearly prefers Anaheim, as
do the 3 voters in the second column. The 3 voters in the third column clearly prefer
Orlando. The 3 voters in the last column prefer Hawaii as their first choice, but if they had to
choose between Anaheim and Orlando, they'd choose Anaheim, their second choice overall.
So, altogether 1+3+3=7 voters prefer Anaheim over Orlando, and 3 prefer Orlando over
Anaheim. So, comparing Anaheim vs Orlando: 7 votes to 3 votes: Anaheim gets 1 point.

All together,

Hawaii vs Orlando: 6 votes to 4 votes: Hawaii gets 1 point


Anaheim vs Orlando: 7 votes to 3 votes: Anaheim gets 1 point
Hawaii vs Anaheim: 6 votes to 4 votes: Hawaii gets 1 point

Hawaii is the winner under Copeland’s Method, having earned the most points.

Notice this process is consistent with our determination of a Condorcet Winner.



Example 9: Copeland’s Method (Pairwise Comparison) with More Options


Consider the advertising group’s vote we explored earlier. Determine the winner using
Copeland’s method.

3 4 4 6 2 1
1st choice B C B D B E
2nd choice C A D C E A
3rd choice A D C A A D
4th choice D B A E C B
5th choice E E E B D C

With 5 candidates, there are 5C2 =10 pairwise comparisons to make:

A vs B: 11 votes to 9 votes A gets 1 point


A vs C: 3 votes to 17 votes C gets 1 point
A vs D: 10 votes to 10 votes A gets ½ point, D gets ½ point
A vs E: 17 votes to 3 votes A gets 1 point
B vs C: 10 votes to 10 votes B gets ½ point, C gets ½ point
B vs D: 9 votes to 11 votes D gets 1 point
B vs E: 13 votes to 7 votes B gets 1 point
C vs D: 9 votes to 11 votes D gets 1 point
C vs E: 17 votes to 3 votes C gets 1 point
D vs E: 17 votes to 3 votes D gets 1 point

Totaling these up:


A gets 2½ points
B gets 1½ points
C gets 2½ points
D gets 3½ points
E gets 0 points

Using Copeland’s Method, we declare D as the winner. Notice that in this case, D is not a
Condorcet Winner. While Copeland’s method will always select a Condorcet Candidate as the
winner when one exists, the method still works in cases where there is no Condorcet Winner.
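
With ten comparisons to make, this is a natural place to let a computer do the counting. The following Python sketch tallies Copeland points for the advertising group’s vote. It is only an illustration: the function name copeland_points and the use of short strings such as "BCADE" to hold rankings are assumptions, and the sketch assumes every ballot ranks every candidate.

```python
from itertools import combinations


def copeland_points(schedule):
    """Return the Copeland points earned by each candidate.

    schedule is a list of (number_of_ballots, ranking) pairs, where each
    ranking is a string listing every candidate from first to last choice.
    """
    candidates = sorted(schedule[0][1])
    points = {c: 0.0 for c in candidates}
    total_ballots = sum(n for n, _ in schedule)
    for x, y in combinations(candidates, 2):
        # Count the ballots that place x ahead of y; the rest place y ahead.
        x_votes = sum(n for n, ranking in schedule
                      if ranking.index(x) < ranking.index(y))
        y_votes = total_ballots - x_votes
        if x_votes > y_votes:
            points[x] += 1
        elif y_votes > x_votes:
            points[y] += 1
        else:  # a tie gives each candidate half a point
            points[x] += 0.5
            points[y] += 0.5
    return points


# Preference schedule from Example 9
example_9 = [(3, "BCADE"), (4, "CADBE"), (4, "BDCAE"),
             (6, "DCAEB"), (2, "BEACD"), (1, "EADBC")]

points = copeland_points(example_9)
print(points)  # {'A': 2.5, 'B': 1.5, 'C': 2.5, 'D': 3.5, 'E': 0.0}
```

The output agrees with the totals found by hand, with D earning the most points.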

Try it Now 5:
Consider again the election from Try it Now 1. Find the winner using Copeland’s method.
Since we have some incomplete preference ballots, we’ll have to adjust. For example, when
comparing M to B, we’ll ignore the 20 votes in the third column which do not rank either
candidate.

            44  14  20  70  22  80  39
1st choice  G   G   G   M   M   B   B
2nd choice  M   B   -   G   B   M   -
3rd choice  B   M   -   B   G   G   -
(A dash marks a position the ballot left unranked.)

What’s Wrong with Copeland’s Method?


As already noted, Copeland’s Method does satisfy the Condorcet Criterion. It also satisfies
the Majority Criterion and the Monotonicity Criterion. So is this the perfect method? Well,
in a word, no.

Example 10: Copeland’s Method and Removing Votes


A committee is trying to award a scholarship to one of four students, Anna (A), Brian (B),
Carlos (C), and Dimitry (D). The votes are shown below:

5 5 6 4
1st choice D A C B
2nd choice A C B D
3rd choice C B D A
4th choice B D A C

Making the comparisons:


A vs B: 10 votes to 10 votes A gets ½ point, B gets ½ point
A vs C: 14 votes to 6 votes: A gets 1 point
A vs D: 5 votes to 15 votes: D gets 1 point
B vs C: 4 votes to 16 votes: C gets 1 point
B vs D: 15 votes to 5 votes: B gets 1 point
C vs D: 11 votes to 9 votes: C gets 1 point

Totaling:
A has 1 ½ points B has 1 ½ points
C has 2 points D has 1 point

So Carlos is awarded the scholarship. However, the committee then discovers that Dimitry
was not eligible for the scholarship (he failed his last math class). Even though this seems
like it shouldn’t affect the outcome, the committee decides to recount the vote, removing
Dimitry from consideration. This reduces the preference schedule to:

5 5 6 4
1st choice A A C B
2nd choice C C B A
3rd choice B B A C

A vs B: 10 votes to 10 votes A gets ½ point, B gets ½ point


A vs C: 14 votes to 6 votes A gets 1 point
B vs C: 4 votes to 16 votes C gets 1 point

Totaling:
A has 1 ½ points B has ½ point
C has 1 point

Suddenly Anna is the winner! This leads us to another fairness criterion.
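
The effect of removing Dimitry can be checked by recounting the scholarship vote both ways. The Python sketch below is illustrative only; the helper names copeland_winners and remove_choice are assumptions, and rankings are stored as strings of candidate initials.

```python
from itertools import combinations


def copeland_winners(schedule):
    """Return the candidate(s) with the most Copeland points."""
    candidates = sorted(schedule[0][1])
    total = sum(n for n, _ in schedule)
    points = dict.fromkeys(candidates, 0.0)
    for x, y in combinations(candidates, 2):
        x_votes = sum(n for n, r in schedule if r.index(x) < r.index(y))
        if 2 * x_votes > total:
            points[x] += 1
        elif 2 * x_votes < total:
            points[y] += 1
        else:  # tie: half a point each
            points[x] += 0.5
            points[y] += 0.5
    best = max(points.values())
    return [c for c in candidates if points[c] == best]


def remove_choice(schedule, choice):
    """Strike one choice from every ballot, keeping the remaining order."""
    return [(n, ranking.replace(choice, "")) for n, ranking in schedule]


# Preference schedule from Example 10
example_10 = [(5, "DACB"), (5, "ACBD"), (6, "CBDA"), (4, "BDAC")]

before = copeland_winners(example_10)
after = copeland_winners(remove_choice(example_10, "D"))
print(before, after)  # ['C'] ['A']
```

No voter changed their opinion of Anna or Carlos relative to each other, yet the winner changes once the losing candidate D is struck from the ballots.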



The Independence of Irrelevant Alternatives (IIA) Criterion


If a non-winning choice is removed from the ballot, it should not change the winner of
the election.

Equivalently, if choice A is preferred over choice B, introducing or removing a choice
C should not cause B to be preferred over A.

In the election from Example 10, the IIA Criterion was violated.

This anecdote illustrating the IIA issue is attributed to Sidney Morgenbesser:

After finishing dinner, Sidney Morgenbesser decides to order dessert. The waitress
tells him he has two choices: apple pie and blueberry pie. Sidney orders the apple pie.
After a few minutes the waitress returns and says that they also have cherry pie at
which point Morgenbesser says "In that case I'll have the blueberry pie."

Another disadvantage of Copeland’s Method is that it is fairly easy for the election to end in
a tie. For this reason, Copeland’s method is usually the first part of a more advanced method
that uses more sophisticated methods for breaking ties and determining the winner when
there is not a Condorcet Candidate.

So Where’s the Fair Method?


At this point, you’re probably asking why we keep looking at method after method just to
point out that they are not fully fair. We must be holding out on the perfect method, right?

Unfortunately, no. A mathematical economist, Kenneth Arrow, was able to prove in 1949
that there is no voting method that will satisfy all the fairness criteria we have discussed.

Arrow’s Impossibility Theorem


Arrow’s Impossibility Theorem states, roughly, that it is not possible for a voting
method to satisfy all of the fairness criteria that we’ve discussed.

To see a very simple example of how difficult voting can be, consider the election below:

5 5 5
1st choice A C B
2nd choice B A C
3rd choice C B A

Notice that in this election:


10 people prefer A to B
10 people prefer B to C
10 people prefer C to A

No matter whom we choose as the winner, 2/3 of voters would prefer someone else! This
scenario is dubbed Condorcet’s Voting Paradox, and demonstrates how voting preferences
are not transitive (just because A is preferred over B, and B over C, does not mean A is
preferred over C). In this election, there is no fair resolution.

It is because of this impossibility of a totally fair method that Plurality, IRV, Borda Count,
Copeland’s Method, and dozens of variants are all still used. Usually the decision of which
method to use is based on what seems most fair for the situation in which it is being applied.

Approval Voting
Up until now, we’ve been considering voting methods that require ranking of candidates on a
preference ballot. There is another method of voting that can be more appropriate in some
decision making scenarios. With Approval Voting, the ballot asks you to mark all choices
that you find acceptable. The results are tallied, and the option with the most approval is the
winner.

Example 11: Approval Voting


A group of friends is trying to decide upon a movie to watch. Three choices are provided,
and each person is asked to mark with an “X” which movies they are willing to watch. The
results are:

Bob Ann Marv Alice Eve Omar Lupe Dave Tish Jim
Titanic X X X X X
Scream X X X X X X
The Matrix X X X X X X X

Totaling the results, we find


Titanic received 5 approvals
Scream received 6 approvals
The Matrix received 7 approvals.

In this vote, The Matrix would be the winner.

Try it Now 6:
Our mathematicians deciding on a conference location from earlier decide to use Approval
voting. Their votes are tallied below. Find the winner using Approval voting.

30 10 15 20 15 5 5
Seattle X X X X
Tacoma X X X X X
Puyallup X X X X
Olympia X X X

What’s Wrong with Approval Voting?


Approval voting can very easily violate the Majority Criterion.

Example 12: Approval Voting: Least Disliked vs. Most Liked


Consider the voting schedule:

80 15 5
1st choice A B C
2nd choice B C B
3rd choice C A A

Clearly A is the majority winner. Now suppose that this election was held using Approval
Voting, and every voter marked approval of their top two candidates.

A would receive approval from 80 voters


B would receive approval from 100 voters
C would receive approval from 20 voters

B would be the winner. Some argue that Approval Voting tends to elect the least disliked
choice, rather than the most liked candidate.
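
The counts in Example 12 can be verified with a few lines of code. This Python sketch is an illustration under the example’s own assumption that every voter approves exactly their top two choices; the function name approval_totals and its approvals_per_ballot parameter are hypothetical.

```python
def approval_totals(schedule, approvals_per_ballot=2):
    """Count approvals when each voter approves their top-ranked choices."""
    totals = {}
    for count, ranking in schedule:
        for place, choice in enumerate(ranking):
            approved = place < approvals_per_ballot
            totals[choice] = totals.get(choice, 0) + (count if approved else 0)
    return totals


# Voting schedule from Example 12
example_12 = [(80, "ABC"), (15, "BCA"), (5, "CBA")]

totals = approval_totals(example_12)
approval_winner = max(totals, key=totals.get)

# For comparison, find any choice holding a majority of first-place votes.
voters = sum(count for count, _ in example_12)
first_place = {}
for count, ranking in example_12:
    first_place[ranking[0]] = first_place.get(ranking[0], 0) + count
majority_choice = [c for c, v in first_place.items() if 2 * v > voters]

print(approval_winner, majority_choice)  # B ['A']
```

A holds 80 of the 100 first-place votes, but B collects 100 approvals to A’s 80, so the two notions of “winner” disagree.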

Additionally, Approval Voting is susceptible to strategic insincere voting, in which a voter
does not vote their true preference to try to increase the chances of their choice winning. For
example, in the movie example above, suppose Bob and Alice would much rather watch
Scream. They remove The Matrix from their approval list, producing a different outcome.

Bob Ann Marv Alice Eve Omar Lupe Dave Tish Jim
Titanic X X X X X
Scream X X X X X X
The Matrix X X X X X

Totaling the results, we find Titanic received 5 approvals, Scream received 6 approvals, and
The Matrix received 5 approvals. By voting insincerely, Bob and Alice were able to sway
the result in favor of their preference.

Voting in America
In American politics, there is a lot more to selecting our representatives than simply casting
and counting ballots. The process of selecting the president is even more complicated, so
we’ll save that for the next chapter. Instead, let’s look at the process by which state
congressional representatives and local politicians get elected.

For most offices, a sequence of two public votes is held: a primary election and the general
election. For non-partisan offices like sheriff and judge, in which political party affiliation is
not declared, the primary election is usually used to narrow the field of candidates.

Typically, the two candidates receiving the most votes in the primary will then move forward
to the general election. While somewhat similar to instant runoff voting, this is actually an
example of sequential voting - a process in which voters cast totally new ballots after each
round of eliminations. Sequential voting has become quite common in television, where it is
used in reality competition shows like American Idol.

Congressional, county, and city representatives are partisan offices, in which candidates
usually declare themselves a member of a political party, like the Democrats, Republicans,
the Green Party, or one of the many other smaller parties. As with non-partisan offices, a
primary election is usually held to narrow down the field prior to the general election. Prior
to the primary election, the candidate would have met with the political party leaders and
gotten their approval to run under that party’s affiliation.

In some states a closed primary is used, in which only voters who are members of the
Democratic Party can vote on the Democratic candidates, and similarly for Republican voters. In
other states, an open primary is used, in which any voter can pick the party whose primary
they want to vote in. In other states, caucuses are used, which are basically meetings of the
political parties, only open to party members. Closed primaries are often disliked by
independent voters, who like the flexibility to change which party they are voting in. Open
primaries do have the disadvantage that they allow raiding, in which a voter will vote in their
non-preferred party’s primary with the intent of selecting a weaker opponent for their
preferred party’s candidate.

Apportionment
Apportionment is the problem of dividing up a fixed number of things among groups of
different sizes. In politics, this takes the form of allocating a limited number of
representatives amongst voters. Presumably this problem is older than the United States, but
the best-known ways to solve it have their origins in the problem of assigning each state an
appropriate number of representatives in the new Congress when the country was formed.
States also face this apportionment problem in defining how to draw districts for state
representatives. The apportionment problem comes up in a variety of non-political areas too,
though. We face several restrictions in this process:

Apportionment rules
1. The things being divided up can exist only in whole numbers.
2. We must use all of the things being divided up, and we cannot use any more.
3. Each group must get at least one of the things being divided up.
4. The number of things assigned to each group should be at least approximately
proportional to the population of the group.

Exact proportionality isn’t possible because of the whole number requirement, but we
should try to be close, and in any case, if Group A is larger than Group B, then Group
B shouldn’t get more of the things than Group A does.

In terms of the apportionment of the United States House of Representatives, these rules
imply:
1. We can only have whole representatives (a state can’t have 3.4 representatives)
2. We can only use the (currently) 435 representatives available. If one state gets
another representative, another state has to lose one.
3. Every state gets at least one representative
4. The number of representatives each state gets should be approximately proportional
to the state population. This way, the number of constituents each representative has
should be approximately equal.

We will look at different ways of solving the apportionment problem. Three of them have been
used at various times to apportion the U.S. Congress, although the method currently in use
(the Huntington-Hill method) is significantly more complicated.

Hamilton’s Method
Alexander Hamilton proposed the method that now bears his name. His method was
approved by Congress in 1791, but was vetoed by President Washington. It was later
adopted in 1852 and used through 1911. He begins by determining, to several decimal
places, how many things each group should get. Since he was interested in the question of
Congressional representation, we’ll use the language of states and representatives, so he
determines how many representatives each state should get. He follows these steps:

Hamilton’s Method
1. Determine how many people each representative should represent. Do this by
dividing the total population of all the states by the total number of representatives.
This answer is called the divisor: $\text{divisor} = \dfrac{\text{total population}}{\text{number of representatives}}$

2. Divide each state’s population by the divisor to determine how many
representatives it should have. Record this answer to several decimal places. This
answer is called the quota: $\text{quota} = \dfrac{\text{state population}}{\text{divisor}}$

Since we can only allocate whole representatives, Hamilton resolves the whole number
problem, as follows:

3. Cut off all the decimal parts of all the quotas by rounding down to the nearest
whole number (but don’t forget what the decimals were). These are called the
lower quotas. Add up the remaining whole numbers. This answer will always be
less than or equal to the total number of representatives (and the “or equal to” part
happens only in very specific circumstances that are incredibly unlikely to turn up).

4. Assuming that the total from Step 3 was less than the total number of
representatives, assign the remaining representatives, one each, to the states whose
decimal parts of the quota were largest, until the desired total is reached.

Make sure that each state ends up with at least one representative!

Note on rounding: Today we have technological advantages that Hamilton (and the others)
couldn’t even have imagined. Take advantage of them, and keep several decimal places.

Example 13: Hamilton’s Method with Delaware


The state of Delaware has three counties: Kent, New Castle, and Sussex. The Delaware
state House of Representatives has 41 members. If Delaware wants to divide this
representation along county lines (which is not required, but let’s pretend they do), let’s use
Hamilton’s method to apportion them. The populations of the counties are as follows (from
the 2010 Census):

County Population
Kent 162,310
New Castle 538,479
Sussex 197,145
Total 897,934

1. First, we determine the divisor: 897,934/41 = 21,900.82927

2. Now we determine each county’s quota by dividing the county’s population by the
divisor:

County Population Quota


Kent 162,310 7.4111
New Castle 538,479 24.5872
Sussex 197,145 9.0017
Total 897,934

3. Removing the decimal parts of the quotas gives:

County Population Quota Initial


Kent 162,310 7.4111 7
New Castle 538,479 24.5872 24
Sussex 197,145 9.0017 9
Total 897,934 40

4. We need 41 representatives and this only gives 40. The remaining one goes to the county
with the largest decimal part, which is New Castle:

County Population Quota Initial Final


Kent 162,310 7.4111 7 7
New Castle 538,479 24.5872 24 25
Sussex 197,145 9.0017 9 9
Total 897,934 40 41

Example 14: Hamilton’s Method with Rhode Island


Use Hamilton’s method to apportion the 75 seats of Rhode Island’s House of Representatives
among its five counties.

County Population
Bristol 49,875
Kent 166,158
Newport 82,888
Providence 626,667
Washington 126,979
Total 1,052,567

1. The divisor is 1,052,567/75 = 14,034.22667

2. Determine each county’s quota by dividing its population by the divisor:

County Population Quota


Bristol 49,875 3.5538
Kent 166,158 11.8395
Newport 82,888 5.9061
Providence 626,667 44.6528
Washington 126,979 9.0478
Total 1,052,567

3. Remove the decimal part of each quota:

County Population Quota Initial


Bristol 49,875 3.5538 3
Kent 166,158 11.8395 11
Newport 82,888 5.9061 5
Providence 626,667 44.6528 44
Washington 126,979 9.0478 9
Total 1,052,567 72

4. We need 75 representatives and we only have 72, so we assign the remaining three, one
each, to the three counties with the largest decimal parts, which are Newport, Kent, and
Providence:

County Population Quota Initial Final


Bristol 49,875 3.5538 3 3
Kent 166,158 11.8395 11 12
Newport 82,888 5.9061 5 6
Providence 626,667 44.6528 44 45
Washington 126,979 9.0478 9 9
Total 1,052,567 72 75

Note that even though Bristol County’s decimal part is greater than .5, it isn’t big enough to
get an additional representative, because three other counties have greater decimal parts.
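
The four steps of Hamilton’s method translate directly into a short program. The following Python sketch is one possible way to write it, not a definitive implementation; the function name hamilton is an assumption. It works with whole-number arithmetic so that the decimal parts are compared exactly. It does not break ties between equal decimal parts in any special way, and it does not enforce the requirement that every group receive at least one representative.

```python
def hamilton(populations, seats):
    """Apportion seats among groups using Hamilton's method.

    populations maps each group's name to its population.
    """
    total = sum(populations.values())
    # quota = population * seats / total. divmod returns the whole part of
    # the quota and a remainder that orders the decimal parts exactly.
    parts = {name: divmod(pop * seats, total)
             for name, pop in populations.items()}
    allocation = {name: whole for name, (whole, _) in parts.items()}
    leftover = seats - sum(allocation.values())
    # Hand the leftover seats, one each, to the largest decimal parts.
    by_decimal = sorted(parts, key=lambda name: parts[name][1], reverse=True)
    for name in by_decimal[:leftover]:
        allocation[name] += 1
    return allocation


delaware = {"Kent": 162310, "New Castle": 538479, "Sussex": 197145}
rhode_island = {"Bristol": 49875, "Kent": 166158, "Newport": 82888,
                "Providence": 626667, "Washington": 126979}

print(hamilton(delaware, 41))
print(hamilton(rhode_island, 75))
```

Both results agree with the Final columns of Examples 13 and 14: 7, 25, 9 for Delaware and 3, 12, 6, 45, 9 for Rhode Island.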

Hamilton’s method obeys something called the Quota Rule. The Quota Rule isn’t a law of
any sort, but just an idea that some people, including Hamilton, think is a good one.

Quota Rule
The Quota Rule says that the final number of representatives a state gets should be
within one of that state’s quota. Since we’re dealing with whole numbers for our final
answers, that means that each state should either go up to the next whole number above
its quota, or down to the next whole number below its quota.
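
The rule can be written compactly in symbols. The letters below are introduced here only for illustration: q stands for a state’s quota and a for the number of representatives the state finally receives.

```latex
\lfloor q \rfloor \;\le\; a \;\le\; \lceil q \rceil
```

For example, New Castle County’s quota of 24.5872 in Example 13 means the Quota Rule allows it only 24 or 25 representatives.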

Controversy
After seeing Hamilton’s method, many people find that it makes sense, it’s not that difficult
to use (or, at least, the difficulty comes from the numbers that are involved and the amount of
computation that’s needed, not from the method), and they wonder why anyone would want
another method. The problem is that Hamilton’s method is subject to several paradoxes.
Three of them happened, on separate occasions, when Hamilton’s method was used to
apportion the United States House of Representatives.

The Alabama Paradox is named for an incident that happened during the apportionment that
took place after the 1880 census. (A similar incident happened ten years earlier involving the
state of Rhode Island, but the paradox is named after Alabama.) The post-1880
apportionment had been completed, using Hamilton’s method and the new population
numbers from the census. Because of the country’s population growth, it was decided that
the House of Representatives should be made larger. The apportionment would need to be
done again, still using Hamilton’s method and the same 1880 census numbers, but with more
representatives. The assumption was that some states would gain another representative and
others would stay with the same number they already had (since there weren’t enough new
representatives being added to give one more to every state). The paradox is that Alabama
ended up losing a representative in the process, even though no populations were changed
and the total number of representatives increased.

The New States Paradox happened when Oklahoma became a state in 1907. Oklahoma had
enough population to qualify for five representatives in Congress. Those five representatives
would need to come from somewhere, though, so five states, presumably, would lose one
representative each. That happened, but another thing also happened: Maine gained a
representative (from New York).

The Population Paradox happened between the apportionments after the census of 1900 and
of 1910. In those ten years, Virginia’s population grew at an average annual rate of 1.07%,
while Maine’s grew at an average annual rate of 0.67%. Virginia started with more people,
grew at a faster rate, grew by more people, and ended up with more people than Maine. By
itself, that doesn’t mean that Virginia should gain representatives or Maine shouldn’t,
because there are lots of other states involved. But Virginia ended up losing a representative
to Maine.

Jefferson’s Method
Thomas Jefferson proposed a different method for apportionment. After Washington vetoed
Hamilton’s method, Jefferson’s method was adopted, and used in Congress from 1791
through 1842. Jefferson had political reasons for wanting his method to be used rather than
Hamilton’s. His method favors larger states, and his own home state of Virginia was the
largest in the country at the time. He would also argue that it’s the ratio of people to
representatives that is the critical thing, and apportionment methods should be based on that.
But the paradoxes we saw also provide mathematical reasons for concluding that Hamilton’s
method isn’t so good, and we should look for other possibilities.

The first steps of Jefferson’s method are the same as Hamilton’s method. He finds the same
divisor and the same quota, and cuts off the decimal parts in the same way, giving a total
number of representatives that is less than the required total. The difference is in how
Jefferson resolves that shortfall. He says that since we ended up with an answer that is too
small, our divisor must have been too big. He changes the divisor by making it smaller,
finding new quotas with the new divisor, cutting off the decimal parts, and looking at the new
total, until we find a divisor that produces the required total.

Jefferson’s Method
1. Determine how many people each representative should represent. Do this by
dividing the total population of all the states by the total number of representatives.
This answer is called the standard divisor: $\text{divisor} = \dfrac{\text{total population}}{\text{number of representatives}}$

2. Divide each state’s population by the divisor to determine how many
representatives it should have. Record this answer to several decimal places. This
answer is called the quota: $\text{quota} = \dfrac{\text{state population}}{\text{divisor}}$

3. Cut off all the decimal parts of all the quotas by rounding down to the nearest
whole number (but don’t forget what the decimals were). These are the lower
quotas. Add up the remaining whole numbers. This answer will always be less
than or equal to the total number of representatives.

4. If the total from Step 3 was less than the total number of representatives, reduce
the divisor and recalculate the quota and allocation. Continue doing this until the
total in Step 3 is equal to the total number of representatives. The divisor we end
up using is called the modified divisor or adjusted divisor.

Example 15: Jefferson’s Method with Delaware


We’ll return to Delaware and apply Jefferson’s method. We begin, as we did with
Hamilton’s method, by finding the quotas with the original divisor, 21,900.82927:

County Population Quota Initial (Quota rounded down)


Kent 162,310 7.4111 7
New Castle 538,479 24.5872 24
Sussex 197,145 9.0017 9
Total 897,934 40

We need 41 representatives, and this divisor gives only 40. We must reduce the divisor until
we get 41 representatives. Let’s try 21,500 as the divisor:

County Population Quota Initial (Quota rounded down)


Kent 162,310 7.5493 7
New Castle 538,479 25.0455 25
Sussex 197,145 9.1695 9
Total 897,934 41

This gives us the required 41 representatives, so we’re done. If we had fewer than 41, we’d
need to reduce the divisor more. If we had more than 41, we’d need to choose a divisor less
than the original but greater than the second choice. Notice that with the new, lower divisor,
the quota for New Castle County (the largest county in the state) increased by much more
than those of Kent County or Sussex County.

Example 16: Jefferson’s Method with Rhode Island


We’ll apply Jefferson’s method for Rhode Island. The original divisor of 14,034.22667 gave
these results:

County Population Quota Initial (Quota rounded down)


Bristol 49,875 3.5538 3
Kent 166,158 11.8395 11
Newport 82,888 5.9061 5
Providence 626,667 44.6528 44
Washington 126,979 9.0478 9
Total 1,052,567 72

We need 75 representatives and we only have 72, so we need to use a smaller divisor. Let’s
try 13,500:

County Population Quota Initial (Quota rounded down)


Bristol 49,875 3.6944 3
Kent 166,158 12.3080 12
Newport 82,888 6.1399 6
Providence 626,667 46.4198 46
Washington 126,979 9.4059 9
Total 1,052,567 76

We’ve gone too far. We need a divisor that’s greater than 13,500 but less than 14,034.22667.
Let’s try 13,700:

County Population Quota Initial (Quota rounded down)


Bristol 49,875 3.6405 3
Kent 166,158 12.1283 12
Newport 82,888 6.0502 6
Providence 626,667 45.7421 45
Washington 126,979 9.2685 9
Total 1,052,567 75 This works!

In comparison to Hamilton’s method, although the results were the same, they came about in
a different way, and the outcome was almost different. Providence County (the largest)
almost went up to 46 representatives before Kent (which is much smaller) got to 12.
Although that didn’t happen here, it can. Divisor-adjusting methods like Jefferson’s are not
guaranteed to follow the quota rule!
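
The trial-and-error search for a modified divisor is easy to automate. The Python sketch below first checks the divisors tried by hand in Example 16, then searches for a workable divisor by repeatedly halving the range between a divisor that is too small and one that is too large. The halving strategy and the names seats_at and jefferson are assumptions made for this illustration; Jefferson’s method itself only requires that some divisor producing the right total be found.

```python
import math


def seats_at(populations, divisor):
    """Jefferson's rounding: divide by the divisor and drop the decimal part."""
    return {name: math.floor(pop / divisor)
            for name, pop in populations.items()}


def jefferson(populations, seats, max_steps=200):
    """Return (divisor, allocation) for a divisor giving exactly `seats`."""
    high = sum(populations.values()) / seats  # standard divisor
    allocation = seats_at(populations, high)
    if sum(allocation.values()) == seats:
        return high, allocation
    low = 1.0  # a divisor this small hands out far too many seats
    for _ in range(max_steps):
        trial = (low + high) / 2
        allocation = seats_at(populations, trial)
        assigned = sum(allocation.values())
        if assigned == seats:
            return trial, allocation
        if assigned < seats:
            high = trial  # too few seats: the divisor is still too big
        else:
            low = trial   # too many seats: the divisor is too small
    raise ValueError("no divisor found; the populations may produce a tie")


delaware = {"Kent": 162310, "New Castle": 538479, "Sussex": 197145}
rhode_island = {"Bristol": 49875, "Kent": 166158, "Newport": 82888,
                "Providence": 626667, "Washington": 126979}

print(sum(seats_at(rhode_island, 13500).values()))  # 76, too many
print(sum(seats_at(rhode_island, 13700).values()))  # 75, this works
divisor, allocation = jefferson(rhode_island, 75)
print(allocation)
```

The search returns the same 3, 12, 6, 45, 9 allocation found in Example 16. The divisor it lands on need not be 13,700, since every divisor in the workable range produces the same allocation.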

Adams’ Method
In 1832, John Quincy Adams proposed an alternative similar to Jefferson’s method, except that it
rounds quotas up rather than down. This means we usually need a modified divisor that is
larger than the standard divisor. It favors smaller states and was never adopted by Congress.

Adams’ Method
1. Determine how many people each representative should represent. Do this by
dividing the total population of all the states by the total number of representatives.
This answer is called the standard divisor: $\text{divisor} = \dfrac{\text{total population}}{\text{number of representatives}}$

2. Divide each state’s population by the divisor to determine how many
representatives it should have. Record this answer to several decimal places. This
answer is called the quota: $\text{quota} = \dfrac{\text{state population}}{\text{divisor}}$

3. Cut off all the decimal parts of all the quotas by rounding up to the nearest whole
number (but don’t forget what the decimals were). These are the upper quotas.
Add up these whole numbers. This answer will always be more than or equal to the
total number of representatives.

4. If the total from Step 3 was more than the total number of representatives, increase
the divisor and recalculate the quota and allocation. Continue doing this until the
total in Step 3 is equal to the total number of representatives. The divisor we end
up using is called the modified divisor or adjusted divisor.

Example 17: Adams’ Method with Delaware


We’ll return to Delaware and now apply Adams’ method. We begin, as we did with other
methods, by finding the quotas with the original divisor, 21,900.82927:

County Population Quota Initial (Quota rounded up)


Kent 162,310 7.4111 8
New Castle 538,479 24.5872 25
Sussex 197,145 9.0017 10
Total 897,934 43

We need 41 representatives, and this divisor gives 43. We must increase the divisor until we
get 41 representatives; notice that the process for the Adams method is the opposite of the
Jefferson method. Let’s try 23,000 as the divisor:

County Population Quota Initial (Quota rounded up)


Kent 162,310 7.057 8
New Castle 538,479 23.4121 24
Sussex 197,145 8.5715 9
Total 897,934 41 This works!
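
The trial-and-error search in this example can be automated. The following is a minimal Python sketch, not a definitive implementation: the function name `adams` and the `step` parameter (how much the divisor grows on each try) are assumptions of this sketch, and a step that is too coarse could jump past the target total.

```python
import math

def adams(populations, seats, step=1.0):
    """Adams' method: round quotas up, raising the divisor until the total fits.

    `step` is an assumption of this sketch, not part of the method itself.
    """
    divisor = sum(populations.values()) / seats  # standard divisor
    while True:
        allocation = {name: math.ceil(pop / divisor)
                      for name, pop in populations.items()}
        total = sum(allocation.values())
        if total == seats:
            return divisor, allocation
        if total < seats:
            raise ValueError("overshot the target; try a smaller step")
        divisor += step  # too many seats, so each seat must cover more people

delaware = {"Kent": 162_310, "New Castle": 538_479, "Sussex": 197_145}
divisor, allocation = adams(delaware, 41)
print(allocation)  # {'Kent': 8, 'New Castle': 24, 'Sussex': 9}
```

The divisor this search stops at is smaller than the 23,000 tried above, but it produces the same allocation. Any divisor in the range that yields a total of 41 works equally well.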

Webster’s Method
Daniel Webster (1782-1852) proposed a method similar to Jefferson’s in 1832. It was
adopted by Congress in 1842, but replaced by Hamilton’s method in 1852. It was then
adopted again in 1901. The difference is that Webster rounds the quotas to the nearest whole
number rather than rounding down (dropping the decimal) like Jefferson’s, or rounding up
like Adams’. If that doesn’t produce the desired results at the beginning, he says, like
Jefferson, to adjust the divisor until it does. (In Jefferson’s case, at least the first adjustment
will always be to make the divisor smaller. That is not always the case with Webster’s
method.)

Webster’s Method
1. Determine how many people each representative should represent. Do this by
dividing the total population of all the states by the total number of representatives.
This answer is called the standard divisor: $\text{divisor} = \dfrac{\text{total population}}{\text{number of representatives}}$

2. Divide each state’s population by the divisor to determine how many
representatives it should have. Record this answer to several decimal places. This
answer is called the quota: $\text{quota} = \dfrac{\text{state population}}{\text{divisor}}$

3. Round all the quotas to the nearest whole number (but don’t forget what the
decimals were). Add up the resulting whole numbers.

4. If the total from Step 3 was less than the total number of representatives, reduce
the divisor and recalculate the quota and allocation. If the total from step 3 was
more than the total number of representatives, increase the divisor and recalculate
the quota and allocation. Continue doing this until the total in Step 3 is equal to the
total number of representatives. The divisor we end up using is called the modified
divisor or adjusted divisor.

Example 18: Webster’s Method with Delaware


Again, Delaware, with an initial divisor of 21,900.82927:

County Population Quota Initial


Kent 162,310 7.4111 7
New Castle 538,479 24.5872 25
Sussex 197,145 9.0017 9
Total 897,934 41

This gives the required total, so we’re done.



Example 19: Webster’s Method with Rhode Island


Again, Rhode Island, with an initial divisor of 14,034.22667:

County Population Quota Initial


Bristol 49,875 3.5538 4
Kent 166,158 11.8395 12
Newport 82,888 5.9061 6
Providence 626,667 44.6528 45
Washington 126,979 9.0478 9
Total 1,052,567 76

This is too many, so we need to increase the divisor. Let’s try 14,100:

County Population Quota Initial


Bristol 49,875 3.5372 4
Kent 166,158 11.7843 12
Newport 82,888 5.8786 6
Providence 626,667 44.4445 44
Washington 126,979 9.0056 9
Total 1,052,567 75

This works, so we’re done.
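
The steps of Webster’s method can be sketched in Python as follows. This is an illustration under stated assumptions: the names `webster` and `round_half_up`, the `step` size, and the `max_tries` guard are our own choices. Python’s built-in `round` sends exact halves to the nearest even number, so the sketch uses its own helper to round halves up.

```python
import math

def round_half_up(x):
    """Round to the nearest whole number, sending exact halves upward."""
    return math.floor(x + 0.5)

def webster(populations, seats, step=1.0, max_tries=100_000):
    """Webster's method: round quotas to nearest, nudging the divisor as needed."""
    divisor = sum(populations.values()) / seats  # standard divisor
    for _ in range(max_tries):
        allocation = {name: round_half_up(pop / divisor)
                      for name, pop in populations.items()}
        total = sum(allocation.values())
        if total == seats:
            return divisor, allocation
        # Too many seats: raise the divisor. Too few: lower it.
        divisor += step if total > seats else -step
    raise ValueError("no divisor found; try a smaller step")

rhode_island = {"Bristol": 49_875, "Kent": 166_158, "Newport": 82_888,
                "Providence": 626_667, "Washington": 126_979}
divisor, allocation = webster(rhode_island, 75)
print(allocation)
# {'Bristol': 4, 'Kent': 12, 'Newport': 6, 'Providence': 44, 'Washington': 9}
```

For Delaware with 41 seats, the same function returns on its first try, because the standard divisor already gives the required total.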

Like Jefferson’s method, Webster’s method carries a bias in favor of states with large
populations, but rounding the quotas to the nearest whole number greatly reduces this bias.
Notice that Providence County, the largest, is the one that gets a representative trimmed
because of the increased divisor. Also like Jefferson’s method, Webster’s method does not
always follow the quota rule, but it follows the quota rule much more often than Jefferson’s
method does. In fact, if Webster’s method had been applied to every apportionment of
Congress in all of American history, it would have followed the quota rule every single time.

In 1980, two mathematicians, Peyton Young and Mike Balinski, proved what we now call
the Balinski-Young Impossibility Theorem.

Balinski-Young Impossibility Theorem


The Balinski-Young Impossibility Theorem shows that any apportionment method
which always follows the quota rule will be subject to the possibility of paradoxes like
the Alabama, New States, or Population paradoxes. In other words, we can choose a
method that avoids those paradoxes, but only if we are willing to give up the guarantee
of following the quota rule.

Huntington-Hill Method
In 1920, no new apportionment was done, because Congress couldn’t agree on the method to
be used. Congress appointed a committee of mathematicians to investigate, and the committee
recommended the Huntington-Hill method. Congress continued to use Webster’s method in
1931, but after a second report recommending Huntington-Hill, it was adopted in 1941 and is
the current method of apportionment used in Congress. In 1992, Montana challenged the
constitutionality of the Huntington-Hill method because Montana had lost a seat to
Washington as a result of the 1990 census. Montana lost its suit when the Supreme Court
upheld the Huntington-Hill method.

The Huntington-Hill Method is similar to Webster’s method, but attempts to minimize the
percent differences of how many people each representative will represent.

Huntington-Hill Method
1. Determine how many people each representative should represent. Do this by
dividing the total population of all the states by the total number of representatives.
This answer is called the standard divisor: $\text{divisor} = \dfrac{\text{total population}}{\text{number of representatives}}$

2. Divide each state’s population by the divisor to determine how many
representatives it should have. Record this answer to several decimal places. This
answer is called the quota: $\text{quota} = \dfrac{\text{state population}}{\text{divisor}}$

3. Round down (cut off the decimal part of the quota) to obtain the lower quota,
which we’ll call n. Compute $\sqrt{n(n+1)}$, which is the geometric mean of the lower
quota and one value higher.

4. If the initial quota is larger than the geometric mean, round up the quota; if the
quota is smaller than the geometric mean, round down the quota. Add up the
resulting whole numbers to get the initial allocation.

5. If the total from Step 4 was less than the total number of representatives, reduce
the divisor and recalculate the quota and allocation. If the total from step 4 was
more than the total number of representatives, increase the divisor and recalculate
the quota and allocation. Continue doing this until the total in Step 4 is equal to the
total number of representatives. The divisor we end up using is called the modified
divisor or adjusted divisor.
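
It may help to see how the geometric-mean cutoff in Steps 3 and 4 compares with the halfway point used by Webster’s method. The short derivation below is our own addition, using the symbol n for the lower quota as in Step 3. It shows that the geometric mean always sits slightly below n + 1/2, so a quota rounds up a little sooner under Huntington-Hill.

```latex
\sqrt{n(n+1)} \;=\; \sqrt{n^2 + n} \;<\; \sqrt{n^2 + n + \tfrac{1}{4}} \;=\; n + \tfrac{1}{2}
% Examples:
%   n = 2:   \sqrt{2 \cdot 3}   = \sqrt{6}    \approx 2.449  < 2.5
%   n = 44:  \sqrt{44 \cdot 45} = \sqrt{1980} \approx 44.497 < 44.5
```

The gap between the two cutoffs shrinks as n grows (about 0.051 at n = 2 but only about 0.003 at n = 44), so the difference from Webster’s method matters most for states with small quotas.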

Example 20: Huntington-Hill Method with Delaware


Again, Delaware, with an initial divisor of 21,900.82927:

County Population Quota Lower Quota Geom Mean Initial


Kent 162,310 7.4111 7 7.48 7
New Castle 538,479 24.5872 24 24.49 25
Sussex 197,145 9.0017 9 9.49 9
Total 897,934 41

This gives the required total, so we’re done.



Example 21: Huntington-Hill Method with Rhode Island


Again, Rhode Island, with an initial divisor of 14,034.22667:

County Population Quota Lower Quota Geom Mean Initial


Bristol 49,875 3.5538 3 3.46 4
Kent 166,158 11.8395 11 11.49 12
Newport 82,888 5.9061 5 5.48 6
Providence 626,667 44.6528 44 44.50 45
Washington 126,979 9.0478 9 9.49 9
Total 1,052,567 76

This is too many, so we need to increase the divisor. Let’s try 14,100:

County Population Quota Lower Quota Geom Mean Initial


Bristol 49,875 3.5372 3 3.46 4
Kent 166,158 11.7843 11 11.49 12
Newport 82,888 5.8786 5 5.48 6
Providence 626,667 44.4445 44 44.50 44
Washington 126,979 9.0056 9 9.49 9
Total 1,052,567 75

This works, so we’re done.

In both of these cases, the apportionment produced by the Huntington-Hill method was the
same as the one produced by Webster’s method.

Example 22: Huntington-Hill vs. Webster’s


Consider a small country with 5 states, two of which are much larger than the others. We
need to apportion 70 representatives. We will apportion using both Webster’s method and
the Huntington-Hill method.

State Population
A 300,500
B 200,000
C 50,000
D 38,000
E 21,500

1. The total population is 610,000. Dividing this by the 70 representatives gives the divisor:
8714.286

2. Dividing each state’s population by the divisor gives the quotas



State Population Quota


A 300,500 34.48361
B 200,000 22.95082
C 50,000 5.737705
D 38,000 4.360656
E 21,500 2.467213

Webster’s Method
3. Using Webster’s method, we round each quota to the nearest whole number
State Population Quota Initial
A 300,500 34.48361 34
B 200,000 22.95082 23
C 50,000 5.737705 6
D 38,000 4.360656 4
E 21,500 2.467213 2

4. These add up to only 69 representatives, so we adjust the divisor down.
Lowering the divisor to 8700 gives an updated allocation totaling 70 representatives:
State Population Quota Initial
A 300,500 34.54023 35
B 200,000 22.98851 23
C 50,000 5.747126 6
D 38,000 4.367816 4
E 21,500 2.471264 2

Huntington-Hill Method
3. Using the Huntington-Hill method, we round down to find the lower quota, then calculate
the geometric mean based on each lower quota. If the quota is less than the geometric
mean, we round down; if the quota is more than the geometric mean, we round up.
State   Population   Quota      Lower Quota   Geometric Mean   Initial
A       300,500      34.48361   34            34.49638         34
B       200,000      22.95082   22            22.49444         23
C       50,000       5.737705   5             5.477226         6
D       38,000       4.360656   4             4.472136         4
E       21,500       2.467213   2             2.44949          3

These allocations add up to 70, so we’re done.

Notice that this allocation is different from the one produced by Webster’s method. In this
case, state E got the extra seat instead of state A.
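
The two rounding rules in this example can be compared side by side. The following is a minimal Python sketch that applies each rule at a given divisor, using the divisors from the example rather than searching for one. The function names are our own, and rounding down when a quota exactly equals the geometric mean is an assumption, since the steps above do not cover that case.

```python
import math

def webster_round(quota):
    """Round to the nearest whole number (halves go up)."""
    return math.floor(quota + 0.5)

def huntington_hill_round(quota):
    """Round up only when the quota exceeds the geometric mean of n and n + 1."""
    lower = math.floor(quota)
    cutoff = math.sqrt(lower * (lower + 1))
    return lower + 1 if quota > cutoff else lower  # tie rounds down (assumption)

def allocate(populations, divisor, rounding_rule):
    """Apply one rounding rule to every state's quota at the given divisor."""
    return {name: rounding_rule(pop / divisor)
            for name, pop in populations.items()}

states = {"A": 300_500, "B": 200_000, "C": 50_000, "D": 38_000, "E": 21_500}
seats = 70
standard = sum(states.values()) / seats  # about 8714.286

webster_first = allocate(states, standard, webster_round)   # totals 69
webster_final = allocate(states, 8700, webster_round)       # totals 70
hh = allocate(states, standard, huntington_hill_round)      # totals 70

for label, result in [("Webster, standard divisor", webster_first),
                      ("Webster, divisor 8700", webster_final),
                      ("Huntington-Hill, standard divisor", hh)]:
    print(label, result, sum(result.values()))
```

Only the rounding rule differs between the two methods here, which is why it is passed to `allocate` as a parameter.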

As you can see, there is no “right answer” when it comes to choosing a method for
apportionment. Each method has its virtues, and each favors states of a different size.

Lowndes’ Method (optional, but interesting)


William Lowndes (1782-1822) was a Congressman from South Carolina (a small state) who
proposed a method of apportionment that was more favorable to smaller states. Unlike the
methods of Hamilton, Jefferson, and Webster, Lowndes’s method has never been used to
apportion Congress.

Lowndes believed that an additional representative was much more valuable to a small state
than to a large one. If a state already has 20 or 30 representatives, getting one more doesn’t
matter very much. But if it only has 2 or 3, one more is a big deal, and he felt that the
additional representatives should go where they could make the most difference.

Like Hamilton’s method, Lowndes’s method follows the quota rule. In fact, it arrives at the
same quotas as Hamilton and the rest, and like Hamilton and Jefferson, it drops the decimal
parts. But in deciding where the remaining representatives should go, we divide the decimal
part of each state’s quota by the whole number part (so that the same decimal part with a
smaller whole number is worth more, because it matters more to that state).

Lowndes’s Method
1. Determine how many people each representative should represent. Do this by
dividing the total population of all the states by the total number of representatives.
This answer is called the divisor.

2. Divide each state’s population by the divisor to determine how many
representatives it should have. Record this answer to several decimal places. This
answer is called the quota.

3. Cut off all the decimal parts of all the quotas (but don’t forget what the decimals
were). Add up the remaining whole numbers.

4. Assuming that the total from Step 3 was less than the total number of
representatives, divide the decimal part of each state’s quota by the whole number
part. Assign the remaining representatives, one each, to the states whose ratios of
decimal part to whole part are largest, until the desired total is reached.

Example 23: Delaware with Lowndes’ Method


We’ll do Delaware again. We begin in the same way as with Hamilton’s method:

County Population Quota Initial


Kent 162,310 7.4111 7
New Castle 538,479 24.5872 24
Sussex 197,145 9.0017 9
Total 897,934 40

We need one more representative. To find out which county should get it, Lowndes says to
divide each county’s decimal part by its whole number part, with the largest result getting the
extra representative:

Kent: 0.4111/7 ≈ 0.0587


New Castle: 0.5872/24 ≈ 0.0245
Sussex: 0.0017/9 ≈ 0.0002

The largest of these is Kent’s, so Kent gets the 41st representative:

County Population Quota Initial Ratio Final


Kent 162,310 7.4111 7 0.0587 8
New Castle 538,479 24.5872 24 0.0245 24
Sussex 197,145 9.0017 9 0.0002 9
Total 897,934 40 41

Example 24: Rhode Island with Lowndes’ Method


Rhode Island, again beginning in the same way as Hamilton:

County Population Quota Initial


Bristol 49,875 3.5538 3
Kent 166,158 11.8395 11
Newport 82,888 5.9061 5
Providence 626,667 44.6528 44
Washington 126,979 9.0478 9
Total 1,052,567 72

We divide the decimal part of each county’s quota by its whole number part to determine
which three counties should get the remaining representatives:

Bristol: 0.5538/3 ≈ 0.1846


Kent: 0.8395/11 ≈ 0.0763
Newport: 0.9061/5 ≈ 0.1812
Providence: 0.6528/44 ≈ 0.0148
Washington: 0.0478/9 ≈ 0.0053

The three largest of these are Bristol, Newport, and Kent, so they get the remaining three
representatives:

County Population Quota Initial Ratio Final


Bristol 49,875 3.5538 3 0.1846 4
Kent 166,158 11.8395 11 0.0763 12
Newport 82,888 5.9061 5 0.1812 6
Providence 626,667 44.6528 44 0.0148 44
Washington 126,979 9.0478 9 0.0053 9
Total 1,052,567 72 75
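
The steps of Lowndes’ method can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `lowndes` is our own, and the sketch assumes every quota is at least 1, since the ratio in Step 4 divides by the whole number part.

```python
import math

def lowndes(populations, seats):
    """Lowndes' method: drop decimals, then rank by decimal part / whole part."""
    divisor = sum(populations.values()) / seats  # standard divisor
    quotas = {name: pop / divisor for name, pop in populations.items()}
    allocation = {name: math.floor(q) for name, q in quotas.items()}
    remaining = seats - sum(allocation.values())
    # Ratio of decimal part to whole part; assumes each whole part is at least 1.
    ratios = {name: (q - math.floor(q)) / math.floor(q)
              for name, q in quotas.items()}
    # Hand out the leftover seats, one each, to the largest ratios.
    for name in sorted(ratios, key=ratios.get, reverse=True)[:remaining]:
        allocation[name] += 1
    return allocation

rhode_island = {"Bristol": 49_875, "Kent": 166_158, "Newport": 82_888,
                "Providence": 626_667, "Washington": 126_979}
delaware = {"Kent": 162_310, "New Castle": 538_479, "Sussex": 197_145}

print(lowndes(rhode_island, 75))
# {'Bristol': 4, 'Kent': 12, 'Newport': 6, 'Providence': 44, 'Washington': 9}
print(lowndes(delaware, 41))
# {'Kent': 8, 'New Castle': 24, 'Sussex': 9}
```

In the Delaware case, Kent receives the extra seat, whereas rounding to the nearest whole number in Webster’s method gave New Castle 25.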

Apportionment of Legislative Districts


In most states, there is a fixed number of representatives to the state legislature. Rather than
apportioning each county a number of representatives, legislative districts are drawn so that
each legislator represents a district. The apportionment process, then, comes in the drawing
of the legislative districts, with the goal of having each district include approximately the
same number of constituents. Because of this goal, a geographically small city may have
several representatives, while a large rural region may be represented by one legislator.

When populations change, it becomes necessary to redistrict the regions each legislator
represents. (Incidentally, this also occurs for the regions that federal legislators represent.)
The process of redistricting is typically done by the legislature itself, so, not surprisingly,
it is common to see gerrymandering.

Gerrymandering
Gerrymandering is when districts are drawn based on the political affiliation of the
constituents to the advantage of those drawing the boundary.

Example 25: Gerrymandering in Illinois


The map to the right shows the 4th congressional district in
Illinois in 2004. 5 This district was drawn to contain the two
predominantly Hispanic areas of Chicago. The largely
Puerto Rican area to the north and the largely Mexican
areas to the south are connected in this district only by a
piece of the highway to the west.

Try it now 7:
A college offers tutoring in Math, English, Chemistry, and Biology. The number of
students enrolled in each subject is shown. If the college can only afford to hire
15 tutors, determine how many tutors should be assigned to each subject, using the
methods indicated.

Subject     Students
Math        330
English     265
Chemistry   130
Biology     70
TOTAL       795

a. Hamilton’s Method
b. Jefferson’s Method
c. Adams’ Method
d. Webster’s Method
e. Huntington-Hill Method

5 http://en.wikipedia.org/wiki/File:Illinois_District_4_2004.png
