Newman Computational Physics Chap 2-5

CHAPTER 2

PYTHON PROGRAMMING FOR PHYSICISTS

Our first item of business is to learn how to write computer programs in
the Python programming language.
Python is easy to learn, simple to use, and enormously powerful. It has
facilities and features for performing tasks of many kinds. You can do art or
engineering in Python, surf the web or calculate your taxes, write words or
write music, make a movie or make the next billion-dollar Internet start-up.1
We will not attempt to learn about all of Python’s features, however; instead we
will restrict ourselves to the subset that is most useful for doing physics calculations. We
will learn about the core structure of the language first, how to put together the
instructions that make up a program, but we will also learn about some of the
powerful features that can make the life of a computational physicist easier,
such as features for doing calculations with vectors and matrices, and features
for making graphs and computer graphics. Some other features of Python
that are more specialized, but still occasionally useful for physicists, will not
be covered here. Luckily there is excellent documentation available on-line,
so if there’s something you want to do and it’s not explained in this book, I
encourage you to see what you can find. A good place to start when looking
for information about Python is the official Python website at www.python.org.

2.1 GETTING STARTED

A Python program consists of a list of instructions, resembling a mixture of
English words and mathematics and collectively referred to as code. We’ll see
exactly what form the instructions take in a moment, but first we need to know
how and where to enter them into the computer.

1 Some of these also require that you have a good idea.

When you are programming in Python—developing a program, as the jargon
goes—you typically work in a development environment, which is a window
or windows on your computer screen that show the program you are working
on and allow you to enter or edit lines of code. There are several different
development environments available for use with Python, but the most com-
monly used is the one called IDLE.2 If you have Python installed on your com-
puter then you probably have IDLE installed as well. (If not, it is available as a
free download from the web.3) How you start IDLE depends on what kind of
computer you have, but most commonly you click on an icon on the desktop
or under the start menu on a PC, or in the dock or the applications folder on
a Mac. If you wish, you can now start IDLE running on your computer and
follow along with the developments in this chapter step by step.
The first thing that happens when you start IDLE is that a window appears
on the computer screen. This is the Python shell window. It will have some text
in it, looking something like this:

Python 3.2 (r32:88445, Feb 21 2011, 21:12:33)
Type "help" for more information.
>>>

This tells you what version of Python you are running (your version may be
different from the one above), along with some other information, followed by
the symbol “>>>”, which is a prompt: it tells you that the computer is ready
for you to type something in. When you see this prompt you can type any
command in the Python language at the keyboard and the computer will carry
out that command immediately. This can be a useful way to quickly try in-
dividual Python commands when you’re not sure how something works, but
it’s not the main way that we will use Python commands. Normally, we want
to type in an entire Python program at once, consisting of many commands
one after another, then run the whole program together. To do this, go to the
top of the window, where you will see a set of menu headings. Click on the
“File” menu and select “New Window”. This will create a second window on
the screen, this one completely empty. This is an editor window. It behaves
differently from the Python shell window. You type a complete program into this
window, usually consisting of many lines. You can edit it, add things, delete
things, cut, paste, and so forth, in a manner similar to the way one works with
a word processor. The menus at the top of the window provide a range of
word-processor style features, such as cut and paste, and when you are finished
writing your program you can save your work just as you would with
a word processor document. Then you can run your complete program, the
whole thing, by clicking on the “Run” menu at the top of the editor window
and selecting “Run Module” (or you can press the F5 function key, which is
quicker). This is the main way in which we will use Python and IDLE in this
book.

2 IDLE stands for “Integrated Development Environment” (sort of). The name is
also a joke, the Python language itself being named, allegedly, after the
influential British comedy troupe Monty Python, one of whose members was the
comedian Eric Idle.
3 For Linux users, IDLE does not usually come installed automatically, so you
may have to install it yourself. The most widely used distributions of Linux,
including Ubuntu and Fedora, have freely available versions of IDLE that can be
installed using their built-in software installer programs.

To get the hang of how it works, try the following quick exercise. Open
up an editor window if you didn’t already (by selecting “New Window” from
the “File” menu) and type the following (useless) two-line program into the
window, just as it appears here:

x = 1
print(x)

(If it’s not obvious what this does, it will be soon.) Now save your program
by selecting “Save” from the “File” menu at the top of the editor window and
typing in a name.4 The names of all Python programs must end with “.py”, so
a suitable name might be “example.py” or something like that. (If you do not
give your program a name ending in “.py” then the computer will not know
that it is a Python program and will not handle it properly when you try to
load it again—you will probably find that such a program will not even run at
all, so the “.py” is important.)
Once you have saved your program, run it by selecting “Run Module” from
the “Run” menu. When you do this the program will start running, and any
output it produces—anything it says or does or prints out—will appear in the
Python shell window (the other window, the one that appeared first). In this
case you should see something like this in the Python shell window:

1
>>>

The only result of this small program is that the computer prints out the number
“1” on the screen. (It’s the value of the variable x in the program—see
Section 2.2.1 below.) The number is followed by a prompt “>>>” again, which
tells you that the computer is done running your program and is ready to do
something else.

4 Note that you can have several windows open at once, including the Python
shell window and one or more editor windows, and that each window has its own
“File” menu with its own “Save” item. When you click on one of these to save,
IDLE saves the contents of the specific window you clicked on. Thus if you want
to save a program you must be careful to click on the “File” menu in the window
containing the program, rather than in any other window. If you click on the
menu in the shell window, for instance, IDLE will save the contents of the shell
window, not your program, which is probably not what you wanted.

This same procedure is the one you’ll use for running all your programs
and you’ll get used to it soon. It’s a good idea to save your programs, as here,
when they’re finished and ready to run. If you forget to do it, IDLE will ask
you if you want to save before it runs your program.
IDLE is by no means the only development environment for Python. If you
are comfortable with computers and enjoy trying things out, there is a wide
range of others available on the Internet, mostly for free, with names like Py-
Dev, Eric, BlackAdder, Komodo, Wing, and more. Feel free to experiment and
see what works for you, or you can just stick with IDLE. IDLE can do every-
thing we’ll need for the material in this book. But nothing in the book will
depend on what development environment you use. As far as the program-
ming and the physics go, they are all equivalent.

2.2 BASIC PROGRAMMING

A program is a list of instructions, or statements, which under normal
circumstances the computer carries out, or executes, in the order they appear in the
program. Individual statements do things like performing arithmetic, asking
for input from the user of the program, or printing out results. The following
sections introduce the various types of statements in the Python language one
by one.

2.2.1 VARIABLES AND ASSIGNMENTS

Quantities of interest in a program—which in physics usually means numbers,
or sets of numbers like vectors or matrices—are represented by variables, which
play roughly the same role as they do in ordinary algebra. Our first example
of a program statement in Python is this:

x = 1

This is an assignment statement. It tells the computer that there is a variable
called x and we are assigning it the value 1. You can think of the variable as
a box that stores a value for you, so that you can come back and retrieve that
value at any later time, or change it to a different value. We will use variables
extensively in our computer programs to represent physical quantities like po-
sitions, velocities, forces, fields, voltages, probabilities, and wavefunctions.
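A minimal sketch of the “box” picture: a value is stored, read back out into a second variable, and then the first variable is changed. The names x and y are arbitrary illustrations.

```python
# Store the value 1 in a variable called x
x = 1

# Retrieve the stored value later and copy it into another variable
y = x

# Change x to a different value; y keeps the value it was given
x = 2

print(x, y)  # prints: 2 1
```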
In normal algebra variable names are usually just a single letter like x, but
in Python (and in most other programming languages) they don’t have to be—
they can be two, three, or more letters, or entire words if you want. Variable
names in Python can be as long as you like and can contain both letters and
numbers, as well as the underscore symbol “_”, but they cannot start with a
number, or contain any other symbols, or spaces. Thus x and Physics_101
are fine names for variables, but 4Score&7Years is not (because it starts with
a number, and also because it contains a &). Upper- and lower-case letters are
distinct from one another, meaning that x and X are two different variables
which can have different values.5
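Python can check the naming rules above for you. This sketch uses the standard string method isidentifier() and the standard-library keyword module (which also covers the reserved words mentioned in footnote 5); the names tested are the examples from the text.

```python
import keyword

# str.isidentifier() reports whether a string follows Python's rules for names
print("x".isidentifier())              # True
print("Physics_101".isidentifier())    # True
print("4Score&7Years".isidentifier())  # False: starts with a digit and contains &

# Reserved words such as "for" have the form of a name but cannot be used as one
print(keyword.iskeyword("for"))        # True

# Upper- and lower-case letters are distinct, so these are two variables
x = 1
X = 2
print(x, X)                            # 1 2
```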
Many of the programs you will write will contain large numbers of vari-
ables representing the values of different things and keeping them straight in
your head can be a challenge. It is a very good idea—one that is guaranteed
to save you time and effort in the long run—to give your variables meaningful
names that describe what they represent. If you have a variable that repre-
sents the energy of a system, for instance, you might call it energy. If you have
a variable that represents the velocity of an object you could call it velocity.
For more complex concepts, you can make use of the underscore symbol “_”
to create variable names with more than one word, like maximum_energy or
angular_velocity. Of course, there will be times when single-letter variable
names are appropriate. If you need variables to represent the x and y positions
of an object, for instance, then by all means call them x and y. And there’s no
reason why you can’t call your velocity variable simply v if that seems natural
to you. But whatever you do, choose names that help you remember what the
variables represent.
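As an illustration of descriptive naming, here is a short sketch computing a kinetic energy. The numerical values are made up, and the code uses the multiplication and power operators, * and **, which this section has not yet covered.

```python
# Descriptive names make the physics readable (values are illustrative)
mass = 2.0              # kilograms
velocity = 3.0          # metres per second
kinetic_energy = 0.5 * mass * velocity**2

# Single letters are fine where they are conventional, e.g. coordinates
x = 1.0
y = 2.5

print(kinetic_energy)   # 9.0
```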

5 Also, variables cannot have names that are “reserved words” in Python.
Reserved words are the words used to assemble programming statements and include
“for”, “if”, and “while”. (We will see the special uses of each of these words
in Python programming later in the chapter.)

2.2.2 VARIABLE TYPES

Variables come in several types. Variables of different types store different
kinds of quantities. The main types we will use for our physics calculations
are the following:
• Integer: Integer variables can take integer values and integer values only,
such as 1, 0, or −286784. Both positive and negative values are allowed,
but not fractional values like 1.5.
• Float: A floating-point variable, or “float” for short, can take real, or
floating-point, values such as 3.14159, −6.63 × 10⁻³⁴, or 1.0. Notice that
a floating-point variable can take an integer value like 1.0 (which after all
is also a real number), by contrast with integer variables which cannot
take noninteger values.
• Complex: A complex variable can take a complex value, such as 1 +
2j or −3.5 − 0.4j. Notice that in Python the unit imaginary number is
called j, not i. (Despite this, we will use i in some of the mathematical
formulas we derive in this book, since it is the common notation among
physicists. Just remember that when you translate your formulas into
computer programs you must use j instead.)
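A short sketch using the built-in type() function shows the three numeric types, and also that the imaginary unit behaves as expected. Note that in code j must be attached to a number, as in 1j; a bare j on its own is just a variable name.

```python
# The type of each value depends on how it is written
print(type(-286784))    # <class 'int'>
print(type(3.14159))    # <class 'float'>
print(type(-3.5-0.4j))  # <class 'complex'>

# Python writes the imaginary unit as 1j, and 1j squared is -1
z = 1 + 2j
print(z.real, z.imag)   # 1.0 2.0
print(1j * 1j)          # (-1+0j)
```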
You might be asking yourself what these different types mean. What does it
mean that a variable has a particular type? Why do we need different types?
Couldn’t all values, including integers and real numbers, be represented with
complex variables, so that we only need one type of variable? In principle
they could, but there are great advantages to having the different types. For
instance, the values of the variables in a program are stored by the computer
in its memory, and it takes twice as much memory to store a complex number
as it does a float, because the computer has to store both the real and imagi-
nary parts. Even if the imaginary part is zero (so that the number is actually
real), the computer still takes up memory space storing that zero. This may
not seem like a big issue given the huge amounts of memory computers have
these days, but in many physics programs we need to store enormous num-
bers of variables—millions or billions of them—in which case memory space
can become a limiting factor in writing the program.
Moreover, calculations with complex numbers take longer to complete, be-
cause the computer has to calculate both the real and imaginary parts. Again,
even if the imaginary part is zero, the computer still has to do the calculation,
so it takes longer either way. Many of our physics programs will involve millions
or billions of operations. Big physics calculations can take days or weeks
to run, so the speed of individual mathematical operations can have a big effect.
Of course, if we need to work with complex numbers then we will have
to use complex variables, but if our numbers are real, then it is better to use a
floating-point variable.
Similar considerations apply to floating-point variables and integers. If the
numbers we are working with are genuinely noninteger real numbers, then we
should use floating-point variables to represent them. But if we know that the
numbers are integers then using integer variables is usually faster and takes
up less memory space.
Moreover, integer variables are in some cases actually more accurate than
floating-point variables. As we will see in Section 4.2, floating-point calcula-
tions on computers are not infinitely accurate. Just as on a hand-held calcula-
tor, computer calculations are only accurate to a certain number of significant
figures (typically about 16 on modern computers). That means that a calculation
whose exact answer is 1, such as adding 0.1 to itself ten times, may actually
give 0.9999999999999999 on the computer. In many cases the difference will not matter much, but
what happens, for instance, if something special is supposed to take place in
your program if, and only if, the number is less than 1? In that case, the differ-
ence between 1 and 0.9999999999999999 could be crucially important. Numer-
ous bugs and problems in computer programs have arisen because of exactly
this kind of issue. Luckily there is a simple way to avoid it. If the quantity
you’re dealing with is genuinely an integer, then you should store it in an in-
teger variable. That way you know that 1 means 1. Integer variables are not
accurate to just 16 significant figures: they are perfectly accurate. They repre-
sent the exact integer you assign to them, nothing more and nothing less. If
you say “x = 1”, then indeed x is equal to 1.
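A minimal sketch of the kind of rounding problem described above, assuming standard double-precision floats (as in CPython on common platforms). It uses a for loop, which the chapter introduces later. Adding 0.1 ten times falls just short of 1, whereas counting with an integer is exact.

```python
# Add 0.1 to itself ten times; mathematically the result is exactly 1
total = 0.0
for i in range(10):
    total += 0.1

print(total)         # 0.9999999999999999
print(total < 1)     # True, although the exact answer is not less than 1
print(total == 1.0)  # False

# The same count kept in an integer variable is exact
count = 0
for i in range(10):
    count += 1
print(count == 10)   # True
```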
This is an important lesson, and one that is often missed when people first
start programming computers: if you have an integer quantity, use an integer
variable. In quantum mechanics most quantum numbers are integers. The
number of atoms in a gas is an integer. So is the number of planets in the
solar system or the number of stars in the galaxy. Coordinates on lattices in
solid-state physics are often integers. Dates are integers. The population of
the world is an integer. If you were representing any of these quantities in a
program it would in most cases be an excellent idea to use an integer variable.
More generally, whenever you create a variable to represent a quantity in one
of your programs, think about what type of value that quantity will take and
choose the type of your variable to match it.
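One more reason integer quantities belong in integer variables: Python integers stay exact however large they get, while a float carries only about 16 significant figures (53 binary digits in the IEEE 754 double format used on typical platforms). This sketch shows a float failing to hold an integer just above 2**53.

```python
# Python integers are exact however large they get
n = 2**53 + 1
print(n)                 # 9007199254740993

# Converting to a float rounds to the nearest representable value
print(float(n))          # 9007199254740992.0
print(float(n) == n)     # False: the float has lost the final 1
```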

And how do you tell the computer what type you want a variable to be?
The name of the variable is no help. A variable called x could be an integer or
it could be a complex variable.
The type of a variable is set by the value that we give it. Thus for instance if
we say “x = 1” then x will be an integer variable, because we have given it an
integer value. If we say “x = 1.5” on the other hand then it will be a float. If
we say “x = 1+2j” it will be complex. Very large or very small floating-point or complex
values can be specified using scientific notation, in the form “x = 1.2e34”
(which means 1.2 × 10³⁴) or “x = 1e-12 + 2.3e45j” (which means 10⁻¹² +
2.3 × 10⁴⁵ j).
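A short sketch of the scientific-notation forms above, with the resulting types checked by type(). One detail worth noting: a value written with e is always a float, even when it happens to be a whole number.

```python
# Scientific notation: the number after e is the power of ten
big = 1.2e34                 # 1.2 x 10**34
small = 1e-12                # 10**-12
w = 1e-12 + 2.3e45j          # 10**-12 + 2.3 x 10**45 j

print(type(big), type(w))    # <class 'float'> <class 'complex'>
print(big)                   # 1.2e+34
print(w.real, w.imag)        # 1e-12 2.3e+45

# e-notation always gives a float, even for a whole number
print(type(1e3), 1e3)        # <class 'float'> 1000.0
```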
The type of a variable can change as a Python program runs. For example,
suppose we have the following two lines one after the other in our program:

x = 1
x = 1.5

If we run this program then after the first line is executed by the computer x
will be an integer variable with value 1. But immediately after that the
computer will execute the second line and x will become a float with value 1.5.
Its type has changed from integer to float.6
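The built-in type() function reports the current type of a variable, which makes the change described above visible:

```python
x = 1
print(type(x))   # <class 'int'>

x = 1.5          # the same name now refers to a float
print(type(x))   # <class 'float'>
```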
However, although you can change the types of variables in this way, it
doesn’t mean you should. It is considered poor programming to use the same
variable as two different types in a single program, because it makes the pro-
gram significantly more difficult to follow and increases the chance that you
may make a mistake in your programming. If x is an integer in some parts of
the program and a float in others then it becomes difficult to remember which
it is and confusion can ensue. A good programmer, therefore, will use a given
variable to store only one type of quantity in a given program. If you need a
variable to store another type, use a different variable with a different name.
Thus, in a well-written program, the type of a variable will be set the first
time it is given a value and will remain the same for the rest of the program.
This doesn’t quite tell us the whole story, however, because as we’ve said a
floating-point variable can also take an integer value. There will be times when
we wish to give a variable an integer value, like 1, but nonetheless have that
variable be a float. There’s no contradiction in this, but how do we tell the
computer that this is what we want? If we simply say “x = 1” then, as we
have seen, x will be an integer variable.

6 If you have previously programmed in one of the so-called statically typed
languages, such as C, C++, Fortran, or Java, then you’ll be used to creating
variables with a declaration such as “int i”, which means “I’m going to be using
an integer variable called i.” In such languages the types of variables are
fixed once they are declared and cannot change. There is no equivalent
declaration in Python. Variables in Python are created when you first use them,
with types which are deduced from the values they are given and which may
change when they are given new values.

There are two simple ways to do what we want here. The first is to specify
a value that has an explicit decimal point in it, as in “x = 1.0”. The decimal
point is a signal to the computer that this is a floating-point value (even though,
mathematically speaking, 1 is of course an integer) and the computer knows
in this situation to make the variable x a float. Thus “x = 1.0” specifies a
floating-point variable called x with the value 1.
A slightly more complicated way to achieve the same thing is to write
“x = float(1)”, which tells the computer to take the value 1 and convert
it into a floating-point value before assigning it to the variable x. This also
achieves the goal of making x a float.
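A quick check of the two approaches just described, using type() to confirm the results. The variable names a, b, and c are illustrative.

```python
a = 1          # integer
b = 1.0        # float, because of the decimal point
c = float(1)   # float, by explicit conversion

print(type(a), type(b), type(c))  # <class 'int'> <class 'float'> <class 'float'>
print(b == c)                     # True: both hold the value 1
```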
A similar issue can arise with complex variables. There will be times when
we want to create a variable of complex type, but we want to give it a purely
real value. If we just say “x = 1.5” then x will be a real, floating-point vari-
able, which is not what we want. So instead we say “x = 1.5 + 0j”, which
tells the computer that we intend x to be complex. Alternatively, we can write
“x = complex(1.5)”, which achieves the same thing.
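The same kind of check for complex values: both forms give a complex number whose imaginary part is zero. The names p, q, and r are illustrative.

```python
p = 1.5            # float
q = 1.5 + 0j       # complex with zero imaginary part
r = complex(1.5)   # the same, by explicit conversion

print(type(p), type(q), type(r))  # <class 'float'> <class 'complex'> <class 'complex'>
print(q == r)                     # True
print(r)                          # (1.5+0j)
```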
There is one further type of variable, the string, which is often used in
Python programs but which comes up only rarely in physics programming,
which is why we have not mentioned it so far. A string variable stores text
in the form of strings of letters, punctuation, symbols, digits, and so forth. To
indicate a string value one uses quotation marks, like this:

x = "This is a string"

This statement would create a variable x of string type with the value “This is
a string”. Any character can appear in a string, including numerical digits.
Thus one is allowed to say, for example, x = "1.234", which creates a string
variable x with the value “1.234”. It’s crucial to understand that this is not the
same as a floating-point variable with the value 1.234. A floating-point variable
contains a number, the computer knows it’s a number, and, as we will shortly
see, one can do arithmetic with that number, or use it as the starting point
for some mathematical calculation. A string variable with the value “1.234”
does not represent a number. The value “1.234” is, as far as the computer is
concerned, just a string of symbols in a row. The symbols happen to be digits
(and a decimal point) in this case, but they could just as easily be letters or
spaces or punctuation. If you try to do arithmetic with a string variable, even
one that appears to contain a number, the computer will most likely either
complain or give you something entirely unexpected. We will not have much
need for string variables in this book and they will as a result appear only
rather rarely. One place they do appear, however, is in the following section
on output and input.
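A short sketch of the difference between the string "1.234" and the float 1.234. It shows both outcomes mentioned above: an operation that gives something unexpected (repetition instead of doubling) and one that the computer refuses (adding a number to a string raises a TypeError).

```python
s = "1.234"     # a string that happens to contain digits
f = 1.234       # a floating-point number

print(type(s), type(f))   # <class 'str'> <class 'float'>

# "Multiplying" a string by 2 repeats it rather than doubling a number
print(s * 2)              # 1.2341.234
print(f * 2)              # 2.468

# Adding a number to a string is an error
addition_failed = False
try:
    s + 1
except TypeError as err:
    addition_failed = True
    print("TypeError:", err)
```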
In all of the statements we have seen so far you are free to put spaces be-
tween parts of the statement. Thus “x=1” and “x = 1” do the exact same thing;
the spaces have no effect. They can, however, do much to improve the read-
ability of a program. When we start writing more complicated statements in
the following sections, we will find it very helpful to add some spaces here
and there. There are a few places where one cannot add extra spaces, the most
important being at the beginning of a line, before the start of the statement.
As we will see in Section 2.3.1, inserting extra spaces at the beginning of a line
does have an effect on the way the program works. Thus, unless you know
what you are doing, you should avoid putting spaces at the beginning of lines.
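A sketch of both points: spaces inside a statement are ignored, but a space at the start of a line is not. To show the second without breaking the example itself, it passes a one-line program to the built-in compile() function, which checks the code without running it.

```python
# Spaces inside a statement make no difference
x=1
y = 1
print(x == y)   # True

# An extra space at the start of a line is not ignored: Python rejects it
leading_space_rejected = False
try:
    compile(" z = 1\n", "<example>", "exec")
except IndentationError as err:
    leading_space_rejected = True
    print("IndentationError:", err.msg)   # unexpected indent
```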
You can also include blank lines between statements in a program, at any
point and as many as you like. This can be useful for separating logically
distinct parts of a program from one another, again making the program easier
to understand. We will use this trick many times in the programs in this book
to improve their readability.

2.2.3 OUTPUT AND INPUT STATEMENTS

We have so far seen one example of a program statement, the assignment state-
ment, which takes the form “x = 1” or something similar. The next types of
statements we will examine are the statements for output and input of data
in Python programs. We have already seen one example of the basic output
statement, or “print” statement. In Section 2.1 we gave this very short exam-
ple program:

x = 1
print(x)

The first line of this program we understand: it creates an integer variable
called x and gives it the value 1. The second statement tells the computer to
“print” the value of x on the screen of the computer. Note that it is the value
of the variable x that is printed, not the letter “x”. The value of the variable in
this case is 1, so this short program will result in the computer printing a “1”
on the screen, as we saw in Section 2.1.
The print statement always prints the current value of the variable at the
moment the statement is executed. Thus consider this program:

x = 1
print(x)
x = 2
print(x)

First the variable x is set to 1 and its value is printed out, resulting in a 1 on the
screen as before. Then the value of x is changed to 2 and the value is printed
again, which produces a 2 on the screen. Overall we get this:

1
2

Thus the two print statements, though they look identical, produce different
results in this case. Note also that each print statement starts its printing on a
new line.
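To see exactly what the two print statements produce, this sketch temporarily captures the printed text using io.StringIO and contextlib.redirect_stdout from the standard library (a testing convenience, not something ordinary programs need). The captured text shows that print ends its output with a newline character, which is why each print starts on a new line.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):   # collect printed text instead of showing it
    x = 1
    print(x)
    x = 2
    print(x)

output = buffer.getvalue()
print(repr(output))   # '1\n2\n'
```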
The print statement can be used to print out more than one thing at a time.
Consider this program:

x = 1
y = 2
print(x,y)

which produces this result:

1 2

Note that the two variables in the print statement are separated by a comma.
When their values are printed out, however, they are printed with a space
between them (not a comma).
We can also print out words, like this:

x = 1
y = 2
print("The value of x is",x,"and the value of y is",y)

which produces this on the screen:

The value of x is 1 and the value of y is 2

Adding a few words to your program like this can make its output much eas-
ier to read and understand. You can also have print statements that print
out only words if you like, as in print("The results are as follows") or
print("End of program").
The print statement can also print out the values of floating-point and com-
plex variables. For instance, we can write

x = 1.5
z = 2+3j
print(x,z)

and we get

1.5 (2+3j)

In general, a print statement can include any sequence of quantities separated by
commas, or text in quotation marks, and the computer will simply print out
the appropriate things in order, with spaces in between.7 Occasionally you
may want to print things with something other than spaces in between, in
which case you can write something like the following:

print(x,z,sep="...")

which would print

1.5...(2+3j)

The code sep="..." tells the computer to use whatever appears between the
quotation marks as a separator between values—three dots in this case, but
you could use any letters, numbers, or symbols you like. You can also have no
separator between values at all by writing print(x,z,sep="") with nothing
between the quotation marks, which in the present case would give

1.5(2+3j)
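The three separator choices above can be compared side by side. This sketch sends the output to an io.StringIO object through print’s file argument, so the resulting lines can be inspected as strings.

```python
import io

x = 1.5
z = 2+3j

buffer = io.StringIO()
print(x, z, file=buffer)             # default separator: one space
print(x, z, sep="...", file=buffer)  # three dots between values
print(x, z, sep="", file=buffer)     # no separator at all

lines = buffer.getvalue().splitlines()
print(lines)  # ['1.5 (2+3j)', '1.5...(2+3j)', '1.5(2+3j)']
```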

7 The print statement is one of the things that differs between Python version 3
and earlier versions. In earlier versions there were no parentheses around the
items to be printed—you would just write “print x”. If you are using an earlier
version of Python with this book then you will have to remember to omit the
parentheses from your print statements. Alternatively, if you are using version
2.6 or later (but not version 3) then you can make the print statement behave as
it does in version 3 by including the statement
from __future__ import print_function
at the start of your program. (Note that there are two underscore symbols before
the word “future” and two after it.) See Appendix B for further discussion of
the differences between Python versions.

Input statements are only a little more complicated. The basic form of an
input statement in Python is like this:

x = input("Enter the value of x: ")

When the computer executes this statement it does two things. First, the state-
ment acts something like a print statement and prints out the quantity, if any,
inside the parentheses.8 So in this case the computer would print the words
“Enter the value of x: ”. If there is nothing inside the parentheses, as in
“x = input()”, then the computer prints nothing, but the parentheses are still
required nonetheless.
Next the computer will stop and wait. It is waiting for the user to type a
value on the keyboard. It will wait patiently until the user types something
and then the value that the user types is assigned to the variable x. However,
there is a catch: the value entered is always interpreted as a string value, even
if you type in a number.9 (We encountered strings previously in Section 2.2.2.)
Thus consider this simple two-line program:

x = input("Enter the value of x: ")
print("The value of x is",x)

This does nothing more than collect a value from the user then print it out
again. If we run this program it might look something like the following:

Enter the value of x: 1.5
The value of x is 1.5

This looks reasonable. But we could also do the following:

Enter the value of x: Hello
The value of x is Hello

As you can see, “value” is interpreted rather loosely. As far as the computer
is concerned, anything you type in is a string, so it doesn’t care whether you
enter digits, letters, a complete word, or several words. Anything is fine.

8 It doesn’t act exactly like a print statement, however, since it can only
print a single quantity, such as a string of text in quotes (as here) or a
variable, whereas the print statement can print many quantities in a row.
9 Input statements are another thing that changed between versions 2 and 3 of
Python. In version 2 and earlier the value generated by an input statement would
have the same type as whatever the user entered. If the user entered an integer,
the input statement would give an integer value. If the user entered a float it
would give a float, and so forth. However, this was considered confusing,
because it meant that if you then assigned that value to a variable (as in the
program above) there would be no way to know in advance what the type of the
variable would be—the type would depend on what the user entered at the
keyboard. So in version 3 of Python the behavior was changed to its present form
in which the input is always interpreted as a string. If you are using a version
of Python earlier than version 3 and you want to reproduce the behavior of
version 3 then you can write “x = raw_input()”. The function raw_input in
earlier versions is the equivalent of input in version 3.

For physics calculations, however, we usually want to enter numbers, and
have them interpreted correctly as numbers, not strings. Luckily it is straight-
forward to convert a string into a number. The following will do it:

temp = input("Enter the value of x: ")
x = float(temp)
print("The value of x is",x)

This is slightly more complicated. It receives a string input from the user and
assigns it to the temporary variable temp, which will be a string-type vari-
able. Then the statement “x = float(temp)” converts the string value to a
floating-point value, which is then assigned to the variable x, and this is the
value that is printed out. One can also convert string input values into in-
tegers or complex numbers with statements of the form “x = int(temp)” or
“x = complex(temp)”.
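Because input always hands back a string, the conversions can be tried out without the keyboard by starting from string literals that stand in for what a user might type. The strings below are made-up examples for illustration, not part of any program in the book:

```python
# Made-up strings standing in for what input() would return
typed_float = "1.5"
typed_int = "42"
typed_complex = "1+2j"

x = float(typed_float)      # 1.5, a float
n = int(typed_int)          # 42, an integer
z = complex(typed_complex)  # (1+2j), a complex number

print("x =", x, type(x))
print("n =", n, type(n))
print("z =", z, type(z))
```

One caution: int accepts only strings that look like whole numbers, so int("1.5") fails with a ValueError; for a value like 1.5, convert with float instead.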
In fact, one doesn’t have to use a temporary variable. The code above can
be expressed more succinctly like this:

x = float(input("Enter the value of x: "))
print("The value of x is",x)

which takes the string value given by input, converts it to a float, and assigns
it directly to the variable x. We will use this trick many times in this book.
In order for this program to work, the value the user types must be one that
makes sense as a floating-point value, otherwise the computer will complain.
Thus, for instance, the following is fine:

Enter the value of x: 1.5
The value of x is 1.5

But if I enter the wrong thing, I get this:

Enter the value of x: Hello
ValueError: could not convert string to float: 'Hello'

This is our first example of an error message. The computer, in rather opaque
technical jargon, is complaining that we have given it an incorrect value.
It’s normal to make a few mistakes when writing or using computer pro-
grams, and you will soon become accustomed to the occasional error message
(if you are not already). Working out what these messages mean is one of the
tricks of the business—they are often not entirely transparent.

2.2.4 ARITHMETIC

So far our programs have done very little, certainly nothing that would be
much use for physics. But we can make them much more useful by adding
some arithmetic into the mix.
In most places where you can use a single variable in Python you can also
use a mathematical expression, like “x+y”. Thus you can write “print(x)” but
you can also write “print(x+y)” and the computer will calculate the sum of
x and y for you and print out the result. The basic mathematical operations—
addition, subtraction, etc.—are written as follows:
x+y addition
x-y subtraction
x*y multiplication
x/y division
x**y raising x to the power of y
Notice that we use the asterisk symbol “*” for multiplication and the slash
symbol “/” for division, because there is no × or ÷ symbol on a standard
computer keyboard.
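As a quick check of the five operations listed above, here is a short sketch using two arbitrarily chosen integer values; the comments give the values printed:

```python
x = 7
y = 2

print(x + y)   # addition: 9
print(x - y)   # subtraction: 5
print(x * y)   # multiplication: 14
print(x / y)   # division: 3.5
print(x ** y)  # power, 7 squared: 49
```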
Two more obscure, but still useful, operations are integer division and the
modulo operation:
x//y integer division: x is divided by y and the result is rounded down to
a whole number (for positive numbers this simply discards the part after
the decimal point). For instance, 14//3 gives 4.
x%y modulo, which means the remainder after x is divided by y. For
instance, 14%3 gives 2, because 14 divided by 3 gives 4-remainder-2.
This also works for nonintegers: 1.5%0.4 gives 0.3 (up to a tiny rounding
error), because 1.5 is 3 × 0.4, remainder 0.3. (There is, however, no modulo
operation for complex numbers.) The modulo operation is particularly
useful for telling when one number is divisible by another—the
value of n%m will be zero if n is divisible by m. Thus, for instance,
n%2 is zero if n is even (and one if n is odd).
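The integer-division and modulo examples above can be reproduced directly, together with the even/odd test. The values of n and m are arbitrary choices for illustration:

```python
print(14 // 3)    # 4
print(14 % 3)     # 2

n = 10
m = 7
print(n % 2)      # 0, so n is even
print(m % 2)      # 1, so m is odd
print(12 % 4)     # 0, so 12 is divisible by 4

r = 1.5 % 0.4
print(r)          # close to 0.3; floating-point rounding can change the last digits
```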

There are a few other mathematical operations available in Python as well,
but they’re more obscure and rarely used.10
An important rule about arithmetic in Python is that the type of result a
calculation gives depends on the types of the variables that go into it. Consider,
for example, this statement:

x = a + b

If a and b are variables of the same type—integer, float, complex—then when
they are added together the result will also have the same type and this will
be the type of variable x. So if a is 1.5 and b is 2.4 the end result will be that
x is a floating-point variable with value 3.9. Note when adding floats like this
that even if the end result of the calculation is a whole number, the variable x
would still be floating point: if a is 1.5 and b is 2.5, then the result of adding
them together is 4, but x will still be a floating-point variable with value 4.0
because the variables a and b that went into it are floating point.
If a and b are of different types, then the end result has the more general
of the two types that went into it. This means that if you add a float and an
integer, for example, the end result will be a float. If you add a float and a
complex variable, the end result will be complex.
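The type rules for addition can be checked with the built-in type function. This sketch reuses the values a = 1.5 and b = 2.5 from the text; the other numbers are arbitrary:

```python
a = 1.5
b = 2.5
x = a + b
print(x, type(x))          # 4.0 <class 'float'>: a whole number, but still a float

i = 3
j = 4
k = i + j
print(k, type(k))          # 7 <class 'int'>

print(type(1 + 2.0))       # <class 'float'>: int + float gives a float
print(type(1.5 + 2j))      # <class 'complex'>: float + complex gives a complex
```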
The same rules apply to subtraction, multiplication, integer division, and
the modulo operation: the end result is the same type as the starting values, or
the more general type if there are two different starting types. The division op-
eration, however—ordinary non-integer division denoted by “/”—is slightly
different: it follows basically the same rules except that it never gives an inte-
ger result. Only floating-point or complex values result from division. This is
necessary because you can divide one integer by another and get a noninteger
result (like 3 ÷ 2 = 1.5 for example), so it wouldn’t make sense to have inte-
ger starting values always give an integer final result.11 Thus if you divide any
combination of integers or floats by one another you will always get a floating-
point value. If you start with one or more complex numbers then you will get
a complex value at the end.

10
Such as:
x|y bitwise (binary) OR of two integers
x&y bitwise (binary) AND of two integers
x^y bitwise (binary) XOR of two integers
x>>y shift the bits of integer x rightwards y places
x<<y shift the bits of integer x leftwards y places
11
This is another respect in which version 3 of Python differs from earlier versions. In version 2
and earlier all operations gave results of the same type that went into them, including division.
This, however, caused a lot of confusion for exactly the reason given here: if you divided 3 by 2, for
instance, the result had to be an integer, so the computer rounded it down from 1.5 to 1. Because of
the difficulties this caused, the language was changed in version 3 to give the current more sensible
behavior. You can still get the old behavior of dividing then rounding down using the integer
divide operation //. Thus 3//2 gives 1 in all versions of Python. If you are using Python version 2
(technically, version 2.2 or later) and want the newer behavior of the divide operation, you can
achieve it by including the statement “from __future__ import division” at the start of your
program. The differences between Python versions are discussed in more detail in Appendix B.

You can combine several mathematical operations together to make a more
complicated expression, like x+2*y-z/3. When you do this the operations obey
rules similar to those of normal algebra. Multiplications and divisions are per-
formed before additions and subtractions. If there are several multiplications
or divisions in a row they are carried out in order from left to right. Powers are
calculated before anything else. Thus

x+2*y is equivalent to $x + 2y$
x-y/2 is equivalent to $x - \tfrac{1}{2}y$
3*x**2 is equivalent to $3x^2$
x/2*y is equivalent to $\tfrac{1}{2}xy$

You can also use parentheses () in your algebraic expressions, just as you
would in normal algebra, to mark things that should be evaluated as a unit,
as in 2*(x+y). And you can add spaces between the parts of a mathematical
expression to make it easier to read; the spaces don’t affect the value of the ex-
pression. So “x=2*(a+b)” and “x = 2 * ( a + b )” do the same thing. Thus
the following are allowed statements in Python:

x = a + b/c
x = (a + b)/c
x = a + 2*b - 0.5*(1.618**c + 2/7)

On the other hand, the following will not work:

2*x = y

You might expect that this would result in the value of x being set to half
the value of y, but it’s not so. In fact, if you write this line in a program the
computer will simply stop when it gets to that line and print a typically cryp-
tic error message, such as “SyntaxError: can’t assign to operator” (the exact
wording varies between versions of Python), because it doesn’t know what to
do. The problem is that Python does not know how to solve equations for you
by rearranging them. It only knows about the simplest forms of equations, such
as “x = y/2”. If an equation needs to be rearranged to give the value of x then
you have to do the rearranging for yourself. Python will do basic sums for you,
but its knowledge of math is very limited.
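The division rule and the order-of-operations rules described earlier in this section can be checked directly, along with the rejection of “2*x = y”. The sketch uses the built-in compile function to test that line for syntax errors without running it; since the wording of the message depends on the Python version, the sketch prints whatever message it receives rather than assuming one. The values x = 3.0 and y = 4.0 are arbitrary:

```python
# Division always gives a float, even when the answer is a whole number
print(3 / 2)      # 1.5
print(4 / 2)      # 2.0, a float rather than the integer 2
print(3 // 2)     # 1, integer division rounds down

# Order of operations: powers first, then * and / from left to right, then + and -
x = 3.0
y = 4.0
print(x + 2*y)    # 11.0, i.e. x + 2y
print(x - y/2)    # 1.0, i.e. x - y/2
print(3*x**2)     # 27.0, i.e. 3x^2
print(x/2*y)      # 6.0, i.e. (x/2)*y, not x/(2y)
print(2*(x + y))  # 14.0, the parentheses are evaluated first

# "2*x = y" is rejected before the program even runs;
# compile() checks the syntax without executing anything
try:
    compile("2*x = y", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)   # wording depends on the Python version

# Rearranging the equation by hand gives a legal assignment
x = y/2
print(x)          # 2.0
```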
To be more precise, statements like “x = a + b/c” in Python are not techni-
cally equations at all, in the mathematical sense. They are assignments. When
it sees a statement like this, what your computer actually does is very simple-
minded. It first examines the right-hand side of the equals sign and evaluates
whatever expression it finds there, using the current values of any variables
involved. When it is finished working out the value of the whole expression,
and only then, it takes that value and assigns it to the variable on the left of
the equals sign. In practice, this means that assignment statements in Python
sometimes behave like ordinary equations, but sometimes they don’t. A sim-
ple statement like “x = 1” does exactly what you would think, but what about
this statement:

x = x + 1

This does not make sense, under any circumstances, as a mathematical equa-
tion. There is no way that x can ever be equal to x + 1—it would imply that
0 = 1. But this statement makes perfect sense in Python. Suppose the value of
x is currently 1. When the statement above is executed by the computer it first
evaluates the expression on the right-hand side, which is x + 1 and therefore
has the value 1 + 1 = 2. Then, when it has calculated this value it assigns it
to the variable on the left-hand side, which just happens in this case to be the
same variable x. So x now gets a new value 2. In fact, no matter what value
of x we start with, this statement will always end up giving x a new value that
is 1 greater. So this statement has the simple (but often very useful) effect of
increasing the value of x by one.
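The effect of “x = x + 1” is easy to watch by applying it twice in a row. This is a minimal sketch starting from the value 1 used in the discussion above:

```python
x = 1
x = x + 1   # right-hand side evaluated first: 1 + 1 = 2, then assigned to x
print(x)    # 2
x = x + 1   # again: 2 + 1 = 3
print(x)    # 3
```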
Thus consider the following lines:

x = 0
print(x)
x = x**2 - 2
print(x)

What will happen when the computer executes these lines? The first two are
straightforward enough: the variable x gets the value 0 and then the 0 gets
printed out. But then what? The third line says “x = x**2 - 2” which in normal
mathematical notation would be $x = x^2 - 2$, which is a quadratic equation
with solutions x = 2 and x = −1. However, the computer will not set x equal
to either of these values. Instead it will evaluate the right-hand side of the
equals sign and get $x^2 - 2 = 0^2 - 2 = -2$ and then set x to this new value.
Then the last line of the program will print out “-2”.
Thus the computer does not necessarily do what one might think it would,
based on one’s experience with normal mathematics. The computer will not
solve equations for x or any other variable. It won’t do your algebra for you—
it’s not that smart.
Another useful set of tricks is the Python modifiers, which allow you to
make changes to a variable as follows:
x += 1 add 1 to x (i.e., make x bigger by 1)
x -= 4 subtract 4 from x
x *= -2.6 multiply x by −2.6
x /= 5*y divide x by 5 times y
x //= 3.4 divide x by 3.4 and round down to a whole number
As we have seen, you can achieve the same result as these modifiers with
statements like “x = x + 1”, but the modifiers are more succinct. Some people
also prefer them precisely because “x = x + 1” looks like bad algebra and can
be confusing.
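Here is a sketch that applies several modifiers in turn to one variable, with the running value noted in the comments. The starting value 10 and the other numbers are arbitrary:

```python
x = 10
x += 1        # same as x = x + 1   -> 11
x -= 4        # same as x = x - 4   -> 7
x *= 2        # same as x = x * 2   -> 14
x /= 4        # same as x = x / 4   -> 3.5 (division gives a float)
x //= 1.5     # same as x = x // 1.5 -> 2.0 (rounded down, but still a float)
print(x)      # 2.0
```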
Finally in this section, a nice feature of Python, not available in most other
computer languages, is the ability to assign the values of two variables with a
single statement. For instance, we can write

x,y = 1,2.5

which is equivalent to the two statements

x = 1
y = 2.5

One can assign three or more variables in the same way, listing them and their
assigned values with commas in between.
A more sophisticated example is

x,y = 2*z+1,(x+y)/3

An important point to appreciate is that, like all other assignment statements,
this one calculates the whole of the right-hand side of the equation before as-
signing values to the variables on the left. Thus in this example the computer
will calculate both of the values 2*z+1 and (x+y)/3 from the current x, y, and z,
before assigning those calculated values to x and y.
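To see why evaluating the whole right-hand side first matters, the sketch below compares the one-line form with the same two assignments written as separate statements. The starting values x = 5.0, y = 7.0, and z = 1.0 are arbitrary:

```python
z = 1.0

# Multiple assignment: both right-hand sides are computed from the old x and y
x = 5.0
y = 7.0
x, y = 2*z + 1, (x + y)/3
together = (x, y)
print(together)          # (3.0, 4.0)

# Two separate statements: the second one already sees the new x
x = 5.0
y = 7.0
x = 2*z + 1
y = (x + y)/3
one_at_a_time = (x, y)
print(one_at_a_time)     # y is now (3 + 7)/3 rather than (5 + 7)/3
```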
One purpose for which this type of multiple assignment is commonly used
is to interchange the values of two variables. If we want to swap the values of
x and y we can write:

x,y = y,x

and the two will be exchanged. (In most other computer languages such swaps
are more complicated, requiring the use of an additional temporary variable.)
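For comparison, here is the one-line swap next to the longer version with a temporary variable; both end with the two values exchanged. The values 1 and 2.5 are the ones from the earlier multiple-assignment example:

```python
x = 1
y = 2.5

# The swap in one line
x, y = y, x
print(x, y)   # 2.5 1

# The same swap done the long way, with a temporary variable
a = 1
b = 2.5
temp = a
a = b
b = temp
print(a, b)   # 2.5 1
```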

EXAMPLE 2.1: A BALL DROPPED FROM A TOWER

Let us use what we have learned to solve a first physics problem. This is a
very simple problem, one we could easily do for ourselves on paper, but don’t
worry—we will move on to more complex problems shortly.
The problem is as follows. A ball is dropped from a tower of height h. It has
initial velocity zero and accelerates downwards under gravity. The challenge
is to write a program that asks the user to enter the height in meters of the
tower and a time interval t in seconds, then prints on the screen the height of
the ball above the ground at time t after it is dropped, ignoring air resistance.
The steps involved are the following. First, we will use input statements
to get the values of h and t from the user. Second, we will calculate how far
the ball falls in the given time, using the standard kinematic formula $s = \tfrac{1}{2}gt^2$,
where $g = 9.81\,\mathrm{m\,s^{-2}}$ is the acceleration due to gravity. Third, we print the
height above the ground at time t, which is equal to the total height of the
tower minus this value, or h − s.
Here’s what the program looks like, all four lines of it:12

File: dropped.py
h = float(input("Enter the height of the tower: "))
t = float(input("Enter the time interval: "))
s = 9.81*t**2/2
print("The height of the ball is",h-s,"meters")

12
Many of the example programs in this book are also available on-line for you to download
and run on your own computer if you wish. The programs, along with various other useful resources,
are packaged together in a single “zip” file (of size about nine megabytes) which can be down-
loaded from http://www.umich.edu/~mejn/cpresources.zip. Throughout the book, a name
printed in the margin next to a program, such as “dropped.py” above, indicates that the complete
program can be found, under that name, in this file. Any mention of programs or data in the
“on-line resources” also refers to the same file.

Let us use this program to calculate the height of a ball dropped from a 100 m
high tower after 1 second and after 5 seconds. Running the program twice in
succession we find the following:

Enter the height of the tower: 100
Enter the time interval: 1
The height of the ball is 95.095 meters

Enter the height of the tower: 100
Enter the time interval: 5
The height of the ball is -22.625 meters

Notice that the result is negative in the second case, which means that the
ball would have fallen to below ground level if that were possible, though in
practice the ball would hit the ground first. Thus a negative value indicates
that the ball hits the ground before time t.
Before we leave this example, here’s a suggestion for a possible improve-
ment to the program above. At present we perform the calculation of the
distance traveled with the single line “s = 9.81*t**2/2”, which includes the
constant 9.81 representing the acceleration due to gravity. When we do physics
calculations on paper, however, we normally don’t write out the values of
constants in full like this. Normally we would write $s = \tfrac{1}{2}gt^2$, with the
understanding that g represents the acceleration. We do this primarily because it’s
easier to read and understand. A single symbol g is easier to read than a row of
digits, and moreover the use of the standard letter g reminds us that the quan-
tity we are talking about is the gravitational acceleration, rather than some
other constant that happens to have value 9.81. Especially in the case of con-
stants that have many digits, such as π = 3.14159265 . . ., the use of symbols
rather than digits in algebra makes life a lot easier.
The same is also true of computer programs. You can make your programs
substantially easier to read and understand by using symbols for constants in-
stead of writing the values out in full. This is easy to do—just create a variable
to represent the constant, like this:

g = 9.81
s = g*t**2/2

You only have to create the variable g once in your program (usually some-
where near the beginning) and then you can use it as many times as you like
thereafter. Doing this also has the advantage of decreasing the chances that
you’ll make a typographical error in the value of a constant. If you have to
type out many digits every time you need a particular constant, odds are you
are going to make a mistake at some point. If you have a variable representing
the constant then you know the value will be right every time you use it, just
so long as you typed it correctly when you first created the variable.13
Using variables to represent constants in this way is one example of a pro-
gramming trick that improves your programs even though it doesn’t change
the way they actually work. Instead it improves readability and reliability,
which can be almost as important as writing a correct program. We will see
other examples of such tricks later.
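Putting the suggestion above into practice, dropped.py might be rewritten along these lines. To keep the sketch runnable without a keyboard, the two input statements are replaced by fixed values (an assumption for illustration), chosen to match the second run shown earlier:

```python
g = 9.81        # acceleration due to gravity in m s^-2

# Fixed values in place of the input statements in dropped.py
h = 100.0       # height of the tower in meters
t = 5.0         # time interval in seconds

s = g*t**2/2    # distance fallen, s = g t^2 / 2
print("The height of the ball is",h-s,"meters")
```

In a real program the two fixed values would go back to being read with float(input(...)) as in dropped.py; only the line defining g is new.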

Exercise 2.1: Another ball dropped from a tower


A ball is again dropped from a tower of height h with initial velocity zero. Write a
program that asks the user to enter the height in meters of the tower and then calculates
and prints the time the ball takes until it hits the ground, ignoring air resistance. Use
your program to calculate the time for a ball dropped from a 100 m high tower.

Exercise 2.2: Altitude of a satellite


A satellite is to be launched into a circular orbit around the Earth so that it orbits the
planet once every T seconds.
a) Show that the altitude h above the Earth’s surface that the satellite must have is

$$h = \left(\frac{GMT^2}{4\pi^2}\right)^{1/3} - R,$$

where $G = 6.67 \times 10^{-11}\,\mathrm{m^3\,kg^{-1}\,s^{-2}}$ is Newton’s gravitational constant, $M = 5.97 \times 10^{24}$ kg is the mass of the Earth, and R = 6371 km is its radius.
b) Write a program that asks the user to enter the desired value of T and then calcu-
lates and prints out the correct altitude in meters.
c) Use your program to calculate the altitudes of satellites that orbit the Earth once
a day (so-called “geosynchronous” orbit), once every 90 minutes, and once every
45 minutes. What do you conclude from the last of these calculations?
d) Technically a geosynchronous satellite is one that orbits the Earth once per sidereal
day, which is 23.93 hours, not 24 hours. Why is this? And how much difference
will it make to the altitude of the satellite?

13
In some computer languages, such as C, there are separate entities called “variables” and
“constants,” a constant being like a variable except that its value can be set only once in a program
and is fixed thereafter. There is no such thing in Python, however; there are only variables.

2.2.5 FUNCTIONS, PACKAGES, AND MODULES

There are many operations one might want to perform in a program that are
more complicated than simple arithmetic, such as multiplying matrices, calcu-
lating a logarithm, or making a graph. Python comes with facilities for doing
each of these and many other common tasks easily and quickly. These facil-
ities are divided into packages—collections of related useful things—and each
package has a name by which you can refer to it. For instance, all the standard
mathematical functions, such as logarithm and square root, are contained in a
package called math. Before you can use any of these functions you have to tell
the computer that you want to. For example, to tell the computer you want to
use the log function, you would add the following line to your program:

from math import log

This tells the computer to “import” the logarithm function from the math pack-
age, which means that it copies the code defining the function from where it is
stored (usually on the hard disk of your computer) into the computer’s mem-
ory, ready for use by your program. You need to import each function you use
only once per program: once the function has been imported it continues to be
available until the program ends. You must import the function before the first
time you use it in a calculation and it is good practice to put the “from” state-
ment at the very start of the program, which guarantees that it occurs before
the first use of the function and also makes it easy to find when you are work-
ing on your code. As we write more complicated programs, there will often
be situations where we need to import many different functions into a single
program with many different from statements, and keeping those statements
together in a tidy block at the start of the code will make things much easier.
Once you have imported the log function you can use it in a calculation like
this:

x = log(2.5)

which will calculate the (natural) logarithm of 2.5 and set the variable x equal
to the result. Note that the argument of the logarithm, the number 2.5 in this
case, goes in parentheses. If you miss out the parentheses the computer will
complain. (Also if you use the log function without first importing it from the
math package the computer will complain.)
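Putting the two steps together, a complete minimal program that imports log and uses it looks like this; the second call is included only as a familiar sanity check:

```python
from math import log

x = log(2.5)      # natural logarithm of 2.5
print(x)          # roughly 0.916

y = log(1.0)      # the logarithm of 1 is 0
print(y)          # 0.0
```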
The math package contains a good selection of the most commonly used
mathematical functions:

log natural logarithm
log10 log base 10
exp exponential
sin, cos, tan sine, cosine, tangent (argument in radians)
asin, acos, atan arcsine, arccosine, arctangent (in radians)
sinh, cosh, tanh hyperbolic sine, cosine, tangent
sqrt positive square root

Note that the trigonometric functions work with angles specified in radians,
not degrees. And the exponential and square root functions may seem re-
dundant, since one can calculate both exponentials and square roots by taking
powers. For instance, x**0.5 would give the square root of x. Because of the
way the computer calculates powers and roots, however, using the functions
above is usually quicker and more accurate.
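Both points can be seen in a few lines: the trigonometric functions need radians, and sqrt agrees with raising to the power 0.5. The 30-degree angle is an arbitrary example:

```python
from math import sin, sqrt, pi

# The trigonometric functions expect radians, so convert 30 degrees first
angle = 30*pi/180
print(sin(angle))     # very close to 0.5 (rounding error may show in the last digit)

# Passing 30 directly would mean 30 radians, a completely different angle
print(sin(30))

# sqrt(x) and x**0.5 agree for positive x
print(sqrt(2.0))
print(2.0**0.5)
```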
The math package also contains a number of less common functions, such
as the Gaussian error function and the gamma function, as well as two objects
that are not functions at all but constants, namely e and π, which are denoted
e and pi. This program, for instance, calculates the value of π²:

from math import pi
print(pi**2)

which prints 9.869604401089358 (which is roughly the right answer). Note that
there are no parentheses after the “pi” when we use it in the print statement,
because it is not a function. It’s just a variable called pi with value 3.14159 . . .
The functions in the math package do not work with complex numbers and
the computer will give an error message if you try. But there is another pack-
age called cmath that contains versions of most of the same functions that do
work with complex numbers, plus a few additional functions that are specific
to complex arithmetic.
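Here is a short sketch of the cmath versions at work. The square root of −1 and Euler's formula $e^{i\pi} = -1$ are chosen simply as familiar test cases:

```python
from cmath import sqrt, exp, pi

print(sqrt(-1))          # 1j: the square root of -1
z = exp(1j*pi)           # Euler's formula: e^{i pi} = -1
print(z)                 # very close to (-1+0j), up to rounding error
```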
In some cases you may find you want to use more than one function from
the same package in a program. You can import two different functions—say
the log and exponential functions—with two from statements, like this:

from math import log
from math import exp

but a more succinct way to do it is to use a single line like this:

from math import log,exp

You can import a list as long as you like from a single package in this way:

from math import log,exp,sin,cos,sqrt,pi,e

You can also import all of the functions in a package with a statement of the
form

from math import *

The * here means “everything”.14 In most cases, however, I advise against
using this import-everything form because it can give rise to some unexpected
behaviors (for instance, if, unbeknownst to you, a package contains a function
with the same name as one of your variables, causing a clash between the two).
It’s usually better to explicitly import only those functions you actually need
to use.15
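To see the kind of clash described in footnote 15, the sketch below imports sqrt twice by name rather than with *, which shows the same effect more explicitly: the later import silently replaces the earlier one.

```python
from cmath import sqrt    # version that handles complex numbers
print(sqrt(-4))           # 2j

from math import sqrt     # same name: this replaces the cmath version
try:
    print(sqrt(-4))
except ValueError as err:
    print("ValueError:", err)   # the math version rejects negative arguments
```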
Finally, some large packages are for convenience split into smaller sub-
packages, called modules. A module within a larger package is referred to as
packagename.modulename. As we will see shortly, for example, there are a
large number of useful mathematical facilities available in the package called
numpy, including facilities for linear algebra and Fourier transforms, each in
their own module within the larger package. Thus the linear algebra module
is called numpy.linalg and the Fourier transform module is called numpy.fft
(for “fast Fourier transform”). We can import a function from a module thus:

from numpy.linalg import inv

This would import the inv function, which calculates the inverse of a matrix.
Smaller packages, like the math package, have no submodules, in which
case one could, arguably, say that the entire package is also a module, and in
such cases the words package and module are often used interchangeably.

14
There is also another way to import the entire contents of a package in Python, with a state-
ment of the form “import math”. If you use this form, however, then when you subsequently
use one of the imported functions you have to write, for example, x = math.log(2.5), instead of
just x = log(2.5). Since the former is more complicated and annoying, it gets used rather rarely.
Moreover the existence of the two types of import, and particularly their simultaneous use in the
same program, can be quite confusing, so we will use only the “from” form in this book.
15
A particular problem is when an imported package contains a function with the same name
as a previously existing function. In such a case the newly imported one will be selected in favor of
the previous one, which may not always be what you want. For instance, the packages math and
cmath contain many functions with the same names, such as sqrt. But the sqrt function in cmath
works with complex numbers and the one in math does not. If one did “from cmath import *”
followed by “from math import *”, one would end up with the version of sqrt that works only
with real numbers. And if one then attempted to calculate the square root of a complex number,
one would get an error message.

EXAMPLE 2.2: CONVERTING POLAR COORDINATES

Suppose the position of a point in two-dimensional space is given to us in polar
coordinates r, θ and we want to convert it to Cartesian coordinates x, y. How
would we write a program to do this? The appropriate steps are:
1. Get the user to enter the values of r and θ.
2. Convert those values to Cartesian coordinates using the standard formu-
las:
x = r cos θ, y = r sin θ. (2.1)

3. Print out the results.


Since the formulas (2.1) involve the mathematical functions sin and cos we are
going to have to import those functions from the math package at the start of
the program. Also, the sine and cosine functions in Python (and in most other
computer languages) take arguments in radians. If we want to be able to enter
the angle θ in degrees then we are going to have to convert from degrees to
radians, which means multiplying by π and dividing by 180.
Thus our program might look something like this:

File: polar.py
from math import sin,cos,pi

r = float(input("Enter r: "))
d = float(input("Enter theta in degrees: "))

theta = d*pi/180
x = r*cos(theta)
y = r*sin(theta)

print("x =",x," y =",y)

Take a moment to read through this complete program and make sure you un-
derstand what each line is doing. If we run the program, it will do something
like the following:

Enter r: 2
Enter theta in degrees: 60
x = 1.0 y = 1.73205080757

(Try it for yourself if you like.)

2.2.6 BUILT-IN FUNCTIONS

There are a small number of functions in Python, called built-in functions, which
don’t come from any package. These functions are always available to you in
every program; you do not have to import them. We have in fact seen several
examples of built-in functions already. For instance, we saw the float func-
tion, which takes a number and converts it to floating point (if it’s not floating
point already):

x = float(1)

There are similar functions int and complex that convert to integers and com-
plex numbers. Another example of a built-in function, one we haven’t seen
previously, is the abs function, which returns the absolute value of a number,
or the modulus in the case of a complex number. Thus, abs(-2) returns the
integer value 2 and abs(3+4j) returns the floating-point value 5.0.
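A few of these built-in functions side by side, with the values they produce noted in the comments. The argument 2.7 passed to int is an arbitrary example:

```python
x = float(1)        # 1.0
n = int(2.7)        # 2: int() drops the part after the decimal point
z = complex(2)      # (2+0j)
print(x, n, z)      # 1.0 2 (2+0j)

print(abs(-2))      # 2, an integer
print(abs(3+4j))    # 5.0, the modulus of 3+4i
```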
Earlier we also used the built-in functions input and print, which are not
mathematical functions in the usual sense of taking a number as argument and
performing a calculation on it, but as far as the computer is concerned they are
still functions. Consider, for instance, the statement

x = input("Enter the value of x: ")

Here the input function takes as argument the string “Enter the value of
x: ”, prints it out, waits for the user to type something in response, then sets x
equal to that something.
The print function is slightly different. When we say

print(x)

print is a function, but it is not here generating a value the way the log or input
functions do. It does something with its argument x, namely printing it out
on the screen, but it does not generate a value. This differs from the functions
we are used to in mathematics, but it’s allowed in Python. Sometimes you just
want a function to do something but it doesn’t need to generate a value.
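One way to see that print does not generate a value is to try assigning its result to a variable. In Python a function that produces no value actually hands back the special object None, as this small sketch shows:

```python
value = print("Hello")   # prints Hello on the screen
print(value)             # None: print produced no value of its own
```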

Exercise 2.3: Write a program to perform the inverse operation to that of Example 2.2.
That is, ask the user for the Cartesian coordinates x, y of a point in two-dimensional
space, and calculate and print the corresponding polar coordinates, with the angle θ
given in degrees.

Exercise 2.4: A spaceship travels from Earth in a straight line at relativistic speed v to
another planet x light years away. Write a program to ask the user for the value of x
and the speed v as a fraction of the speed of light c, then print out the time in years that
the spaceship takes to reach its destination (a) in the rest frame of an observer on Earth
and (b) as perceived by a passenger on board the ship. Use your program to calculate
the answers for a planet 10 light years away with v = 0.99c.

Exercise 2.5: A well-known quantum mechanics problem involves a particle of mass m
that encounters a one-dimensional potential step, like this:

[Figure: a particle of energy E, incoming from the left, meets a potential step of height V at x = 0; R and T mark reflection and transmission.]

The particle with initial kinetic energy E and wavevector $k_1 = \sqrt{2mE}/\hbar$ enters from the
left and encounters a sudden jump in potential energy of height V at position x = 0.
By solving the Schrödinger equation, one can show that when E > V the particle may
either (a) pass the step, in which case it has a lower kinetic energy of E − V on the other
side and a correspondingly smaller wavevector of $k_2 = \sqrt{2m(E - V)}/\hbar$, or (b) it may
be reflected, keeping all of its kinetic energy and an unchanged wavevector but moving
in the opposite direction. The probabilities T and R for transmission and reflection are
given by

$$T = \frac{4k_1 k_2}{(k_1 + k_2)^2}, \qquad R = \left(\frac{k_1 - k_2}{k_1 + k_2}\right)^2.$$

Suppose we have a particle with mass equal to the electron mass $m = 9.11 \times 10^{-31}$ kg
and energy 10 eV encountering a potential step of height 9 eV. Write a Python
program to compute and print out the transmission and reflection probabilities using
the formulas above.

2.2.7 COMMENT STATEMENTS

This is a good time to mention another important feature of Python (and every
other computer language), namely comments. In Python any program line that
starts with a hash mark “#” is ignored completely by the computer. You can
type anything you like on the line following a hash mark and it will have no
effect:

# Hello! Hi there! This line does nothing at all.

Such lines are called comments. Comments make no difference whatsoever
to the way a program runs, but they can be very useful nonetheless. You can
use comment lines to leave reminders for yourself in your programs, saying
what particular parts of the program do, what quantities are represented by
which variables, changes that you mean to make later to the program, things
you’re not sure about, and so forth. Here, for instance, is a version of the
polar coordinates program from Example 2.2, with comments added to explain
what’s happening:

File: polar.py

from math import sin,cos,pi

# Ask the user for the values of the radius and angle
r = float(input("Enter r: "))
d = float(input("Enter theta in degrees: "))

# Convert the angle to radians
theta = d*pi/180

# Calculate the equivalent Cartesian coordinates
x = r*cos(theta)
y = r*sin(theta)

# Print out the results
print("x =",x," y =",y)

This version of the program will perform identically to the original version on
page 34, but it is easier to understand how it works.
Comments may seem unnecessary for short programs like this one, but
when you get on to creating larger programs that perform complex physics cal-
culations you will find them very useful for reminding yourself of how things
work. When you’re writing a program you may think you remember how ev-
erything works and there is no need to add comments, but when you return to
the same program again a week later after spending the intervening time on
something else you’ll find it’s a different story—you can’t remember how any-
thing works or why you did things this way or that, and you will be very glad
if you scattered a few helpful pointers in comment lines around the program.
Comments become even more important if someone else other than you
needs to understand a program you have written, for instance if you’re work-
ing as part of a team that is developing a large program together. Understand-
ing how other people’s programs work can be tough at the best of times, and
you will make your collaborators’ lives a lot easier if you include some
explanatory comments as you go along.
Comments don’t have to start at the beginning of a line. Python ignores
any portion of a line that follows a hash mark, whether the hash mark is at the
beginning or not. Thus you can write things like this:

theta = d*pi/180 # Convert the angle to radians

and the computer will perform the calculation θ = dπ/180 at the beginning of
the line but completely ignore the hash mark and the text at the end. This is a
useful trick when you intend that a comment should refer to a specific single
line of code only.

2.3 CONTROLLING PROGRAMS WITH “IF” AND “WHILE”

The programs we have seen so far are all very linear. They march from one
statement to the next, from beginning to end of the program, then they stop.
An important feature of computers is their ability to break this linear flow, to
jump around the program, execute some lines but not others, or make deci-
sions about what to do next based on given criteria. In this section we will see
how this is done in the Python language.

2.3.1 THE IF STATEMENT

It will happen often in our computer programs that we want to do something
only if a certain condition is met: only if n = 0 perhaps, or x > ½. We can do
this using an if statement. Consider the following example:

x = int(input("Enter a whole number no greater than ten: "))
if x>10:
    print("You entered a number greater than ten.")
    print("Let me fix that for you.")
    x = 10
print("Your number is",x)

If I run this program and type in “5”, I get:

Enter a whole number no greater than ten: 5
Your number is 5

But if I break the rules and enter 11, I get:

Enter a whole number no greater than ten: 11
You entered a number greater than ten.
Let me fix that for you.
Your number is 10

This behavior is achieved using an if statement, the second line in the
program above, which tests the value of the variable x to see if it is greater than
ten. Note the structure of the if statement: there is the “if” part itself, which
consists of the word “if” followed by the condition you are applying. In this
case the condition is that x > 10. The condition is followed by a colon, and
following that are one or more lines that tell the computer what to do if the
condition is satisfied. In our program there are three of these lines, the first
two printing out a message and the third fixing the value of x. Note that these
three lines are indented—they start with a few spaces so that the text is shifted
over a bit from the left-hand edge. This is how we tell the program which
instructions are “part of the if.” The indented instructions will be executed
only if the condition in the if statement is met, i.e., only if x > 10 in this case.
Whether or not the condition is met, the computer then moves on to the next
line of the program, which prints the value of x.
(In Section 1 we saw that you are free to add spaces between the parts of
a Python statement to make it more readable, as in “x = 1”, and that such
spaces will have no effect on the operation of the program. Here we see an
exception to that rule: spaces at the beginning of lines do have an effect with
an if statement. For this reason one should be careful about putting spaces at
the beginning of lines—they should be added only when they are needed, as
here, and not otherwise.)
A question that people sometimes ask is, “How many spaces should I put at
the start of a line when I am indenting it?” The answer is that you can use any
number you like. Python considers any number of spaces, from one upward,
to constitute an indentation, provided that all the lines of the same block are
indented by the same amount. However, it has over the years become standard
practice among Python programmers to use four spaces for an indentation,
and this is the number used in all the programs in this book. In fact, most
Python development environments, including IDLE, automatically insert the
spaces for you when they see an if statement, and they typically insert four.
There are various different types of conditions one can use in an if state-
ment. Here are some examples:

if x==1:     Check if x = 1. Note the double equals sign.
if x>1:      Check if x > 1
if x>=1:     Check if x ≥ 1
if x<1:      Check if x < 1
if x<=1:     Check if x ≤ 1
if x!=1:     Check if x ≠ 1
Note particularly the double equals sign in the first example. It is one of the
most common programming errors that people make in Python to use a single
equals sign in an if statement instead of a double one. If you do this, you’ll get
an error message when you try to run your program.
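If you want to see the error without writing a full program, one way is the built-in function compile, which checks whether a piece of text is valid Python without running it. The sketch below is an illustration added here; it uses try and except, which catch an error instead of letting it stop the program, and the variable names good, bad, and outcome are made up for the example.

```python
# Program text stored as strings: one uses ==, the other a single =
good = "x = 1\nif x==1:\n    print('x is one')\n"
bad = "x = 1\nif x=1:\n    print('x is one')\n"

compile(good, "<example>", "exec")   # checks the code without running it; no error

try:
    compile(bad, "<example>", "exec")
    outcome = "accepted"
except SyntaxError:
    outcome = "rejected with a SyntaxError"
print("The single-equals version was", outcome)
```

The double-equals version passes the check silently, while the single-equals version is rejected before any of it runs.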
You can also combine two conditions in a single if statement, like this:

if x>10 or x<1:
    print("Your number is either too big or too small.")

You can use “and” in a similar way:

if x<=10 and x>=1:
    print("Your number is just right.")

You can combine more than two criteria on a line as well—as many as you like.
Two useful further elaborations of the if statement are else and elif:

if x>10:
    print("Your number is greater than ten.")
else:
    print("Your number is fine. Nothing to see here.")

This prints different messages depending on whether x is greater than 10 or
not. Note that the else line, like the original if, is not indented and has a
colon at the end. It is followed by one or more indented lines, the indentation
indicating that the lines are “inside” the else clause.
An even more elaborate example is the following:

if x>10:
    print("Your number is greater than ten.")
elif x>9:
    print("Your number is OK, but you're cutting it close.")
else:
    print("Your number is fine. Move along.")

The statement elif means “else if”—if the first criterion is not met it tells the
computer to try a different one. Notice that we can use both elif and else,
as here—if neither of the conditions specified in the if and elif clauses is
satisfied then the computer moves on to the else clause. You can also have
more than one elif, indeed you can have as many as you like, each one testing
a different condition if the previous one was not satisfied.
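To see the order in which the clauses are tried, here is the example above with a fixed value of x in place of user input and with the message stored in a variable (a small variation added for illustration). With x = 10 the if condition fails, the elif condition succeeds, and the else clause is skipped. A value such as 11 would also satisfy x > 9, but it never gets that far, because the if clause catches it first.

```python
x = 10   # try other values too, such as 11 or 5

# Only the first clause whose condition is met gets executed
if x>10:
    message = "Your number is greater than ten."
elif x>9:
    message = "Your number is OK, but you're cutting it close."
else:
    message = "Your number is fine. Move along."
print(message)
```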

2.3.2 THE WHILE STATEMENT

A useful variation on the if statement is the while statement. It looks and be-
haves similarly to the if statement:

x = int(input("Enter a whole number no greater than ten: "))
while x>10:
    print("This is greater than ten. Please try again.")
    x = int(input("Enter a whole number no greater than ten: "))
print("Your number is",x)

As with the if statement, the while statement checks if the condition given is
met (in this case if x > 10). If it is, it executes the indented block of code
immediately following; if not, it skips the block. However (and this is the
important difference), if the condition is met and the block is executed, the
program then loops back from the end of the block to the beginning and checks
the condition again. If the condition is still true, then the indented lines will be
executed again. And it will go on looping around like this, repeatedly checking
the condition and executing the indented code, until the condition is finally
false. (And if it is never false, then the loop goes on forever.16) Thus, if I were
to run the snippet of code above, I would get something like this:

Enter a whole number no greater than ten: 11
This is greater than ten. Please try again.
Enter a whole number no greater than ten: 57
This is greater than ten. Please try again.
Enter a whole number no greater than ten: 100
This is greater than ten. Please try again.
Enter a whole number no greater than ten: 5
Your number is 5

16
If you accidentally create a program with a loop that goes on for ever then you’ll need to
know how to stop the program: just closing the window where the program is running does the
trick.

The computer keeps on going around the loop, asking for a number until it
gets what it wants. This construct—sometimes also called a while loop—is com-
monly used in this way to ensure that some condition is met in a program or to
keep on performing an operation until a certain point or situation is reached.
As with the if statement, we can specify two or more criteria in a single
while statement using “and” or “or”. The while statement can also be followed
by an else statement, which is executed once (and once only) if and when the
condition in the while statement fails. (This type of else statement is primarily
used in combination with the break statement described in the next section.)
There is no equivalent of elif for a while loop, but there are two other useful
statements that modify its behavior, break and continue.

2.3.3 BREAK AND CONTINUE

Two useful refinements of the while statement are the break and continue state-
ments. The break statement allows us to break out of a loop even if the condi-
tion in the while statement is not met. For instance,

while x>10:
    print("This is greater than ten. Please try again.")
    x = int(input("Enter a whole number no greater than ten: "))
    if x==111:
        break

This loop will continue looping until you enter a number not greater than 10,
except if you enter the number 111, in which case it will give up and proceed
with the rest of the program.
If the while loop is followed by an else statement, the else statement is not
executed after a break. This allows you to create a program that does different
things if the while loop finishes normally (and executes the else statement) or
via a break (in which case the else statement is skipped).
This example also illustrates another new concept: it contains an if state-
ment inside a while loop. This is allowed in Python and used often. In the
programming jargon we say the if statement is nested inside the while loop.
While loops nested inside if statements are also allowed, or ifs within ifs, or
whiles within whiles. And it doesn’t have to stop at just two levels. Any num-
ber of statements within statements within statements is allowed. When we
get onto some of the more complicated calculations in this book we will see
examples nested four or five levels deep. In the example above, note how the
break statement is doubly indented from the left margin: it is indented by an
extra four spaces, for a total of eight, to indicate that it is part of a
statement-within-a-statement.17
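The sketch below puts these pieces together: an if statement nested inside a while loop, a doubly indented break, and an else clause on the loop. It is an illustration added here rather than one of the book's programs. So that it runs without anyone at the keyboard, a list (see Section 2.4) called typed stands in for the numbers a user might enter; the names typed, i, and outcome are arbitrary.

```python
# Stand-in for keyboard input: the numbers a user might type, in order
typed = [57, 100, 111, 5]

i = 0
x = typed[i]
while x>10:
    i = i + 1
    x = typed[i]              # "read" the next number
    if x==111:
        outcome = "stopped by break"
        break                 # leave the loop at once; the else is skipped
else:
    outcome = "loop finished normally"   # runs only if no break happened
print(outcome, "with x =", x)
```

If you delete 111 from the list, the loop instead ends when it reaches 5, and the else clause reports that it finished normally.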
A variant on the idea of the break statement is the continue statement. Say-
ing continue anywhere in a loop will make the program skip the rest of the
indented code in the while loop, but instead of getting on with the rest of the
program, it then goes back to the beginning of the loop, checks the condition
in the while statement again, and goes around the loop again if the condition
is met. (The continue statement turns out to be used rather rarely in practice.
The break statement, on the other hand, gets used often and is definitely worth
knowing about.)
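Here is a small sketch of continue in action (added for illustration; the variable names are arbitrary). The loop runs through the numbers 1 to 9 and uses continue to skip the even ones, so only the odd ones are printed and counted.

```python
# Print and count the odd numbers from 1 to 9, skipping the even ones
n = 0
count = 0
while n<9:
    n = n + 1             # update n first, before any continue
    if n%2==0:
        continue          # skip the rest of this pass; go back to the test n<9
    print(n)
    count = count + 1
print("Found", count, "odd numbers")
```

Notice that n is increased before the continue statement. If the line n = n + 1 came at the end of the loop instead, then once n was even the continue would always skip the update, n would never change again, and the loop would go on forever.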

EXAMPLE 2.3: EVEN AND ODD NUMBERS

Suppose we want to write a program that takes as input a single integer and
prints out the word “even” if the number is even, and “odd” if the number is
odd. We can do this by making use of the fact that n modulo 2 is zero if (and
only if) n is even. Recalling that n modulo 2 is written as n%2 in Python, here’s
how the program would go:

n = int(input("Enter an integer: "))
if n%2==0:
    print("even")
else:
    print("odd")

Now suppose we want a program that asks for two integers, one even and
one odd—in either order—and keeps on asking until it gets what it wants. We
could do this by checking all of the various combinations of even and odd, but
a simpler approach is to notice that if we have one even and one odd number
then their sum is odd; otherwise it’s even. Thus our program might look like
this:

File: evenodd.py

print("Enter two integers, one even, one odd.")
m = int(input("Enter the first integer: "))
n = int(input("Enter the second integer: "))
while (m+n)%2==0:
    print("One must be even and the other odd.")
    m = int(input("Enter the first integer: "))
    n = int(input("Enter the second integer: "))
print("The numbers you chose are",m,"and",n)

Note how the while loop checks to see if m + n is even. If it is, then the numbers
you entered must be wrong (either both are even or both are odd), so the
program asks for another pair, and it will keep on doing this until it gets what
it wants.
As before, take a moment to look over this program and make sure you
understand what each line does and how the program works.

17
We will come across some examples in this book where we have a loop nested inside another
loop and then a break statement inside the inner loop. In that case the break statement breaks out
of the inner loop only, and not the outer one. (If this doesn’t make sense to you, don’t worry; it’ll
become clear later when we look at examples with more than one loop.)

EXAMPLE 2.4: THE FIBONACCI SEQUENCE

The Fibonacci numbers are the sequence of integers in which each is the sum
of the previous two, with the first two numbers being 1, 1. Thus the first few
members of the sequence are 1, 1, 2, 3, 5, 8, 13, 21. Suppose we want to write
a program to calculate the Fibonacci sequence up to 1000. This would be quite
a laborious task for a human, but it is straightforward for a computer. All we
need to do is keep a record of the last two numbers in the sequence, add them
together to calculate the next number, then keep on repeating for as long as the
numbers are less than 1000. Here’s a program to do it:

f1 = 1
f2 = 1
next = f1 + f2
while f1<=1000:
    print(f1)
    f1 = f2
    f2 = next
    next = f1 + f2

Observe how the program works. At all times the variables f1 and f2 store the
two most recent elements of the sequence and the variable next stores the next
element, calculated by summing f1 and f2. If f1 is less than 1000, we print it
out, update the values of f1 and f2, and calculate a new value for next. The
process continues until the value of f1 exceeds 1000, then the program stops.
This program would work fine—it gets the job done—but here’s a neater
way to solve the same problem using the “multiple assignment” feature of
Python discussed in Section 2.2.4:

File: fibonacci.py

f1,f2 = 1,1
while f1<=1000:
    print(f1)
    f1,f2 = f2,f1+f2

If we run this program, we get the following:

1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987

Indeed, the computer will happily print out the Fibonacci sequence up to a
billion or more in just a second or two. Try it if you like.
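If you would rather keep the numbers than just print them, one option is to collect them in a list using the append operation described in Section 2.4.1. The sketch below is a variant added for illustration; the names limit and fib are arbitrary.

```python
# Store the Fibonacci numbers up to a chosen limit in a list
limit = 1000              # try 1000000000 for the "billion" version
fib = []
f1,f2 = 1,1
while f1<=limit:
    fib.append(f1)
    f1,f2 = f2,f1+f2
print(fib)
print("There are", len(fib), "of them")
```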

Exercise 2.6: Catalan numbers
The Catalan numbers $C_n$ are a sequence of integers 1, 1, 2, 5, 14, 42, 132, . . . that play
an important role in quantum mechanics and the theory of disordered systems. (They
were central to Eugene Wigner’s proof of the so-called semicircle law.) They are given by

$$C_0 = 1, \qquad C_{n+1} = \frac{4n+2}{n+2}\,C_n.$$

Write a program that prints in increasing order all Catalan numbers less than or equal
to one billion.

2.4 LISTS AND ARRAYS

We have seen how to work with integer, real, and complex quantities in Python
and how to use variables to store those quantities. All the variables we have
seen so far, however, represent only a single value, a single integer, real, or
complex number. But in physics it is common for a variable to represent sev-
eral numbers at once. We might use a vector r, for instance, to represent the
position of a point in three-dimensional space, meaning that the single sym-
bol r actually corresponds to three real numbers ( x, y, z). Similarly, a matrix,
again usually denoted by just a single symbol, can represent an entire grid of
numbers, m × n of them, where m and n could be as large as we like. There
are also many cases where we have a set of numbers that we would like to
treat as a single entity even if they do not form a vector or matrix. We might,
for instance, do an experiment in the lab and make a hundred measurements
of some quantity. Rather than give a different name to each one—a, b, c, and
so forth—it makes sense to denote them as, say, $a_1, a_2, a_3, \ldots$ and then to consider
them collectively as a set $A = \{a_i\}$, a single entity made up of a hundred numbers.
Situations like these are so common that Python provides standard fea-
tures, called containers, for storing collections of numbers. There are several
kinds of containers. In this section we look at two of them, lists and arrays.18

2.4.1 LISTS

The most basic type of container in Python is the list. A list, as the name sug-
gests, is a list of quantities, one after another. In all the examples in this book
the quantities will be numbers of some kind—integers, floats, and so forth—
although any type of quantity that Python knows about is allowed in a list,
such as strings for example.19
The quantities in a list, which are called its elements, do not have to be all
of the same type. You can have an integer, followed by a float, followed by a
complex number if you want. In most of the cases we’ll deal with, however, the
elements will be all of the same type—all integers, say, or all floats—because
this is what physics calculations usually demand. Thus, for instance, in the
example described above where we make a hundred measurements of a given
quantity in the lab and we want to represent them on the computer, we could
use a list one hundred elements long, and all the elements would presumably
be of the same type (probably floats) because they all represent measurements
of the same thing.

18
There are several others as well, the main ones being tuples, dicts, and sets. These, however,
find only occasional use in physics calculations, and we will not use them in this book.
19
If you have programmed in another computer language, then you may be familiar with “arrays,”
which are similar to lists but not exactly the same. Python has both lists and arrays and both
have their uses in physics calculations. We study arrays in Section 2.4.2.

A list in Python is written like this: [ 3, 0, 0, -7, 24 ]. The elements
of this particular list are all integers. Note that the elements are separated by
commas and the whole list is surrounded by square brackets. Another example
of a list might be [ 1, 2.5, 3+4.6j ]. This example has three elements of
different types, one integer, one real, and one complex.
A variable can be set equal to a list:

r = [ 1, 1, 2, 3, 5, 8, 13, 21 ]

Previously in this chapter all variables have represented just single numbers,
but here we see that a variable can also represent a list of numbers. You can
print a list variable, just as you can any other variable, and the computer will
print out the entire list. If we run this program:

r = [ 1, 1, 2, 3, 5, 8, 13, 21 ]
print(r)

we get this:

[1, 1, 2, 3, 5, 8, 13, 21]

The quantities that make up the elements of a list can be specified using other
variables, like this:

x = 1.0
y = 1.5
z = -2.2

r = [ x, y, z ]

This will create a three-element list with the value [ 1.0, 1.5, -2.2 ]. It is
important to bear in mind, in this case, what happens when Python encoun-
ters an assignment statement like “r = [ x, y, z ]”. Remember that in such
situations Python first evaluates the expression on the right-hand side, which
gives [ 1.0, 1.5, -2.2 ] in this case, then assigns that value to the variable
on the left. Thus the end result is that r is equal to [ 1.0, 1.5, -2.2 ]. It
is a common error to think of r as being equal to [ x, y, z ] so that if, say,
the value of x is changed later in the program the value of r will change as
well. But this is incorrect. The value of r will get set to [ 1.0, 1.5, -2.2 ]
and will not change later if x is changed. If you want to change the value of
r you have to explicitly assign a new value to it, with another statement like
“r = [ x, y, z ]”.
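A quick experiment confirms this. In the sketch below (added for illustration), x is changed after r has been created, and r is unaffected until it is explicitly reassigned.

```python
x = 1.0
y = 1.5
z = -2.2
r = [ x, y, z ]   # r takes the values x, y, z have right now
x = 100.0         # changing x afterwards...
print(r)          # ...does not affect r: prints [1.0, 1.5, -2.2]
r = [ x, y, z ]   # an explicit new assignment picks up the new x
print(r)          # prints [100.0, 1.5, -2.2]
```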
The elements of lists can also be calculated from entire mathematical ex-
pressions, like this:

r = [ 2*x, x+y, z/sqrt(x**2+y**2) ]

The computer will evaluate all the expressions on the right-hand side then
create a list from the values it calculated.
Once we have created a list we probably want to do some calculations with
the elements it contains. The individual elements in a list r are denoted r[0],
r[1], r[2], and so forth. That is they are numbered in order, from beginning to
end of the list, the numbers go in square brackets after the variable name, and
crucially the numbers start from zero, not one. This may seem a little odd—it’s
not usually the way we do things in physics or in everyday life—and it takes a
little getting used to. However, it turns out, as we’ll see, to be more convenient
in a lot of situations than starting from one.
The individual elements, such as r[0], behave like single variables and you
can use them in the same way you would use a single variable. Thus, here is
a short program that calculates and prints out the length of a vector in three
dimensions:

from math import sqrt


r = [ 1.0, 1.5, -2.2 ]
length = sqrt( r[0]**2 + r[1]**2 + r[2]**2 )
print(length)

The first line imports the square root function from the math package, which
we need for the calculation. The second line creates the vector, in the form of
a three-element list. The third line is the one that does the actual calculation.
It takes each of the three elements of the vector, which are denoted r[0], r[1],
and r[2], squares them, and adds them together. Then it takes the square root
of the result, which by Pythagoras’ theorem gives us the length of the vector.
The final line prints out the length. If we run this program it prints

2.84429253067

which is the correct answer (to twelve significant figures).


We can change the values of individual elements of a list at any time, like
this:

r = [ 1.0, 1.5, -2.2 ]
r[1] = 3.5
print(r)

The first line will create a list with three elements. The second then changes
the value of element 1, which is the middle of the three elements, since they
are numbered starting from zero. So if we run the program it prints out this:

[1.0, 3.5, -2.2]

A powerful and useful feature of Python is its ability to perform operations
on entire lists at once. For instance, it commonly happens that we want to
know the sum of the values in a list. Python contains a built-in function called
sum that can calculate such sums in a single line, thus:

r = [ 1.0, 1.5, -2.2 ]
total = sum(r)
print(total)

The first line here creates a three-element list and the second calculates the sum
of its elements. The final line prints out the result, and if we run the program
we get this:

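0.3

(Depending on your version of Python, this may instead be displayed as
0.2999999999999998. The tiny discrepancy is a rounding error: the number 2.2
cannot be represented exactly in the binary arithmetic the computer uses.)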

Some other useful built-in functions are max and min, which give the largest
and smallest values in a list respectively, and len, which calculates the number
of elements in a list. Applied to the list r above, for instance, max(r) would
give 1.5 and min(r) would give −2.2, while len(r) would give 3. Thus, for
example, one can calculate the mean of the values in a list like this:

r = [ 1.0, 1.5, -2.2 ]
mean = sum(r)/len(r)
print(mean)

The second line here sums the elements in the list and then divides by the
number of elements to give the mean value. In this case, the calculation would
give a mean of 0.1.
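The behavior of max, min, and len described above is easy to check directly. The snippet below (an illustration added here) applies each of them to the same list and then computes the mean. The exact digits printed for the mean can depend on rounding in the computer's arithmetic, so the comment only promises a value close to 0.1.

```python
r = [ 1.0, 1.5, -2.2 ]
print(max(r))    # prints 1.5
print(min(r))    # prints -2.2
print(len(r))    # prints 3
mean = sum(r)/len(r)
print(mean)      # close to 0.1; the last digits may show a tiny rounding error
```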
A special, and especially useful, function for lists is the function map, which
is a kind of meta-function: it allows you to apply ordinary functions, like log
or sqrt, to all the elements of a list at once. Thus map(log,r) takes the natural
logarithm of each element of a list r in turn. More precisely, map creates
a specialized object in the computer memory, called an iterator, that contains
the logs, one after another in order.20 Normally we will want to convert this
iterator into a new list, which we can do with the built-in function list. Thus,
consider this snippet of code:

from math import log

r = [ 1.0, 1.5, 2.2 ]
logr = list(map(log,r))
print(logr)

This will create a list logr containing the logs of the three numbers 1, 1.5, and
2.2, and print it out thus:

[0.0, 0.4054651081, 0.7884573603]
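Because an iterator calculates its values when they are asked for rather than storing them (see footnote 20), it can be used only once: converting the same map object into a list a second time gives an empty list. The snippet below illustrates this, using square roots in place of logs so that the results are exact. It is an illustration added here, and the names roots, first, and second are arbitrary.

```python
from math import sqrt

r = [ 1.0, 4.0, 9.0 ]
roots = map(sqrt,r)      # an iterator; no square roots calculated yet
first = list(roots)      # works through the iterator: [1.0, 2.0, 3.0]
second = list(roots)     # the iterator is now used up, so this is []
print(first)
print(second)
```

If you need the values more than once, convert to a list straight away, as in logr = list(map(log,r)) above, and work with the list.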

Another feature of lists in Python, one that we will use often, is the ability
to add elements to an already existing list. Suppose we have a list called r and
we want to add a new element to the end of the list with value, say, 6.1. We
can do this with the statement

r.append(6.1)

This slightly odd-looking statement is a little different in form from the ones
we’ve seen previously.21 It consists of the name of our list, which is r, followed
by a dot (i.e., a period), then “append(6.1)”. Its effect is to add a new element
to the end of the list and set that element equal to the value given, which is 6.1
in this case. The value can also be specified using a variable or a mathematical
expression, thus:

r = [ 1.0, 1.5, -2.2 ]
x = 0.8
r.append(2*x+1)
print(r)

If we run this program we get

[1.0, 1.5, -2.2, 2.6]

Note how the computer has calculated the value of 2*x+1 to be 2.6, then added
that value to the end of the list.

20
The difference between an iterator and a list is that the values in an iterator are not stored
in the computer’s memory the way the values in a list are. Instead, the computer calculates them
on the fly when they are needed, which saves memory. Thus, in this case, the computer only
calculates the logs of the elements of r when you convert the iterator to a list. In versions of
Python prior to version 3, the map function produced a list, not an iterator, so if you are using an
earlier version of the language you do not need to convert to a list using the list function. You
can just say “logr = map(log,r)”. For further discussion of this point, and of iterators in general,
see Appendix B.
21
This is an example of Python’s object-oriented programming features. The function append
is technically a “method” that belongs to the list “object” r. The function doesn’t exist as an entity
in its own right, only as a subpart of the list object. We will not dig into Python’s object-oriented
features in this book, since they are of relatively little use for the type of physics programming we
will be doing. For software developers engaged on large-scale commercial or group programming
projects, however, they can be invaluable.

A particularly useful trick that we will employ frequently in this book is
the following. We create an empty list, a list with no elements in it at all, then
add elements to it one by one as we learn of or calculate their values. A list
created in this way can grow as large as we like (within limitations set by the
amount of memory the computer has to store the list).
To create an empty list we say

r = []

This creates a list called r with no elements. Even though it has no elements
in it, the list still exists. It’s like an empty set in mathematics—it exists as an
object, but it doesn’t contain anything (yet). Now we can add elements thus:

r.append(1.0)
r.append(1.5)
r.append(-2.2)
print(r)

which produces

[1.0, 1.5, -2.2]

We will use this technique, for instance, to make graphs in Section 3.1. Note
that you must create the empty list first before adding elements. You cannot
add elements to a list until it has been created: the computer will give an
error message if you try.
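Here is the same technique used inside a while loop, a pattern that comes up again and again. The sketch (an illustration added here; the name squares is arbitrary) starts from an empty list and appends the squares of the numbers 1 to 5 one at a time.

```python
# Build a list from scratch: start empty, then append one value at a time
squares = []              # must exist before we can append to it
n = 1
while n<=5:
    squares.append(n**2)
    n = n + 1
print(squares)            # prints [1, 4, 9, 16, 25]
```

Leaving out the line squares = [] produces a NameError, since there is then no list called squares to append to.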
We can also remove a value from the end of a list by saying r.pop():

r = [ 1.0, 1.5, -2.2, 2.6 ]
r.pop()
print(r)

which gives

[1.0, 1.5, -2.2]

And we can remove a value from anywhere in a list by saying r.pop(n), where
n is the number of the element you want to remove.22 Bear in mind that the
elements are numbered from zero, so if you want to remove the first item from
a list you would say r.pop(0).
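The pop statement also hands back the value it removes, which is often handy. The sketch below (added for illustration; the lists and names are arbitrary) shows this, along with the fast, order-scrambling way of deleting an element from the middle of a list that is mentioned in footnote 22.

```python
r = [ 1.0, 1.5, -2.2, 2.6 ]
last = r.pop()         # removes and returns the final element, 2.6
first = r.pop(0)       # removes and returns element 0, which is 1.0
print(last, first, r)  # prints 2.6 1.0 [1.5, -2.2]

# Fast removal when the order of the elements does not matter:
# copy the last element over the unwanted one, then pop the last element.
s = [ 10, 20, 30, 40, 50 ]
k = 1                  # position of the element to delete (the 20)
s[k] = s[len(s)-1]     # overwrite it with the last element, 50
s.pop()                # drop the now-duplicated last element
print(s)               # prints [10, 50, 30, 40]
```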

2.4.2 ARRAYS

As we have seen, a list in Python is an ordered set of values, such as a set of
integers or a set of floats. There is another object in Python that is somewhat
similar, an array. An array is also an ordered set of values, but there are some
important differences between lists and arrays:
1. The number of elements in an array is fixed. You cannot add elements to
an array once it is created, or remove them.
2. The elements of an array must all be of the same type, such as all floats or
all integers. You cannot mix elements of different types in the same array
and you cannot change the type of the elements once an array is created.
Lists, as we have seen, have neither of these restrictions and, on the face of
it, these seem like significant drawbacks of the array. Why would we ever
use an array if lists are more flexible? The answer is that arrays have several
significant advantages over lists as well:
3. Arrays can be two-dimensional, like matrices in algebra. That is, rather
than just a one-dimensional row of elements, we can have a grid of them.
Indeed, arrays can in principle have any number of dimensions, includ-
ing three or more, although we won’t use dimensions above two in this
book. Lists, by contrast, are always just one-dimensional.
4. Arrays behave roughly like vectors or matrices: you can do arithmetic
with them, such as adding them together, and you will get the result you
expect. This is not true with lists. If you try to do arithmetic with a list
you will either get an error message, or you will not get the result you
expect.
5. Arrays work faster than lists. Especially if you have a very large array
with many elements, calculations may be significantly faster using
an array.

[22] However, removing an element from the middle (or the beginning) of a list is a slow operation
because the computer then has to move all the elements above that down one place to fill the gap.
For a long list this can take a long time and slow down your program, so you should avoid doing it
if possible. (On the other hand, if it doesn’t matter to you what order the elements of a list appear
in, then you can effectively remove any element rapidly by first setting it equal to the last element
in the list, then removing the last element.)
In physics it often happens that we are working with a fixed number of ele-
ments all of the same type, as when we are working with matrices or vectors,
for instance. In that case, arrays are clearly the tool of choice: the fact that we
cannot add or remove elements is immaterial if we never need to do such a
thing, and the superior speed of arrays and their flexibility in other respects
can make a significant difference to our programs. We will use arrays exten-
sively in this book—far more than we will use lists.
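To make point 4 concrete, here is a small sketch comparing the same operations on a list and on an array holding the same values. For lists, + and * mean joining and repeating rather than arithmetic, and adding a plain number to a list is an error.

```python
from numpy import array

r = [1, 2, 3]          # a list
a = array(r, int)      # an array holding the same values

# With lists, + and * mean joining and repeating, not arithmetic:
print(r + r)           # [1, 2, 3, 1, 2, 3]
print(2*r)             # [1, 2, 3, 1, 2, 3]

# With arrays they act element by element, like vector arithmetic:
print(a + a)           # elements 2, 4, 6
print(2*a)             # elements 2, 4, 6

# Adding a plain number to a list is simply an error:
try:
    r + 1
    list_error = False
except TypeError:
    list_error = True
print("r + 1 gives an error:", list_error)
```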
Before you use an array you need to create it, meaning you need to tell the
computer how many elements it will have and of what type. Python provides
functions that allow you to do this in several different ways. These functions are
all found in the package numpy.[23]
In the simplest case, we can create a one-dimensional array with n ele-
ments, all of which are initially equal to zero, using the function zeros from
the numpy package. The function takes two arguments. The first is the number
of elements the array is to have and the second is the type of the elements,
such as int, float, or complex. For instance, to create a new array with four
floating-point elements we would do the following:

from numpy import zeros


a = zeros(4,float)
print(a)

In this example the new array is denoted a. When we run the program the
array is printed out as follows:

[ 0. 0. 0. 0.]

Note that arrays are printed out slightly differently from lists—there are no
commas between the elements, only spaces.
We could similarly create an array of ten integers with the statement “a =
zeros(10,int)” or an array of a hundred complex numbers with the statement
“a = zeros(100,complex)”. The size of the arrays you can create is limited
only by the computer memory available to hold them. Modern computers
can hold arrays with hundreds of millions or even billions of elements.

[23] The word numpy is short for “numerical Python.”
To create a two-dimensional floating-point array with m rows and n col-
umns, you say “zeros([m,n],float)”, so

a = zeros([3,4],float)
print(a)

produces

[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]

Note that the first argument of zeros in this case is itself a list (that’s why it
is enclosed in brackets [...]), whose elements give the size of the array along
each dimension. We could create a three-dimensional array by giving a three-
element list (and so on for higher dimensions).
There is also a similar function in numpy called ones that creates an array
with all elements equal to one. The form of the function is exactly the same as
for the function zeros. Only the values in the array are different.
On the other hand, if we are going to change the values in an array im-
mediately after we create it, then it doesn’t make sense to have the computer
set all of them to zero (or one)—setting them to zero takes some time, time
that is wasted if you don’t need the zeros. In that case you can use a different
function, empty, again from the package numpy, to create an empty array:

from numpy import empty


a = empty(4,float)

This creates an array of four “empty” floating-point elements. In practice the
elements aren’t actually empty. Instead they contain whatever random numbers
happened to be littered around the computer’s memory at the time the
array is created. The computer just leaves those values as they are and doesn’t
waste any time changing them. You can also create empty integer or complex
arrays by saying int or complex instead of float.
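The three creation functions can be compared side by side in a short sketch. The contents of the empty array are deliberately never printed or tested, since they are whatever happened to be in memory. Only its shape and type are fixed until we assign values to its elements.

```python
from numpy import zeros, ones, empty

a = zeros(4, float)        # four floats, all 0.0
b = ones([2, 3], int)      # 2 rows, 3 columns of the integer 1
c = empty(4, complex)      # four complex elements, values not yet set

print(a)
print(b)
print(a.dtype, b.dtype, c.dtype)   # the element type of each array
print(b.shape)                     # (2, 3)

# The values in c mean nothing until we assign them ourselves:
c[0] = 1 + 2j
```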
A different way to create an array is to take a list and convert it into an
array, which you can do with the function array from the package numpy. For
instance we can say:

r = [ 1.0, 1.5, -2.2 ]


a = array(r,float)


which will create an array of three floating-point elements, with values 1.0, 1.5,
and −2.2. If the elements of the list (or some of them) are not already floats,
they will be converted to floats. You can also create integer or complex arrays
in the same fashion, and the list elements will be converted to the appropriate
type if necessary.[24, 25]
The two lines above can conveniently be combined into one, like this:

a = array([1.0,1.5,-2.2],float)

This is a quick and neat way to create a new array with predetermined values
in its elements. We will use this trick frequently.
We can also create two-dimensional arrays with specified initial values. To
do this we again use the array function, but now the argument we give it must
be a list of lists, which gives the elements of the array row by row. For example,
we can write

a = array([[1,2,3],[4,5,6]],int)
print(a)

This creates a two-dimensional array of integers and prints it out:

[[ 1 2 3]
[ 4 5 6]]

The list of lists must have the same number of elements for each row of the
array (three in this case) or the computer will complain.
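A short sketch collecting the ways of converting between lists and arrays described here. The input values are illustrative. Note in particular that when floats go into an integer array, the fractional part is thrown away, so -2.2 becomes -2.

```python
from numpy import array

r = [1.0, 1.5, -2.2]
a = array(r, float)               # list -> floating-point array

i = array([1.7, -2.2, 3], int)    # floats in an int array lose their fractional part
print(i)                          # elements 1, -2, 3

m = array([[1, 2, 3],
           [4, 5, 6]], int)       # list of lists -> two rows, three columns
print(m.shape)                    # (2, 3)

back = list(a)                    # and back again from array to list
print(len(back))                  # 3
```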
We can refer to individual elements of an array in a manner similar to the
way we refer to the elements of a list. For a one-dimensional array a we write
a[0], a[1], and so forth. Note, as with lists, that the numbering of the elements
starts at zero, not one. We can also set individual elements equal to new values
thus:

a[2] = 4

[24] Two caveats apply here. (1) If you create an integer array from a list that has any floating-
point elements, the fractional part, if any (i.e., the part after the decimal point), of the floating-point
elements will be thrown away. (2) If you try to create a floating-point or integer array from a list
containing complex values you will get an error message. This is not allowed.

[25] Though it’s not something we will often need to do, you can also convert an array into a list
using the built-in function list, thus:

r = list(a)

Note that you do not specify a type for the list, because lists don’t have types. The types of the
elements in the list will just be the same as the types of the elements in the array.

Note, however, that, since the elements of an array are of a particular type
(which cannot be changed after the array is created), any value you specify
will be converted to that type. If you give an integer value for a floating-point
array element, it will be converted to floating-point. If you give a floating-
point value for an integer array, the fractional part of the value will be deleted.
(And if you try to assign a complex value to an integer or floating-point array
you will get an error message—this is not allowed.)
For two-dimensional arrays we use two indices, separated by commas, to
denote the individual elements, as in a[2,4], with counting again starting at
zero, for both indices. Thus, for example

from numpy import zeros


a = zeros([2,2],int)
a[0,1] = 1
a[1,0] = -1
print(a)

would produce the output

[[ 0 1]
[-1 0]]

(You should check this example and make sure you understand why it does
what it does.)
Note that when Python prints a two-dimensional array it observes the con-
vention of standard matrix arithmetic that the first index of a two-dimensional
array denotes the row of the array element and the second denotes the column.
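The type-conversion rules for assigning to elements, and the row-then-column indexing, can be checked with a quick sketch. The values are arbitrary. Note that for negative numbers, dropping the fractional part turns -1.5 into -1.

```python
from numpy import zeros

a = zeros(3, int)
a[0] = 2.9        # stored as 2: an int array keeps only the integer part
a[1] = -1.5       # stored as -1 (the fractional part is simply dropped)
print(a)

f = zeros(2, float)
f[0] = 4          # stored as the floating-point value 4.0
print(f)

m = zeros([2, 3], int)
m[1, 2] = 7       # first index = row, second index = column
print(m)
```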

2.4.3 READING AN ARRAY FROM A FILE

Another, somewhat different, way to create an array is to read a set of values
from a computer file, which we can do with the function loadtxt from the
package numpy. Suppose we have a text file that contains the following string
of numbers, on consecutive lines:

1.0
1.5
-2.2
2.6


and suppose that this file is called values.txt on the computer. Then we can
do the following:

from numpy import loadtxt


a = loadtxt("values.txt",float)
print(a)

When we run this program, we get the following printed on the screen:

[ 1.   1.5 -2.2  2.6]

As you can see, the computer has read the numbers in the file and put them
in a floating-point array of the appropriate length. (For this to work the file
values.txt has to be in the same folder or directory on the computer as the one
your Python program is saved in.[26])
We can use the same trick to read a two-dimensional grid of values and put
them in a two-dimensional array. If the file values.txt contained the follow-
ing:

1 2 3 4
3 4 5 6
5 6 7 8

then the exact same program above would create a two-dimensional 3 × 4 ar-
ray of floats with the appropriate values in it.
The loadtxt function is a very useful one for physics calculations. It hap-
pens often that we have a file or files containing numbers we need for a cal-
culation. They might be data from an experiment, for example, or numbers
calculated by another computer program. We can use the loadtxt function to transfer
those numbers into an array so that we can perform further calculations on
them.
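So that the sketch below runs anywhere, it first writes its own values.txt into a temporary folder and then reads it back with loadtxt, giving the full path name. The temporary folder and the file contents are illustrative assumptions. The same file is then overwritten with a 3 × 4 grid to show the two-dimensional case.

```python
import os
import tempfile
from numpy import loadtxt

# Write a small test file in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "values.txt")

with open(path, "w") as f:
    f.write("1.0\n1.5\n-2.2\n2.6\n")       # one number per line
a = loadtxt(path, float)                    # full path, so the folder doesn't matter
print(a.shape)                              # (4,)

with open(path, "w") as f:
    f.write("1 2 3 4\n3 4 5 6\n5 6 7 8\n")  # a grid of 3 rows and 4 columns
g = loadtxt(path, float)
print(g.shape)                              # (3, 4)
```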

2.4.4 ARITHMETIC WITH ARRAYS

As with lists, the individual elements of an array behave like ordinary vari-
ables, and we can do arithmetic with them in the usual way. We can write
things like

a[0] = a[1] + 1

or

x = a[2]**2 - 2*a[3]/y

[26] You can also give a full path name for the file, specifying explicitly the folder as well as the
file name, in which case the file can be in any folder.

But we can also do arithmetic with entire arrays at once, a powerful feature
that can be enormously useful in physics calculations. In general, when doing
arithmetic with entire arrays, the rule is that whatever arithmetic operation
you specify is done independently to each element of the array or arrays in-
volved. Consider this short program:

from numpy import array


a = array([1,2,3,4],int)
b = 2*a
print(b)

When we run this program it prints

[2 4 6 8]

As you can see, when we multiply the array a by 2 the computer simply mul-
tiplies each individual element by 2. A similar thing happens if you divide.
Notice that when we run this program, the computer creates a new array b
holding the results of our multiplication. This is another way to create arrays,
different from the methods we mentioned before. We do not have to create the
array b explicitly, using for instance the empty function. When we perform a
calculation with arrays, Python will automatically create a new array for us to
hold the results.
If you add or subtract two arrays, the computer will add or subtract each
element separately, so that

a = array([1,2,3,4],int)
b = array([2,4,6,8],int)
print(a+b)

results in

[3 6 9 12]

(For this to work, the arrays must have the same size. If they do not, the com-
puter will complain.)
All of these operations give the same result as the equivalent mathematical
operations on vectors in normal algebra, which makes arrays well suited to
representing vectors in physics calculations.[27] If we represent a vector using
an array then arithmetic operations such as multiplying or dividing by a scalar
quantity or adding or subtracting vectors can be written just as they would in
normal mathematics. You can also add a scalar quantity to an array (or subtract
one), which the computer interprets to mean it should add that quantity to
every element. So

a = array([1,2,3,4],int)
print(a+1)

results in

[2 3 4 5]
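The element-by-element rules above can all be exercised at once in a short sketch using the arrays from the text. It also tries adding two arrays of lengths 4 and 3, which numpy refuses with an error.

```python
from numpy import array

a = array([1, 2, 3, 4], int)
b = array([2, 4, 6, 8], int)

print(2*a)     # each element doubled: 2, 4, 6, 8
print(a + b)   # element by element: 3, 6, 9, 12
print(b - a)   # 1, 2, 3, 4
print(a + 1)   # the scalar is added to every element: 2, 3, 4, 5
print(b/a)     # division also acts element by element (giving floats)

# These two arrays, of lengths 4 and 3, cannot be added:
try:
    a + array([1, 2, 3], int)
    mismatch_error = False
except ValueError:
    mismatch_error = True
print("size mismatch gives an error:", mismatch_error)
```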

However, if we multiply two arrays together the outcome is perhaps not
exactly what you would expect: you do not get the vector (dot) product of the
two. If we do this:

a = array([1,2,3,4],int)
b = array([2,4,6,8],int)
print(a*b)

we get

[2 8 18 32]

What has the computer done here? It has multiplied the two arrays together
element by corresponding element. The first elements of the two arrays are
multiplied together, then the second elements, and so on. This is logical in a
sense—it is the exact equivalent of what happens when you add. Each ele-
ment of the first array is multiplied by the corresponding element of the sec-
ond. (Division works similarly.) Occasionally this may be what you want the
computer to do, but more often in physics calculations we want the true vector
dot product of our arrays. This can be calculated using the function dot from
the package numpy:

from numpy import array,dot


a = array([1,2,3,4],int)
b = array([2,4,6,8],int)
print(dot(a,b))

The function dot takes two arrays as arguments and calculates their dot product,
which would be 1×2 + 2×4 + 3×6 + 4×8 = 60 in this case.

[27] The same operations, by contrast, do not work with lists, so lists are less good for storing
vector values.
All of the operations above also work with two-dimensional arrays, which
makes such arrays convenient for storing matrices. Multiplying and dividing
by scalars as well as addition and subtraction of two-dimensional arrays all
work as in standard matrix algebra. Multiplication will multiply element by
element, which is usually not what you want, but the dot function calculates
the standard matrix product. Consider, for example, this matrix calculation:
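$$\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix} + 2\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} -3 & 5 \\ 0 & 2 \end{pmatrix}$$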

In Python we would do this as follows:

a = array([[1,3],[2,4]],int)
b = array([[4,-2],[-3,1]],int)
c = array([[1,2],[2,1]],int)
print(dot(a,b)+2*c)

You can also multiply matrices and vectors together. If v is a one-dimensional
array then dot(a,v) treats it as a column vector and multiplies it on the left
by the matrix a, while dot(v,a) treats it as a row vector and multiplies on the
right by a. Python is intelligent enough to know the difference between row
and column vectors, and between left- and right-multiplication.
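The sketch below puts the multiplication rules side by side. It contrasts * with dot for vectors, reproduces the matrix calculation from the text, and multiplies a matrix by a vector on each side. The vector v = (1, 1) is an arbitrary choice that makes the results easy to check by hand.

```python
from numpy import array, dot

# Vectors: * multiplies element by element, dot gives the dot product.
a = array([1, 2, 3, 4], int)
b = array([2, 4, 6, 8], int)
print(a*b)          # 2, 8, 18, 32
print(dot(a, b))    # 2 + 8 + 18 + 32 = 60

# The matrix calculation from the text:
A = array([[1, 3], [2, 4]], int)
B = array([[4, -2], [-3, 1]], int)
C = array([[1, 2], [2, 1]], int)
M = dot(A, B) + 2*C
print(M)            # rows (-3, 5) and (0, 2)

# Matrix times vector, with the vector on the right or on the left:
v = array([1, 1], int)
print(dot(A, v))    # A times column vector v: (1+3, 2+4) = (4, 6)
print(dot(v, A))    # row vector v times A: (1+2, 3+4) = (3, 7)
```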
Functions can be applied to arrays in much the same way as to lists. The
built-in functions sum, max, min and len described in Section 2.4.1 can be ap-
plied to one-dimensional arrays to calculate sums of elements, maximum and
minimum values, and the number of elements. The map function also works,
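applying any ordinary function to all elements of a one-dimensional array and
producing an iterator (see Section 2.4.1). An iterator cannot be turned into an
array directly by the array function, so we first convert it into a list with the
built-in function list, using a statement like

b = array(list(map(sqrt,a)),float)

This will create an iterator whose elements are the square roots of the elements
of a, convert the iterator into a list, and then convert the list into an array b
containing those values. (Here sqrt is assumed to have been imported already,
for instance from the math package.)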
Applying functions to arrays with two or more dimensions produces more
erratic results. For instance, the len function applied to a two-dimensional
array returns the number of rows in the array and the functions max and min
produce only error messages. However, the numpy package contains functions
that perform similar duties and work more predictably with two-dimensional
arrays, such as functions min and max that find minimum and maximum values.
In place of the len function, there are two different features, called size
and shape. Consider this example:

a = array([[1,2,3],[4,5,6]],int)
print(a.size)
print(a.shape)

which produces

6
(2, 3)

That is, a.size tells you the total number of elements in all rows and columns
of the array a (which is roughly the equivalent of the len function for lists and
one-dimensional arrays), and a.shape returns a list giving the dimensions of
the array along each axis. (Technically it is a “tuple” not a list, but for our
purposes it is roughly the same thing. You can say n = a.shape, and then
n[0] is the number of rows of a and n[1] is the number of columns.) For
one-dimensional arrays there is not really any difference between size and
shape: both give the total number of elements, although shape gives it in the
form of a one-element tuple, such as (4,).
There are a number of other functions in the numpy package that are useful
for performing calculations with arrays. The full list can be found in the on-line
documentation at www.scipy.org.
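A sketch pulling these tools together. It applies map to an array (converting the iterator to a list first, as Python 3 requires) and looks at len, size and shape for a two-dimensional array. It also compares numpy's min and max, called here as numpy.min and numpy.max, with the built-in max, which fails on a two-dimensional array. The array values are arbitrary.

```python
from math import sqrt
import numpy
from numpy import array

a = array([1.0, 4.0, 9.0], float)
# map gives an iterator, so we turn it into a list before handing it to array.
b = array(list(map(sqrt, a)), float)
print(b)                            # 1, 2, 3 as floats

m = array([[1, 2, 3], [4, 5, 6]], int)
print(len(m), m.size, m.shape)      # 2 6 (2, 3)
rows, cols = m.shape                # unpack the tuple into two variables
print(numpy.min(m), numpy.max(m))   # 1 6, taken over all elements

# The built-in max, by contrast, fails on a two-dimensional array:
try:
    max(m)
    builtin_max_failed = False
except ValueError:
    builtin_max_failed = True
print("built-in max fails:", builtin_max_failed)
```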

EXAMPLE 2.5: AVERAGE OF A SET OF VALUES IN A FILE

Suppose we have a set of numbers stored in a file values.txt and we want to
calculate their mean. Even if we don’t know how many numbers there are we
can do the calculation quite easily:

from numpy import loadtxt


values = loadtxt("values.txt",float)
mean = sum(values)/len(values)
print(mean)

The first line imports the loadtxt function and the second uses it to read the
values in the file and put them in an array called values. The third line calcu-
lates the mean as the sum of the values divided by the number of values and
the fourth prints out the result.
Now suppose we want to calculate the mean-square value. To do this, we
first need to calculate the squares of the individual values, which we can do by
multiplying the array values by itself. Recall that the product of two arrays
in Python is calculated by multiplying together each pair of corresponding
elements, so values*values is an array with elements equal to the squares of
the original values. (We could also write values**2, which would produce the
same result.) Then we can use the function sum again to add up the squares.
Thus our program might look like this:

from numpy import loadtxt


values = loadtxt("values.txt",float)
mean = sum(values*values)/len(values)
print(mean)
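To check both programs on numbers whose answers are easy to work out by hand, this sketch first writes its own values.txt, containing the made-up values 1, 2, 4 and 8, into a temporary folder. It then computes the mean and the mean-square value, the latter both with values*values and with values**2. The variable names mean_square and also_mean_square are our own choices.

```python
import os
import tempfile
from numpy import loadtxt

# Create a small values.txt to work on (test values chosen for easy checking).
path = os.path.join(tempfile.mkdtemp(), "values.txt")
with open(path, "w") as f:
    f.write("1.0\n2.0\n4.0\n8.0\n")

values = loadtxt(path, float)
mean = sum(values)/len(values)                  # (1+2+4+8)/4 = 3.75
mean_square = sum(values*values)/len(values)    # (1+4+16+64)/4 = 21.25
also_mean_square = sum(values**2)/len(values)   # the same thing, written with **
print(mean, mean_square, also_mean_square)
```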

On the other hand, suppose we want to calculate the geometric mean of our
set of numbers. The geometric mean $\bar{x}$ of a set of $n$ values $x_i$ is defined to be the
$n$th root of their product, thus:

$$\bar{x} = \biggl[\prod_{i=1}^{n} x_i\biggr]^{1/n}. \tag{2.2}$$

Taking natural logs of both sides we get

$$\ln\bar{x} = \ln\biggl[\prod_{i=1}^{n} x_i\biggr]^{1/n} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i \tag{2.3}$$

or

$$\bar{x} = \exp\biggl(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\biggr). \tag{2.4}$$
In other words, the geometric mean is the exponential of the arithmetic mean
of the logarithms. We can modify our previous program for the arithmetic
mean to calculate the geometric mean thus:

from numpy import loadtxt,array
from math import log,exp
values = loadtxt("values.txt",float)
logs = array(list(map(log,values)),float)
geometric = exp(sum(logs)/len(logs))
print(geometric)

Note how we combined the map function and the log function to calculate the
logarithms and then calculated the arithmetic mean of the resulting values.
If we want to be clever, we can streamline this program by noting that the
numpy package contains its own log function that will calculate the logs of all
the elements of an array. Thus we can rewrite our program as


from numpy import loadtxt,log


from math import exp
values = loadtxt("values.txt",float)
geometric = exp(sum(log(values))/len(values))
print(geometric)

As well as being more elegant, this version of the program will probably also
run a little faster, since the log function in numpy is designed specifically to
work efficiently with arrays.
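As a sanity check, the sketch below computes the geometric mean three ways for the made-up values 1, 2, 4 and 8, whose product is 64. It takes the nth root of the product directly, as in Eq. (2.2). It then uses the exponential of the mean log, once with map and math.log and once with numpy's log. All three should agree to within rounding error at 64^(1/4) ≈ 2.828. To keep it short, the values are put straight into an array rather than read from a file, and numpy's log is imported under the name nlog to avoid a clash with math.log. The values must be positive for the logarithms to exist.

```python
from math import exp, log
from numpy import array, prod
from numpy import log as nlog

values = array([1.0, 2.0, 4.0, 8.0], float)   # made-up positive test values
n = len(values)

# 1. Straight from the definition (2.2): the nth root of the product.
direct = prod(values)**(1/n)

# 2. Using map with math.log, as in the first program.
logs = array(list(map(log, values)), float)
via_map = exp(sum(logs)/n)

# 3. Using numpy's log, which works on the whole array at once.
via_numpy = exp(sum(nlog(values))/n)

print(direct, via_map, via_numpy)
```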

Finally in this section, here is a word of warning. Consider the following
program:

from numpy import array


a = array([1,1],int)
b = a
a[0] = 2
print(a)
print(b)

Take a look at this program and work out for yourself what you think it will
print. If we actually run it (and you can try this for yourself) it prints the
following:

[2 1]
[2 1]

This is probably not what you were expecting. Reading the program, it looks
like array a should be equal to [2,1] and b should be equal to [1,1] when the
program ends, but the output of the program appears to indicate that both are
equal to [2,1]. What has gone wrong?
The answer lies in the line “b = a” in the program. In Python direct assign-
ment of arrays in this way, setting the value of one array equal to another, does
not work as you might expect it to. You might imagine that “b = a” would
cause Python to create a new array b holding a copy of the numbers in the ar-
ray a, but this is not what happens. In fact, all that “b = a” does is it declares
“b” to be a new name for the array previously called “a”. That is, “a” and “b”
now both refer to the same array of numbers, stored somewhere in the mem-
ory of the computer. If we change the value of an element in array a, as we
do in the program above, then we also change the same element of array b,
because a and b are really just the same array.[28]

This is a tricky point, one that can catch you out if you are not aware of it.
You can do all sorts of arithmetic operations with arrays and they will work
just fine, but this one operation, setting an array equal to another array, does
not work the way you expect it to.
Why does Python work like this? At first sight it seems peculiar, annoying
even, but there is a good reason for it. Arrays can be very large, with millions or
even billions of elements. So if a statement like “b = a” caused the computer
to create a new array b that was a complete copy of the array a, it might have
to copy very many numbers in the process, potentially using a lot of time and
memory space. But in many cases it’s not actually necessary to make a copy
of the array. Particularly if you are interested only in reading the numbers in
an array, not in changing them, then it does not matter whether a and b are
separate arrays that happen to contain the same values or are actually just two
names for the same array—everything will work the same either way. Creating
a new name for an old array is normally far faster than making a copy of the
entire contents, so, in the interests of efficiency, this is what Python does.
Of course there are times when you really do want to make a new copy of
an array, so Python also provides a way of doing this. To make a copy of an
array a we can use the function copy from the numpy package thus:

from numpy import copy


b = copy(a)

This will create a separate new array b whose elements are an exact copy of
those of array a. If we were to use this line, instead of the line “b = a”, in the
program above, then run the program, it would print this:

[2 1]
[1 1]

which is now the “correct” answer.
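The difference between a second name and a true copy can be seen in one short sketch that makes both kinds of b from the same array. It uses Python's is operator, which asks whether two names refer to the very same object.

```python
from numpy import array, copy

a = array([1, 1], int)
b = a           # b is just a second name for the same array
c = copy(a)     # c is a separate array holding the same numbers
a[0] = 2

print(a, b, c)          # [2 1] [2 1] [1 1]
print(b is a, c is a)   # True False
```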

[28] If you have worked with the programming languages C or C++ you may find this behavior
familiar, since pointers in those languages behave in a similar way. In C, if a is an array and b
is a pointer, the statement “b = a” merely makes b point to the same block of memory as the
array a; it does not create a new array.

Exercise 2.7: Suppose arrays a and b are defined as follows:


from numpy import array
a = array([1,2,3,4],int)
b = array([2,4,6,8],int)

What will the computer print upon executing the following lines? (Try to work out the
answer before trying it on the computer.)
a) print(b/a+1)
b) print(b/(a+1))
c) print(1/a)

2.4.5 SLICING

Here’s another useful trick, called slicing, which works with both arrays and
lists. Suppose we have a list r. Then r[m:n] is another list composed of a
subset of the elements of r, starting with element m and going up to but not
including element n. Here’s an example:

r = [ 1, 3, 5, 7, 9, 11, 13, 15 ]
s = r[2:5]
print(s)

which produces

[5, 7, 9]

Observe what happens here. The variable s is a new list, which is a sublist of r
consisting of elements 2, 3, and 4 of r, but not element 5. Since the numbering
of elements starts at zero, not one, element 2 is actually the third element of
the list, which is the 5, and elements 3 and 4 are the 7 and 9. So s has three
elements equal to 5, 7, and 9.
Slicing can be useful in many physics calculations, particularly, as we’ll see,
in matrix calculations, calculations on lattices, and in the solution of differen-
tial equations. There are a number of variants on the basic slicing formula
above. You can write r[2:], which means all elements of the list from ele-
ment 2 up to the end of the list, or r[:5], which means all elements from the
start of the list up to, but not including, element 5. And r[:] with no numbers
at all means all elements from the beginning to the end of the list, i.e., the entire
list. This last is not very useful: if we want to refer to the whole list we can
just say r. We get the same thing, for example, whether we write print(r[:])
or print(r). However, we will see a use for this form in a moment.
Slicing also works with arrays: you can specify a subpart of an array and
you get another array, of the same type as the first, as in this example:

from numpy import array


a = array([2,4,6,8,10,12,14,16],int)
b = a[3:6]
print(b)

which prints

[ 8 10 12]

You can also write a[3:], or a[:6], or a[:], as with lists.


Slicing works with two-dimensional arrays as well. For instance a[2,3:6]
gives you a one-dimensional array with three elements equal to a[2,3], a[2,4],
and a[2,5]. And a[2:4,3:6] gives you a two-dimensional array of size 2 × 3
with values drawn from the appropriate subblock of a, starting at a[2,3].
Finally, a[2,:] gives you the whole of row 2 of array a, which means the
third row since the numbering starts at zero. And a[:,3] gives you the whole
of column 3, which is the fourth column. These forms will be particularly
useful to us for doing vector and matrix arithmetic.
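The slicing forms above can be tried out in a short sketch. For the two-dimensional cases it uses a made-up 4 × 6 array whose element a[i,j] is 10i + j, so that every value you see tells you its own row and column.

```python
from numpy import array

r = [1, 3, 5, 7, 9, 11, 13, 15]
print(r[2:5], r[2:], r[:5])   # [5, 7, 9] [5, 7, 9, 11, 13, 15] [1, 3, 5, 7, 9]

# Element a[i,j] is 10*i + j, so each value shows its own row and column.
a = array([[ 0,  1,  2,  3,  4,  5],
           [10, 11, 12, 13, 14, 15],
           [20, 21, 22, 23, 24, 25],
           [30, 31, 32, 33, 34, 35]], int)

print(a[2, 3:6])     # part of row 2: 23, 24, 25
print(a[2:4, 3:6])   # a 2 x 3 block starting at a[2,3]
print(a[2, :])       # all of row 2 (the third row)
print(a[:, 3])       # all of column 3 (the fourth column)
```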

2.5 “FOR” LOOPS

In Section 2.3.2 we saw a way to make a program loop repeatedly around a
given section of code using a while statement. It turns out, however, that while
statements are used only rather rarely. There is another, much more commonly
used loop construction in the Python language, the for loop. A for loop is a loop
that runs through the elements of a list or array in turn. Consider this short
example:

r = [ 1, 3, 5 ]
for n in r:
    print(n)
    print(2*n)
print("Finished")

If we run this program it prints out the following:


1
2
3
6
5
10
Finished

What’s happening here is as follows. The program first creates a list called r,
then the for statement sets n equal to each value in the list in turn. For each
value the computer carries out the steps in the following two lines, printing
out n and 2n, then loops back around to the for statement again and sets n to
the next value in the list. Note that the two print statements are indented, in
a manner similar to the if and while statements we saw earlier. This is how
we tell the program which instructions are “in the loop.” Only the indented
instructions will be executed each time round the loop. When the loop has
worked its way through all the values in the list, it stops looping and moves
on to the next line of the program, which in this case is a third print statement
which prints the word “Finished.” Thus in this example the computer will go
around the loop three times, since there are three elements in the list r.
The same construction works with arrays as well—you can use a for loop
to go through the elements of a (one-dimensional) array in turn. Also the state-
ments break and continue (see Section 2.3.3) can be used with for loops the
same way they’re used with while loops: break ends the loop and moves to
the next statement after the loop; continue abandons the current iteration of
the loop and moves on to the next iteration.
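Here is a small sketch of break and continue inside a for loop. The list and the trigger values 3 and 7 are arbitrary. The list seen records which values actually reach the end of the loop body.

```python
seen = []
for n in [1, 3, 5, 7, 9]:
    if n == 3:
        continue       # skip the rest of this pass; go on to n = 5
    if n == 7:
        break          # leave the loop completely; 9 is never reached
    seen.append(n)
    print(n)
print("Finished", seen)   # Finished [1, 5]
```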
The most common use of the for loop, by far, is simply to run through a
given piece of code a specified number of times, such as ten, say, or a million.
To achieve this, Python provides a special built-in function called range, which
creates a list of a given length, usually for use with a for loop. For example
range(5) gives a list [ 0, 1, 2, 3, 4 ]—that is, a list of consecutive inte-
gers, starting at zero and going up to, but not including, 5. Note that this means
the list contains exactly five elements but does not include the number 5 itself.
(Technically, range produces a range object, not a list. Like the iterators
discussed in Section 2.4.1, it supplies its values one by one as they are needed,
rather than storing them all in a list. If you actually want to produce a list
using range then you would write something of the form “list(range(5))”,
which creates the range object and then converts it to a list. In practice,
however, we do this very rarely, and never in this book: the main use of the
range function is in for loops, and you are allowed to use a range object
directly in a for loop without converting it into a list first.[29])

Thus

r = range(5)
for n in r:
    print("Hello again")

produces the following output

Hello again
Hello again
Hello again
Hello again
Hello again

The for loop gives n each of the values in r in turn, of which there are five, and
for each of them it prints out the words Hello again. So the end result is that
the computer prints out the same message five times. In this case we are not
actually interested in the values r contains, only the fact that there are five of
them—they merely provide a convenient tool that allows us to run around the
same piece of code a given number of times.
A more interesting use of the range function is the following:

r = range(5)
for n in r:
    print(n**2)

Now we are making use of the actual values r contains, printing out the square
of each one in turn:

0
1
4
9
16

In both these examples we used a variable r to store the results of the range
function, but it’s not necessary to do this. Often one takes a shortcut and just
writes something like

for n in range(5):
    print(n**2)

which achieves the same result with less fuss. This is probably the most common
form of the for loop and we will see many loops of this form throughout
this book.

[29] In versions of Python prior to version 3, the range function actually did produce a list. If you
wanted to avoid building the whole list you used the separate function xrange, which no
longer exists in version 3. Both give essentially the same results in a for loop, however, so the
for loops in this book will work without modification with either version of Python. For further
discussion of this point, and of iterators in general, see Appendix B.
There are a number of useful variants of the range function, as follows:

range(5) gives [ 0, 1, 2, 3, 4 ]
range(2,8) gives [ 2, 3, 4, 5, 6, 7 ]
range(2,20,3) gives [ 2, 5, 8, 11, 14, 17 ]
range(20,2,-3) gives [ 20, 17, 14, 11, 8, 5 ]

When there are two arguments to the function it generates integer values that
run from the first up to, but not including, the second. When there are three
arguments, the values run from the first up to, but not including, the second, in
steps of the third. Thus in the third example above the values increase in steps of 3.
In the fourth example, which has a negative third argument, the values decrease in
steps of 3. Note that in each case the values returned by the function do not
include the value at the end of the given range—the first value in the range is
always included; the last never is.
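The table of range variants can be checked directly by converting each range to a list, as in the sketch below. The last line confirms that range(1,11), used in the powers-of-two example, supplies exactly ten values.

```python
# list() turns each range into a list so we can look at its values.
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 8)))       # [2, 3, 4, 5, 6, 7]
print(list(range(2, 20, 3)))   # [2, 5, 8, 11, 14, 17]
print(list(range(20, 2, -3)))  # [20, 17, 14, 11, 8, 5]
print(len(range(1, 11)))       # 10 values: 1 up to 10, never 11
```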
Thus, for example, we can print out the first ten powers of two with the
following lines:

for n in range(1,11):
    print(2**n)

Notice how the upper limit of the range must be given as 11. This program
will print out the powers 2, 4, 8, 16, and so forth up to 1024. It stops at $2^{10}$,
not $2^{11}$, because range always excludes the last value.
A further point to notice about the range function is that all of its argu-
ments must be integers. The function doesn’t work if you give it noninteger
arguments, such as floats, and you will get an error message if you try. It is
particularly important to remember this when the arguments are calculated
from the values of other variables. This short program, for example, will not
work:

p = 10
q = 2
for n in range(p/q):
    print(n)

You might imagine these lines would print out the integers from zero to four,
but if you try them you will just get an error message because, as discussed in
Section 2.2.4, the division operation always returns a floating-point quantity,
even if the result of the division is, mathematically speaking, an integer. Thus
the quantity “p/q” in the program above is a floating-point quantity equal to
5.0 and is not allowed as an argument of the range function. We can fix this
problem by using an integer division instead:

for n in range(p//q):
    print(n)

This will now work as expected. (See Section 2.2.4, page 23 for a discussion of
integer division.)
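The difference between the two kinds of division can be seen directly in a few lines. This sketch reuses p and q from the text and catches the TypeError that range raises when given a float, so the program runs to the end instead of stopping at the error; the flag range_rejected_float is just an illustrative name.

```python
p = 10
q = 2

# True division "/" always gives a float, even when the result is a whole number
print(p/q, type(p/q))

# Integer division "//" gives an int when both operands are ints
print(p//q, type(p//q))

# range rejects the float, which is the error described in the text
range_rejected_float = False
try:
    range(p/q)
except TypeError as err:
    range_rejected_float = True
    print("TypeError:", err)

values = list(range(p//q))
print(values)    # [0, 1, 2, 3, 4]
```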
Another useful function is the function arange from the numpy package,
which is similar to range but generates arrays, rather than lists or iterators,[30]
and, more importantly, works with floating-point arguments as well as integer
ones. For example arange(1,8,2) gives a one-dimensional array of integers
[1,3,5,7] as we would expect, but arange(1.0,8.0,2.0) gives an array
of floating-point values [1.0,3.0,5.0,7.0] and arange(2.0,2.8,0.2) gives
[2.0,2.2,2.4,2.6]. As with range, arange can be used with one, two, or
three arguments, and does the equivalent thing to range in each case. (With a
non-integer step such as 0.2, rounding error can make it hard to predict whether
a value near the upper limit is included, and the numpy documentation suggests
that in such cases it is often better to use linspace, described next.)
Another similar function is the function linspace, also from the numpy
package, which generates an array with a given number of floating-point values
between given limits. For instance, linspace(2.0,2.8,5) divides the interval
from 2.0 to 2.8 into 5 equally spaced values, creating an array with floating-point
elements [2.0,2.2,2.4,2.6,2.8]. Similarly, linspace(2.0,2.8,3) would create
an array with elements [2.0,2.4,2.8]. Note that, unlike both range and
arange, linspace includes the last point in the range. Also note that although
linspace can take either integer or floating-point arguments, it always generates
floating-point values, even when the arguments are integers.
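The following sketch, assuming numpy is installed, puts arange and linspace side by side with the arguments used above. Because values like 0.2 are not exactly representable in binary, the elements of the floating-point arrays may differ from the quoted values in the last few digits, so the comparison uses numpy's allclose rather than exact equality.

```python
from numpy import arange, linspace, allclose

a = arange(1,8,2)          # integer arguments give an integer array
b = arange(2.0,2.8,0.2)    # float step; the upper limit 2.8 is excluded
c = linspace(2.0,2.8,5)    # 5 equally spaced points, endpoint included
d = linspace(2,8,4)        # integer limits, but the values are still floats

print(a, a.dtype)
print(b)
print(c)
print(d, d.dtype)

# Compare with the values quoted in the text, allowing for rounding error
print(allclose(b,[2.0,2.2,2.4,2.6]), allclose(c,[2.0,2.2,2.4,2.6,2.8]))
```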

30
The function arange generates an actual array, calculating all the values and storing them
in the computer’s memory. This can cause problems if you generate a very large array because
the computer can run out of memory, crashing your program, an issue that does not arise with
the iterators generated by the range function. For instance, arange(10000000000) will produce
an error message on most computers, while the equivalent expression with range will not. See
Appendix B for more discussion of this point.


EXAMPLE 2.6: PERFORMING A SUM

It happens often in physics calculations that we need to evaluate a sum. If we
have the values of the terms in the sum stored in a list or array then we can
calculate the sum using the built-in function sum described in Section 2.4.1. In
more complicated situations, however, it is often more convenient to use a for
loop to calculate a sum. Suppose, for instance, we want to know the value of
the sum $s = \sum_{k=1}^{100} (1/k)$. The standard way to program this is as follows:
1. First create a variable to hold the value of the sum, and initially set it
equal to zero. As above we’ll call the variable s, and we want it to be a
floating-point variable, so we’d do “s = 0.0”.
2. Now use a for loop to take the variable k through all values from 1 to 100.
For each value, calculate 1/k and add it to the variable s.
3. When the for loop ends the variable s will contain the value of the com-
plete sum.
Thus our program looks like this:

s = 0.0
for k in range(1,101):
    s += 1/k
print(s)

Note how we use range(1,101) so that the values of k start at 1 and end at 100.
We also used the “+=” modifier, which adds to a variable as described in Sec-
tion 2.2.4. If we run this program it prints the value of the sum thus:

5.1873775176

As another example, suppose we have a set of real values stored in a computer
file called values.txt and we want to compute and print the sum of
their squares. We could achieve this as follows:

from numpy import loadtxt

values = loadtxt("values.txt",float)
s = 0.0
for x in values:
    s += x**2
print(s)

Here we have used the function loadtxt from Section 2.4.3 to read the values
in the file and put them in an array called values. Note also how this example
does not use the range function, but simply goes through the list of values
directly.
For loops and the sum function give us two different ways to compute sums
of quantities. It is not uncommon for there to be more than one way to achieve
a given goal in a computer program, and in particular it’s often the case that
one can use either a for loop or a function or similar array operation to perform
the same calculation. In general for loops are more flexible, but direct array
operations are often faster and can save significant amounts of time if you are
dealing with large arrays. Thus both approaches have their advantages. Part
of the art of good computer programming is learning which approach is best
in which situation.
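As a rough illustration of the two approaches, the sketch below evaluates the same sum $\sum_{k=1}^{100} 1/k$ three ways: with the for loop from the text, with the built-in sum applied to a list, and with a numpy array operation (assuming numpy is installed). The three results should agree to within rounding error, although they may differ in the last digit or so because the additions are not necessarily carried out in the same order.

```python
from numpy import arange

N = 100

# 1. An explicit for loop, as in the program above
s_loop = 0.0
for k in range(1,N+1):
    s_loop += 1/k

# 2. The built-in sum function applied to a list of the terms
s_builtin = sum([1/k for k in range(1,N+1)])

# 3. A whole-array numpy operation: divide 1 by every element, then add them up
k_values = arange(1,N+1)
s_array = (1/k_values).sum()

print(s_loop)
print(s_builtin)
print(s_array)
```

For a sum of only 100 terms the speed difference is negligible; the array form starts to pay off when the arrays are large.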

EXAMPLE 2.7: EMISSION LINES OF HYDROGEN

Let us revisit an example we saw in Chapter 1. On page 6 we gave a program
for calculating the wavelengths of emission lines in the spectrum of the
hydrogen atom, based on the Rydberg formula

$$ \frac{1}{\lambda} = R\left(\frac{1}{m^2} - \frac{1}{n^2}\right). \qquad (2.5) $$

Our program looked like this:

R = 1.097e-2
for m in [1,2,3]:
    print("Series for m =",m)
    for k in [1,2,3,4,5]:
        n = m + k
        invlambda = R*(1/m**2-1/n**2)
        print(" ",1/invlambda," nm")

We can now understand exactly how this program works. It uses two nested
for loops—a loop within another loop—with the code inside the inner loop
doubly indented. We discussed nesting previously in Section 2.3.3. The first
for loop takes the integer variable m through the values 1, 2, 3. And for each
value of m the second, inner loop takes k through the values 1, 2, 3, 4, 5, adds
those values to m to calculate n and then applies the Rydberg formula. The end
result will be that the program prints out a wavelength for each combination
of values of m and n, which is what we want.
In fact, knowing what we now know, we can write a simpler version of this
program, by making use of the range function, thus:

File: rydberg.py

R = 1.097e-2
for m in range(1,4):
    print("Series for m =",m)
    for n in range(m+1,m+6):
        invlambda = R*(1/m**2-1/n**2)
        print(" ",1/invlambda," nm")

Note how we were able to eliminate the variable k in this version by specifying
a range for n that depends directly on the value of m.
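To confirm that the two versions of the program visit the same combinations of m and n, one can record the pairs that each set of loops generates and compare them. The sketch below does this and also evaluates the first line of the m = 1 (Lyman) series with the same value of R; the names pairs_k, pairs_range, and lyman_alpha are illustrative.

```python
R = 1.097e-2   # Rydberg constant in inverse nanometers, as in rydberg.py

# (m, n) pairs visited by the first version, where n = m + k
pairs_k = []
for m in [1,2,3]:
    for k in [1,2,3,4,5]:
        pairs_k.append((m,m+k))

# (m, n) pairs visited by the range-based version
pairs_range = []
for m in range(1,4):
    for n in range(m+1,m+6):
        pairs_range.append((m,n))

print(pairs_k == pairs_range)    # True: both visit the same 15 pairs

# First line of the m = 1 series (n = 2) from the Rydberg formula
lyman_alpha = 1/(R*(1/1**2-1/2**2))
print(lyman_alpha," nm")         # roughly 121.5 nm
```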

Exercise 2.8: The Madelung constant


In condensed matter physics the Madelung constant gives the total electric potential
felt by an atom in a solid. It depends on the charges on the other atoms nearby and
their locations. Consider for instance solid sodium chloride—table salt. The sodium
chloride crystal has atoms arranged on a cubic lattice, but with alternating sodium and
chlorine atoms, the sodium ones having a single positive charge +e and the chlorine
ones a single negative charge −e, where e is the magnitude of the charge on the electron. If we label each
position on the lattice by three integer coordinates (i, j, k), then the sodium atoms fall
at positions where i + j + k is even, and the chlorine atoms at positions where i + j + k
is odd.
Consider a sodium atom at the origin, i = j = k = 0, and let us calculate the
Madelung constant. If the spacing of atoms on the lattice is a, then the distance from
the origin to the atom at position (i, j, k) is

$$ \sqrt{(ia)^2 + (ja)^2 + (ka)^2} = a\sqrt{i^2 + j^2 + k^2}, $$

and the potential at the origin created by such an atom is

$$ V(i,j,k) = \pm \frac{e}{4\pi\epsilon_0 a \sqrt{i^2 + j^2 + k^2}}, $$

with $\epsilon_0$ being the permittivity of the vacuum and the sign of the expression depending
on whether i + j + k is even or odd. The total potential felt by the sodium atom is then
the sum of this quantity over all other atoms. Let us assume a cubic box around the
sodium at the origin, with L atoms in all directions. Then

$$ V_{\text{total}} = \sum_{\substack{i,j,k=-L \\ \text{not } i=j=k=0}}^{L} V(i,j,k) = \frac{e}{4\pi\epsilon_0 a} M, $$

where M is the Madelung constant, at least approximately; technically, the Madelung
constant is the value of M when L → ∞, but one can get a good approximation just by
using a large value of L.


Write a program to calculate and print the Madelung constant for sodium chloride.
Use as large a value of L as you can, while still having your program run in reasonable
time—say in a minute or less.

Exercise 2.9: The semi-empirical mass formula


In nuclear physics, the semi-empirical mass formula is a formula for calculating the
approximate nuclear binding energy B of an atomic nucleus with atomic number Z
and mass number A:

$$ B = a_1 A - a_2 A^{2/3} - a_3 \frac{Z^2}{A^{1/3}} - a_4 \frac{(A-2Z)^2}{A} + \frac{a_5}{A^{1/2}}, $$

where, in units of millions of electron volts, the constants are $a_1 = 15.67$, $a_2 = 17.23$,
$a_3 = 0.75$, $a_4 = 93.2$, and

$$ a_5 = \begin{cases} 0 & \text{if } A \text{ is odd}, \\ 12.0 & \text{if } A \text{ and } Z \text{ are both even}, \\ -12.0 & \text{if } A \text{ is even and } Z \text{ is odd}. \end{cases} $$

a) Write a program that takes as its input the values of A and Z, and prints out
the binding energy for the corresponding atom. Use your program to find the
binding energy of an atom with A = 58 and Z = 28. (Hint: The correct answer is
around 490 MeV.)
b) Modify your program to print out not the total binding energy B, but the binding
energy per nucleon, which is B/A.
c) Now modify your program so that it takes as input just a single value of the
atomic number Z and then goes through all values of A from A = Z to A = 3Z,
to find the one that has the largest binding energy per nucleon. This is the most
stable nucleus with the given atomic number. Have your program print out the
value of A for this most stable nucleus and the value of the binding energy per
nucleon.
d) Modify your program again so that, instead of taking Z as input, it runs through
all values of Z from 1 to 100 and prints out the most stable value of A for each
one. At what value of Z does the maximum binding energy per nucleon occur?
(The true answer, in real life, is Z = 28, which is nickel. You should find that the
semi-empirical mass formula gets the answer roughly right, but not exactly.)

2.6 USER-DEFINED FUNCTIONS


We saw in Section 2.2.5 how to use functions, such as log or sqrt, to do math-
ematics in our programs, and Python comes with a broad array of functions
for performing all kinds of calculations. There are many situations in
computational physics, however, where we need a specialized function to perform a
particular calculation and Python allows you to define your own functions in
such cases.
Suppose, for example, we are performing a calculation that requires us to
calculate the factorials of integers. Recall that the factorial of n is defined as the
product of all integers from 1 to n:
$$ n! = \prod_{k=1}^{n} k. \qquad (2.6) $$

We can calculate this in Python with a loop like this:

f = 1.0
for k in range(1,n+1):
    f *= k

When the loop finishes, the variable f will be equal to the factorial we want.
If our calculation requires us to calculate factorials many times in various
different parts of the program we could write out a loop, as above, each time,
but this could get tedious quickly and would increase the chances that we
would make an error. A more convenient approach is to define our own func-
tion to calculate the factorial, which we do like this:

def factorial(n):
    f = 1.0
    for k in range(1,n+1):
        f *= k
    return f

Note how the lines of code that define the function are indented, in a manner
similar to the if statements and for loops of previous sections. The indentation
tells Python where the function ends.
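A quick way to check a user-defined function is to compare it against a trusted result. The sketch below, using only the standard math module, compares factorial with math.factorial, imported under the alias math_factorial so that it does not clash with our own definition. Note that our version returns a float because f starts at 1.0, while math.factorial returns an exact integer; for large n the float result is rounded.

```python
from math import factorial as math_factorial

def factorial(n):
    f = 1.0
    for k in range(1,n+1):
        f *= k
    return f

print(factorial(10))         # 3628800.0, a float because f starts at 1.0
print(math_factorial(10))    # 3628800, an exact integer
print(factorial(0))          # 1.0, since the loop body never runs
```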
Now, anywhere later in the program we can simply say something like

a = factorial(10)

or

b = factorial(r+2*s)

and the program will calculate the factorial of the appropriate number. In effect
what happens when we write “a = factorial(10)”—when the function is
called—is that the program jumps to the definition of the function (the part
starting with def above), sets n = 10, and then runs through the instructions
in the function. When it gets to the final line “return f” it jumps back to
where it came from and the value of the factorial function is set equal to
whatever quantity appeared after the word return—which is the final value
of the variable f in this case. The net effect is that we calculate the factorial
of 10 and set the variable a equal to the result.
An important point to note is that any variables created inside the defini-
tion of a function exist only inside that function. Such variables are called local
variables. For instance the variables f and k in the factorial function above are
local variables. This means we can use them only when we are inside the func-
tion and they disappear when we leave. Thus, for example, you could print the
value of the variable k just fine if you put the print statement inside the func-
tion, but if you were to try to print the variable anywhere outside the function
then you would get an error message telling you that no such variable exists.[31]
Note, however, that the reverse is not true—you can use a variable inside a
function that is defined outside it.
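The scoping rules just described can be seen in a short experiment. In this sketch, printing k outside factorial raises a NameError (this program never defines a k of its own at the top level), which is caught so the program can continue, while the hypothetical function scaled_factorial reads the variable scale that was defined outside it.

```python
def factorial(n):
    f = 1.0
    for k in range(1,n+1):
        f *= k
    return f

a = factorial(5)
print(a)                      # 120.0

# k was a local variable of factorial, so it does not exist out here
name_error_raised = False
try:
    print(k)
except NameError as err:
    name_error_raised = True
    print("NameError:", err)

# The reverse direction works: a function can read a variable defined outside it
scale = 2.0

def scaled_factorial(n):
    return scale*factorial(n)

print(scaled_factorial(5))    # 240.0
```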
User-defined functions allow us to encapsulate complex calculations inside
a single function definition and can make programs much easier to write and
to read. We will see many uses for them in this book.
User-defined functions can have more than one argument. Suppose, for
example, we have a point specified in cylindrical coordinates r, θ, z, and we
want to know the distance d between the point and the origin. The simplest
way to do the calculation is to convert r and θ to Cartesian coordinates first,
then apply Pythagoras’ Theorem to calculate d:
$$ x = r\cos\theta, \quad y = r\sin\theta, \quad d = \sqrt{x^2 + y^2 + z^2}. \qquad (2.7) $$

If we find ourselves having to do such a conversion many times within a program
we might want to define a function to do it. Here’s a suitable function in
Python:

def distance(r,theta,z):
    x = r*cos(theta)
    y = r*sin(theta)
    d = sqrt(x**2+y**2+z**2)
    return d

31
To make things more complicated, you can separately define a variable called k outside the
function and then you are allowed to print that (or do any other operation with it), but in that case
it is a different variable—now you have two variables called k that have separate values and which
value you get depends on whether you are inside the function or not.

(This assumes that we have already imported the functions sin, cos, and sqrt
from the math package.)
Note how the function takes three different arguments now. When we call
the function we must now supply it with three different arguments and they
must come in the same order (r, θ, z) as in the definition of
the function. Thus if we say

D = distance(2.0,0.1,-1.5)

the program will calculate the distance for r = 2, θ = 0.1, and z = −1.5.
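The following sketch checks the distance function numerically. Since $x^2 + y^2 = r^2$, the result should equal $\sqrt{r^2 + z^2} = \sqrt{4 + 2.25} = 2.5$ up to rounding error. It also shows that swapping arguments changes the answer, and that Python lets you pass arguments by name (keyword arguments), in which case their order no longer matters.

```python
from math import sin,cos,sqrt

def distance(r,theta,z):
    x = r*cos(theta)
    y = r*sin(theta)
    d = sqrt(x**2+y**2+z**2)
    return d

D = distance(2.0,0.1,-1.5)
print(D)                                    # 2.5, up to rounding error

# Arguments are matched by position, so swapping them changes the meaning
print(distance(0.1,2.0,-1.5))

# Passing arguments by name makes the order irrelevant
print(distance(z=-1.5,r=2.0,theta=0.1))     # same as D
```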
The values used as arguments for a function can be any type of quantity
Python knows about, including integers and real and complex numbers, but
also including, for instance, lists and arrays. This allows us, for example, to
create functions that perform operations on vectors or matrices stored in ar-
rays. We will see examples of such functions when we look at linear algebra
methods in Chapter 6.
The value returned by a function can also be of any type, including integer,
real, complex, or a list or array. Using lists or arrays allows us to return more
than one value if we want to, or to return a vector or matrix. For instance, we
might write a function to convert from polar coordinates to Cartesian coordi-
nates like this:

def cartesian(r,theta):
    x = r*cos(theta)
    y = r*sin(theta)
    position = [x,y]
    return position

This function takes a pair of values r, θ and returns a two-element list contain-
ing the corresponding values of x and y. In fact, we could combine the two
final lines here into one and say simply

return [x,y]

Or we could return x and y in the form of a two-element array by saying

return array([x,y],float)

An alternative way to return multiple values from a function is to use the
“multiple assignment” feature of Python, which we examined in Section 2.2.4.
We saw there that one can write statements of the form “x,y = a,b” which
will simultaneously set x = a and y = b. The equivalent maneuver with a
user-defined function is to write

def f(z):
    # Some calculations here...
    return a,b

which will make the function return the values of a and b both. To call such a
function we write something like

x,y = f(1)

and the two returned values would get assigned to the variables x and y. One
can also specify three or more returned values in this fashion, and the individ-
ual values themselves can again be lists, arrays, or other objects, in addition to
single numbers, which allows functions to return very complex sets of values
when necessary.
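Here is a sketch of both styles of returning several values, based on the cartesian function from above. Strictly speaking, Python packs the values in “return x,y” into a tuple, which multiple assignment then unpacks. The array version is named cartesian_array here purely for illustration and assumes numpy is available.

```python
from math import cos,sin,pi
from numpy import array

def cartesian(r,theta):
    x = r*cos(theta)
    y = r*sin(theta)
    return x,y              # both values are returned together as a tuple

def cartesian_array(r,theta):
    return array([r*cos(theta),r*sin(theta)],float)

x,y = cartesian(2.0,0.0)    # multiple assignment unpacks the two values
print(x,y)                  # 2.0 0.0

pos = cartesian_array(1.0,pi/2)
print(pos)                  # approximately [0, 1]
```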
User-defined functions can also return no value at all—it is permitted for
functions to end without a return statement. The body of the function is marked
by indenting the lines of code and the function ends when the indentation
does, whether or not there is a return statement. If the function ends without
a return statement then the program will jump back to wherever it came from,
to the statement where it was called, without giving a value (strictly, Python
hands back the special value None). Why would you
want to do this? In fact there are many cases where this is a useful thing to do.
For example, suppose you have a program that uses three-element arrays to
hold vectors and you find that you frequently want to print out the values of
those vectors. You could write something like

print("(",r[0],r[1],r[2],")")

every time you want to print a vector, but this is difficult to read and prone to
typing errors. A better way to do it would be to define a function that prints a
vector, like this:

def print_vector(r):
    print("(",r[0],r[1],r[2],")")

Then when you want to print a vector you simply say “print_vector(r)”
and the computer handles the rest. Note how, when calling a function that
returns no value, you simply give the name of the function. One just says
“print_vector(r)”, and not “x = print_vector(r)” or something like that.
This is different from the functions we are used to in mathematics, which always
return a value. Perhaps a better name for functions like this would be
“user-defined statements” or something similar, but by convention they are
still called “functions” in Python.[32]
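The sketch below defines print_vector as in the text and then, purely to make the point visible, assigns its result to a variable. The function prints the vector as usual, and the value handed back is Python's special value None. Here r is a plain list, chosen for illustration, so the printed output does not depend on numpy's number formatting.

```python
def print_vector(r):
    print("(",r[0],r[1],r[2],")")

r = [1.0,2.0,3.0]
print_vector(r)           # prints ( 1.0 2.0 3.0 )

result = print_vector(r)  # the function runs and prints the vector again...
print(result)             # ...but the value it hands back is None
```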
The definition of a user-defined function—the code starting with def—can
occur anywhere in a program, except that it must occur before the first time
you use the function. It is good programming style to put all your function
definitions (you will often have more than one) at or near the beginning of
your programs. This guarantees that they come before their first use, and also
makes them easier to find if you want to look them up or change them later.
Once defined, your functions can be used in the same way as any other Python
functions. You can use them in mathematical expressions. You can use them in
print statements. You can even use them inside the definitions of other func-
tions. You can also apply a user-defined function to all the elements of a list or
array with the map function. For instance, to multiply every element of a list
by 2 and subtract 1, we could do the following:

def f(x):
    return 2*x-1

newlist = list(map(f,oldlist))

This applies the function f to every element in turn of the list oldlist and
makes a list of the results called newlist.
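Here is a runnable version of the map example, with a small list oldlist made up for illustration. A list comprehension is a common alternative in Python that produces the same list, and when the data are held in a numpy array, a function built only from arithmetic operations, like f, can be applied to the whole array at once.

```python
from numpy import array

def f(x):
    return 2*x-1

oldlist = [1,2,3,4,5]              # example data, made up for illustration

newlist = list(map(f,oldlist))     # apply f to each element in turn
print(newlist)                     # [1, 3, 5, 7, 9]

# A list comprehension builds the same list
print([f(x) for x in oldlist])     # [1, 3, 5, 7, 9]

# For a numpy array, f acts on all elements at once via array arithmetic
print(f(array(oldlist,int)))       # [1 3 5 7 9]
```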
One more trick is worth mentioning, though it is more advanced and you
should feel free to skip it if you’re not interested. The functions you define
do not have to be saved in the same file on your computer as your main pro-
gram. You could, for example, place a function definition for a function called
myfunction in a separate file called mydefinitions.py. You can put the defi-
nitions for many different functions in the same file if you want. Then, when
you want to use a function in a program, you say

from mydefinitions import myfunction

This tells Python to look in the file mydefinitions.py to find the definition
of myfunction and magically that function will now become available in your
program. This is a very convenient feature if you write a function that you
want to use in many different programs: you need write it only once and store
it in a file; then you can import it into as many other programs as you like.

32
We have already seen one other example of a function with no return value, the standard
print function itself.

As you will no doubt have realized by now, this is what is happening when
we say things like “from math import sqrt” in a program. Someone wrote a
function called sqrt that calculates square roots and placed it in a file so that
you can import it whenever you need it. The math package in Python is noth-
ing other than a large collection of function definitions for useful mathematical
functions, gathered together in one file.[33]
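The sketch below imitates the two-file arrangement inside a single script, so that it can run on its own. It writes a small mydefinitions.py containing a hypothetical myfunction into a temporary folder and adds that folder to Python's module search path, sys.path. In everyday use you would skip both of those steps and simply keep mydefinitions.py in the same folder as your main program, which Python searches automatically.

```python
import os
import sys
import tempfile

# Source code for a hypothetical module, standing in for mydefinitions.py
module_source = """
def myfunction(x):
    return 2*x-1
"""

# Write the module file into a fresh temporary folder
folder = tempfile.mkdtemp()
with open(os.path.join(folder,"mydefinitions.py"),"w") as file:
    file.write(module_source)

# Tell Python to look for modules in that folder too. This is needed only
# because the file is not in the same folder as the running program.
sys.path.insert(0,folder)

from mydefinitions import myfunction
print(myfunction(5))    # 9
```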

EXAMPLE 2.8: PRIME FACTORS AND PRIME NUMBERS

Suppose we have an integer n and we want to know its prime factors. The
prime factors can be calculated relatively easily by dividing repeatedly by all
integers from 2 up to n and checking to see if the remainder is zero. Recall that
the remainder after division can be calculated in Python using the modulo
operation “%”. Here is a function that takes the number n as argument and
returns a list of its prime factors:

def factors(n):
    factorlist = []
    k = 2
    while k<=n:
        while n%k==0:
            factorlist.append(k)
            n //= k
        k += 1
    return factorlist

This is a slightly tricky piece of code—make sure you understand how it does
the calculation. Note how we have used the integer division operation “//”
to perform the divisions, which ensures that the result returned is another in-
teger. (Remember that the ordinary division operation “/” always produces
a float.) Note also how we change the value of the variable n (which is the
argument of the function) inside the function. This is allowed: the argument
variable behaves like any other variable and can be modified, although, like
all variables inside functions, it is a local variable: it exists only inside the
function and gets erased when the function ends.

33
In fact the functions in the math package aren’t written in Python; they’re written in the C
programming language, and one has to do some additional trickery to make these C functions
work in Python, but the same basic principle still applies.

Now if we say “print(factors(17556))”, the computer prints out the
list of factors “[2, 2, 3, 7, 11, 19]”. On the other hand, if we specify a
prime number in the argument, such as “print(factors(23))”, we get back
“[23]”—the only prime factor of a prime number is itself. We can use this
fact to make a program that prints out the prime numbers up to any limit we
choose by checking to see if they have only a single prime factor:

for n in range(2,10000):
    if len(factors(n))==1:
        print(n)

Run this program, and in a matter of seconds we have a list of the primes up to
10 000. (This is, however, not a very efficient way of calculating the primes—
see Exercise 2.10 for a faster way of doing it.)
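The sketch below repeats the factors function and checks it against the two results quoted above. It then collects the primes in a list rather than printing them one by one; to keep the run time short it stops at 1000 instead of 10 000, and there should be 168 primes below 1000.

```python
def factors(n):
    factorlist = []
    k = 2
    while k<=n:
        while n%k==0:          # divide out each factor k as often as possible
            factorlist.append(k)
            n //= k
        k += 1
    return factorlist

print(factors(17556))    # [2, 2, 3, 7, 11, 19]
print(factors(23))       # [23]

# A number is prime exactly when its only prime factor is itself
primes = [n for n in range(2,1000) if len(factors(n))==1]
print(len(primes), primes[:10])
```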

Exercise 2.10: The program above is not a very efficient way of calculating prime num-
bers: it checks each number to see if it is divisible by any number less than it. We
can develop a much faster program for prime numbers by making use of the following
observations:
a) A number n is prime if it has no prime factors less than n. Hence we only need
to check if it is divisible by other primes.
b) If a number n is non-prime, having a factor r, then n = rs, where s is also a factor.
If $r \ge \sqrt{n}$ then $n = rs \ge \sqrt{n}\,s$, which implies that $s \le \sqrt{n}$. In other words, any
non-prime must have factors, and hence also prime factors, less than or equal
to $\sqrt{n}$. Thus to determine if a number is prime we have to check its prime factors
only up to and including $\sqrt{n}$; if there are none then the number is prime.
c) If we find even a single prime factor less than $\sqrt{n}$ then we know that the number
is non-prime, and hence there is no need to check any further: we can abandon
this number and move on to something else.
Write a Python program that finds all the primes up to ten thousand. Create a list to
store the primes, which starts out with just the one prime number 2 in it. Then for each
number n from 3 to 10 000 check whether the number is divisible by any of the primes
in the list up to and including $\sqrt{n}$. As soon as you find a single prime factor you can
stop checking the rest of them: you know n is not a prime. If you find no prime factors
$\sqrt{n}$ or less then n is prime and you should add it to the list. You can print out the list
all in one go at the end of the program, or you can print out the individual numbers as
you find them.


2.7 GOOD PROGRAMMING STYLE


When writing a program to solve a physics problem there are, usually, many
ways to do it, many programs that will give you the solution you are looking
for. For instance, you can use different names for your variables, use either
lists or arrays for storing sets of numbers, break up the code by using user-
defined functions to do some operations, and so forth. Although all of these
approaches may ultimately give the same answer, not all of them are equally
satisfactory. There are well-written programs and poorly written ones. A
well-written program will, as far as possible, have a simple structure, be easy to
read and understand, and, ideally, run fast. A poorly written one may be con-
voluted or unnecessarily long, difficult to follow, or may run slowly. Making
programs easy to read is a particularly important—and often overlooked—
goal. An easy-to-read program makes it easier to find problems, easier to mod-
ify the code, and easier for other people to understand how things work.
Good programming is, to some extent, a matter of experience, and you will
quickly get the hang of it as you start to write programs. But here are a few
general rules of thumb that may help.
1. Include comments in your programs. Leave comments in the code to
remind yourself what particular variables mean, what calculations are
being performed in different sections of the code, what arguments func-
tions require, and so forth. It’s amazing how you can come back to a
program you wrote only a week ago and not remember how it works.
You will thank yourself later if you include comments. And comments
are even more important if you are writing programs that other people
will have to read and understand. It’s frustrating to be the person who
has to fix or modify someone else’s code if they neglected to include any
comments to explain how it works.
2. Use meaningful variable names. Give your variables names that help
you remember what they represent. The names don’t have to be long. In
fact, very long names are usually harder to read. But choose your names
sensibly. Use E for energy and t for time. Use full words where appro-
priate or even pairs of words to spell out what a variable represents, like
mass or angular_momentum. If you are writing a program to calculate the
value of a mathematical formula, give your variables the same names as
in the formula. If variables are called x and β in the formula, call them x
and beta in the program.
3. Use the right types of variables. Use integer variables to represent quantities
that actually are integers, like vector indices or quantum numbers.
Use floats and complex variables for quantities that really are real or com-
plex numbers.
4. Import functions first. If you are importing functions from packages, put
your import statements at the start of your program. This makes them
easy to find if you need to check them or add to them, and it ensures that
you import functions before the first time they are used.
5. Give your constants names. If there are constants in your program, such
as the number of atoms N in a gas or the mass m of a particle, create
similarly named variables at the beginning of your program to repre-
sent these quantities, then use those variables wherever those quantities
appear in your program. This makes formulas easier to read and under-
stand and it allows you later to change the values of constants by chang-
ing only a single line at the beginning of the program, even if the constant
appears many times throughout your calculations. Thus, for example,
you might have a line “A = 58” that sets the mass number of an atom for
a calculation at the beginning of the program, and then use A everywhere
else in the program that you need to refer to the mass number.
If you later want to perform the same calculation for mass number 59,
you need only change the single line at the beginning to “A = 59”. Most
physics programs have a section near the beginning (usually right after
the import statements) that defines all the constants and parameters of
the program, making them easy to find when you need to change their
values.
6. Employ user-defined functions, where appropriate. Use user-defined
functions to represent repeated operations, especially complicated op-
erations. Functions can greatly increase the legibility of your program.
Avoid overusing them, however: simple operations, ones that can be
represented by just a line or two of code, are often better left in the main
body of the program, because it allows you to follow the flow of the cal-
culation when reading through your program without having to jump to
a function and back. Normally you should put your function definitions
at the start of your program (probably after imports and constant defi-
nitions). This ensures that each definition appears before the first use of
the function it defines and that the definitions can be easily found and
modified when necessary.
7. Print out partial results and updates throughout your program. Large
computational physics calculations can take a long time: minutes, hours,
or even days. You will find it helpful to include print statements in your
program that print updates about where the program has got to or partial
results from the calculations, so you know how the program is progress-
ing. It’s difficult to tell whether a calculation is working correctly if the
computer simply sits silently, saying nothing, for hours on end. Thus, for
example, if there is a for loop in your program that repeats many times,
it is useful to include code like this at the beginning of the loop:

for n in range(1000000):
    if n%1000==0:
        print("Step",n)

These lines will cause the program to print out what step it has reached
every time n is exactly divisible by 1000, i.e., every thousandth step. So it
will print:

Step 0
Step 1000
Step 2000
Step 3000

and so forth as it goes along.


8. Lay out your programs clearly. You can add spaces or blank lines in
most places within a Python program without changing the operation of
the program, and doing so can improve readability. Make use of blank
lines to split code into logical blocks. Make use of spaces to divide up
complicated algebraic expressions or particularly long program lines.
You can also split long program lines into more than one line if necessary.
If you place a backslash symbol “\” at the end of a line it tells the
computer that the following line is a continuation of the current one, rather
than a new line in its own right. Thus, for instance, you can write:

energy = mass*(vx**2 + vy**2)/2 + mass*g*y \
         + moment_of_inertia*omega**2/2

and the computer will interpret this as a single formula. If a program
line is very long indeed you can spread it over three or more lines on the
screen with backslashes at the end of each one, except the last.³⁴
9. Don’t make your programs unnecessarily complicated. A short simple
program is enormously preferable to a long involved one. If the job can
be done in ten or twenty lines, then it’s probably worth doing it that
way—the code will be easier to understand, for you or anyone else, and
if there are mistakes in the program it will be easier to work out where
they lie.
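The progress report in item 7 can be packaged so that the reporting interval lives in one place. The sketch below is only an illustration. The function name run_with_progress and its parameters are our own, and printing the elapsed time with the standard time module goes beyond the text's example:

```python
import time

def run_with_progress(nsteps, every, step):
    """Call step(n) for n = 0, 1, ..., nsteps-1, printing a progress line
    every `every` steps. Returns the step numbers at which a line was printed."""
    start = time.perf_counter()
    reported = []
    for n in range(nsteps):
        if n % every == 0:
            elapsed = time.perf_counter() - start
            print("Step", n, "elapsed %.2f s" % elapsed)
            reported.append(n)
        step(n)                      # the real work for step n is done here

    return reported

# Toy calculation standing in for a long physics computation
squares = []
reported = run_with_progress(5000, 1000, lambda n: squares.append(n*n))
```

In a real program the lambda would be replaced by a function that carries out one step of the calculation.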
Good programming, like good science, is a matter of creativity as well as tech-
nical skill. As you gain more experience with programming you will no doubt
develop your own programming style and learn to write code in a way that
makes sense to you and others, creating programs that achieve their scientific
goals quickly and elegantly.

³⁴ Under certain circumstances, you do not need to use a backslash. Python automatically treats a line as continuing onto the next one if the line ends while a parenthesis, square bracket, or brace is still open, so a long expression enclosed in parentheses can be split without any backslash. This, however, is one more rule to remember, and there are no adverse consequences to using a backslash when it’s not strictly needed, so in most cases it is simpler just to use the backslash and not worry about the rules.
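The two ways of continuing a long line can be checked against each other. The first uses a backslash; the second uses no backslash at all and instead wraps the expression in parentheses, so Python continues the line while the bracket is open. The values of mass, vx, vy, g, y, moment_of_inertia, and omega below are made-up numbers chosen only for illustration:

```python
# Made-up values, for illustration only
mass = 2.0               # kg
vx, vy = 3.0, 4.0        # m/s
g = 9.81                 # m/s^2
y = 10.0                 # m
moment_of_inertia = 0.5  # kg m^2
omega = 6.0              # rad/s

# Version 1: explicit continuation with a backslash
energy1 = mass*(vx**2 + vy**2)/2 + mass*g*y \
          + moment_of_inertia*omega**2/2

# Version 2: implicit continuation inside parentheses (no backslash needed)
energy2 = (mass*(vx**2 + vy**2)/2 + mass*g*y
           + moment_of_inertia*omega**2/2)

print(energy1 == energy2)   # both lines describe the same single formula
```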

CHAPTER 3

GRAPHICS AND VISUALIZATION

SO FAR we have created programs that print out words and numbers, but often
we will also want our programs to produce graphics, meaning pictures
of some sort. In this chapter we will see how to produce the two main types
of computer graphics used in physics. First, we look at that most common of
scientific visualizations, the graph: a depiction of numerical data displayed on
calibrated axes. Second, we look at methods for making scientific diagrams
and animations: depictions of the arrangement or motion of the parts of a
physical system, which can be useful in understanding the structure or
behavior of the system.

3.1 GRAPHS
A number of Python packages include features for making graphs. In this
book we will use the powerful, easy-to-use, and popular package pylab.¹ The
package contains features for generating graphs of many different types. We
will concentrate on three types that are especially useful in physics: ordinary
line graphs, scatter plots, and density (or heat) plots. We start by looking at
line graphs.²

¹ The name pylab is a reference to the scientific calculation program Matlab, whose graph-drawing features pylab is intended to mimic. The pylab package is a part of a larger package called matplotlib, some of whose features can occasionally be useful in computational physics, although we will use only the ones in pylab in this book. If you’re interested in the other features of matplotlib, take a look at the on-line documentation, which can be found at matplotlib.sourceforge.net.
² The pylab package can also make contour plots, polar plots, pie charts, histograms, and more, and all of these find occasional use in physics. If you find yourself needing one of these more specialized graph types, you can find instructions for making them in the on-line documentation at matplotlib.sourceforge.net.

To create an ordinary graph in Python we use the function plot from the
pylab package. In the simplest case, this function takes one argument, which
is a list or array of the values we want to plot. The function creates a graph of
the given values in the memory of the computer, but it doesn’t actually display
it on the screen of the computer—it’s stored in the memory but not yet visible
to the computer user. To display the graph we use a second function from
pylab, the show function, which takes the graph in memory and draws it on
the screen. Here is a complete program for plotting a small graph:

from pylab import plot,show

y = [ 1.0, 2.4, 1.7, 0.3, 0.6, 1.8 ]
plot(y)
show()

After importing the two functions from pylab, we create the list of values to be
plotted, create a graph of those values with plot(y), then display that graph
on the screen with show(). Note that show() has brackets after it—it is a func-
tion that has no argument, but the brackets still need to be there.
If we run the program above, it produces a new window on the screen with
a graph in it like this:

[Figure: line graph of the six values in y, plotted at x = 0 to 5 and joined by straight lines]

The computer has plotted the values in the list y at unit intervals along the x-
axis (starting from zero in the standard Python style) and joined them up with
straight lines.

While it’s better than nothing, this is not a very useful kind of graph for
physics purposes. Normally we want to specify both the x- and y-coordinates
for the points in the graph. We can do this using a plot statement with two list
arguments, thus:

from pylab import plot,show

x = [ 0.5, 1.0, 2.0, 4.0, 7.0, 10.0 ]
y = [ 1.0, 2.4, 1.7, 0.3, 0.6, 1.8 ]
plot(x,y)
show()

which produces a graph like this:

[Figure: line graph of y against x, with x running from 0 to 10]

The first of the two lists now specifies the x-coordinates of each point, the sec-
ond the y-coordinates. The computer plots the points at the given positions
and then again joins them with straight lines. The two lists must have the
same number of entries, as here. If they do not, you’ll get an error message
and no graph.
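Both points, the implicit x-coordinates of the one-argument form and the requirement that the two lists match, can be made explicit in a few lines. The helper check_xy is hypothetical and is not part of pylab. It simply returns the (x, y) pairs that would be plotted, and fails loudly when the lengths differ:

```python
def check_xy(x, y):
    """Hypothetical helper: return the (x, y) pairs to be plotted, or raise
    ValueError if the two lists have different lengths."""
    if len(x) != len(y):
        raise ValueError("x has %d entries but y has %d" % (len(x), len(y)))
    return list(zip(x, y))

x = [ 0.5, 1.0, 2.0, 4.0, 7.0, 10.0 ]
y = [ 1.0, 2.4, 1.7, 0.3, 0.6, 1.8 ]
print(check_xy(x, y)[0])               # (0.5, 1.0)

# With one argument, plot(y) places the points at x = 0, 1, 2, ...
implicit_x = list(range(len(y)))
print(check_xy(implicit_x, y)[-1])     # (5, 1.8)

try:
    check_xy(x, y[:-1])                # one y value missing
except ValueError as err:
    print("Error:", err)               # Error: x has 6 entries but y has 5
```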
Why do we need two commands, plot and show, to make a graph? In the
simple examples above it seems like it would be fine to combine the two into
a single command that both creates a graph and shows it on the screen. How-
ever, there are more complicated situations where it is useful to have separate
commands. In particular, in cases where we want to plot two or more different
curves on the same graph, we can do so by using the plot function two or
more times, once for each curve. Then we use the show function once to make
a single graph with all the curves. We will see examples of this shortly.
Once you have displayed a graph on the screen you can do other things
with it. You will notice a number of buttons along the bottom of the window
in which the graph appears (not shown in the figures here, but you will see
them if you run the programs on your own computer). Among other things,
these buttons allow you to zoom in on portions of the graph, move your view
around the graph, or save the graph as an image file on your computer. You
can also save the graph in “PostScript” format, which you can then print out
on a printer or insert as a figure in a word processor document.
Let us apply the plot and show functions to the creation of a slightly more
interesting graph, a graph of the function sin x from x = 0 to x = 10. To do this
we first create an array of the x values, then we take the sines of those values
to get the y-coordinates of the points:

from pylab import plot,show
from numpy import linspace,sin

x = linspace(0,10,100)
y = sin(x)
plot(x,y)
show()

Notice how we used the linspace function from numpy (see Section 2.5) to gen-
erate the array of x-values, and the sin function from numpy, which is a special
version of sine that works with arrays—it just takes the sine of every element
in the array. (We could alternatively have used the ordinary sin function from
the math package and taken the sines of each element individually using a for
loop, or using list(map(sin,x)); in Python 3, map on its own returns an iterator
rather than a list. As is often the case, there’s more than one way to
do the job.)
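The alternatives mentioned in the parenthetical can be compared directly. To keep the sketch free of numpy, it defines its own small linspace stand-in (an assumption for illustration, not numpy's implementation) and uses math.sin. All three approaches produce the same list of values:

```python
from math import sin

def linspace(start, stop, num):
    """Pure-Python stand-in for numpy.linspace (num >= 2): num evenly
    spaced values from start to stop, both endpoints included."""
    step = (stop - start) / (num - 1)
    values = [start + i*step for i in range(num)]
    values[-1] = stop            # make the final point exactly equal to stop
    return values

x = linspace(0, 10, 100)

# 1. A for loop that builds the list one element at a time
y_loop = []
for value in x:
    y_loop.append(sin(value))

# 2. A list comprehension
y_comp = [sin(value) for value in x]

# 3. map; in Python 3 map returns an iterator, so wrap it in list()
y_map = list(map(sin, x))

print(y_loop == y_comp == y_map)   # True
```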
If we run this program we get the classic sine curve graph shown in Fig. 3.1.
Notice that we have not really drawn a curve at all here: our plot consists of a
finite set of points—a hundred of them in this case—and the computer draws
straight lines joining these points. So the end result is not actually curved; it’s
a set of straight-line segments. To us, however, it looks like a convincing sine
wave because our eyes are not sharp enough to see the slight kinks where the
segments meet. This is a useful and widely used trick for making curves in
computer graphics: choose a set of points spaced close enough together that
when joined with straight lines the result looks like a curve even though it
really isn’t.
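How close is close enough? For straight-line interpolation between points a distance h apart, the gap between a smooth curve and the segments is at most h²/8 times the largest magnitude of the curve's second derivative. For sin x that magnitude never exceeds 1. With 100 points from 0 to 10 we have h = 10/99 ≈ 0.1, so the gap is at most about 0.0013, less than a thousandth of the plot's vertical range of 2. The sketch below checks this numerically by sampling points inside every segment:

```python
from math import sin

n = 100                      # number of points, as in the text
a, b = 0.0, 10.0
h = (b - a) / (n - 1)        # spacing between neighboring points
xs = [a + i*h for i in range(n)]

# Largest gap between sin x and the straight segment joining neighboring
# points, sampled at 19 interior positions in each segment
max_error = 0.0
for i in range(n - 1):
    x0, x1 = xs[i], xs[i+1]
    y0, y1 = sin(x0), sin(x1)
    for k in range(1, 20):
        t = k / 20
        x = x0 + t*h
        line = y0 + t*(y1 - y0)          # height of the straight segment at x
        max_error = max(max_error, abs(sin(x) - line))

bound = h**2 / 8                          # interpolation bound, using |sin''(x)| <= 1
print(max_error, bound)
```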

Figure 3.1: Graph of the sine function. A simple graph of the sine function produced
by the program given in the text.

As another example of the use of the plot function, suppose we have some
experimental data in a computer file values.txt, stored in two columns, like
this:

0 12121.71
1 12136.44
2 12226.73
3 12221.93
4 12194.13
5 12283.85
6 12331.6
7 12309.25
...

We can make a graph of these data as follows:

from pylab import plot,show
from numpy import loadtxt

data = loadtxt("values.txt",float)
x = data[:,0]
y = data[:,1]
plot(x,y)
show()

Figure 3.2: Graph of data from a file. This graph was produced by reading two
columns of data from a file using the program given in the text.

In this example we have used the loadtxt function from numpy (see Section
2.4.3) to read the values in the file and put them in an array and then we have
used Python’s array slicing facilities (Section 2.4.5) to extract the first and sec-
ond columns of the array and put them in separate arrays x and y for plotting.
The end result is a plot as shown in Fig. 3.2.
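For comparison, here is a sketch of what the loadtxt-and-slice step amounts to, written with the standard library only. It is not how numpy implements loadtxt. The sample text is the first four lines of values.txt shown above, held in a string with io.StringIO so that the example runs without the file. The function name read_two_columns is our own:

```python
import io

sample = io.StringIO("""0 12121.71
1 12136.44
2 12226.73
3 12221.93
""")

def read_two_columns(f):
    """Return the first and second whitespace-separated columns of f as lists of floats."""
    x, y = [], []
    for line in f:
        fields = line.split()
        if not fields:               # skip blank lines
            continue
        x.append(float(fields[0]))   # plays the role of data[:,0]
        y.append(float(fields[1]))   # plays the role of data[:,1]
    return x, y

x, y = read_two_columns(sample)
print(x)   # [0.0, 1.0, 2.0, 3.0]
```

With the real file one would write `with open("values.txt") as f: x, y = read_two_columns(f)`.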
In fact, it’s not necessary in this case to use the separate arrays x and y. We
could shorten the program by saying instead

data = loadtxt("values.txt",float)
plot(data[:,0],data[:,1])
show()

which achieves the same result. (Arguably, however, this program is more
difficult to read. As we emphasized in Section 2.7, it is a good idea to make
programs easy to read where possible, so you might, in this case, want to use
the extra arrays x and y even though they are not strictly necessary.)
An important point to notice about all of these examples is that the program
stops when it displays the graph. To be precise it stops when it gets to the
show function. Once you use show to display a graph, the program will go no
further until you close the window containing the graph. Only once you close
the window will the computer proceed with the next line of your program.
The function show is said to be a blocking function—it blocks the progress of
the program until the function is done with its job. We have seen one other
example of a blocking function previously, the function input, which collects
input from the user at the keyboard. It too halts the running of the program
until its job is done. (The blocking action of the show function has little impact
in the programs above, since the show statement is the last line of the program
in each case. But in more complex examples there might be further lines
after the show statement, and their execution would be delayed until the graph
window was closed.)
A useful trick that we will employ frequently in this book is to build the lists
of x- and y-coordinates for a graph step by step as we go through a calculation.
It will happen often that we do not know all of the x or y values for a graph
ahead of time, but work them out one by one as part of some calculation we
are doing. In that case, a good way to create a graph of the results is to start
with two empty lists for the x- and y-coordinates and add points to them one
by one, as we calculate the values. Going back to the sine wave example, for
instance, here is an alternative way to make a graph of sin x that calculates the
individual values one by one and adds them to a growing list:

from pylab import plot,show
from math import sin
from numpy import linspace

xpoints = []
ypoints = []
for x in linspace(0,10,100):
    xpoints.append(x)
    ypoints.append(sin(x))

plot(xpoints,ypoints)
show()

If you run it, this program produces a picture of a sine wave identical to the
one in Fig. 3.1 on page 90. Notice how we created the two empty lists and
then appended values to the end of each one, one by one, using a for loop. We
will use this technique often. (See Section 2.4.1 for a discussion of the append
function.)
The graphs we have seen so far are very simple, but there are many extra
features we can add to them, some of which are illustrated in Fig. 3.3. For
instance, in all the previous graphs the computer chose the range of x and
y values for the two axes. Normally the computer makes good choices, but
occasionally you might like to make different ones. In our picture of a sine
wave, Fig. 3.1, for instance, you might decide that the graph would be clearer
if the curve did not butt right up against the top and bottom of the frame—
a little more space at top and bottom would be nice. You can override the
computer’s choice of x- and y-axis limits with the functions xlim and ylim.
These functions take two arguments each, for the lower and upper limits of
the range of the respective axes. Thus, for instance, we might modify our sine
wave program as follows:

from pylab import plot,ylim,show
from numpy import linspace,sin

x = linspace(0,10,100)
y = sin(x)
plot(x,y)
ylim(-1.1,1.1)
show()

The resulting graph is shown in Fig. 3.3a and, as we can see, it now has a little
extra space above and below the curve because the y-axis has been modified to
run from −1.1 to +1.1. Note that the ylim statement has to come after the plot
statement but before the show statement—the plot statement has to create the
graph first before you can modify its axes.
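Rather than typing the limits by hand, one can compute them from the data with a margin. The helper padded_limits below is hypothetical, and pylab's own automatic choice of limits works differently. It widens the data range by a chosen fraction at each end. For the sine wave with a 5% margin it gives roughly the values −1.1 and +1.1 used above:

```python
from math import sin

def padded_limits(values, fraction=0.05):
    """Hypothetical helper: return (low, high) axis limits that extend the
    data range by `fraction` of its width at each end."""
    lo, hi = min(values), max(values)
    pad = fraction * (hi - lo)
    return lo - pad, hi + pad

# Same sample points as the sine-wave program: 100 points from 0 to 10
x = [10*i/99 for i in range(100)]
y = [sin(v) for v in x]
low, high = padded_limits(y)
print(low, high)   # close to -1.1 and 1.1
```

In the program above one could then write ylim(low,high) in place of ylim(-1.1,1.1). If all the values are equal, the pad is zero, so a constant data set would need separate handling.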
It’s good practice to label the axes of your graphs, so that you and anyone
else will know what they represent. You can add labels to the x- and y-axes with
the functions xlabel and ylabel, which take a string argument—a string of
letters or numbers in quotation marks. Thus we could again modify our sine
wave program above, changing the final lines to say:

plot(x,y)
ylim(-1.1,1.1)
xlabel("x axis")
ylabel("y = sin x")
show()

which produces the graph shown in Fig. 3.3b.

Figure 3.3: Graph styles. Several different versions of the same sine wave plot. (a) A basic graph, but with a little
extra space added above and below the curve to make it clearer; (b) a graph with labeled axes; (c) a graph with the
curve replaced by circular dots; (d) sine and cosine curves on the same graph.


You can also vary the style in which the computer draws the curve on the
graph. To do this a third argument is added to the plot function, which takes
the form of a (slightly cryptic) string of characters, like this:

plot(x,y,"g--")

The first letter of the string tells the computer what color to draw the curve
with. Allowed letters are r, g, b, c, m, y, k, and w, for red, green, blue, cyan,
magenta, yellow, black, and white, respectively. The remainder of the string
says what style to use for the line. Here there are many options, but the ones
we’ll use most often are “-” for a solid line (like the ones we’ve seen so far),
“--” for a dashed line, “o” to mark points with a circle (but not connect them
with lines), and “s” to mark points with a square. Thus, for example, this
modification:

plot(x,y,"ko")
ylim(-1.1,1.1)
xlabel("x axis")
ylabel("y = sin x")
show()

tells the computer to plot our sine wave as a set of black circular points. The
result is shown in Fig. 3.3c.
Finally, we will often need to plot more than one curve or set of points on
the same graph. This can be achieved by using the plot function repeatedly.
For instance, here is a complete program that plots both the sine function and
the cosine function on the same graph, one as a solid curve, the other as a
dashed curve:

from pylab import plot,ylim,xlabel,ylabel,show
from numpy import linspace,sin,cos

x = linspace(0,10,100)
y1 = sin(x)
y2 = cos(x)
plot(x,y1,"k-")
plot(x,y2,"k--")
ylim(-1.1,1.1)
xlabel("x axis")
ylabel("y = sin x or y = cos x")
show()

The result is shown in Fig. 3.3d. You could also, for example, use a variant of
the same trick to make a plot that had both dots and lines for the same data—
just plot the data twice on the same graph, using two plot statements, one
with dots and one with lines.
There are many other variations and styles available in the pylab package.
You can add legends and annotations to your graphs. You can change the color,
size, or typeface used in the labels. You can change the color or style of the axes,
or add a background color to the graph. These and many other possibilities are
described in the on-line documentation at matplotlib.sourceforge.net.

Exercise 3.1: Plotting experimental data

In the on-line resources³ you will find a file called sunspots.txt, which contains the
observed number of sunspots on the Sun for each month since January 1749. The file
contains two columns of numbers, the first being the month and the second being the
sunspot number.
a) Write a program that reads in the data and makes a graph of sunspots as a
function of time.
b) Modify your program to display only the first 1000 data points on the graph.
c) Modify your program further to calculate and plot the running average of the
data, defined by

$$ Y_k = \frac{1}{2r} \sum_{m=-r}^{r} y_{k+m}, $$

where r = 5 in this case (and the $y_k$ are the sunspot numbers). Have the program
plot both the original data and the running average on the same graph, again
over the range covered by the first 1000 data points.
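The running-average formula in part (c) can be sketched in plain Python as below. The function name running_average is our own. The code follows the formula exactly as printed, with prefactor 1/(2r). Note that the sum contains 2r+1 terms, so dividing by 2r+1 instead would give a strict arithmetic mean. $Y_k$ is computed only for k from r to N−1−r, where every $y_{k+m}$ exists, and the formula does not say how to handle points nearer the ends:

```python
def running_average(y, r):
    """Y_k = (1/(2r)) * sum_{m=-r}^{r} y[k+m], for every k where all y[k+m] exist.

    Returns (ks, Y): the indices k and the corresponding averages.
    """
    ks, Y = [], []
    for k in range(r, len(y) - r):
        total = sum(y[k+m] for m in range(-r, r+1))   # 2r+1 terms
        ks.append(k)
        Y.append(total / (2*r))
    return ks, Y

# Tiny check with made-up data: a constant sequence of 20 values
ks, Y = running_average([3.0]*20, 5)
print(ks[0], ks[-1], Y[0])   # 5 14 3.3
```

The check shows the effect of the 1/(2r) prefactor: a constant sequence of 3s averages to 3.3 (that is, 11 × 3/10) rather than 3.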

3.2 SCATTER PLOTS

In an ordinary graph, such as those of the previous section, there is one
independent variable, usually placed on the horizontal axis, and one
dependent variable, on the vertical axis. The graph is a visual representation of the
variation of the dependent variable as a function of the independent one:
voltage as a function of time, say, or temperature as a function of position. In
other cases, however, we measure or calculate two dependent variables. A
classic example in physics is the temperature and brightness (also called the
magnitude) of stars. Typically we might measure temperature and magnitude
for each star in a given set and we would like some way to visualize how the
two quantities are related. A standard approach is to use a scatter plot, a graph
in which the two quantities are placed along the axes and we make a dot on
the plot for each pair of measurements, i.e., for each star in this case.

³ The on-line resources for this book can be downloaded in the form of a single “zip” file from http://www.umich.edu/~mejn/cpresources.zip.
There are two different ways to make a scatter plot using the
pylab package. One of them we have already seen: we can make
an ordinary graph, but with dots rather than lines to represent
the data points, using a statement of the form:

plot(x,y,"ko")

This will place a black dot at each point. A slight variant of the
same idea is this:

plot(x,y,"k.")

which will produce smaller dots.

[Figure: A small scatter plot.]

Alternatively, pylab provides the function scatter, which is
designed specifically for making scatter plots. It works in a
similar fashion to the plot function: you give it two lists or arrays,
one containing the x-coordinates of the points and the other containing the
y-coordinates, and it creates the corresponding scatter plot:

scatter(x,y)

You do not have to give a third argument telling scatter to plot the data
as dots—all scatter plots use dots automatically. As with the plot function,
scatter only creates the scatter plot in the memory of the computer but does
not display it on the screen. To display it you need to use the function show.
Suppose, for example, that we have the temperatures and magnitudes of a
set of stars in a file called stars.txt on our computer, like this:

4849.4 5.97
5337.8 5.54
4576.1 7.72
4792.4 7.18
5141.7 5.92
6202.5 4.13
...

The first column is the temperature and the second is the magnitude. Here’s a
Python program to make a scatter plot of these data:

File: hrdiagram.py

from pylab import scatter,xlabel,ylabel,xlim,ylim,show
from numpy import loadtxt

data = loadtxt("stars.txt",float)
x = data[:,0]
y = data[:,1]

scatter(x,y)
xlabel("Temperature")
ylabel("Magnitude")
xlim(0,13000)
ylim(-5,20)
show()

If we run this program it produces the figure shown in Fig. 3.4.


Many of the same variants illustrated in Fig. 3.3 for the plot function work
for the scatter function also. In this program we used xlabel and ylabel
to label the temperature and magnitude axes, and xlim and ylim to set the
ranges of the axes. You can also change the size and style of the dots and
many other things. In addition, as with the plot function, you can use scatter
two or more times in succession to plot two or more sets of data on the same
graph, or you can use any combination of scatter and plot functions to draw
scatter data and curves on the same graph. Again, see the on-line manual at
matplotlib.sourceforge.net for more details.
The scatter plot of the magnitudes and temperatures in Fig. 3.4 reveals an
interesting pattern in the data: a substantial majority of the points lie along a
rough band running from top left to bottom right of the plot. This is the so-
called main sequence to which most stars belong. Rarer types of stars, such as
red giants and white dwarfs, stand out in the figure as dots that lie well off
the main sequence. A scatter plot of stellar magnitude against temperature is
called a Hertzsprung–Russell diagram after the astronomers who first drew it.
The diagram is one of the fundamental tools of stellar astrophysics.
In fact, Fig. 3.4 is, in a sense, upside down, because the Hertzsprung–
Russell diagram is, for historical reasons,⁴ normally plotted with both the
magnitude and temperature axes decreasing, rather than increasing. One of the nice
things about pylab is that it is easy to change this kind of thing with just a
small modification of the Python program. All we need to do in this case is
change the xlim and ylim statements so that the start and end points of each
axis are reversed, thus:

xlim(13000,0)
ylim(20,-5)

Then the figure will be magically turned around.

Figure 3.4: The Hertzsprung–Russell diagram. A scatter plot of the magnitude
(i.e., brightness) of stars against their approximate surface temperature (which is
estimated from the color of the light they emit). Each dot on the plot represents one star
out of a catalog of 7860 stars that are close to our solar system.

⁴ The magnitude of a star is defined in such a way that it actually increases as the star gets fainter, so reversing the vertical axis makes sense since it puts the brightest stars at the top. The temperature axis is commonly plotted not directly in terms of temperature but in terms of the so-called color index, which is a measure of the color of light a star emits, which is in turn a measure of temperature. Temperature decreases with increasing color index, which is why the standard Hertzsprung–Russell diagram has temperature decreasing along the horizontal axis.

3.3 DENSITY PLOTS

There are many times in physics when we need to work with two-dimensional
grids of data. A condensed matter physicist might measure variations in charge
or temperature or atomic deposition on a solid surface; a fluid dynamicist
might measure the heights of waves in a ripple tank; a particle physicist might
measure the distribution of particles incident on an imaging detector; and so
on. Two-dimensional data is harder to visualize on a computer screen than
the one-dimensional lists of values that go into an ordinary graph. But one
tool that is helpful in many cases is the density plot, a two-dimensional plot
where color or brightness is used to indicate data values. Figure 3.5 shows an
example.
In Python, density plots are produced by the function
imshow from pylab. Here’s the program that produced
Fig. 3.5:

from pylab import imshow,show
from numpy import loadtxt

data = loadtxt("circular.txt",float)
imshow(data)
show()

The file circular.txt contains a simple array of values,
like this:

0.0050 0.0233 0.0515 0.0795 0.1075 ...
0.0233 0.0516 0.0798 0.1078 0.1358 ...
0.0515 0.0798 0.1080 0.1360 0.1639 ...
0.0795 0.1078 0.1360 0.1640 0.1918 ...
0.1075 0.1358 0.1639 0.1918 0.2195 ...
...    ...    ...    ...    ...

Figure 3.5: An example of a density plot.

The program reads the values in the file and puts them in
the two-dimensional array data using the loadtxt function, then creates the
density plot with the imshow function and displays it with show. The computer
automatically adjusts the color-scale so that the picture uses the full range of
available shades.
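The automatic color scaling amounts, by default, to a linear rescaling: the smallest value in the array is mapped to one end of the color scheme, the largest to the other, and everything in between is placed proportionally. A minimal sketch of that rescaling in plain Python is shown below. It uses the top-left 3×3 corner of circular.txt as sample data, and the function name rescale is our own:

```python
def rescale(grid):
    """Map the values of a 2D list linearly onto [0, 1]:
    the minimum goes to 0 and the maximum to 1."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]

# Top-left corner of circular.txt, as listed in the text
corner = [[0.0050, 0.0233, 0.0515],
          [0.0233, 0.0516, 0.0798],
          [0.0515, 0.0798, 0.1080]]
scaled = rescale(corner)
print(scaled[0][0], scaled[2][2])   # 0.0 1.0
```

A grid whose values are all equal would make hi − lo zero, so that case would need separate handling.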
The computer also adds numbered axes along the sides of the figure, which
measure the rows and columns of the array, though it is possible to change
the calibration of the axes to use different units (we’ll see how to do this in
a moment). The image produced is a direct picture of the array, laid out in
the usual fashion for matrices, row by row, starting at the top and working
downwards. Thus the top left corner in Fig. 3.5 represents the value stored in
the array element data[0,0], followed to the right by data[0,1], data[0,2],
and so on. Immediately below those, the next row is data[1,0], data[1,1],
data[1,2], and so on.
Note that the numerical labels on the axes reflect the array indices, with
the origin of the figure being at the top left and the vertical axis increasing
downwards. While this is natural from the point of view of matrices, it is a
little odd for a graph. Most of us are accustomed to graphs whose vertical axes
increase upwards. What’s more, the array elements data[i,j] are written
(as is the standard with matrices) with the row index first—i.e., the vertical
index—and the column, or horizontal, index second. This is the opposite of the
convention normally used with graphs where we list the coordinates of a point
in x, y order—i.e., horizontal first, then vertical. There’s nothing much we can
do about the conventions for matrices: they are the ones that mathematicians
settled upon centuries ago and it’s too late to change them now. But the conflict
between those conventions and the conventions used when plotting graphs
can be confusing, so take this opportunity to make a mental note.
In fact, Python provides a way to deal with the first problem, of the origin
in a density plot being at the top. You can include an additional argument with
the imshow function thus:

imshow(data,origin="lower")

which flips the density plot top-to-bottom, putting the array element data[0,0]
in the lower left corner, as is conventional, and changing the labeling of the ver-
tical axis accordingly, so that it increases in the upward direction. The resulting
plot is shown in Fig. 3.6a. We will use this trick for most of the density plots
in this book. Note, however, that this does not fix our other problem: indices
i and j for the element data[i,j] still correspond to vertical and horizontal
positions respectively, not the reverse. That is, the index i corresponds to the y-
coordinate and the index j corresponds to the x-coordinate. You need to keep
this in mind when making density plots—it’s easy to get the axes swapped by
mistake.
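Because the index bookkeeping is easy to get wrong, a small example may help. The sketch below fills a grid so that data[i][j] holds the value of a function at x = j and y = i. The function f(x, y) = x + 10y is hypothetical, chosen so that the digits show which index is which. The sketch also shows that origin="lower" amounts to drawing the rows in reverse order, so the row at the top of a lower-origin plot is the last row of the array. Nested lists are used here, so data[i][j] plays the role of the numpy element data[i,j]:

```python
def f(x, y):
    """Hypothetical test function: tens digit = y, units digit = x."""
    return x + 10*y

nrows, ncols = 3, 4      # 3 values of y (rows), 4 values of x (columns)
data = [[f(j, i) for j in range(ncols)] for i in range(nrows)]
# data[i][j]: the first index i is the vertical (y) position,
# the second index j is the horizontal (x) position

# Rows listed from the top of the picture to the bottom
top_to_bottom_upper = data          # default: row 0 drawn at the top
top_to_bottom_lower = data[::-1]    # origin="lower": row 0 drawn at the bottom

for row in top_to_bottom_lower:
    print(row)
# [20, 21, 22, 23]
# [10, 11, 12, 13]
# [0, 1, 2, 3]
```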
Figure 3.6: Density plots. Four different versions of the same density plot. (a) A plot using the default “heat map”
color scheme, which is colorful on the computer screen but doesn’t make much sense with the black-and-white
printing in this book. (b) The gray color scheme, which runs from black for the lowest values to white for the
highest. (c) The same plot as in panel (b) but with the calibration of the axes changed. Because the range chosen
is different for the horizontal and vertical axes, the computer has altered the shape of the figure to keep distances
equal in all directions. (d) The same plot as in (c) but with the horizontal range reduced so that only the middle
portion of the data is shown.

The black-and-white printing in this book doesn’t really do justice to the
density plot in Fig. 3.6a. The original is in bright colors, ranging through the
spectrum from dark blue for the lowest values to red for the highest. If you
wish, you can run the program for yourself to see the density plot in its full
glory; copies of the program, which is called circular.py, and the data file
circular.txt can be found in the on-line resources. Density plots with this
particular choice of colors from blue to red (or similar) are sometimes called
heat maps, because the same color scheme is often used to denote temperature,
with blue being the coldest temperature and red being the hottest.⁵ The heat
map color scheme is the default choice for density plots in pylab, but it’s
not always the best. In fact, for most purposes, a simple gray-scale from black
to white is easier to read. Luckily, it’s simple to change the color scheme. To
change to gray-scale, for instance, you use the function gray, which takes no
arguments:⁶

from pylab import imshow,gray,show
from numpy import loadtxt

data = loadtxt("circular.txt",float)
imshow(data,origin="lower")
gray()
show()

Figure 3.6b shows the result. Even in black-and-white it looks somewhat dif-
ferent from the heat-map version in panel (a), and on the screen it looks entirely
different. Try it if you like.
All of the density plots in this book use the gray scale (except Figs. 3.5
and 3.6a of course). It may not be flashy, but it’s informative, easy to read,
and suitable for printing on monochrome printers or for publications (like
many scientific books and journals) that are in black-and-white only. How-
ever, pylab provides many other color schemes, which you may find useful
occasionally. A complete list, with illustrations, is given in the on-line docu-
mentation at matplotlib.sourceforge.net, but here are a few that might find
use in physics:

⁵ It’s not completely clear why people use these colors. As every physicist knows, red light has the longest wavelength of the visible colors and corresponds to the coolest objects, while blue has the shortest wavelengths and corresponds to the hottest, the exact opposite of the traditional choices. The hottest stars, for instance, are blue and the coolest are red.
⁶ The function gray works slightly differently from other functions we have seen that modify plots, such as xlabel or ylim. Those functions modified only the current plot, whereas gray (and the other color scheme functions in pylab) changes the color scheme for all subsequent density plots. If you write a program that makes more than one plot, you only need to call gray once.

jet        The default heat-map color scheme
gray       Gray-scale running from black to white
hot        An alternative heat map that goes black-red-yellow-white
spectral   A spectrum with 7 clearly defined colors, plus black and white
bone       An alternative gray-scale with a hint of blue
hsv        A rainbow scheme that starts and ends with red

Each of these has a corresponding function, jet(), spectral(), and so forth,


that selects the relevant color scheme for use in future density plots. Many
more color schemes are given in pylab and one can also define one’s own
schemes, although the definitions involve some slightly tricky programming.
Example code is given in Appendix E and in the on-line resources to define
three additional schemes that can be useful for physics:7

redblue Runs from red to blue via black


redwhiteblue Runs from red to blue via white
inversegray Runs from white to black, the opposite of gray
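The Appendix E code is not reproduced here. As a rough illustration only, the sketch below builds a scheme in the spirit of redblue (red to black to blue) with matplotlib's LinearSegmentedColormap.from_list. The name redblue_cmap is an assumption, and this scheme would be selected through imshow's cmap argument rather than by calling a function such as redblue().

```python
from matplotlib.colors import LinearSegmentedColormap

# Hypothetical "redblue"-style scheme: red at the low end, black in the
# middle, blue at the high end, with colors interpolated linearly between.
redblue_cmap = LinearSegmentedColormap.from_list("redblue",
                                                 ["red", "black", "blue"])

# A colormap turns a number in [0,1] into an (r, g, b, alpha) tuple
low = redblue_cmap(0.0)    # red end of the scale
mid = redblue_cmap(0.5)    # very close to black
high = redblue_cmap(1.0)   # blue end of the scale
```

In a program you would then write something like imshow(data,origin="lower",cmap=redblue_cmap) followed by show().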

As with graphs and scatter plots, you can modify the appearance of den-
sity plots in various ways. The functions xlabel and ylabel work as before,
adding labels to the two axes. You can also change the scale marked on the
axes. By default, the scale corresponds to the elements of the array holding the
data, but you might want to calibrate your plot with a different scale. You can
do this by adding an extra parameter to imshow, like this:

imshow(data,origin="lower",extent=[0,10,0,5])

which results in a modified plot as shown in Fig. 3.6c. The argument consists of
“extent=” followed by a list of four values, which give, in order, the beginning
and end of the horizontal scale and the beginning and end of the vertical scale.
The computer will use these numbers to mark the axes, but the actual content
displayed in the body of the density plot remains unchanged—the extent ar-
gument affects only how the plot is labeled. This trick can be very useful if you
want to calibrate your plot in “real” units. If the plot is a picture of the surface
of the Earth, for instance, you might want axes marked in units of latitude and
longitude; if it’s a picture of a surface at the atomic scale you might want axes
marked in nanometers.
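To see what the extent calibration does to individual array elements, it helps to work out where each element lands on the new scale. The sketch below assumes, as matplotlib's imshow does, that the four extent values are the outer edges of the image, so with origin="lower" column j is centered at x0 + (j + 1/2)(x1 − x0)/cols and row i at y0 + (i + 1/2)(y1 − y0)/rows. The helper name pixel_centers is hypothetical.

```python
import numpy as np

def pixel_centers(extent, shape):
    """Coordinates of element centers for imshow(..., origin="lower",
    extent=extent) applied to an array with the given (rows, cols) shape."""
    x0, x1, y0, y1 = extent
    rows, cols = shape
    dx = (x1 - x0)/cols                     # width of one column
    dy = (y1 - y0)/rows                     # height of one row
    xc = x0 + (np.arange(cols) + 0.5)*dx    # column index j -> horizontal
    yc = y0 + (np.arange(rows) + 0.5)*dy    # row index i -> vertical
    return xc, yc

# A 5 x 10 array drawn with the extent used in the text
xc, yc = pixel_centers([0, 10, 0, 5], (5, 10))
```

Here the ten columns are centered at 0.5, 1.5, ..., 9.5 and the five rows at 0.5, ..., 4.5, while the data values themselves are untouched.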

7
To use these color schemes copy the file colormaps.py from the on-line resources into the
folder containing your program and then in your program say, for example, “from colormaps
import redblue”. Then the statement “redblue()” will switch to the redblue color map.

3.3 | D ENSITY PLOTS

Note also that in Fig. 3.6c the computer has changed the shape of the plot—
its aspect ratio—to accommodate the fact that the horizontal and vertical axes
have different ranges. The imshow function attempts to make unit distances
equal along the horizontal and vertical directions where possible. Sometimes,
however, this is not what we want, in which case we can tell the computer to
use a different aspect ratio. For instance, if we wanted the present figure to
remain square we would say:

imshow(data,origin="lower",extent=[0,10,0,5],aspect=2.0)

This tells the computer to use unit distances twice as large along the vertical
axis as along the horizontal one, which will make the plot square once more.
Note that, as here, we are free to use any or all of the origin, extent, and
aspect arguments together in the same function. We don’t have to use them
all if we don’t want to—any selection is allowed—and they can come in any
order.
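The value 2.0 above is simply the ratio of the horizontal range to the vertical range. With extent=[x0,x1,y0,y1] the plot is drawn (x1 − x0) units wide and aspect × (y1 − y0) of those units tall, so choosing aspect = (x1 − x0)/(y1 − y0) makes it square. A small helper for this, with a hypothetical name:

```python
def square_aspect(extent):
    """aspect value that makes an imshow plot with this extent square."""
    x0, x1, y0, y1 = extent
    return abs(x1 - x0)/abs(y1 - y0)

a = square_aspect([0, 10, 0, 5])   # the case in the text: gives 2.0
```

One would then write imshow(data,origin="lower",extent=e,aspect=square_aspect(e)) for any extent e.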
We can also limit our density plot to just a portion of the data, using the
functions xlim and ylim, just as with graphs and scatter plots. These func-
tions work with the scales specified by the extent argument, if there is one,
or with the row and column indices otherwise. So, for instance, we could say
xlim(2,8) to reduce the density plot of Fig. 3.6c to just the middle portion of
the horizontal scale, from 2 to 8. Figure 3.6d shows the result. Note that, unlike
the extent argument, which makes purely cosmetic changes to the labeling of the
axes, xlim and ylim actually change which data are displayed in the body of the
density plot.
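One way to see the difference is to do the cropping by hand. The hypothetical helper crop_columns below keeps only the array columns lying between xmin and xmax on the extent scale, roughly the data that xlim(xmin,xmax) leaves visible. It is a sketch that assumes xmin and xmax fall close to column boundaries and does no bounds checking.

```python
import numpy as np

def crop_columns(data, extent, xmin, xmax):
    """Keep only the columns of data between xmin and xmax, and return
    the cropped array together with its new extent."""
    x0, x1, y0, y1 = extent
    ncols = data.shape[1]
    dx = (x1 - x0)/ncols                   # width of one column
    jmin = int(round((xmin - x0)/dx))      # first column kept
    jmax = int(round((xmax - x0)/dx))      # one past the last column kept
    return data[:, jmin:jmax], [x0 + jmin*dx, x0 + jmax*dx, y0, y1]

data = np.arange(500.0).reshape(5, 100)    # made-up 5 x 100 test array
sub, sub_extent = crop_columns(data, [0, 10, 0, 5], 2, 8)
```

Drawing imshow(sub,origin="lower",extent=sub_extent) should then show approximately the same data as xlim(2,8) applied to the full plot.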
Finally, you can use the functions plot and scatter to superimpose graphs
or scatter plots of data on the same axes as a density plot. You can use any
combination of imshow, plot, and scatter in sequence, followed by show, to
create a single graph with density data, curves, or scatter data, all on the same
set of axes.
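As a sketch of such a combination, the function below (its name and the made-up data are assumptions) draws a density plot, a curve, and three scatter points on the same axes, and returns the axes so their contents can be inspected.

```python
from numpy import linspace, sin, outer
from pylab import figure, imshow, gray, plot, scatter, gca

def overlay_demo():
    """Density plot with a curve and scatter points on the same axes."""
    figure()
    x = linspace(0, 10, 100)
    data = outer(sin(x), sin(x))               # made-up density data
    imshow(data, origin="lower", extent=[0, 10, 0, 10])
    gray()
    plot(x, 5 + 2*sin(x), "r-")                # curve drawn over the image
    scatter([2, 5, 8], [2, 5, 8], color="b")   # scatter points on top
    return gca()

ax = overlay_demo()
```

Calling show() afterwards displays the combined figure. Using extent here lets the curve and points be given in the same units as the density plot's axes.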

EXAMPLE 3.1: WAVE INTERFERENCE

Suppose we drop a pebble in a pond and waves radiate out from the spot
where it fell. We could create a simple representation of the physics with a sine
wave, spreading out in a uniform circle, to represent the height of the waves at
some later time. If the center of the circle is at $(x_1, y_1)$ then the distance $r_1$ to the
center from a point $(x, y)$ is

$$ r_1 = \sqrt{(x - x_1)^2 + (y - y_1)^2} \tag{3.1} $$

and the sine wave for the height is

$$ \xi_1(x, y) = \xi_0 \sin k r_1, \tag{3.2} $$

where $\xi_0$ is the amplitude of the waves and $k$ is the wavevector, related to the
wavelength $\lambda$ by $k = 2\pi/\lambda$.
Now suppose we drop another pebble in the pond, creating another circular
set of waves with the same wavelength and amplitude but centered on a
different point $(x_2, y_2)$:

$$ \xi_2(x, y) = \xi_0 \sin k r_2 \quad\text{with}\quad r_2 = \sqrt{(x - x_2)^2 + (y - y_2)^2}. \tag{3.3} $$

Then, assuming the waves add linearly (which is a reasonable assumption for
water waves, provided they are not too big), the total height of the surface at a
point $(x, y)$ is

$$ \xi(x, y) = \xi_0 \sin k r_1 + \xi_0 \sin k r_2. \tag{3.4} $$

Suppose the wavelength of the waves is λ = 5 cm, the amplitude is 1 cm, and
the centers of the circles are 20 cm apart. Here is a program to make an image of
the height over a 1 m square region of the pond. To make the image we create
an array of values representing the height ξ at a grid of points and then use
that array to make a density plot. In this example we use a grid of 500 × 500
points to cover the 1 m square, which means the grid points have a separation
of 100/500 = 0.2 cm.

File: ripples.py

from math import sqrt,sin,pi
from numpy import empty
from pylab import imshow,gray,show

wavelength = 5.0        # Wavelength in cm
k = 2*pi/wavelength     # Wavevector in inverse cm
xi0 = 1.0               # Amplitude of the waves in cm
separation = 20.0       # Separation of centers in cm
side = 100.0            # Side of the square in cm
points = 500            # Number of grid points along each side
spacing = side/points   # Spacing of points in cm

# Calculate the positions of the centers of the circles
x1 = side/2 + separation/2
y1 = side/2
x2 = side/2 - separation/2
y2 = side/2

# Make an array to store the heights
xi = empty([points,points],float)

# Calculate the values in the array
for i in range(points):
    y = spacing*i          # row index i gives the vertical coordinate
    for j in range(points):
        x = spacing*j      # column index j gives the horizontal coordinate
        r1 = sqrt((x-x1)**2+(y-y1)**2)
        r2 = sqrt((x-x2)**2+(y-y2)**2)
        xi[i,j] = xi0*sin(k*r1) + xi0*sin(k*r2)

# Make the plot
imshow(xi,origin="lower",extent=[0,side,0,side])
gray()
show()

This is the longest and most involved program we have seen so far, so it may
be worth taking a moment to make sure you understand how it works. Note
in particular how the height is calculated and stored in the array xi. The vari-
ables i and j go through the rows and columns of the array respectively, and
from these we calculate the values of the coordinates x and y. Since, as dis-
cussed earlier, the rows correspond to the vertical axis and the columns to the
horizontal axis, the value of x is calculated from j and the value of y is calcu-
lated from i. Other than this subtlety, the program is a fairly straightforward
translation of Eqs. (3.1–3.4).8
If we run the program above, it produces the picture shown in Fig. 3.7. The
picture shows clearly the interference of the two sets of waves. The interference
fringes are visible as the gray bands radiating from the center.
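As an alternative sketch (not the program from the text), the same array of heights can be built without explicit loops using NumPy broadcasting: a column of y values and a row of x values combine into full grids, with row index i giving y and column index j giving x exactly as in ripples.py. The function name ripple_heights and its default arguments are assumptions chosen to match the values used above.

```python
import numpy as np

def ripple_heights(wavelength=5.0, xi0=1.0, separation=20.0,
                   side=100.0, points=500):
    """Height xi on a points x points grid, computed without loops."""
    k = 2*np.pi/wavelength
    spacing = side/points
    x1, y1 = side/2 + separation/2, side/2    # center of first circle
    x2, y2 = side/2 - separation/2, side/2    # center of second circle
    coords = spacing*np.arange(points)
    y = coords[:, np.newaxis]   # column vector: row index i gives y
    x = coords[np.newaxis, :]   # row vector: column index j gives x
    # Broadcasting expands these into full points x points arrays
    r1 = np.sqrt((x - x1)**2 + (y - y1)**2)
    r2 = np.sqrt((x - x2)**2 + (y - y2)**2)
    return xi0*np.sin(k*r1) + xi0*np.sin(k*r2)

xi = ripple_heights()
```

Passing this xi to imshow with extent=[0,100,0,100], followed by gray() and show(), should reproduce the picture of Fig. 3.7, typically much faster than the double loop.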

8
One other small detail is worth mentioning. We called the variable for the wavelength
“wavelength”. You might be tempted to call it “lambda” but if you did you would get an er-
ror message and the program would not run. The word “lambda” has a special meaning in the
Python language and cannot be used as a variable name, just as words like “for” and “if” cannot
be used as variable names. (See footnote 5 on page 13.) The names of other Greek letters—alpha,
beta, gamma, and so on—are allowed as variable names.


Figure 3.7: Interference pattern. This plot, produced by the program given in the text,
shows the superposition of two circular sets of sine waves, creating an interference
pattern with fringes that appear as the gray bars radiating out from the center of the
picture.

Exercise 3.2: There is a file in the on-line resources called stm.txt, which contains a
grid of values from scanning tunneling microscope measurements of the (111) surface
of silicon. A scanning tunneling microscope (STM) is a device that measures the shape
of a surface at the atomic level by tracking a sharp tip over the surface and measuring
quantum tunneling current as a function of position. The end result is a grid of values
that represent the height of the surface and the file stm.txt contains just such a grid of
values. Write a program that reads the data contained in the file and makes a density
plot of the values. Use the various options and variants you have learned about to make
a picture that shows the structure of the silicon surface clearly.


3.4 3D GRAPHICS

One of the flashiest applications of computers today is the creation of 3D graphics
and computer animation. In any given week millions of people flock to
cinemas worldwide to watch the latest computer-animated movie from the
big animation studios. 3D graphics and animation find a more humble, but
very useful, application in computational physics as a tool for visualizing the
behavior of physical systems. Python provides some excellent tools for this
purpose, which we’ll use extensively in this book.
There are a number of different packages available for graphics and anima-
tion in Python, but we will focus on the package visual, which is specifically
designed with physicists in mind. This package provides a way to create sim-
ple pictures and animations with a minimum of effort, but also has enough
power to handle complex situations when needed.
The visual package works by creating specified objects on the screen, such
as spheres, cylinders, cones, and so forth, and then, if necessary, changing their
position, orientation, or shape to make them move around. Here’s a short first
program using the package:

from visual import sphere

sphere()

When we run this program a window appears on the screen with a large sphere
in it.

The window of course is two-dimensional, but the computer stores the shape
and position of the sphere in three dimensions and automatically does a perspective
rendering of the sphere with a 3D look to it that aids the eye in understanding
the scene.
You can choose the size and position of the sphere like this:

sphere(radius=0.5,pos=[1.0,-0.2,0.0])

The radius is specified as a single number. The units are arbitrary and the
computer will zoom in or out as necessary to make the sphere visible. So you
can set the radius to 0.5 as here, or to $10^{-15}$ if you’re drawing a picture of a
proton. Either will work fine.
The position of the sphere is a three-dimensional vector, which you give as
a list or array of three real numbers x, y, z (we used a list in this case). The x-
and y-axes run to the right and upwards in the window, as normal, and the
z-axis runs directly out of the screen towards you. You can also specify the
position as a list or array of just two numbers, x and y, in which case Python
will assume the z-coordinate to be zero. This can be useful for drawing pictures
of two-dimensional systems, which have no z-coordinate.
You can also change the color of the sphere thus:

from visual import sphere,color

sphere(color=color.green)

Note how we have imported the object called color from the visual package;
individual colors are then referred to as color.green, color.red, and so forth. The
available colors are the same as those for drawing graphs with pylab: red,
green, blue, cyan, magenta, yellow, black, and white.9 The color argument
can be used at the same time as the radius and position arguments, so one
can control all features of the sphere at the same time.
We can also create several spheres, all in the same window on the screen,
by using the sphere function repeatedly, putting different spheres in different
places to build up an entire scene made of spheres. The following example
illustrates the idea.

9
All visible colors can be represented as mixtures of the primary colors red, green, and blue,
and this is how they are stored inside the computer. A “color” in the visual package is actually
just a list of three floating-point numbers giving the intensities of red, green, and blue (in that
order) on a scale of 0 to 1 each. Thus red is [ 1.0, 0.0, 0.0 ], yellow is [ 1.0, 1.0, 0.0 ],
and white is [ 1.0, 1.0, 1.0 ]. You can create your own colors if you want by writing things
like midgray = [ 0.5, 0.5, 0.5 ]. Then you can use “midgray” just like any other color. (You
would just say midgray, not color.midgray, because the color you defined is an ordinary variable,
not a part of the color object in visual.)


Figure 3.8: Visualization of atoms in a simple cubic lattice. A perspective rendering
of atoms in a simple cubic lattice, generated using the visual package and the program
lattice.py given in the text.

EXAMPLE 3.2: PICTURING AN ATOMIC LATTICE

Suppose we have a solid composed of atoms arranged on a simple cubic lattice.
We can visualize the arrangement of the atoms using the visual package by
creating a picture with many spheres at positions $(i, j, k)$ with $i, j, k = -L, \ldots, L$,
thus:

File: lattice.py

from visual import sphere

L = 5      # The lattice runs from -L to L along each axis
R = 0.3    # Radius of the spheres representing the atoms
for i in range(-L,L+1):
    for j in range(-L,L+1):
        for k in range(-L,L+1):
            sphere(pos=[i,j,k],radius=R)

Notice how this program has three nested for loops that run through all
combinations of the values of i, j, and k. Running the program produces the
picture shown in Fig. 3.8. Download the program and try it if you like.
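Because the three loops visit every combination of i, j, and k, the picture contains $(2L+1)^3$ spheres, which is 1331 for L = 5. The sketch below lists the same positions in the same order without drawing anything, so it runs even where the visual package is not installed; it uses itertools.product from the standard library, and the name positions is an assumption.

```python
from itertools import product

L = 5
# All combinations (i, j, k) with each index running from -L to L, in the
# same order as the nested loops of lattice.py (k varies fastest)
positions = [[i, j, k] for i, j, k in product(range(-L, L+1), repeat=3)]
```

Each entry p could then be handed to sphere(pos=p,radius=R) in a single loop.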
After running the program, you can rotate your view of the lattice to look
at it from different angles by moving the mouse while holding down either the
right mouse button or the Ctrl key on the keyboard (the Command key on a
Mac). You can also hold down both mouse buttons (if you have two), or the
Alt key (the Option key on a Mac) and move the mouse in order to zoom in
and out of the picture.

Exercise 3.3: Using the program from Example 3.2 above as a starting point, or starting
from scratch if you prefer, do the following:
a) A sodium chloride crystal has sodium and chlorine atoms arranged on a cubic
lattice but the atoms alternate between sodium and chlorine, so that each sodium
is surrounded by six chlorines and each chlorine is surrounded by six sodiums.
Create a visualization of the sodium chloride lattice using two different colors to
represent the two types of atoms.
b) The face-centered cubic (fcc) lattice, which is the most common lattice in naturally
occurring crystals, consists of a cubic lattice with atoms positioned not only at the
Atoms in the fcc lattice lie corners of each cube but also at the center of each face. Create a visualization of
at the corners and center an fcc lattice with a single species of atom (such as occurs in metallic iron, for
of each face of a cubic cell. instance).

It is possible to change the properties of a sphere after it is first created,
including its position, size, and color. When we do this the sphere will actually
move or change on the screen. In order to refer to a particular sphere on the
screen we must use a slightly different form of the sphere function to create it,
like this:

s = sphere()

This form, in addition to drawing a sphere on the computer screen, creates a
variable s in a manner similar to the way functions like zeros or empty create
arrays (see Section 2.4.2). The new variable s is a variable of type “sphere”,
in the same way that other variables are of type int or float. This is a spe-
cial variable type used only in the visual package to store the properties of
spheres. Each sphere variable corresponds to a sphere on the screen and when
we change the properties stored in the sphere variable the on-screen sphere
changes accordingly. Thus, for example, we can say

s.radius = 0.5


and the radius of the corresponding sphere on the screen will change to 0.5,
right before our eyes. Or we can say

s.color = color.blue

and the color will change. You can also change the position of a sphere in this
way, in which case the sphere will move on the screen. We will use this trick
in Section 3.5 to create animations of physical systems.
You can use variables of the sphere type in similar ways to other types of
variable. A useful trick, for instance, is to create an array of spheres thus:

from visual import sphere
from numpy import empty
s = empty(10,sphere)

This creates an array, initially empty, of ten sphere-type variables that you can
then fill with actual spheres thus:

for n in range(10):
s[n] = sphere()

As each sphere is created, a corresponding sphere will appear on the screen.
This technique can be useful if you are creating a visualization or animation
with many spheres and you want to be able to change the properties of any of
them at will. Exercise 3.4 involves exactly such a situation, and the trick above
would be a good one to use in solving that exercise.
Spheres are by no means the only shape one can draw. There is a large
selection of other elements provided by the visual package, including boxes,
cones, cylinders, pyramids, and arrows. Here are the functions that create each
of these objects:

from visual import box,cone,cylinder,pyramid,arrow

box(pos=[x,y,z], axis=[a,b,c], \
length=L, height=H, width=W, up=[q,r,s])
cone(pos=[x,y,z], axis=[a,b,c], radius=R)
cylinder(pos=[x,y,z], axis=[a,b,c], radius=R)
pyramid(pos=[x,y,z], size=[a,b,c])
arrow(pos=[x,y,z], axis=[a,b,c], \
headwidth=H, headlength=L, shaftwidth=W)

For a detailed explanation of the meaning of all the parameters, take a look at
the on-line documentation at www.vpython.org. In addition to the parameters
above, standard ones like color can also be used to give the objects a different
appearance. And each element has a corresponding variable type—box, cone,
cylinder, and so forth—that is used for storing and changing the properties
of elements after they are created.
Another useful feature of the visual package is the ability to change vari-
ous properties of the screen window in which your objects appear. You can, for
example, change the window’s size and position on the screen, you can change
the background color, and you can change the direction that the “camera” is
looking in. All of these things you do with the function display. Here is an
example:

from visual import display,color

display(x=100,y=100,width=600,height=600, \
        center=[5,0,0],forward=[0,0,-1], \
        background=color.blue,foreground=color.yellow)

This will produce a window 600 × 600 in size, where size is measured in pixels
(the small dots that make up the picture on a computer screen). The win-
dow will be 100 pixels in from the left and top of the screen. The argument
“center=[5,0,0]” sets the point in 3D space that will be in the center of the
window, and “forward=[0,0,-1]” chooses the direction in which we are looking.
Together, these two arguments determine the position and
direction of our view of the scene. The background color of the window will
be blue in this case and objects appearing in the window—the “foreground”—
will be yellow by default, although you can specify other colors for individual
objects in the manner described above for spheres.
(Notice also how we used the backslash character “\” in the code above to
indicate to the computer that a single logical line of code has been spread over
more than one line in the text of the program. We discussed this use of the
backslash previously in Section 2.7.)
The arguments for the display function can be in any order and you do
not have to include all of them. You need include only those you want. The
ones you don’t include have sensible default values. For example, the default
background color is black and the default foreground color is white, so if you
don’t specify any colors you get white objects on a black background.
As with the sphere function you can assign a variable to keep track of the
display window by writing, for example,

d = display(background=color.blue)


or even just

d = display()

This allows you to change display parameters later in your program. For in-
stance, you can change the background color to black at any time by writ-
ing “d.background = color.black”. Some parameters, however, cannot be
changed later, notably the size and position of the window, which are fixed
when the window is created (although you can change the size and position
manually by dragging the window around the screen with your mouse).
There are many other features of the visual package that are not listed
here, including features that allow one to make some very sophisticated graph-
ics. For more details take a look at www.vpython.org.

3.5 ANIMATION
As we have seen, the visual package allows you to change the properties of an
on-screen object, such as its size, color, orientation, or position. If you change
the position of an object repeatedly and rapidly, you can make the object ap-
pear to be moving and you have an animation. We will use such animations in
this book to help us understand the behavior of physical systems.
For example, to create a sphere and then change its position you could do
the following:

from visual import sphere

s = sphere(pos=[0,0,0])
s.pos = [1,4,3]

This will create a sphere at the origin, then move it to the new position (1, 4, 3).
This is not a very useful program, however. The computer is so fast
that you probably wouldn’t even see the sphere in its first position at the ori-
gin before it gets moved. To slow down movements to a point where they are
visible, visual provides a function called rate. Saying rate(x) tells the com-
puter to wait until 1/x of a second has passed since the last time you called
rate. Thus if you call rate(30) immediately before each change you make on
the screen, you will ensure that changes never get made more than 30 times a
second, which is very useful for making smooth animations.
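The internals of visual's rate are not described here. To make the idea concrete, here is a hypothetical stand-in written with only the standard time module: the wait(x) method pauses until at least 1/x of a second has passed since its previous call. The class name RateLimiter is an assumption and is not part of visual.

```python
import time

class RateLimiter:
    """Hypothetical stand-in for visual's rate(): wait(x) pauses until at
    least 1/x seconds have passed since the previous call to wait."""

    def __init__(self):
        self.last = None          # time of the previous call, if any

    def wait(self, x):
        if self.last is not None:
            remaining = 1.0/x - (time.monotonic() - self.last)
            if remaining > 0:
                time.sleep(remaining)
        self.last = time.monotonic()

limiter = RateLimiter()
start = time.monotonic()
for frame in range(10):
    limiter.wait(100)             # at most 100 passes per second
    # ... change the positions of on-screen objects here ...
elapsed = time.monotonic() - start
```

If the work between calls already takes longer than 1/x of a second, wait returns at once, so, like rate, this caps the speed of the loop rather than fixing it exactly.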


EXAMPLE 3.3: A MOVING SPHERE

Here is a program to move a sphere around on the screen:

File: revolve.py

from visual import sphere,rate
from math import cos,sin,pi
from numpy import arange

s = sphere(pos=[1,0,0],radius=0.1)
for theta in arange(0,10*pi,0.1):
    rate(30)               # At most 30 passes through the loop per second
    x = cos(theta)
    y = sin(theta)
    s.pos = [x,y,0]        # Move the sphere to its new position

Here the value of the angle variable theta increases by 0.1 radians every 30th
of a second, the rate function ensuring that we go around the for loop 30 times
each second. The angle is converted into Cartesian coordinates and used to update
the position of the sphere. The net result, if we run the program, is that
a sphere appears on the screen and moves around in a circle. Download the
program and try it if you like. This simple animation could be the basis, for in-
stance, for an animation of the simultaneous motions of the planets of the solar
system. Exercise 3.4 below invites you to create exactly such an animation.
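It can help to check the timing of an animation before running it. The short calculation below uses the numbers from revolve.py: the loop makes one pass per value of theta, and rate(30) allows at most 30 passes per second.

```python
from math import pi
from numpy import arange

thetas = arange(0, 10*pi, 0.1)   # the angles used in revolve.py
frames = len(thetas)             # one pass of the loop per angle
duration = frames/30             # minimum running time in seconds at rate(30)
omega = 0.1*30                   # angular speed in radians per second
period = 2*pi/omega              # seconds per revolution
turns = 10*pi/(2*pi)             # revolutions before the loop ends
```

So the sphere goes around about once every 2.1 seconds, makes five revolutions, and the animation lasts at least 10.5 seconds.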

Exercise 3.4: Visualization of the solar system


The innermost six planets of our solar system revolve around the Sun in roughly cir-
cular orbits that all lie approximately in the same (ecliptic) plane. Here are some basic
parameters:

Object    Radius of object (km)   Radius of orbit (millions of km)   Period of orbit (days)
Mercury   2440                    57.9                               88.0
Venus     6052                    108.2                              224.7
Earth     6371                    149.6                              365.3
Mars      3386                    227.9                              687.0
Jupiter   69173                   778.5                              4331.6
Saturn    57316                   1433.4                             10759.2
Sun       695500                  –                                  –

Using the facilities provided by the visual package, create an animation of the solar
system that shows the following:


a) The Sun and planets as spheres in their appropriate positions and with sizes pro-
portional to their actual sizes. Because the radii of the planets are tiny compared
to the distances between them, represent the planets by spheres with radii c1
times larger than their correct proportionate values, so that you can see them
clearly. Find a good value for c1 that makes the planets visible. You’ll also need
to find a good radius for the Sun. Choose any value that gives a clear visualiza-
tion. (It doesn’t work to scale the radius of the Sun by the same factor you use for
the planets, because it’ll come out looking much too large. So just use whatever
works.) For added realism, you may also want to make your spheres different
colors. For instance, Earth could be blue and the Sun could be yellow.
b) The motion of the planets as they move around the Sun (by making the spheres
of the planets move). In the interests of alleviating boredom, construct your pro-
gram so that time in your animation runs a factor of c2 faster than actual time.
Find a good value of c2 that makes the motion of the orbits easily visible but not
unreasonably fast. Make use of the rate function to make your animation run
smoothly.
Hint: You may find it useful to store the sphere variables representing the planets in an
array of the kind described on page 113.

Here’s one more trick that can prove useful. As mentioned above, you can
make your objects small or large and the computer will automatically zoom
in or out so that they remain visible. And if you make an animation in which
your objects move around the screen the computer will zoom out when ob-
jects move out of view, or zoom in as objects recede into the distance. While
this is useful in many cases, it can be annoying in others. The display func-
tion provides a parameter for turning the automatic zooming off if it becomes
distracting, thus:

display(autoscale=False)

More commonly, one calls the display function at the beginning of the pro-
gram and then turns off the zooming separately later, thus:

d = display()
d.autoscale = False

One can also turn it back on with

d.autoscale = True

A common approach is to place all the objects of your animation in their initial
positions on the screen first, allow the computer to zoom in or out appropriately,
so that they are all visible, then turn zooming off with “d.autoscale =
False” before beginning the animation proper, so that the view remains fixed
as objects move around.

FURTHER EXERCISES

3.5 Deterministic chaos and the Feigenbaum plot: One of the most famous examples
of the phenomenon of chaos is the logistic map, defined by the equation

$$ x' = rx(1 - x). \tag{3.5} $$

For a given value of the constant r you take a value of x, say $x = \tfrac12$, and you feed it
into the right-hand side of this equation, which gives you a value of $x'$. Then you take
that value and feed it back in on the right-hand side again, which gives you another
value, and so forth. This is an iterative map. You keep doing the same operation over and
over on your value of x, and one of three things happens:
1. The value settles down to a fixed number and stays there. This is called a fixed
point. For instance, x = 0 is always a fixed point of the logistic map. (You put
x = 0 on the right-hand side and you get x ′ = 0 on the left.)
2. It doesn’t settle down to a single value, but it settles down into a periodic pat-
tern, rotating around a set of values, such as say four values, repeating them in
sequence over and over. This is called a limit cycle.
3. It goes crazy. It generates a seemingly random sequence of numbers that appear
to have no rhyme or reason to them at all. This is deterministic chaos. “Chaos”
because it really does look chaotic, and “deterministic” because even though the
values look random, they’re not. They’re clearly entirely predictable, because they
are given to you by one simple equation. The behavior is determined, although it
may not look like it.
Write a program that calculates and displays the behavior of the logistic map. Here’s
what you need to do.
For a given value of r, start with $x = \tfrac12$, and iterate the logistic map equation a
thousand times. That will give it a chance to settle down to a fixed point or limit cycle
if it’s going to. Then run for another thousand iterations and plot the points (r, x) on
a graph where the horizontal axis is r and the vertical axis is x. You can either use the
plot function with the options "ko" or "k." to draw a graph with dots, one for each
point, or you can use the scatter function to draw a scatter plot (which always uses
dots). Repeat the whole calculation for values of r from 1 to 4 in steps of 0.01, plotting
the dots for all values of r on the same figure and then finally using the function show
once to display the complete figure.


Your program should generate a distinctive plot that looks like a tree bent over onto
its side. This famous picture is called the Feigenbaum plot, after its discoverer Mitchell
Feigenbaum, or sometimes the figtree plot, a play on the fact that it looks like a tree and
Feigenbaum means “figtree” in German.10
Give answers to the following questions:
a) For a given value of r what would a fixed point look like on the Feigenbaum plot?
How about a limit cycle? And what would chaos look like?
b) Based on your plot, at what value of r does the system move from orderly be-
havior (fixed points or limit cycles) to chaotic behavior? This point is sometimes
called the “edge of chaos.”
The logistic map is a very simple mathematical system, but deterministic chaos is
seen in many more complex physical systems also, including especially fluid dynamics
and the weather. Because of its apparently random nature, the behavior of chaotic
systems is difficult to predict and strongly affected by small perturbations in outside
conditions. You’ve probably heard of the classic exemplar of chaos in weather systems,
the butterfly effect, which was popularized by physicist Edward Lorenz in 1972 when
he gave a lecture to the American Association for the Advancement of Science entitled,
“Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?”11
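Footnote 10 describes an array-based way to carry out the iteration. A sketch of that idea follows, with the plotting left out; the use of linspace to get r = 1.00, 1.01, ..., 4.00 and the array name samples are assumptions.

```python
import numpy as np

r = np.linspace(1.0, 4.0, 301)   # r = 1.00, 1.01, ..., 4.00
x = np.full(len(r), 0.5)         # every run starts from x = 1/2

# Iterate the logistic map for all r at once to let transients die out
for n in range(1000):
    x = r*x*(1 - x)

# Record the next 1000 iterates; row n holds iterate n for every r
samples = np.empty((1000, len(r)))
for n in range(1000):
    x = r*x*(1 - x)
    samples[n] = x
```

For the figure, something like plot(np.tile(r,1000),samples.flatten(),"k.") followed by show() should pair every recorded x with its value of r, since np.tile repeats r in the same row-by-row order that flatten reads samples.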

3.6 The Mandelbrot set: The Mandelbrot set, named after its discoverer, the French
mathematician Benoît Mandelbrot, is a fractal, an infinitely ramified mathematical
object that contains structure within structure within structure, as deep as we care to look.
The definition of the Mandelbrot set is in terms of complex numbers as follows.
Consider the equation

$$ z' = z^2 + c, $$

where z is a complex number and c is a complex constant. For any given value of
c this equation turns an input number z into an output number $z'$. The definition of
the Mandelbrot set involves the repeated iteration of this equation: we take an initial
starting value of z and feed it into the equation to get a new value $z'$. Then we take that
value and feed it in again to get another value, and so forth. The Mandelbrot set is the
set of points in the complex plane that satisfies the following definition:

10
There is another approach for computing the Feigenbaum plot, which is neater and faster,
making use of Python’s ability to perform arithmetic with entire arrays. You could create
an array r with one element containing each distinct value of r you want to investigate:
[1.0, 1.01, 1.02, ... ]. Then create another array x of the same size to hold the correspond-
ing values of x, which should all be initially set to 0.5. Then an iteration of the logistic map can be
performed for all values of r at once with a statement of the form x = r*x*(1-x). Because of the
speed with which Python can perform calculations on arrays, this method should be significantly
faster than the more basic method above.
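The array-based approach described in footnote 10 can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the grid of r values, the iteration counts n_transient and n_record, and the array names r_points and x_points are illustrative choices, not values given in the text.

```python
import numpy as np

# Assumed parameter grid: r from 1.0 up to (but not including) 4.0 in steps of 0.01.
r = np.arange(1.0, 4.0, 0.01)
x = np.full(r.shape, 0.5)          # every value of r starts from x = 0.5

n_transient = 1000                 # iterations discarded while transients die away
n_record = 1000                    # iterations kept for the plot

for _ in range(n_transient):
    x = r*x*(1 - x)                # one logistic-map step for all r at once

r_list, x_list = [], []
for _ in range(n_record):
    x = r*x*(1 - x)                # builds a new array, so appending it below is safe
    r_list.append(r)
    x_list.append(x)

r_points = np.concatenate(r_list)  # flat arrays of (r, x) pairs, one pair per dot
x_points = np.concatenate(x_list)
print(r_points.shape, x_points.shape)
```

Each pair (r_points[i], x_points[i]) is one dot of the Feigenbaum plot, so passing the two arrays to a plotting call such as matplotlib's plot(r_points, x_points, ".") should draw it. The only Python-level loop is over iterations; the loop over values of r happens inside NumPy.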
11
Although arguably the first person to suggest the butterfly effect was not a physicist at all,
but the science fiction writer Ray Bradbury in his famous 1952 short story A Sound of Thunder, in
which a time traveler’s careless destruction of a butterfly during a tourist trip to the Jurassic era
changes the course of history.


For a given complex value of c, start with z = 0 and iterate repeatedly. If the
magnitude |z| of the resulting value is ever greater than 2, then the point in the
complex plane at position c is not in the Mandelbrot set, otherwise it is in the set.
In order to use this definition one would, in principle, have to iterate infinitely many
times to prove that a point is in the Mandelbrot set, since a point is in the set only if
the iteration never passes |z| = 2. In practice, however, we usually just perform
some large number of iterations, say 100, and if |z| hasn’t exceeded 2 by that point then
we call that good enough.
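The escape test in this definition can be written as a small function. This is a sketch, not the book’s solution: the names escape_count and in_mandelbrot are hypothetical, and the cap of 100 iterations is the rule of thumb mentioned in the text. The returned count is also the quantity needed for the colored variant of the exercise.

```python
def escape_count(c, max_iter=100):
    """Iterate z' = z**2 + c starting from z = 0 and return the number of
    iterations completed before |z| first exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        z = z*z + c
        if abs(z) > 2:
            return n
    return max_iter

def in_mandelbrot(c, max_iter=100):
    # Treat c as "in the set" if |z| stayed at or below 2 for all max_iter steps.
    return escape_count(c, max_iter) == max_iter

print(in_mandelbrot(0))     # True: z stays at 0
print(in_mandelbrot(-1))    # True: z cycles -1, 0, -1, 0, ...
print(in_mandelbrot(1))     # False: z = 1, 2, 5, ... escapes
```

To build the image one would evaluate in_mandelbrot (or escape_count) for every c = x + iy on the N × N grid and store the results in a two-dimensional array for the density plot.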
Write a program to make an image of the Mandelbrot set by performing the iteration
for all values of c = x + iy on an N × N grid spanning the region where −2 ≤ x ≤ 2
and −2 ≤ y ≤ 2. Make a density plot in which grid points inside the Mandelbrot set
are colored black and those outside are colored white. The Mandelbrot set has a very
distinctive shape that looks something like a beetle with a long snout—you’ll know it
when you see it.
Hint: You will probably find it useful to start off with quite a coarse grid, i.e., with a
small value of N—perhaps N = 100—so that your program runs quickly while you are
testing it. Once you are sure it is working correctly, increase the value of N to produce
a final high-quality image of the shape of the set.
If you are feeling enthusiastic, here is another variant of the same exercise that can
produce amazing looking pictures. Instead of coloring points just black or white, color
points according to the number of iterations of the equation before |z| becomes greater
than 2 (or the maximum number of iterations if |z| never becomes greater than 2). If you
use one of the more colorful color schemes Python provides for density plots, such as
the “hot” or “jet” schemes, you can make some spectacular images this way. Another
interesting variant is to color according to the logarithm of the number of iterations,
which helps reveal some of the finer structure outside the set.

3.7 Least-squares fitting and the photoelectric effect: It’s a common situation in
physics that an experiment produces data that lies roughly on a straight line, like the
dots in this figure:


The solid line here represents the underlying straight-line form, which we usually don’t
know, and the points representing the measured data lie roughly along the line but
don’t fall exactly on it, typically because of measurement error.
The straight line can be represented in the familiar form y = mx + c and a frequent
question is what the appropriate values of the slope m and intercept c are that corre-
spond to the measured data. Since the data don’t fall perfectly on a straight line, there
is no perfect answer to such a question, but we can find the straight line that gives the
best compromise fit to the data. The standard technique for doing this is the method of
least squares.
Suppose we make some guess about the parameters m and c for the straight line.
We then calculate the vertical distances between the data points and that line, as
represented by the short vertical lines in the figure, then we calculate the sum of the
squares of those distances, which we denote $\chi^2$. If we have N data points with
coordinates $(x_i, y_i)$, then $\chi^2$ is given by
$$\chi^2 = \sum_{i=1}^{N} (m x_i + c - y_i)^2.$$

The least-squares fit of the straight line to the data is the straight line that minimizes
this total squared distance from data to line. We find the minimum by differentiating
with respect to both m and c and setting the derivatives to zero, which gives
$$m \sum_{i=1}^{N} x_i^2 + c \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} x_i y_i = 0,$$
$$m \sum_{i=1}^{N} x_i + cN - \sum_{i=1}^{N} y_i = 0.$$

For convenience, let us define the following quantities:
$$E_x = \frac{1}{N}\sum_{i=1}^{N} x_i, \quad E_y = \frac{1}{N}\sum_{i=1}^{N} y_i, \quad E_{xx} = \frac{1}{N}\sum_{i=1}^{N} x_i^2, \quad E_{xy} = \frac{1}{N}\sum_{i=1}^{N} x_i y_i,$$
in terms of which our equations can be written
$$m E_{xx} + c E_x = E_{xy},$$
$$m E_x + c = E_y.$$
Solving these equations simultaneously for m and c now gives
$$m = \frac{E_{xy} - E_x E_y}{E_{xx} - E_x^2}, \qquad c = \frac{E_{xx} E_y - E_x E_{xy}}{E_{xx} - E_x^2}.$$
These are the equations for the least-squares fit of a straight line to N data points. They
tell you the values of m and c for the line that best fits the given data.
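The formulas for m and c translate directly into code. The sketch below assumes NumPy is available; the function name least_squares_line and the sample points are made up for illustration (the points lie exactly on y = 2x + 1, so the fit should return m = 2 and c = 1).

```python
import numpy as np

def least_squares_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c,
    computed from the averages E_x, E_y, E_xx, E_xy defined in the text."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Ex = np.mean(x)
    Ey = np.mean(y)
    Exx = np.mean(x*x)
    Exy = np.mean(x*y)
    denom = Exx - Ex**2            # zero only if all x values are equal
    m = (Exy - Ex*Ey)/denom
    c = (Exx*Ey - Ex*Exy)/denom
    return float(m), float(c)

# Made-up points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(least_squares_line(xs, ys))  # (2.0, 1.0)
```

For part (a), if millikan.txt contains only two whitespace-separated numeric columns, numpy.loadtxt("millikan.txt") would return a two-column array whose columns data[:, 0] and data[:, 1] could be passed straight to this function.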
a) In the on-line resources you will find a file called millikan.txt. The file contains
two columns of numbers, giving the x and y coordinates of a set of data points.
Write a program to read these data points and make a graph with one dot or circle
for each point.


b) Add code to your program, before the part that makes the graph, to calculate the
quantities Ex , Ey , Exx , and Exy defined above, and from them calculate and print
out the slope m and intercept c of the best-fit line.
c) Now write code that goes through each of the data points in turn and evaluates
the quantity mxi + c using the values of m and c that you calculated. Store these
values in a new array or list, and then graph this new array, as a solid line, on the
same plot as the original data. You should end up with a plot of the data points
plus a straight line that runs through them.
d) The data in the file millikan.txt are taken from a historic experiment by Robert
Millikan that measured the photoelectric effect. When light of an appropriate wave-
length is shone on the surface of a metal, the photons in the light can strike con-
duction electrons in the metal and, sometimes, eject them from the surface into the
free space above. The energy of an ejected electron is equal to the energy of the
photon that struck it minus a small amount φ called the work function of the sur-
face, which represents the energy needed to remove an electron from the surface.
The energy of a photon is hν, where h is Planck’s constant and ν is the frequency
of the light, and we can measure the energy of an ejected electron by measuring
the voltage V that is just sufficient to stop the electron moving. Then the voltage,
frequency, and work function are related by the equation

$$V = \frac{h}{e}\nu - \phi,$$
where e is the charge on the electron. This equation was first given by Albert
Einstein in 1905.
The data in the file millikan.txt represent frequencies ν in hertz (first column)
and voltages V in volts (second column) from photoelectric measurements of this
kind. Using the equation above and the program you wrote, and given that the
charge on the electron is $1.602 \times 10^{-19}$ C, calculate from Millikan’s experimental
data a value for Planck’s constant. Compare your value with the accepted value
of the constant, which you can find in books or on-line. You should get a result
within a couple of percent of the accepted value.
This calculation is essentially the same as the one that Millikan himself used to
determine the value of Planck’s constant, although, lacking a computer, he fitted his
straight line to the data by eye. In part for this work, Millikan was awarded the Nobel
Prize in Physics in 1923.

CHAPTER 4

ACCURACY AND SPEED


We have now seen the basic elements of programming in Python: input
and output, variables and arithmetic, loops and if statements. With
these we can perform a wide variety of calculations. We have also seen how to
visualize our results using various types of computer graphics. There are many
additional features of the Python language that we haven’t covered. In later
chapters of the book, for example, we will introduce a number of specialized
features, such as facilities for doing linear algebra and Fourier transforms. But
for now we have the main components in place to start doing physics.
There is, however, one fundamental issue that we have not touched upon.
Computers have limitations. They cannot store real numbers with an infinite
number of decimal places. There is a limit to the largest and smallest num-
bers they can store. They can perform calculations quickly, but not infinitely
quickly. In many cases these issues need not bother us—the computer is fast
enough and accurate enough for many of the calculations we do in physics.
However, there are also situations in which the computer’s limitations will af-
fect us significantly, so it will be crucial that we understand those limitations,
as well as methods for mitigating them or working around them when neces-
sary.

4.1 VARIABLES AND RANGES

We have seen examples of the use of variables in computer programs, including
integer, floating-point, and complex variables, as well as lists and arrays.
Python variables can hold numbers that span a wide range of values, including
very large numbers, but they cannot hold numbers that are arbitrarily large.
For instance, the largest value you can give a floating-point variable is about
$10^{308}$. (There is also a corresponding largest negative value of about $-10^{308}$.)
This is enough for most physics calculations, but we will see occasional examples
where we run into problems. Complex numbers are similar: both their
real and imaginary parts can go up to about $\pm 10^{308}$ but not larger.¹ Large numbers
can be specified in scientific notation, using an “e” to denote the exponent.
For instance, 2e9 means $2 \times 10^9$ and 1.602e-19 means $1.602 \times 10^{-19}$. Note that
numbers specified in scientific notation are always floats. Even if the number
is, mathematically speaking, an integer (like 2e9), the computer will still treat
it as a float.
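A quick way to confirm that scientific notation always produces a float is to ask Python for the type directly. This is a small illustrative check, not part of the original text:

```python
big = 2e9              # scientific notation
small = 1.602e-19

print(type(big), big)       # <class 'float'> 2000000000.0
print(big.is_integer())     # True: mathematically a whole number, but still a float
print(type(2000000000))     # <class 'int'>: written without "e", the same value is an int
print(type(small))          # <class 'float'>
```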
If the value of a variable exceeds the largest floating-point number that can
be stored on the computer we say the variable has overflowed. For instance, if a
floating-point variable x holds a number close to the maximum allowed value
of $10^{308}$ and then we execute a statement like “y = 10*x” it is likely that the
result will be larger than the maximum and the variable y will overflow (but
not the variable x, whose value is unchanged).
If this happened in the course of a calculation you might imagine that the
program would stop, perhaps giving an error message, but in Python this is
not what happens. Instead the computer will set the variable to the special
value “inf,” which means infinity. If you print such a variable with a print
statement, the computer will actually print the word “inf” on the screen. In
effect, every number over $10^{308}$ is infinity as far as the computer is concerned.
Unfortunately, this is usually not what you want, and when it happens your
program will probably give incorrect answers, so you need to watch out for
this problem. It’s rare, but it’ll probably happen to you at some point.
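The silent overflow to inf can be seen directly. This sketch uses sys.float_info.max from the standard library as the starting value, an assumption standing in for “a number close to the maximum allowed value”:

```python
import math
import sys

x = sys.float_info.max    # largest finite float, about 1.8e308
y = 10*x                  # exceeds the largest float, so the result overflows

print(x)                  # 1.7976931348623157e+308 (x itself is unchanged)
print(y)                  # inf
print(math.isinf(y))      # True
```

Not every operation overflows silently, though. In CPython, math.exp(1000) and the float power 10.0**400 raise an OverflowError rather than returning inf, so what you see depends on which operation overflows.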
There is also a smallest number (meaning smallest magnitude) that can
be represented by a floating-point variable. In Python this number is roughly
$10^{-308}$.² If you go any smaller than this—if the calculation underflows—the
computer will just set the number to zero. Again, this usually messes things
up and gives wrong answers, so you need to be on the lookout.
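Underflow can be demonstrated the same way. One caveat worth knowing: IEEE 754 arithmetic supports gradual underflow, so numbers somewhat below about $2.2 \times 10^{-308}$ (down to roughly $5 \times 10^{-324}$) are still stored as “subnormal” values with reduced precision before a result finally becomes zero. The sketch below shows both effects:

```python
import sys

print(sys.float_info.min)   # 2.2250738585072014e-308, smallest normalized float

tiny = 1e-200
print(tiny*tiny)            # 0.0: the true result, 1e-400, has underflowed to zero

half_min = sys.float_info.min/2
print(half_min > 0)         # True: a subnormal number, stored with fewer significant digits
```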
What about integers? Here Python does something clever. There is no
largest integer value in Python: it can represent integers to arbitrary precision.
This means that no matter how many digits an integer has, Python stores all
of them—provided you have enough memory on your computer. Be aware,
however, that calculations with integers, even simple arithmetic operations,
take longer with more digits, and can take a very long time if there are very

1
The actual largest number is $1.79769 \times 10^{308}$, which is $2^{1024} - 2^{971}$, just below $2^{1024}$.
It is the largest number that can be represented in the IEEE 754 double-precision floating-point
format used by the Python language.
2
Actually $2.22507 \times 10^{-308}$, which is $2^{-1022}$.


many digits. Try, for example, doing print(2**1000000) in Python. The cal-
culation can be done—it yields a number with 301 030 digits—but it’s so slow
that you might as well forget about using your computer for anything else for
the next few minutes.³
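Arbitrary-precision integers can be explored with a smaller power first. A caution for recent versions of Python: since Python 3.11 (and in some security releases of earlier versions), converting an integer with more than 4300 digits to a decimal string raises a ValueError by default, so print(2**1000000) may fail unless the limit is lifted with sys.set_int_max_str_digits(0). The sketch below avoids the issue by counting the digits of $2^{1000000}$ with logarithms:

```python
import math

n = 2**1000                 # exact integer with a few hundred digits
print(len(str(n)))          # 302

# Count the decimal digits of 2**1000000 without converting it to a string:
# a positive integer m has floor(log10(m)) + 1 digits.
digits = math.floor(1000000*math.log10(2)) + 1
print(digits)               # 301030
```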

Exercise 4.1: Write a program to calculate and print the factorial of a number entered by
the user. If you wish you can base your program on the user-defined function for facto-
rial given in Section 2.6, but write your program so that it calculates the factorial using
integer variables, not floating-point ones. Use your program to calculate the factorial
of 200.
Now modify your program to use floating-point variables instead and again calcu-
late the factorial of 200. What do you find? Explain.

4.2 NUMERICAL ERROR

Floating-point numbers (unlike integers) are represented on the computer to
only a certain precision. In Python, at least at the time of writing of this book,
the standard level of precision is 16 significant digits. This means that numbers
like π or √2, which have an infinite number of digits after the decimal point,
can only be represented approximately. Thus, for instance:

True value of π:   3.1415926535897932384626 . . .
Value in Python:   3.141592653589793
Difference:        0.0000000000000002384626 . . .

The difference between the true value of a number and its value on the com-
puter is called the rounding error on the number. It is the amount by which the
computer’s representation of the number is wrong.
A number does not have to be irrational like π to suffer from rounding
error—any number whose true value has more than 16 significant figures will
get rounded off. What’s more, when one performs arithmetic with floating-
point numbers, the answers are only guaranteed accurate to about 16 figures,
even if the numbers that went into the calculation were expressed exactly. If

3
If you do actually try this, then you might want to know how to stop your program if you
get bored waiting for it to finish. The simplest thing to do is just to close the window where it’s
running.


you add 1.1 and 2.2 in Python, then obviously the answer should be 3.3, but
the computer might give 3.299999999999999 instead.
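Here is what actually happens with IEEE 754 double precision, which is what CPython floats use on essentially all current platforms. The digits shown in the comments assume that format:

```python
s = 1.1 + 2.2
print(s)               # 3.3000000000000003
print(s == 3.3)        # False

# Printing 0.1 with 20 decimal places reveals the binary value actually stored
print(f"{0.1:.20f}")   # 0.10000000000000000555
```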
Usually this is accurate enough, but there are times when it can cause prob-
lems. One important consequence of rounding error is that you should never
use an if statement to test the equality of two floats. For instance, you should never,
in any program, have a statement like

if x==3.3:
    print(x)

because it may well not do what you want it to do. If the value of x is supposed
to be 3.3 but it’s actually 3.299999999999999, then as far as the computer is
concerned it’s not 3.3 and the if statement will fail. In fact, it rarely occurs in
physics calculations that you need to test the equality of floats, but if you do,
then you should do something like this instead:

epsilon = 1e-12
if abs(x-3.3)<epsilon:
    print(x)

As we saw in Section 2.2.6, the built-in function abs calculates the absolute
value of its argument, so abs(x-3.3) is the absolute difference | x − 3.3|. The
code above tests whether this difference is less than the small number epsilon.
In other words, the if statement will succeed whenever x is very close to 3.3,
but the two don’t have to be exactly equal. If x is 3.299999999999999 things will
still work as expected. The value of epsilon has to be chosen appropriately
for the situation—there’s nothing special or universal about the value of $10^{-12}$
used above and a different value may be appropriate in another calculation.
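The tolerance test can be compared with exact equality in a short sketch. The standard-library function math.isclose (available since Python 3.5) offers a relative tolerance, which scales with the size of the numbers; the value rel_tol=1e-12 used here is an arbitrary illustrative choice, just like epsilon in the text:

```python
import math

x = 1.1 + 2.2                 # slightly different from 3.3 in binary floating point
epsilon = 1e-12               # tolerance chosen for this example only

print(x == 3.3)                               # False: exact test fails
print(abs(x - 3.3) < epsilon)                 # True: absolute-tolerance test from the text
print(math.isclose(x, 3.3, rel_tol=1e-12))    # True: relative-tolerance test
```

A relative tolerance adapts to the magnitude of the numbers being compared, but near zero it becomes too strict, which is why math.isclose also accepts an abs_tol argument.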
The rounding error on a number, which we will denote ε, is defined to be
the amount you would have to add to the value calculated by the computer to
get the true value. For instance, if we do the following:

from math import sqrt
x = sqrt(2)

then we will end up not with x = √2, but rather with x + ε = √2, where ε is
the rounding error, or equivalently x = √2 − ε. This is the same definition of
error that one uses when discussing measurement error in experiments. When
we say, for instance, that the age of the universe is 13.75 ± 0.11 billion years,
we mean that the measured value is 13.75 billion years, but the true value is
possibly greater or less than this by an amount of order 0.11 billion years.


The error ε in the example above could be either positive or negative, depending
on how the variable x gets rounded off. If we are lucky ε could be
small, but we cannot count on it. In general if x is accurate to a certain number
of significant digits, say 16, then the rounding error will have a typical size
of $x/10^{16}$. It’s usually a good assumption to consider the error to be a (uniformly
distributed) random number with standard deviation σ = Cx, where
$C \simeq 10^{-16}$ in this case. We will refer to the constant C as the error constant.
When quoting the error on a calculation we typically give the standard deviation
σ. (We can’t give the value of the error ε itself, since we don’t know
it—if we did, then we could calculate x + ε and recover the exact value for the
quantity of interest, so there would in effect be no error at all.)
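The size of the error constant can be checked empirically. In the sketch below, squaring the computed value of sqrt(2) does not give back exactly 2, and the relative discrepancy is of the same order as sys.float_info.epsilon, the gap between 1.0 and the next larger float. The squaring introduces a rounding error of its own, so this is an order-of-magnitude illustration, not a measurement of ε itself:

```python
import math
import sys

x = math.sqrt(2)
print(x*x)                       # 2.0000000000000004 rather than 2
print(abs(x*x - 2)/2)            # relative discrepancy, of order 1e-16
print(sys.float_info.epsilon)    # 2.220446049250313e-16
```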
Rounding error is important, as described above, if we are testing the equality
of two floating-point numbers, but in other respects it may appear to be
only a minor annoyance. An error of one part in $10^{16}$ does not seem very bad.
But what happens if we now add, subtract, or otherwise combine several different
numbers, each with its own error? In many ways the rounding error on
a number behaves similarly to measurement error in a laboratory experiment,
and the rules for combining errors are the same. For instance, if we add or
subtract two numbers $x_1$ and $x_2$, with standard deviations $\sigma_1$ and $\sigma_2$, then the
error on the sum or difference $x = x_1 \pm x_2$ follows from a standard result of
probability theory, which says that, for independent errors, the variance $\sigma^2$
of the sum or difference is equal to the sum of the individual variances:
$$\sigma^2 = \sigma_1^2 + \sigma_2^2. \tag{4.1}$$
Hence the standard deviation of the sum or difference is
$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}. \tag{4.2}$$
Similarly if we multiply or divide two numbers then the variance of the result x
obeys
$$\frac{\sigma^2}{x^2} = \frac{\sigma_1^2}{x_1^2} + \frac{\sigma_2^2}{x_2^2}. \tag{4.3}$$
But, as discussed above, the standard deviations on $x_1$ and $x_2$ are given
by $\sigma_1 = Cx_1$ and $\sigma_2 = Cx_2$, so that if, for example, we are adding or subtracting
our two numbers, meaning Eq. (4.2) applies, then
$$\sigma = \sqrt{C^2 x_1^2 + C^2 x_2^2} = C\sqrt{x_1^2 + x_2^2}. \tag{4.4}$$


I leave it as an exercise to show that the corresponding result for the error on
the product of two numbers $x = x_1 x_2$ or their ratio $x = x_1/x_2$ is
$$\sigma = \sqrt{2}\, C x. \tag{4.5}$$

We can extend these results to combinations of more than two numbers.
If, for instance, we are calculating the sum of N numbers $x_1 \ldots x_N$ with errors
having standard deviation $\sigma_i = Cx_i$, then the variance on the final result is the
sum of the variances on the individual numbers:
$$\sigma^2 = \sum_{i=1}^{N} \sigma_i^2 = \sum_{i=1}^{N} C^2 x_i^2 = C^2 N \overline{x^2}, \tag{4.6}$$
where $\overline{x^2}$ is the mean-square value of x. Thus the standard deviation on the
final result is
$$\sigma = C\sqrt{N}\sqrt{\overline{x^2}}. \tag{4.7}$$
As we can see, this quantity increases in size as N increases—the more numbers
we combine, the larger the error on the result—although the increase is a
relatively slow one, proportional to the square root of N.
We can also ask about the fractional error on $\sum_i x_i$, i.e., the total error divided
by the value of the sum. The size of the fractional error is given by
$$\frac{\sigma}{\sum_i x_i} = \frac{C\sqrt{N}\sqrt{\overline{x^2}}}{N\bar{x}} = \frac{C}{\sqrt{N}}\,\frac{\sqrt{\overline{x^2}}}{\bar{x}}, \tag{4.8}$$
where $\bar{x} = N^{-1}\sum_i x_i$ is the mean value of x. In other words the fractional error
in the sum actually goes down as we add more numbers.
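Equations (4.6) to (4.8) can be checked with a toy simulation of the error model. This sketch assumes NumPy, uses Gaussian errors (only their variance matters for Eq. (4.6)), and sets $C = 10^{-3}$ rather than $10^{-16}$ so that the simulated errors are far larger than the actual rounding in the arithmetic; N, trials, and the range of the $x_i$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

C = 1e-3          # toy error constant, much larger than 1e-16 so the effect is visible
N = 1000          # number of terms in the sum
trials = 2000     # independent repetitions used to estimate the spread

x = rng.uniform(1.0, 2.0, size=N)                   # the "true" values x_i
noise = C*x*rng.standard_normal(size=(trials, N))   # errors with standard deviation C*x_i
sums = (x + noise).sum(axis=1)                      # one noisy sum per trial

sigma_measured = sums.std()
sigma_predicted = C*np.sqrt(N*np.mean(x**2))        # Eq. (4.7)
frac_predicted = sigma_predicted/x.sum()            # Eq. (4.8)

print(sigma_measured, sigma_predicted)
print(frac_predicted)
```

With these settings the measured and predicted standard deviations should agree to within a few percent, and the fractional error comes out smaller than C by a factor of order $1/\sqrt{N}$, as Eq. (4.8) says.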
At first glance this appears to be pretty good. So what’s the problem? Ac-
tually, there are a couple of them. One is when the sizes of the numbers you
are adding vary widely. If some are much smaller than others then the smaller
ones may get lost. But the most severe problems arise when you are not adding
but subtracting numbers. Suppose, for instance, that we have the following
two numbers:

x = 1000000000000000
y = 1000000000000001.2345678901234

and we want to calculate the difference y − x. Unfortunately, the computer
only represents these two numbers to 16 significant figures, which means that
as far as the computer is concerned:

x = 1000000000000000
y = 1000000000000001.2

The first number is represented exactly in this case, but the second has been
truncated. Now when we take the difference we get y − x = 1.2, when the true
result would be 1.2345678901234. In other words, instead of 16-figure accuracy,
we now only have two figures and the fractional error is nearly 3% of the
true value. This is much worse than before.
To put this in more general terms, if the difference between two numbers is
very small, comparable with the error on the numbers, i.e., with the accuracy
of the computer, then the fractional error can become large and you may have
a problem.
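The same subtraction can be tried directly in Python. One detail differs from the 16-significant-figure picture: a double stores numbers in binary, and near $10^{15}$ adjacent representable values are 0.125 apart, so y is actually stored as 1000000000000001.25 and the difference comes out as 1.25. Either way, only about two figures of the true answer survive:

```python
x = 1000000000000000.0
y = 1000000000000001.2345678901234     # more digits than a double can hold

diff = y - x
true_diff = 1.2345678901234

print(diff)                                # 1.25 with IEEE 754 doubles
print(abs(diff - true_diff)/true_diff)     # fractional error of roughly one percent
```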

EXAMPLE 4.1: THE DIFFERENCE OF TWO NUMBERS

To see an example of this in practice, consider the two numbers
$$x = 1, \qquad y = 1 + 10^{-14}\sqrt{2}. \tag{4.9}$$
Trivially we see that
$$10^{14}(y - x) = \sqrt{2}. \tag{4.10}$$
Let us perform the same calculation in Python and see what we get. Here is
the program:

from math import sqrt


x = 1.0
y = 1.0 + (1e-14)*sqrt(2)
print((1e14)*(y-x))
print(sqrt(2))

The penultimate line calculates the value in Eq. (4.10) while the last line prints
out the true value of √2 (at least to the accuracy of the computer). Here’s what
we get when we run the program:

1.42108547152
1.41421356237

As we can see, the calculation is accurate to only the first decimal place—after
that the rest is garbage.


This issue, of large errors in calculations that involve the subtraction of
numbers that are nearly equal, arises with some frequency in physics calculations.
We will see various examples throughout the book. It is perhaps the
most common cause of significant numerical error in computations and you
need to be aware of it at all times when writing programs.

Exercise 4.2: Quadratic equations


a) Write a program that takes as input three numbers, a, b, and c, and prints out
the two solutions to the quadratic equation $ax^2 + bx + c = 0$ using the standard
formula
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
Use your program to compute the solutions of $0.001x^2 + 1000x + 0.001 = 0$.
b) There is another way to write the solutions to a quadratic equation. Multiplying
top and bottom of the solution above by $-b \mp \sqrt{b^2 - 4ac}$, show that the solutions
can also be written as
$$x = \frac{2c}{-b \mp \sqrt{b^2 - 4ac}}.$$
Add further lines to your program to print out these values in addition to the
earlier ones and again use the program to solve $0.001x^2 + 1000x + 0.001 = 0$.
What do you see? How do you explain it?
c) Using what you have learned, write a new program that calculates both roots of
a quadratic equation accurately in all cases.
This is a good example of how computers don’t always work the way you expect them
to. If you simply apply the standard formula for the quadratic equation, the computer
will sometimes get the wrong answer. In practice the method you have worked out here
is the correct way to solve a quadratic equation on a computer, even though it’s more
complicated than the standard formula. If you were writing a program that involved
solving many quadratic equations this method might be a good candidate for a user-
defined function: you could put the details of the solution method inside a function to
save yourself the trouble of going through it step by step every time you have a new
equation to solve.
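Parts (a) and (b) can be compared for the root near $-10^{-6}$ with a few lines of code. This is a sketch for exploring the cancellation, not a full answer to part (c); the variable names are arbitrary, and the comparison value $-10^{-6}$ is accurate to about one part in $10^{12}$ for this equation:

```python
from math import sqrt

# Coefficients from part (a): 0.001 x**2 + 1000 x + 0.001 = 0
a, b, c = 0.001, 1000.0, 0.001
d = sqrt(b*b - 4*a*c)        # very close to b, because 4ac is tiny compared with b**2

# Standard formula, "+" root: -b + d subtracts two nearly equal numbers
x_standard = (-b + d)/(2*a)

# Alternative formula from part (b) for the same root: -b - d involves no cancellation
x_alternative = 2*c/(-b - d)

print(x_standard)
print(x_alternative)
```

Running it, x_standard should already differ from $-10^{-6}$ in roughly the fifth significant figure, while x_alternative should agree to many more figures, because -b - d adds two numbers of the same sign instead of subtracting nearly equal ones.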

Exercise 4.3: Calculating derivatives


Suppose we have a function f ( x ) and we want to calculate its derivative at a point x.
We can do that with pencil and paper if we know the mathematical form of the function,
or we can do it on the computer by making use of the definition of the derivative:

$$\frac{df}{dx} = \lim_{\delta \to 0} \frac{f(x + \delta) - f(x)}{\delta}.$$


On the computer we can’t actually take the limit as δ goes to zero, but we can get a
reasonable approximation just by making δ small.
a) Write a program that defines a function f(x) returning the value x(x − 1), then
calculates the derivative of the function at the point x = 1 using the formula
above with $\delta = 10^{-2}$. Calculate the true value of the same derivative
analytically and compare with the answer your program gives. The two will not agree
perfectly. Why not?
b) Repeat the calculation for $\delta = 10^{-4}, 10^{-6}, 10^{-8}, 10^{-10}, 10^{-12}$, and $10^{-14}$. You
should see that the accuracy of the calculation initially gets better as δ gets smaller,
but then gets worse again. Why is this?
We will look at numerical derivatives in more detail in Section 5.9, where we will study
techniques for dealing with these issues and maximizing the accuracy of our calcula-
tions.
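A sketch of the forward-difference calculation, looping over the step sizes of parts (a) and (b), might look like this. The helper name forward_difference is hypothetical, and the exact value 1 used for comparison comes from differentiating x(x − 1) by hand:

```python
def f(x):
    return x*(x - 1)

def forward_difference(func, x, delta):
    """Approximate func'(x) by (func(x + delta) - func(x))/delta."""
    return (func(x + delta) - func(x))/delta

exact = 1.0      # d/dx [x(x - 1)] = 2x - 1, which equals 1 at x = 1

for k in range(2, 16, 2):
    delta = 10.0**(-k)
    approx = forward_difference(f, 1.0, delta)
    print(f"delta = {delta:.0e}   derivative = {approx:.15f}   error = {abs(approx - exact):.2e}")
```

Comparing the printed errors row by row shows where the error stops shrinking and starts growing; explaining why is the point of the exercise.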

4.3 PROGRAM SPEED

As we have seen, computers are not infinitely accurate. And neither are they
infinitely fast. Yes, they work at amazing speeds, but many physics calcula-
tions require the computer to perform millions of individual computations to
get a desired overall result and collectively those computations can take a sig-
nificant amount of time. Some of the example calculations described in Chap-
ter 1 took months to complete, even though they were run on some of the most
powerful computers in the world.
One thing we need to get a feel for is how fast computers really are. As a
general guide, performing a million mathematical operations is no big prob-
lem for a computer—it usually takes less than a second. Adding a million
numbers together, for instance, or finding a million square roots, can be done
in very little time. Performing a billion operations, on the other hand, could
take minutes or hours, though it’s still possible provided you are patient. Per-
forming a trillion operations, however, will basically take forever. So a fair rule
of thumb is that the calculations we can perform on a computer are ones that
can be done with about a billion operations or less.
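To get a rough feel for these speeds on your own machine, you can time a simple loop. This is only a sketch: the timing depends on the computer and the Python version, and in a pure-Python loop each pass costs considerably more than one arithmetic operation because of interpreter overhead. time.perf_counter is the standard-library clock intended for measuring short intervals:

```python
import time

N = 1_000_000
start = time.perf_counter()
total = 0.0
for k in range(N):
    total += k                  # one floating-point addition per pass
elapsed = time.perf_counter() - start

print(f"{N} additions took {elapsed:.3f} s")
print(f"about {elapsed/N*1e9:.0f} ns per pass through the loop")
```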
This is only a rough guide. Not all operations are equal and it makes a
difference whether we are talking about additions or multiplications of single
numbers (which are easy and quick) versus, say, calculating Bessel functions
or multiplying matrices (which are not). Moreover, the billion-operation rule
will change over time because computers get faster. However, computers have
been getting faster a lot less quickly in the last few years—progress has slowed.
So we’re probably stuck with a billion operations for a while.


EXAMPLE 4.2: QUANTUM HARMONIC OSCILLATOR AT FINITE TEMPERATURE

The quantum simple harmonic oscillator has energy levels $E_n = \hbar\omega(n + \tfrac{1}{2})$,
where $n = 0, 1, 2, \ldots, \infty$. As shown by Boltzmann and Gibbs, the average energy
of a simple harmonic oscillator at temperature T is
$$\langle E \rangle = \frac{1}{Z}\sum_{n=0}^{\infty} E_n e^{-\beta E_n}, \tag{4.11}$$
where $\beta = 1/(k_B T)$ with $k_B$ being the Boltzmann constant, and $Z = \sum_{n=0}^{\infty} e^{-\beta E_n}$.

Suppose we want to calculate, approximately, the value of $\langle E \rangle$ when $k_B T = 100$.
Since the terms in the sums for $\langle E \rangle$ and Z dwindle in size quite quickly as
n becomes large, we can get a reasonable approximation by taking just the first
1000 terms in each sum. Working in units where $\hbar = \omega = 1$, here’s a program
to do the calculation:

File: qsho.py

from math import exp

terms = 1000
beta = 1/100
S = 0.0
Z = 0.0
for n in range(terms):
    E = n + 0.5
    weight = exp(-beta*E)
    S += weight*E
    Z += weight

print(S/Z)

Note a few features of this program:


1. Constants like the number of terms and the value of β are assigned to
variables at the beginning of the program. As discussed in Section 2.7,
this is good programming style because it makes them easy to find and
modify and makes the rest of the program more readable.
2. We used just one for loop to calculate both sums. This saves time, making
the program run faster.
3. Although the exponential e− βEn occurs separately in both sums, we calcu-
late it only once each time around the loop and save its value in the vari-
able weight. This also saves time: exponentials take significantly longer
to calculate than, for example, additions or multiplications. (Of course
“longer” is relative—the times involved are probably still less than a
microsecond. But if one has to go many times around the loop even those
short times can add up.)
If we run the program we get this result:

99.9554313409

The calculation (on my desktop computer) takes 0.01 seconds. Now let us try
increasing the number of terms in the sums (which just means increasing the
value of the variable terms at the top of the program). This will make our
approximation more accurate and give us a better estimate of our answer, at
the expense of taking more time to complete the calculation. If we increase the
number of terms to a million then it does change our answer somewhat:

100.000833332

The calculation now takes 1.4 seconds, which is significantly longer, but still a
short time in absolute terms.
Now let’s increase the number of terms to a billion. When we do this the
calculation takes 22 minutes to finish, but the result does not change at all:

100.000833332

There are three morals to this story. First, a billion operations is indeed doable—
if a calculation is important to us we can probably wait twenty minutes for an
answer. But it’s approaching the limit of what is reasonable. If we increased
the number of terms in our sum by another factor of ten the calculation would
take 220 minutes, or nearly four hours. A factor of ten beyond that and we’d
be waiting a couple of days for an answer.
Second, there is a balance to be struck between time spent and accuracy. In
this case it was probably worthwhile to calculate a million terms of the sum—
it didn’t take long and the result was noticeably, though not wildly, different
from the result for a thousand terms. But the change to a billion terms was
clearly not worth the effort—the calculation took much longer to complete but
the answer was exactly the same as before. We will see plenty of further exam-
ples in this book of calculations where we need to find an appropriate balance
between speed and accuracy.
Third, it’s pretty easy to write a program that will take forever to finish.
If we set the program above to calculate a trillion terms, it would take weeks
to run. So it’s worth taking a moment, before you spend a whole lot of time
writing and running a program, to do a quick estimate of how long you expect
your calculation to take. If it’s going to take a year then it’s not worth it: you
need to find a faster way to do the calculation, or settle for a quicker but less
accurate answer. The simplest way to estimate running time is to make a rough
count of the number of mathematical operations the calculation will involve;
if the number is significantly greater than a billion, you have a problem.
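The converged value from qsho.py can also be cross-checked against a closed form. A standard result of statistical mechanics (not derived in this section) gives the mean oscillator energy as $\langle E\rangle = \hbar\omega\left[\tfrac12 + 1/(e^{\beta\hbar\omega} - 1)\right]$; with $\hbar = \omega = 1$ and $\beta = 1/100$ this should agree with the million-term sum. The sketch uses math.expm1, which computes $e^x - 1$ accurately for small x:

```python
from math import exp, expm1

beta = 1/100                 # k_B T = 100 in units where hbar = omega = 1

# Closed form (standard statistical-mechanics result): <E> = 1/2 + 1/(e^beta - 1)
E_closed = 0.5 + 1/expm1(beta)

# The same sum as qsho.py, with a million terms
terms = 1000000
S = 0.0
Z = 0.0
for n in range(terms):
    E = n + 0.5
    weight = exp(-beta*E)
    S += weight*E
    Z += weight

print(E_closed)
print(S/Z)
```

A check of this kind is one way to judge whether adding more terms is still changing the answer, which is the balance between speed and accuracy discussed above.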

EXAMPLE 4.3: MATRIX MULTIPLICATION

Suppose we have two N × N matrices represented as arrays A and B on the
computer and we want to multiply them together to calculate their matrix
product. Here is a fragment of code to do the multiplication and place the
result in a new array called C:

from numpy import zeros

N = 1000
C = zeros([N,N],float)

# A and B are assumed to be existing N x N arrays
for i in range(N):
    for j in range(N):
        for k in range(N):
            C[i,j] += A[i,k]*B[k,j]

We could use this code, for example, as the basis for a user-defined function to
multiply arrays together. (As we saw in Section 2.4.4, Python already provides
the function “dot” for calculating matrix products, but it’s a useful exercise
to write our own code for the calculation. Among other things, it helps us
understand how many operations are involved in calculating such a product.)
How large a pair of matrices could we multiply together in this way if the
calculation is to take a reasonable amount of time? The program has three
nested for loops in it. The innermost loop, which runs through values of the
variable k, goes around N times doing one multiplication operation each time
and one addition, for a total of 2N operations. That whole loop is itself
executed N times, once for each value of j in the middle loop, giving $2N^2$
operations. And those $2N^2$ operations are themselves performed N times as we
go through the values of i in the outermost loop. The end result is that the
matrix multiplication takes $2N^3$ operations overall. Thus if N = 1000, as above,
the whole calculation would involve two billion operations, which is feasible
in a few minutes of running time. Larger values of N, however, will rapidly
become intractable. For N = 2000, for instance, we would have 16 billion
operations, which could take hours to complete. Thus the largest matrices we can
multiply are about 1000 × 1000 in size.⁴
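The operation count above can be checked directly for a small N. This sketch runs the same three nested loops, tallies one multiplication and one addition per inner pass, and compares the product with numpy's `dot`. The function name `matmul_count` is hypothetical. A random 20 × 20 pair keeps the run fast.

```python
import numpy as np

def matmul_count(A, B):
    """Multiply square arrays with three nested loops, as in the text,
    and count the multiplications and additions performed."""
    N = A.shape[0]
    C = np.zeros([N, N], float)
    ops = 0
    for i in range(N):
        for j in range(N):
            for k in range(N):
                C[i, j] += A[i, k] * B[k, j]
                ops += 2              # one multiplication, one addition
    return C, ops

N = 20
rng = np.random.default_rng(0)
A = rng.random((N, N))
B = rng.random((N, N))
C, ops = matmul_count(A, B)

print(ops, 2 * N**3)                  # the two counts agree
print(np.allclose(C, np.dot(A, B)))   # same product as numpy's dot
```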

⁴ Interestingly, the direct matrix multiplication represented by the code given here is not the
fastest way to multiply two matrices on a computer. Strassen’s algorithm is a recursive method for
multiplying matrices that uses some clever shortcuts to reduce the number of operations needed so
that the total number is proportional to about $N^{2.8}$ rather than $N^3$. For very large matrices this can
result in significantly faster computations. Unfortunately, Strassen’s algorithm suffers from large
numerical errors because of problems with subtraction of nearly equal numbers (see Section 4.2)
and for this reason it is rarely used. On paper, an even faster method for matrix multiplication is
the Coppersmith–Winograd algorithm, which requires a number of operations proportional to only
about $N^{2.4}$, but in practice this method is so complex to program as to be essentially worthless:
the extra complexity means that in real applications the method is always slower than direct
multiplication.

Exercise 4.4: Calculating integrals

Suppose we want to calculate the value of the integral

$$I = \int_{-1}^{1} \sqrt{1 - x^2}\,dx.$$

The integrand looks like a semicircle of radius 1:

[Figure: the semicircle $y = \sqrt{1 - x^2}$, running from x = −1 through 0 to x = 1]

and hence the value of the integral (the area under the curve) must be equal to
$\frac12\pi = 1.57079632679\ldots$
Alternatively, we can evaluate the integral on the computer by dividing the domain
of integration into a large number N of slices of width h = 2/N each and then using
the Riemann definition of the integral:

$$I = \lim_{N\to\infty} \sum_{k=1}^{N} h y_k,$$

where $y_k = \sqrt{1 - x_k^2}$ and $x_k = -1 + hk$.
We cannot in practice take the limit N → ∞, but we can make a reasonable approxima-
tion by just making N large.
a) Write a program to evaluate the integral above with N = 100 and compare the
result with the exact value. The two will not agree very well, because N = 100 is
not a sufficiently large number of slices.
b) Increase the value of N to get a more accurate value for the integral. If we require
that the program runs in about one second or less, how accurate a value can you
get?
Evaluating integrals is a common task in computational physics calculations. We will
study techniques for doing integrals in detail in the next chapter. As we will see, there
are substantially quicker and more accurate methods than the simple one we have used
here.
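Here is a minimal sketch of the sum in part (a), using the same definitions $x_k = -1 + hk$ and h = 2/N. The function name `semicircle_riemann` is hypothetical. The running-time question in part (b) is left open.

```python
from math import sqrt, pi

def semicircle_riemann(N):
    """Approximate the integral of sqrt(1 - x^2) from -1 to 1 by the
    Riemann sum  sum_{k=1}^{N} h*y_k  with  x_k = -1 + h*k."""
    h = 2.0 / N
    total = 0.0
    for k in range(1, N + 1):
        x = -1 + h * k
        # max() guards against a tiny negative argument from rounding near x = 1
        total += h * sqrt(max(0.0, 1 - x * x))
    return total

I = semicircle_riemann(100)
print(I, pi / 2, abs(I - pi / 2))
```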

CHAPTER 5

INTEGRALS AND DERIVATIVES


In the preceding chapters we looked at the basics of computer programming
using Python and solved some simple physics problems using what
we learned. You will get plenty of further opportunities to polish your pro-
gramming skills, but our main task from here on is to learn about the ideas
and techniques of computational physics, the physical and mathematical in-
sights that allow us to perform accurate calculations of physical quantities on
the computer.
One of the most basic but also most important applications of computers
in physics is the evaluation of integrals and derivatives. Numerical evalua-
tion of integrals is a particularly crucial topic because integrals occur widely
in physics calculations and, while some integrals can be done analytically in
closed form, most cannot. They can, however, almost always be done on a
computer. In this chapter we examine a number of different techniques for
evaluating integrals and derivatives, as well as taking a brief look at the re-
lated operation of interpolation.

5.1 FUNDAMENTAL METHODS FOR EVALUATING INTEGRALS

Suppose we wish to evaluate the integral of a given function. Let us consider
initially the simplest case, the integral of a function of a single variable over
a finite range. We will study a range of techniques for the numerical evaluation
of such integrals, but we start with the most basic, and also most widely
used, method: the trapezoidal rule.¹

¹ Also called the trapezium rule in British English.
Figure 5.1: Estimating the area under a curve. (a) A simple scheme for estimating the area under a curve by
dividing the area into rectangular slices. The gray shaded area approximates the area under the curve, though
not very well. (b) The trapezoidal rule approximates the area as a set of trapezoids, and is usually more accurate.
(c) With a larger number of slices, the shaded area is a more accurate approximation to the true area under the
curve.

5.1.1 THE TRAPEZOIDAL RULE

Suppose we have a function f(x) and we want to calculate its integral with
respect to x from x = a to x = b, which we denote I(a, b):

$$I(a,b) = \int_a^b f(x)\,dx. \qquad (5.1)$$

This is equivalent to calculating the area under the curve of f(x) from a to b.
There is no known way to calculate such an area exactly in all cases on a
computer, but we can do it approximately by the method shown in Fig. 5.1a: we
divide the area up into rectangular slices, calculate the area of each one, and
then add them up. This, however, is a pretty poor approximation. The area
under the rectangles is not very close to the area under the curve.
A better approach, which involves very little extra work, is that shown
in Fig. 5.1b, where the area is divided into trapezoids rather than rectangles.
The area under the trapezoids is a considerably better approximation to the
area under the curve, and this approach, though simple, often gives perfectly
adequate results.
Suppose we divide the interval from a to b into N slices or steps, so that
each slice has width h = (b − a)/N. Then the right-hand side of the kth slice
falls at a + kh, and the left-hand side falls at a + kh − h = a + (k − 1)h. Thus
the area of the trapezoid for this slice is

$$A_k = \tfrac12 h\big[f(a+(k-1)h) + f(a+kh)\big]. \qquad (5.2)$$

This is the trapezoidal rule. It gives us a trapezoidal approximation to the area
under one slice of our function.
Now our approximation for the area under the whole curve is the sum of
the areas of the trapezoids for all N slices:

$$\begin{aligned}
I(a,b) \simeq \sum_{k=1}^{N} A_k &= \tfrac12 h \sum_{k=1}^{N} \big[f(a+(k-1)h) + f(a+kh)\big] \\
&= h\big[\tfrac12 f(a) + f(a+h) + f(a+2h) + \dots + \tfrac12 f(b)\big] \\
&= h\Big[\tfrac12 f(a) + \tfrac12 f(b) + \sum_{k=1}^{N-1} f(a+kh)\Big]. \qquad (5.3)
\end{aligned}$$

This is the extended trapezoidal rule—it is the extension to many slices of the
basic trapezoidal rule of Eq. (5.2). Being slightly sloppy in our usage, however,
we will often refer to it simply as the trapezoidal rule. Note the structure of the
formula: the quantity inside the square brackets is a sum over values of f ( x )
measured at equally spaced points in the integration domain, and we take a
half of the values at the start and end points but one times the value at all the
interior points.

EXAMPLE 5.1: INTEGRATING A FUNCTION

Let us use the trapezoidal rule to calculate the integral of $x^4 - 2x + 1$ from
x = 0 to x = 2. This is actually an integral we can do by hand, which means
we don’t really need to do it using the computer in this case, but it’s a good
first example because we can check easily if our program is working and how
accurate an answer it gives.
Here is a program to do the integration using the trapezoidal rule with
N = 10 slices:

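File: trapezoidal.py

# The integrand
def f(x):
    return x**4 - 2*x + 1

N = 10          # number of slices
a = 0.0         # lower limit of integration
b = 2.0         # upper limit of integration
h = (b-a)/N     # width of each slice

# End points get weight 1/2, interior points weight 1, as in Eq. (5.3)
s = 0.5*f(a) + 0.5*f(b)
for k in range(1,N):
    s += f(a+k*h)

print(h*s)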

This is a straightforward translation of the trapezoidal rule formula into computer
code: we create a function that calculates the integrand, set up all the
constants used, evaluate the sum for the integral I(a, b) term by term, and then
multiply it by h and print it out.
If we run the program it prints

4.50656

The correct answer is

$$\int_0^2 (x^4 - 2x + 1)\,dx = \Big[\tfrac15 x^5 - x^2 + x\Big]_0^2 = 4.4. \qquad (5.4)$$

So our calculation is moderately but not exceptionally accurate: the answer is
off by about 2%.
We can make the calculation more accurate by increasing the number of
slices. As shown in Fig. 5.1c, we approximate the area under the curve better
when N is larger, though the program will also take longer to reach an answer
because there are more terms in the sum to evaluate. If we increase the number
of slices to N = 100 and run the program again we get 4.40107, which is now
accurate to 0.02%, which is pretty good. And if we use N = 1000 we get
4.40001, which is accurate to 0.0002%. In Section 5.2 we will study in more
detail the accuracy of the trapezoidal rule.
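To see the convergence described above in a single run, the program can be wrapped in a function and called for several values of N. This is a sketch; the function name `trapezoidal` is our own choice and is not defined in the text.

```python
def trapezoidal(f, a, b, N):
    """Extended trapezoidal rule, Eq. (5.3), with N slices of width h."""
    h = (b - a) / N
    s = 0.5 * f(a) + 0.5 * f(b)
    for k in range(1, N):
        s += f(a + k * h)
    return h * s

def f(x):
    return x**4 - 2*x + 1

exact = 4.4
for N in (10, 100, 1000):
    I = trapezoidal(f, 0.0, 2.0, N)
    print(N, I, abs(I - exact) / exact)   # slices, estimate, fractional error
```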

Exercise 5.1:
In the on-line resources you will find a file called velocities.txt, which contains two
columns of numbers, the first representing time t in seconds and the second the x-
velocity in meters per second of a particle, measured once every second from time t = 0
to t = 100. The first few lines look like this:
0 0
1 0.069478
2 0.137694
3 0.204332
4 0.269083
5 0.331656

Write a program to do the following:


a) Read in the data and, using the trapezoidal rule, calculate from them the approx-
imate distance traveled by the particle in the x direction as a function of time. See
Section 2.4.3 on page 56 if you want a reminder of how to read data from a file.
b) Extend your program to make a graph that shows, on the same plot, both the
original velocity curve and the distance traveled as a function of time.

5.1.2 SIMPSON’S RULE

The trapezoidal rule is the simplest of numerical integration methods, taking
only a few lines of code as we have seen, but it is often perfectly adequate
for calculations where no great accuracy is required. It happens frequently in
physics calculations that we don’t need an answer accurate to many significant
figures and in such cases the ease and simplicity of the trapezoidal rule can
make it the method of choice. One should not turn up one’s nose at simple
methods like this; they play an important role and are used widely. Moreover,
the trapezoidal rule is the basis for several other more sophisticated methods
of evaluating integrals, including the adaptive methods that we will study in
Section 5.3 and the Romberg integration method of Section 5.4.
However, there are also cases where greater accuracy is required. As we
have seen we can increase the accuracy of the trapezoidal rule by increasing
the number N of steps used in the calculation. But in some cases, particularly
for integrands that are rapidly varying, a very large number of steps may be
needed to achieve the desired accuracy, which means the calculation can be-
come slow. There are other, more advanced schemes for calculating integrals
that can achieve high accuracy while still arriving at an answer quickly. In this
section we study one such scheme, Simpson’s rule.
In effect, the trapezoidal rule estimates the area under a curve by approxi-
mating the curve with straight-line segments—see Fig. 5.1b. We can often get a
better result if we approximate the function instead with curves of some kind.
Simpson’s rule does exactly this, using quadratic curves, as shown in Fig. 5.2.
In order to specify a quadratic completely one needs three points, not just two
as with a straight line. So in this method we take a pair of adjacent slices and fit
a quadratic through the three points that mark the boundaries of those slices.
In Fig. 5.2 there are two quadratics, fitted to four slices. Simpson’s rule involves
approximating the integrand with quadratics in this way, then calculating the
area under those quadratics, which gives an approximation to the area under
the true curve.

[Figure: the curve f(x) between x = a and x = b, with "Quadratic 1" and "Quadratic 2" fitted to successive pairs of slices]
Figure 5.2: Simpson’s rule. Simpson’s rule involves fitting quadratic curves to pairs of
slices and then calculating the area under the quadratics.
Suppose, as before, that our integrand is denoted f(x) and the spacing of
adjacent points is h. And suppose for the purposes of argument that we have
three points at x = −h, 0, and +h. If we fit a quadratic $ax^2 + bx + c$ through
these points, then by definition we will have:

$$f(-h) = ah^2 - bh + c, \qquad f(0) = c, \qquad f(h) = ah^2 + bh + c. \qquad (5.5)$$

Solving these equations simultaneously for a, b, and c gives

$$a = \frac{1}{h^2}\Big[\tfrac12 f(-h) - f(0) + \tfrac12 f(h)\Big], \qquad b = \frac{1}{2h}\big[f(h) - f(-h)\big], \qquad c = f(0), \qquad (5.6)$$

and the area under the curve of f(x) from −h to +h is given approximately by
the area under the quadratic:

$$\int_{-h}^{h} (ax^2 + bx + c)\,dx = \tfrac23 ah^3 + 2ch = \tfrac13 h\big[f(-h) + 4f(0) + f(h)\big]. \qquad (5.7)$$
This is Simpson’s rule. It gives us an approximation to the area under two ad-
jacent slices of our function. Note that the final formula for the area involves
only h and the value of the function at evenly spaced points, just as with the
trapezoidal rule. So to use Simpson’s rule we don’t actually have to worry
about the details of fitting a quadratic—we just plug numbers into this for-
mula and it gives us an answer. This makes Simpson’s rule almost as simple
to use as the trapezoidal rule, and yet Simpson’s rule often gives much more
accurate answers, as we will see.
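Equation (5.7) can be checked numerically. For any quadratic, the single-pair formula should reproduce the exact area between −h and +h. The coefficients and h below are arbitrary values chosen for illustration, and `simpson_panel` is a hypothetical helper name.

```python
def simpson_panel(f, h):
    """Eq. (5.7): area under the quadratic through (-h, f(-h)), (0, f(0)), (h, f(h))."""
    return h / 3 * (f(-h) + 4 * f(0) + f(h))

# An arbitrary quadratic a x^2 + b x + c (illustrative coefficients)
a, b, c = 1.5, -2.0, 0.7
h = 0.3

def quad(x):
    return a * x**2 + b * x + c

exact = 2 / 3 * a * h**3 + 2 * c * h   # integral of the quadratic from -h to h
print(simpson_panel(quad, h), exact)   # the two agree to rounding error
```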
To use Simpson’s rule to perform a general integral we note that Eq. (5.7)
does not depend on the fact that our three points lie at x = − h, 0, and + h. If
we were to slide the curve along the x-axis to either higher or lower values,
the area underneath it would not change. So we can use the same rule for any
three uniformly spaced points. Applying Simpson’s rule involves dividing
the domain of integration into many slices and using the rule to separately
estimate the area under successive pairs of slices, then adding the estimates
for all pairs to get the final answer. If, as before, we are integrating from x = a
to x = b in slices of width h then the three points bounding the first pair of
slices fall at x = a, a + h and a + 2h, the second pair at a + 2h, a + 3h, a + 4h,
and so forth. More generally the boundaries of the kth pair of slices fall at
a + (2k − 2)h, a + (2k − 1)h, and a + 2kh. Hence Simpson’s rule gives the area
under the kth pair, approximately, as

$$A_k = \tfrac13 h\big[f(a+(2k-2)h) + 4f(a+(2k-1)h) + f(a+2kh)\big]. \qquad (5.8)$$

Since there are N slices in total, there are N/2 pairs of slices, and the
approximate value of the entire integral is given by the sum

$$\begin{aligned}
I(a,b) \simeq \sum_{k=1}^{N/2} A_k &= \tfrac13 h \sum_{k=1}^{N/2} \big[f(a+(2k-2)h) + 4f(a+(2k-1)h) + f(a+2kh)\big] \\
&= \tfrac13 h\big[f(a) + 4f(a+h) + 2f(a+2h) + 4f(a+3h) + \dots + f(b)\big] \\
&= \tfrac13 h\Big[f(a) + f(b) + 4\sum_{k=1}^{N/2} f(a+(2k-1)h) + 2\sum_{k=1}^{N/2-1} f(a+2kh)\Big]. \qquad (5.9)
\end{aligned}$$

Note that the total number of slices must be even for Simpson’s rule to work.
Comparing the last line of Eq. (5.9) to Eq. (5.3) we see that Simpson’s rule is
modestly more complicated than the trapezoidal rule, but not enormously so.
Programs using it are still straightforward to create.
As an example, suppose we apply Simpson’s rule with N = 10 slices to
the integral from Example 5.1, $\int_0^2 (x^4 - 2x + 1)\,dx$, whose true value, as we
saw, is 4.4. As shown in Exercise 5.2, this gives an answer of 4.400427, which
is already accurate to better than 0.01%, orders of magnitude better than the
trapezoidal rule with N = 10. Results for N = 100 and N = 1000 are better
still—see the exercise.
If you need an accurate answer for an integral, Simpson’s rule is a good
choice in many cases, giving precise results with relatively little effort. Alter-
natively, if you need to evaluate an integral quickly—perhaps because you will
be evaluating very many integrals as part of a larger calculation—then Simp-
son’s rule may again be a good choice, since it can give moderately accurate
answers even with only a small number of steps.
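Here is a minimal sketch of the composite rule, Eq. (5.9), written as a reusable function. The name `simpson` and the choice to raise an error for odd N are assumptions of this sketch. It follows the last line of Eq. (5.9) directly: weight 4 on the odd-numbered sample points and weight 2 on the even-numbered interior ones.

```python
def simpson(f, a, b, N):
    """Composite Simpson's rule, Eq. (5.9); N (the number of slices) must be even."""
    if N % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of slices")
    h = (b - a) / N
    s = f(a) + f(b)
    for k in range(1, N // 2 + 1):      # odd-numbered points, weight 4
        s += 4 * f(a + (2 * k - 1) * h)
    for k in range(1, N // 2):          # even-numbered interior points, weight 2
        s += 2 * f(a + 2 * k * h)
    return h * s / 3

def f(x):
    return x**4 - 2*x + 1

for N in (10, 100, 1000):
    print(N, simpson(f, 0.0, 2.0, N))
```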

Exercise 5.2:
a) Write a program to calculate an approximate value for the integral $\int_0^2 (x^4 - 2x + 1)\,dx$
from Example 5.1, but using Simpson’s rule with 10 slices instead of the
trapezoidal rule. You may wish to base your program on the trapezoidal rule
program on page 139.
b) Run the program and compare your result to the known correct value of 4.4.
What is the fractional error on your calculation?
c) Modify the program to use a hundred slices instead, then a thousand. Note the
improvement in the result. How do the results compare with those from Exam-
ple 5.1 for the trapezoidal rule with the same numbers of slices?

Exercise 5.3: Consider the integral

$$E(x) = \int_0^x e^{-t^2}\,dt.$$

a) Write a program to calculate E(x) for values of x from 0 to 3 in steps of 0.1.
Choose for yourself what method you will use for performing the integral and a
suitable number of slices.
b) When you are convinced your program is working, extend it further to make a
graph of E( x ) as a function of x. If you want to remind yourself of how to make
a graph, you should consult Section 3.1, starting on page 86.
Note that there is no known way to perform this particular integral analytically, so
numerical approaches are the only way forward.

Exercise 5.4: The diffraction limit of a telescope


Our ability to resolve detail in astronomical observations is limited by the diffraction of
light in our telescopes. Light from stars can be treated effectively as coming from a point
source at infinity. When such light, with wavelength λ, passes through the circular
aperture of a telescope (which we’ll assume to have unit radius) and is focused by the
telescope in the focal plane, it produces not a single dot, but a circular diffraction pattern
consisting of a central spot surrounded by a series of concentric rings. The intensity of
the light in this diffraction pattern is given by

$$I(r) = \left(\frac{J_1(kr)}{kr}\right)^2,$$

where r is the distance in the focal plane from the center of the diffraction pattern,
k = 2π/λ, and $J_1(x)$ is a Bessel function. The Bessel functions $J_m(x)$ are given by

$$J_m(x) = \frac{1}{\pi}\int_0^{\pi} \cos(m\theta - x\sin\theta)\,d\theta,$$

where m is a nonnegative integer and x ≥ 0.


[Margin figure: The diffraction pattern produced by a point source of light when viewed through a telescope.]
a) Write a Python function J(m,x) that calculates the value of $J_m(x)$ using Simpson’s
rule with N = 1000 points. Use your function in a program to make a plot, on a
single graph, of the Bessel functions $J_0$, $J_1$, and $J_2$ as a function of x from x = 0 to
x = 20.
b) Make a second program that makes a density plot of the intensity of the circular
diffraction pattern of a point light source with λ = 500 nm, in a square region of
the focal plane, using the formula given above. Your picture should cover values
of r from zero up to about 1 µm.
Hint 1: You may find it useful to know that $\lim_{x\to0} J_1(x)/x = \frac12$. Hint 2: The central
spot in the diffraction pattern is so bright that it may be difficult to see the rings around
it on the computer screen. If you run into this problem a simple way to deal with it is to
use one of the other color schemes for density plots described in Section 3.3. The “hot”
scheme works well. For a more sophisticated solution to the problem, the imshow func-
tion has an additional argument vmax that allows you to set the value that corresponds
to the brightest point in the plot. For instance, if you say “imshow(x,vmax=0.1)”, then
elements in x with value 0.1, or any greater value, will produce the brightest (most pos-
itive) color on the screen. By lowering the vmax value, you can reduce the total range of
values between the minimum and maximum brightness, and hence increase the sensi-
tivity of the plot, making subtle details visible. (There is also a vmin argument that can
be used to set the value that corresponds to the dimmest (most negative) color.) For this
exercise a value of vmax=0.01 appears to work well.

5.2 ERRORS ON INTEGRALS

Our numerical integrals are only approximations. As with most numerical cal-
culations there is usually a rounding error when we calculate an integral, as
described in Section 4.2, but this is not the main source of error. The main
source of error is the so-called approximation error—the fact that our integration
rules themselves are only approximations to the true integral. Both the trape-
zoidal and Simpson rules calculate the area under an approximation (either
linear or quadratic) to the integrand, not the integrand itself. How big an error
does this approximation introduce?
Consider again an integral $\int_a^b f(x)\,dx$, and let us first look at the trapezoidal
rule of Eq. (5.3). To simplify our notation a little, let us define $x_k = a + kh$ as a
shorthand for the positions at which we evaluate the integrand f(x). We will
refer to these positions as sample points. Now consider one particular slice of
the integral, the one that falls between $x_{k-1}$ and $x_k$, and let us perform a Taylor
expansion of f(x) about $x_{k-1}$ thus:

$$f(x) = f(x_{k-1}) + (x - x_{k-1}) f'(x_{k-1}) + \tfrac12 (x - x_{k-1})^2 f''(x_{k-1}) + \dots \qquad (5.10)$$

where f′ and f″ denote the first and second derivatives of f respectively.
Integrating this expression from $x_{k-1}$ to $x_k$ gives

$$\int_{x_{k-1}}^{x_k} f(x)\,dx = f(x_{k-1}) \int_{x_{k-1}}^{x_k} dx + f'(x_{k-1}) \int_{x_{k-1}}^{x_k} (x - x_{k-1})\,dx + \tfrac12 f''(x_{k-1}) \int_{x_{k-1}}^{x_k} (x - x_{k-1})^2\,dx + \dots \qquad (5.11)$$

Now we make the substitution $u = x - x_{k-1}$, which gives

$$\begin{aligned}
\int_{x_{k-1}}^{x_k} f(x)\,dx &= f(x_{k-1}) \int_0^h du + f'(x_{k-1}) \int_0^h u\,du + \tfrac12 f''(x_{k-1}) \int_0^h u^2\,du + \dots \\
&= h f(x_{k-1}) + \tfrac12 h^2 f'(x_{k-1}) + \tfrac16 h^3 f''(x_{k-1}) + O(h^4), \qquad (5.12)
\end{aligned}$$

where $O(h^4)$ denotes the rest of the terms in the series, those in $h^4$ and higher,
which we are neglecting.
We can do a similar expansion around $x = x_k$ and again integrate from $x_{k-1}$
to $x_k$ to get

$$\int_{x_{k-1}}^{x_k} f(x)\,dx = h f(x_k) - \tfrac12 h^2 f'(x_k) + \tfrac16 h^3 f''(x_k) - O(h^4). \qquad (5.13)$$

Then, taking the average of Eqs. (5.12) and (5.13), we get

$$\int_{x_{k-1}}^{x_k} f(x)\,dx = \tfrac12 h\big[f(x_{k-1}) + f(x_k)\big] + \tfrac14 h^2\big[f'(x_{k-1}) - f'(x_k)\big] + \tfrac1{12} h^3\big[f''(x_{k-1}) + f''(x_k)\big] + O(h^4). \qquad (5.14)$$
Finally, we sum this expression over all slices k to get the full integral that we
want:

$$\begin{aligned}
\int_a^b f(x)\,dx &= \sum_{k=1}^{N} \int_{x_{k-1}}^{x_k} f(x)\,dx \\
&= \tfrac12 h \sum_{k=1}^{N} \big[f(x_{k-1}) + f(x_k)\big] + \tfrac14 h^2\big[f'(a) - f'(b)\big] \\
&\quad + \tfrac1{12} h^3 \sum_{k=1}^{N} \big[f''(x_{k-1}) + f''(x_k)\big] + O(h^4). \qquad (5.15)
\end{aligned}$$

Let’s take a close look at this expression to see what’s going on.
The first sum on the right-hand side of the equation is precisely equal to
the trapezoidal rule, Eq. (5.3). When we use the trapezoidal rule, we evaluate
only this sum and discard all the terms following. The size of the discarded
terms—the rest of the series—measures the amount we would have to add to
the trapezoidal rule value to get the true value of the integral. In other words
it is equal to the error we incur when we use the trapezoidal rule, the so-called
approximation error.
In the second term, the term in $h^2$, notice that almost all of the terms have
canceled out of the sum, leaving only the first and last terms, the ones
evaluated at a and b. Although we haven’t shown it, a similar cancellation happens
for the terms in $h^4$, $h^6$, and all even powers of h.
Now take a look at the term in $h^3$ and notice the following useful fact: the
sum in this term is itself, to within an overall constant, just the trapezoidal rule
formula, Eq. (5.3), for the integral of f″(x) over the interval from a to b:

$$\int_a^b f''(x)\,dx \simeq \tfrac12 h \sum_{k=1}^{N} \big[f''(x_{k-1}) + f''(x_k)\big]. \qquad (5.16)$$

But $\int_a^b f''(x)\,dx = f'(b) - f'(a)$ and, substituting this into Eq. (5.15) and
canceling some terms, we find

$$\int_a^b f(x)\,dx = \tfrac12 h \sum_{k=1}^{N} \big[f(x_{k-1}) + f(x_k)\big] + \tfrac1{12} h^2\big[f'(a) - f'(b)\big] + O(h^4). \qquad (5.17)$$

Thus, to leading order in h, the value of the terms dropped when we use the
trapezoidal rule, which equals the approximation error ε on the integral, is

$$\epsilon = \tfrac1{12} h^2\big[f'(a) - f'(b)\big]. \qquad (5.18)$$

This is the Euler–Maclaurin formula for the error on the trapezoidal rule. More
correctly it is the first term in the Euler–Maclaurin formula; the full formula
keeps the terms to all orders in h. We can see from Eq. (5.17) that the next term
in the series is of order $h^4$. We might imagine it would be of order $h^3$, but the $h^3$
term cancels out, and in fact it’s fairly straightforward to show that only even
powers of h survive in the full formula at all orders, so the next term after $h^4$
is $h^6$, then $h^8$, and so forth. So long as h is small, however, we can neglect the
$h^4$ and higher terms; the leading term, Eq. (5.18), is usually enough.
Equation (5.18) tells us that the trapezoidal rule is a first-order integration
rule, meaning it is accurate up to and including terms proportional to h and
the leading-order approximation error is of order $h^2$. That is, a first-order rule
is accurate to O(h) and has an error $O(h^2)$.
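The Euler–Maclaurin estimate can be tested on Example 5.1, where both the exact integral (4.4) and the derivative $f'(x) = 4x^3 - 2$ are known. The sketch compares the true error (exact value minus trapezoidal estimate) with the prediction of Eq. (5.18). The helper names are hypothetical.

```python
def trapezoidal(f, a, b, N):
    """Extended trapezoidal rule, Eq. (5.3)."""
    h = (b - a) / N
    s = 0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, N))
    return h * s

def f(x):
    return x**4 - 2*x + 1

def fprime(x):                # derivative of the integrand, worked out by hand
    return 4 * x**3 - 2

a, b, exact = 0.0, 2.0, 4.4
for N in (10, 20, 40):
    h = (b - a) / N
    actual = exact - trapezoidal(f, a, b, N)          # true approximation error
    predicted = h**2 / 12 * (fprime(a) - fprime(b))   # Eq. (5.18)
    print(N, actual, predicted)
```

The small remaining difference between the two columns comes from the neglected $h^4$ term, and it shrinks rapidly as N grows.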
In addition to approximation error, there is also a rounding error on our
calculation. As discussed in Section 4.2, this rounding error will have
approximate size C times the value of the integral, where C is the error constant, which
is about $10^{-16}$ in current versions of Python.² Equation (5.18) tells us that the
approximation error gets smaller as h gets smaller, so we can make our inte-
gral more accurate by using smaller h or, equivalently, a larger number N of
slices. However, there is little point in making h so small that the approxima-
tion error becomes much smaller than the rounding error. Further decreases
in h beyond this point will only make our program slower, by increasing the
number of terms in the sum for Eq. (5.3), without improving the accuracy of
our calculation significantly, since accuracy will be dominated by the rounding
error.
Thus decreases in h will only help us up to the point at which the approxi-
mation and rounding errors are roughly equal, which is the point where
$$\tfrac1{12} h^2\big[f'(a) - f'(b)\big] \simeq C \int_a^b f(x)\,dx. \qquad (5.19)$$

Rearranging for h we get

$$h \simeq \sqrt{\frac{12\int_a^b f(x)\,dx}{f'(a) - f'(b)}}\; C^{1/2}. \qquad (5.20)$$

² One might imagine that the rounding error would be larger than this because the trapezoidal
rule involves a sum of terms in Eq. (5.3) and each term will incur its own rounding error, the
individual errors accumulating over the course of the calculation. As shown in Section 4.2 and
Eq. (4.7), however, the size of such cumulative errors goes up only as $\sqrt{N}$, while the trapezoidal
rule equation (5.3) includes a factor of h, which falls off as 1/N. The net result is that the theoretical
cumulative error on the trapezoidal rule actually decreases as $1/\sqrt{N}$, rather than increasing, so it
is safe to say that the final error is no greater than the error incurred by the final operation in the
calculation, which will have size C times the final value.
Or we can set h = (b − a)/N to get

$$N \simeq (b - a)\sqrt{\frac{f'(a) - f'(b)}{12\int_a^b f(x)\,dx}}\; C^{-1/2}. \qquad (5.21)$$

Thus if, for example, all the factors except the last are of order unity, then
rounding error will become important when $N \simeq 10^8$. Looked at another way,
this is the point at which the accuracy of the trapezoidal rule reaches the “ma-
chine precision,” the maximum accuracy with which the computer can repre-
sent the result. There is no point increasing the number of integration slices be-
yond this point; the calculation will not become any more accurate. However,
$N = 10^8$ would be an unusually large number of slices for the trapezoidal rule;
it would be rare to use such a large number when equivalent accuracy can be
achieved using much smaller N with a more accurate rule such as Simpson’s
rule. In most practical situations, therefore, we will be in the regime where ap-
proximation error is the dominant source of inaccuracy and it is safe to assume
that rounding error can be ignored.
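Plugging numbers into Eq. (5.21) gives a concrete crossover point. The sketch below assumes C = 10⁻¹⁶, the value quoted above. It also takes absolute values so that the sign of f′(a) − f′(b) does not matter; that choice and the function name are ours. For Example 5.1 the result is roughly 1.6 × 10⁸ slices, consistent with the order-of-magnitude estimate above.

```python
from math import sqrt

def trapezoid_crossover_N(a, b, fprime_a, fprime_b, integral, C=1e-16):
    """Eq. (5.21): number of slices at which the trapezoidal rule's
    approximation error falls to the rounding error C * integral.
    Absolute values are used because only the size of the error matters."""
    return (b - a) * sqrt(abs(fprime_a - fprime_b) / (12 * abs(integral))) / sqrt(C)

# Example 5.1: f(x) = x^4 - 2x + 1 on [0, 2], f'(x) = 4x^3 - 2, integral = 4.4
N = trapezoid_crossover_N(0.0, 2.0, -2.0, 30.0, 4.4)
print(f"{N:.2e}")
```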
We can do an analogous error analysis for Simpson’s rule. The algebra is
similar but more tedious. Here we’ll just quote the results. For an integral over
the interval from a to b the approximation error is given to leading order by

$$\epsilon = \tfrac1{180} h^4\big[f'''(a) - f'''(b)\big]. \qquad (5.22)$$

For the integral of Example 5.1 with N = 10 (h = 0.2), for instance, this gives
|ε| = 0.0016 × 48/180 ≈ 0.00043, in agreement with the Simpson’s rule result of
4.400427 quoted in Section 5.1.2.
Thus Simpson’s rule is a third-order integration rule (two orders better than
the trapezoidal rule) with a fourth-order approximation error. For small values
of h this means that the error on Simpson’s rule will typically be much
smaller than the error on the trapezoidal rule and it explains why Simpson’s
rule gave such superior results in our example calculations (see Section 5.1.2).
The rounding error for Simpson’s rule is again of order $C\int_a^b f(x)\,dx$ and
the equivalent of Eq. (5.21) is

$$N = (b - a)\left(\frac{f'''(a) - f'''(b)}{180\int_a^b f(x)\,dx}\right)^{1/4} C^{-1/4}. \qquad (5.23)$$

If, again, the leading factors are roughly of order unity, this implies that round-
ing error will become important when N ≃ 10 000. Beyond this point Simp-
son’s rule is so accurate that its accuracy exceeds the machine precision of the
computer and there is no point using larger values of N. By contrast with the
case for the trapezoidal rule, however, N = 10 000 is not an unusually large
number of slices to use in a calculation. Calculations with ten thousand slices
can be done easily in a fraction of a second. Thus it is worth bearing this re-
sult in mind: there is no point using more than a few thousand slices with
Simpson’s rule because the calculation will reach the limits of precision of the
computer and larger values of N will do no further good.
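A quick experiment shows both regimes for Simpson’s rule on Example 5.1. While approximation error dominates, doubling N should cut the error by about 2⁴ = 16. At very large N the error should stop improving once it reaches the rounding level. This is a sketch; the exact error values printed at large N depend on the machine’s floating-point arithmetic.

```python
def simpson(f, a, b, N):
    """Composite Simpson's rule, Eq. (5.9); N must be even."""
    h = (b - a) / N
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, N // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, N // 2))
    return h * s / 3

def f(x):
    return x**4 - 2*x + 1

exact = 4.4
errors = {}
for N in (10, 20, 40, 80, 10_000, 100_000):
    errors[N] = abs(simpson(f, 0.0, 2.0, N) - exact)
    print(N, errors[N])

# Halving h should cut the error by about 2**4 = 16 while approximation error dominates
print(errors[10] / errors[20])
```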
Finally in this section, let us note that while Simpson’s rule does in general
give superior accuracy, it is not always guaranteed to do better than the trape-
zoidal rule, since the errors on the trapezoidal and Simpson rules also depend
on derivatives of the integrand function via Eqs. (5.18) and (5.22). It would be
possible, for instance, for f ′′′ ( a) by bad luck to be large in some particular in-
stance, making the error in Eq. (5.22) similarly large, and possibly worse than
the error for the trapezoidal rule. It would be fair to say that Simpson’s rule
usually gives better results than the trapezoidal rule, but the prudent scientist
will bear in mind that it can do worse on occasion.

5.2.1 PRACTICAL ESTIMATION OF ERRORS

The Euler–Maclaurin formula of Eq. (5.18), or its equivalent for Simpson’s rule,
Eq. (5.22), allows us to calculate the error on our integrals provided we have a
known closed-form expression for the integrand f ( x ), so that we can calculate
the derivatives that appear in the formulas. Unfortunately, in many cases—
perhaps most—we have no such expression. For instance, the integrand may
not be a mathematical function at all but a set of measurements made in the
laboratory, or it might itself be the output of another computer program. In
such cases we cannot differentiate the function and Eq. (5.18) or (5.22) will not
work. There is, however, still a way to calculate the error.
Suppose, as before, that we are evaluating an integral over the interval from
x = a to x = b and let’s assume that we are using the trapezoidal rule, since it
makes the argument simpler, although the method described here extends to
Simpson’s rule too. Let us perform the integral with some number of steps $N_1$,
so that the step size is $h_1 = (b - a)/N_1$.
Then here’s the trick: we now double the number of steps and perform
the integral again. That is, we define a new number of steps $N_2 = 2N_1$ and a
new step size $h_2 = (b - a)/N_2 = \tfrac12 h_1$, and we reevaluate the integral using the
trapezoidal rule, giving a new answer, which will normally be more accurate
than the previous one. As we have seen, the trapezoidal rule introduces an
error of order $O(h^2)$, which means that when we halve the value of $h$ we quarter the
size of our error. Knowing this fact allows us to estimate how big the error is.
Suppose that the true value of our integral is I and let us denote our first

5.2 | E RRORS ON INTEGRALS

estimate using the trapezoidal rule with $N_1$ steps by $I_1$. The difference between
the true value and the estimate, which is the error on the estimate, is proportional
to $h^2$, so let us write it as $ch^2$, where $c$ is a constant. Then $I$ and $I_1$ are related by
$I = I_1 + ch_1^2$, neglecting higher-order terms.
We can also write a similar formula for our second estimate $I_2$ of the integral, with $N_2$ steps: $I = I_2 + ch_2^2$. Equating the two expressions for $I$ we then get

$$ I_1 + ch_1^2 = I_2 + ch_2^2, \tag{5.24} $$

or

$$ I_2 - I_1 = 3ch_2^2, \tag{5.25} $$

where we have made use of the fact that $h_1 = 2h_2$. Rearranging this expression then gives the error $\epsilon_2$ on the second estimate of the integral to be

$$ \epsilon_2 = ch_2^2 = \tfrac13 (I_2 - I_1). \tag{5.26} $$

As we have written it, this expression can be either positive or negative, depending on which way the error happens to go. If we want only the absolute size of the error then we can take the absolute value $\tfrac13 |I_2 - I_1|$, which in Python would be done using the built-in function abs.
This method gives us a simple way to estimate the error on the trapezoidal
rule without using the Euler–Maclaurin formula. Indeed, even in cases where
we could in principle use the Euler–Maclaurin formula because we know the
mathematical form of the integrand, it is often simpler in practice to use the
method of Eq. (5.26) instead—it is easy to program and gives reliable answers.
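The following is a minimal sketch of Eq. (5.26) in use. It assumes the test integrand $\sin x$ on the interval from 0 to $\pi$, which is chosen here only because its exact integral, 2, is known. The program computes trapezoidal estimates with $N_1 = 10$ and $N_2 = 20$ slices and compares the error estimate of Eq. (5.26) with the true error. The helper name `trapezoid` is our own.

```python
from math import sin, pi

def trapezoid(f, a, b, N):
    """Trapezoidal-rule estimate of the integral of f from a to b with N slices."""
    h = (b - a) / N
    s = 0.5 * f(a) + 0.5 * f(b)
    for k in range(1, N):
        s += f(a + k * h)
    return h * s

a, b = 0.0, pi
N1 = 10
N2 = 2 * N1                        # double the number of slices

I1 = trapezoid(sin, a, b, N1)
I2 = trapezoid(sin, a, b, N2)

error_estimate = (I2 - I1) / 3     # Eq. (5.26): estimated error on I2
true_error = 2.0 - I2              # the exact value of the integral is 2

print("I2              =", I2)
print("estimated error =", error_estimate)
print("true error      =", true_error)
```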
The same principle can be applied to integrals evaluated using Simpson’s
rule too. The equivalent of Eq. (5.26) in that case turns out to be

$$ \epsilon_2 = \tfrac{1}{15} (I_2 - I_1). \tag{5.27} $$

The derivation is left to the reader (see Exercise 5.5).
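Here is a similar numerical check of Eq. (5.27), again using $\sin x$ from 0 to $\pi$ as an illustrative integrand, this time with 10 and 20 Simpson slices. It only confirms the formula numerically for one case; it is not the derivation asked for in Exercise 5.5. The `simpson` helper is a straightforward implementation of Simpson's rule written for this sketch.

```python
from math import sin, pi

def simpson(f, a, b, N):
    """Simpson's-rule estimate of the integral of f from a to b; N must be even."""
    if N % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of slices")
    h = (b - a) / N
    s = f(a) + f(b)
    for k in range(1, N, 2):       # odd-numbered points carry a factor 4
        s += 4 * f(a + k * h)
    for k in range(2, N, 2):       # even-numbered interior points carry a factor 2
        s += 2 * f(a + k * h)
    return h * s / 3

a, b = 0.0, pi
I1 = simpson(sin, a, b, 10)
I2 = simpson(sin, a, b, 20)

error_estimate = (I2 - I1) / 15    # Eq. (5.27): estimated error on I2
true_error = 2.0 - I2              # the exact value of the integral is 2

print(error_estimate, true_error)
```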

Exercise 5.5: Error on Simpson’s rule


Following the same line of argument that led to Eq. (5.26), show that the error on an
integral evaluated using Simpson’s rule is given, to leading order in h, by Eq. (5.27).

Exercise 5.6: Write a program, or modify an earlier one, to once more calculate the
value of the integral $\int_0^2 (x^4 - 2x + 1)\,dx$ from Example 5.1, using 20 slices, but this time


have the program also print an estimate of the error on the result, calculated using the
method of Eq. (5.26). To do this you will need to evaluate the integral twice, once with
N1 = 10 slices and then again with N2 = 20 slices. Then Eq. (5.26) gives the error. How
does the error you calculated compare with a direct computation of the error as the
difference between your value for the integral and the true value of 4.4? Why do the
two not agree perfectly?

5.3 CHOOSING THE NUMBER OF STEPS


So far we have not specified how the number N of steps used in our integrals
is to be chosen. In our example calculations we just chose round numbers and
looked to see if the results seemed reasonable. This is fine for quick calcula-
tions, but for serious physics we want a more principled approach. In some
calculations we may know in advance how many steps we want to use. Some-
times we have a “budget,” a certain amount of computer time that we can
spend on a calculation and our goal is simply to make the most accurate cal-
culation we can in the given amount of time. If we know, for instance, that we
have time to do a thousand steps, then that’s what we do.
But a more common situation is that we want to calculate the value of an
integral to a given accuracy, such as four decimal places, and we would like to
know how many steps will be needed. So long as the desired accuracy does not
exceed the fundamental limit set by the machine precision of our computer—
the rounding error that limits all calculations—then it should always be pos-
sible to meet our goal by using a large enough number of steps. At the same
time, we want to avoid using more steps than are necessary, since more steps
take more time and our calculation will be slower. Ideally we would like an N
that gives us the accuracy we want and no more.
A simple way to achieve this is to start with a small value of N and re-
peatedly double it until we achieve the accuracy we want. As we saw in Sec-
tion 5.2.1, there is a simple formula, Eq. (5.26), for calculating the error on an
integral when we double the number of steps. By using this formula with
repeated doublings we can evaluate an integral to exactly the accuracy we de-
sire. The procedure is straightforward. We start off by evaluating the integral
with some small number of steps N1 . For instance, we might choose N1 = 10.
Then we double the number to N2 = 2N1 , evaluate the integral again, and
apply Eq. (5.26) to calculate the error. If the error is small enough to satisfy
our accuracy requirements, then we’re done—we have our answer. If not, we
double again to $N_3 = 2N_2$ and we keep on doubling until we achieve the required accuracy. The error on the $i$th step of the process is given by the obvious generalization of Eq. (5.26):

$$ \epsilon_i = \tfrac13 (I_i - I_{i-1}), \tag{5.28} $$

where $I_i$ is the $i$th estimate of the integral.


This method is an example of an adaptive integration method, one that changes its own parameters to get a desired answer. A particularly nice feature of the method is that when we double the number of steps we don’t actually have to recalculate the entire integral again. We can reuse our previous calculation rather than just throwing it away. To see this, take a look at Fig. 5.3. The top part of the figure depicts the locations of the sample points, the values of x at which the integrand is evaluated in the trapezoidal rule. The sample points are regularly spaced, and bear in mind that the first and last points are treated differently from the others: the trapezoidal rule formula, Eq. (5.3), specifies that the values of $f(x)$ at these points are multiplied by a factor of ½, whereas the values at the interior points are multiplied by 1.

[Figure 5.3: Doubling the number of steps in the trapezoidal rule. Top: we evaluate the integrand at evenly spaced points as shown, with the value at each point being multiplied by the appropriate factor (½, 1, 1, 1, ½). Bottom: when we double the number of steps, we effectively add a new set of points, exactly half way between the previous points, as indicated by the arrows (factors ½, 1, 1, 1, 1, 1, 1, 1, ½).]

The lower part of the figure shows what happens when we double the number of slices. This adds an additional set of sample points half way between the old ones, as indicated by the arrows. Note that the original points are still included in the calculation and still carry the same multiplying factors as before (½ at the ends and 1 in the middle), while the new points are all multiplied by a simple factor of 1. Thus we have all of the same terms in our trapezoidal rule sum that we had before, terms that we have already evaluated, but we also have a set of new ones, which we have to add into the sum to calculate its full value. In the jargon of computational physics we say that the sample points for the first estimate of the integral are nested inside the points for the second estimate.
To put this in mathematical terms, consider the trapezoidal rule at the $i$th step of the calculation. Let the number of slices at this step be $N_i$ and the width of a slice be $h_i = (b - a)/N_i$, and note that on the previous step there were half as many slices of twice the width, so that $N_{i-1} = \tfrac12 N_i$ and $h_{i-1} = 2h_i$. Then

$$
\begin{aligned}
I_i &= h_i \biggl[ \tfrac12 f(a) + \tfrac12 f(b) + \sum_{k=1}^{N_i-1} f(a + kh_i) \biggr] \\
&= h_i \biggl[ \tfrac12 f(a) + \tfrac12 f(b) + \sum_{k=1}^{N_i/2-1} f(a + 2kh_i) + \sum_{k=1}^{N_i/2} f\bigl(a + (2k-1)h_i\bigr) \biggr] \\
&= \tfrac12 h_{i-1} \biggl[ \tfrac12 f(a) + \tfrac12 f(b) + \sum_{k=1}^{N_{i-1}-1} f(a + kh_{i-1}) \biggr] + h_i \sum_{k=1}^{N_i/2} f\bigl(a + (2k-1)h_i\bigr).
\end{aligned}
\tag{5.29}
$$

Now we note that the term $h_{i-1}[\ldots]$ in the last line is precisely the trapezoidal rule estimate $I_{i-1}$ of the integral on the previous iteration of the process, and hence

$$ I_i = \tfrac12 I_{i-1} + h_i \sum_{k=1}^{N_i/2} f\bigl(a + (2k-1)h_i\bigr). \tag{5.30} $$

In effect, our old estimate gives us half of the terms in our trapezoidal rule
sum and we only have to calculate the other half. In this way we avoid ever
recalculating any term that has already been calculated, meaning that each
term in our sums is calculated only once, regardless of how many levels of
the calculation it’s used in. This means it takes only about as much work to
calculate Ii by going through all the successive levels I1 , I2 , I3 , . . . in turn as it
does to calculate Ii outright using the ordinary trapezoidal rule. Thus we pay
very little extra price in terms of the running time of our program to use this
adaptive method and we gain the significant advantage of a guarantee in the
accuracy of our integral.
The entire process is as follows:
1. Choose an initial number of steps N1 and decide on the target accuracy
for the value of the integral. Calculate the first approximation I1 to the
integral using the chosen value of N1 with the standard trapezoidal rule
formula, Eq. (5.3).
2. Double the number of steps and use Eq. (5.30) to calculate an improved
estimate of the integral. Also calculate the error on that estimate from
Eq. (5.28).
3. If the absolute magnitude of the error is less than the target accuracy for
the integral, stop. Otherwise repeat from step 2.
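The three steps above can be sketched in Python as follows. This is an illustration built on our own choices, not a reference implementation. The function name `adaptive_trapezoid`, the test integrand $e^x$ on the interval from 0 to 1 (picked so as not to give away Exercise 5.7), and the safety limit `max_doublings` are all assumptions. Note how each doubling evaluates $f$ only at the new points, as Eq. (5.30) prescribes.

```python
from math import exp

def adaptive_trapezoid(f, a, b, target, N=1, max_doublings=30):
    """Adaptive trapezoidal rule: double N until |error| < target.

    Returns (estimate, error estimate, number of slices).
    """
    h = (b - a) / N
    # Step 1: ordinary trapezoidal rule, Eq. (5.3)
    I = h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, N)))
    for _ in range(max_doublings):
        # Step 2: double N; Eq. (5.30) reuses the old estimate, so f is
        # evaluated only at the new points a + (2k - 1)h, k = 1 ... N/2
        N *= 2
        h /= 2
        new_terms = sum(f(a + (2 * k - 1) * h) for k in range(1, N // 2 + 1))
        I_new = 0.5 * I + h * new_terms
        error = (I_new - I) / 3            # Eq. (5.28)
        I = I_new
        # Step 3: stop once the estimated error is small enough
        if abs(error) < target:
            return I, error, N
    raise RuntimeError("target accuracy not reached")

I, err, N = adaptive_trapezoid(exp, 0.0, 1.0, 1e-6)
print(N, I, err)
```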
We can also derive a similar method for integrals evaluated using Simpson’s rule. Again we double the number of steps on each iteration of the process and the equivalent of Eq. (5.28) is

$$ \epsilon_i = \tfrac{1}{15} (I_i - I_{i-1}). \tag{5.31} $$

The equivalent of Eq. (5.30) is a little more complicated. We define

$$ S_i = \tfrac13 \biggl[ f(a) + f(b) + 2 \sum_{k=1}^{N_i/2-1} f(a + 2kh_i) \biggr], \tag{5.32} $$

and

$$ T_i = \tfrac23 \sum_{k=1}^{N_i/2} f\bigl(a + (2k-1)h_i\bigr). \tag{5.33} $$

Then we can show that

$$ S_i = S_{i-1} + T_{i-1}, \tag{5.34} $$

and

$$ I_i = h_i (S_i + 2T_i). \tag{5.35} $$
Thus for Simpson’s rule the complete process is:
1. Choose an initial number of steps and a target accuracy, and calculate the
sums S1 and T1 from Eqs. (5.32) and (5.33) and the initial value I1 of the
integral from Eq. (5.35).
2. Double the number of steps then use Eqs. (5.33), (5.34), and (5.35) to cal-
culate the new values of Si and Ti and the new estimate of the integral.
Also calculate the error on that estimate from Eq. (5.31).
3. If the absolute magnitude of the error is less than the target accuracy for
the integral, stop. Otherwise repeat from step 2.
Again notice that on each iteration of the process you only have to calculate
one sum, Eq. (5.33), which includes only those terms in the Simpson’s rule
formula that have not previously been calculated. As a result, the complete
calculation of Ii takes very little more computer time than the basic Simpson
rule.
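Here is a corresponding sketch of the Simpson’s rule version. It keeps the running sums $S$ and $T$ of Eqs. (5.32) and (5.33) between iterations, so each doubling evaluates the integrand only at the new points. The function name `adaptive_simpson`, the test integrand $e^x$ on the interval from 0 to 1, and the `max_doublings` limit are our own illustrative choices.

```python
from math import exp

def adaptive_simpson(f, a, b, target, N=2, max_doublings=30):
    """Adaptive Simpson's rule with the S_i, T_i bookkeeping of Eqs. (5.32)-(5.35).

    N must be even. Returns (estimate, error estimate, number of slices).
    """
    h = (b - a) / N
    # Eq. (5.32): end points plus even-numbered interior points
    S = (f(a) + f(b) + 2 * sum(f(a + 2 * k * h) for k in range(1, N // 2))) / 3
    # Eq. (5.33): odd-numbered points
    T = 2 * sum(f(a + (2 * k - 1) * h) for k in range(1, N // 2 + 1)) / 3
    I = h * (S + 2 * T)                    # Eq. (5.35)
    for _ in range(max_doublings):
        N *= 2
        h /= 2
        S = S + T                          # Eq. (5.34): old points are now even points
        # Eq. (5.33) again: the only new function evaluations
        T = 2 * sum(f(a + (2 * k - 1) * h) for k in range(1, N // 2 + 1)) / 3
        I_new = h * (S + 2 * T)            # Eq. (5.35)
        error = (I_new - I) / 15           # Eq. (5.31)
        I = I_new
        if abs(error) < target:
            return I, error, N
    raise RuntimeError("target accuracy not reached")

I, err, N = adaptive_simpson(exp, 0.0, 1.0, 1e-6)
print(N, I, err)
```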

5.4 ROMBERG INTEGRATION

We can do even better than the adaptive method of the last section with only
a little more effort. Let us go back to the trapezoidal rule again. We have
seen that the leading-order error on the trapezoidal rule, at the $i$th step of the adaptive method, can be written as $ch_i^2$ for some constant $c$ and is given by Eq. (5.28) to be

$$ ch_i^2 = \tfrac13 (I_i - I_{i-1}). \tag{5.36} $$


But by definition the true value of the integral is $I = I_i + ch_i^2 + O(h_i^4)$, where we are including the $O(h_i^4)$ term to remind us of the next term in the series; see Eq. (5.17). (Remember that there are only even-order terms in this series.) So in other words

$$ I = I_i + \tfrac13 (I_i - I_{i-1}) + O(h_i^4). \tag{5.37} $$

But this expression is now accurate to third order, and has only a fourth-order error, which is as accurate as Simpson’s rule, and yet we calculated it using only our results from the trapezoidal rule, with hardly any extra work; we are just reusing numbers we already calculated while carrying out the repeated doubling procedure of Section 5.3.
An alternative way to arrive at the same result is to write out the expressions for the integrals at successive steps of the doubling procedure in full:

$$ I = I_i + ch_i^2 + O(h_i^4), \tag{5.38} $$

$$ I = I_{i-1} + ch_{i-1}^2 + O(h_{i-1}^4) = I_{i-1} + 4ch_i^2 + O(h_i^4), \tag{5.39} $$

where we have used $h_{i-1} = 2h_i$. Now we multiply the first equation by 4 and subtract the second from it to get Eq. (5.37) again.
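Here is a quick numerical check of Eq. (5.37). It assumes the illustrative integrand $\sin x$ from 0 to $\pi$, with $I_{i-1}$ and $I_i$ computed from 10 and 20 trapezoidal slices. The extrapolated value is far more accurate than either trapezoidal estimate. In fact, the combination $I_i + \tfrac13 (I_i - I_{i-1})$ is algebraically identical to Simpson’s rule with $N_i$ slices. This is a standard result, and the last line of the sketch confirms it to rounding error.

```python
from math import sin, pi

def trapezoid(f, a, b, N):
    """Trapezoidal rule with N slices."""
    h = (b - a) / N
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, N)))

def simpson(f, a, b, N):
    """Simpson's rule with N (even) slices."""
    h = (b - a) / N
    s = f(a) + f(b)
    s += 4 * sum(f(a + k * h) for k in range(1, N, 2))
    s += 2 * sum(f(a + k * h) for k in range(2, N, 2))
    return h * s / 3

a, b, N = 0.0, pi, 20
I_prev = trapezoid(sin, a, b, N // 2)    # I_{i-1}
I_curr = trapezoid(sin, a, b, N)         # I_i
R = I_curr + (I_curr - I_prev) / 3       # Eq. (5.37) without the O(h^4) term

print("trapezoidal error: ", abs(I_curr - 2.0))
print("extrapolated error:", abs(R - 2.0))
print("difference from Simpson's rule:", R - simpson(sin, a, b, N))
```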
We can take this process further. Let us refine our notation a little and define

$$ R_{i,1} = I_i, \qquad R_{i,2} = I_i + \tfrac13 (I_i - I_{i-1}) = R_{i,1} + \tfrac13 (R_{i,1} - R_{i-1,1}). \tag{5.40} $$

Then, from Eq. (5.37),

$$ I = R_{i,2} + c_2 h_i^4 + O(h_i^6), \tag{5.41} $$

where $c_2$ is another constant and we have made use of the fact that the series for $I$ contains only even powers of $h_i$. Analogously,

$$ I = R_{i-1,2} + c_2 h_{i-1}^4 + O(h_{i-1}^6) = R_{i-1,2} + 16c_2 h_i^4 + O(h_i^6). \tag{5.42} $$

Multiplying the first of these equations by 16 and subtracting the second then gives

$$ I = R_{i,2} + \tfrac{1}{15} (R_{i,2} - R_{i-1,2}) + O(h_i^6). \tag{5.43} $$

Now we have eliminated the $h_i^4$ term and generated an estimate accurate to fifth order, with a sixth-order error!
We can now continue this process, canceling out higher and higher order error terms and getting more and more accurate results. In general, if $R_{i,m}$ is an estimate calculated at the $i$th round of the doubling procedure and accurate to order $h^{2m-1}$, with an error of order $h^{2m}$, then

$$ I = R_{i,m} + c_m h_i^{2m} + O(h_i^{2m+2}), \tag{5.44} $$

$$ I = R_{i-1,m} + c_m h_{i-1}^{2m} + O(h_{i-1}^{2m+2}) = R_{i-1,m} + 4^m c_m h_i^{2m} + O(h_i^{2m+2}). \tag{5.45} $$

Eliminating the term in $h_i^{2m}$ then gives us

$$ I = R_{i,m+1} + O(h_i^{2m+2}), \tag{5.46} $$

where

$$ R_{i,m+1} = R_{i,m} + \frac{1}{4^m - 1} (R_{i,m} - R_{i-1,m}), \tag{5.47} $$

which is accurate to order $h^{2m+1}$ with an error of order $h^{2m+2}$.
To make use of these results in practice we do the following:
1. We calculate our first two estimates of the integral using the regular trapezoidal rule: $I_1 \equiv R_{1,1}$ and $I_2 \equiv R_{2,1}$.
2. From these we calculate the more accurate estimate $R_{2,2}$ using Eq. (5.47). This is as much as we can do with only the two starting estimates.
3. Now we calculate the next trapezoidal rule estimate $I_3 \equiv R_{3,1}$ and from this, with Eq. (5.47), we calculate $R_{3,2}$, and then $R_{3,3}$.
4. At each successive stage we compute one more trapezoidal rule estimate $I_i \equiv R_{i,1}$, and from it, with very little extra effort, we can calculate $R_{i,2}, \ldots, R_{i,i}$.
Perhaps a picture will help make things clearer. This diagram shows which
values Ri,m are needed to calculate further Rs:

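$$
\begin{array}{cccccccc}
I_1 \equiv R_{1,1} \\
 & \searrow \\
I_2 \equiv R_{2,1} & \to & R_{2,2} \\
 & \searrow & & \searrow \\
I_3 \equiv R_{3,1} & \to & R_{3,2} & \to & R_{3,3} \\
 & \searrow & & \searrow & & \searrow \\
I_4 \equiv R_{4,1} & \to & R_{4,2} & \to & R_{4,3} & \to & R_{4,4} \\
 & \searrow & & \searrow & & \searrow & & \searrow
\end{array}
$$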

Each row here lists one trapezoidal rule estimate Ii followed by the other
higher-order estimates it allows us to make. The arrows show which previous
estimates go into the calculation of each new one via Eq. (5.47).


Note how each fundamental trapezoidal rule estimate $I_i$ allows us to go one step further with calculating the $R_{i,m}$. The most accurate estimate we get from the whole process is the very last one: if we do $n$ levels of the process, then the last estimate is $R_{n,n}$, which has an error of order $h_n^{2n}$.
This procedure is called Romberg integration. It’s essentially an “add-on”
to our earlier adaptive trapezoidal rule scheme: all the tough work is done in
the trapezoidal rule calculations and the Romberg integration takes almost no
extra computer time, although it does involve extra programming. The payoff
is a value for the integral that is accurate to much higher order in h than the
simple trapezoidal rule value (or even than Simpson’s rule).
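The Romberg procedure above can be sketched as follows. The triangular table is built row by row, using Eq. (5.30) for the trapezoidal column and Eq. (5.47) for the remaining entries. The function name `romberg`, the fixed number of levels (used here instead of a stopping criterion), and the test integrand $e^x$ on the interval from 0 to 1 are assumptions made for illustration. Python counts rows and columns from 0, whereas the text counts from 1.

```python
from math import exp

def romberg(f, a, b, levels):
    """Return the Romberg table as a list of rows.

    Row i (counting from 0) uses 2**i slices. Entry m of a row (counting
    from 0) holds the text's R_{i+1,m+1}; entry 0 is the trapezoidal estimate.
    """
    N = 1
    h = b - a
    I = 0.5 * h * (f(a) + f(b))               # trapezoidal rule with one slice
    table = [[I]]
    for i in range(1, levels):
        N *= 2
        h /= 2
        # Eq. (5.30): keep half the old estimate, add only the new points
        I = 0.5 * I + h * sum(f(a + (2 * k - 1) * h) for k in range(1, N // 2 + 1))
        row = [I]
        for m in range(1, i + 1):
            # Eq. (5.47): combine the previous entry in this row with the
            # entry in the same column of the previous row
            same_row = row[m - 1]
            prev_row = table[i - 1][m - 1]
            row.append(same_row + (same_row - prev_row) / (4 ** m - 1))
        table.append(row)
    return table

table = romberg(exp, 0.0, 1.0, 5)
for row in table:
    print("  ".join(f"{value:.12f}" for value in row))
```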

Exercise 5.7: Consider the integral

$$ I = \int_0^1 \sin^2 \sqrt{100x}\,dx $$

a) Write a program that uses the adaptive trapezoidal rule method of Section 5.3
and Eq. (5.30) to calculate the value of this integral to an approximate accuracy
of $\epsilon = 10^{-6}$ (i.e., correct to six digits after the decimal point). Start with one single
integration slice and work up from there to two, four, eight, and so forth. Have
your program print out the number of slices, its estimate of the integral, and its
estimate of the error on the integral, for each value of the number of slices N,
until the target accuracy is reached. (Hint: You should find the result is around
I = 0.45.)
b) Now modify your program to evaluate the same integral using the Romberg in-
tegration technique described in this section. Have your program print out a
triangular table of values, as on page 157, of all the Romberg estimates of the
integral.

Exercise 5.8: Write a program that uses the adaptive Simpson’s rule method of Section 5.3 and Eqs. (5.31) to (5.35) to calculate the same integral as in Exercise 5.7, again to an approximate accuracy of $\epsilon = 10^{-6}$. Starting this time with two integration slices, work up from there to four, eight, and so forth, printing out the results at each step until the required accuracy is reached. You should find you reach that accuracy for a significantly smaller number of slices than with the trapezoidal rule calculation in part (a) of Exercise 5.7.


5.5 HIGHER-ORDER INTEGRATION METHODS

As we have seen, the trapezoidal rule is based on approximating an integrand $f(x)$ with straight-line segments, while Simpson’s rule uses quadratics. We can create higher-order (and hence potentially more accurate) rules by using higher-order polynomials, fitting $f(x)$ with cubics, quartics, and so forth. The general form of the trapezoidal and Simpson rules is

$$ \int_a^b f(x)\,dx \simeq \sum_{k=1}^{N} w_k f(x_k), \tag{5.48} $$

where the $x_k$ are the positions of the sample points at which we calculate the integrand and the $w_k$ are some set of weights. In the trapezoidal rule, Eq. (5.3), the first and last weights are $\tfrac12 h$ and the others are all $h$, where $h$ is the slice width. In Simpson’s rule the weights are $\tfrac13 h$ for the first and last sample points and alternate between $\tfrac43 h$ and $\tfrac23 h$ for the other points; see Eq. (5.9).
For higher-order rules the basic form is the same: after fitting to the appro-
priate polynomial and integrating we end up with a set of weights that multi-
ply the values f ( xk ) of the integrand at evenly spaced sample points. Here are
the weights up to quartic order:

Degree                  Polynomial      Coefficients (weights in units of h)
1 (trapezoidal rule)    Straight line   1/2, 1, 1, ..., 1, 1/2
2 (Simpson’s rule)      Quadratic       1/3, 4/3, 2/3, 4/3, ..., 4/3, 1/3
3                       Cubic           3/8, 9/8, 9/8, 3/4, 9/8, 9/8, 3/4, ..., 9/8, 3/8
4                       Quartic         14/45, 64/45, 8/15, 64/45, 28/45, 64/45, 8/15, 64/45, ..., 64/45, 14/45

Higher-order integration rules of this kind are called Newton–Cotes formulas and in principle they can be extended to any order we like.
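The composite weights in the table come from laying single-panel rules end to end. A point shared by two neighboring panels therefore collects a contribution from each. The sketch below rebuilds the table’s patterns that way, using exact fractions. The single-panel coefficients are read off the table; for the quartic rule, for example, they are 14/45, 64/45, 8/15, 64/45, 14/45. The function names are our own, and the number of slices $N$ is assumed to be a multiple of the polynomial degree.

```python
from fractions import Fraction as F

# Coefficients (in units of h) for a single panel of each rule, from the table
PANELS = {
    1: [F(1, 2), F(1, 2)],
    2: [F(1, 3), F(4, 3), F(1, 3)],
    3: [F(3, 8), F(9, 8), F(9, 8), F(3, 8)],
    4: [F(14, 45), F(64, 45), F(8, 15), F(64, 45), F(14, 45)],
}

def newton_cotes_weights(degree, N):
    """Composite Newton-Cotes coefficients for N slices (N a multiple of degree)."""
    if N % degree != 0:
        raise ValueError("N must be a multiple of the polynomial degree")
    c = [F(0)] * (N + 1)
    for start in range(0, N, degree):      # add one panel at a time
        for j, p in enumerate(PANELS[degree]):
            c[start + j] += p              # shared end points get two contributions
    return c

def integrate(f, a, b, degree, N):
    """Eq. (5.48) with evenly spaced points and weights h times the coefficients."""
    h = (b - a) / N
    c = newton_cotes_weights(degree, N)
    return h * sum(float(ck) * f(a + k * h) for k, ck in enumerate(c))

print([str(c) for c in newton_cotes_weights(4, 8)])
# ['14/45', '64/45', '8/15', '64/45', '28/45', '64/45', '8/15', '64/45', '14/45']
print(integrate(lambda x: x**3, 0.0, 2.0, 3, 6))   # cubic rule, exact value 4
```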
However, we can do better still. The point to notice is that the trapezoidal
rule is exact if the function being integrated is actually a straight line, because
then the straight-line approximation isn’t an approximation at all. Similarly,
Simpson’s rule is exact if the function being integrated is a quadratic, and the
kth Newton–Cotes rule is exact if the function being integrated is a degree-k
polynomial.
But if we have N sample points, then presumably that means we could just
fit one ( N − 1)th-order polynomial to the whole integration interval, and get
an integration method that is exact for ( N − 1)th order polynomials—and for
any lower-order polynomials as well. (Note that it’s N − 1 because you need
three points to fit a quadratic, four for a cubic, and so forth.)


But we can do even better than this. We have been assuming here that the
sample points are evenly spaced. This has some significant advantages. Meth-
ods with evenly spaced points are relatively simple to program, and it’s easy to
increase the number of points by adding new points half way between the old
ones, as we saw in Section 5.3. However, it is also possible to derive integra-
tion methods that use unevenly spaced points and, while they lack some of the
advantages above, they have others of their own. In particular, they can give
very accurate answers with only small numbers of points, making them par-
ticularly suitable for cases where we need to do integrals very fast, or where
evaluation of the integrand itself takes a long time.
Suppose then that we broaden our outlook to include rules of the form of Eq. (5.48), but where we are allowed to vary not only the weights $w_k$ but also the positions $x_k$ of the sample points. Any choice of positions is allowed, including ones that are not equally spaced. As we have said, it is possible to create an integration method accurate to $(N-1)$th order with $N$ equally spaced points. Varying the positions of the points gives us $N$ extra degrees of freedom. This suggests that it might then be possible to create an integration rule that is exact for polynomials up to order $2N - 1$ if all of those degrees of freedom are chosen correctly. For large values of $N$ this could give us the power to fit functions very accurately indeed, and hence to do very accurate integrals. It turns out that this is indeed possible, and the resulting developments lead to the superbly accurate integration method known as Gaussian quadrature, which we describe in the next section.

5.6 GAUSSIAN QUADRATURE

The derivation of the Gaussian quadrature method has two parts. First, we will
see how to derive integration rules with nonuniform sample points xk . Then
we will choose the particular set of nonuniform points that give the optimal
integration rule.

5.6.1 NONUNIFORM SAMPLE POINTS

Suppose we are given a nonuniform set of $N$ points $x_k$ and we wish to create an integration rule of the form (5.48) that calculates integrals over a given interval from $a$ to $b$, based only on the values $f(x_k)$ of the integrand at those points. In other words, we want to choose weights $w_k$ so that Eq. (5.48) works for general $f(x)$. To do this, we will fit a single polynomial through the values $f(x_k)$ and then integrate that polynomial from $a$ to $b$ to calculate an approximation to the true integral. To fit $N$ points we need to use a polynomial of order $N - 1$. The fitting can be done using the method of interpolating polynomials.
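Consider the following quantity:

$$
\begin{aligned}
\phi_k(x) &= \prod_{\substack{m=1\ldots N \\ m \neq k}} \frac{x - x_m}{x_k - x_m} \\
&= \frac{x - x_1}{x_k - x_1} \times \cdots \times \frac{x - x_{k-1}}{x_k - x_{k-1}} \times \frac{x - x_{k+1}}{x_k - x_{k+1}} \times \cdots \times \frac{x - x_N}{x_k - x_N},
\end{aligned}
\tag{5.49}
$$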

which is called an interpolating polynomial. Note that the numerator contains one factor for each sample point except the point $x_k$. Thus $\phi_k(x)$ is a polynomial in $x$ of degree $N - 1$. For values of $k$ from 1 to $N$, Eq. (5.49) defines $N$ different such polynomials.
You can confirm for yourself that if we evaluate $\phi_k(x)$ at one of the sample points $x = x_m$ we get

$$ \phi_k(x_m) = \begin{cases} 1 & \text{if } m = k, \\ 0 & \text{if } m \neq k, \end{cases} \tag{5.50} $$

or, to be more concise,

$$ \phi_k(x_m) = \delta_{km}, \tag{5.51} $$

where $\delta_{km}$ is the Kronecker delta, the quantity that is 1 when $k = m$ and zero otherwise.
So now consider the following expression:

$$ \Phi(x) = \sum_{k=1}^{N} f(x_k)\, \phi_k(x). \tag{5.52} $$

Since it is a linear combination of polynomials of degree $N - 1$, this entire quantity is also a polynomial of degree $N - 1$. And if we evaluate it at any one of the sample points $x = x_m$ we get

$$ \Phi(x_m) = \sum_{k=1}^{N} f(x_k)\, \phi_k(x_m) = \sum_{k=1}^{N} f(x_k)\, \delta_{km} = f(x_m), \tag{5.53} $$

where we have used Eq. (5.51).


In other words $\Phi(x)$ is a polynomial of degree $N - 1$ that fits the integrand $f(x)$ at all of the sample points. This is exactly the quantity we were looking for to create our integration rule. Moreover, the polynomial of degree $N - 1$ that fits a given $N$ points is unique: it has $N$ free coefficients and our points give us $N$ constraints, so the coefficients are completely determined. Hence $\Phi(x)$ is not merely a polynomial that fits our points, it is the polynomial. There are no others.
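The following sketch evaluates the interpolating polynomials of Eq. (5.49) directly and checks the Kronecker delta property of Eq. (5.51). It also confirms that $\Phi(x)$ of Eq. (5.52) passes through the integrand at the sample points. The unevenly spaced sample points and the cubic test function are arbitrary choices for illustration. Because the interpolating polynomial through four points is unique, $\Phi$ reproduces a cubic $f$ everywhere, not just at the sample points.

```python
def phi(k, x, points):
    """Interpolating polynomial phi_k(x) of Eq. (5.49)."""
    result = 1.0
    for m, xm in enumerate(points):
        if m != k:                          # one factor for every point except x_k
            result *= (x - xm) / (points[k] - xm)
    return result

def Phi(x, points, f):
    """Phi(x) of Eq. (5.52): the sum of f(x_k) * phi_k(x)."""
    return sum(f(xk) * phi(k, x, points) for k, xk in enumerate(points))

def f(x):
    """An arbitrary cubic test function."""
    return 2 * x**3 - x + 0.5

points = [-1.0, -0.3, 0.5, 1.0]             # unevenly spaced sample points

# Eq. (5.51): each row should be a row of the identity matrix
for k in range(len(points)):
    print([round(phi(k, xm, points), 12) for xm in points])

# Phi agrees with the cubic f even away from the sample points
print(Phi(0.1234, points, f) - f(0.1234))
```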
To calculate an approximation to our integral, all we have to do now is integrate $\Phi(x)$ from $a$ to $b$ thus:

$$
\begin{aligned}
\int_a^b f(x)\,dx &\simeq \int_a^b \Phi(x)\,dx = \int_a^b \sum_{k=1}^{N} f(x_k)\,\phi_k(x)\,dx \\
&= \sum_{k=1}^{N} f(x_k) \int_a^b \phi_k(x)\,dx,
\end{aligned}
\tag{5.54}
$$

where we have interchanged the order of the sum and integral in the second line. Comparing this expression with Eq. (5.48) we now see that the weights we need for our integration rule are given by

$$ w_k = \int_a^b \phi_k(x)\,dx. \tag{5.55} $$

In other words we have found a general method for creating an integration rule
of the form (5.48) for any set of sample points xk : we simply set the weights wk
equal to the integrals of the interpolating polynomials, Eq. (5.49), over the do-
main of integration.
There is no general closed-form formula for the integrals of the interpolat-
ing polynomials.3 In some special cases it is possible to perform the integrals
exactly, but often it is not, in which case we may have to perform them on the
computer, using one of our other integration methods, such as Simpson’s rule
or Romberg integration. This may seem to defeat the point of our calculation,
which was to find an integration method that didn’t rely on uniformly spaced
sample points, and here we are using Simpson’s rule, which has uniformly
spaced points! But in fact the exercise is not as self-defeating as it may appear.
The important point to notice is that we only have to calculate the weights wk
once, and then we can use them in Eq. (5.48) to integrate as many different
functions over the given integration domain as we like. So we may have to put
some effort into the calculation of the weights, using, say, Simpson’s rule with very many slices to get as accurate an answer as possible. But we only have to do it once, and thereafter other integrals can be done rapidly and accurately using Eq. (5.48).

³ One can in principle expand Eq. (5.49) and then integrate the resulting expression term by term, since powers of $x$ can be integrated in closed form. However, the result would be a sum of $2^{N-1}$ different terms, which would be intractable even for the fastest computers, for relatively modest values of $N$.
In fact, it’s better than this. Once one has calculated the weights for a par-
ticular set of sample points and domain of integration, it’s possible to map
those weights and points onto any other domain and get an integration rule
of the form (5.48) without having to recalculate the weights. Typically one
gives sample points and weights arranged in a standard interval, which for
historical reasons is usually taken to be the interval from x = −1 to x = +1.
Thus to specify an integration rule one gives a set of sample points in the range $-1 \le x_k \le 1$ and a set of weights

$$ w_k = \int_{-1}^{1} \phi_k(x)\,dx. \tag{5.56} $$
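As a sketch of the strategy described above, the code below computes the weights of Eq. (5.56) for an arbitrary, unevenly spaced set of four sample points. It integrates each $\phi_k(x)$ numerically with Simpson’s rule and then reuses the weights as the rule of Eq. (5.48). The points and the number of Simpson slices are illustrative assumptions. Each $\phi_k$ here is a cubic, and Simpson’s rule integrates cubics exactly, so the weights come out exact up to rounding. The resulting rule therefore integrates $x^2$ and $x^3$ over the standard interval exactly.

```python
def phi(k, x, points):
    """Interpolating polynomial phi_k(x) of Eq. (5.49)."""
    result = 1.0
    for m, xm in enumerate(points):
        if m != k:
            result *= (x - xm) / (points[k] - xm)
    return result

def simpson(g, a, b, N=1000):
    """Simpson's rule with N (even) slices."""
    h = (b - a) / N
    s = g(a) + g(b)
    s += 4 * sum(g(a + j * h) for j in range(1, N, 2))
    s += 2 * sum(g(a + j * h) for j in range(2, N, 2))
    return h * s / 3

points = [-1.0, -0.3, 0.5, 1.0]          # unevenly spaced points on [-1, 1]

# Eq. (5.56): the expensive step, done only once
weights = [simpson(lambda x, k=k: phi(k, x, points), -1.0, 1.0)
           for k in range(len(points))]

def rule(f):
    """Eq. (5.48): reuse the stored points and weights for any integrand."""
    return sum(w * f(x) for w, x in zip(weights, points))

print(weights)
print(rule(lambda x: x**2))    # exact value 2/3
print(rule(lambda x: x**3))    # exact value 0
```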

If we want to use any integration domain other than the one from −1 to +1, we
map these values to that other domain. Since the area under a curve doesn’t
depend on where that curve is along the x line, the sample points can be slid up
and down the x line en masse and the integration rule will still work fine. If the
desired domain is wider or narrower than the interval from −1 to +1 then we
also need to spread the points out or squeeze them together. The correct rule
for mapping the points to a general domain that runs from x = a to x = b is:

$$ x_k' = \tfrac12 (b - a) x_k + \tfrac12 (b + a). \tag{5.57} $$

Similarly the weights do not change if we are simply sliding the sample points
up or down the x line, but if the width of the integration domain changes then
the value of the integral will increase or decrease by a corresponding factor,
and hence the weights have to be rescaled thus:

$$ w_k' = \tfrac12 (b - a) w_k. \tag{5.58} $$

Once we have calculated the rescaled positions and weights then the integral
itself is given by
$$ \int_a^b f(x)\,dx \simeq \sum_{k=1}^{N} w_k' f(x_k'). \tag{5.59} $$
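Here is a minimal sketch of the mapping in Eqs. (5.57) to (5.59). For illustration it uses the two-slice Simpson’s rule on the standard interval, whose points are −1, 0, 1 with weights ⅓, 4/3, ⅓, rather than a Gaussian rule. The function name `map_rule` is our own. Mapped to the interval from 0 to 2, the rule still integrates $x^2$ exactly, giving 8/3.

```python
def map_rule(points, weights, a, b):
    """Map a rule on [-1, 1] to [a, b] using Eqs. (5.57) and (5.58)."""
    xp = [0.5 * (b - a) * x + 0.5 * (b + a) for x in points]   # Eq. (5.57)
    wp = [0.5 * (b - a) * w for w in weights]                  # Eq. (5.58)
    return xp, wp

# Simpson's rule with two slices on the standard interval
points = [-1.0, 0.0, 1.0]
weights = [1/3, 4/3, 1/3]

xp, wp = map_rule(points, weights, 0.0, 2.0)
integral = sum(w * x**2 for x, w in zip(xp, wp))               # Eq. (5.59)
print(xp, wp, integral)
```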

5.6.2 SAMPLE POINTS FOR GAUSSIAN QUADRATURE

The developments of the previous section solve half our problem. Given the
positions of the sample points xk they tell us how to choose the weights wk , but


we still need to choose the sample points. As we argued in Section 5.5, in the
best case it should be possible to choose our N points so that our integration
rule is exact for all polynomial integrands up to and including order 2N − 1.
The proof that this is indeed possible, and the accompanying derivation of the
positions, is not difficult, but it is quite long and it’s not really important for
our purposes. If you want to see it, it’s given in Appendix C on page 466. Here
we’ll just look at the results, which definitely are important and useful.
The bottom line is this: to get an integration rule accurate to the highest possible order of $2N - 1$, the sample points $x_k$ should be chosen to coincide with the zeros of the $N$th Legendre polynomial $P_N(x)$, rescaled if necessary to the window of integration using Eq. (5.57), and the corresponding weights $w_k$ are

$$ w_k = \biggl[ \frac{2}{1 - x^2} \biggl( \frac{dP_N}{dx} \biggr)^{-2} \biggr]_{x = x_k}, \tag{5.60} $$

also rescaled if necessary, using Eq. (5.58).


This method is called Gaussian quadrature4 and although it might sound
rather formidable from the description above, in practice it’s beautifully sim-
ple: given the values xk and wk for your chosen N, all you have to do is rescale
them if necessary using Eqs. (5.57) and (5.58) and then perform the sum in
Eq. (5.59).
The only catch is finding the values in the first place. In principle the results
quoted above tell us everything we need to know but in practice the zeros of
the Legendre polynomials are not trivial to compute. Tables containing values
of xk and wk up to about N = 20 can be found in books or on-line,5 or they can
be calculated for any N using a suitable computer program. Python functions
to perform the calculation are given in Appendix E and also in the on-line
resources in the file gaussxw.py—Example 5.2 below shows how to use them.
Figure 5.4 shows what the sample points and weights look like for the cases
N = 10 and N = 100. Note how the points get closer together at the edges
while at the same time the weights get smaller.
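The following is a minimal sketch of such a program. It is not the book’s gaussxw.py, only an illustration of the same idea using standard facts about Legendre polynomials. Each zero of $P_N$ is located with Newton’s method, starting from the approximation $x \approx \cos\bigl(\pi(4k-1)/(4N+2)\bigr)$. $P_N$ itself is evaluated with the three-term recurrence $(n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}$, and its derivative with the identity $(1-x^2)P_N' = N(P_{N-1} - xP_N)$. The weights then follow from Eq. (5.60). The function names are our own.

```python
from math import cos, pi

def legendre(N, x):
    """Return (P_N(x), P_{N-1}(x)) from the three-term recurrence."""
    p_prev, p = 1.0, x                          # P_0 and P_1
    for n in range(1, N):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p, p_prev

def legendre_and_derivative(N, x):
    """Return P_N(x) and dP_N/dx, using (1 - x^2) P_N' = N (P_{N-1} - x P_N)."""
    p, p_prev = legendre(N, x)
    return p, N * (p_prev - x * p) / (1 - x * x)

def gauss_legendre(N, tol=1e-14, max_iter=100):
    """Sample points and weights for N-point Gaussian quadrature on [-1, 1]."""
    points, weights = [], []
    for k in range(1, N + 1):
        x = cos(pi * (4 * k - 1) / (4 * N + 2))   # approximate kth zero of P_N
        for _ in range(max_iter):                  # refine it with Newton's method
            p, dp = legendre_and_derivative(N, x)
            dx = p / dp
            x -= dx
            if abs(dx) < tol:
                break
        p, dp = legendre_and_derivative(N, x)
        points.append(x)
        weights.append(2 / ((1 - x * x) * dp * dp))   # Eq. (5.60)
    return points, weights

x, w = gauss_legendre(3)
print(x)   # approximately [0.7746, 0.0, -0.7746]
print(w)   # approximately [5/9, 8/9, 5/9]
```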

⁴ It’s called “Gaussian” because it was pioneered by the legendary mathematician Carl Friedrich Gauss. “Quadrature” is an old (19th century) name for numerical integration. Gauss’s work dates from long before the invention of computers, a time when people did numerical integrals by hand, meaning they were very concerned about getting the best answers when N is small. When you’re doing calculations by hand, Simpson’s rule with N = 1000 is not an option.

⁵ See for example Abramowitz, M. and Stegun, I. A., eds., Handbook of Mathematical Functions, Dover Publishing, New York (1974).

[Figure 5.4: Sample points and weights for Gaussian quadrature. The positions and heights of the bars represent the sample points and their associated weights for Gaussian quadrature with (a) N = 10 and (b) N = 100. Both panels plot weight w against position x from −1 to 1.]

EXAMPLE 5.2: GAUSSIAN INTEGRAL OF A SIMPLE FUNCTION


Consider again the integral we did in Example 5.1, $\int_0^2 (x^4 - 2x + 1)\,dx$, whose
true value, as we saw, is 4.4. Here’s a program to evaluate the same integral
using Gaussian quadrature. Just to emphasize the impressive power of the
method, we will perform the calculation with only N = 3 sample points:

# File: gaussint.py
from gaussxw import gaussxw

def f(x):
    return x**4 - 2*x + 1

N = 3
a = 0.0
b = 2.0

# Calculate the sample points and weights, then map them
# to the required integration domain
x,w = gaussxw(N)
xp = 0.5*(b-a)*x + 0.5*(b+a)
wp = 0.5*(b-a)*w

C HAPTER 5 | I NTEGRALS AND DERIVATIVES

# Perform the integration
s = 0.0
for k in range(N):
    s += wp[k]*f(xp[k])

print(s)

For this program to work you must have a copy of the file gaussxw.py in the
same folder as the program itself.
Note how the function gaussxw(N) returns two variables, not just one. We
discussed functions of this type in Section 2.6, but this is the first time we’ve
seen one in use. In this case the variables are arrays, x and w, containing the
sample points and weights for Gaussian quadrature on N points over the stan-
dard interval from −1 to +1. Notice also how we mapped the points and the
weights from the standard interval to our desired integration domain: we have
used Python’s ability to perform calculations with entire arrays to achieve the
mapping in just two lines.
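If gaussxw.py is not to hand, NumPy's `numpy.polynomial.legendre.leggauss(N)` also returns Gauss–Legendre sample points and weights on the standard interval from −1 to +1, so it can stand in for `gaussxw(N)`. The sketch below is our substitution, not the book's program: it repeats the mapping above and, going one step further, replaces the explicit loop over k with a single array operation.

```python
import numpy as np

def f(x):
    return x**4 - 2*x + 1

N, a, b = 3, 0.0, 2.0

# leggauss stands in for gaussxw: points and weights on [-1, 1]
x, w = np.polynomial.legendre.leggauss(N)

# Linear map from [-1, 1] to [a, b]; the weights scale by (b - a)/2
xp = 0.5*(b - a)*x + 0.5*(b + a)
wp = 0.5*(b - a)*w

# The whole weighted sum as one array operation
s = np.sum(wp*f(xp))
print(s)    # 4.4 to within rounding error
```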
There is also an alternative function gaussxwab(N,a,b) that calculates the
positions and weights and then does the mapping for you. To use this function,
we would say “from gaussxw import gaussxwab”, then

x,w = gaussxwab(N,a,b)
s = 0.0
for k in range(N):
    s += w[k]*f(x[k])

It’s worth noting that the calculation of the sample points and weights takes
quite a lot of work—the functions above may take a second or so to complete
the calculation. That’s fine if you call them only once in your program, but
you should avoid calling them many times or you may find your program
runs slowly. Thus, for instance, if you need to do many integrals over different
domains of integration, you should call the function gaussxw once to calculate
the sample points over the standard interval from −1 to +1 and then map the
points yourself to the other integration domains you need. Calling gaussxwab
separately for each different integration domain would be slow and waste-
ful, since it would needlessly recalculate the zeros of the Legendre polynomial
each time.
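The following sketch illustrates that advice under our own assumptions. The standard points and weights are computed once (NumPy's `leggauss` standing in for `gaussxw`), and a hypothetical helper `integrate` maps them to each new domain from 0 to b. The antiderivative $F(x) = x^5/5 - x^2 + x$ of the integrand from Example 5.2 is used only to check the results.

```python
import numpy as np

def f(x):
    return x**4 - 2*x + 1

def F(x):
    # Antiderivative of f, used only to check the answers
    return x**5/5 - x**2 + x

N = 3
# Compute the standard points and weights once
x, w = np.polynomial.legendre.leggauss(N)

def integrate(func, a, b):
    """Map the precomputed [-1, 1] points and weights to [a, b] and sum."""
    xp = 0.5*(b - a)*x + 0.5*(b + a)
    wp = 0.5*(b - a)*w
    return np.sum(wp*func(xp))

results = []
for b in [0.5, 1.0, 1.5, 2.0]:
    approx = integrate(f, 0.0, b)
    exact = F(b) - F(0.0)
    results.append((b, approx, exact))
    print(b, approx, exact)
```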
Our Gaussian quadrature program is quite simple—only a little more com-
plicated than the program for the trapezoidal rule in Example 5.1. Yet when
we run it, it prints the following:


4.4

The program has calculated the answer exactly, with just three sample points!
This is not a mistake, or luck, or a coincidence. It’s exactly what we expect.
Gaussian integration on N points gives exact answers for the integrals of poly-
nomial functions up to and including order 2N − 1, which means up to fifth
order when N = 3. The function $x^4 - 2x + 1$ that we are integrating here is a
fourth-order polynomial, so we expect the method to return the exact answer
of 4.4, and indeed it does. Nonetheless, the performance of the program does
seem almost magical in this case: the program has evaluated the integrand
at just three points and from those three values alone it is, amazingly, able to
deduce the integral of the entire function exactly.
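The exactness claim is easy to check numerically. This sketch, with NumPy's `leggauss` standing in for `gaussxw`, integrates $x^n$ over the standard interval from −1 to +1, where the exact answer is $2/(n+1)$ for even n and 0 for odd n, using N = 3 points. The rule should be exact, up to rounding, through n = 2N − 1 = 5, and noticeably wrong at n = 6.

```python
import numpy as np

N = 3
x, w = np.polynomial.legendre.leggauss(N)   # points and weights on [-1, 1]

errors = {}
for n in range(0, 2*N + 1):
    exact = 2.0/(n + 1) if n % 2 == 0 else 0.0    # integral of x^n over [-1, 1]
    approx = np.sum(w*x**n)
    errors[n] = abs(approx - exact)
    print(n, approx, exact)
```

For n = 6 the three-point rule gives approximately 0.24 instead of 2/7 ≈ 0.2857.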
This is the strength of Gaussian quadrature: it can give remarkably accurate
answers, even with small numbers of sample points. This makes it especially
useful in situations where you cannot afford to use large numbers of points, ei-
ther because you need to be able to calculate an answer very quickly or because
evaluating your integrand takes a long time even for just a few points.
The method does have its disadvantages. In particular, because the sample
points are not uniformly distributed, it takes more work if we want to employ
the trick of repeatedly doubling N, as we did in Section 5.3, to successively
improve the accuracy of the integral: if we change the value of N then all
the sample points and weights have to be recalculated, and the entire sum
over points, Eq. (5.48), has to be redone. We cannot reuse the calculations
for old sample points as we did with the trapezoidal rule; in the language of
computational physics we would say that the sample points are not nested.⁶
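The difference in nesting can be checked directly. In this sketch the helpers `trapezoid_points` and `shared_fraction` are hypothetical, and NumPy's `leggauss` stands in for `gaussxw`. We ask what fraction of the sample points used with N slices or samples reappear when N is doubled: for the trapezoidal rule it is all of them, while for Gaussian quadrature with N = 10 and N = 20 it is none.

```python
import numpy as np

def trapezoid_points(N, a=-1.0, b=1.0):
    """Sample points used by the trapezoidal rule with N slices on [a, b]."""
    return np.linspace(a, b, N + 1)

def shared_fraction(old, new, tol=1e-12):
    """Fraction of the points in `old` that also appear (to within tol) in `new`."""
    hits = [np.min(np.abs(new - p)) < tol for p in old]
    return float(np.mean(hits))

N = 10
gauss_old, _ = np.polynomial.legendre.leggauss(N)
gauss_new, _ = np.polynomial.legendre.leggauss(2*N)

print(shared_fraction(trapezoid_points(N), trapezoid_points(2*N)))   # 1.0: nested
print(shared_fraction(gauss_old, gauss_new))                          # 0.0: not nested
```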

⁶ There are other methods, such as Gauss–Kronrod quadrature and Clenshaw–Curtis quadrature,
which have nonuniformly distributed sample points and still permit nesting, although these meth-
ods have their own disadvantages. Gauss–Kronrod quadrature permits only one step of nesting:
it provides two sets of integration points, one nested inside the other, but no way to generate sub-
sequent points nested inside those. Two sets of points are enough to make error estimates, via
a formula analogous to Eq. (5.26), but one cannot keep on doubling the number of points to re-
duce the error below a given target, as with the adaptive method of Section 5.3. Clenshaw–Curtis
quadrature does permit nesting over an arbitrary number of steps, but is not based on an integra-
tion rule of the simple form (5.48). Instead the method uses a more complicated formula whose
evaluation involves, among other steps, performing a Fourier transform, which is more computa-
tionally demanding, and hence slower, than the simple sum used in Gaussian quadrature. In ad-
dition, neither Gauss–Kronrod quadrature nor Clenshaw–Curtis quadrature achieves the level of
accuracy provided by Gaussian quadrature, although both are highly accurate and probably good
enough for most purposes. Gauss–Kronrod quadrature in particular is widely used in mathemati-
cal software to compute definite integrals, because of its ability to provide both good accuracy and
error estimates. Gauss–Kronrod quadrature is discussed further in Appendix C.


Exercise 5.9: Heat capacity of a solid


Debye’s theory of solids gives the heat capacity of a solid at temperature T to be

$$C_V = 9V\rho k_B \left(\frac{T}{\theta_D}\right)^3 \int_0^{\theta_D/T} \frac{x^4 e^x}{(e^x - 1)^2}\,dx,$$

where $V$ is the volume of the solid, $\rho$ is the number density of atoms, $k_B$ is Boltzmann’s constant, and $\theta_D$ is the so-called Debye temperature, a property of solids that depends on their density and speed of sound.
a) Write a Python function cv(T) that calculates $C_V$ for a given value of the temperature, for a sample consisting of 1000 cubic centimeters of solid aluminum, which has a number density of $\rho = 6.022 \times 10^{28}\,\mathrm{m}^{-3}$ and a Debye temperature of $\theta_D = 428$ K. Use Gaussian quadrature to evaluate the integral, with N = 50 sample points.
b) Use your function to make a graph of the heat capacity as a function of tempera-
ture from T = 5 K to T = 500 K.

Exercise 5.10: Period of an anharmonic oscillator


The simple harmonic oscillator crops up in many places. Its behavior can be studied
readily using analytic methods and it has the important property that its period of os-
cillation is a constant, independent of its amplitude, making it useful, for instance, for
keeping time in watches and clocks. Frequently in physics, however, we also come
across anharmonic oscillators, whose period varies with amplitude and whose behav-
ior cannot usually be calculated analytically.
A general classical oscillator can be thought of as a particle in a concave potential
well. When disturbed, the particle will rock back and forth in the well:

[Figure: a particle in a potential well V(x)]

The harmonic oscillator corresponds to a quadratic potential $V(x) \propto x^2$. Any other form gives an anharmonic oscillator. (Thus there are many different kinds of anharmonic oscillator, depending on the exact form of the potential.)


One way to calculate the motion of an oscillator is to write down the equation for
the conservation of energy in the system. If the particle has mass m and position x, then
the total energy is equal to the sum of the kinetic and potential energies thus:

$$E = \tfrac12 m\left(\frac{dx}{dt}\right)^2 + V(x).$$

Since the energy must be constant over time, this equation is effectively a (nonlinear)
differential equation linking x and t.
Let us assume that the potential $V(x)$ is symmetric about x = 0 and let us set our anharmonic oscillator going with amplitude a. That is, at t = 0 we release it from rest at position x = a and it swings back towards the origin. Then at t = 0 we have dx/dt = 0 and the equation above reads $E = V(a)$, which gives us the total energy of the particle in terms of the amplitude.
a) When the particle reaches the origin for the first time, it has gone through one quarter of a period of the oscillator. By rearranging the equation above for dx/dt and then integrating with respect to t from 0 to $\frac14 T$, show that the period T is given by

$$T = \sqrt{8m}\int_0^a \frac{dx}{\sqrt{V(a) - V(x)}}.$$

b) Suppose the potential is $V(x) = x^4$ and the mass of the particle is m = 1. Write a Python function that calculates the period of the oscillator for given amplitude a using Gaussian quadrature with N = 20 points, then use your function to make a graph of the period for amplitudes ranging from a = 0 to a = 2.
c) You should find that the oscillator gets faster as the amplitude increases, even
though the particle has further to travel for larger amplitude. And you should
find that the period diverges as the amplitude goes to zero. How do you explain
these results?

5.6.3 ERRORS ON GAUSSIAN QUADRATURE

In our study of the trapezoidal rule we derived an expression, the Euler–Maclaurin formula of Eq. (5.18), for the approximation error on the value of an integral. There exists a corresponding expression for Gaussian quadrature, but it is, unfortunately, ungainly and not easy to use in practice. What it does tell us, however, is that Gaussian quadrature is impressively accurate. Roughly speaking, the approximation error (the difference between the value of an integral calculated using Gaussian quadrature and the true value of the same integral, neglecting rounding error) is multiplied by a factor of about $c/N^2$ when we increase the number of samples by just one, where $c$ is a constant whose value depends on the detailed shape of the integrand and the size of the domain of integration. Thus, for instance, if we go from N = 10 to N = 11, our estimate of the integral will improve by a factor of order a hundred. This means that we converge extremely quickly on the true value of the integral, and in practice it is rarely necessary to use more than a few tens of points, or at most perhaps a hundred, to get an estimate of an integral accurate to the limits of precision of the computer.
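To see this rapid convergence in a concrete case, this sketch integrates $e^x$ from 0 to 1 (exact value $e - 1$) with N = 1 to 6 points and prints the error. The choice of integrand is ours, and NumPy's `leggauss` stands in for `gaussxw`. For this smooth integrand the error should fall by two to three orders of magnitude with each extra point, reaching the limits of double precision by about N = 6.

```python
import numpy as np
from math import e

def gauss(f, a, b, N):
    """N-point Gauss-Legendre estimate of the integral of f from a to b."""
    x, w = np.polynomial.legendre.leggauss(N)
    xp = 0.5*(b - a)*x + 0.5*(b + a)
    return 0.5*(b - a)*np.sum(w*f(xp))

exact = e - 1.0                      # integral of exp(x) from 0 to 1
errors = []
for N in range(1, 7):
    err = abs(gauss(np.exp, 0.0, 1.0, N) - exact)
    errors.append(err)
    print(N, err)
```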
There are some caveats. An important one is that the function being in-
tegrated must be reasonably smooth. When one is calculating an integral us-
ing a relatively small number of sample points, the points will inevitably be
far apart, which leaves room for the function to vary significantly between
them. Since Gaussian quadrature looks only at the values of the function at
the sample points and nowhere else, substantial variation between points is
not taken into account in calculating the value of the integral. If, on the other
hand, the function is relatively smooth, then the samples we take will give a
good approximation of the function’s behavior and Gaussian quadrature will
work well. Thus for rapidly varying functions one needs to use enough sam-
ple points to capture the variation, and in such cases larger values of N may
be warranted.
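The smoothness caveat can be illustrated by comparing a smooth integrand, $e^x$, with one that has a kink, $|x|$, both over the standard interval from −1 to +1 (exact values $e - e^{-1}$ and 1). This is our own example, with NumPy's `leggauss` in place of `gaussxw`. For $e^x$ the error reaches rounding level by about N = 20, while for $|x|$ it shrinks only slowly, by something like an inverse power of N, because the kink at x = 0 falls between sample points.

```python
import numpy as np

def gauss(f, a, b, N):
    """N-point Gauss-Legendre estimate of the integral of f from a to b."""
    x, w = np.polynomial.legendre.leggauss(N)
    xp = 0.5*(b - a)*x + 0.5*(b + a)
    return 0.5*(b - a)*np.sum(w*f(xp))

smooth_exact = np.exp(1.0) - np.exp(-1.0)   # integral of exp(x) over [-1, 1]
kink_exact = 1.0                            # integral of |x| over [-1, 1]

for N in (4, 10, 20, 40):
    err_smooth = abs(gauss(np.exp, -1.0, 1.0, N) - smooth_exact)
    err_kink = abs(gauss(np.abs, -1.0, 1.0, N) - kink_exact)
    print(f"N = {N:2d}   smooth error = {err_smooth:.1e}   kink error = {err_kink:.1e}")
```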
Another issue is that there is no direct equivalent of Eq. (5.26) for estimating the error in practice. As we have said, however, the error improves by a factor of $c/N^2$ when the number of samples is increased by one, which is typically a substantial improvement if N is reasonably large. And if we double the value of N then we compound many such improvements, giving an overall reduction in the error by a factor of something like $N^{-2N}$, which is typically a huge improvement.
If we make a Gaussian estimate $I_N$ of the true value $I$ of an integral using N sample points, then $I = I_N + \epsilon_N$, where $\epsilon_N$ is the approximation error. And if we double the number of samples to 2N, we have $I = I_{2N} + \epsilon_{2N}$. Equating the two expressions for $I$ and rearranging, we have

$$\epsilon_N - \epsilon_{2N} = I_{2N} - I_N. \tag{5.61}$$

But, as we have argued, the error is expected to improve by a large factor when we double the number of sample points, meaning that $\epsilon_{2N} \ll \epsilon_N$. So, to a good approximation,

$$\epsilon_N \simeq I_{2N} - I_N. \tag{5.62}$$

Another way of saying this is that $I_{2N}$ is so much better an estimate of the true value of the integral than $I_N$ that, for the purposes of estimating the error, we can treat it as if it were the true value, so that $I_{2N} - I_N$ is a good estimate of the error.
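Here is a small check of Eq. (5.62), using a test case of our own choosing, $\int_0^1 e^x\,dx = e - 1$, with NumPy's `leggauss` standing in for `gaussxw`. Because $I_{2N}$ is so much more accurate than $I_N$, the estimate $I_{2N} - I_N$ should agree closely with the true error of $I_N$, which we can compute here only because the exact answer is known.

```python
import numpy as np
from math import e

def gauss(f, a, b, N):
    """N-point Gauss-Legendre estimate of the integral of f from a to b."""
    x, w = np.polynomial.legendre.leggauss(N)
    xp = 0.5*(b - a)*x + 0.5*(b + a)
    return 0.5*(b - a)*np.sum(w*f(xp))

N = 3
I_N = gauss(np.exp, 0.0, 1.0, N)
I_2N = gauss(np.exp, 0.0, 1.0, 2*N)

estimated_error = I_2N - I_N          # Eq. (5.62)
true_error = (e - 1.0) - I_N          # available only because the exact value is known
print(estimated_error, true_error)
```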
We can use Eq. (5.62) in an adaptive integration method where we double
the number of sample points at each step, calculating the error and repeating
until the desired target accuracy is reached. Such a method is not entirely
satisfactory, for a couple of reasons. First, when we double the number of
sample points from N to 2N, Eq. (5.62) gives us only the error on the previous
estimate of the integral $I_N$, not on the new estimate $I_{2N}$. This means that we
always end up doubling N one more time than is strictly necessary to achieve
the desired accuracy, and that the final value for the integral will probably be
significantly more accurate than we really need it to be, which means we have
wasted time on unnecessary calculations. Second, we have to perform the
entire calculation of the integral anew for each new value of N. As mentioned
earlier, and unlike the adaptive trapezoidal method of Section 5.3, we cannot
reuse the results of earlier calculations to speed up the computation. So an
adaptive calculation of this type would be slower than just a single instance of
Gaussian quadrature. On the other hand, it’s straightforward to show that the
total number of terms in all the sums we perform, over all steps of the process,
is never greater than twice the final value of N used, which means that the
adaptive procedure costs us no more than about twice the effort required for
the simple Gaussian quadrature. Moreover, as we have said, we rarely need
to go beyond N = 100 to get a highly accurate answer, so the number of times
we double N is typically rather small. If we start with, say, N = 10, we will
probably only have to double three or four times. The net result is that, despite
the extra work, Gaussian quadrature is often more efficient than methods like
the trapezoidal rule or Simpson’s rule in terms of overall time needed to get an
answer to a desired degree of accuracy.
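The doubling procedure just described can be sketched as follows. The function `adaptive_gauss` and its parameters are hypothetical, and NumPy's `leggauss` stands in for `gaussxw`. The sketch also counts the total number of integrand evaluations over all steps; since the sizes form the series $N_0 + 2N_0 + \cdots + N_{\text{final}}$, the total is always less than $2N_{\text{final}}$, in line with the claim above. Note that the returned value is $I_{2N}$, even though the error estimate strictly applies to $I_N$.

```python
import numpy as np

def gauss(f, a, b, N):
    """N-point Gauss-Legendre estimate of the integral of f from a to b."""
    x, w = np.polynomial.legendre.leggauss(N)
    xp = 0.5*(b - a)*x + 0.5*(b + a)
    return 0.5*(b - a)*np.sum(w*f(xp))

def adaptive_gauss(f, a, b, target, N=10, N_max=2000):
    """Double N until |I_2N - I_N| < target, using Eq. (5.62) as the error estimate.

    Returns the final estimate, the final N, and the total number of
    integrand evaluations summed over all steps.
    """
    I_old = gauss(f, a, b, N)
    evaluations = N
    while True:
        N *= 2
        if N > N_max:
            raise RuntimeError("target accuracy not reached by N_max")
        I_new = gauss(f, a, b, N)
        evaluations += N
        if abs(I_new - I_old) < target:
            # The estimate applies to I_old; I_new is better still
            return I_new, N, evaluations
        I_old = I_new

I, N_final, evals = adaptive_gauss(np.exp, 0.0, 1.0, 1e-10)
print(I, N_final, evals)
```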
An alternative, though more complex, solution to the problem of estimating
the error in Gaussian quadrature is to use Gauss–Kronrod quadrature, a variant
of Gaussian quadrature based on the properties of Stieltjes polynomials, which
provides not only an accurate estimate of our integral (though not quite as
accurate as ordinary Gaussian quadrature) but also an estimate of the error.
We will not use Gauss–Kronrod quadrature in this book, but the interested
reader can find a short discussion, with some derivations, in Appendix C.


5.7 INTEGRALS OVER INFINITE RANGES


Often in physics we encounter integrals over infinite ranges, like $\int_0^\infty f(x)\,dx$.
The techniques we have seen so far don’t work for these integrals because
we’d need an infinite number of sample points to span an infinite range. The
solution to this problem is to change variables. For an integral over the range
from 0 to ∞ the standard change of variables is

$$z = \frac{x}{1+x} \quad\text{or equivalently}\quad x = \frac{z}{1-z}. \tag{5.63}$$

Then $dx = dz/(1-z)^2$ and

$$\int_0^\infty f(x)\,dx = \int_0^1 \frac{1}{(1-z)^2}\, f\!\left(\frac{z}{1-z}\right) dz, \tag{5.64}$$

which can be done using any of the techniques earlier in the chapter, including
the trapezoidal and Simpson rules, or Gaussian quadrature.
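As an illustration of Eq. (5.64), this sketch (our own example; `integrate_0_to_inf` is a hypothetical name and NumPy's `leggauss` replaces `gaussxw`) computes $\int_0^\infty dx/(1+x^2) = \pi/2$. After the change of variables the integrand becomes $1/[(1-z)^2 + z^2]$, which is smooth for $0 \le z \le 1$, so 50 Gaussian points should give essentially full double precision. Gaussian sample points never fall exactly on the endpoint z = 1, so the division by $1 - z$ is safe.

```python
import numpy as np

def integrate_0_to_inf(f, N=50):
    """Estimate the integral of f from 0 to infinity via Eq. (5.64).

    Substitutes x = z/(1 - z), dx = dz/(1 - z)^2, then applies
    N-point Gauss-Legendre quadrature on 0 <= z <= 1.
    """
    t, w = np.polynomial.legendre.leggauss(N)   # points on [-1, 1]
    z = 0.5*t + 0.5                             # mapped to [0, 1]
    wz = 0.5*w
    g = f(z/(1 - z))/(1 - z)**2                  # transformed integrand
    return np.sum(wz*g)

result = integrate_0_to_inf(lambda x: 1/(1 + x**2))
print(result, np.pi/2)
```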
This is not the only change of variables that we can use, however. In fact, a change of the form

$$z = \frac{x}{c+x} \tag{5.65}$$

would work for any value of c, or $z = x^\gamma/(1 + x^\gamma)$ for any γ, or any of a range of other possibilities. Some choices typically work better than others for particular integrals and sometimes you have to play around with things a little to find what works for a given problem, but Eq. (5.63) is often a good first guess. (See Exercise 5.1 for a counterexample.)
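The effect of the parameter c in Eq. (5.65) shows up clearly in a simple case of our own choosing, $\int_0^\infty e^{-x/\lambda}\,dx = \lambda$ with λ = 100. Inverting Eq. (5.65) gives $x = cz/(1-z)$ and $dx = c\,dz/(1-z)^2$. With c = 1 the transformed integrand piles up in a narrow spike close to z = 1, which 20 Gaussian points resolve poorly; with c = 100, comparable to the decay length λ, it is spread smoothly over the interval. The helper name is hypothetical and NumPy's `leggauss` stands in for `gaussxw`.

```python
import numpy as np

def integrate_0_to_inf_c(f, c, N=20):
    """Integral of f from 0 to infinity using z = x/(c + x), Eq. (5.65).

    Inverting gives x = c z/(1 - z), so dx = c dz/(1 - z)^2.
    """
    t, w = np.polynomial.legendre.leggauss(N)
    z = 0.5*t + 0.5                  # Gauss points mapped to [0, 1]
    wz = 0.5*w
    return np.sum(wz*c*f(c*z/(1 - z))/(1 - z)**2)

lam = 100.0

def f(x):
    return np.exp(-x/lam)            # exact integral from 0 to infinity is lam

err_c1 = abs(integrate_0_to_inf_c(f, 1.0) - lam)
err_c100 = abs(integrate_0_to_inf_c(f, 100.0) - lam)
print(err_c1, err_c100)
```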
To do an integral over a range from some nonzero value a to ∞ we can use a similar approach, but make two changes of variables, first to y = x − a, which shifts the start of the integration range to 0, and then z = y/(1 + y) as in Eq. (5.63). Or we can combine both changes into a single one:

$$z = \frac{x-a}{1+x-a} \quad\text{or}\quad x = \frac{z}{1-z} + a, \tag{5.66}$$

and again $dx = dz/(1-z)^2$, so that

$$\int_a^\infty f(x)\,dx = \int_0^1 \frac{1}{(1-z)^2}\, f\!\left(\frac{z}{1-z} + a\right) dz. \tag{5.67}$$

Integrals from −∞ to a can be done the same way—just substitute z → −z.
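Equation (5.67) translates directly into code. In this sketch (hypothetical helper name, NumPy's `leggauss` in place of `gaussxw`) the first test integrand, $1/x^2$ from 1 to ∞, is a neat special case: after the substitution the transformed integrand is exactly 1, so the answer 1 comes out correct to rounding error. The second, $e^{-x}$ from 2 to ∞, should reproduce $e^{-2}$ to high accuracy.

```python
import numpy as np

def integrate_a_to_inf(f, a, N=50):
    """Integral of f from a to infinity via Eq. (5.67): x = z/(1 - z) + a."""
    t, w = np.polynomial.legendre.leggauss(N)
    z = 0.5*t + 0.5                  # Gauss points mapped to [0, 1]
    wz = 0.5*w
    return np.sum(wz*f(z/(1 - z) + a)/(1 - z)**2)

print(integrate_a_to_inf(lambda x: 1/x**2, 1.0))             # exact value 1
print(integrate_a_to_inf(lambda x: np.exp(-x), 2.0), np.exp(-2.0))
```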


For integrals that run from −∞ to ∞ we can split the integral into two parts,
one from −∞ to 0 and one from 0 to ∞, and then use the tricks above for the

5.7 | I NTEGRALS OVER INFINITE RANGES

two integrals separately. Or we can put the split at some other point a and
perform separate integrals from −∞ to a and from a to ∞. Alternatively, one
could use a single change of variables, such as

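$$x = \frac{z}{1-z^2}, \qquad dx = \frac{1+z^2}{(1-z^2)^2}\,dz, \tag{5.68}$$

which would give

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{1} \frac{1+z^2}{(1-z^2)^2}\, f\!\left(\frac{z}{1-z^2}\right) dz. \tag{5.69}$$

Another possibility, perhaps simpler, is

$$x = \tan z, \qquad dx = \frac{dz}{\cos^2 z}, \tag{5.70}$$

which gives

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\pi/2}^{\pi/2} \frac{f(\tan z)}{\cos^2 z}\,dz. \tag{5.71}$$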
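Both single changes of variables, Eqs. (5.69) and (5.71), can be tried on $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$, a test case of our own choosing. This sketch uses 50 Gauss–Legendre points from NumPy's `leggauss` (standing in for `gaussxw`), mapped to $-\pi/2 < z < \pi/2$ for the tangent substitution. Both estimates should agree with $\sqrt{\pi}$ to many decimal places. Near the ends of the range the factor $e^{-x^2}$ underflows to zero, which is harmless here.

```python
import numpy as np

t, w = np.polynomial.legendre.leggauss(50)      # points and weights on [-1, 1]

def f(x):
    return np.exp(-x**2)                          # exact integral: sqrt(pi)

# Eq. (5.69): x = z/(1 - z^2) over -1 < z < 1
I_rational = np.sum(w*f(t/(1 - t**2))*(1 + t**2)/(1 - t**2)**2)

# Eq. (5.71): x = tan z over -pi/2 < z < pi/2
z = 0.5*np.pi*t                                   # map [-1, 1] to [-pi/2, pi/2]
wz = 0.5*np.pi*w
I_tan = np.sum(wz*f(np.tan(z))/np.cos(z)**2)

print(I_rational, I_tan, np.sqrt(np.pi))
```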

EXAMPLE 5.3: INTEGRATING OVER AN INFINITE RANGE

Let us calculate the value of the following integral using Gaussian quadrature:
$$I = \int_0^\infty e^{-t^2}\,dt. \tag{5.72}$$

We make the change of variables given in Eq. (5.63) and the integral becomes

$$I = \int_0^1 \frac{e^{-z^2/(1-z)^2}}{(1-z)^2}\,dz. \tag{5.73}$$
We can modify our program from Example 5.2 to perform this integral using
Gaussian quadrature with N = 50 sample points:

# File: intinf.py
from gaussxw import gaussxwab
from math import exp

def f(z):
    return exp(-z**2/(1-z)**2)/(1-z)**2

N = 50
a = 0.0
b = 1.0
x,w = gaussxwab(N,a,b)
s = 0.0
for k in range(N):
    s += w[k]*f(x[k])
print(s)

If we run this program it prints

0.886226925453

In fact, the value of this particular integral is known exactly to be $\frac12\sqrt{\pi} = 0.886226925453\ldots$ Again we see the impressive accuracy of the Gaussian quadrature method: with just 50 sample points, we have calculated an estimate of the integral that is correct to the limits of precision of the computer.

Exercise 5.11: The Stefan–Boltzmann constant


The Planck theory of thermal radiation tells us that in the (angular) frequency interval ω to ω + dω, a black body of unit area radiates electromagnetically an amount of thermal energy per second equal to $I(\omega)\,d\omega$, where

$$I(\omega) = \frac{\hbar}{4\pi^2 c^2}\,\frac{\omega^3}{e^{\hbar\omega/k_B T} - 1}.$$

Here $\hbar$ is Planck’s constant over 2π, c is the speed of light, and $k_B$ is Boltzmann’s constant.
a) Show that the total energy per unit area radiated by a black body is

$$W = \frac{k_B^4 T^4}{4\pi^2 c^2 \hbar^3} \int_0^\infty \frac{x^3}{e^x - 1}\,dx.$$
b) Write a program to evaluate the integral in this expression. Explain what method
you used, and how accurate you think your answer is.
c) Even before Planck gave his theory of thermal radiation around the turn of the
20th century, it was known that the total energy W given off by a black body
per second followed Stefan’s law: $W = \sigma T^4$, where σ is the Stefan–Boltzmann
constant. Use your value for the integral above to compute a value for the Stefan–
Boltzmann constant (in SI units) to three significant figures. Check your result
against the known value, which you can find in books or on-line. You should get
good agreement.

Exercise 5.12: Quantum uncertainty in the harmonic oscillator


In units where all the constants are 1, the wavefunction of the nth energy level of
the one-dimensional quantum harmonic oscillator—i.e., a spinless point particle in a

5.8 | M ULTIPLE INTEGRALS

quadratic potential well—is given by

$$\psi_n(x) = \frac{1}{\sqrt{2^n n!\sqrt{\pi}}}\,e^{-x^2/2} H_n(x),$$

for n = 0, 1, 2, …, where $H_n(x)$ is the nth Hermite polynomial. Hermite polynomials satisfy a relation somewhat similar to that for the Fibonacci numbers, although more complex:

$$H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x).$$

The first two Hermite polynomials are $H_0(x) = 1$ and $H_1(x) = 2x$.
a) Write a user-defined function H(n,x) that calculates $H_n(x)$ for given x and any
integer n ≥ 0. Use your function to make a plot that shows the wavefunctions
for n = 0, 1, 2, and 3, all on the same graph, in the range x = −4 to x = 4. Hint:
There is a function factorial in the math package that calculates the factorial of
an integer.
b) Make a separate plot of the wavefunction for n = 30 from x = −10 to x = 10.
Hint: If your program takes too long to run in this case, then you’re doing the
calculation wrong—the program should take only a second or so to run.
c) The quantum uncertainty of a particle in the nth level of a quantum harmonic oscillator can be quantified by its root-mean-square position $\sqrt{\langle x^2\rangle}$, where

$$\langle x^2\rangle = \int_{-\infty}^{\infty} x^2 |\psi_n(x)|^2\,dx.$$

Write a program that evaluates this integral using Gaussian quadrature on 100 points and then calculates the uncertainty (i.e., the RMS position of the particle) for a given value of n. Use your program to calculate the uncertainty for n = 5. You should get an answer in the vicinity of $\sqrt{\langle x^2\rangle} = 2.3$.

5.8 MULTIPLE INTEGRALS

Integrals over more than one variable are common in physics problems and
can be tackled using generalizations of the methods we have already seen.
Consider for instance the integral
$$I = \int_0^1\!\!\int_0^1 f(x,y)\,dx\,dy. \tag{5.74}$$

We can rewrite this by defining a function $F(y)$ thus:

$$F(y) = \int_0^1 f(x,y)\,dx. \tag{5.75}$$


Figure 5.5: Sample points for Gaussian quadrature in two dimensions. If one applies Eq. (5.78) to integrate the function f(x, y) in two dimensions, using Gaussian quadrature with N = 10 points along each axis, the resulting set of sample points in the two-dimensional space looks like this.

Then

$$I = \int_0^1 F(y)\,dy. \tag{5.76}$$

Thus one way to do the multiple integral numerically is first to evaluate $F(y)$ for a suitable set of y values, which means performing the integral in Eq. (5.75), then using those values of $F(y)$ to do the integral in Eq. (5.76). For instance, if we do the integrals by Gaussian quadrature with the same number N of points for both x and y integrals, we have

$$F(y) \simeq \sum_{i=1}^N w_i f(x_i, y) \quad\text{and}\quad I \simeq \sum_{j=1}^N w_j F(y_j). \tag{5.77}$$

An alternative way to look at the calculation is to substitute the first sum into the second to get the Gauss–Legendre product formula:

$$I \simeq \sum_{i=1}^N \sum_{j=1}^N w_i w_j f(x_i, y_j). \tag{5.78}$$

This expression has a form similar to the standard integration formula for single integrals, Eq. (5.48), with a sum over values of the function f(x, y) at a set of sample points, multiplied by appropriate weights. Equation (5.78) represents a kind of two-dimensional Gaussian quadrature, with weights $w_i w_j$ distributed over a two-dimensional grid of points as shown in Fig. 5.5.

Figure 5.6: 128-point Sobol sequence. The Sobol sequence is one example of a low-discrepancy point set that gives good results for integrals in high dimensions. This figure shows a Sobol sequence of 128 points in two dimensions.
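The product formula (5.78) can be sketched compactly with NumPy arrays. In this version (our own; `gauss_2d` is a hypothetical name and NumPy's `leggauss` stands in for `gaussxw`) the same N points serve for both axes, `np.meshgrid` builds the grid of pairs $(x_i, y_j)$, and `np.outer` builds the weights $w_i w_j$. The test integrand $e^{x+y}$ over the unit square has exact value $(e-1)^2$.

```python
import numpy as np

def gauss_2d(f, N):
    """Gauss-Legendre product rule, Eq. (5.78), on the unit square [0, 1]^2."""
    t, w = np.polynomial.legendre.leggauss(N)
    x = 0.5*t + 0.5                            # same points for both axes
    wx = 0.5*w
    X, Y = np.meshgrid(x, x, indexing="ij")    # X[i, j] = x_i, Y[i, j] = y_j
    W = np.outer(wx, wx)                       # W[i, j] = w_i * w_j
    return np.sum(W*f(X, Y))

exact = (np.e - 1.0)**2     # integral of exp(x + y) over the unit square
print(gauss_2d(lambda x, y: np.exp(x + y), 10), exact)
```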
Once you look at it this way, however, you realize that in principle there’s
no reason why the sample points have to be on a grid. They could be any-
where—we can use any set of 2D locations and suitable weights that give a
good estimate of the integral. Just as Gaussian quadrature gives the best choice
of points for an integral in one dimension, so we can ask what the best choice
is for two dimensions, or for higher dimensions like three or four. It turns
out, however, that the answer to this question is not known in general. There
are some results for special cases, but no general answer. Various point sets
have been proposed for use with 2D integrals that appear to give reasonable
results, but there is no claim that they are the best possible choices. Typically
they are selected because they have some other desirable properties, such as
nesting, and not because they give the most accurate answer. One common
choice of point set is the Sobol sequence, shown for N = 128 points in Fig. 5.6.
Sobol sequences and similar sets of points are known as low-discrepancy point
sets or sometimes quasi-random point sets (although the latter name is a poor
one because there’s nothing random about them). Another common way to choose the sample points is to choose them completely randomly, which leads to the method known as Monte Carlo integration. Choosing points at random may seem like an odd idea, but as we will see it can be a useful approach for certain types of integrals, particularly integrals over very many variables. We will look at Monte Carlo integration in Section 10.2, after we study random number generators.

Figure 5.7: Integration over a non-rectangular domain. When the limits of multiple integrals depend on one another they can produce arbitrarily shaped domains of integration. This figure shows the triangular domain that results from the integral in Eq. (5.79). The gray region is the domain of integration. Note how the points become squashed together towards the bottom of the plot.
In the integral of Eq. (5.74) the limits of both integrals were constant, which
made the domain of integration rectangular in xy space. It’s not uncommon,
however, for the limits of one integral to depend on the other, as here:
$$I = \int_0^1 dy \int_0^y dx\, f(x,y). \tag{5.79}$$

We can use the same approach as before to evaluate this integral. We define

$$F(y) = \int_0^y f(x,y)\,dx, \tag{5.80}$$

so that

$$I = \int_0^1 F(y)\,dy, \tag{5.81}$$

and then do both integrals with any method we choose, such as Gaussian quadrature. The result, again, is a two-dimensional integration rule, but now with the sample points arranged in a triangular space as shown in Fig. 5.7.
This method will work, and will probably give reasonable answers, but it’s not ideal. In particular note how the sample points are crammed together in the lower left corner of the integration domain but much farther apart at the top. This means, all other things being equal, that we’ll have lower accuracy for the part of the integral at the top. It would be better if the accuracy were roughly uniform.

Figure 5.8: A complicated integration domain. Integration domains can be arbitrarily complicated in their shapes. They can even contain holes, or take on complex topologies in higher dimensions such as tori or knotted topologies.
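Equations (5.80) and (5.81) can be implemented by nesting two one-dimensional Gaussian rules: for each outer point $y_j$ the inner points are remapped to the interval from 0 to $y_j$. The sketch below is our own (`integrate_triangle` is hypothetical, and NumPy's `leggauss` stands in for `gaussxw`). It also returns the sample points; for small $y_j$ the inner points are squeezed into a short interval, which is the crowding described above. The test integrand f(x, y) = x has the exact value 1/6.

```python
import numpy as np

def integrate_triangle(f, N):
    """I = int_0^1 dy int_0^y dx f(x, y), Eqs. (5.79)-(5.81).

    Outer Gauss rule on y in [0, 1]; for each y_j the same N standard
    points are remapped to x in [0, y_j], as in Eq. (5.80).
    Also returns the sample points so their layout can be inspected.
    """
    t, w = np.polynomial.legendre.leggauss(N)
    y = 0.5*t + 0.5
    wy = 0.5*w
    total = 0.0
    points = []
    for yj, wyj in zip(y, wy):
        x = 0.5*yj*t + 0.5*yj          # inner points mapped to [0, y_j]
        wx = 0.5*yj*w
        F = np.sum(wx*f(x, yj))        # inner integral F(y_j), Eq. (5.80)
        total += wyj*F                 # outer sum approximating Eq. (5.81)
        points.extend(zip(x, [yj]*N))
    return total, points

I, pts = integrate_triangle(lambda x, y: x, 5)
print(I)    # exact value is 1/6
```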
And things can get worse still. Suppose the domain of integration takes
some more complicated shape like Fig. 5.8. We will not come across any exam-
ples this complicated in this book, but if we did there would be various tech-
niques we could use. One is the Monte Carlo integration method mentioned
above, which we study in detail in Section 10.2. Another is to set the integrand
to zero everywhere outside the domain of integration and then integrate it us-
ing a standard method over some larger, regularly shaped domain, such as a
rectangle, that completely encloses the irregular one. There are many more
sophisticated techniques as well, but we will not need them for the moment.
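Here is a minimal sketch of the second technique, under our own assumptions: to find the area of the unit disk (exact value π) we integrate a function equal to 1 inside the disk and 0 outside it over the enclosing square from −1 to +1 in both directions, using the product Gauss rule of Eq. (5.78). The names are hypothetical and NumPy's `leggauss` stands in for `gaussxw`. Because the masked integrand jumps discontinuously at the edge of the disk, it is not smooth, and the error shrinks much more slowly with N than in the smooth examples earlier in the chapter.

```python
import numpy as np

def gauss_2d_box(f, a, b, N):
    """Product Gauss-Legendre rule on the square [a, b] x [a, b]."""
    t, w = np.polynomial.legendre.leggauss(N)
    x = 0.5*(b - a)*t + 0.5*(b + a)
    wx = 0.5*(b - a)*w
    X, Y = np.meshgrid(x, x, indexing="ij")
    return np.sum(np.outer(wx, wx)*f(X, Y))

def disk_indicator(x, y):
    # Integrand set to zero outside the unit disk, so that integrating
    # over the enclosing square gives the area of the disk, pi
    return np.where(x**2 + y**2 <= 1.0, 1.0, 0.0)

for N in (10, 50, 100):
    estimate = gauss_2d_box(disk_indicator, -1.0, 1.0, N)
    print(N, estimate, abs(estimate - np.pi))
```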


Exercise 5.13: Gravitational pull of a uniform sheet


A uniform square sheet of metal is floating motionless in space:

[Figure: square sheet 10 m on a side, with a 1 kg point mass a distance z above its center; y and z axes marked]

The sheet is 10 m on a side and of negligible thickness, and it has a mass of 10 metric
tonnes.
a) Consider the gravitational force due to the plate felt by a point mass of 1 kg a distance z from the center of the square, in the direction perpendicular to the sheet, as shown above. Show that the component of the force along the z-axis is

$$F_z = G\rho z \iint_{-L/2}^{L/2} \frac{dx\,dy}{(x^2 + y^2 + z^2)^{3/2}},$$

where $G = 6.674 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$ is Newton’s gravitational constant and ρ is the mass per unit area of the sheet.
b) Write a program to calculate and plot the force as a function of z from z = 0
to z = 10 m. For the double integral use (double) Gaussian quadrature, as in
Eq. (5.78), with 100 sample points along each axis.
c) You should see a smooth curve, except at very small values of z, where the force
should drop off suddenly to zero. This drop is not a real effect, but an artifact of
the way we have done the calculation. Explain briefly where this artifact comes
from and suggest a strategy to remove it, or at least to decrease its size.

5.9 DERIVATIVES
The opposite of a numerical integral is a numerical derivative. You hear a
lot less about numerical derivatives than integrals, however, for a number of
reasons:

5.9 | D ERIVATIVES

1. The basic techniques for numerical derivatives are quite simple, so they
don’t take long to explain.
2. Derivatives of known functions can always be calculated analytically,
so there’s less need to calculate them numerically.
3. There are some significant practical problems with numerical derivatives, which means they are used less often than numerical integrals.
(There are, however, some situations in which they are important, par-
ticularly in the solution of partial differential equations, which we will
look at in Chapter 9.)
For all of these reasons this is a short section—you need to know about numer-
ical derivatives, but we won’t spend too much time on them.

5.9.1 FORWARD AND BACKWARD DIFFERENCES

The standard definition of a derivative, the one you see in the calculus books,
is
$$\frac{df}{dx} = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}. \tag{5.82}$$

The basic method for calculating numerical derivatives is precisely an implementation of this formula. We can’t take the limit h → 0 in practice, but we can make h very small and then calculate

$$\frac{df}{dx} \simeq \frac{f(x+h) - f(x)}{h}. \tag{5.83}$$
This approximation to the derivative is called the forward difference, because
it’s measured in the forward (i.e., positive) direction from the point of inter-
est x. You can think of it in geometric terms as shown in Fig. 5.9—it’s simply
the slope of the curve f(x) measured over a small interval of width h in the
forward direction from x.
There is also the backward difference, which has the mirror image definition

$$\frac{df}{dx} \simeq \frac{f(x) - f(x-h)}{h}. \tag{5.84}$$
The forward and backward differences typically give about the same answer and in many cases you can use either. Most often one uses the forward difference. There are a few special cases where one is preferred over the other, particularly when there is a discontinuity in the derivative of the function at the point x or when the domain of the function is bounded and you want the value of the derivative on the boundary, in which case only one or other of the two difference formulas will work. The rest of the time, however, there is little to choose between them.

Figure 5.9: Forward and backward differences. The forward and backward differences provide two different approximations to the derivative of a function f(x) at the point x in terms of the slopes of small segments measured in the forward (i.e., positive) direction from x and the backward (negative) direction, respectively.
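Equations (5.83) and (5.84) translate directly into code; the function names in this sketch are our own. For the smooth function $x^2$ the two differences agree closely, both near the exact derivative 2 at x = 1. For $|x|$, whose derivative is discontinuous at x = 0, the forward and backward differences give +1 and −1, the slopes on either side of the kink.

```python
def forward_difference(f, x, h):
    """Eq. (5.83): slope measured forward from x."""
    return (f(x + h) - f(x))/h

def backward_difference(f, x, h):
    """Eq. (5.84): slope measured backward from x."""
    return (f(x) - f(x - h))/h

# Smooth function: both estimates close to the exact derivative 2 at x = 1
def sq(x):
    return x**2

print(forward_difference(sq, 1.0, 1e-6), backward_difference(sq, 1.0, 1e-6))

# |x| has a kink at x = 0, so the two one-sided slopes differ
print(forward_difference(abs, 0.0, 1e-3), backward_difference(abs, 0.0, 1e-3))  # 1.0 -1.0
```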
Before using either the forward or backward difference we must choose a
value for h. To work out what the best value is we need to look at the errors
and inaccuracies involved in calculating numerical derivatives.

5.9.2 ERRORS

Calculations of derivatives using forward and backward differences are not
perfectly accurate. There are two sources of error. The first is rounding error of
the type discussed in Section 4.2. The second is the approximation error that
arises because we cannot take the limit h → 0, so our differences are not really
true derivatives. By contrast with numerical integrals, where, as we have seen,
rounding error is usually negligible, it turns out that both sources of error are
important when we calculate a derivative.
To understand why this is let us focus on the forward difference and con-
sider the Taylor expansion of f ( x ) about x:

$$f(x+h) = f(x) + h f'(x) + \tfrac12 h^2 f''(x) + \ldots \tag{5.85}$$

where $f'$ and $f''$ denote the first and second derivatives of f. Rearranging this
expression, we get

$$f'(x) = \frac{f(x+h) - f(x)}{h} - \tfrac12 h f''(x) + \ldots \tag{5.86}$$
When we calculate the forward difference we calculate only the first part on
the right-hand side, and neglect the term in $f''(x)$ and all higher terms. The
size of these neglected terms measures the approximation error on the forward
difference. Thus, to leading order in h, the absolute magnitude of the
approximation error is $\tfrac12 h|f''(x)|$, which is linear in h so that, as we would
expect, we should get more accurate answers if we use smaller values of h.
But now here is the problem: as we saw in Section 4.2, subtracting numbers
from one another on a computer can give rise to big rounding errors (in frac-
tional terms) if the numbers are close to one another. And that’s exactly what
happens here—the numbers f ( x + h) and f ( x ) that we are subtracting will be
very close to one another if we make h small. Thus if we make h too small,
we will get a large rounding error in our result. This puts us in a difficult
situation: we want to make h small to make the forward difference approxi-
mation as accurate as possible, but if we make it too small we will get a large
rounding error. To get the best possible answer, we are going to have to find a
compromise.
In Section 4.2 we saw that the computer can typically calculate a number
such as $f(x)$ to an accuracy of $C f(x)$, where the value of the error constant C
can vary but is typically about $C = 10^{-16}$ in Python. Since $f(x+h)$ is normally
close in value to $f(x)$, the accuracy of our value for $f(x+h)$ will also be about
the same, and the absolute magnitude of the total rounding error on
$f(x+h) - f(x)$ will, in the worst case, be about $2C|f(x)|$. It might be better than
this if the two errors go in opposite directions and happen to cancel out, but
we cannot assume that this will be the case. Then the worst-case rounding
error on the complete forward difference, Eq. (5.83), will be $2C|f(x)|/h$.
Meanwhile, the approximation error is, as we have said, about $\tfrac12 h|f''(x)|$
from Eq. (5.86), which means that the total error ε on our derivative, in the
worst case, is
$$\epsilon = \frac{2C|f(x)|}{h} + \tfrac12 h|f''(x)|. \tag{5.87}$$
We want to find the value of h that minimizes this error, so we differentiate
with respect to h and set the result equal to zero, which gives

$$-\frac{2C|f(x)|}{h^2} + \tfrac12 |f''(x)| = 0, \tag{5.88}$$

or equivalently

$$h = \sqrt{4C\left|\frac{f(x)}{f''(x)}\right|}. \tag{5.89}$$

Substituting this value back into Eq. (5.87) we find that the error on our
derivative is

$$\epsilon = h|f''(x)| = \sqrt{4C|f(x)f''(x)|}. \tag{5.90}$$
Thus, for instance, if $f(x)$ and $f''(x)$ are of order 1, we should choose h to be
roughly of order $\sqrt{C}$, which will be typically about $10^{-8}$, and the final error on
our result will also be about $\sqrt{C}$ or $10^{-8}$. A similar analysis can be applied to
the backward difference, and gives the same end result.
In other words, we can get about half of the usual numerical precision on
our derivatives but not better. If the precision is, as here, about 16 digits, then
we can get 8 digits of precision on our derivatives. This is substantially poorer
than most of the calculations we have seen so far in this book, and could be a
significant source of error for calculations that require high accuracy.
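The trade-off between approximation and rounding error is easy to see numerically. The minimal sketch below evaluates the forward difference of Eq. (5.83) for step sizes from $10^{-1}$ down to $10^{-15}$. The test function $f(x) = e^x$ at $x = 1$ is an arbitrary choice made for illustration (its derivative is known exactly), and the value $C = 10^{-16}$ used for the prediction of Eq. (5.89) is the typical figure quoted above, not a measured one.

```python
import math

def forward_difference(f, x, h):
    """Forward-difference estimate of f'(x), Eq. (5.83)."""
    return (f(x + h) - f(x)) / h

# Illustrative test case (an assumption, not from the text):
# f(x) = exp(x) at x = 1, whose exact derivative is exp(1).
f = math.exp
x = 1.0
exact = math.exp(x)

hs = [10.0 ** (-k) for k in range(1, 16)]        # h = 1e-1 ... 1e-15
errs = [abs(forward_difference(f, x, h) - exact) for h in hs]

for h, err in zip(hs, errs):
    print(f"h = {h:.0e}   error = {err:.2e}")

best_h = hs[errs.index(min(errs))]
print("smallest error found at h =", best_h)

# Eq. (5.89) with the assumed C = 1e-16; here |f(x)/f''(x)| = 1
C = 1e-16
print("Eq. (5.89) predicts h =", math.sqrt(4 * C))
```

On a typical machine the smallest error should appear near $h = 10^{-8}$, with the error growing again for both larger h (approximation error) and smaller h (rounding error); the exact digits depend on the platform.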

5.9.3 CENTRAL DIFFERENCES

We have seen that forward and backward differences are not very accurate.
What can we do to improve the situation? A simple improvement is to use the
central difference:
$$\frac{df}{dx} \simeq \frac{f(x+h/2) - f(x-h/2)}{h}. \tag{5.91}$$
The central difference is similar to the forward and backward differences,
approximating the derivative using the difference between two values of $f(x)$ at
points a distance h apart. What's changed is that the two points are now placed
symmetrically around x, one at a distance $\tfrac12 h$ in the forward (i.e., positive)
direction and the other at a distance $\tfrac12 h$ in the backward (negative) direction.
To calculate the approximation error on the central difference we write two
Taylor expansions:

$$f(x+h/2) = f(x) + \tfrac12 h f'(x) + \tfrac18 h^2 f''(x) + \tfrac{1}{48} h^3 f'''(x) + \ldots \tag{5.92}$$

$$f(x-h/2) = f(x) - \tfrac12 h f'(x) + \tfrac18 h^2 f''(x) - \tfrac{1}{48} h^3 f'''(x) + \ldots \tag{5.93}$$

Subtracting the second expression from the first and rearranging for $f'(x)$, we
get

$$f'(x) = \frac{f(x+h/2) - f(x-h/2)}{h} - \tfrac{1}{24} h^2 f'''(x) + \ldots \tag{5.94}$$

To leading order the magnitude of the error is now $\tfrac{1}{24} h^2|f'''(x)|$, which is one
order in h higher than before. There is also, as before, a rounding error; its size
is unchanged from our previous calculation, having magnitude $2C|f(x)|/h$,
so the magnitude of the total error on our estimate of the derivative is

$$\epsilon = \frac{2C|f(x)|}{h} + \tfrac{1}{24} h^2 |f'''(x)|. \tag{5.95}$$
Differentiating to find the minimum and rearranging, we find that the optimal
value of h is

$$h = \left(24C\left|\frac{f(x)}{f'''(x)}\right|\right)^{1/3}, \tag{5.96}$$

and substituting this back into Eq. (5.95) we find the optimal error itself to be

$$\epsilon = \tfrac18 h^2 |f'''(x)| = \left(\tfrac98 C^2 [f(x)]^2 |f'''(x)|\right)^{1/3}. \tag{5.97}$$

Thus, for instance, if $f(x)$ and $f'''(x)$ are of order 1, the ideal value of h is going
to be around $h \simeq C^{1/3}$, which is typically about $10^{-5}$, but the error itself will be
around $C^{2/3}$, or about $10^{-10}$.
Thus the central difference is indeed more accurate than the forward and
backward differences, by a factor of 100 or so in this case, though we get this
accuracy by using a larger value of h. This may seem slightly surprising, but it
is the correct result.
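To compare the two methods at their own best step sizes, this sketch evaluates the forward difference at the h of Eq. (5.89) and the central difference at the h of Eq. (5.96). As before, the test function $e^x$ at $x = 1$ and the value $C = 10^{-16}$ are illustrative assumptions; for this function $f$, $f''$, and $f'''$ are all equal, so the ratios inside Eqs. (5.89) and (5.96) are 1.

```python
import math

def forward_difference(f, x, h):
    """Forward-difference estimate of f'(x), Eq. (5.83)."""
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h):
    """Central-difference estimate of f'(x), Eq. (5.91)."""
    return (f(x + h / 2) - f(x - h / 2)) / h

# Illustrative test case: f(x) = exp(x) at x = 1
f = math.exp
x = 1.0
exact = math.exp(x)
C = 1e-16                            # typical error constant quoted in the text

h_forward = math.sqrt(4 * C)         # Eq. (5.89)
h_central = (24 * C) ** (1 / 3)      # Eq. (5.96)

err_forward = abs(forward_difference(f, x, h_forward) - exact)
err_central = abs(central_difference(f, x, h_central) - exact)

print(f"forward: h = {h_forward:.1e}, error = {err_forward:.1e}")
print(f"central: h = {h_central:.1e}, error = {err_central:.1e}")
```

The central difference should come out markedly more accurate despite using a step roughly a thousand times larger, which is the point made in the paragraph above.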

EXAMPLE 5.4: DERIVATIVE OF A SAMPLED FUNCTION

As an example application of the central difference, suppose we are given the
values of a function $f(x)$ measured at regularly spaced sample points a distance
h apart (see Fig. 5.10). One often gets such samples from data collected
in the laboratory, for example. Now suppose we want to calculate the deriva-
tive of f at one of these points (case (a) in the figure). We could use a forward
or backward difference based on the sample at x and one of the adjacent ones,
or we could use a central difference. However, if we use a central difference,
which is based on points equally spaced on either side of x, then we must use
the points at x + h and x − h. We cannot, as in Eq. (5.91), use points at x + h/2
and x − h/2 because there are no such points—we only have the samples we
are given. The formula for the central difference in this case will thus be

$$\frac{df}{dx} \simeq \frac{f(x+h) - f(x-h)}{2h}. \tag{5.98}$$

[Figure 5.10: a sampled function f(x), panels (a) and (b)]

Figure 5.10: Derivative of a sampled function. (a) If we only know the function at a
set of sample points spaced a distance h apart then we must choose between calculating
the forward or backward difference between adjacent samples, or the central difference
between samples 2h apart. We cannot calculate a central difference using the standard
formula, Eq. (5.91), because we do not know the value of the function at $x \pm \tfrac12 h$. (b) We
can, however, calculate the value of the derivative at a point halfway between two
samples (dotted line) using the standard formula.
This means that the interval between the points we use is 2h for the central
difference, but only h for the forward and backward differences. So which will
give a better answer? The central difference because it’s a better approximation
or the forward difference because of its smaller interval?
From Eq. (5.90) we see the error on the forward difference is $h|f''(x)|$ and
from Eq. (5.97) the error on the central difference, with h replaced by 2h,
is $\tfrac18(2h)^2|f'''(x)| = \tfrac12 h^2|f'''(x)|$. Which is smaller depends on the value of h.
For the central difference to give the more accurate answer, we require
$\tfrac12 h^2|f'''(x)| < h|f''(x)|$, or

$$h < 2\left|\frac{f''(x)}{f'''(x)}\right|. \tag{5.99}$$

If h is larger than this then the forward difference is actually the better
approximation in this case.
But now suppose that instead of calculating the value of the derivative at
one of the sample points itself, we want to calculate it at a point x that lies half
way between two of the samples—case (b) in Fig. 5.10. Viewed from that point
we do have samples at x + h/2 and x − h/2, so now we can use the original
form of the central difference, Eq. (5.91), with an interval only h wide, as with
the forward difference. This calculation will give a more accurate answer, but
only at the expense of calculating the result at a point in between the samples.
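The two cases in Fig. 5.10 can be tried on synthetic samples. In this sketch the "measured" data are simply $\sin x$ sampled at spacing $h = 0.1$ (an illustrative assumption), so the exact derivative $\cos x$ is available for comparison. Note that the forward differences between adjacent samples and the central differences at the half-way points of case (b) are the same numbers; they differ only in the point x to which each value is attributed.

```python
import numpy as np

h = 0.1
x = np.arange(0.0, 2 * np.pi, h)   # sample points a distance h apart
f = np.sin(x)                      # the "measured" samples (illustrative)

# Differences between adjacent samples
diff = (f[1:] - f[:-1]) / h

# Forward difference, Eq. (5.83): attribute each value to the left-hand sample
err_forward = np.max(np.abs(diff - np.cos(x[:-1])))

# Case (a): central difference at the samples themselves, Eq. (5.98), width 2h
central_2h = (f[2:] - f[:-2]) / (2 * h)
err_central_2h = np.max(np.abs(central_2h - np.cos(x[1:-1])))

# Case (b): the same adjacent differences, read as central differences
# (Eq. (5.91), width h) at the half-way points x + h/2
midpoints = x[:-1] + h / 2
err_half = np.max(np.abs(diff - np.cos(midpoints)))

print(f"forward, width h:           {err_forward:.2e}")
print(f"central at samples, 2h:     {err_central_2h:.2e}")
print(f"central at half-way points: {err_half:.2e}")
```

For this smooth function and small h the central difference at the samples beats the forward difference, and the half-way version is better still, at the cost of giving the derivative between the samples rather than at them.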


Exercise 5.14: Create a user-defined function f(x) that returns the value $1 + \tfrac12 \tanh 2x$,
then use a central difference to calculate the derivative of the function in the range
−2 ≤ x ≤ 2. Calculate an analytic formula for the derivative and make a graph with
your numerical result and the analytic answer on the same plot. It may help to plot the
exact answer as lines and the numerical one as dots. (Hint: In Python the tanh function
is found in the math package, and it’s called simply tanh.)

Exercise 5.15: Even when we can find the value of f ( x ) for any value of x the forward
difference can still be more accurate than the central difference for sufficiently large h.
For what values of h will the approximation error on the forward difference of Eq. (5.83)
be smaller than on the central difference of Eq. (5.91)?

5.9.4 HIGHER-ORDER APPROXIMATIONS FOR DERIVATIVES

One way to think about the numerical derivatives of the previous sections is
that we are fitting a straight line through two points, such as the points f ( x )
and f ( x + h), and then asking about the slope of that line at the point x. The
trapezoidal rule of Section 5.1.1 does a similar thing for integrals, approximat-
ing a curve by a straight line between sample points and estimating the area
under the curve using that line. We saw that we can make a higher-order—and
usually better—approximation to an integral by fitting a quadratic or higher-
order polynomial instead of a straight line, and this led to the Simpson and
Newton–Cotes rules for integrals. We can take a similar approach with deriva-
tives by fitting a polynomial to a set of sample points and then calculating the
derivative of the polynomial at x.
Consider, for example, fitting a quadratic curve $y = ax^2 + bx + c$ to the
function f ( x ). We require three sample points to make the fit and, as with the
central difference of Section 5.9.3, the best results are obtained by placing the
points symmetrically about the point of interest x. Suppose, for example, that
we are interested in the derivative at x = 0, so we place our three points at − h,
0, and + h, for some h that we choose. Requiring that our quadratic is equal to
f ( x ) at these three points gives us three equations thus:

$$ah^2 - bh + c = f(-h), \qquad c = f(0), \qquad ah^2 + bh + c = f(h). \tag{5.100}$$

In principle, we can now solve these equations for the three parameters a, b,
and c. (This is the same calculation that we did in Section 5.1.2 for Simpson’s
rule.) However, in this case, we don't need the whole solution, because we
don't need all of the parameters. Given the quadratic fit $y = ax^2 + bx + c$, the
derivative of the curve at the point $x = 0$ is

$$\left.\frac{dy}{dx}\right|_{x=0} = \Bigl[2ax + b\Bigr]_{x=0} = b. \tag{5.101}$$

So we need only the one parameter b, which we can get from Eq. (5.100) by
subtracting the first equation from the third to give 2bh = f (h) − f (− h) and
rearranging. Thus our approximation for the derivative at x = 0 is

$$\frac{df}{dx} \simeq \frac{f(h) - f(-h)}{2h}. \tag{5.102}$$
We have done this calculation for the derivative at x = 0, but the same result
applies at any other point—we can slide the whole function up or down the
x-axis, to put any point x at the origin and then calculate the derivative from
the formula above. Or, equivalently, we can just write

$$\frac{df}{dx} \simeq \frac{f(x+h) - f(x-h)}{2h} \tag{5.103}$$

for general x.
This is the correct result for the quadratic approximation, but it’s a disap-
pointing result, since Eq. (5.103) is nothing other than the central difference
approximation for sample points 2h apart, which we already saw in Eq. (5.98).
In other words, the higher-order approximation has not helped us in this case.
However, going to still higher orders does help. If we use a cubic or quar-
tic approximation, we do get improved estimates of the derivative. At higher
orders there is a distinction between the odd- and even-order approximations.
For the odd-order ones the sample points fall at “half-way” points, as with
the central difference of Eq. (5.91). For instance, to get the four sample points
required for a cubic approximation, symmetrically distributed about zero, we
would choose them to fall at $x = -\tfrac32 h$, $-\tfrac12 h$, $\tfrac12 h$, and $\tfrac32 h$. For even-order
approximations, on the other hand, the samples fall at “integer” points; the five
points for the quartic approximation, for instance, fall at $-2h$, $-h$, $0$, $h$, and $2h$.
The methodology for deriving the higher-order approximations follows the
same pattern as for the quadratic case: we write down the required value of the
polynomial at each of the sample points, which gives us a set of simultaneous
equations in the polynomial coefficients. As before, we actually need only
one of those coefficients, the coefficient of the linear term in the polynomial.
Solving for this coefficient gives us our expression for the derivative. At each
order the expression is a linear combination of the samples, divided by h.

$$
\begin{array}{c|ccccccccccc|c}
\text{Degree} & f(-\tfrac52 h) & f(-2h) & f(-\tfrac32 h) & f(-h) & f(-\tfrac12 h) & f(0) & f(\tfrac12 h) & f(h) & f(\tfrac32 h) & f(2h) & f(\tfrac52 h) & \text{Error} \\ \hline
1 & & & & & -1 & & 1 & & & & & O(h^2) \\
2 & & & & -\tfrac12 & & & & \tfrac12 & & & & O(h^2) \\
3 & & & \tfrac{1}{24} & & -\tfrac{27}{24} & & \tfrac{27}{24} & & -\tfrac{1}{24} & & & O(h^4) \\
4 & & \tfrac{1}{12} & & -\tfrac23 & & & & \tfrac23 & & -\tfrac{1}{12} & & O(h^4) \\
5 & -\tfrac{3}{640} & & \tfrac{25}{384} & & -\tfrac{75}{64} & & \tfrac{75}{64} & & -\tfrac{25}{384} & & \tfrac{3}{640} & O(h^6)
\end{array}
$$

Table 5.1: Coefficients for numerical derivatives. The coefficients for central approximations to the first derivative
of f(x) at x = 0. To derive the full expression for an approximation, multiply the samples listed in the top row
of the table by the coefficients in one of the other rows, then divide by h. For instance, the cubic approximation
would be $[\tfrac{1}{24} f(-\tfrac32 h) - \tfrac{27}{24} f(-\tfrac12 h) + \tfrac{27}{24} f(\tfrac12 h) - \tfrac{1}{24} f(\tfrac32 h)]/h$. For derivatives at points other than x = 0 the same
coefficients apply; one just uses the appropriate sample points around the value x of interest. The final column of
the table gives the order of the approximation error on the derivative.

We will not go through the derivations in detail, but Table 5.1 gives the coefficients
of the combinations for the first five approximations.
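The recipe described above (fit a polynomial through the samples and keep only its linear coefficient) amounts to solving a small linear system, so the rows of Table 5.1 can be regenerated numerically. The sketch below is one way to do this with NumPy; the function name `derivative_weights` is hypothetical. Sample positions $s_i$ are given in units of h, and the returned weights satisfy $\sum_i w_i s_i^k = \delta_{k1}$ for $k = 0, \ldots, n$, which makes the formula exact for every polynomial of degree n.

```python
import numpy as np

def derivative_weights(offsets):
    """Weights w_i with f'(0) ~ sum_i w_i f(s_i h) / h, where s_i = offsets[i].

    The weights make the formula exact for every polynomial of degree
    len(offsets) - 1, i.e. they pick out the linear coefficient of the
    interpolating polynomial through the samples.
    """
    s = np.asarray(offsets, dtype=float)
    # A[k, i] = s_i**k; the conditions sum_i w_i s_i**k = (1 if k == 1 else 0)
    # say the formula returns the exact derivative at 0 of 1, s, s**2, ...
    A = np.vander(s, increasing=True).T
    rhs = np.zeros(len(s))
    rhs[1] = 1.0
    return np.linalg.solve(A, rhs)

for degree in range(1, 6):
    offsets = np.arange(degree + 1) - degree / 2   # symmetric about zero
    weights = derivative_weights(offsets)
    print(degree, offsets, np.round(weights, 6))
```

Up to rounding in the last digits, the printed weights should reproduce the rows of Table 5.1, including the zero weight on f(0) in the even-degree rows.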
Each of the approximations given in the table is exact, apart from rounding
error, if the function being differentiated is actually a polynomial of the appro-
priate (or lower) degree, so that the polynomial fit is a perfect one. Most of
the time, however, this will not be the case and there will be an approxima-
tion error involved in calculating the derivative. One can calculate this error
to leading order for each of the approximations by a method analogous to our
calculations for the forward, backward, and central differences: we perform
Taylor expansions about x = 0 to derive expressions for f ( x ) at each of the
sample points, then plug these expressions into the formula for the derivative.
The order in h of the resulting error is listed in the final column of Table 5.1. As
before, this approximation error must be balanced against the rounding error
and a suitable value of h chosen to minimize the overall error in the derivative.
An interesting point to notice about Table 5.1 is that the coefficient for f (0)
in all the approximations is zero. The value of the function exactly at the point
of interest never plays a role in the evaluation of the derivative. Another (not
unrelated) point is that the order in h of the error given in the final column does
not go up uniformly with the degree of the polynomial—it is the same for the
even-degree polynomials as for the next-lower odd-degree ones. We saw a
special case of this result for the quadratic: the quadratic fit just gives us an or-
dinary central difference and therefore necessarily has an error O(h2 ), the same
as the central difference derived from the linear fit. In general, the odd-degree
approximations give us slightly more accurate results than the even-degree
ones: the error is of the same order in h but the constant of proportionality
is smaller. On the other hand, the odd-degree approximations require samples
at the half-way points, as we have noted, which can be inconvenient. As
discussed in Example 5.4, we sometimes have samples at only the “integer”
points, in which case we must use the even-degree approximations.
We will not be using quadratic or higher-order derivative approximations
in the remainder of this book—the forward, backward, and central differences
will be all we need. But it is worth knowing about them nonetheless; such
things come in handy every once in a while.
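The orders listed in the last column of Table 5.1 can be checked by halving h and watching how fast the error falls. In this sketch the degree-2 and degree-4 rows of the table are applied to $\sin x$ at $x = 1$; the test function and step sizes are arbitrary choices. The error should drop by a factor of about $2^2 = 4$ per halving for the first and about $2^4 = 16$ for the second.

```python
import math

def deriv_degree2(f, x, h):
    """Degree-2 row of Table 5.1 (samples at x - h and x + h)."""
    return (-0.5 * f(x - h) + 0.5 * f(x + h)) / h

def deriv_degree4(f, x, h):
    """Degree-4 row of Table 5.1 (samples at x - 2h, x - h, x + h, x + 2h)."""
    return (f(x - 2 * h) / 12 - 2 * f(x - h) / 3
            + 2 * f(x + h) / 3 - f(x + 2 * h) / 12) / h

x = 1.0
exact = math.cos(x)                  # derivative of sin x
hs = [0.2, 0.1, 0.05]
err2 = [abs(deriv_degree2(math.sin, x, h) - exact) for h in hs]
err4 = [abs(deriv_degree4(math.sin, x, h) - exact) for h in hs]

for h, e2, e4 in zip(hs, err2, err4):
    print(f"h = {h:<5} degree 2: {e2:.3e}   degree 4: {e4:.3e}")

# Ratio of successive errors when h is halved
print("ratios:", err2[1] / err2[2], err4[1] / err4[2])
```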

5.9.5 SECOND DERIVATIVES

We can also derive numerical approximations for the second derivative of a
function $f(x)$. The second derivative is, by definition, the derivative of the
first derivative, so we can calculate it by applying our first-derivative formulas
twice. For example, starting with the central difference formula, Eq. (5.91), we
can write expressions for the first derivative at x + h/2 and x − h/2 thus:

$$f'(x+h/2) \simeq \frac{f(x+h) - f(x)}{h}, \qquad f'(x-h/2) \simeq \frac{f(x) - f(x-h)}{h}. \tag{5.104}$$
Then we apply the central difference again to get an expression for the second
derivative:

$$\begin{aligned}
f''(x) &\simeq \frac{f'(x+h/2) - f'(x-h/2)}{h} \\
&= \frac{[f(x+h) - f(x)]/h - [f(x) - f(x-h)]/h}{h} \\
&= \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}.
\end{aligned} \tag{5.105}$$
This is the simplest approximation for the second derivative. We will use it ex-
tensively in Chapter 9 for solving second-order differential equations. Higher-
order approximations exist too, but we will not use them in this book.
We can also calculate the error on Eq. (5.105). We perform two Taylor
expansions of $f(x)$ thus:

$$f(x+h) = f(x) + h f'(x) + \tfrac12 h^2 f''(x) + \tfrac16 h^3 f'''(x) + \tfrac{1}{24} h^4 f''''(x) + \ldots \tag{5.106}$$

$$f(x-h) = f(x) - h f'(x) + \tfrac12 h^2 f''(x) - \tfrac16 h^3 f'''(x) + \tfrac{1}{24} h^4 f''''(x) - \ldots \tag{5.107}$$

Adding them together and rearranging, we find that

$$f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \tfrac{1}{12} h^2 f''''(x) + \ldots \tag{5.108}$$

The first term on the right is our formula for the second derivative, Eq. (5.105),
and the remainder of the terms measure the error. Thus, to leading order,
the absolute error inherent in our approximation to the second derivative is
$\tfrac{1}{12} h^2|f''''(x)|$. As before, we also need to take rounding error into account,
which contributes an error of roughly $C|f(x)|$ on each value of $f(x)$ so that, in
the worst case, the total rounding error in the numerator of (5.105) is $4C|f(x)|$
and the rounding error on the whole expression is $4C|f(x)|/h^2$. Then the
complete error on the derivative is
$$\epsilon = \frac{4C|f(x)|}{h^2} + \tfrac{1}{12} h^2 |f''''(x)|. \tag{5.109}$$
Differentiating with respect to h and setting the result to zero then gives an
optimum value of h of

$$h = \left(48C\left|\frac{f(x)}{f''''(x)}\right|\right)^{1/4}. \tag{5.110}$$
Substituting this expression back into Eq. (5.109) gives the size of the optimal
error to be

$$\epsilon = \tfrac16 h^2 |f''''(x)| = \left(\tfrac43 C\,|f(x)f''''(x)|\right)^{1/2}. \tag{5.111}$$
So if, for instance, $f(x)$ and $f''''(x)$ are of order 1, the error will be roughly of
order $\sqrt{C}$, which is typically about $10^{-8}$. This is about the same accuracy as
we found for the forward and backward difference approximations to the first
derivative in Section 5.9.2. Thus our expression for the second derivative is
not very accurate—about as good as, but not better than, the forward differ-
ence. As we mentioned above, there are higher-order approximations for the
second derivative that can give more accurate answers, but for our purposes,
Eq. (5.105) will be good enough.
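Here is a minimal sketch of Eq. (5.105) with the step chosen from Eq. (5.110). The test function $e^x$ at $x = 1$ (for which $f = f'' = f''''$) and the value $C = 10^{-16}$ are illustrative assumptions. The loop also tries a few other step sizes, to show the error growing when h is too large or too small.

```python
import math

def second_derivative(f, x, h):
    """Estimate f''(x) with Eq. (5.105)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# Illustrative test case: f(x) = exp(x) at x = 1, so f'' = f'''' = f = e
f = math.exp
x = 1.0
exact = math.exp(x)
C = 1e-16                              # typical error constant from the text
h_opt = (48 * C) ** 0.25               # Eq. (5.110); |f/f''''| = 1 here

for h in (1e-2, h_opt, 1e-6, 1e-8):
    err = abs(second_derivative(f, x, h) - exact)
    print(f"h = {h:.1e}   error = {err:.1e}")

err_opt = abs(second_derivative(f, x, h_opt) - exact)
err_tiny = abs(second_derivative(f, x, 1e-8) - exact)
```

With $h = 10^{-8}$, a step that works well for a first derivative, the numerator of Eq. (5.105) is dominated by rounding and the result is useless, which is why the optimal h here is so much larger.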

5.9.6 PARTIAL DERIVATIVES

We will come across a number of situations where we need to calculate partial
derivatives: derivatives of a function of several variables with respect to
only one of those variables. The calculation of such partial derivatives is a
simple generalization of the calculation of ordinary derivatives. If you have a
function f ( x, y) of two variables, for instance, then the central difference ap-
proximations to derivatives with respect to x and y are
$$\frac{\partial f}{\partial x} \simeq \frac{f(x+h/2,\,y) - f(x-h/2,\,y)}{h}, \tag{5.112}$$

$$\frac{\partial f}{\partial y} \simeq \frac{f(x,\,y+h/2) - f(x,\,y-h/2)}{h}. \tag{5.113}$$

By analogy with our approach for the second derivative in Section 5.9.5 we
can also calculate second derivatives with respect to either variable, or a mixed
second derivative with respect to both, which is given by

$$\frac{\partial^2 f}{\partial x\,\partial y} \simeq \frac{f(x+h/2,\,y+h/2) - f(x-h/2,\,y+h/2) - f(x+h/2,\,y-h/2) + f(x-h/2,\,y-h/2)}{h^2}. \tag{5.114}$$
We leave the derivation to the avid reader.
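Equations (5.112) to (5.114) translate directly into code. The sketch below checks them on the test function $f(x, y) = \sin x \, e^y$, an arbitrary choice whose exact derivatives are easy to write down. The step $h = 10^{-3}$ is likewise just a reasonable value for this function, not one derived from an error analysis.

```python
import math

def partial_x(f, x, y, h):
    """Central-difference estimate of df/dx, Eq. (5.112)."""
    return (f(x + h / 2, y) - f(x - h / 2, y)) / h

def partial_y(f, x, y, h):
    """Central-difference estimate of df/dy, Eq. (5.113)."""
    return (f(x, y + h / 2) - f(x, y - h / 2)) / h

def mixed_xy(f, x, y, h):
    """Central-difference estimate of d2f/dxdy, Eq. (5.114)."""
    return (f(x + h / 2, y + h / 2) - f(x - h / 2, y + h / 2)
            - f(x + h / 2, y - h / 2) + f(x - h / 2, y - h / 2)) / h**2

def g(x, y):
    # Illustrative test function with known derivatives
    return math.sin(x) * math.exp(y)

x0, y0, h = 1.0, 0.5, 1e-3
dfdx = partial_x(g, x0, y0, h)
dfdy = partial_y(g, x0, y0, h)
d2 = mixed_xy(g, x0, y0, h)

print(dfdx, math.cos(x0) * math.exp(y0))   # exact df/dx = cos x e^y
print(dfdy, math.sin(x0) * math.exp(y0))   # exact df/dy = sin x e^y
print(d2, math.cos(x0) * math.exp(y0))     # exact d2f/dxdy = cos x e^y
```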

5.9.7 DERIVATIVES OF NOISY DATA

Suppose we have some measurements of a quantity that, when plotted on a
graph, look like Fig. 5.11a. Perhaps they come from an experiment in the lab,
for instance. The overall shape of the curve is clear from the figure, but there
is some noise in the data, so the curve is not completely smooth.
Now suppose we want to calculate the first derivative of this curve. So we
write a program to calculate, say, the forward difference at each point and plot
the values we get. The result is shown in Fig. 5.11b. As you can see, taking the
derivative has made our noise problem much worse. Now it’s almost impos-
sible to see the shape of the curve. This is a common problem with numerical
derivatives—if there’s any noise in the curve you’re differentiating, then it can
be greatly exaggerated by taking the derivative, perhaps to the point where the
results are useless.
The reason for the problem is easy to see if you zoom in on a small portion
of the original data, as shown in Fig. 5.12. In this figure the solid line represents
the actual data, and the dotted line is a sketch of what the underlying curve,
without the noise, probably looks like. (We don’t usually know the underlying
curve, so this is just a guess.) When viewed close-up like this, we can see that,
because of the noise, the slope of the noisy line is very steep in some places,
and completely different from the slope of the underlying curve. Although the
noisy curve follows the underlying one reasonably closely, its derivative does
not. So now, when we calculate the derivative, we generate spurious large
values where there should be none.
[Figure 5.11: panels (a) and (b), horizontal axes from 0 to 1000]

Figure 5.11: Derivative of noisy data. (a) An example of a noisy data set. The data
plotted in this graph have a clear underlying form, but contain some random noise
or experimental error as well. (b) The derivative of the same data calculated using a
forward difference. The action of taking the derivative amplifies the noise and makes
the underlying form of the result difficult to discern.

[Figure 5.12: enlarged view of the noisy data, horizontal axis from 0 to 50]

Figure 5.12: An expanded view of the noisy data. The jagged line in this plot is an
enlargement of the first portion of the data from Fig. 5.11a, while the dotted line is a
guess about the form of the underlying curve, without the noise.

Unfortunately, this kind of issue is common with physics data, and this
is one of the reasons why numerical derivatives are used less than numerical
integrals. There are, however, some things we can do to mitigate the problem,
although they all also decrease the accuracy of our results:
1. The simplest thing we can do is increase the value of h. We can treat
the noise in the same way that we treat rounding error and calculate an
optimum value for h that balances the error from the noise against the
error in our approximation of the derivative. The end result is a formula
similar to Eq. (5.89) for the forward difference or Eq. (5.96) for the central
difference, but with the error constant C replaced by the fractional error
introduced into the data by the noise (which is the inverse of the so-called
signal-to-noise ratio).
2. Another approach is to fit a curve to a portion of the data near the point
where we want the derivative, then differentiate the curve. For instance,
we might fit a quadratic or a cubic, then differentiate that. We do not,
however, fit a quadratic to just three sample points or a cubic to just four,
as we did in Section 5.9.4. Instead we do a least-squares fit to find the
curve that best approximates a larger number of points, even though it
will not typically pass exactly through all those points. In effect, we are
trying to find an approximation to the underlying smooth curve depicted
in Fig. 5.12. The derivative of this curve then gives an estimate of the true
derivative of the data without noise.
3. A third approach is to smooth the data in some other fashion before
differentiating, which can be done, for instance, using Fourier transforms,
which we will study in Chapter 7. (See Exercise 7.3 for an example
of Fourier smoothing.) Figure 5.13 shows a version of the data from
Fig. 5.11 that has been smoothed in this way, and the corresponding
derivative, which is much cleaner now.

[Figure 5.13: smoothed data (gray) and its derivative (black), horizontal axis from 0 to 1000]

Figure 5.13: Smoothed data and an improved estimate of the derivative. The gray
curve in this plot is a version of the data from Fig. 5.11a that has been smoothed to
remove noise using a Fourier transform method. The black curve shows the numerical
derivative of the smoothed function, which is a significant improvement over Fig. 5.11b.
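The first two remedies in the list above can be tried on synthetic data. In this sketch the "measurements" are $\sin x$ plus Gaussian noise with an assumed standard deviation of 0.01 on 1001 points, generated with a fixed random seed so the run is repeatable; the window half-width m = 25 samples is also an arbitrary choice. It compares the naive forward difference with (1) a central difference taken over a much wider step and (2) the slope at the center of a least-squares quadratic fitted, with NumPy's `polyfit`, to the 2m + 1 points around each sample.

```python
import numpy as np

rng = np.random.default_rng(42)          # fixed seed so the run is repeatable
N = 1001
x = np.linspace(0.0, 2 * np.pi, N)
h = x[1] - x[0]
sigma = 0.01                             # assumed noise level (standard deviation)
data = np.sin(x) + sigma * rng.standard_normal(N)
true_deriv = np.cos(x)

def rms(a):
    return np.sqrt(np.mean(a**2))

# Naive forward difference between adjacent samples
forward = (data[1:] - data[:-1]) / h
rms_forward = rms(forward - true_deriv[:-1])

m = 25                                   # half-width of the window, in samples
idx = np.arange(m, N - m)                # points with a full window on both sides

# Remedy 1: a larger step, here a central difference over a width of 2*m*h
wide = (data[idx + m] - data[idx - m]) / (2 * m * h)
rms_wide = rms(wide - true_deriv[idx])

# Remedy 2: least-squares quadratic through 2m+1 points around each sample
fit = np.empty(len(idx))
for j, i in enumerate(idx):
    s = x[i - m:i + m + 1] - x[i]        # coordinates relative to the center
    coeffs = np.polyfit(s, data[i - m:i + m + 1], 2)   # [a, b, c] for a s^2 + b s + c
    fit[j] = coeffs[1]                   # slope of the fitted curve at s = 0
rms_fit = rms(fit - true_deriv[idx])

print(f"RMS error, forward difference: {rms_forward:.3f}")
print(f"RMS error, wide central step:  {rms_wide:.3f}")
print(f"RMS error, local quadratic:    {rms_fit:.3f}")
```

With these settings the forward difference should be swamped by noise (its scatter is of order $\sqrt2\,\sigma/h$), while both smoothed estimates should track $\cos x$ far more closely. The window width plays the same role as h in the first remedy: widening it suppresses more noise but lets more of the curve's own shape bias the slope.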

5.10 INTERPOLATION
We will tackle one more topic briefly in this chapter, the topic of interpolation,
which is not directly related to integrals and derivatives, but uses similar math-
ematical methods, making this a good moment to look into it.
Suppose you are given the value of a function f ( x ) at just two points x =
a, b and you want to know the value of the function at another point x in be-
tween. What do you do? There are a number of possibilities, of which the
simplest is linear interpolation, which is illustrated in Fig. 5.14. We assume our
function follows a straight line from f ( a) to f (b), which in most cases is an
approximation—likely the function follows a curve between the two points, as
sketched in the figure. But if we make this assumption then we can calculate
f ( x ) with some elementary geometry.
[Figure 5.14: the actual curve and the straight line joining f(a) and f(b), with the points a, x, and b marked on the horizontal axis and the distance y indicated]

Figure 5.14: Linear interpolation. The value of f(x) in between the two known points
at x = a and x = b is estimated by assuming a straight line from f(a) to f(b).

The slope of the straight-line approximation is

$$m = \frac{f(b) - f(a)}{b - a}, \tag{5.115}$$

and the distance marked y on the figure is given in terms of this slope by
$y = m(x - a)$. The distance marked z is equal to $f(a)$, so

$$f(x) \simeq y + z = \frac{f(b) - f(a)}{b - a}(x - a) + f(a) = \frac{(b - x)f(a) + (x - a)f(b)}{b - a}. \tag{5.116}$$
This is the fundamental formula of linear interpolation. In fact, this same for-
mula can also be used to extrapolate the function to points outside the interval
from a to b, although one should not extrapolate too far. The further you go,
the less likely it is that the extrapolation will be accurate.
How accurate is the linear interpolation formula? The calculation of the
error is similar to that for derivatives, making use of two Taylor expansions:

$$f(a) = f(x) + (a - x) f'(x) + \tfrac12 (a - x)^2 f''(x) + \ldots \tag{5.117}$$

$$f(b) = f(x) + (b - x) f'(x) + \tfrac12 (b - x)^2 f''(x) + \ldots \tag{5.118}$$

Substituting these into Eq. (5.116), the terms in $f'(x)$ cancel and, after
rearranging a little, we find that

$$f(x) = \frac{(b - x)f(a) + (x - a)f(b)}{b - a} + \tfrac12 (a - x)(b - x) f''(x) + \ldots \tag{5.119}$$

The first term on the right-hand side is our linear interpolation formula; the
remainder of the terms are the error. Note that the leading-order error term
vanishes as x tends to either a or b, since then either $x - a$ or $b - x$ becomes small.
And assuming $f''(x)$ varies slowly, the error will be largest in the middle of
the interval. If we denote the width of the interval by $b - a = h$, then when
we are in the middle we have $x - a = b - x = \tfrac12 h$ and the magnitude of the
leading-order error is $\tfrac18 h^2|f''(x)|$. Thus, like the central difference formula for
a first derivative, the worst-case error on a linear interpolation is $O(h^2)$, and
we can make the interpolation more accurate by making h smaller.
By contrast with the case of derivatives, however, we do not need to be
particularly careful about rounding error when using linear interpolation. The
interpolation formula, Eq. (5.116), involves the sum of values of f ( x ) at two
closely spaced points, not the difference, so we don’t normally run into the
accuracy problems that plague calculations (like calculations of derivatives)
that are based on subtractions.
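Equation (5.116) is a one-line function. The sketch below (the function name `linear_interp` and the test case, $\sin x$ on the interval [1, 1.1], are illustrative choices) evaluates it at the middle of the interval and compares the actual error with the leading-order estimate $\tfrac12 (x - a)(b - x)|f''(x)|$ that follows from the Taylor expansions (5.117) and (5.118), which at the midpoint equals $\tfrac18 h^2 |f''(x)|$. It also checks the result against NumPy's `np.interp`, which performs the same piecewise-linear interpolation.

```python
import math
import numpy as np

def linear_interp(a, fa, b, fb, x):
    """Linear interpolation between (a, f(a)) and (b, f(b)), Eq. (5.116)."""
    return ((b - x) * fa + (x - a) * fb) / (b - a)

a, b = 1.0, 1.1
h = b - a
x = 0.5 * (a + b)                    # mid-interval, where the error is largest
estimate = linear_interp(a, math.sin(a), b, math.sin(b), x)
error = abs(estimate - math.sin(x))
predicted = h**2 / 8 * abs(math.sin(x))   # |f''(x)| = |sin x| for f = sin

print(f"estimate = {estimate:.8f}, true = {math.sin(x):.8f}")
print(f"error = {error:.3e}, leading-order estimate = {predicted:.3e}")
print("np.interp gives", np.interp(x, [a, b], [math.sin(a), math.sin(b)]))
```

Note that the formula adds the two sample values rather than subtracting them, so unlike the derivative examples there is no need to worry about rounding when h is made small.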
Can we do better than linear interpolation? Not if we know the value of the
function f ( x ) at only two points—there is no better approximation in that case.
If we know the function at more than two points there are several ways to im-
prove on linear interpolation. The most obvious is to interpolate using higher-
order polynomials. If we have three points, for instance, we can fit a quadratic
through them, which will usually give a better match to the underlying curve.
Fitting quadratics or higher polynomials leads to a set of higher-order methods
known as Lagrange interpolation methods.
When the number of points becomes large, however, this approach breaks
down. If we have a large number N of points then you might think the best
thing to do would be to fit an ( N − 1)th order polynomial through them, but it
turns out this doesn’t work because very high order polynomials tend to have a
lot of wiggles in them and can deviate from the fitted points badly in the inter-
vals between points. It’s better in this case to fit many lower-order polynomials
such as quadratics or cubics to smaller sets of adjacent points. Unfortunately,
the naive implementation of such a scheme gives rather uneven interpolations
because the slope of the interpolation changes at the join-points between poly-
nomials. A more satisfactory approach is to fit polynomials to the measured
points plus the derivatives at their ends, so that one gets a function that goes
through the points and has a smooth slope everywhere. Such interpolations
are called splines. The most widely used type are cubic splines. We won’t go
into these methods further, however. For our purposes, linear interpolation
will be good enough.
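The warning about very high-order polynomials is easy to demonstrate. The sketch below uses the function $1/(1 + 25x^2)$ (a standard test case associated with Runge, chosen here for illustration) sampled at 11 equally spaced points on [−1, 1]. It compares a single polynomial of degree N − 1 = 10 through all the points, obtained with `np.polyfit`, against piecewise linear interpolation between neighboring points with `np.interp`.

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + 25.0 * x**2)

xs = np.linspace(-1.0, 1.0, 11)          # 11 equally spaced samples
ys = f(xs)
fine = np.linspace(-1.0, 1.0, 2001)      # dense grid for measuring errors

# One polynomial of degree N - 1 = 10 through all 11 points
coeffs = np.polyfit(xs, ys, len(xs) - 1)
err_poly = np.max(np.abs(np.polyval(coeffs, fine) - f(fine)))

# Piecewise linear interpolation between adjacent samples
err_linear = np.max(np.abs(np.interp(fine, xs, ys) - f(fine)))

print(f"degree-10 polynomial: max error {err_poly:.3f}")
print(f"piecewise linear:     max error {err_linear:.3f}")
```

The single high-degree polynomial passes through every sample yet swings far from the function near the ends of the interval, while the humble piecewise linear interpolation stays close everywhere.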

FURTHER EXERCISES

5.1 The gamma function: A commonly occurring function in physics calculations is
the gamma function Γ(a), which is defined by the integral

$$\Gamma(a) = \int_0^\infty x^{a-1} e^{-x}\,dx.$$

There is no closed-form expression for the gamma function, but one can calculate its
value for given a by performing the integral above numerically. You have to be careful
how you do it, however, if you wish to get an accurate answer.
a) Write a program to make a graph of the value of the integrand $x^{a-1} e^{-x}$ as a
function of x from x = 0 to x = 5, with three separate curves for a = 2, 3, and 4,
all on the same axes. You should find that the integrand starts at zero, rises to a
maximum, and then decays again for each curve.
b) Show analytically that the maximum falls at x = a − 1.
c) Most of the area under the integrand falls near the maximum, so to get an accurate
value of the gamma function we need to do a good job of this part of the integral.
We can change the integral from 0 to ∞ to one over a finite range from 0 to 1 using
the change of variables in Eq. (5.63), but this tends to squash the peak towards the
edge of the [0, 1] range and does a poor job of evaluating the integral accurately.
We can do a better job by making a different change of variables that puts the
peak in the middle of the integration range, around $\tfrac12$. We will use the change of
variables given in Eq. (5.65), which we repeat here for convenience:

$$z = \frac{x}{c+x}.$$

For what value of x does this change of variables give $z = \tfrac12$? Hence what is the
appropriate choice of the parameter c that puts the peak of the integrand for the
gamma function at $z = \tfrac12$?
d) Before we can calculate the gamma function, there is another detail we need to
attend to. The integrand $x^{a-1}e^{-x}$ can be difficult to evaluate because the factor
$x^{a-1}$ can become very large and the factor $e^{-x}$ very small, causing numerical
overflow or underflow, or both, for some values of x. Write $x^{a-1} = e^{(a-1)\ln x}$ to
derive an alternative expression for the integrand that does not suffer from these
problems (or at least not so much). Explain why your new expression is better
than the old one.
e) Now, using the change of variables above and the value of c you have chosen,
write a user-defined function gamma(a) to calculate the gamma function for arbitrary
argument a. Use whatever integration method you feel is appropriate. Test
your function by using it to calculate and print the value of $\Gamma(\tfrac32)$, which is known
to be equal to $\tfrac12\sqrt{\pi} \simeq 0.886$.
f) For integer values of a it can be shown that Γ(a) is equal to the factorial of a −
1. Use your Python function to calculate Γ(3), Γ(6), and Γ(10). You should get
answers closely equal to 2! = 2, 5! = 120, and 9! = 362 880.
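Parts (b) to (f) can be drawn together into one short program. The derivative of the integrand is $(a-1-x)\,x^{a-2}e^{-x}$, which vanishes at $x = a-1$. The substitution $z = x/(c+x)$ gives $z = \tfrac12$ exactly when $x = c$, so $c = a-1$ centers the peak. The sketch below is one possible solution, not the book's. It assumes $a > 1$ (so that $c > 0$), evaluates the integrand as $e^{(a-1)\ln x - x}$, and uses Gauss-Legendre quadrature from `numpy.polynomial.legendre.leggauss` with 100 points. Both the quadrature method and the point count are our own choices.

```python
import math
import numpy as np
from numpy.polynomial.legendre import leggauss

def integrand(x, a):
    """x**(a-1) * exp(-x), evaluated as exp((a-1) ln x - x) to avoid overflow."""
    return np.exp((a - 1) * np.log(x) - x)

def gamma(a, npoints=100):
    """Gamma(a) for a > 1 via the substitution x = c z / (1 - z) with c = a - 1."""
    if a <= 1:
        raise ValueError("this sketch assumes a > 1 so that c = a - 1 is positive")
    c = a - 1.0
    # Gauss-Legendre points and weights on [-1, 1], mapped onto [0, 1]
    t, wt = leggauss(npoints)
    z = 0.5 * (t + 1.0)
    wz = 0.5 * wt
    x = c * z / (1.0 - z)          # maps z in (0, 1) to x in (0, infinity)
    dxdz = c / (1.0 - z) ** 2      # Jacobian of the change of variables
    return np.sum(wz * integrand(x, a) * dxdz)

print(gamma(1.5), math.sqrt(math.pi) / 2)    # both approximately 0.886
for a in (3, 6, 10):
    print(a, gamma(a), math.factorial(a - 1))
```

Gauss-Legendre points lie strictly inside (0, 1), so the code never evaluates ln 0 or divides by 1 − z = 0. The log form matters for large arguments. In plain Python, `400.0**149` alone raises `OverflowError`, yet $x^{149}e^{-x}$ at x = 400 is only about $10^{214}$, which the combined exponent evaluates without trouble.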

EXERCISES

5.2 Diffraction gratings: Light with wavelength λ is incident on a diffraction grating
of total width w, gets diffracted, is focused with a lens of focal length f, and falls on a
screen:

[Figure: incident light passes through the grating and then the lens (focal length f) and falls on the screen.]

Theory tells us that the intensity of the diffraction pattern on the screen, a distance x
from the central axis of the system, is given by

$$I(x) = \left| \int_{-w/2}^{w/2} \sqrt{q(u)}\, e^{i2\pi xu/\lambda f}\, du \right|^2,$$

where q(u) is the intensity transmission function of the diffraction grating at a dis-
tance u from the central axis.
a) Consider a grating with transmission function $q(u) = \sin^2 \alpha u$. What is the
separation of the “slits” in this grating, expressed in terms of α?
b) Write a Python function q(u) that returns the transmission function $q(u) = \sin^2 \alpha u$
as above at position u for a grating whose slits have separation 20 µm.
c) Use your function in a program to calculate and graph the intensity of the diffrac-
tion pattern produced by such a grating having ten slits in total, if the incident
light has wavelength λ = 500 nm. Assume the lens has a focal length of 1 meter
and the screen is 10 cm wide. You can use whatever method you think appropriate
for doing the integral. Once you’ve made your choice you’ll also need to decide
the number of sample points you’ll use. What criteria play into this decision?
Notice that the integrand in the equation for I ( x ) is complex, so you will have to
use complex variables in your program. As mentioned in Section 2.2.5, there is a
version of the math package for use with complex variables called cmath. In par-
ticular you may find the exp function from cmath useful because it can calculate
the exponentials of complex arguments.
d) Create a visualization of how the diffraction pattern would look on the screen
using a density plot (see Section 3.3). Your plot should look something like this:
[Figure: example density plot of the diffraction pattern (image not reproduced).]


e) Modify your program further to make pictures of the diffraction patterns produced
by gratings with the following profiles:
i) A transmission profile that obeys $q(u) = \sin^2 \alpha u \, \sin^2 \beta u$, with α as before and
the same total grating width w, and $\beta = \tfrac12 \alpha$.
ii) Two “square” slits, meaning slits with 100% transmission through the slit
and 0% transmission everywhere else. Calculate the diffraction pattern for
non-identical slits, one 10 µm wide and the other 20 µm wide, with a 60 µm
gap between the two.
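One possible sketch for parts (a) to (c) follows. Because $\sin^2 \alpha u$ repeats with period $\pi/\alpha$, the slit separation is $\pi/\alpha$. So 20 µm slits need $\alpha = \pi/(20\,\mu\text{m})$, and ten slits give a total width w = 200 µm. The code uses Simpson's rule with 1000 slices. That number is our own choice: at the edge of the screen the exponential completes about 20 cycles across the grating, so 1000 slices give roughly 50 samples per cycle. We use numpy's complex `exp` in place of `cmath.exp` so that whole arrays are handled at once. Plotting is left out.

```python
import numpy as np

wavelength = 500e-9        # lambda = 500 nm
focal_length = 1.0         # f = 1 m
slit_separation = 20e-6    # 20 micrometres
nslits = 10
alpha = np.pi / slit_separation    # sin^2(alpha u) has period pi / alpha
w = nslits * slit_separation       # total grating width

def q(u):
    """Intensity transmission function q(u) = sin^2(alpha u)."""
    return np.sin(alpha * u) ** 2

def intensity(x, N=1000):
    """I(x) from Simpson's rule with N (even) slices across the grating."""
    h = w / N
    u = -w / 2 + h * np.arange(N + 1)
    f = np.sqrt(q(u)) * np.exp(2j * np.pi * x * u / (wavelength * focal_length))
    weights = np.ones(N + 1)
    weights[1:-1:2] = 4.0      # odd-numbered interior points
    weights[2:-1:2] = 2.0      # even-numbered interior points
    integral = h / 3 * np.dot(weights, f)
    return abs(integral) ** 2

xs = np.linspace(-0.05, 0.05, 501)     # 10 cm wide screen, x = 0 at index 250
I = np.array([intensity(x) for x in xs])
print(f"I(0) = {I[250]:.4e}")
```

The result can be checked at x = 0, where all contributions add in phase. There $I(0) = \left(\int |\sin\alpha u|\,du\right)^2 = (20/\alpha)^2$, which is also the largest value anywhere on the screen.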

5.3 Electric field of a charge distribution: We have a distribution of charges and we
want to calculate the resulting electric field. One way to do this is to first calculate the
electric potential φ and then take its gradient. For a point charge q at the origin, the
electric potential at a distance r from the origin is $\phi = q/4\pi\epsilon_0 r$ and the electric field is
$\mathbf{E} = -\nabla\phi$.
a) You have two charges, of ±1 C, 10 cm apart. Calculate the resulting electric po-
tential on a 1 m × 1 m square plane surrounding the charges and passing through
them. Calculate the potential at 1 cm spaced points in a grid and make a visual-
ization on the screen of the potential using a density plot.
b) Now calculate the partial derivatives of the potential with respect to x and y and
hence find the electric field in the xy plane. Make a visualization of the field also.
This is a little trickier than visualizing the potential, because the electric field has
both magnitude and direction. One way to do it might be to make two density
plots, one for the magnitude, and one for the direction, the latter using the “hsv”
color scheme in pylab, which is a rainbow scheme that passes through all the
colors but starts and ends with the same shade of red, which makes it suitable
for representing things like directions or angles that go around the full circle and
end up where they started. A more sophisticated visualization might use the
arrow object from the visual package, drawing a grid of arrows with direction
and length chosen to represent the field.
c) Now suppose you have a continuous distribution of charge over an L × L square.
The charge density in C m⁻² is

$$\rho(x, y) = q_0 \sin\frac{2\pi x}{L} \sin\frac{2\pi y}{L}.$$

Calculate and visualize the resulting electric field at 1 cm-spaced points in 1 square
meter of the xy plane for the case where L = 10 cm, the charge distribution is centered
in the middle of the visualized area, and $q_0 = 100$ C m⁻². You will have
to perform a double integral over x and y, then differentiate the potential with
respect to position to get the electric field. Choose whatever integration method
seems appropriate for the integrals.
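For parts (a) and (b), one possible approach is to evaluate φ on the grid and differentiate it with `numpy.gradient`. That function takes central differences at interior points and one-sided differences at the edges. The sketch assumes the charges sit at (±5 cm, 0) in a 1 m square centered on the origin. It also offsets the grid by half a spacing so that no grid point falls exactly on a charge, where φ diverges. Both placements are our own choices, and the visualization step is omitted.

```python
import numpy as np

epsilon0 = 8.854187817e-12           # vacuum permittivity (F/m)
k = 1 / (4 * np.pi * epsilon0)
charges = [(+1.0, -0.05, 0.0),       # (q in C, x in m, y in m)
           (-1.0, +0.05, 0.0)]

h = 0.01                              # 1 cm grid spacing
n = 100                               # 100 x 100 points covering 1 m x 1 m
coords = (np.arange(n) - (n - 1) / 2) * h    # cell centres, -0.495 ... 0.495
X, Y = np.meshgrid(coords, coords)    # X varies along axis 1, Y along axis 0

def potential(x, y):
    """Potential of the point charges at position(s) (x, y)."""
    phi = 0.0
    for q, xq, yq in charges:
        phi = phi + k * q / np.hypot(x - xq, y - yq)
    return phi

phi = potential(X, Y)
dphi_dy, dphi_dx = np.gradient(phi, h)    # axis 0 is y, axis 1 is x
Ex, Ey = -dphi_dx, -dphi_dy

def exact_field(x, y):
    """Analytic field, sum of k q (r - r_q) / |r - r_q|^3, for checking."""
    ex = ey = 0.0
    for q, xq, yq in charges:
        r3 = np.hypot(x - xq, y - yq) ** 3
        ex += k * q * (x - xq) / r3
        ey += k * q * (y - yq) / r3
    return ex, ey

i = j = 49    # grid point at (x, y) = (-0.005, -0.005), near the centre
print(Ex[i, j], exact_field(X[i, j], Y[i, j])[0])
```

Near the center the field points from the positive charge toward the negative one, so E_x > 0 there. The finite-difference value agrees with the analytic one to within a few percent, because the error of a central difference scales roughly as (h/r)² at distance r from a charge.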

5.4 Differentiating by integrating: If you are familiar with the calculus of complex
variables, you may find the following technique useful and interesting.


Suppose we have a function f (z) whose value we know not only on the real line
but also for complex values of its argument. Then we can calculate derivatives of that
function at any point $z_0$ by performing a contour integral, using the Cauchy derivative
formula:

$$\left(\frac{d^m f}{dz^m}\right)_{z=z_0} = \frac{m!}{2\pi i} \oint \frac{f(z)}{(z-z_0)^{m+1}}\,dz,$$
where the integral is performed counterclockwise around any contour in the complex
plane that surrounds the point z0 but contains no poles in f (z). Since numerical in-
tegration is significantly easier and more accurate than numerical differentiation, this
formula provides us with a method for calculating derivatives—and especially multiple
derivatives—accurately by turning them into integrals.
Suppose, for example, that we want to calculate derivatives of f (z) at z = 0. Let
us apply the Cauchy formula above using the trapezoidal rule to calculate the integral
along a circular contour centered on the origin with radius 1. The trapezoidal rule will
be slightly different from the version we are used to because the value of the interval h
is now a complex number, and moreover is not constant from one slice of the integral
to the next—it stays constant in modulus, but its argument changes from one slice to
another.
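Before the algebra, it may help to see the trapezoidal rule with a complex, slice-dependent step $h_k$ in code. This sketch is our own check, not part of the exercise. It applies the rule counterclockwise around the unit circle to g(z) = 1/z, whose contour integral is exactly 2πi. The slice count N = 1000 is an arbitrary choice.

```python
import cmath

def contour_trapezoid(g, N=1000):
    """Trapezoidal rule for the integral of g counterclockwise around |z| = 1.

    The step h_k = z_{k+1} - z_k is complex: its modulus is constant but its
    argument changes from one slice to the next.
    """
    z = [cmath.exp(2j * cmath.pi * k / N) for k in range(N + 1)]
    total = 0j
    for k in range(N):
        h_k = z[k + 1] - z[k]
        total += 0.5 * (g(z[k + 1]) + g(z[k])) * h_k
    return total

result = contour_trapezoid(lambda z: 1 / z)
print(result, 2j * cmath.pi)    # the two values are close for large N
```

For this particular g, each slice contributes exactly $i\sin(2\pi/N)$, so the rule gives $iN\sin(2\pi/N)$, which approaches 2πi as N grows.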
We will divide our contour integral into N slices with sample points $z_k$ distributed
uniformly around the circular contour at the positions $z_k = e^{i2\pi k/N}$ for k = 0, …, N. Then
the distance between consecutive sample points is

$$h_k = z_{k+1} - z_k = e^{i2\pi(k+1)/N} - e^{i2\pi k/N},$$

and, introducing the shorthand $g(z) = f(z)/z^{m+1}$ for the integrand, the trapezoidal
rule approximation to the integral is

$$\begin{aligned}
\oint g(z)\,dz &\simeq \sum_{k=0}^{N-1} \tfrac12 \bigl[g(z_{k+1}) + g(z_k)\bigr]\bigl[e^{i2\pi(k+1)/N} - e^{i2\pi k/N}\bigr] \\
&= \tfrac12 \biggl[\sum_{k=0}^{N-1} g(z_{k+1})\,e^{i2\pi(k+1)/N} - \sum_{k=0}^{N-1} g(z_k)\,e^{i2\pi k/N} \\
&\qquad - \sum_{k=0}^{N-1} g(z_{k+1})\,e^{i2\pi k/N} + \sum_{k=0}^{N-1} g(z_k)\,e^{i2\pi(k+1)/N}\biggr].
\end{aligned}$$

Noting that $z_N = z_0$, the first two sums inside the brackets cancel each other in their
entirety, and the remaining two sums are equal except for trivial phase factors, so the
entire expression simplifies to

$$\begin{aligned}
\oint g(z)\,dz &\simeq \tfrac12 \bigl[e^{i2\pi/N} - e^{-i2\pi/N}\bigr] \sum_{k=0}^{N-1} g(z_k)\,e^{i2\pi k/N} \\
&\simeq \frac{2\pi i}{N} \sum_{k=0}^{N-1} f(z_k)\,e^{-i2\pi km/N},
\end{aligned}$$


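where we have used the definition of g(z) again, so that $g(z_k)\,e^{i2\pi k/N} = f(z_k)\,z_k^{-m} = f(z_k)\,e^{-i2\pi km/N}$,
together with $\tfrac12\bigl[e^{i2\pi/N} - e^{-i2\pi/N}\bigr] = i\sin(2\pi/N) \simeq 2\pi i/N$ for large N.
Combining this result with the Cauchy formula, we then have

$$\left(\frac{d^m f}{dz^m}\right)_{z=0} \simeq \frac{m!}{N} \sum_{k=0}^{N-1} f(z_k)\,e^{-i2\pi km/N}.$$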

Write a program to calculate the first twenty derivatives of $f(z) = e^{2z}$ at z = 0 using
this formula with N = 10000. You will need to use the version of the exp function
from the cmath package, which can handle complex arguments. You may also find
the function factorial from the math package useful; it calculates factorials of integer
arguments.
The correct value for the mth derivative in this case is easily shown to be $2^m$, so
it should be straightforward to tell if your program is working—the results should be
powers of two, 2, 4, 8, 16, 32, etc. You should find that it is possible to get reason-
ably accurate results for all twenty derivatives rapidly using this technique. If you use
standard difference formulas for the derivatives, on the other hand, you will find that
you can calculate only the first three or four derivatives accurately before the numerical
errors become so large that the results are useless. In this case, therefore, the Cauchy
formula gives the better results.
The sum $\sum_k f(z_k)\,e^{-i2\pi km/N}$ that appears in the formula above is known as the discrete
Fourier transform of the complex samples f (zk ). There exists an elegant technique for
evaluating the Fourier transform for many values of m simultaneously, known as the
fast Fourier transform, which could be useful in cases where the direct evaluation of the
formula is slow. We will study the fast Fourier transform in detail in Chapter 7.
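The program requested above might be sketched as follows. This is one possible approach, not the book's solution, and it differs from the exercise in two ways. First, it uses numpy's vectorized complex `exp` in place of `cmath.exp`, so all N sample points are handled at once. Second, it reduces km modulo N before exponentiating. This is our own choice: $e^{-i2\pi km/N}$ depends only on km mod N, and a smaller argument loses less precision.

```python
import numpy as np
from math import factorial

def cauchy_derivatives(f, mmax, N=10000):
    """Estimate the m-th derivatives of f at z = 0 for m = 1 .. mmax.

    Uses (d^m f / dz^m)(0) ~ (m!/N) * sum_k f(z_k) exp(-2 pi i k m / N),
    with sample points z_k = exp(2 pi i k / N) on the unit circle.
    """
    k = np.arange(N)
    fk = f(np.exp(2j * np.pi * k / N))       # samples of f around the contour
    derivs = []
    for m in range(1, mmax + 1):
        phase = np.exp(-2j * np.pi * ((k * m) % N) / N)
        derivs.append(factorial(m) / N * np.sum(fk * phase))
    return derivs

derivs = cauchy_derivatives(lambda z: np.exp(2 * z), 20)
for m, d in enumerate(derivs, start=1):
    print(m, d.real, 2**m)    # estimate next to the exact value 2^m
```

Each estimate should sit close to the corresponding power of two, with a small imaginary part left over from rounding.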

5.5 Image processing and the STM: When light strikes a surface, the amount falling
per unit area depends not only on the intensity of the light, but also on the angle of
incidence. If the light makes an angle θ to the normal, it only “sees” cos θ of area per
unit of actual area on the surface:

[Figure: light strikes the surface at angle θ to the normal; “what the light sees” is the projected area.]

So the intensity of illumination is a cos θ, if a is the raw intensity of the light. This simple
physical law is a central element of 3D computer graphics. It allows us to calculate how
light falls on three-dimensional objects and hence how they will look when illuminated
from various angles.
Suppose, for instance, that we are looking down on the Earth from above and we
see mountains. We know the height of the mountains w(x, y) as a function of position in


the plane, so the equation for the Earth’s surface is simply z = w(x, y), or equivalently
w(x, y) − z = 0, and the normal vector v to the surface is given by the gradient of
w(x, y) − z thus:

$$\mathbf{v} = \nabla[w(x,y) - z] = \begin{pmatrix} \partial/\partial x \\ \partial/\partial y \\ \partial/\partial z \end{pmatrix} [w(x,y) - z] = \begin{pmatrix} \partial w/\partial x \\ \partial w/\partial y \\ -1 \end{pmatrix}.$$

Now suppose we have light coming in represented by a vector a with magnitude equal
to the intensity of the light. Then the dot product of the vectors a and v is

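$$\mathbf{a}\cdot\mathbf{v} = |\mathbf{a}|\,|\mathbf{v}|\cos\theta,$$

where θ is the angle between the vectors. Thus the intensity of illumination of the
surface of the mountains is

$$I = |\mathbf{a}|\cos\theta = \frac{\mathbf{a}\cdot\mathbf{v}}{|\mathbf{v}|} = \frac{a_x(\partial w/\partial x) + a_y(\partial w/\partial y) - a_z}{\sqrt{(\partial w/\partial x)^2 + (\partial w/\partial y)^2 + 1}}.$$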

Let’s take a simple case where the light is shining horizontally with unit intensity, along
a line an angle φ counter-clockwise from the east-west axis, so that a = (cos φ, sin φ, 0).
Then our intensity of illumination simplifies to

$$I = \frac{\cos\phi\,(\partial w/\partial x) + \sin\phi\,(\partial w/\partial y)}{\sqrt{(\partial w/\partial x)^2 + (\partial w/\partial y)^2 + 1}}.$$

If we can calculate the derivatives of the height w(x, y) and we know φ we can calculate
the intensity at any point.
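The intensity formula translates directly into array code. The sketch below assumes the height data are stored with rows indexed by y and columns by x, which should be checked against the actual files. It uses `numpy.gradient`, which takes central differences at interior points and one-sided differences at the edges. That is one way of dealing with the awkward edge points mentioned in the hint to part (a). The tilted plane in the usage lines is hypothetical test data, not altitude.txt.

```python
import numpy as np

def illumination(w, h, phi):
    """Intensity I for heights w[y, x] on a grid of spacing h, light at angle phi (radians)."""
    dwdy, dwdx = np.gradient(w, h)      # axis 0 is y, axis 1 is x
    numerator = np.cos(phi) * dwdx + np.sin(phi) * dwdy
    return numerator / np.sqrt(dwdx**2 + dwdy**2 + 1)

# Hypothetical test surface: a plane rising with slope 0.5 in the x direction
x = np.arange(50) * 1.0
w = 0.5 * np.tile(x, (40, 1))           # shape (40, 50): 40 rows (y), 50 columns (x)
I = illumination(w, 1.0, 0.0)           # light along the x axis
print(I[0, 0], 0.5 / np.sqrt(1.25))     # the two values should agree
```

For the exercise itself, one might load the grid with `np.loadtxt("altitude.txt")`, assuming the file is a plain whitespace-separated table. The intensity would then come from `illumination(w, 30000.0, np.radians(45))`.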
a) In the on-line resources you’ll find a file called altitude.txt, which contains the
altitude w(x, y) in meters above sea level (or depth below sea level) of the surface
of the Earth, measured on a grid of points (x, y). Write a program that reads this
file and stores the data in an array. Then calculate the derivatives ∂w/∂x and
∂w/∂y at each grid point. Explain what method you used to calculate them and
why. (Hint: You’ll probably have to use more than one method to get every grid
point, because awkward things happen at the edges of the grid.) To calculate the
derivatives you’ll need to know the value of h, the distance in meters between
grid points, which is about 30 000 m in this case.⁷
b) Now using your values for the derivatives calculate the intensity for each grid
point, with φ = 45°, and make a density plot of the resulting values in which the
brightness of each dot depends on the corresponding intensity value. If you get
it working right, the plot should look like a relief map of the world—you should
be able to see the continents and mountain ranges in 3D. (Common problems
include a map that is upside-down or sideways, or a relief map that is “inside-out,”
meaning the high regions look low and vice versa. Work with the details of
your program until you get a map that looks right to you.)
c) There is another file in the on-line resources called stm.txt, which contains a grid
of values from scanning tunneling microscope measurements of the (111) surface
of silicon. A scanning tunneling microscope (STM) is a device that measures the
shape of surfaces at the atomic level by tracking a sharp tip over the surface and
measuring quantum tunneling current as a function of position. The end result is
a grid of values that represent the height of the surface as a function of position
and the data in the file stm.txt contain just such a grid of values. Modify the
program you just wrote to visualize the STM data and hence create a 3D picture
of what the silicon surface looks like. The value of h for the derivatives in this case
is around h = 2.5 (in arbitrary units).

⁷ It’s actually not precisely constant because we are representing the spherical Earth on a flat
map, but h = 30 000 m will give reasonable results.

APPENDIX A

INSTALLING PYTHON

This appendix explains how to install on your computer the software you
will need for programming in the Python programming language. All of
the software is distributed by its makers for free and is available for download
on the Internet. To make best use of the book there are four software pack-
ages you should install: the Python language itself, and the packages “numpy”,
“matplotlib”, and “visual” (also called “vpython” in some places).
There are currently two different versions of Python in circulation, ver-
sion 2 and version 3. This book uses version 3, which is the most recent and
up-to-date version and has some nice features not available in version 2. Un-
fortunately, at the time of writing, version 3 was not quite ready for prime
time, because some packages, including the matplotlib package, had not yet
been updated to work with it. It is possible that this problem will have been
rectified by the time you read this (Python is being rapidly improved and up-
dated all the time), but if it hasn’t one can get around the problem fairly easily.
The trick is to use Python version 2, the older version, which allows us to use
all the packages we need, but to “switch on” the most important features of
version 3, which version 2 allows one to do by including the following line at
the beginning of every program:

from __future__ import division, print_function

(Note the two underscore characters “_” on either side of the word “future”.)
If you include this line in your programs, then version 2 will behave essen-
tially the same as version 3 and all the programs in this book will work fine.
This line is not included explicitly in the programs as they are reproduced in
the book, but if you are using Python version 2—at least until the creators of
the Python packages get around to updating their products—you should add
it at the beginning of all programs. You can find a further discussion of the


differences between Python versions 2 and 3 in Appendix B.


Bearing this in mind, the simplest way to install Python and the additional
packages needed for this book is as follows.
1. Open a web browser and go to www.vpython.org, which is the web page
for the visual package. Click on the “download” link for your operat-
ing system, Windows or Macintosh. (The packages are also available for
users of the Linux operating system. The installation procedure is differ-
ent for Linux, but if you are a Linux user you probably know what you’re
doing better than I do.)
2. The visual download page helpfully gives a link to the correct version
of the programs for the Python language itself. You should download
and install the Python language first, before anything else. As discussed
above, you should install version 2, unless you know that you want to use
version 3. Most likely you will use version 2.7 or later.
3. Having installed the Python language you should follow the instruc-
tions to download and install the visual package (also sometimes called
“vpython”). If you are using Windows this will automatically install the
numpy package for you as well. If you are using a Mac you will need to
install numpy separately—see step 5 below.
4. Next you need to install the package matplotlib, which you can find at
www.sourceforge.net/projects/matplotlib/files/matplotlib. You
should click on the link for the latest version of matplotlib, which at the
time of writing is version 1.1.1, and you will be presented with a list of
packages for different computers. Select and install the one that corre-
sponds to your computer and the version of Python that you installed.
For instance, you would click on matplotlib-1.1.1.win32-py2.7.exe
for a Windows computer with Python version 2.7 installed.
5. If you use a Mac, you will also need to install the package numpy. (If you
use Windows, numpy will already have been installed for you when you
installed visual.) You can download the latest version of numpy from
sourceforge.net/projects/numpy/files/numpy and install.
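Once the steps above are done, a quick way to confirm that the packages can be found is to ask Python's import machinery about each one without actually importing it. The sketch below uses `importlib.util.find_spec` from the standard library, which exists only in Python 3, so it will not run under Python 2. Both "visual" and "vpython" are checked because the text mentions both names. The helper `check_packages` is our own hypothetical name.

```python
import importlib.util

def check_packages(names=("numpy", "matplotlib", "visual", "vpython")):
    """Return a dict mapping each top-level module name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_packages().items():
    print(f"{name:12s} {'found' if found else 'not found'}")
```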
