IQC Masterfile
Sevag Gharibian
July 2021
1 Introduction and Linear Algebra Review
I recall that during one walk Einstein suddenly stopped, turned to me and asked
whether I really believed that the moon exists only when I look at it.
— Abraham Pais.
1.1 Introduction
Welcome to Introduction to Quantum Computation! In this course, we shall explore the subject
of quantum computation from a theoretical computer science perspective. As the quote by
Abraham Pais above foreshadows, our story will involve surprising twists and turns, which will
seem completely at odds with your perception of the world around you. Indeed, in a quantum
world, a single particle can be in two places simultaneously; two particles can be so strongly
“bound” that they can appear to communicate instantaneously even if they are light-years
apart; and the very act of “looking” at a quantum system can irreversibly alter the system
itself! It is precisely these quirks of quantum mechanics which we shall aim to exploit in our
study of computation.
The basic premise of quantum computing is “simple”: To build a computer whose bits are
not represented by transistors, but by subatomic particles such as electrons or photons. In this
subatomic world, the pertinent laws of physics are no longer Newton’s classical mechanics, but
rather the laws of quantum mechanics. Hence, the name “quantum computing”. Why would
we ever want to build such a computer? There are a number of reasons. From an engineering
standpoint, microchip components have become so small that they encounter quantum effects
which impede their functionality. To a physicist, the study of quantum computing is a natural
approach for simulating and studying quantum systems in nature. And to a computer scientist,
quantum computers are remarkable in that they can solve problems which are believed to be
intractable on a classical computer!
The field of quantum computing, which arguably started with famed physicist Richard Feyn-
man’s ideas (1982) for efficiently simulating physical systems (although it should be noted that
ideas for cryptography based on quantum mechanics date back to Stephen Wiesner around 1970),
is nowadays far too large to be captured in a single course. Here, we shall focus on a broad
introduction which aims to cover topics such as: What is quantum mechanics, and how can it
be harnessed to build a computer? What kind of computational problems can such a computer
solve? Are there problems which are hard even for a quantum computer? And finally, what
does the study of quantum computing tell us about nature itself? Even if this course is the
last time you encounter the topic of quantum computing, the experience should hopefully leave
you with an appreciation for the fine interplay between the limits of physics and computing, as
well as strengthen your background in Linear Algebra, which is useful in many other areas of
computer science.
The entire course will take place in the mathematical framework of Linear Algebra, which
we now review. It is crucial that you familiarize yourself with these concepts before proceeding
with the course. These notes contain many exercises intended to help the reader; it is strongly
recommended for you to work on these as you read along.
1.2 Linear Algebra
This course assumes a basic background in Linear Algebra. Thus, much of what is covered in
this section is intended to be a refresher (although some of the later concepts here may be new
to you); we thus cover this section briskly. Throughout this course, the symbols C, R, Z, and
N denote the sets of complex, real, integer, and natural numbers, respectively. For m a positive
integer, the notation [m] indicates the set {1, . . . , m}.
The basic objects we shall work with are complex vectors |ψ⟩ ∈ Cd , i.e.
|ψ⟩ = (ψ1, . . . , ψd)T,    (1.1)
for ψi ∈ C. Recall here that a complex number c ∈ C can be written in two equivalent ways:
Either as c = a + bi for a, b ∈ R and i2 = −1, or in its polar form as c = reiθ for r, θ ∈ R. One of
the advantages of the polar form is that it can directly be visualized on the 2D complex plane:
[Figure: the 2D complex plane, with real (x) and imaginary (y) axes; c = re^{iθ} is drawn as a vector of length r at angle θ from the real axis.]
Here, the x and y axes correspond to the real and imaginary axes, and r denotes the length of the vector r(cos θ, sin θ) representing c. For example, observe that 1 can be written in polar form with r = 1 and θ = 0, i.e. 1 is represented in the 2D plane as vector (1, 0). In this course, the polar form will
be used repeatedly. Recall also that the complex conjugate of c, denoted c∗ , is defined as a − bi
or re−iθ , respectively. Finally, the notation |·⟩ is called Dirac notation, named after physicist
Paul Dirac, and is simply a useful convention for referring to column vectors. The term |ψ⟩ is
read “ket ψ”.
Exercise 1.1. The magnitude or “length” of c ∈ C is given by |c| = √(cc*). What is the magnitude of e^{iθ} for any θ ∈ R? How about the magnitude of re^{iθ}?
Complex vectors shall be crucial to us for a single reason: They represent quantum states
(more details in subsequent lectures). It is thus important to establish some further basic
properties of vectors. First, the conjugate transpose of |ψ⟩ is given by
⟨ψ| = (ψ1∗ , ψ2∗ , . . . , ψd∗ ) , (1.2)
where ⟨ψ| is a row vector. The term ⟨ψ| is pronounced “bra ψ”. This allows us to define how
much two vectors “overlap” via the inner product function, defined as
⟨ψ|ϕ⟩ = Σ_{i=1}^d ψi* ϕi.    (1.3)
The inner product satisfies (⟨ψ|ϕ⟩)* = ⟨ϕ|ψ⟩. The “length” of a vector |ψ⟩ can now be quantified by measuring the overlap of |ψ⟩ with itself, which yields the Euclidean norm, ∥|ψ⟩∥2 = √⟨ψ|ψ⟩.
Exercise 1.2. Let |ψ⟩ = (1/√2)(1, i)T ∈ C2, where T denotes the transpose. What is ⟨ψ|? How about ∥|ψ⟩∥2?
With a norm in hand, we can define a notion of distance between vectors |ψ⟩, |ϕ⟩, called the
Euclidean distance: ∥ |ψ⟩ − |ϕ⟩ ∥2 . This distance will play the important role of quantifying
how well two quantum states |ψ⟩ and |ϕ⟩ can be “distinguished” via measurements. Two useful
properties of the Euclidean norm are:
1. (Absolute homogeneity) For any c ∈ C and |ψ⟩, one has ∥c|ψ⟩∥2 = |c| · ∥|ψ⟩∥2.

2. (Triangle inequality) For any |ψ⟩, |ϕ⟩, one has ∥|ψ⟩ + |ϕ⟩∥2 ≤ ∥|ψ⟩∥2 + ∥|ϕ⟩∥2.

These two properties can be used to show that for all |ψ⟩ ∈ Cd, ∥|ψ⟩∥2 ≥ 0.
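For readers who like to experiment, here is a short Python/numpy sketch (an editorial addition, not part of the original notes) which computes inner products, norms, and Euclidean distances numerically. Note that numpy's vdot conjugates its first argument, matching the definition of ⟨ψ|ϕ⟩.

import numpy as np

psi = np.array([1, 0])                  # |0>
phi = np.array([1, 1]) / np.sqrt(2)     # |+> = (|0> + |1>)/sqrt(2)

# np.vdot conjugates its first argument, so this computes <psi|phi>.
print(np.vdot(psi, phi))                # 0.7071... = 1/sqrt(2)

# Euclidean norm || |psi> ||_2 and distance || |psi> - |phi> ||_2.
print(np.linalg.norm(psi))              # 1.0
print(np.linalg.norm(psi - phi))        # sqrt(2 - sqrt(2)) ~ 0.7654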
Exercise 1.3. Let |ψ⟩ = (1/√2)(1, i)T ∈ C2 and |ϕ⟩ = (1/2, √3/2)T ∈ C2, where T denotes the transpose. What is ∥|ψ⟩ − |ϕ⟩∥2?
Orthonormal bases. A much more natural way to represent vectors in this course shall be in terms of orthonormal bases. Recall that a set of vectors {|ψi⟩} ⊆ Cd is orthogonal if for all i ≠ j, ⟨ψi|ψj⟩ = 0, and orthonormal if ⟨ψi|ψj⟩ = δij. Here, δij is the Kronecker delta, whose value is 1 if i = j and 0 otherwise. For the vector space Cd, which has dimension d, it is necessary and sufficient to use d orthonormal vectors in order to form an orthonormal basis.
One of the most common bases we use is the computational basis, defined for C2 as
|0⟩ = (1, 0)T    |1⟩ = (0, 1)T.    (1.4)
Since {|0⟩, |1⟩} is an orthonormal basis, any vector |ψ⟩ ∈ C2 can be written as |ψ⟩ = α|0⟩ + β|1⟩
for some α, β ∈ C. We say |ψ⟩ is normalized when it has length 1, i.e. ∥ |ψ⟩ ∥2 = 1; equivalently,
this means |α|2 + |β|2 = 1.
Exercise 1.4. Let |ψ⟩ = (1/√2)(1, 1)T ∈ C2. Write |ψ⟩ in terms of the computational basis for C2. Is |ψ⟩ normalized?
Linear maps. Given a vector |ψ⟩ ∈ Cd, we are interested in how |ψ⟩ can be “mapped” to other vectors. The maps we consider are linear, which by definition means that for map Φ : Cd → Cd and arbitrary Σi αi|ψi⟩ ∈ Cd,

Φ( Σi αi|ψi⟩ ) = Σi αi Φ(|ψi⟩).    (1.5)
The set of linear maps from vector space X to Y is denoted L(X , Y). For brevity, we use the
shorthand L(X ) to mean L(X , X ).
Exercise 1.5. Consider the linear map Φ : C2 → C2 with action Φ(|0⟩) = |1⟩ and Φ(|1⟩) = |0⟩.
If |ψ⟩ = α|0⟩ + β|1⟩, what is Φ(|ψ⟩)?
The exercise above teaches us an important lesson — the action of a linear map Φ ∈ L(Cd )
is fully characterized by understanding how Φ acts on a basis for Cd . This leads to a natural
representation for Φ in terms of a matrix.
Recall that a d × d matrix A is a two-dimensional array of complex numbers whose (i, j)th entry is denoted A(i, j) ∈ C for i, j ∈ [d]. To represent a linear map Φ : Cd → Cd as a d × d matrix AΦ, we use its action on a basis for Cd. Specifically, define the ith column of AΦ as Φ(|i⟩) for {|i⟩} the standard basis for Cd, or

AΦ = ( Φ(|0⟩) | Φ(|1⟩) | · · · | Φ(|d − 1⟩) ).    (1.6)
In this course, we use both the matrix and linear map views interchangeably, with the applicable notion clear from context.
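To make Equation (1.6) concrete, here is a small Python/numpy sketch (an editorial illustration; the helper name matrix_of is ours) which builds the matrix of a linear map column by column from its action on the standard basis.

import numpy as np

def matrix_of(phi, d):
    # The ith column of A_Phi is Phi(|i>), per Equation (1.6).
    basis = np.eye(d)
    return np.column_stack([phi(basis[i]) for i in range(d)])

# Example map: |0> -> |1> and |1> -> -|0> (a 90-degree rotation).
rot = lambda v: np.array([-v[1], v[0]])
print(matrix_of(rot, 2))   # [[0, -1], [1, 0]]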
Exercise 1.6. What is the 2 × 2 complex matrix representing the linear map Φ from the
previous exercise? What is the linear map whose matrix (with respect to the computational
basis) is the identity matrix

I = (1 0; 0 1)?    (1.7)
Note that unlike for scalars, for matrices it is not always true that AB = BA. In the special case where AB = BA, we say A and B commute.

Exercise 1.7. Consider the matrices

X = (0 1; 1 0)    Z = (1 0; 0 −1).    (1.9)

Do X and Z commute?
We would like to understand how “large” the output space of a linear map A ∈ L(Cd) is. To this end, the image of A is the set of all possible output vectors under the action of A, i.e.

Im(A) := { |ψ⟩ ∈ Cd : |ψ⟩ = A|ϕ⟩ for some |ϕ⟩ ∈ Cd }.    (1.10)

The rank of A, rank(A), is then the dimension of Im(A).
Exercise 1.8. Suppose rank(A) = 0. What is Im(A)? How about the case of rank(A) = d?
The set of all vectors sent to zero by A is called its null space, i.e. Null(A) := { |ψ⟩ ∈ Cd : A|ψ⟩ = 0 }.
Exercise 1.9. Is there a non-zero vector in the null space of matrix Z from Equation (1.9)?
(Hint: Multiply an arbitrary vector |ψ⟩ = α|0⟩ + β|1⟩ by Z and see if you can make the zero
vector pop out.) What does the answer tell you about rank(Z)? What is the null space of
matrix

B = (1 0; 0 0),    (1.11)
and what is rank(B)?
Matrix operations. Matrices encode the operations which we are allowed to perform on our
vectors. There are some simple operations on matrices themselves which will show up repeatedly
in our discussions. The first three of these are the linear maps complex conjugate, transpose
and adjoint, defined respectively as
A∗ (i, j) = (A(i, j))∗ AT (i, j) = A(j, i) A† = (A∗ )T . (1.12)
Note that (AB)† = B † A† , and similarly for the transpose. These operations apply to vectors
as well so that ⟨ψ|, defined in Equation (1.2), is simply |ψ⟩† . The adjoint will especially play a
crucial role in this course.
Exercise 1.10. Calculate X † and Z † for X and Z from Equation (1.9), as well as the adjoint
of
A = (1 2; 3 e^{iπ/2}).    (1.13)
Another useful function on matrices is the trace, which is simply a linear map Tr : L(Cd) → C summing the entries on the diagonal of A, i.e. Tr(A) = Σ_{i=1}^d A(i, i). A wonderful property of
the trace is that it is cyclic, i.e. Tr(ABC) = Tr(CAB). This implies that even if A and B do
not commute, i.e. AB ̸= BA, it nevertheless holds that Tr(AB) = Tr(BA)!
Exercise 1.11. In a previous exercise, you showed that X and Z do not commute. Compute
Tr(XZ) and Tr(ZX) and verify that they are indeed the same.
Outer products. The Dirac notation lends itself particularly well to an alternate description
of matrices via outer products. For vectors |ψ⟩, |ϕ⟩ ∈ Cd , the outer product is |ψ⟩⟨ϕ| ∈ L(Cd );
unlike the inner product, the outer product yields a d × d matrix. It can be computed straight-
forwardly via the rules of matrix multiplication. For example,

|0⟩⟨0| = (1; 0)(1 0) = (1 0; 0 0)    and    |1⟩⟨0| = (0; 1)(1 0) = (0 0; 1 0).    (1.14)
More generally, the matrix |i⟩⟨j| ∈ L(Cd) has a 1 at position (i, j) and zeroes elsewhere. This yields a simple yet neat trick: A matrix A ∈ L(Cd) written in the computational basis can hence be expressed as A = Σ_{ij} A(i, j)|i⟩⟨j|. It is thus easy to evaluate expressions of the form

⟨i|A|j⟩ = ⟨i| ( Σ_{i′j′} A(i′, j′)|i′⟩⟨j′| ) |j⟩ = Σ_{i′j′} A(i′, j′)⟨i|i′⟩⟨j′|j⟩ = Σ_{i′j′} A(i′, j′)δ_{ii′}δ_{j′j} = A(i, j),    (1.15)
where the third equality follows since {|i⟩} forms an orthonormal basis for Cd . In other words,
⟨i|A|j⟩ simply rips out entry A(i, j)! These types of expressions will be ubiquitous in the setting
of quantum measurements.
Exercise 1.12. Observe that X from Equation 1.9 can be written X = |0⟩⟨1| + |1⟩⟨0|. What is
⟨0|X|0⟩? How about ⟨0|X|1⟩? How can you rewrite Tr(X) in terms of expressions of this form?
Eigenvalues and eigenvectors. With outer products in hand, we can discuss one of the most
fundamental tools in our Linear Algebraic toolkit — eigenvalues and eigenvectors. Given any
matrix A ∈ L(Cd), an eigenvector is a special non-zero vector satisfying the equation

A|λ⟩ = λ|λ⟩,    (1.16)

for some λ ∈ C, called the eigenvalue corresponding to eigenvector |λ⟩.
Exercise 1.13. Show that |+⟩ = (1/√2)(|0⟩ + |1⟩) and |−⟩ = (1/√2)(|0⟩ − |1⟩) are both eigenvectors of X from Equation (1.9). What are their respective eigenvalues?
The outer product can now be used to state an expression which will be used repeatedly in
this course. For any matrix A satisfying AA† = A† A (such matrices are called normal ; most
matrices in this course will be normal), we can decompose A in terms of its eigenvalues and
eigenvectors as
A = Σ_{i=1}^d λi |λi⟩⟨λi|,    (1.17)
where λi and |λi ⟩ are the eigenvalues and corresponding eigenvectors of A. This is called the
spectral decomposition of A. The spectral decomposition is useful for a few reasons. First, it
tells us exactly how A acts on Cd; this is because the eigenvectors |λi⟩ ∈ Cd can be chosen¹ to form an orthonormal basis for Cd. Thus, any vector |ψ⟩ ∈ Cd can be written in terms of the eigenvectors of A, i.e. |ψ⟩ = Σi αi|λi⟩ for some αi ∈ C. The spectral decomposition also immediately reveals the rank of A; specifically, rank(A) is just the number of non-zero eigenvalues of A. Finally, Tr(A) has a very simple expression in terms of A’s eigenvalues — Tr(A) = Σi λi. Let us quickly prove this last claim:
Tr(A) = Tr( Σi λi|λi⟩⟨λi| ) = Σi λi Tr(|λi⟩⟨λi|) = Σi λi Tr(⟨λi|λi⟩) = Σi λi.    (1.18)
Here, the second equality follows since the trace is linear, the third by the cyclic property of
the trace, and the last since the eigenvectors |λi ⟩ are orthonormal.
Exercise 1.14. In the previous exercise, you computed the eigenvectors and eigenvalues of X.
Use these to write down the spectral decomposition of X, and verify that it indeed evaluates
to X. Next, recall that X|0⟩ = |1⟩. Note that |0⟩ = √12 (|+⟩ + |−⟩). Use this and the spectral
decomposition of X to verify that indeed X|0⟩ = |1⟩.
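As a quick numerical check of the spectral decomposition (a Python/numpy sketch of ours, using the Pauli X from Equation (1.9)), numpy's eigh returns an orthonormal eigenbasis for Hermitian matrices:

import numpy as np

X = np.array([[0, 1], [1, 0]])

# eigh is for Hermitian (hence normal) matrices; it returns eigenvalues in
# ascending order and an orthonormal set of eigenvectors as columns.
evals, evecs = np.linalg.eigh(X)
print(evals)                                  # [-1.  1.]

# Reconstruct X = sum_i lambda_i |lambda_i><lambda_i|  (Equation (1.17)).
recon = sum(l * np.outer(v, v.conj()) for l, v in zip(evals, evecs.T))
print(np.allclose(recon, X))                  # True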
Finally, recall that the eigenvalues of A ∈ L(Cd) can be computed as the roots of the degree-d characteristic polynomial of A, pA, defined as

pA(λ) := det(λI − A),    (1.19)
¹This statement need not hold for non-normal matrices. In fact, one can prove that a matrix is normal if and only if it admits a spectral decomposition. Non-normal matrices do, however, admit the more general singular value decomposition.
where the determinant det can be defined recursively as
det(A) = Σ_{j=1}^d (−1)^{i+j} A(i, j) det(Aij).    (1.20)
Here, Aij is the matrix obtained from A by deleting row i and column j, and we define the base
case of this recursion (i.e. a 1 × 1 matrix [c]) as det([c]) = c. This equation holds for any i ∈ [d].
In the special case when d = 2, this reduces nicely to
det(a b; c d) = ad − bc.    (1.21)
Exercise 1.15. Use Equations (1.19) and (1.21) to compute the eigenvalues of Z from Equa-
tion (1.9). Then, plug these back into Equation (1.16) to solve for the eigenvectors of Z. What
is the spectral decomposition of Z?
Let us close with a simple observation: For any diagonal matrix A (written in the com-
putational basis), the eigenvalues of A are simply the entries on the diagonal of A, and the
eigenvectors are just the computational basis vectors. In the exercise above, this immediately
confirms that the eigenvalues of Z are 1 and −1 with eigenvectors |0⟩ and |1⟩, respectively.
2 Introduction to Quantum Mechanics
...the “paradox” is only a conflict between reality and your feeling of what reality
“ought to be.”
— Richard Feynman.
The only restriction on a single-qubit state |ψ⟩ = α|0⟩ + β|1⟩ ∈ C2 is that |ψ⟩ must be a unit vector, i.e. that |α|² + |β|² = 1. To summarize, any unit vector in C2 describes the state of a single qubit.
Exercise 2.1. Among the most commonly used single qubit states are |+⟩ = (1/√2)(|0⟩ + |1⟩) and |−⟩ = (1/√2)(|0⟩ − |1⟩). Verify that these are indeed unit vectors.
Figure 2.1: A depiction of Schrödinger’s cat.
Aside: Schrödinger’s cat. To demonstrate how strange the concept of quantum superposi-
tion is, in 1935 Austrian physicist Erwin Schrödinger devised a thought experiment, nowadays
infamously referred to as Schrödinger’s cat. The experiment, depicted in Figure 2.1, goes as follows (we give a slight variant suitable to our exposition of quantum mechanics here): Suppose
that we place a cat in a box and close the box (i.e. one cannot look inside the box). In the box,
we place a flask of poison, along with a hammer. The hammer is connected to a mechanism
outside the box, which is controlled by a computer. If the computer is fed input 0, then nothing
happens, and the cat is happy doing whatever it is doing in the box. On the other hand, if the
input is 1, then the hammer falls and breaks the flask, releases the poison, and kills the cat.
And now Schrödinger asked the key question: What if we input a superposition of 0 and 1
to the computer, i.e. the state |0⟩ + |1⟩? If we interpret quantum mechanics literally, then we
conclude that the cat is both alive and dead, at the same time! Of course, this makes absolutely
no sense. Moreover, common sense tells us that if we simply open the box and look inside, we will find a cat which is either alive or dead, not both. How can this paradox be resolved? Read
on to Postulate 4 to find out!
Finally, thus far we have described the state of a single (2-dimensional) qubit. More generally, the state of a d-dimensional quantum system, called a qudit, is described by a unit vector |ψ⟩ ∈ Cd, which can be written as

|ψ⟩ = α0|0⟩ + α1|1⟩ + · · · + αd−1|d − 1⟩ = Σ_{i=0}^{d−1} αi|i⟩,

where recall |i⟩ ∈ Cd denotes the ith computational basis vector and αi ∈ C. Since |ψ⟩ is a unit vector, we have Σ_{i=0}^{d−1} |αi|² = 1.
Quantum gates are modelled by unitary matrices U ∈ L(Cd), i.e. matrices satisfying U†U = UU† = I. In other words, the inverse of U is simple to calculate — just take the dagger of U. This immediately yields a key insight — all quantum gates are reversible.
Among the most common single qubit gates are the following, known as the Pauli gates, after
Austrian-Swiss physicist Wolfgang Pauli:
X = (0 1; 1 0)    Y = (0 −i; i 0)    Z = (1 0; 0 −1).
It follows that |+⟩ and |−⟩ are eigenvectors of X, i.e. X|+⟩ = |+⟩ and X|−⟩ = −|−⟩ (as we
calculated in the last lecture). The spectral decomposition of X is hence X = |+⟩⟨+| − |−⟩⟨−|.
Exercise 2.3. Write |+⟩⟨+| − |−⟩⟨−| out as a matrix to verify that it indeed equals X.
The action of Z on the computational basis is Z|0⟩ = |0⟩ and Z|1⟩ = −|1⟩. In other words, Z leaves |0⟩ invariant, but injects a “phase” of −1 in front of |1⟩. This also immediately shows that |0⟩ and |1⟩ are eigenvectors of Z with eigenvalues 1 and −1, respectively.
The Z gate is special in that it allows us to inject a relative phase into a quantum state. For
example,
Z|+⟩ = Z( (1/√2)|0⟩ + (1/√2)|1⟩ ) = (1/√2)Z|0⟩ + (1/√2)Z|1⟩ = (1/√2)|0⟩ − (1/√2)|1⟩ = |−⟩.
By relative phase, we mean that only the amplitude on |1⟩ had its sign changed (or more
generally, was multiplied by a phase eiπ = −1). If all the amplitudes in the state were instead
multiplied by eiπ , then we could simply factor out the eiπ from the entire state — in this case,
we would call eiπ a global phase. It turns out that a global phase is insignificant in that it
cannot be experimentally detected. A relative phase may seemingly also look unimportant -
yet, as we shall see in this course, it is one of the features of quantum mechanics which allows
quantum computers to outperform classical ones!
Finally, we come to a fourth important unitary gate, the Hadamard gate:
H = (1/√2) (1 1; 1 −1).
The Hadamard gate is special in that it creates superpositions for us. Namely, we have H|0⟩ =
|+⟩ and H|1⟩ = |−⟩. It can also “erase” superpositions, i.e. H|+⟩ = |0⟩ and H|−⟩ = |1⟩. In
other words, H is self-inverse — we have that H 2 = I for I the identity matrix. In fact, the
Pauli matrices are also self-inverse.
Exercise 2.5. Verify that H|0⟩ = |+⟩ and H|1⟩ = |−⟩. Also verify that H 2 = I.
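The following numpy sketch (an editorial addition) verifies the claims above: the Pauli gates and H are unitary, and each of these four gates is moreover self-inverse.

import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

for U in (X, Y, Z, H):
    assert np.allclose(U.conj().T @ U, np.eye(2))   # unitary: U^dag U = I
    assert np.allclose(U @ U, np.eye(2))            # self-inverse: U^2 = I

zero = np.array([1, 0])
print(H @ zero)          # |+> = (1, 1)/sqrt(2)
print(H @ (H @ zero))    # back to |0>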
It is very useful to graphically depict sequences of quantum gates via quantum circuits. For
example, here are three circuits:

[Three single-wire circuits: |ψ⟩ through an X gate; |ψ⟩ through an H gate; and |ψ⟩ through an X gate followed by an H gate.]

They correspond to evolutions X|ψ⟩, H|ψ⟩, and HX|ψ⟩, respectively. Each wire in such a
diagram denotes a quantum system, and a box labelled by gate U depicts the action of unitary
U . We think of time going from left to right; for the last circuit above, note that the X appears
on the “left” in the circuit diagram but on the “right” in the expression HX|ψ⟩; this is because
X should be applied first to |ψ⟩, then H.
Exercise 2.6. What single-qubit state does the following circuit output? (Hint: Rather than
explicitly calculating this, try to use your knowledge of the action of H on states |0⟩ and |+⟩,
and the eigenvectors of X.)
|0⟩ ──H──X──H──
Given two single-qubit states |ψ⟩, |ϕ⟩ ∈ C2, their joint state is given by the tensor product |ψ⟩ ⊗ |ϕ⟩, defined entrywise via

(|ψ⟩ ⊗ |ϕ⟩)(i, j) := ψiϕj,

where recall ψi and ϕj are the entries of |ψ⟩ and |ϕ⟩, respectively. Here, you should think of the pair (i, j) as representing the bits of a single index x ∈ {0, 1, 2, 3}. So for example, (0, 0) is equivalent to index 0, (0, 1) to index 1, (1, 0) to index 2, and (1, 1) to index 3. This implies that we can think of |ψ⟩ ⊗ |ϕ⟩ as having four entries, i.e. |ψ⟩ ⊗ |ϕ⟩ ∈ C4. Let us demonstrate with some examples:

|0⟩ ⊗ |0⟩ = (1, 0)T ⊗ (1, 0)T = (1, 0, 0, 0)T    and    |0⟩ ⊗ |1⟩ = (1, 0)T ⊗ (0, 1)T = (0, 1, 0, 0)T.
Exercise 2.7. Verify that

|1⟩ ⊗ |0⟩ = (0, 1)T ⊗ (1, 0)T = (0, 0, 1, 0)T    and    |1⟩ ⊗ |1⟩ = (0, 1)T ⊗ (0, 1)T = (0, 0, 0, 1)T.
Note that in the four equations above, the four-dimensional vectors obtained are just the com-
putational basis vectors for C4 ! This hints at an important fact: If we take orthonormal
bases B1 = {|ψ0 ⟩, |ψ1 ⟩} and B2 = {|ϕ0 ⟩, |ϕ1 ⟩} for C2 , then we can obtain an orthonormal ba-
sis for C4 by tensoring together the elements of B1 and B2 in all four possible combinations,
i.e. {|ψ0 ⟩ ⊗ |ϕ0 ⟩, |ψ0 ⟩ ⊗ |ϕ1 ⟩, |ψ1 ⟩ ⊗ |ϕ0 ⟩, |ψ1 ⟩ ⊗ |ϕ1 ⟩} forms an orthonormal basis for C4 . For
brevity, we shall often drop the notation ⊗ and simply write |ψ⟩ ⊗ |ϕ⟩ = |ψ⟩|ϕ⟩.
Exercise 2.8. Compute the 4-dimensional vectors corresponding to |1⟩ ⊗ |−⟩ and |+⟩ ⊗ |+⟩.
Our discussion thus far generalizes straightforwardly to the case of Cd1 ⊗ Cd2 . Specifically,
for |ψ⟩ ∈ Cd1 and |ϕ⟩ ∈ Cd2 , we have that |ψ⟩ ⊗ |ϕ⟩ ∈ Cd1 d2 . Then, for i ∈ {0, . . . , d1 − 1}
and j ∈ {0, . . . , d2 − 1}, we have (|ψ⟩ ⊗ |ϕ⟩)(i, j) := ψi ϕj . Thus, for example, if we add a third
qubit to our existing two qubit system, then we have a state which lives in C4 ⊗ C2 = C8 . In
fact, for each qubit we add to our system, the dimension grows by a factor of 2, i.e. it grows exponentially — in general, an n-qubit state will correspond to a vector |ψ⟩ ∈ (C2)⊗n ≅ C^{2^n}! It is
precisely this exponential growth in complexity which makes it difficult for classical computers
to simulate the mechanics of an n-qubit quantum state — indeed, this was the reason why
physicist Richard Feynman proposed the concept of a quantum computer in 1982 to begin with!
Finally, the tensor product has the following important properties for any |a⟩, |b⟩ ∈ Cd1 and |c⟩, |d⟩ ∈ Cd2, which we will use repeatedly:

1. It is bilinear: (|a⟩ + |b⟩) ⊗ |c⟩ = |a⟩ ⊗ |c⟩ + |b⟩ ⊗ |c⟩, and similarly in the second argument.

2. Scalars can be moved freely: for α ∈ C, (α|a⟩) ⊗ |c⟩ = |a⟩ ⊗ (α|c⟩) = α(|a⟩ ⊗ |c⟩).

3. Inner products factorize: (⟨a| ⊗ ⟨c|)(|b⟩ ⊗ |d⟩) = ⟨a|b⟩⟨c|d⟩.
Exercise 2.9. What is the inner product of |0⟩|1⟩ and |1⟩|0⟩? How about the inner product of
|0⟩|0⟩ and |+⟩|−⟩?
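Numerically, the tensor product is numpy's kron; the short sketch below (an editorial addition) can also be used to check your answers to Exercises 2.8 and 2.9.

import numpy as np

zero = np.array([1, 0]); one = np.array([0, 1])
plus = (zero + one) / np.sqrt(2); minus = (zero - one) / np.sqrt(2)

# np.kron implements the tensor product for vectors (and matrices).
print(np.kron(zero, minus))                             # |0>|-> in C^4
# Inner products factorize across the tensor product:
print(np.vdot(np.kron(zero, one), np.kron(one, zero)))  # <01|10> = 0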
Quantum entanglement. Now that we know how to stitch together a pair of single qubit
states, it turns out we have opened Pandora’s box. For we can now talk about the two-qubit
state which troubled Einstein to the end of his days — the innocuous-looking Bell state:

|Φ+⟩ = (1/√2)|0⟩|0⟩ + (1/√2)|1⟩|1⟩ = (1/√2, 0, 0, 1/√2)T.
describe the state of q0 or q1 alone — only the joint state of q0 and q1 can be described
precisely. In the language of tensor products, this is captured by the following statement:
There do not exist |ψ1 ⟩, |ψ2 ⟩ ∈ C2 such that |Φ+ ⟩ = |ψ1 ⟩ ⊗ |ψ2 ⟩. In 1935, Einstein, Podolsky
and Rosen published a famous paper nowadays referred to as the “EPR” paper, in which they
argue that quantum mechanics cannot be a complete physical theory because it allows the
existence of states such as |Φ+⟩. Fast forwarding a number of decades, we now not only believe entanglement is real, but we know that it is a necessary resource for quantum computers to outperform classical ones.
We shall later return to the topic of entanglement, but for now let us remark that there are
three other such Bell states:
|Φ−⟩ = (1/√2)|00⟩ − (1/√2)|11⟩ = (1/√2, 0, 0, −1/√2)T

|Ψ+⟩ = (1/√2)|01⟩ + (1/√2)|10⟩ = (0, 1/√2, 1/√2, 0)T

|Ψ−⟩ = (1/√2)|01⟩ − (1/√2)|10⟩ = (0, 1/√2, −1/√2, 0)T.
Note that here we have further simplified notation by letting (e.g.) |0⟩|0⟩ = |00⟩. The four Bell
states {|Φ+ ⟩, |Φ− ⟩, |Ψ+ ⟩, |Ψ− ⟩} form an orthonormal basis for C4 known as the Bell basis, after
Northern Irish physicist John Bell.
Exercise 2.10. Verify that the Bell basis indeed forms an orthonormal basis, i.e. check that
the Bell states are pairwise orthogonal unit vectors.
Two-qubit quantum gates. We have seen that two-qubit quantum states are described by unit
vectors in C4 . We can thus discuss two-qubit quantum gates, i.e. unitary operators U ∈ L(C4 ).
There are two types of such gates: The first are simply tensor products of one-qubit gates, such
as X ⊗ Z or H ⊗ H. Here, the tensor product is defined analogously for matrices as it is for
vectors. (The formal description is cumbersome, but we follow with a helpful illustration to
clarify.) For any A ∈ L(Cd1 ), B ∈ L(Cd2 ), A ⊗ B is a d1 d2 × d1 d2 complex matrix whose entries
are indexed by ([d1 ] × [d2 ], [d1 ] × [d2 ]) (where [d] = {0, . . . , d − 1} here), such that
Then, writing A = (a1 a2; a3 a4) and B = (b1 b2; b3 b4), A ⊗ B is given by the 4 × 4 block matrix

A ⊗ B = ( a1·B    a2·B ; a3·B    a4·B ).
In other words, A⊗B is obtained by taking four copies of B, each time multiplying by a different
scalar entry of A.
The tensor product for matrices shares the properties of the tensor product for vectors, with
the addition of two rules below:
(A ⊗ B)(C ⊗ D) = AC ⊗ BD and Tr(A ⊗ B) = Tr(A)Tr(B).
The circuit diagrams for tensor products of unitaries are depicted below; we consider the cases of X ⊗ I, I ⊗ Z, and H ⊗ H, respectively.

[Three two-wire circuits: an X gate on the top wire only; a Z gate on the bottom wire only; and an H gate on each wire.]
Exercise 2.13. What is the circuit diagram for Z ⊗ Z? What is (X ⊗ X)|0⟩ ⊗ |1⟩? How about
(Z ⊗ Z)|1⟩ ⊗ |1⟩?
Finally, we can also consider genuinely two-qubit gates, i.e. gates which are not the tensor
product of single qubit gates. One important such gate is the controlled-NOT gate, denoted
CNOT. The CNOT treats one qubit as the control qubit, and the other as the target qubit. It
then applies the Pauli X gate to the target qubit only if the control qubit is set to |1⟩. More
precisely, the action of the CNOT on a two-qubit basis is given as follows, where qubit 1 is the
control and qubit 2 is the target:
CNOT |00⟩ = |00⟩ CNOT |01⟩ = |01⟩ CNOT |10⟩ = |11⟩ CNOT |11⟩ = |10⟩.
Exercise 2.14. What is CNOT |Φ+ ⟩ for |Φ+ ⟩ the Bell state? Can you simplify the result to
get answer |+⟩|0⟩?
Exercise 2.15. Verify that multiplying |11⟩ by the matrix for CNOT indeed yields |10⟩.
In circuit diagrams, the CNOT is drawn with a dot (●) on the control qubit and ⊕ on the target qubit:

|ψ⟩ ──●──
|ϕ⟩ ──⊕──
With this in hand, we can do our first interesting computation — we can prepare the Bell
state |Φ+ ⟩ starting from an initial state of |0⟩|0⟩! The preparation circuit is given as:
|ψ⟩ = |0⟩ ──H──●──
|ϕ⟩ = |0⟩ ──────⊕──
Exercise 2.16. Show that applying the preparation circuit above on initial states |01⟩, |10⟩,
and |11⟩ yields the remaining Bell basis states |Ψ+ ⟩, |Φ− ⟩, and |Ψ− ⟩, respectively.
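As a sanity check, the preparation circuit can be simulated directly in numpy (a sketch of ours; recall circuits are read left to right, so matrix products apply right to left). The same three lines, with a different initial state, let you check Exercise 2.16.

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],     # written in the computational basis
                 [0, 1, 0, 0],     # |00>, |01>, |10>, |11>
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.kron([1, 0], [1, 0])
bell = CNOT @ np.kron(H, I2) @ ket00
print(bell)    # (1/sqrt(2), 0, 0, 1/sqrt(2)) = |Phi+>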
3 Measurement and Quantum Teleportation
“I think that a particle must have a separate reality independent of measurements.
That is, an electron has spin, location and so forth even when it is not being mea-
sured. I like to think the moon is there even if I am not looking at it.”
— Albert Einstein
“. . . experiments have now shown that what bothered Einstein is not a debatable point
but the observed behaviour of the real world.”
— N. David Mermin
Exercise 3.1. Verify that Pauli Y is Hermitian. Is the Hadamard gate Hermitian?
Exercise 3.2. Prove that Pauli X and Z are not positive semi-definite. (Hint: Recall the
spectral decompositions of X and Z.)
Recall that a projector is a Hermitian matrix Π ∈ L(Cd) satisfying Π² = Π; equivalently, all eigenvalues of Π lie in {0, 1}. To see why, take a spectral decomposition Π = Σi λi|λi⟩⟨λi| and compute

Π = Π² = ( Σi λi|λi⟩⟨λi| )( Σj λj|λj⟩⟨λj| ) = Σi λi²|λi⟩⟨λi|,

where the last equality follows since {|λi⟩} is an orthonormal basis. Since the |λi⟩ are orthogonal, we thus have that for all i, λi = λi². But this can only hold if λi ∈ {0, 1}, as claimed.
Exercise 3.3. Verify that I, |0⟩⟨0|, and |1⟩⟨1| are all projectors. More generally, show
that for arbitrary unit vector |ψ⟩ ∈ Cd , |ψ⟩⟨ψ| is a projector.
Observe that a projector Π has rank 1 if and only if Π = |ψ⟩⟨ψ| for some |ψ⟩ ∈ Cd, since the rank of Π equals¹ the number of non-zero eigenvalues of Π, and here Π = |ψ⟩⟨ψ| is a spectral decomposition of Π. Finally, let us develop an intuition for what a projector actually does — for any projector Π = Σi |ψi⟩⟨ψi| and state |ϕ⟩ to be measured, we have

Π|ϕ⟩ = ( Σi |ψi⟩⟨ψi| )|ϕ⟩ = Σi |ψi⟩(⟨ψi|ϕ⟩) = Σi (⟨ψi|ϕ⟩)|ψi⟩ ∈ Span({|ψi⟩}),
where note ⟨ψi |ϕ⟩ ∈ C. Thus, Π projects us down onto the span of the vectors {|ψi ⟩}.
Exercise 3.5. Consider three-dimensional vector |ϕ⟩ = α|0⟩ + β|1⟩ + γ|2⟩ ∈ C3 and
Π = |0⟩⟨0| + |1⟩⟨1|. Compute Π|ϕ⟩, and observe that the latter indeed lies in the two-
dimensional space Span({|0⟩, |1⟩}).
Projective measurements. With projectors in hand, we can now define a projective measurement. A projective measurement is a set of projectors B = {Πi}_{i=0}^m such that Σ_{i=0}^m Πi = I. The latter condition is called the completeness relation. If each Πi is rank one, i.e. Πi = |ψi⟩⟨ψi|, then we say that B models a measurement in basis {|ψi⟩}. Often, we shall measure in the computational basis, which is specified by B = {|0⟩⟨0|, |1⟩⟨1|} in the case of C2 (and generalizes as B = {|i⟩⟨i|}_{i=0}^{d−1} for Cd).
¹This holds since Π is Hermitian, and hence also normal.
Exercise 3.6. Verify that B = {|0⟩⟨0|, |1⟩⟨1|} is a projective measurement on C2 .
The probability of obtaining outcome i when measuring state |ψ⟩ with B is

Pr(outcome i) = Tr(Πi|ψ⟩⟨ψ|Πi) = Tr(Πi²|ψ⟩⟨ψ|) = Tr(Πi|ψ⟩⟨ψ|),

where the second equality follows by the cyclic property of the trace and the third since Πi is a projector.
Exercise 3.7. Let |ψ⟩ = α|0⟩ + β|1⟩ ∈ C2 . Show that if we measure in the computational basis,
i.e. using B = {|0⟩⟨0|, |1⟩⟨1|}, then the probabilities of obtaining outcomes 0 and 1 are |α|2 and
|β|2 , respectively.
The exercise above has an important moral — requiring a quantum state |ψ⟩ to be a unit vector
(i.e. |α|2 + |β|2 = 1) ensures that when measuring |ψ⟩, the distribution over the outcomes is a
valid probability distribution, i.e. the probabilities for all possible outcomes sum to 1. The other
important take-home message here is that measurements in quantum mechanics are inherently
probabilistic — in general, the outcomes cannot be perfectly predicted!
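In numpy, the Born rule Pr(i) = Tr(Πi|ψ⟩⟨ψ|) reads as follows (a sketch of ours; the helper name outcome_probabilities is our own):

import numpy as np

def outcome_probabilities(psi, projectors):
    # Born rule: Pr(i) = Tr(Pi_i |psi><psi|).
    rho = np.outer(psi, psi.conj())
    return [np.real(np.trace(P @ rho)) for P in projectors]

psi = np.array([np.sqrt(0.3), np.sqrt(0.7)])      # alpha, beta
B = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]    # {|0><0|, |1><1|}
print(outcome_probabilities(psi, B))              # [0.3, 0.7]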
Finally, we started this lecture by saying that the very act of measuring a quantum state
disturbs the system. Let us now formalize this; we will crucially use the fact discussed earlier
that a projector projects a vector |ψ⟩ down into a smaller subspace. Specifically, upon obtaining
outcome Πi when measuring B, the state of system “collapses” to
Πi|ψ⟩⟨ψ|Πi / Tr(Πi|ψ⟩⟨ψ|Πi) = Πi|ψ⟩⟨ψ|Πi / Tr(Πi|ψ⟩⟨ψ|).    (3.1)
Note the denominator above is a scalar, and is just the probability of outcome i. There are
two points here which may confuse you: Why have we written the output state as a matrix
Πi |ψ⟩⟨ψ|Πi rather than a vector Πi |ψ⟩, and what is the role of the denominator? Let us handle
each of these in order.
First, conditioned on outcome Πi , the output state is indeed a vector, namely Πi |ψ⟩. However,
there is a more general formalism which we shall discuss shortly called the density operator for-
malism, in which quantum states are written as matrices, not vectors. Specifically, the “density
matrix” representing vector |ψ⟩ would be written as matrix |ψ⟩⟨ψ|. The density operator for-
malism is more general than the state vector approach we have taken so far, and will be crucial
for studying individual subsystems of a larger composite quantum state. Thus, the answer to
question 1 is that we have written the output as a matrix simply to slowly ease the transition
into the density matrix formalism.
The motive behind question 2 is somewhat less sinister — the problem here is that since we
projected out part of |ψ⟩ during the measurement, the output Πi |ψ⟩ may not necessarily be
normalized. To renormalize Πi|ψ⟩, we simply divide by its Euclidean norm to obtain

|ψ′⟩ = Πi|ψ⟩ / ∥Πi|ψ⟩∥2.
The state |ψ ′ ⟩ describes the post-measurement state of our system, assuming we have obtained
outcome i.
Exercise 3.8. Show that (the density matrix) |ψ ′ ⟩⟨ψ ′ | equals the expression in Equation (3.1).
Exercise 3.9. Let |ψ⟩ = α|0⟩ + β|1⟩ ∈ C2 . Show that if we measure in the computational basis,
i.e. using B = {|0⟩⟨0|, |1⟩⟨1|}, and obtain outcome i ∈ {0, 1}, then the post-measurement state
is |i⟩ (or |i⟩⟨i| in density matrix form).
A final quirk we should iron out is the following — in terms of measurements, what is the
consequence of the fact that a projector Πi satisfies Π2i = Πi ? Well, if you observe a quantum
system now, and then again five minutes from now, and if the system has not been subjected to
any gates or noise in between the measurements, then the two measurement results you obtain
should agree (I realize the study of quantum mechanics has likely distorted your view of when
you can trust your intuition, but this is one case in which you can). To model this, suppose
we measure using B = {Πi } and obtain results i and j in measurements 1 and 2, respectively.
Then:
Πi |ψ⟩⟨ψ|Πi Tr (Πj Πi |ψ⟩⟨ψ|Πi Πj )
Pr(outcome j | outcome i) = Tr Πj Πj = (3.2)
Tr(Πi |ψ⟩⟨ψ|) Tr(Πi |ψ⟩⟨ψ|)
To simplify this expression, we use the fact that if the completeness relation holds for projectors {Πi}, i.e. Σi Πi = I, then it turns out that ΠjΠi = δijΠi, where recall δij is the Kronecker delta. Thus, if i ≠ j, Equation (3.2) equals 0, and if i = j, it reduces to

Tr(ΠjΠi|ψ⟩⟨ψ|ΠiΠj) / Tr(Πi|ψ⟩⟨ψ|) = Tr(Πi²|ψ⟩⟨ψ|Πi²) / Tr(Πi|ψ⟩⟨ψ|) = Tr(Πi|ψ⟩⟨ψ|Πi) / Tr(Πi|ψ⟩⟨ψ|) = Tr(Πi|ψ⟩⟨ψ|) / Tr(Πi|ψ⟩⟨ψ|) = 1,
i.e. measuring the state a second time again yields outcome i with probability 1, as desired.
Thus, although observing a state for the first time disturbs it, subsequent measurements will
consistently return the same measurement result!
Exercise 3.10. Suppose we measure |0⟩ in basis B = {|+⟩⟨+|, |−⟩⟨−|}. What are the probabil-
ities of outcomes + and −, respectively? What are the post-measurement states if one obtains
outcome + or −, respectively?
We close this section by giving the circuit symbol which denotes a measurement of a qubit
|ψ⟩ ∈ C2 in the computational basis {|0⟩⟨0|, |1⟩⟨1|}:
[Circuit: the wire carrying |ψ⟩ enters a measurement box (drawn with a meter symbol), from which a double wire exits to the right.]
The double-wires on the right side indicate that the output of the measurement is a classical
string (indicating which measurement outcome was obtained).
your state. How can you send it to her? One obvious way is simply to pop your system in
the mail and physically send it over. However, it turns out that by exploiting the phenomenon
of entanglement, you can do something incredible — by sending two classical bits over the
telephone to Alice, you can “teleport” |ψ⟩ instantly to her!
To teleport |ψ⟩, we assume that you and Alice each share half of a Bell state |Φ+⟩ = (1/√2)(|00⟩ + |11⟩) to begin with; specifically, you hold qubit 1 of |Φ+⟩, and Alice holds qubit 2. The teleportation circuit is then given as follows:
[Teleportation circuit: wire 1 carries |ψ⟩, wire 2 carries |Φ+_A⟩, and wire 3 carries |Φ+_B⟩. A CNOT is applied from wire 1 (control) to wire 2 (target), followed by H on wire 1; wires 1 and 2 are then measured in the standard basis, and the resulting classical bits (double wires) control an X gate and then a Z gate on wire 3.]
Let us break this down piece by piece. The first two wires are held by you; wire 1 holds the state to be teleported, |ψ⟩, and wire 2 holds your half of |Φ+⟩. The third wire holds Alice’s half of |Φ+⟩. Note that we have used |Φ+_A⟩ and |Φ+_B⟩ to denote the two “halves” of |Φ+⟩, but this is poor notation — read literally, this diagram suggests |Φ+⟩ = |Φ+_A⟩ ⊗ |Φ+_B⟩, which is not true since |Φ+⟩ is entangled, and hence from last lecture we know that there do not exist states |Φ+_A⟩, |Φ+_B⟩ such that |Φ+⟩ = |Φ+_A⟩ ⊗ |Φ+_B⟩. This notation is chosen simply because I was unable to find a way to correctly display |Φ+⟩ across wires 2 and 3 in the brief time I dedicated to the task.
The circuit can be divided into 5 steps: Step 1 performs the CNOT, Step 2 the Hadamard
gate, Step 3 measures qubits 1 and 2, Step 4 applies a conditional X gate, and Step 5 applies
a conditional Z gate. The latter two require clarification: The conditional X gate here takes a
classical bit b as input (hence the incoming wire at the top is a double line), and applies X if
and only if b = 1. The conditional Z gate is defined analogously.
Now that we have technically parsed this diagram, let us intuitively parse it. First, you begin
in Steps 1 and 2 by performing a CNOT and Hadamard on your qubits, followed by a standard
basis measurement in Step 3. Since a measurement in the standard basis maps each qubit to
either |0⟩ or |1⟩, the output of your two measurements can jointly be thought of as one of the
four bit strings 00, 01, 10, or 11. Call these bits b0 b1 . Now you pick up the telephone, call
Alice, and tell her the value of b0 b1 . Conditioned on b0 , she applies X to her half of the Bell
pair, followed by Z conditioned on b1 . The claim is that at this point, Alice’s qubit’s state has
been magically converted to |ψ⟩. In fact, as we shall see shortly, |ψ⟩ has also disappeared from
your possession! In this sense, teleportation has taken place.
Let us formally analyze the action of this circuit. Denote by |ψi ⟩ for i ∈ {0, . . . , 5} the joint
state of your and Alice’s systems immediately after Step i has taken place. Here, we define |ψ0 ⟩
as the initial joint state before any gates are applied; it is given by
|ψ0⟩ = |ψ⟩|Φ+⟩ = (α|0⟩ + β|1⟩) ⊗ (1/√2)(|00⟩ + |11⟩) = (1/√2)(α|000⟩ + α|011⟩ + β|100⟩ + β|111⟩).
After Step 1, i.e. after the CNOT, we have state
(1/√2)(α|000⟩ + α|011⟩ + β|110⟩ + β|101⟩).
After Step 2, i.e. after the Hadamard gate, we have

(1/√2)(α|+⟩|00⟩ + α|+⟩|11⟩ + β|−⟩|10⟩ + β|−⟩|01⟩)
= (1/2)(α(|0⟩ + |1⟩)|00⟩ + α(|0⟩ + |1⟩)|11⟩ + β(|0⟩ − |1⟩)|10⟩ + β(|0⟩ − |1⟩)|01⟩)
= (1/2)(|00⟩(α|0⟩ + β|1⟩) + |01⟩(α|1⟩ + β|0⟩) + |10⟩(α|0⟩ − β|1⟩) + |11⟩(α|1⟩ − β|0⟩)).    (3.3)
Let us now pause and analyze the state of affairs. There are four terms in this superposition,
each of which begins with a distinct bit string |00⟩, |01⟩, |10⟩, or |11⟩. This means that if you
now measure qubits 1 and 2 in the standard basis and obtain outcome (say) |00⟩, then Alice’s
qubit on wire 3 collapses to the only consistent possibility, α|0⟩+β|1⟩. In this case, teleportation
has already taken place.
Exercise 3.11. Let |ϕ⟩ ∈ (C2)⊗3 denote the state in Equation (3.3). Suppose you now measure qubits 1 and 2 in the standard basis. This can be modelled by the projective measurement B = { |ab⟩⟨ab| ⊗ I : a, b ∈ {0, 1} }, where we have I on qubit 3 since we are not measuring it. Show that the probability of outcome 00 is 1/4. Next, show that conditioned on outcome 00, the post-measurement state collapses to |00⟩(α|0⟩ + β|1⟩).
More generally, the four possible outcomes upon measuring qubits 1 and 2 result in four distinct residual states on Alice’s qubit, as can be read off Equation (3.3): outcome 00 yields α|0⟩ + β|1⟩; outcome 01 yields α|1⟩ + β|0⟩; outcome 10 yields α|0⟩ − β|1⟩; and outcome 11 yields α|1⟩ − β|0⟩.
Thus, if you simply send the two bits b0b1 encoding the measurement outcome to Alice, then regardless of the value of b0b1, she can recover your original state α|0⟩ + β|1⟩ via the following identities:

α|0⟩ + β|1⟩ = I(α|0⟩ + β|1⟩) = X(α|1⟩ + β|0⟩) = Z(α|0⟩ − β|1⟩) = ZX(α|1⟩ − β|0⟩).
In other words, by conditionally applying X and Z based on the outputs b0 b1 from your mea-
surement, Alice can successfully recover your state |ψ⟩ = α|0⟩ + β|1⟩. This is precisely what is
depicted in Steps 4 and 5 of the teleportation circuit. Finally, note that since measuring your
qubits leaves you in one of the four standard basis states |00⟩, |01⟩, |10⟩, |11⟩, the state |ψ⟩ has
now “disappeared” from your possession!
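The entire analysis can be checked numerically. Below is a Python/numpy sketch of ours (the labels m1, m2 for the measurement outcomes of qubits 1 and 2 are our own convention): for each of the four outcomes, applying X conditioned on qubit 2’s bit and then Z conditioned on qubit 1’s bit recovers |ψ⟩ on Alice’s wire.

import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]])

alpha, beta = 0.6, 0.8j                      # any |alpha|^2 + |beta|^2 = 1
psi = np.array([alpha, beta])                # state to teleport
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # |Phi+> on qubits 2 and 3
state = np.kron(psi, bell)                   # |psi_0>

# Steps 1 and 2: CNOT from qubit 1 to qubit 2, then H on qubit 1.
state = np.kron(CNOT, I2) @ state
state = np.kron(H, np.kron(I2, I2)) @ state  # this is the state (3.3)

for m1 in (0, 1):
    for m2 in (0, 1):
        # Step 3: postselect on outcome (m1, m2) and renormalize;
        # Alice's residual qubit is the (m1, m2, :) amplitude slice.
        alice = state.reshape(2, 2, 2)[m1, m2, :]
        alice = alice / np.linalg.norm(alice)
        # Steps 4 and 5: conditional X, then conditional Z.
        fixed = np.linalg.matrix_power(Z, m1) @ \
                np.linalg.matrix_power(X, m2) @ alice
        print(m1, m2, np.allclose(fixed, psi))   # True in all four cases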
Thus far, our circuits have measured only in the computational basis {|0⟩⟨0|, |1⟩⟨1|},
whereas more generally projective measurements allow measuring in an arbitrary basis B =
{|ψ1 ⟩, |ψ2 ⟩} ⊆ C2 . It turns out that without loss of generality, we can restrict ourselves to
measurements in the standard basis as follows. For this, we require an important fact about unitary matrices, which intuitively says that a unitary matrix maps orthonormal bases to orthonormal bases:
Lemma 3.13. Let B1 = {|ψi ⟩}di=1 ⊆ Cd be an orthonormal basis. Then, for any unitary
U ∈ L(Cd ), B2 = {U |ψi ⟩}di=1 ⊆ Cd is an orthonormal basis.
Proof. Since U is unitary, we have U†U = I. Therefore, for any i, j ∈ {1, . . . , d},

(⟨ψi|U†)(U|ψj⟩) = ⟨ψi|ψj⟩ = δij,

i.e. the vectors in B2 are pairwise orthonormal, as claimed.
In particular, for any orthonormal bases B1 = {|ψi ⟩}di=1 ⊆ Cd and B2 = {|ϕi ⟩}di=1 ⊆ Cd , there
exists a unitary U mapping B1 to B2 . This unitary is given by
U = Σ_{i=1}^d |ϕi⟩⟨ψi|.
Exercise 3.14. Verify that for any i ∈ {1, . . . , d}, U|ψi⟩ = |ϕi⟩. Verify that U is unitary, i.e. that U†U = I. (Hint: Use the fact that for any orthonormal basis {|ϕi⟩}_{i=1}^d, we have Σi |ϕi⟩⟨ϕi| = I.)
Exercise 3.15. Give a quantum circuit for measuring a qubit in the {|+⟩, |−⟩} basis.
Returning to the teleportation circuit, let us now observe that Steps 1 and 2 together are simply
the reverse of our circuit from Lecture 2 which mapped the standard basis to the Bell basis.
In other words, Steps 1 and 2 in teleportation map the Bell basis back to the standard basis
— thus, Steps 1, 2, and 3 are simply measuring the first two qubits in the Bell basis, i.e. by
projectors B = {|Φ+ ⟩⟨Φ+ |, |Ψ+ ⟩⟨Ψ+ |, |Φ− ⟩⟨Φ− |, |Ψ− ⟩⟨Ψ− |}.
3.4 Revisiting Schrödinger’s cat
In Lecture 2, we introduced Schrödinger’s cat, which through the magic of superposition, man-
aged to be both alive and dead simultaneously, so long as the box containing the cat remained
sealed. This begged the question: What happens if one opens the box and looks inside? Surely,
one does not expect to find both a living and dead cat. The paradox here can finally be resolved
via the measurement postulate — the act of opening the box and looking at the cat is itself a
measurement. This measurement effectively collapses the superposition of dead and alive, leav-
ing simply a cat which is either dead or alive. Note here that the very act of looking at something
implies a measurement — this is because in order to “observe” the cat, photons must interact
with the cat, and subsequently reach your eyes. This interaction and subsequent processing of
the information contained in the photons by your brain “constitutes” a measurement.
4 No Cloning, Entanglement, and Density Matrices
“[Entanglement is] not just one of many traits, but the characteristic trait of quan-
tum physics. . . ”
— Erwin Schrödinger
The primary focus of the next two lectures is to further practice working with quantum states
and measurements, a natural avenue for which is the study of cloning, quantum entanglement,
and non-local games. In the process, we will run into a more general formalism for describing
quantum states, known as the density operator formalism. We begin in this lecture with cloning,
quantum entanglement, and density operators.
Suppose, for the sake of contradiction, that there exists a unitary U which clones arbitrary states, i.e. U(|ψ⟩ ⊗ |0⟩) = |ψ⟩ ⊗ |ψ⟩ for all |ψ⟩ ∈ C2. Then, for any states |ψ1⟩, |ψ2⟩,

|ϕ⟩ = U(|ψ1⟩ ⊗ |0⟩) = |ψ1⟩ ⊗ |ψ1⟩    and    |ϕ′⟩ = U(|ψ2⟩ ⊗ |0⟩) = |ψ2⟩ ⊗ |ψ2⟩.
What happens if we now take the inner product ⟨ϕ|ϕ′⟩? The middle terms of these equations yield

(⟨ψ1| ⊗ ⟨0|)U†U(|ψ2⟩ ⊗ |0⟩) = (⟨ψ1| ⊗ ⟨0|)(|ψ2⟩ ⊗ |0⟩) = ⟨ψ1|ψ2⟩⟨0|0⟩ = ⟨ψ1|ψ2⟩,

whereas the right-hand sides yield

(⟨ψ1| ⊗ ⟨ψ1|)(|ψ2⟩ ⊗ |ψ2⟩) = ⟨ψ1|ψ2⟩⟨ψ1|ψ2⟩ = (⟨ψ1|ψ2⟩)².
Combining these two, we have that ⟨ψ1 |ψ2 ⟩ = (⟨ψ1 |ψ2 ⟩)2 for some complex number ⟨ψ1 |ψ2 ⟩ ∈ C.
This is equivalent to saying c = c2 for c ∈ C, which has only two solutions — either c = 1 or
c = 0. In other words, unless |ψ1 ⟩ and |ψ2 ⟩ are the same vector (in which case ⟨ψ1 |ψ2 ⟩ = 1)
or are orthogonal (i.e. ⟨ψ1 |ψ2 ⟩ = 0), we have a contradiction. Thus, there does not exist a U
which can perform the mapping |ψ⟩ ⊗ |0⟩ 7→ |ψ⟩ ⊗ |ψ⟩ for arbitrary |ψ⟩.
Exercise 4.1. This section showed that cloning arbitrary states |ψ⟩ ∈ C2 is impossible. How-
ever, the proof technique failed when the states |ψ1 ⟩ and |ψ2 ⟩ are orthogonal. Indeed, it turns
out in this case cloning is possible. Give a quantum circuit which can clone states |0⟩ and |1⟩,
i.e. maps |0⟩|0⟩ 7→ |0⟩|0⟩ and |1⟩|0⟩ 7→ |1⟩|1⟩. How about a circuit which clones |+⟩ and |−⟩?
(Hint: Convert the +/− basis into the standard basis first, then apply your cloning circuit for
the standard basis.)
History. First, a bit of history. The term “entanglement” was coined in 1935 by physicist Erwin Schrödinger, who used the German word “Verschränkung”, which in colloquial “non-physicist” German means “folding of the arms”. As discussed in a previous lecture, the question of
whether entanglement truly exists has been a subject of intense debate. The famous Einstein-
Podolsky-Rosen paper of 1935, in particular, argued that quantum mechanics could not be a
complete physical theory due to its prediction of entangled states such as |Φ+ ⟩. In 1964, however,
physicist John Bell proposed what is now known as a “Bell inequality” or “Bell test”; this test
could in principle be run in a lab to confirm whether Nature indeed allows the strong type of
correlations between qubits which entanglement would allow. Recently, such tests have been
studied under the guise of “non-local games”, which is a topic we shall soon visit. Moreover, it
is important to note that we nowadays know that any quantum computation (on pure quantum
states) cannot achieve an exponential speedup over classical computers unless the amount of
quantum entanglement in the system is “large”. Thus, entanglement is generally regarded as
an important resource.
Bipartite entanglement. In this lecture, we study the notion of bipartite entanglement, mean-
ing entanglement between a pair of quantum systems. Let |ψ⟩ ∈ Cd1 ⊗ Cd2 be an arbitrary
bipartite state. How can we tell if |ψ⟩ is entangled? Strictly speaking, earlier we defined |ψ⟩ as
entangled if it cannot be written as the tensor product of two states |ψ1 ⟩ ∈ Cd1 and |ψ2 ⟩ ∈ Cd2 ,
i.e.
∀ |ψ1 ⟩ ∈ Cd1 , |ψ2 ⟩ ∈ Cd2 we have |ψ1 ⟩ ⊗ |ψ2 ⟩ ≠ |ψ⟩.
For example, let us confirm that |Φ+⟩ = (1/√2)(|00⟩ + |11⟩) is entangled. Again, we give a proof by contradiction. Suppose there exist |ψ1⟩ ∈ C2, |ψ2⟩ ∈ C2 such that |ψ1⟩ ⊗ |ψ2⟩ = |Φ+⟩. If we
write |ψ1 ⟩ = α1 |0⟩ + β1 |1⟩ and |ψ2 ⟩ = α2 |0⟩ + β2 |1⟩, we have
|ψ1 ⟩ ⊗ |ψ2 ⟩ = (α1 |0⟩ + β1 |1⟩) ⊗ (α2 |0⟩ + β2 |1⟩) = α1 α2 |00⟩ + α1 β2 |01⟩ + β1 α2 |10⟩ + β1 β2 |11⟩.
Since this is supposed to equal |Φ+⟩ = (1/√2)(|00⟩ + |11⟩), we must have α1β2 = β1α2 = 0 (since the standard basis vectors |00⟩, |01⟩, |10⟩, |11⟩ are orthogonal). Suppose, without loss of generality, that β2 = 0, so that α1β2 = 0. Then, we must also have β1 = 0 so that β1α2 = 0 (the other option is to set α2 = 0, but this would yield α2 = β2 = 0, i.e. |ψ2⟩ is not a unit vector). But if β1 = β2 = 0, then |ψ1⟩ = |0⟩ and |ψ2⟩ = |0⟩, so that |ψ1⟩ ⊗ |ψ2⟩ = |00⟩ ≠ |Φ+⟩! Thus, we have a contradiction, and so |Φ+⟩ is entangled.
The Schmidt decomposition. Rather than go through such proofs each time we wish to check
if |ψ⟩ is entangled, there is a more elegant tool we can use known as the Schmidt decomposition.
To keep our discussion simple, we shall restrict our attention to the setting of two qubits;
the ideas here extend straightforwardly to arbitrary local dimensions d1 and d2 . The Schmidt
decomposition states that any vector |ψ⟩ ∈ C2 ⊗ C2 can be written in the following special form:

|ψ⟩ = α1|a1⟩ ⊗ |b1⟩ + α2|a2⟩ ⊗ |b2⟩,

where the αi are real and satisfy αi ≥ 0, and where {|ai⟩}_{i=1}^2 and {|bi⟩}_{i=1}^2 are orthonormal bases for C2. The number of non-zero αi is called the Schmidt rank of |ψ⟩.
Exercise 4.2. Verify that |00⟩ and |Φ+⟩ = (1/√2)(|00⟩ + |11⟩) are already written in terms of Schmidt decompositions. What are the Schmidt ranks of these two states?
Observe now that if |ψ⟩ has Schmidt rank 1, then clearly it is not entangled, as its Schmidt
decomposition is in tensor product form. And this is no coincidence — it turns out that an
arbitrary state |ψ⟩ is entangled if and only if its Schmidt rank is strictly greater than 1.
In this course, we shall not be tasked with finding Schmidt decompositions of states. However,
like the spectral decomposition for matrices, when doing proofs it is often useful to know that
bipartite states |ψ⟩ can always be written in this way. Let us close by pointing out a subtle fact
about Schmidt decompositions. Note that any vector |ψ⟩ ∈ C2 ⊗ C2 can be written as

|ψ⟩ = α00|00⟩ + α01|01⟩ + α10|10⟩ + α11|11⟩.

This decomposition requires four terms (i.e. one per standard basis vector for C4). Part of the power of the Schmidt decomposition is that it manages to write this same state using just two terms.
Suppose someone prepares one of two states |ψ1⟩, |ψ2⟩ ∈ Cd, each with probability 1/2, and hands you the result. How can you describe the quantum state in your possession, given that you don’t know whether you actually have |ψ1⟩ or |ψ2⟩? This is done via the density operator
ρ = (1/2)|ψ1⟩⟨ψ1| + (1/2)|ψ2⟩⟨ψ2|.    (4.1)
More generally, if we play this game with m possible states |ψi⟩, each given with probability pi (i.e. Σi pi = 1 and pi ≥ 0), then the density operator describing your system is

ρ = Σ_{i=1}^m pi |ψi⟩⟨ψi|.
Such a state is called mixed because you don’t know with certainty which |ψi ⟩ you have in your
possession. Let us make two important observations here.
1. Mixtures vs. superpositions. A mixture of states Σi pi|ψi⟩⟨ψi| is entirely different than a superposition of states Σi αi|ψi⟩. For starters, the former is a sum of matrices, whereas
the latter is a sum of vectors. More importantly, the former models a state of ignorance
about which |ψi ⟩ we actually have in our possession — we know our system is in precisely
one such |ψi ⟩, but which one is unknown. In stark contrast, in a superposition, our system
is in all of the states |ψi ⟩ simultaneously.
2. Pure states. If there is only one state |ψi ⟩ in the mixture, i.e. pi = 1 for some i, then the
mixture simply reads ρ = |ψi ⟩⟨ψi |. This state is called pure and has rank 1. Conversely,
for any pure state |ψ⟩, its density matrix is the rank 1 operator ρ = |ψ⟩⟨ψ|.
Let us stress the first point above with a concrete example. The density matrix ρ = (1/2)|0⟩⟨0| + (1/2)|1⟩⟨1| is:

ρ = (1/2)(1 0; 0 0) + (1/2)(0 0; 0 1) = (1/2 0; 0 1/2) = I/2.

In contrast, the uniform superposition of |0⟩ and |1⟩ is the vector |+⟩ = (1/√2)(|0⟩ + |1⟩) = (1/√2, 1/√2)T. Note that the former is a matrix, while the latter is a vector, i.e. these are completely different objects!
Exercise 4.4. What is the density matrix for pure state |ψ⟩ = α|0⟩ + β|1⟩? Next, write down the 2 × 2 density matrix ρ = (1/3)|0⟩⟨0| + (2/3)|+⟩⟨+|.
Let us now generalize our discussion further. A linear operator ρ ∈ L(Cd) is called a density matrix if the following two properties hold:

• ρ is positive semi-definite, i.e. is Hermitian and has non-negative real eigenvalues, and

• ρ has unit trace, i.e. Tr(ρ) = 1.
Let us check that the density operators we have considered thus far satisfy these two properties.
For example, for ρ from Equation (4.1), we have
Tr(ρ) = Tr( (1/2)|ψ1⟩⟨ψ1| + (1/2)|ψ2⟩⟨ψ2| ) = (1/2)Tr(|ψ1⟩⟨ψ1|) + (1/2)Tr(|ψ2⟩⟨ψ2|) = (1/2)⟨ψ1|ψ1⟩ + (1/2)⟨ψ2|ψ2⟩ = 1.
As for the property of being positive semidefinite, we use the following fact: For any two positive
semi-definite matrices A and B and real numbers p, q ≥ 0, it holds that pA + qB is also positive
semi-definite. In particular, since both |ψ1 ⟩⟨ψ1 | and |ψ2 ⟩⟨ψ2 | are positive semi-definite (as you
will show in the exercise below), we have that ρ is also positive semi-definite. Thus, ρ is a valid
density operator.
Exercise 4.5. Why is |ψ⟩⟨ψ| positive semi-definite for any |ψ⟩? (Hint: Use the spectral de-
composition).
In fact, the requirements that Tr(ρ) = 1 and ρ be positive semidefinite allow us to recover exactly the interpretation of mixed states which this section started with — taking the spectral decomposition of ρ, we have

ρ = Σi λi |λi⟩⟨λi|.

Since ρ is positive semidefinite, we know λi ≥ 0 for all i. Moreover, recalling that Tr(ρ) = Σi λi, we know that since Tr(ρ) = 1, it holds that Σi λi = 1. Thus, {λi}i forms a probability distribution. This means that we can interpret ρ as follows: With probability λi, prepare state |λi⟩. This is precisely the idea we started this section with! We thus have a good picture of how one can describe probabilistic mixtures over pure quantum states.
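The two defining properties above are also easy to check numerically; the sketch below (ours, with an arbitrary example mixture) verifies unit trace, Hermiticity, and non-negativity of the spectrum.

import numpy as np

zero = np.array([1, 0])
plus = np.array([1, 1]) / np.sqrt(2)

# An example mixture: |0><0| with probability 1/4, |+><+| with 3/4.
rho = 0.25 * np.outer(zero, zero) + 0.75 * np.outer(plus, plus)

print(np.isclose(np.trace(rho), 1))                # unit trace
print(np.allclose(rho, rho.conj().T))              # Hermitian
print(np.all(np.linalg.eigvalsh(rho) >= -1e-12))   # eigenvalues >= 0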
Exercise 4.6. Why are the Pauli gates X, Y , and Z not density matrices?
The maximally mixed state. Finally, let us consider a very special density matrix in L(Cd ),
the maximally mixed state ρ = I/d (where I denotes the identity matrix). Note that ρ is
positive semidefinite since I is positive semidefinite (in fact, I has eigenvalues all equal to 1),
and Tr(I/d) = (1/d)Tr(I) = 1. Thus, ρ is a valid density operator. But what exactly does I/d represent? Here, we use the fact that for any orthonormal basis {|ψi⟩}_{i=1}^d for Cd, we have

Σ_{i=1}^d |ψi⟩⟨ψi| = I.
In other words, for any orthonormal basis {|ψi ⟩}di=1 , ρ represents the following state: Pick state
|ψi ⟩ with probability 1/d, and prepare |ψi ⟩. Since this holds for any basis, we conclude that
ρ gives us absolutely no information about which state |ψ⟩ we actually have — every state is
an eigenvector of ρ, and the eigenvalues of ρ form a uniform distribution. Thus, the maximally
mixed state represents the case where we know nothing about the state of our system! This
will be crucial when we discuss entanglement again below.
4.3.1 The partial trace operation
Our motivation in introducing density matrices stemmed from the fact that we did not know
how to describe the state of qubit 1 in the Bell pair |Φ+ ⟩ = √12 (|00⟩ + |11⟩). We claimed that
the answer lies in the framework of density matrices, but we have not yet prescribed how one
can use density matrices to describe the state of qubit 1 of |Φ+ ⟩. We now do this via the partial
trace operation.
Intuitively, given a bipartite density matrix ρAB on systems A and B, the partial trace
operation TrB (ρAB ) returns a density matrix on system A alone (analogously, TrA (ρAB ) returns
a density matrix on B alone). Formally, we have that TrB : L(Cd1 ⊗ Cd2) → L(Cd1) is a linear map.
How exactly is TrB defined? Recall that the trace of a matrix ρ ∈ L(Cd) is defined as

Tr(ρ) = Σi ρ(i, i) = Σ_{i=1}^d ⟨i|ρ|i⟩.
For the partial trace, we copy this idea, except we only apply it to one subsystem as follows: For ρ ∈ L(Cd1 ⊗ Cd2),

TrB(ρ) = Σ_{i=1}^{d2} (IA ⊗ ⟨i|) ρ (IA ⊗ |i⟩).
In other words, we leave system A untouched (hence the IA terms above), and “trace out”
system B. Note that like the trace, the partial trace is a linear map. Let us practice this
operation on a number of states, and in the process draw our desired connection to the Bell
state which we started this lecture with.
Example 1: Product states. Suppose first that ρ is the density matrix of a pure product state |ψ⟩ = |ψ1⟩ ⊗ |ψ2⟩. In other words,

ρ = |ψ⟩⟨ψ| = (|ψ1⟩ ⊗ |ψ2⟩)(⟨ψ1| ⊗ ⟨ψ2|) = |ψ1⟩⟨ψ1| ⊗ |ψ2⟩⟨ψ2|.
Recall that earlier we said the state of |ψ⟩ on qubit 1 is exactly |ψ1⟩. Let us confirm this by applying the partial trace to ρ to trace out subsystem B:

TrB(ρ) = TrB(|ψ1⟩⟨ψ1| ⊗ |ψ2⟩⟨ψ2|) = Σi (IA ⊗ ⟨i|)(|ψ1⟩⟨ψ1| ⊗ |ψ2⟩⟨ψ2|)(IA ⊗ |i⟩) = |ψ1⟩⟨ψ1| · Σi ⟨i|ψ2⟩⟨ψ2|i⟩ = |ψ1⟩⟨ψ1| · Tr(|ψ2⟩⟨ψ2|) = |ψ1⟩⟨ψ1|,

where the second equality applies the definition of the partial trace, the second-last equality uses the definition of the trace, and the last equality uses the cyclic property of the trace. In
other words, the state of qubit 1 is |ψ1 ⟩, as claimed! Note that the calculation above did not
use the fact that |ψ⟩ = |ψ1 ⟩ ⊗ |ψ2 ⟩ is pure — indeed, the same calculation yields that for any
operator ρ = ρ1 ⊗ ρ2 ,
TrB (ρ) = ρ1 · Tr(ρ2 ). (4.2)
Exercise 4.7. Show that for any ρ = ρ1 ⊗ ρ2 , TrA (ρ) = Tr(ρ1 ) · ρ2 . (Hint: Follow the chain of
equalities in the calculation above.)
Example 2: Separable states. We have said that a pure state |ψ⟩ ∈ Cd1 ⊗ Cd2 is not entangled,
or separable, if and only if |ψ⟩ = |ψ1 ⟩ ⊗ |ψ2 ⟩ for some |ψ1 ⟩ ∈ Cd1 and |ψ2 ⟩ ∈ Cd2 . This idea
extends to the setting of mixed states as follows: A bipartite density matrix ρ ∈ L(Cd1 ⊗ Cd2 )
is unentangled or separable if

ρ = Σi pi |ψi⟩⟨ψi| ⊗ |ϕi⟩⟨ϕi|,
for some (possibly non-orthogonal) sets of vectors {|ψi ⟩} ⊆ Cd1 and {|ϕi ⟩} ⊆ Cd2 , and where
the {pi } form a probability distribution. In other words, ρ is a probabilistic mixture of pure
product states. An example of a separable state is
ρ = (1/2)|0⟩⟨0| ⊗ |0⟩⟨0| + (1/2)|1⟩⟨1| ⊗ |1⟩⟨1|.    (4.3)
Since the partial trace is a linear map, and since we know that TrB (ρ1 ⊗ ρ2 ) = ρ1 · Tr(ρ2 ) = ρ1
for density matrices ρ1 , ρ2 , computing the partial trace of ρ for separable states is simple:
TrB(∑_i pi |ψi⟩⟨ψi| ⊗ |ϕi⟩⟨ϕi|) = ∑_i pi TrB(|ψi⟩⟨ψi| ⊗ |ϕi⟩⟨ϕi|) = ∑_i pi |ψi⟩⟨ψi| · Tr(|ϕi⟩⟨ϕi|) = ∑_i pi |ψi⟩⟨ψi|.
Let us make one further comment about mixed separable states. We saw earlier that it is
“easy” to check if a bipartite pure state |ψ⟩ ∈ Cd ⊗ Cd is entangled by determining its Schmidt
rank (we did not explicitly state this, but the Schmidt rank can be calculated in time polynomial
in the dimension d). In stark contrast, it turns out that determining whether a mixed state
ρ ∈ L(Cd ⊗ Cd ) is separable is NP-hard! This is another strong reminder that the settings of
pure versus mixed states are indeed very different; the latter is typically much more difficult to
work with.
Example 3: Pure entangled states. We finally arrive at the case we started this lecture with:
Pure entangled states such as |Φ+ ⟩. To compute the partial trace of any pure bipartite |ψ⟩, one
approach is to simply write out |ψ⟩ in the standard basis, take its density matrix ρ = |ψ⟩⟨ψ|,
and then compute TrB(ρ). For example, for |Φ+⟩, we have

TrB(|Φ+⟩⟨Φ+|) = (1/2) TrB((|00⟩ + |11⟩)(⟨00| + ⟨11|))
= (1/2) TrB(|00⟩⟨00| + |00⟩⟨11| + |11⟩⟨00| + |11⟩⟨11|)
= (1/2) TrB(|00⟩⟨00|) + (1/2) TrB(|00⟩⟨11|) + (1/2) TrB(|11⟩⟨00|) + (1/2) TrB(|11⟩⟨11|)
= (1/2) TrB(|0⟩⟨0| ⊗ |0⟩⟨0|) + (1/2) TrB(|0⟩⟨1| ⊗ |0⟩⟨1|) + (1/2) TrB(|1⟩⟨0| ⊗ |1⟩⟨0|) + (1/2) TrB(|1⟩⟨1| ⊗ |1⟩⟨1|)
= (1/2)|0⟩⟨0| · Tr(|0⟩⟨0|) + (1/2)|0⟩⟨1| · Tr(|0⟩⟨1|) + (1/2)|1⟩⟨0| · Tr(|1⟩⟨0|) + (1/2)|1⟩⟨1| · Tr(|1⟩⟨1|)
= (1/2)|0⟩⟨0| + (1/2)|1⟩⟨1|
= I/2,
where the third equality follows from the linearity of the partial trace, the fifth by Equation (4.2),
and the sixth by the cyclic property of the trace and since |0⟩ and |1⟩ are orthogonal. Thus, the
reduced state on qubit 1 for the Bell state is maximally mixed! In other words, it is a completely
random state about which we have zero information. In a similar fashion, one can show that
TrA (|Φ+ ⟩⟨Φ+ |) = I/2.
And here we arrive at one of the most confounding aspects of quantum mechanics: As in the case
of the Bell state, it is possible for us to know absolutely nothing about the states of qubits 1 and
2 individually (e.g. they have reduced states I/2), but when we bring both qubits together, we
know everything about their joint state (e.g. |Φ+ ⟩ is a pure state, i.e. there is no uncertainty).
This is yet another striking example of a quantum phenomenon which does not occur in the
classical world.
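Since the partial trace is a purely linear-algebraic operation, computations like the one above
are easy to check numerically. The following is a minimal sketch in Python (numpy assumed; an
illustration, not part of the formal development) implementing the definition of TrB and
verifying Examples 1 and 3:

import numpy as np

def partial_trace_B(rho, dA, dB):
    # Tr_B(rho) = sum_i (I_A (x) <i|) rho (I_A (x) |i>): reshape rho into a
    # 4-index tensor over (A, B, A', B') and contract the two B indices.
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

# Example 3: the Bell state |Phi+> reduces to the maximally mixed state I/2.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(partial_trace_B(np.outer(phi, phi.conj()), 2, 2))

# Example 1: a product state |psi1>|psi2> reduces to |psi1><psi1|.
psi1 = np.array([1j, 1]) / np.sqrt(2)
psi = np.kron(psi1, np.array([1, 0]))
print(partial_trace_B(np.outer(psi, psi.conj()), 2, 2))

The reshape trick works because a matrix on C^{dA} ⊗ C^{dB} is naturally a 4-index tensor, and
tracing out B amounts to contracting its two B indices.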
Recall from earlier that any bipartite pure state |ψ⟩ ∈ C^d ⊗ C^d admits a Schmidt decomposition
|ψ⟩ = ∑_i αi |ai⟩|bi⟩, where αi ≥ 0 and {|ai⟩} and {|bi⟩} are orthonormal sets. We also said that |ψ⟩ is entangled if
and only if its Schmidt rank (i.e. number of non-zero Schmidt coefficients αi ) is at least two.
However, we did not discuss how to actually find the Schmidt decomposition of |ψ⟩, i.e. we
have not specified a procedure for computing the Schmidt rank of |ψ⟩. The partial trace allows
us to solve this problem.
Exercise 4.10. Prove that the Schmidt rank of |ψ⟩ ∈ C^d ⊗ C^d equals the rank of ρA =
TrB(|ψ⟩⟨ψ|). You should use the fact that in the definition of the partial trace, the standard
basis {|i⟩} on B can be replaced by an arbitrary orthonormal basis, {|ψi⟩}, on B. (Hint:
Choose {|ψi ⟩} as the Schmidt basis for |ψ⟩ on system B.)
The exercise above immediately gives us a method for determining whether a bipartite pure
state |ψ⟩ is entangled — namely, compute ρA = TrB (|ψ⟩⟨ψ|), and check if rank(ρA ) > 1. If so,
|ψ⟩ is entangled; otherwise, it is a product state. Finally, observe that this trick only works for
pure states |ψ⟩ — indeed, for mixed bipartite ρAB , detecting entanglement is NP-hard, so such
a simple criterion for entanglement is highly unlikely to exist!
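As a numerical companion to this test (again a Python sketch, illustrative only), one can compute
the Schmidt rank directly from the singular values of the coefficient matrix of |ψ⟩, which by
Exercise 4.10 equals rank(ρA):

import numpy as np

def schmidt_rank(psi, dA, dB, tol=1e-10):
    # Write psi = sum_{ij} M[i,j] |i>|j>; the Schmidt coefficients of psi
    # are the singular values of M, so the Schmidt rank is the rank of M
    # (equivalently, by Exercise 4.10, the rank of rho_A = Tr_B(|psi><psi|)).
    M = psi.reshape(dA, dB)
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > tol))

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)                        # Bell state
prod = np.kron(np.array([1, 0]), np.array([1, 1]) / np.sqrt(2))  # |0>|+>
print(schmidt_rank(phi, 2, 2))    # 2, hence entangled
print(schmidt_rank(prod, 2, 2))   # 1, hence a product state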
5 Non-Local Games
“I don’t demand that a theory correspond to reality because I don’t know what it is.
Reality is not a quality you can test with litmus paper. All I’m concerned with is
that the theory should predict the results of measurements.”
— Stephen Hawking
5.1 Background
In the past two lectures, we have discussed measurements and entanglement. We now combine
these two topics to discuss one of the most fundamental questions in quantum theory: Does
entanglement as we understand it actually exist in Nature? Recall that in 1935, Einstein,
Podolsky and Rosen (EPR) argued that quantum mechanics could not be a complete physical
theory due to its prediction of states such as the Bell state |Φ+⟩ = (1/√2)(|00⟩ + |11⟩). For example,
as we will discuss shortly, the Bell state appears to allow superluminal (i.e. faster than the speed
of light) communication, which is impossible by the theory of relativity. Thus, EPR suggested
that there must be more to Nature than what quantum theory prescribes — namely, that there
must be “hidden variables” which contain extra information missing in quantum mechanics
about what is actually happening in the subatomic world. In other words, the moon is there
even if you do not look at it — it’s just that we “do not have access” to the hidden variables
prescribing the state of the moon. Moreover, this information must be “local”, in the sense
that instantaneous communication between variables should not be possible. Such theories are
hence called local hidden variable theories.
Remarkably, in 1964, John Bell proved that no local hidden variable theory could ever repro-
duce the statistics predicted by quantum mechanics! Thus, either local hidden variables theories
are wrong, or quantum mechanics is wrong. In fact, Bell went a step further — he proposed
a “simple” experiment which could be run in a lab to test which of these two cases represents
reality. This experiment was based on (what is now called) a Bell inequality, and was run for
example by the celebrated effort of Aspect, Grangier, and Roger in 1981, who wrote about their
findings:
“Our results, in excellent agreement with the quantum mechanical predictions, strongly
violate the generalized Bell’s inequalities, and rule out the whole class of realistic
local theories.”
— Aspect, Grangier, Roger, PRL 1981.
Thus, quantum mechanics, and not a local hidden variable theory, appears to be correct, and
Bell states as we understand them really do exist. 1
1
For completeness, however, it should be mentioned that these experiments are not iron-clad results — it
turns out there are conceivable “loopholes” in such experimental setups which could explain the outcomes
of such experiments without ruling out local hidden variable theories. Nevertheless, over the years improved
experimental setups have managed to close some of these “loopholes”, and it is probably fair to say that the
general physics community regards this at least as strong evidence that local hidden variable theories are
insufficient to model the subatomic world.
The aim of this lecture is to discuss an equivalent, but more “computer science-oriented”
version of Bell inequalities known as non-local games. In the process, we will further practice
working with measurements, and see another instance of how the non-local nature of entangle-
ment can be harnessed in a computational task to perform a classically impossible feat.
5.3 Non-local games
As we just saw in Section 5.2, measuring half of a Bell state does not allow one to communicate
information from Alice to Bob. Surprisingly, though, the story has only just begun — one can
nevertheless use local measurements on both halves of a Bell state to generate a probability
distribution which contains correlations stronger than those possible classically. It is precisely
this principle which is harnessed in non-local games.
The setup of a non-local game is as follows. Suppose you and a friend are charged with
doing too much homework on school property, and taken to police headquarters, where you
are separated into two different rooms (i.e. you cannot communicate with one another). An
interrogator takes turns asking each of you a question, and subsequently compares and cross-
checks your answers to verify that you are both telling the truth. It turns out that if you and
your friend happened to share two halves of a Bell state before being taken to the police station,
then by performing appropriate local measurements on your qubits based on the interrogator's
questions, you can sometimes convince the interrogator of your honesty with higher probability
than you could hope to do so without the use of entanglement. Let us demonstrate this via the
concrete example of the CHSH game.

The CHSH game. In the CHSH game, the interrogator sends Alice a question qA ∈ {0, 1} and Bob a
question qB ∈ {0, 1}, each chosen uniformly at random. Without communicating, Alice responds
with a bit rA ∈ {0, 1} and Bob with a bit rB ∈ {0, 1}. Alice and Bob win if and only if
rA ⊕ rB = qA ∧ qB; in other words, their answers must agree unless qA = qB = 1, in which case
they must disagree.
Exercise 5.1. Suppose the questions satify qA = 0 and qB = 1. What possible pairs of answers
(rA , rB ) allow Alice and Bob to win? What if we have qA = qB = 1?
Exercise 5.2. Suppose Alice and Bob choose to always output the same bit they receive, i.e.
rA = qA and rB = qB. What is the winning probability of this strategy?
Finally, you may wonder whether Alice and Bob can classically do better if they use a ran-
domized strategy for selecting answers. It turns out the answer is no, as a randomized strategy
may be viewed as an average over all deterministic strategies. Thus, they might as well choose
the optimal deterministic strategy.
A quantum strategy
We now show that if Alice and Bob share the Bell pair |Φ+⟩ before the game starts, then by
performing appropriate local measurements based on their questions qA and qB, they can win the
game with probability cos²(π/8) ≈ 0.854 (compared to 3/4 in the classical case).
This is equivalent to saying the original CHSH inequality is violated by quantum mechanics,
meaning a local hidden variable theory cannot explain the measurement statistics predicted by
quantum mechanics (though we will not discuss these details here). To model Alice and Bob’s
strategy, we introduce the concept of an observable: given a projective measurement M = {Πi}
whose outcomes are labelled by real numbers i ∈ S, the corresponding observable is the Hermitian
operator C = ∑_{i∈S} i · Πi.
For example, consider a measurement in the standard basis {|0⟩, |1⟩}, where the outcomes are
labelled with set S = {1, −1}, respectively. Then, the corresponding observable is (for Pauli
operator Z)

C = 1 · |0⟩⟨0| + (−1) · |1⟩⟨1| = |0⟩⟨0| − |1⟩⟨1| = Z.
Exercise 5.3. Consider measurement M = {|+⟩⟨+|, |−⟩⟨−|} with outcome labels S = {1, −1}.
What is the corresponding observable?
Observables are useful in that they allow us to quickly calculate the expected value of a mea-
surement. Recall that for a random variable X distributed over set Y, the expected value of X
is defined as

E[X] = ∑_{y∈Y} Pr(X = y) · y,
and captures what value X takes on average. Similarly, the outcome of a measurement M =
{Πi } can be modelled by a random variable X over possible outcome labels i ∈ S, and we can
ask what E[X] is. For a measurement on density operator ρ, this is given by the formula
E[X] = ∑_{i∈S} Pr(X = i) · i = ∑_{i∈S} Tr(ρΠi) · i = Tr(ρ ∑_{i∈S} Πi · i) = Tr(ρC)

for observable C = ∑_{i∈S} i · Πi, and where the third equality follows by linearity of the trace.
Exercise 5.4. Suppose we measure in the standard basis {|0⟩, |1⟩} ⊆ C² with corresponding
measurement labels {1, −1}, respectively. Convince yourself that the corresponding observable
is Pauli Z. What is the expected value of measuring density operator ρ = (2/3)|0⟩⟨0| + (1/3)|+⟩⟨+|
with observable Z?
Alice and Bob’s strategy. With observables in hand, we state Alice’s and Bob’s strategy. First,
let us change the encoding of their output bits. Namely, instead of outputting rA , rB ∈ {0, 1}, in
their measurements Alice and Bob use label 1 to mean output 0, and label −1 to mean output
bit 1. Then, conditioned on questions qA and qB , Alice and Bob use observables AqA and BqB
as follows:
A0 = Z A1 = X B0 = H B1 = ZHZ, (5.1)
for H the Hadamard gate. Note that all four of these observables have eigenvalues in set
S = {1, −1}, and so they can be thought of as measurements in their respective eigenbases with
outcomes labelled by S.
Exercise 5.5. Intuitively, which bases do Alice’s two measurements A0 and A1 correspond to?
Calculating the success probability. At first glance, it is likely unclear why such a strategy should
be interesting at all. Let us first calculate the success probability of this strategy to demonstrate
that it does work, and subsequently give an intuitive understanding of why it works.
First, we claim that for arbitrary observables with spectral decompositions A = |a0 ⟩⟨a0 | −
|a1 ⟩⟨a1 | and B = |b0 ⟩⟨b0 |−|b1 ⟩⟨b1 |, the quantity Tr(A⊗B|Φ+ ⟩⟨Φ+ |) encodes the probability that
Alice and Bob output the same bits minus the probability they output different bits, assuming
they measure using A and B. To see this, we have
Tr(A ⊗ B|Φ+ ⟩⟨Φ+ |) = Tr((|a0 ⟩⟨a0 | − |a1 ⟩⟨a1 |) ⊗ (|b0 ⟩⟨b0 | − |b1 ⟩⟨b1 |)|Φ+ ⟩⟨Φ+ |)
= Tr(|a0 ⟩⟨a0 | ⊗ |b0 ⟩⟨b0 | · |Φ+ ⟩⟨Φ+ |) − Tr(|a0 ⟩⟨a0 | ⊗ |b1 ⟩⟨b1 | · |Φ+ ⟩⟨Φ+ |) −
Tr(|a1 ⟩⟨a1 | ⊗ |b0 ⟩⟨b0 | · |Φ+ ⟩⟨Φ+ |) + Tr(|a1 ⟩⟨a1 | ⊗ |b1 ⟩⟨b1 | · |Φ+ ⟩⟨Φ+ |)
= Pr(output 00) − Pr(output 01) − Pr(output 10) + Pr(output 11)
= Pr(output same bits) − Pr(output different bits).
We conclude that for pairs of questions qA qB ∈ {00, 01, 10} (i.e. Alice and Bob should output
the same bit), the term Tr(A ⊗ B|Φ+ ⟩⟨Φ+ |) denotes the probability Alice and Bob win minus
the probability they lose. Similarly, for the question pair qA qB = 11 (i.e. their answers should
disagree), the analogous quantity is −Tr(A ⊗ B|Φ+⟩⟨Φ+|). It follows that the probability
that Alice and Bob win minus the probability they lose is given by
Pr(qA qB = 00) · Tr((A0 ⊗ B0 )|Φ+ ⟩⟨Φ+ |) + Pr(qA qB = 01) · Tr((A0 ⊗ B1 )|Φ+ ⟩⟨Φ+ |) +
Pr(qA qB = 10) · Tr((A1 ⊗ B0 )|Φ+ ⟩⟨Φ+ |) − Pr(qA qB = 11) · Tr((A1 ⊗ B1 )|Φ+ ⟩⟨Φ+ |)
= (1/4) Tr((A0 ⊗ B0 + A0 ⊗ B1 + A1 ⊗ B0 − A1 ⊗ B1)|Φ+⟩⟨Φ+|),   (5.2)
where the factor of 1/4 appears because each question pair qA qB appears with probability 1/4.
A direct calculation now yields that for our choices of Ai and Bi ,
Tr((A0 ⊗ B0)|Φ+⟩⟨Φ+|) = Tr((A0 ⊗ B1)|Φ+⟩⟨Φ+|) = Tr((A1 ⊗ B0)|Φ+⟩⟨Φ+|) = 1/√2,   (5.3)

whereas Tr((A1 ⊗ B1)|Φ+⟩⟨Φ+|) = −1/√2.
[Figure 5.6: The measurement bases {|a0⟩, |a1⟩} of Alice's observables A0 and A1 (top left and
top right) and {|b0⟩, |b1⟩} of Bob's observables B0 and B1 (bottom left and bottom right),
drawn in the real plane spanned by |0⟩ and |1⟩. Alice's basis for A0 is the standard basis and
for A1 its 45° rotation; Bob's bases for B0 and B1 are rotated by 22.5° on either side.]
Exercise 5.6. Verify that Tr((A0 ⊗ B0)|Φ+⟩⟨Φ+|) = 1/√2 and Tr((A1 ⊗ B1)|Φ+⟩⟨Φ+|) = −1/√2
for Ai and Bi as chosen in Equation (5.1).
Let p be the probability with which Alice and Bob win with this strategy. Equations (5.2)
and (5.3) tell us that the probability of winning minus losing, p − (1 − p) = 2p − 1, equals 1/√2.
Hence, p = 1/2 + 1/(2√2) = cos²(π/8) ≈ 0.854, which is strictly better than the optimal clas-
sical winning probability of 0.75, as claimed.
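These numbers are straightforward to verify numerically. Here is a short Python sketch (numpy
assumed; an illustration rather than part of the notes) which builds the observables of
Equation (5.1) and recovers Equation (5.3) and the winning probability:

import numpy as np

Z = np.array([[1, 0], [0, -1]])
X = np.array([[0, 1], [1, 0]])
H = (X + Z) / np.sqrt(2)                   # Hadamard
A = [Z, X]                                 # Alice's observables A0, A1
B = [H, Z @ H @ Z]                         # Bob's observables B0, B1

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)  # Bell state |Phi+>
rho = np.outer(phi, phi)

# Correlations Tr((A_i (x) B_j) |Phi+><Phi+|) of Equation (5.3)
c = [[np.trace(np.kron(A[i], B[j]) @ rho) for j in (0, 1)] for i in (0, 1)]
print(np.round(c, 4))                      # three entries +0.7071, one -0.7071

# Equation (5.2): winning minus losing probability, then p itself
bias = (c[0][0] + c[0][1] + c[1][0] - c[1][1]) / 4
print((1 + bias) / 2, np.cos(np.pi / 8) ** 2)   # both ~ 0.8536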
Intuition behind Alice and Bob’s strategy. Why does this strategy work? To see this, we plot the
eigenbases for each observable used (i.e. the measurement bases) in Figure 5.6. Each eigenvector
is denoted |a0 ⟩ or |a1 ⟩ for Alice (|b0 ⟩ or |b1 ⟩ for Bob), corresponding to Alice outputting 0 or 1,
respectively. To begin, intuitively, the Bell state has the special property that if Alice and Bob
measure in “similar” bases on their respective qubits, then they should get the same outcome
with high probability, whereas if they measure with “very different” bases, then they will get
opposite outcomes with high probability. In other words, we want to choose observables A0 ,
A1 , B0 , and B1 such that for questions qA qB ∈ {00, 01, 10}, Alice and Bob measure in bases
which are “close” (since their output bits should match), and for qA qB = 11, they measure in
bases which are “far apart” (since their outputs should differ). Figure 5.6 depicts exactly this
scenario. For example, for qA qB = 00, the plots depicting Alice and Bob’s measurement bases
are the top-left and bottom-left, respectively. Here we see that |a0 ⟩ for Alice has high overlap
with |b0 ⟩ for Bob (similarly for |a1 ⟩ and |b1 ⟩), so they are likely to obtain the same outcome.
Conversely, for qA qB = 11 (i.e. top-right and bottom-right plots), |a0 ⟩ and |b0 ⟩ are almost
orthogonal (similarly for |a1⟩ and |b1⟩), meaning it is unlikely that Alice and Bob's outputs
will agree.
5.3.2 The magic square game

The magic square game is based on a 3 × 3 grid of binary variables xij ∈ {0, 1}, where i indexes
rows and j indexes columns, together with six parity constraints:

xi1 ⊕ xi2 ⊕ xi3 = 0   for each row i ∈ {1, 2, 3},   (5.4)
x1j ⊕ x2j ⊕ x3j = 0   for columns j ∈ {1, 2},   (5.5)
x13 ⊕ x23 ⊕ x33 = 1   for the third column,   (5.6)

where ⊕ denotes the XOR, i.e. addition modulo 2. It turns out that it is impossible to fill out
this square so that all six of these equations are satisfied.
Exercise 5.7. Show that there do not exist assignments to all xij ∈ {0, 1} satisfying all six
equations above. (Hint: Add all the equations to achieve a contradiction.)
Making a game of the magic square. We can turn the magic square into a non-local game between
parties Alice and Bob as follows. As her question, Alice receives uniformly at random one of the
rows or columns of the square, where we index the rows and columns by qA ∈ {1, 2, 3, 4, 5, 6}.
Suppose row/column qA contains variables x, y, z. Then, Alice must return assignments for x,
y, and z — denote her answers xA , yA , zA . As for Bob’s question, we now choose uniformly one
of x, y, z, and ask Bob to provide an assignment for it. For example, Bob might be asked to
provide an assignment for y, which we would denote yB . We say Alice and Bob win the game
if xA, yA, zA satisfy the parity constraint for the corresponding row or column, and if Bob's
answer matches that of Alice (i.e. yB = yA in our example).
Exercise 5.8. Suppose Alice is asked to provide an assignment for row 1, i.e. x11 , x12 , x13 , and
Bob is asked to provide an assignment for x13 . What response of Alice and Bob can win the
game? What response will lose the game?
Because the magic square cannot be filled out perfectly (i.e. Equations (5.4), (5.5), (5.6)
cannot be simultaneously satisfied), it is not difficult to argue that no classical strategy of Alice
and Bob can win this non-local game with probability 1. There is, however, a perfect quantum
strategy, and as for the CHSH game, it relies on the non-local nature of entanglement.
A perfect quantum strategy. As for the CHSH game, we begin by switching from basis {0, 1}
for encoding responses to {1, −1}, respectively. (As a result, our observables will have eigenvalues
in {1, −1}.) Note that this translates an expression of the form x ⊕ y ⊕ z = b for x, y, z, b ∈ {0, 1}
to xyz = (−1)^b for x, y, z ∈ {1, −1}, i.e. the XOR condition converts to a multiplication.
For example, with this “change of basis”, 0 ⊕ 1 ⊕ 0 = 1 converts to (1)(−1)(1) = (−1)1 . Also
as for CHSH, before the game starts, Alice and Bob share an entangled state between them; in
this case, we use a higher dimensional analogue of the Bell state,
|ψ⟩ = (1/2)(|00⟩ + |11⟩ + |22⟩ + |33⟩) ∈ C⁴ ⊗ C⁴,
where recall {|0⟩, |1⟩, |2⟩, |3⟩} is an orthonormal basis for C4 . Now recall Alice will receive three
cells x, y, z in the magic square (from a single row or column) for which she has to provide
values in {1, −1}, and Bob will receive one of x, y, or z, for which he must provide a value in
{1, −1}. The following chart shows which measurement Alice or Bob should apply to their half
of |ψ⟩ depending on which cell(s) they receive questions for:
Z ⊗ I    I ⊗ Z    Z ⊗ Z
I ⊗ X    X ⊗ I    X ⊗ X      (5.7)
Z ⊗ X    X ⊗ Z    Y ⊗ Y
For example, if Alice receives row 1 as her question, she measures observables Z ⊗ I, I ⊗ Z, and
Z ⊗ Z on her half of |ψ⟩. Then, if Bob receives the top middle cell as his question, he measures
I ⊗ Z on his half of |ψ⟩. Since all observables above have eigenvalues in {1, −1}, all of Alice
and Bob’s responses will also be in {1, −1}.
Why does this work? Suppose Alice gets the first row as a question. Then, she measures
according to Z ⊗ I, I ⊗ Z, and Z ⊗ Z, obtaining 3 values from set {1, −1}, each of which
corresponds to one of the cells in row 1. The expected value for the product of these (since
recall switching to the {1, −1} output encoding converted our constraints to multiplication of
variables) is given by

⟨ψ| ((Z ⊗ I)(I ⊗ Z)(Z ⊗ Z)) ⊗ IB |ψ⟩ = ⟨ψ| (I ⊗ I) ⊗ IB |ψ⟩ = 1,

since Z² = I. As the product of Alice's three outcomes lies in {1, −1} and has expectation 1, it
must equal 1 with certainty.
In other words, Alice always outputs a triple from set {1, −1}^3 with product 1 when she is
asked row 1, and hence answers her question correctly. (Aside: Note that each operator above,
e.g. Z ⊗ I, acts entirely on Alice’s 4-dimensional half of |ψ⟩.) More generally, multiplying out
the observables in any row and in the first two columns similarly yields the matrix I ⊗ I — this
means Alice always outputs values with product 1 in these cases, as desired. As for the last
column, the observables multiply out to −I ⊗ I, meaning Alice outputs values with product −1,
again as desired.
Exercise 5.9. Show that multiplying the observables in column 3 above yields −I ⊗ I.
How about Bob — will his measurement result match Alice’s? Note that by definition, Alice
and Bob both perform the same measurement for a given cell on their respective halves of |ψ⟩.
For example, if Alice is asked row 1 and Bob is asked the top-middle cell, then both of them
measure I ⊗ Z on their respective halves of |ψ⟩. Here, we use a nice fact:
Fact 5.10. For |ψ⟩ = (1/√d) ∑_{i=1}^d |ii⟩ and any observable A, we have (A ⊗ I)|ψ⟩ = (I ⊗ A^T)|ψ⟩.
Using this fact, the expected value for the product of Alice and Bob's measurements for the
top-middle cell is

⟨ψ| (I ⊗ Z) ⊗ (I ⊗ Z) |ψ⟩ = ⟨ψ| ((I ⊗ Z)(I ⊗ Z)) ⊗ IB |ψ⟩ = ⟨ψ| IA ⊗ IB |ψ⟩ = 1,

where the first equality applies Fact 5.10 (noting (I ⊗ Z)^T = I ⊗ Z) and the second holds
since Z² = I. In other words, their outputs always agree. A similar analysis applies for any of
the nine cells in the magic square. We conclude that Alice and Bob win the magic square game
with certainty.
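For the skeptical reader, the claimed algebraic properties of Table (5.7) can be checked in a
few lines; the following Python sketch (numpy assumed, illustrative only) verifies the
row/column products and the commutation within each row and column:

import numpy as np

I2, X = np.eye(2), np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])
k = np.kron

M = [[k(Z, I2), k(I2, Z), k(Z, Z)],        # Table (5.7)
     [k(I2, X), k(X, I2), k(X, X)],
     [k(Z, X), k(X, Z), k(Y, Y)]]
cols = [list(c) for c in zip(*M)]

# Rows and first two columns multiply to I (x) I; column 3 to -I (x) I.
print([np.allclose(r[0] @ r[1] @ r[2], np.eye(4)) for r in M])
print([np.allclose(c[0] @ c[1] @ c[2], s * np.eye(4))
       for c, s in zip(cols, (1, 1, -1))])

# Observables within any single row or column pairwise commute.
print(all(np.allclose(P @ Q, Q @ P) for line in M + cols for P in line for Q in line))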
Finally, we mention an important point, but one which we will not dwell on at this point in
the course. Recall that in general, measuring a quantum state disturbs it, and so the order in
which a sequence of measurements is performed is important. There is an exception to this rule,
though — if the observables corresponding to the measurements all commute, then the order
in which the measurements are performed does not matter. In the strategy above, the three
observables within any given row or column of the table pairwise commute — thus, e.g., when
Alice performs three measurements for a row on her half of |ψ⟩, the precise order of the
measurements does not matter. This ensures
the strategies above are well-defined.
Connections to solving systems of equations. Recall the magic square corresponds to the
following inconsistent system of equations (where each xij ∈ {1, −1}):

xi1 · xi2 · xi3 = 1   for each row i ∈ {1, 2, 3},
x1j · x2j · x3j = 1   for columns j ∈ {1, 2},
x13 · x23 · x33 = −1   for the third column.

The reason why the magic square game has a perfect quantum strategy is intuitively that
if we allow higher dimensional assignments, then the analogous system does have a solution!
Formally, let us work in L(C⁴), obtaining system

Mi1 Mi2 Mi3 = I ⊗ I   for each row i ∈ {1, 2, 3},   (5.11)
M1j M2j M3j = I ⊗ I   for columns j ∈ {1, 2},   (5.12)
M13 M23 M33 = −I ⊗ I   for the third column.   (5.13)

Exercise 5.11. Let Mij be given by the observable in row i and column j of Table 5.7. Show
that this choice of assignment satisfies the system in Equations (5.11), (5.12), and (5.13).
It follows immediately from the exercise above that regardless of which state |ψ⟩ Alice and
Bob share, Alice will always output a correct answer. The only reason we now require |ψ⟩ to
specifically be a high dimensional analogue of the Bell state is to apply Fact 5.10, which allows
Bob’s condition to always be satisfied. Thus, at the heart of the magic square game is the
idea that even if a system of equations has no solution over a low dimensional space, its high
dimensional analogue may nevertheless have a solution.
Closing thoughts

Let us close with two remarks. First, non-local games such as the CHSH and magic square games,
together with the experiments implementing them,
demonstrate that, indeed, quantum states can give rise to correlations strictly stronger than
those possible with classical states. Alternatively, we can state this as: Assuming experimental
loopholes can be closed, Einstein was wrong, and the non-local nature of entanglement appears
to, in fact, be real.
Second, in recent years, non-local games such as the CHSH game have been exploited with
great success in areas of quantum information research such as device independent cryptography,
randomness expansion, and computational complexity theory. The first of these areas, for
example, roughly studies the concept of reliable cryptography using quantum devices, even if
those devices may have been tampered with. At the heart of some of these advances are “rigidity
theorems”. These state, roughly, that the optimal strategy for the CHSH game is robust or
rigid — if one wants to obtain a success probability for CHSH which is close to optimal (as
opposed to exactly optimal), then one must use a strategy which is “close” to the one discussed
here (for an appropriate notion of “close”).
6 Entropy and Entanglement Distillation
“Sea water is rendered potable by evaporation; wine and other liquids can be submit-
ted to the same process, for, after having been converted into vapours, they can be
condensed back into liquids.”
— Aristotle (writing about distillation)
The theme of the last two lectures has been of a quantum information theoretic nature — we
have studied cloning (or rather, lack thereof), entanglement, and non-local correlations. Before
progressing to our next main theme of quantum algorithms, we now give a brief taste of more
advanced ideas in quantum information. In the process, we will continue getting used to working
with quantum states in both the state vector and density operator formalisms.
The main questions we ask in this lecture are the following:

• How can we quantify the amount of entanglement in a quantum state?

• Under what conditions can “less entangled” states be “converted” to “more entangled”
states?
The first question will require the foundational concept of entropy, introduced in the context of
classical information theory by Claude Shannon in 1948. The entropy is worthy of a lecture in
itself, being an extremely important quantity. The second question above is tied to the discovery
of entanglement distillation, in which, similar to the age-old process of distilling potable water
from salt water (or more fittingly for our analogy, “pure” water from “dirty” water), one can
“distill” pure entanglement from noisy entanglement.
6.1 Entropy
One of the most influential scientific contributions of the 20th century was the 1948 paper of
Claude Shannon, “A Mathematical Theory of Communication”, which single-handedly founded
the field of information theory. Roughly, the aim of information theory is to study information
transmission and compression. For this, Shannon introduced the notion of entropy, which
intuitively quantifies “how random” a data source is, or the “average information content” of
the source. It turns out that a quantum generalization of entropy will be vital to quantifying
entanglement; as such, we begin by defining and motivating the classical Shannon entropy.
Formally, for a random variable X taking value xi with probability Pr(xi), the Shannon entropy
is defined as

H(X) := −∑_i Pr(xi) log(Pr(xi)),   (6.1)

where here and throughout, log denotes the base-2 logarithm.

Motivation. Before getting our hands dirty understanding H(X), let us step back and motivate
it via data compression. Suppose we have a data source whose output we wish to transmit from
(say) Germany to Canada. Naturally, we may wish to first compress the data, so that we need
to transmit as few bits as possible between the two countries. Furthermore, a compression
scheme is useless unless we are later able to recover or uncompress the data in Canada. This
raises the natural question: How few bits can one transmit, so as to ensure recovery of the data
on the receiving end? Remarkably, Shannon’s noiseless coding theorem says that this quantity
is given by the entropy! Roughly, the theorem says that in order to reliably transmit N i.i.d.
(independently and identically distributed) random variables Xi from a random source X, it is
necessary and sufficient to instead send roughly N · H(X) bits of communication, i.e. H(X) bits
per copy of X.
Getting our hands dirty. We now explore the sense in which H(X) indeed quantifies the
“randomness” or “uncertainty” of X by considering two boundary cases. In the first boundary
case, X models a fair coin flip, i.e. X takes value HEADS or TAILS with probability 1/2 each.
Then,
H(X) = −(1/2) log(1/2) − (1/2) log(1/2) = 1/2 + 1/2 = 1.   (6.2)
Therefore, we interpret a fair coin as encoding, on average, one bit of information. Alternatively,
in the information transmission setting, we would need to transmit a single bit to convey the
answer of the coin flip from Germany to Canada. This intuitively makes sense — since the
outcome of the coin flip is completely random, there is no way to guess its outcome in Canada
with success probability greater than 1/2 (i.e. a random guess).
The second boundary case is treated in the exercise below.
Exercise 6.1. Suppose random variable Y models a biased coin, e.g. takes value HEADS and
TAILS with probability 1 and 0, respectively. What is H(Y )?
In this example, there is no “uncertainty”; we know the outcome will be HEADS. Thus, in the
communication setting, one can interpret this as saying zero bits of communication are required
to transmit the outcome of the coin flip from Germany to Canada (assuming both Germany
and Canada know the probabilities of obtaining HEADS and TAILS beforehand). Indeed, the
answer to the exercise above is H(Y ) = 0.
Exercise 6.2. Let random variable Z take values in set {0, 1, 2} with probabilities {1/4, 1/4, 1/2},
respectively. What is H(Z)?
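For concreteness, the following small Python sketch (illustrative, not part of the notes)
computes H for the distributions discussed above:

import numpy as np

def shannon_entropy(p):
    # H(X) = -sum_i Pr(x_i) log2 Pr(x_i), with the convention 0 log 0 = 0.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(shannon_entropy([1/2, 1/2]))        # fair coin: 1.0, as in Equation (6.2)
print(shannon_entropy([1, 0]))            # the biased coin of Exercise 6.1
print(shannon_entropy([1/4, 1/4, 1/2]))   # the distribution of Exercise 6.2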
Detour: Deriving the formula for entropy. The entropy formula is odd-looking; to understand
how it arises, the key observation is the intuition behind the coin flip examples, which says that
“when an event is less likely to happen, it reveals more information”. To capture this intuition,
Shannon started with a formula for information content I(xi ), which for any possible event xi
for random variable X, is given by
I(xi) = log(1/Pr(xi)) = −log(Pr(xi)).   (6.3)
Since the log function is strictly monotonically increasing (i.e. log x > log y if x > y for
x, y ∈ (0, ∞)), it holds that I(xi) captures the idea that “rare events yield more information”. But
I(x) also has three other important properties we expect of an “information measure”; here are
the first two:
1. (Information is non-negative) I(x) ≥ 0 , and
2. (If an event occurs with certainty, said occurrence conveys no information) if Pr(x) = 1,
then I(x) = 0.
For the third important property, we ask — why did we use the log function? Why not any
other monotonically increasing function satisfying properties (1) and (2) above? Recall that,
by definition, two random variables X and Y are independent if

Pr(X = x and Y = y) = Pr(X = x) · Pr(Y = y)   for all values x and y.
Moreover, if X and Y are independent, then intuitively one expects the information conveyed
by the joint event z := (X = x and Y = y) to be additive, i.e. I(z) = I(x) + I(y). But this is
precisely what the information content I(x) satisfies, due to its use of the log function, as you
will now verify.
Exercise 6.3. Let X and Y be independent random variables. Then, for z := (X = x and Y =
y), express I(z) in terms of I(x) and I(y).
We can now use the information content to derive the formula for entropy — H(X) is simply the
expected information content over all possible events {x1, . . . , xn}. (Recall here that for random
variable X taking values x ∈ {xi}, its expectation is given by E[X] = ∑_i Pr(xi) · xi.) This
is precisely why, at the start of this section, we referred to H(X) as the average information
content of a data source.
Recall next that several classes of operators in L(C^d) can be viewed as generalizing familiar
sets of numbers:

• Hermitian operators, Herm(C^d), which generalize the real numbers.

• Positive semidefinite operators, Pos(C^d), which generalize the non-negative real numbers.

• Orthogonal projection operators, Π(C^d), which generalize the set {0, 1}.
Note that Π(C^d) ⊆ Pos(C^d) ⊆ Herm(C^d), and that the notion of “generalize” above means the
eigenvalues of the operators fall into the respective class the operators generalize. (For example,
matrices in Pos(C^d) have non-negative eigenvalues.) Applying this same interpretation to the
set of density operators acting on C^d, D(C^d), we thus have that density operators generalize
the notion of a probability distribution. Indeed, any probability distribution can be embedded
into a diagonal density matrix, as you will now show.
Exercise 6.4. Let {pi}_{i=1}^d denote a probability distribution. Define diagonal matrix ρ ∈ L(C^d)
such that ρ(i, i) = pi. Show that ρ is a density matrix.
Since the eigenvalues λi(ρ) of a density operator ρ ∈ D(C^d) form a probability distribution,
the natural approach for defining a quantum entropy formula is to apply the classical Shannon
entropy to the spectrum of ρ:

S(ρ) := H({λi(ρ)}_{i=1}^d) = −∑_{i=1}^d λi(ρ) log(λi(ρ)).   (6.5)
Operator functions. It is important to pause now and take stock of what we have done in
defining S(ρ) in Equation (6.5): In order to apply a function f : R → R to a Hermitian matrix
H ∈ Herm(C^d), we instead applied f to the eigenvalues of H. Why does this “work”? Let us
look at the Taylor series expansion of f, which for e.g. f = e^x is (the series converges for all x)

e^x = ∑_{n=0}^∞ x^n/n! = 1 + x + x²/2! + x³/3! + ··· .   (6.6)

This suggests an idea — to define e^H, perhaps we can substitute H in the right hand side of
the Taylor series expansion of e^x:

e^H := I + H + H²/2! + H³/3! + ··· .   (6.7)
Indeed, this leads to our desired definition: to generalize the function f(x) = e^x to Hermi-
tian matrices, we apply f to the eigenvalues of H, as you will now show.

Exercise 6.5. Let H have spectral decomposition H = ∑_i λi |λi⟩⟨λi|. Show that in Equa-
tion (6.7),

e^H = ∑_i e^{λi} |λi⟩⟨λi|.
More generally, with operator functions in hand, the von Neumann entropy of Equation (6.5)
can equivalently be written

S(ρ) = −Tr(ρ log ρ).   (6.8)

Exercise 6.6. Verify that Equations (6.5) and (6.8) are equal.
Exercise 6.7. Let f (x) = x2 . What is f (X), for X the Pauli X operator? Why does this yield
the same results as multiplying X by itself via matrix multiplication?
Exercise 6.8. Let f(x) = √x. For any pure state |ψ⟩ ∈ C^d, define rank one density operator
ρ = |ψ⟩⟨ψ|. What is √ρ?

Exercise 6.9. What is √Z for Z the Pauli Z operator? Is it uniquely defined?
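The operator-function recipe — apply f to the eigenvalues in a spectral decomposition — is
equally direct to implement numerically. A minimal Python sketch (numpy assumed, illustrative
only):

import numpy as np

def operator_function(H, f):
    # For Hermitian H = sum_i lam_i |lam_i><lam_i|, returns
    # f(H) = sum_i f(lam_i) |lam_i><lam_i| (cf. Exercise 6.5).
    lam, V = np.linalg.eigh(H)
    return (V * f(lam)) @ V.conj().T

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
print(operator_function(Z, np.exp))        # e^Z = diag(e, 1/e)
print(operator_function(Z @ Z, np.sqrt))   # a square root of Z^2 = I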
Properties of the von Neumann entropy
Let us now see how properties of H(X) carry over to S(ρ). These will prove crucial in our
understanding of quantifying entanglement shortly.
Exercise 6.10. Prove that for any pure state |ψ⟩, S(|ψ⟩⟨ψ|) = 0.
There are other important properties of S(ρ) which we do not wish to focus on at present; for
completeness, however, let us briefly mention two more: (1) For arbitrary, possibly entangled,
bipartite mixed states ρAB, S(ρAB) ≤ S(ρA) + S(ρB) (subadditivity), and (2)
S(∑_{i=1}^n pi ρi) ≥ ∑_{i=1}^n pi S(ρi) for {pi}_{i=1}^n a distribution (concavity). Here, and
henceforth in this course, we use the shorthand

ρA := TrB(ρAB).   (6.9)
Quantifying entanglement. Let us return to a question raised earlier: Given a bipartite mixed
state ρAB ∈ D(C^d ⊗ C^d), is it
entangled? (Recall this means that ρAB cannot be written ρAB = ∑_i pi ρA,i ⊗ ρB,i as a convex
combination of product states.) Roughly, if one uses d to encode the size of the input (i.e. the
input is the entire d² × d² matrix representing ρAB), then deciding this question turns out to
be NP-hard. This directly implies that quantifying “how much” entanglement is in ρAB is also
NP-hard. However, there is a special case in which we can do both tasks efficiently — the case
of bipartite pure states |ψAB ⟩ ∈ Cd ⊗ Cd . It is here in which the von Neumann entropy plays a
role.
In fact, a previous lecture already discussed an efficient test for entanglement for |ψAB⟩ —
the latter is entangled if and only if

ρA = TrB(|ψAB⟩⟨ψAB|)   (6.10)

has rank at least 2. This, in turn, followed because the rank of ρA equals the Schmidt rank of
|ψAB⟩ (recall Exercise 4.10). However, we can say more. Suppose we have Schmidt decomposition
|ψAB⟩ = ∑_{i=1}^d si |ai⟩|bi⟩.   (6.11)
Then, intuitively, as with the example of a Bell pair, |ψAB ⟩ is “highly entangled” if all the
Schmidt coefficients si are approximately equal in magnitude, and |ψAB ⟩ is “weakly entangled”
if there exists a single si whose magnitude is approximately 1. Do we know of a function which
quantifies precisely this sort of behavior on the set {si }? Indeed, the entropy function! This
notion of si being “spread out” versus “concentrated” is highly reminiscent of our fair versus
biased coin flip example for the Shannon entropy. We can therefore use the von Neumann
entropy to define an entanglement measure E(|ψAB⟩) as

E(|ψAB⟩) := S(ρA) = S(TrB(|ψAB⟩⟨ψAB|)).   (6.12)
Exercise 6.15. Unlike the example of the fair coin, the Schmidt coefficients si of |ψAB⟩ are
not probabilities, but amplitudes (i.e. we do not have ∑_i si = 1, but rather ∑_i si² = 1). Show,
however, that the notion of probabilities is recovered in the formula S(ρA), i.e. show that the
eigenvalues of ρA are precisely the set {si²}_{i=1}^d, which does form a distribution.
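Combining the ideas above, E(|ψAB⟩) is simple to evaluate numerically via the Schmidt
coefficients; the following Python sketch (illustrative only, with hypothetical helper names)
makes the fair-versus-biased-coin analogy concrete:

import numpy as np

def E(psi, dA, dB):
    # E(|psi_AB>) = S(rho_A); by Exercise 6.15 the eigenvalues of rho_A
    # are the squared Schmidt coefficients s_i^2 of psi.
    s = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    p = s[s > 1e-12] ** 2
    return float(-np.sum(p * np.log2(p)))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)           # "fair coin" of amplitudes
prod = np.kron(np.array([1, 0]), np.array([1, 0]))   # fully "biased" amplitudes
print(E(bell, 2, 2))    # 1.0
print(E(prod, 2, 2))    # 0.0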
Finally, let us close this section with a natural question — does E(|ψAB ⟩) still measure
entanglement when its input is allowed to be a mixed state ρAB (as opposed to a pure state
|ψAB ⟩)? The answer is given in the following exercise.
Exercise 6.16. Define E(ρAB) := S(TrB(ρAB)) = S(ρA). Recall that the maximally mixed
state on two qubits is a product state, i.e. I/4 = I/2 ⊗ I/2. Show that E(I/4) = 1. Why does
this imply we cannot use E as an entanglement measure for bipartite mixed states?
6.3 Entanglement distillation
Now that we have a notion of how to quantify entanglement in pure states, we can become
greedy — under what circumstances is it possible to “increase” the amount of entanglement in
a composite quantum system? This is a highly non-trivial question, as fundamental communica-
tion tasks such as teleportation require highly entangled Bell pairs as a resource. Unfortunately,
experimentally producing such pure states is generally a difficult task due to noise from the en-
vironment. In other words, in a lab one is typically able to produce mixed states, as opposed
to pure states. Moreover, even if Alice could produce perfect Bell pairs in a lab on Earth,
when she sends half of a Bell pair to Bob on Mars, the transmitted qubit will again typically
be subject to noise, yielding a shared mixed state ρAB between Alice and Bob. Do Alice and
Bob have any hope of running the teleportation protocol given ρAB ?
Local Operations and Classical Communication (LOCC). To answer the question, it is im-
portant to first define the rules of the game. Since Alice and Bob are spatially separated, they
are not able to apply joint quantum operations to both systems A and B, e.g. they cannot apply
a non-factorizable unitary UAB ∈ U(C^d ⊗ C^d) to ρAB. However, they can apply local unitaries
and measurements, e.g. factorizable unitaries of the form UA ⊗ UB for UA, UB ∈ U(C^d) (i.e.
Alice locally applies UA, Bob locally applies UB). They can also pick up the phone and call one
another to transmit classical information. Taken together, this set of allowed operations is given
a name: Local Operations and Classical Communication (LOCC). The question is thus: Given
a shared mixed state ρAB , can Alice and Bob use LOCC to “purify” or “distill” Bell states out
of ρAB ? The answer is sometimes yes, and protocols accomplishing this are called distillation
protocols, as they recover “pure” entanglement from “noisy” or mixed state entanglement.
A simple distillation protocol. We shall discuss a simple distillation protocol, known as the
recurrence protocol. Given as input a mixed two-qubit state ρAB ∈ D(C² ⊗ C²), our aim is to
distill the Bell state known as the singlet, |Ψ−⟩ = (1/√2)(|01⟩ − |10⟩); note I ⊗ Y |Ψ−⟩ = i|Φ+⟩,
making it easy to convert one Bell state into the other for the teleportation scheme. There is
a required precondition for the protocol to work — the input state ρAB must have sufficient
initial overlap with |Ψ− ⟩, i.e.
F(ρAB) := ⟨Ψ−|ρAB|Ψ−⟩ > 1/2.   (6.13)
In other words, in transmitting half of |Ψ− ⟩ from Alice to Bob, the resulting mixed state ρAB
should not have deviated “too far” from |Ψ− ⟩. Henceforth, we shall use shorthand F to denote
F (ρAB ) for brevity.
Suppose Alice and Bob share two copies of ρAB ; let us label them ρA1 B1 and ρA2 B2 , where
Alice holds systems A1 , A2 , and Bob holds B1 , B2 . Each round of the distillation protocol
proceeds as follows.
1. (Twirling operation) Alice picks a Pauli operator U from set {I, X, Y, Z} uniformly at
   random, and communicates this choice to Bob. They each locally apply operator √U
   to ρAiBi for i ∈ {1, 2} (note that √U is defined using the notion of operator functions,
   introduced in Section 6.1.2), obtaining

   σAiBi := (√UA ⊗ √UB) ρAiBi (√UA† ⊗ √UB†).

   This random choice of Pauli and its subsequent local application is together called the
   twirling map Φ : D(C² ⊗ C²) → D(C² ⊗ C²), and is not a unitary map (due to the
   random choice over Pauli operators); nevertheless, it can clearly be implemented given
   the ability to flip random coins and apply arbitrary single qubit gates. (The formal
   framework for studying such operations is via Trace Preserving Completely Positive Maps,
   and is beyond the scope of this course.) The nice thing about the twirling operation is
   that, for any input ρAB, Φ(ρAB) can be diagonalized in the Bell basis, i.e. can be written

   Φ(ρAB) = F |Ψ−⟩⟨Ψ−| + (1−F)/3 |Ψ+⟩⟨Ψ+| + (1−F)/3 |Φ+⟩⟨Φ+| + (1−F)/3 |Φ−⟩⟨Φ−|.   (6.14)
2. (Convert from |Ψ−⟩ to |Φ+⟩) Alice applies Pauli Y to her half of each state to obtain
   states:

   σAiBi = (1−F)/3 |Ψ−⟩⟨Ψ−| + (1−F)/3 |Ψ+⟩⟨Ψ+| + F |Φ+⟩⟨Φ+| + (1−F)/3 |Φ−⟩⟨Φ−|.   (6.15)

   This shifts most of the weight in Φ(ρAiBi) from |Ψ−⟩ to |Φ+⟩, since F > 1/2 by Equa-
   tion (6.13).
3. (Application of CNOT gates) Alice applies a CNOT gate with qubit A1 as the control
and A2 as the target. Bob does the same for B1 and B2 .
4. (Local measurements) Alice and Bob each locally measure A2 and B2 in the standard
   basis, obtaining outcomes a and b in {0, 1}, respectively. They pick up the phone to
   compare their measurement results a and b. If a = b, they keep the remaining composite
   system on A1B1, denoted σ′A1B1. Otherwise, if a ≠ b, they throw out all systems and start
   again.

5. (Convert back from |Φ+⟩ to |Ψ−⟩) If the round succeeded, Alice applies Pauli Y to her
   half of σ′A1B1, inverting the relabelling of Step 2.
To get a better sense of this protocol in action, let us apply it to a concrete example. Suppose
Alice sends half of the singlet to Bob, and along the way, the state |Ψ− ⟩ is injected with some
completely random noise, denoted by the identity matrix:
ρAB = (2/3)|Ψ−⟩⟨Ψ−| + (1/12)I = (3/4)|Ψ−⟩⟨Ψ−| + (1/12)|Ψ+⟩⟨Ψ+| + (1/12)|Φ+⟩⟨Φ+| + (1/12)|Φ−⟩⟨Φ−|,   (6.16)
where the second equality follows by recalling the identity matrix diagonalizes in any basis,
including the Bell basis. (The noise-inducing channel above is formally known as the depolarizing
channel in quantum information theory.)
Exercise 6.17. Show that √Z ⊗ √Z maps |Φ+⟩ to |Φ−⟩ and vice versa. Using the additional
identities that √X ⊗ √X maps |Φ+⟩ to |Ψ+⟩ and vice versa, and √Y ⊗ √Y maps |Φ−⟩ to |Ψ+⟩
and vice versa, show that the twirling map leaves ρAB in Equation (6.16) invariant.
Exercise 6.18. Show that applying YA ⊗ I to ρAB in Step 2 yields

σAB = (1/12)|Ψ−⟩⟨Ψ−| + (1/12)|Ψ+⟩⟨Ψ+| + (3/4)|Φ+⟩⟨Φ+| + (1/12)|Φ−⟩⟨Φ−|.   (6.17)
Finally, Steps 3 and 4 are a bit messier. The intuition here is that the CNOT entangles σA1 B1
with σA2 B2 , and the measurement forces the bits in the second system (formerly in state σA2 B2 )
to match; via the entanglement just created, this increases the weight in Equation (6.17) on the
terms where bits match, i.e. |Φ+ ⟩⟨Φ+ | and |Φ− ⟩⟨Φ− |. Thus, the final Step 5 will yield a state
with higher overlap on the desired singlet state |Ψ− ⟩.
To run through the full technical analysis for Steps 3 and 4 would be tedious, so we analyze
one term of the computation for brevity. Before Step 3 is run, Alice and Bob share state
σA1B1 ⊗ σA2B2 ∈ D(C² ⊗ C² ⊗ C² ⊗ C²),   (6.18)
where recall Alice holds qubits A1 , A2 , and Bob holds B1 , B2 . Since each σAi Bi has 4 terms in its
mixture, the tensor product in Equation (6.18) has 16 terms. By linearity, Step 3 then applies
gates CNOTA1 A2 and CNOTB1 B2 to each of these 16 terms, where our notational convention is
that CNOT12 has qubit 1 as the control and qubit 2 as the target. Let us analyze one of these
16 terms: |Φ+ ⟩⟨Φ+ |A1 B1 ⊗ |Φ+ ⟩⟨Φ+ |A2 B2 .
Finally, let us briefly state what this protocol buys us. A careful but tedious analysis yields
that with probability at least 1/4, this protocol maps the input state ρA1B1 ⊗ ρA2B2 to an output
state σ′A1B1 such that (for Fρ := F(ρA1B1))

F(σ′A1B1) = (Fρ² + (1/9)(1 − Fρ)²) / (Fρ² + (2/3)Fρ(1 − Fρ) + (5/9)(1 − Fρ)²).   (6.19)

So long as Fρ > 1/2, one can show F(σ′A1B1) > Fρ; thus, recursively applying this protocol
(using many pairs of input states ρAB) improves our overlap with our desired target state of
|Ψ−⟩.
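Equation (6.19) is easy to iterate numerically; the following Python sketch (illustrative only)
starts from the fidelity F = 3/4 of the example above and shows the fidelity climbing toward 1:

def next_fidelity(F):
    # Equation (6.19): output fidelity of one successful round of the
    # recurrence protocol, for input fidelity F > 1/2.
    num = F**2 + (1/9) * (1 - F)**2
    den = F**2 + (2/3) * F * (1 - F) + (5/9) * (1 - F)**2
    return num / den

F = 3/4   # fidelity of the noisy state in Equation (6.16)
for r in range(1, 6):
    F = next_fidelity(F)
    print(r, F)   # ~0.788, 0.827, 0.863, 0.896, 0.923: monotonically toward 1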
7 The Deutsch-Josza and Bernstein-Vazirani algorithms
“Computers are physical objects, and computations are physical processes. What
computers can or cannot compute is determined by the laws of physics alone. . .”
— David Deutsch
In the lectures thus far, we’ve introduced the postulates of quantum mechanics, and studied
them through the lens of quantum information theoretic concepts such as entanglement, non-
local games, and entropy. We now switch to the theme of quantum algorithms, i.e. algorithms
harnessing the four postulates of quantum mechanics. We begin with a simple quantum algo-
rithm due to David Deutsch, which is by no means new (it was discovered in 1985), nor does it
tackle a particularly important problem (in fact, the problem is quite artificial). Nevertheless,
Deutsch’s algorithm serves as an excellent proof of concept that, in certain settings, quantum
computers are strictly more powerful than classical ones. Moreover, as shown by Bernstein and
Vazirani, the generalization of this algorithm (dubbed the Deutsch-Josza algorithm) can actu-
ally be seen as solving a more natural problem — given a black-box function f : {0, 1}^n → {0, 1}
which computes f(x) = a · x mod 2 for some unknown a ∈ {0, 1}^n, what is a?
[Figure: The oracle gate Uf, with input qubits |x⟩ and |y⟩ mapped to outputs |x⟩ and |y ⊕ f(x)⟩.]
Here, Uf ∈ U((C²)^{⊗2}) is a unitary operator mapping |x⟩|y⟩ ↦ |x⟩|y ⊕ f(x)⟩ for any x, y ∈ {0, 1}
(i.e. |x⟩, |y⟩ denote standard basis states), and where ⊕ denotes XOR or addition modulo 2.
Note that by linearity, once we define the action of Uf on standard basis states, we immediately
know how it acts on any input state |ψ⟩ ∈ (C²)^{⊗2}.
Exercise 7.1. Let |ψ⟩ = α00 |00⟩ + α01 |01⟩ + α10 |10⟩ + α11 |11⟩. What is the state Uf |ψ⟩?
Observe now that Uf is reversible — this is because running Uf again on its output, |x⟩|y ⊕ f(x)⟩,
yields state |x⟩|y ⊕ f(x) ⊕ f(x)⟩ = |x⟩|y⟩, since f(x) ⊕ f(x) = 0 (adding the same bit twice
yields an even number, which is zero modulo 2). Second, note that we have not specified the inner
workings of Uf (i.e. we have not given a circuit implementing the functionality stated above);
in this course, we shall treat Uf as a “black box” or “oracle” which we presume we can run,
but cannot “look inside” to see its implementation details.
Exercise 7.2. Suppose f(0) = 1 and f(1) = 0. Is f constant or balanced? Give an example
of a constant f.
Quantum query complexity. As you may have noticed above, the “cost function” we are
interested in minimizing in solving Deutsch’s problem is the number of quantum queries to
Uf . This is an example of the model of quantum query complexity, in which many quantum
algorithms have been developed. In the study of quantum query complexity, one is given a black
box Uf implementing some function f , and asked what the minimum number of required queries
to Uf is in order to determine some desired property of f . Note that the quantum algorithm
computing this property can consist of (say) 999999999 quantum gates; if it contains only 2
queries to Uf , then we consider the cost of the algorithm as 2, i.e. all “non-query” operations
are considered free.
Consider now what happens if we query Uf on a superposition: applying Uf to input
(α|0⟩ + β|1⟩) ⊗ |0⟩ yields the state |ψ⟩ = α|0⟩|f(0)⟩ + β|1⟩|f(1)⟩.
Thus, we seem to have obtained both outputs of f with just a single query! Unfortunately,
things in life are rarely free, and this is certainly no exception — although we have both outputs
f (0) and f (1) in superposition, we cannot hope to extract both answers via measurement. In
particular, once we measure both registers in the standard basis, we will collapse to one of the
two terms in the superposition, effectively destroying the other term.
Exercise 7.3. Suppose one measures the first qubit of the output state |ψ⟩ (this qubit marks
which term in the superposition we have) with a standard basis measurement {|0⟩⟨0|, |1⟩⟨1|}.
Show that the probability of outcome 0 or 1 is |α|2 or |β|2 , respectively, and that in each case,
the state collapses to either |0⟩|f (0)⟩ or |1⟩|f (1)⟩, respectively. Thus, only one answer f (0) or
f (1) can be extracted this way.
Luckily, our goal is not to extract both f (0) and f (1) after a single query. Rather, we want
something possibly simpler: To evaluate the expression f (0) ⊕ f (1).
Exercise 7.4. Convince yourself that f is constant if f (0) ⊕ f (1) = 0 and f is balanced if
f (0) ⊕ f (1) = 1. Thus, Deutsch’s problem is equivalent to evaluating f (0) ⊕ f (1).
It turns out that by a clever twist of the naive approach above, we can indeed evaluate f (0)⊕f (1)
(without individually obtaining the values f (0), f (1)) via Deutsch’s algorithm.
[Figure: Deutsch's algorithm. Qubits q1 = |0⟩ and q2 = |1⟩ are each sent through a Hadamard
gate, then through Uf; finally, a Hadamard is applied to q1, which is measured in the standard
basis.]
It is not a priori obvious at all why this circuit should work, and this is indicative of designing
quantum algorithms in general — the methods used are often incomparable to known classical
algorithm design techniques, and thus developing an intuition for the quantum setting can be
very difficult. Let us hence simply crunch the numbers and see why this circuit indeed computes
f (0) ⊕ f (1), as claimed. Once we’ve done the brute force calculation, we will take a step back
and talk about the phase kickback trick, which is being used here, and which will allow for a
much simpler and somewhat more intuitive understanding of why the algorithm works.
As in previous lectures, let us divide the computation into 4 stages denoted by the quantum
state in that stage: At the start of the circuit (|ψ1 ⟩), after the first Hadamards are applied
(|ψ2 ⟩), after Uf is applied (|ψ3 ⟩), and after the last Hadamard is applied (|ψ4 ⟩). It is clear that
|ψ1⟩ = |0⟩|1⟩,
|ψ2⟩ = |+⟩|−⟩ = (1/2)(|0⟩|0⟩ − |0⟩|1⟩ + |1⟩|0⟩ − |1⟩|1⟩).

After the oracle Uf is applied, we have state

|ψ3⟩ = (1/2)(|0⟩|f(0)⟩ − |0⟩|1 ⊕ f(0)⟩ + |1⟩|f(1)⟩ − |1⟩|1 ⊕ f(1)⟩).
Before we apply the final Hadamard, it will be easier to break our analysis down into two cases:
When f is constant and when f is balanced.
Case 1: Constant f. By definition, if f is constant, then f(0) = f(1). Therefore, we can
simplify |ψ3⟩ to

|ψ3⟩ = (1/2)(|0⟩|f(0)⟩ − |0⟩|1 ⊕ f(0)⟩ + |1⟩|f(0)⟩ − |1⟩|1 ⊕ f(0)⟩)
     = (1/2)((|0⟩ + |1⟩) ⊗ |f(0)⟩ − (|0⟩ + |1⟩) ⊗ |1 ⊕ f(0)⟩)
     = (1/2)(|0⟩ + |1⟩) ⊗ (|f(0)⟩ − |1 ⊕ f(0)⟩)
     = (1/√2)|+⟩ ⊗ (|f(0)⟩ − |1 ⊕ f(0)⟩).

Thus, qubit 1 is now in state |+⟩. We conclude that

|ψ4⟩ = (1/√2)|0⟩ ⊗ (|f(0)⟩ − |1 ⊕ f(0)⟩),

i.e. qubit 1 is exactly in state |0⟩. Thus, measuring qubit 1 in the standard basis now yields
outcome 0 with certainty.
Case 2: Balanced f. By definition, if f is balanced, then f(0) ≠ f(1). Since f is a binary
function, this means f(0) ⊕ 1 = f(1) and equivalently f(1) ⊕ 1 = f(0). Therefore, we can
simplify |ψ3⟩ to

|ψ3⟩ = (1/2)(|0⟩|f(0)⟩ − |0⟩|f(1)⟩ + |1⟩|f(1)⟩ − |1⟩|f(0)⟩)
     = (1/2)((|0⟩ − |1⟩) ⊗ |f(0)⟩ − (|0⟩ − |1⟩) ⊗ |f(1)⟩)
     = (1/2)(|0⟩ − |1⟩) ⊗ (|f(0)⟩ − |f(1)⟩)
     = (1/√2)|−⟩ ⊗ (|f(0)⟩ − |f(1)⟩).

Thus, qubit 1 is now in state |−⟩. We conclude that

|ψ4⟩ = (1/√2)|1⟩ ⊗ (|f(0)⟩ − |f(1)⟩),

i.e. qubit 1 is exactly in state |1⟩. Thus, measuring qubit 1 in the standard basis now yields
outcome 1 with certainty.
Conclusion. If f is constant, the algorithm outputs 0, and if f is balanced, the algorithm
outputs 1. Thus, the algorithm decides whether f is constant or balanced, using just a single
query!
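The entire two-qubit circuit can be simulated in a few lines, which makes for a useful sanity
check; below is a Python sketch (numpy assumed; an illustration, not the lecture's formal
development):

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def U_f(f):
    # The oracle |x>|y> -> |x>|y XOR f(x)> as a 4x4 permutation matrix,
    # with basis ordering |00>, |01>, |10>, |11>.
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    state = np.kron([1, 0], [0, 1])       # |psi_1> = |0>|1>
    state = np.kron(H, H) @ state         # |psi_2> = |+>|->
    state = U_f(f) @ state                # |psi_3>
    state = np.kron(H, I2) @ state        # |psi_4>
    p1 = abs(state[2])**2 + abs(state[3])**2   # Pr(qubit 1 measures to 1)
    return "balanced" if p1 > 0.5 else "constant"

print(deutsch(lambda x: 0), deutsch(lambda x: x), deutsch(lambda x: 1 - x))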
The phase kickback trick. Consider applying Uf when the second qubit is initialized to |−⟩,
i.e. to input |x⟩|−⟩ for standard basis state |x⟩. This yields

|ψ⟩ = Uf (|x⟩ ⊗ (1/√2)(|0⟩ − |1⟩)) = |x⟩ ⊗ (1/√2)(|f(x)⟩ − |1 ⊕ f(x)⟩).

Now, there are two possibilities: Either f(x) = 0, or f(x) = 1. If f(x) = 0, the equation above
simplifies to
|ψ⟩ = |x⟩ ⊗ (1/√2)(|0⟩ − |1⟩) = |x⟩|−⟩,

i.e. the input state is unchanged by the action of Uf. If, on the other hand, f(x) = 1, we
instead have

|ψ⟩ = |x⟩ ⊗ (1/√2)(|1⟩ − |0⟩) = −|x⟩|−⟩,
i.e. a −1 phase factor is produced. We can summarize both these cases in a single equation:

Uf |x⟩|−⟩ = (−1)^{f(x)} |x⟩|−⟩.   (7.1)
Exercise 7.5. Convince yourself that the equation above indeed captures both the cases of
f (x) = 0 and f (x) = 1.
Reanalyzing Deutsch's algorithm using phase kickback. Let us return to the state in Deutsch's
algorithm just before Uf is applied, i.e.

|ψ2⟩ = |+⟩|−⟩ = (1/√2)(|0⟩|−⟩ + |1⟩|−⟩).

(Note that we have not expanded out the |−⟩ state as we did previously — this is because
with the phase kickback trick, we don't need to go to this level of detail!) By applying phase
kickback (Equation (7.1)), we know that after Uf is applied, we have state

|ψ3⟩ = (1/√2)((−1)^{f(0)} |0⟩|−⟩ + (−1)^{f(1)} |1⟩|−⟩).

Suppose now that f is constant, i.e. f(0) = f(1). Then, above we can factor out the −1 phase
factor to simplify |ψ3⟩ to

|ψ3⟩ = (−1)^{f(0)} (1/√2)(|0⟩|−⟩ + |1⟩|−⟩) = (−1)^{f(0)} |+⟩|−⟩.

Thus, applying the final Hadamard to qubit 1 yields

|ψ4⟩ = (−1)^{f(0)} |0⟩|−⟩.

Measuring the first qubit now yields outcome 0 with certainty, as before.
On the other hand, if f is balanced (i.e. f(0) ≠ f(1)), then we cannot simply factor out the
(−1) term as before! Thus, up to an overall factor of ±1, |ψ3⟩ can be written as

|ψ3⟩ = ±(1/√2)(|0⟩|−⟩ − |1⟩|−⟩) = ±|−⟩|−⟩.
Exercise 7.6. Verify the equation above by considering the two possible balanced functions:
f1 with f1(0) = 0 and f1(1) = 1, and f2 with f2(0) = 1 and f2(1) = 0.
Applying the final Hadamard to qubit 1 then yields |ψ4⟩ = ±|1⟩|−⟩, i.e. measuring qubit 1 yields
outcome 1 with certainty — recovering the balanced case, with far less computation than before.

The Deutsch-Josza problem generalizes Deutsch's problem from 1-bit to n-bit inputs: given
oracle access to f : {0, 1}^n → {0, 1} which is promised to be either constant (f takes the same
value on all inputs) or balanced (f outputs 0 on exactly half of its inputs and 1 on the other
half), decide which is the case.

[Figure 7.7: The Deutsch-Josza algorithm. The first n qubits are each initialized to |0⟩ and
the last qubit to |1⟩; Hadamards are applied to all n + 1 qubits, then Uf, then Hadamards to
the first n qubits, which are measured in the standard basis.]
Exercise 7.7. Give examples of balanced and constant functions on 2 bits. Can you give an
example of a 2-bit function which is neither constant nor balanced? Finally, can a single-bit
function be neither constant nor balanced?
It turns out that Deutsch’s algorithm generalizes in an easy manner to this setting; however, its
analysis is a bit more tricky, and crucially uses the phase kickback trick. In this more general
setting, note that we define the oracle Uf implementing f analogously to before: Uf |x⟩|y⟩ =
|x⟩|y ⊕ f (x)⟩, where now x is an n-bit string.
The circuit for the Deutsch-Josza algorithm is given in Figure 7.7. As before, each wire
denotes a single qubit. The first n qubits are initialized to |0⟩; these are the input qubits. The
final, i.e. (n + 1)st, qubit is initialized to |1⟩. Observe that the algorithm is the straightforward
generalization of Deutsch’s algorithm to the setting of n input qubits. We claim that using a
single query to Uf , the Deutsch-Josza algorithm can determine if f is constant or balanced. Let
us now see why this is so.
As before, we divide the computation into 4 stages denoted by the quantum state in that
stage: At the start of the circuit (|ψ1 ⟩), after the first Hadamards are applied (|ψ2 ⟩), after Uf
is applied (|ψ3⟩), and after the last Hadamards are applied (|ψ4⟩). It is clear that

|ψ1⟩ = |0⟩^{⊗n}|1⟩ and |ψ2⟩ = |+⟩^{⊗n}|−⟩.
Since we have defined the action of Uf in terms of the standard basis, i.e. Uf|x⟩|y⟩ = |x⟩|y ⊕
f(x)⟩, in order to understand how Uf applies to |ψ2⟩, we first need to rewrite |+⟩^{⊗n} in terms of
the standard basis. For this, note that

|+⟩^{⊗n} = (1/2^{n/2}) (|0⟩ + |1⟩) ⊗ ··· ⊗ (|0⟩ + |1⟩) = (1/2^{n/2}) ∑_{x∈{0,1}^n} |x⟩,

where the last equality holds since expanding the tensor products out yields 2^n terms in the
sum, each of which corresponds to a unique string x ∈ {0, 1}^n.
Exercise 7.8. Verify that |+⟩^{⊗3} = (1/(2√2)) ∑_{x∈{0,1}^3} |x⟩.
Now that we have the first register written with respect to the standard basis, we can analyze
the action of Uf using the phase kickback trick, obtaining state

|ψ3⟩ = (1/2^{n/2}) ∑_{x∈{0,1}^n} (−1)^{f(x)} |x⟩|−⟩.

Finally, we must apply the last set of Hadamard gates, which gives

|ψ4⟩ = (1/2^{n/2}) ∑_{x∈{0,1}^n} (−1)^{f(x)} (H^{⊗n}|x⟩)|−⟩.

To analyze this state, we first need to understand what H^{⊗n}|x⟩ equals for arbitrary x ∈ {0, 1}^n.
For this, we begin with a clean and formal way for writing the action of H on a single qubit.
Recall that H|0⟩ = |+⟩ and H|1⟩ = |−⟩. Equivalently, this means that for x1 ∈ {0, 1},

H|x1⟩ = (1/√2) ∑_{z1∈{0,1}} (−1)^{x1 z1} |z1⟩.
Exercise 7.9. Verify the statement above by considering the cases H|0⟩ and H|1⟩.
Now that we have a clean way of expressing H|x1⟩ for a single qubit |x1⟩, we can generalize this
to n-qubit states. Specifically, if we write string x = x1 ··· xn as |x1⟩ ⊗ ··· ⊗ |xn⟩, we have

H^{⊗n}|x⟩ = H|x1⟩ ⊗ ··· ⊗ H|xn⟩ = (1/2^{n/2}) ∑_{z∈{0,1}^n} (−1)^{x1 z1 + ··· + xn zn} |z⟩.
Can we simplify this expression further? There is one small last trick we can apply which will
clean it up a bit: Observe that x1 z1 + ··· + xn zn can be rewritten as the bitwise inner product
modulo 2 of strings x and z, i.e. x1 z1 + ··· + xn zn = x · z. (The mod 2 arises since the base is
(−1), so all we care about is whether the exponent x · z is even or odd.) Combining these facts, we
have that after the final Hadamards are applied, we have
|ψ4⟩ = (1/2^{n/2}) ∑_{x∈{0,1}^n} (−1)^{f(x)} (H^{⊗n}|x⟩)|−⟩
     = (1/2^{n/2}) ∑_{x∈{0,1}^n} (−1)^{f(x)} ((1/2^{n/2}) ∑_{z∈{0,1}^n} (−1)^{x·z} |z⟩) |−⟩
     = (1/2^n) ∑_{z∈{0,1}^n} ∑_{x∈{0,1}^n} (−1)^{f(x)+x·z} |z⟩|−⟩.   (7.2)
Finally, a measurement of the first n qubits of |ψ4 ⟩ is made in the standard basis. As for
Deutsch’s algorithm, a good way to analyze this is by individually considering the cases when
f is constant or balanced, respectively. The trick in both analyses will be to determine the
amplitude on the all zeroes state, |0⟩⊗n , in the first register.
Case 1: Constant f. Suppose first that f is constant. Then, we can factor out the term
(−1)^{f(x)} in |ψ4⟩, i.e.

|ψ4⟩ = (−1)^{f(x)} (1/2^n) ∑_{z∈{0,1}^n} ∑_{x∈{0,1}^n} (−1)^{x·z} |z⟩|−⟩.

Now consider the amplitude on |z⟩ = |0···0⟩, which is given by (up to an insignificant (−1)^{f(x)}
global phase out front)

(1/2^n) ∑_{x∈{0,1}^n} (−1)^{x·0···0} = (1/2^n) ∑_{x∈{0,1}^n} (−1)^0 = (1/2^n) ∑_{x∈{0,1}^n} 1 = (1/2^n) · 2^n = 1.
In other words, the state |0⟩^{⊗n}|−⟩ has amplitude 1. Since |ψ4⟩ is a unit vector, we conclude
that we must have |ψ4⟩ = (−1)^{f(x)} |0···0⟩|−⟩, i.e. all the weight is on this one term. Thus, if f
is constant, then measuring the first n qubits in the standard basis yields outcome |0···0⟩ with
certainty.
Case 2: Balanced f. In this case, we cannot simply factor out the (−1)^{f(x)} term, since f can
take on different values depending on its input x. However, it still turns out to be fruitful to
think about the amplitude on the state |z⟩ = |0···0⟩. This is given by

(1/2^n) ∑_{x∈{0,1}^n} (−1)^{f(x)+x·0···0} = (1/2^n) ∑_{x∈{0,1}^n} (−1)^{f(x)},
since z = 0. Since f is balanced, we know that for precisely half the terms in this sum, f (x) = 0,
and for the other half f (x) = 1. In other words, all the terms in this sum cancel, i.e. the sum
equals 0! We conclude that the amplitude on |z⟩ = |0 · · · 0⟩ is zero, and so we will never see
outcome |0 · · · 0⟩ in the final measurement.
Combining our observations for both cases, we find the following: When the final measurement
is completed, if the outcome is 0n , then we output “constant”; for any other n-bit measurement
result, we output “balanced”.
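For concreteness, here is a small numerical sketch (not part of the original analysis) which simulates the first register of |ψ_4⟩ in Python/numpy and reports the outcome probabilities; the oracle functions passed in at the bottom are illustrative choices of a constant and a balanced f.

    import numpy as np

    def dj_outcome_probs(f, n):
        # First register of |psi_3>: amplitude (-1)^f(x) / 2^(n/2) on |x>.
        N = 2 ** n
        psi3 = np.array([(-1) ** f(x) for x in range(N)]) / np.sqrt(N)
        # Entry (z, x) of H^{tensor n} is (-1)^(x.z) / 2^(n/2), where x.z is
        # the inner product mod 2, i.e. the parity of the bits of x AND z.
        H = np.array([[(-1) ** bin(x & z).count("1") for x in range(N)]
                      for z in range(N)]) / np.sqrt(N)
        psi4 = H @ psi3
        return np.abs(psi4) ** 2  # probability of measuring each outcome z

    n = 3
    print(dj_outcome_probs(lambda x: 0, n)[0])      # constant f: probability 1.0 on 0^n
    print(dj_outcome_probs(lambda x: x & 1, n)[0])  # balanced f: probability 0.0 on 0^n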
Classical algorithms for the Deutsch-Josza problem. Finally, you might wonder how classical algorithms compete with the Deutsch-Josza algorithm. For the case of deterministic classical algorithms, if f is balanced, then there are 2^n/2 inputs x yielding f(x) = 0 and 2^n/2 inputs x yielding f(x) = 1. Thus, in the worst case, an algorithm might be very unlucky and have its first 2^n/2 queries return value f(x) = 0, only to have query 2^n/2 + 1 return f(x) = 1. For this reason, deterministic algorithms have worst case query complexity 2^n/2 + 1. In this setting, the
Deutsch-Josza algorithm yields an exponential improvement over classical algorithms, requiring
just a single query to f .
However, one can also try a randomized classical algorithm. Here is a particularly simple one:
1. Repeat the following K times, for K a parameter to be chosen as needed:
   a) Pick x ∈ {0,1}^n uniformly at random.
   b) Call U_f to evaluate f(x).
   c) If f(x) differs from any previous query answer, then halt and output "balanced".
2. Output "constant".
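As a sketch (assuming query access to f as a Python function), the procedure above is just a few lines:

    import random

    def randomized_dj(f, n, K):
        seen = set()
        for _ in range(K):
            seen.add(f(random.randrange(2 ** n)))
            if len(seen) > 1:
                return "balanced"  # two differing values: f is provably not constant
        return "constant"          # may err only when f is balanced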
Exercise 7.10. Suppose we wish our randomized algorithm to have error probability at most
1/n. What should we set K to?
Finally, let us compare this to the Deutsch-Josza algorithm. Suppose that f is chosen to be constant with probability 1/2 and balanced with probability 1/2. Then, the Deutsch-Josza algorithm uses 1 query to determine if f is constant or balanced with certainty. On the other hand, if the randomized algorithm uses K ∈ O(1) queries, then its probability of success is 1 − O(2^{−K}).
Exercise 7.11. What is the probability of success for the randomized algorithm if it performs
just a single query, i.e. K = 1? How does this compare with the Deutsch-Josza algorithm?
7.5 The Bernstein-Vazirani algorithm

Note that this derivation was independent of the definition of f. This raises the question: For
what choices of f could measuring |ψ4 ⟩ yield interesting information? Since the exponent on
−1 contains x · z, a naive guess might be to consider linear functions f (x) = a · x, where
a, x ∈ {0, 1}n (or more generally affine functions f (x) = a · x + b for b ∈ {0, 1}). This seems a
natural guess, as it would allow us to factor out the a term. Specifically, plugging f (x) = a · x
into Equation (7.3) yields
|ψ_4⟩ = (1/2^n) Σ_{z∈{0,1}^n} Σ_{x∈{0,1}^n} (−1)^{(a+z)·x} |z⟩|−⟩. (7.4)
Thus, the probability of obtaining some outcome z when measuring the first register in the standard basis is (2^{−n} Σ_{x∈{0,1}^n} (−1)^{(a+z)·x})^2. Since our goal is to extract a, let us ask: What is the probability of observing z = a? This is given by (recall we are working modulo 2 in the exponent)
((1/2^n) Σ_{x∈{0,1}^n} (−1)^{(a+a)·x})^2 = ((1/2^n) Σ_{x∈{0,1}^n} (−1)^{0·x})^2 = ((1/2^n) Σ_{x∈{0,1}^n} 1)^2 = 1. (7.5)
In other words, all of the amplitude in |ψ4 ⟩ is concentrated on the |a⟩ term. Equivalently,
|ψ4 ⟩ = |a⟩|−⟩. We thus have that measuring the first register yields a with certainty. This
procedure is known as the Bernstein-Vazirani algorithm, which solves the problem: Given a
black-box linear function f (x) = a · x, determine a.
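As with Deutsch-Josza, the algorithm is easy to check numerically. The following sketch (illustrative, not from the lecture) builds the first register of |ψ_4⟩ for f(x) = a · x and confirms that all amplitude lands on |a⟩:

    import numpy as np

    def bernstein_vazirani(a, n):
        N = 2 ** n
        dot = lambda u, v: bin(u & v).count("1") % 2  # inner product mod 2
        # First register after phase kickback: amplitudes (-1)^(a.x) / 2^(n/2).
        psi = np.array([(-1) ** dot(a, x) for x in range(N)]) / np.sqrt(N)
        H = np.array([[(-1) ** dot(x, z) for x in range(N)]
                      for z in range(N)]) / np.sqrt(N)
        probs = np.abs(H @ psi) ** 2
        return int(np.argmax(probs))  # this outcome occurs with probability 1

    print(bernstein_vazirani(0b101, 3))  # prints 5, recovering a = 101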
Exercise 7.12. Is it possible to have z ̸= a in Equation (7.4)? In other words, does there exist
a term in the superposition with non-zero amplitude which satisfies z ̸= a?
Exercise 7.13. Given an affine function f (x) = ax + b, show how the Bernstein-Vazirani
algorithm allows one to extract a with a single quantum query.
1
We are slightly abusing this proverb here, in that it typically has a negative connotation, whereas here our
premise is that quantum computation outperforming classical computation is a good thing.
Interpretation via the Deutsch-Josza problem. To close the lecture, let us go full circle and
see what Bernstein-Vazirani teaches us about Deutsch-Josza. Suppose first that f (x) = b for
b ∈ {0, 1}, i.e. we have affine function f (x) = ax + b with a = 0 in the Bernstein-Vazirani
framework. Then, clearly f is constant in the Deutsch-Josza framework. In both algorithms
(which are the same algorithm), we hence obtain all-zeroes in the first register when we measure
|ψ4 ⟩.
Next, consider f (x) = ax + b for a ̸= 0. (By an exercise above, we can ignore b, as it simply
leads to a global phase in |ψ4 ⟩, which cannot be observed.) Since the case of a = 0 corresponded
to constant f , we now expect the a ̸= 0 case to correspond to balanced f . The following basic,
but important, fact in Boolean arithmetic confirms this, and is worth keeping in your toolkit
moving forward.
Exercise 7.14. Let f (x) = a·x+b mod 2 be an affine function with a ̸= 0. Then, for precisely
half the inputs x ∈ {0, 1}n , f (x) = 0, and on the other half of the inputs, f (x) = 1. In other
words, f is balanced.
8 Simon’s algorithm and applications to
cryptography
“Great things are not done by impulse, but by a series of small things brought to-
gether.”
— Vincent van Gogh.
In the previous lecture, we began our foray into quantum algorithms with the Deutsch, Deutsch-Josza, and Bernstein-Vazirani algorithms (the latter two were actually the same algorithm). Although an important proof of concept, the Deutsch-Josza algorithm did not give us a strong separation between classical and quantum query complexity, as recall a classical randomized algorithm can solve the Deutsch-Josza problem with only O(2^{−K}) probability of error, where K is the number of oracle queries.
In this lecture, we shall study Simon’s algorithm, which was the first to yield an exponential
separation in query complexity between the quantum and classical randomized settings. A pri-
ori, Simon’s problem will again appear artificial. First appearances, however, can be deceiving:
Not only has recent research shown that Simon’s algorithm can be used to break certain clas-
sical cryptosystems, but the algorithm was a crucial step towards inspiring Peter Shor in his
development of the celebrated quantum factoring algorithm.
In Simon's problem, we are given black-box access to a function f : {0,1}^n → {0,1}^n with the promise that there exists a hidden string s ∈ {0,1}^n such that f(x) = f(y) if and only if x ⊕ y ∈ {0^n, s}, where ⊕ denotes bitwise XOR; the goal is to output s.

Exercise 8.1. Suppose f : {0,1}^2 → {0,1}^2 is such that f(00) = 01, f(01) = 10, f(10) = 01, and f(11) = 10. What is s in this case?
A naive classical algorithm. Suppose we manage to find distinct inputs x and y satisfying f(x) = f(y). Then, by the promise, y = x ⊕ s, and so

0^n = x ⊕ x = (x ⊕ y) ⊕ s,

implying s = x ⊕ y due to our use of XOR. Thus, we are tasked with finding a collision, i.e. a pair of inputs yielding the same output. Let us be exceedingly naive — let us randomly
pick inputs x uniformly at random and compute f (x) until we obtain a collision. What is the
expected number of queries this requires if we wish to succeed with probability at least, say,
1/2? For this, think back to the last birthday party you attended — if there were N attendees
present, how large would N have to be before the odds that there existed two attendees with
the same birthday was at least 1/2? The answer is not ≈ 365, but rather N = 23, by the
Birthday Paradox. More generally, the latter says if there are D days in the year, a collision occurs with probability at least 1/2 if N ∈ O(√D). Thus, if we equate birthdays with query results f(x), we expect to make O(√(2^n)) queries before we find two inputs with the same output with probability at least 1/2.
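In code, the naive collision search might look as follows (a sketch assuming s ≠ 0^n, so that a collision exists and f is 2-to-1):

    import random

    def naive_collision_search(f, n):
        seen = {}  # maps previously seen outputs f(x) back to x
        while True:
            x = random.randrange(2 ** n)
            y = f(x)
            if y in seen and seen[y] != x:
                return seen[y] ^ x  # collision found: s = x XOR x'
            seen[y] = x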
Exercise 8.2. The naive algorithm above assumed s ≠ 0^n; how can we "complete" the algorithm so that with O(√(2^n)) queries, with probability at least 1/2, we output the correct s, even if s can equal 0^n?
The circuit C_n is as follows: each of the first n qubits is initialized to |0⟩ and receives a Hadamard both before and after U_f; each of the last n qubits is initialized to |0⟩ and enters U_f's second register directly, with no Hadamards.
Exercise 8.3. What are the differences between the Deutsch-Josza circuit and the one above?
Exercise 8.4. Does the circuit above use phase kickback? Why or why not, intuitively?
For clarity, when a query has n output bits in C_n, we define for f : {0,1}^n → {0,1}^n:

U_f |x⟩|y⟩ = |x⟩|y ⊕ f(x)⟩, where x, y ∈ {0,1}^n and ⊕ denotes bitwise XOR.

A recurring theme in quantum algorithms is that measuring a quantum state yields a sample from a probability distribution, and this distribution may in general be difficult to sample from classically. In Simon's algorithm, as we shall now see, each run of
Cn allows us to sample from the uniform distribution over strings
Y = {y ∈ {0, 1}n | s · y = 0}. (8.1)
The ability to sample from this distribution, coupled with classical polynomial-time processing,
will suffice to solve Simon’s problem.
Exercise 8.5. Suppose you are given the ability to sample from the uniform distribution over
Y in Equation (8.1). Why might this suffice to efficiently compute s classically?
Analysis of Cn . Let us now trace through the execution of each run of Cn . There are four
timesteps before the final measurement, whose states we denote |ψ_1⟩, |ψ_2⟩, |ψ_3⟩, |ψ_4⟩ ∈ (C^2)^{⊗2n}, respectively. Clearly, |ψ_1⟩ = |0⟩^{⊗2n} and |ψ_2⟩ = |+⟩^{⊗n}|0⟩^{⊗n}. Expanding |ψ_2⟩ in the standard basis and applying U_f, we have that

|ψ_3⟩ = (1/√(2^n)) Σ_{x∈{0,1}^n} |x⟩|f(x)⟩.
Normally, we would now apply the last set of Hadamards; however, it will be enlightening to
play a technical trick first. Observe that in the circuit diagram above, the Hadamards on the
first n qubits and the measurements on the last n qubits act on disjoint sets of qubits; thus,
these operations commute, and the order in which we apply them is irrelevant. Thus, let us first
measure the second register, obtaining some outcome y ∈ {0, 1}n . The key observation is that
the postmeasurement state on the first register must be consistent with y, i.e. the first register
can now only contain x ∈ {0, 1}n such that f (x) = y. But by our promise on f , we know
something about this — assuming s ̸= 0n , there are precisely two preimages x of y, namely x
and x ⊕ s! Our postmeasurement state, which we denote |ψ3′ ⟩, is hence:
|ψ_3′⟩ = (1/√2) (|x⟩ + |x ⊕ s⟩)|f(x)⟩.
This demonstrates one of the strengths of quantum computation — by postselecting on a mea-
surement outcome in the second register, we can collapse a superposition over the first register
onto some postmeasurement state which reveals some desired structure.
Exercise 8.6. We claimed above that assuming s ̸= 0n , there are precisely two preimages x of
y, namely x and x ⊕ s. In other words, f is a 2-to-1 function. Convince yourself that this claim
is correct.
Exercise 8.7. Assume s = 0n . Is f still 2-to-1? If not, then what type of function is f ?
We are now ready to apply the final set of Hadamards on the first register of |ψ_3′⟩. Recall from Lecture 7 that for any x ∈ {0,1}^n, H^{⊗n}|x⟩ = (1/√(2^n)) Σ_{y∈{0,1}^n} (−1)^{x·y}|y⟩, for · the inner product modulo 2. Thus,
|ψ_4⟩ = (1/√(2^{n+1})) ( Σ_{y∈{0,1}^n} (−1)^{x·y} |y⟩ + Σ_{y∈{0,1}^n} (−1)^{(x⊕s)·y} |y⟩ ) |f(x)⟩
      = (1/√(2^{n+1})) Σ_{y∈{0,1}^n} (−1)^{x·y} (1 + (−1)^{s·y}) |y⟩ |f(x)⟩,
65
where we have used the fact that (x ⊕ s) · y = x · y ⊕ s · y. Observe now that in the exponents,
s · y ∈ {0, 1} (since we are working modulo 2), and s · y = 1 leads to an amplitude of 0. Thus,
we really have state
|ψ_4⟩ = (1/√(2^{n−1})) Σ_{y : s·y=0} (−1)^{x·y} |y⟩|f(x)⟩,
i.e. an equal superposition (up to relative phase) of y in set Y (Equation (8.1)). Thus, measuring
the first register now allows us to sample uniformly at random from Y , as claimed.
Exercise 8.8. The prefactor 1/√(2^{n−1}) of |ψ_4⟩ suggests that |Y| = 2^{n−1}. Prove the latter statement holds.
Classical postprocessing. We now show how to use the ability to sample from Y to solve
Simon's problem. Repeating the circuit C_n a total of n − 1 times yields strings y_1, . . . , y_{n−1}, each of which
satisfies s · yi = 0. Thus we have a linear system (where s(i) denotes the ith bit of string s)
s(1)y_1(1) + s(2)y_1(2) + · · · + s(n)y_1(n) = 0
s(1)y_2(1) + s(2)y_2(2) + · · · + s(n)y_2(n) = 0
⋮ (8.2)
s(1)y_{n−1}(1) + s(2)y_{n−1}(2) + · · · + s(n)y_{n−1}(n) = 0, (8.3)
where all arithmetic is done over the finite field F2 with elements {0, 1}. By Gaussian elim-
ination, which works over any field, we can solve this system using a cubic number of arithmetic operations, i.e. in O(n^3) time, to obtain the possible solutions s. Here, an "arithmetic" operation means we
are counting each field operation (i.e. an addition, subtraction, multiplication, or division) as
costing one “unit of time”. (One can also consider the bit complexity of an algorithm, which
further includes how many operations over individual bits each field operation requires.) Note
that s = 0n is always a solution to the system in Equation (8.3), since the system is homoge-
neous (i.e. includes no constant terms). By the promise on f , we know that either s = 0n is
the only solution, or there is precisely one other s ̸= 0n , which is what we are interested in.
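For concreteness, here is a short Python sketch of this postprocessing step. It maintains an XOR basis for the sampled strings and then, purely for clarity, finds an orthogonal non-zero s by brute force; a real implementation would instead back-substitute, staying within O(n^3) field operations.

    def solve_simon_system(ys, n):
        # Maintain a basis of span{y_i} over F_2, encoding strings as ints.
        basis = []
        for y in ys:
            for b in basis:
                y = min(y, y ^ b)  # reduce y by b's pivot bit, if present
            if y:
                basis.append(y)
                basis.sort(reverse=True)
        # Return some non-zero s with s.y = 0 for all sampled y (if any).
        for s in range(1, 2 ** n):
            if all(bin(s & b).count("1") % 2 == 0 for b in basis):
                return s
        return 0  # only the trivial solution s = 0^n remains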
Probability of failure. Thus far, the algorithm we have described appears to succeed with
certainty. However, there is a catch to Gaussian elimination which we have neglected to mention
— to obtain a unique non-zero solution s (if it exists), we require the strings yi to be linearly
independent over F2 . Luckily, since we are sampling uniformly over Y , it can be shown that
the probability of y1 , . . . yn−1 being independent is at least 1/4. While this may seem a poor
success probability a priori, it is easy to boost this to something exponentially close to 1, as
you will now show.
Exercise 8.9. Show that if the probability of failure in each run of the algorithm is at most c for some 0 < c < 1, then with K runs of the algorithm, we can guarantee success with probability at least 1 − c^K.
Feistel networks. A common idea in theoretical cryptography is to use the simplest assump-
tions possible to build secure cryptographic primitives. One such primitive is the computation
of a random permutation, i.e. given an input x ∈ {0, 1}n , output π(x) for π a permutation
on n bits chosen uniformly at random. What do we need to implement such a primitive?
Intuitively, one might guess that if we assume the ability to at least perform random func-
tions f : {0, 1}n → {0, 1}n (not necessarily permutations), then this could be bootstrapped to
simulate random permutations.
Indeed, this is the aim of a Feistel network. Let us focus on the case of 3-round Feistel networks. Such a network takes inputs of the form (l, r) ∈ {0,1}^n × {0,1}^n, and each round i produces string (l_i, r_i) ∈ {0,1}^n × {0,1}^n, defined recursively as follows for i ∈ {1, 2, 3}:

(l_i, r_i) := (r_{i−1}, l_{i−1} ⊕ f_i(r_{i−1})), where (l_0, r_0) := (l, r). (8.4)

Here, the f_i : {0,1}^n → {0,1}^n denote the random functions, which we assume the ability
to perform. Classically, Feistel networks are provably cryptographically secure, which roughly
means given a black box U : {0, 1}2n → {0, 1}2n performing either a Feistel network or a truly
random permutation, one cannot distinguish in polynomial-time which of the two cases holds
(formal definitions are beyond the scope of this course). However, it turns out that if we assume
one can quantumly query U in superposition, then one can break the security of the 3-round
Feistel network efficiently. Specifically, in this setup, one can use Simon’s algorithm to efficiently
distinguish whether a given U is a Feistel network or a random permutation.
Exercise 8.10. For any fixed f : {0,1}^n → {0,1}^n, the action of one round of a Feistel network is given by map g_f(x, y) = (y, x ⊕ f(y)). Prove that g_f is a permutation on 2n bits. (Hint: Prove that g_f is a bijection.) Conclude that the 3-round network implements some random
2n-bit permutation. Why does this not immediately yield our desired primitive of generating a
“genuinely random permutation”? (Hint: What distribution over 2n-bit permutations does a
Feistel network sample from?)1
How to break a Feistel network. Let U : {0, 1}2n → {0, 1}2n be a black box which either
computes a 3-round Feistel network with internal random functions f1 , f2 , f3 : {0, 1}n → {0, 1}n ,
or a truly random permutation on 2n bits. Our task is to distinguish which of the two cases
holds.
To apply Simon's algorithm, our goal is to ideally define a function f satisfying

f(x) = f(y) if and only if x = y ⊕ s, for some fixed hidden string s, (8.5)

when U is a Feistel network. Let us begin by looking at just the first n bits of the output of
U (x, y), which define some function V : {0, 1}2n → {0, 1}n . In the case of a Feistel network,
note that Equation (8.4) implies
V (x, y) = y ⊕ f2 (x ⊕ f1 (y)).
1
We thank Or Sattath for suggesting this exercise.
Exercise 8.11. Verify the equation above.
To define f, we fix arbitrary distinct strings α, β ∈ {0,1}^n. The function f : {0,1} × {0,1}^n → {0,1}^n will take in two inputs, a control bit a, and an argument b, such that

f(a, b) = V(b, α) ⊕ α if a = 0, and f(a, b) = V(b, β) ⊕ β if a = 1.

In other words, depending on the control bit a, we feed either α or β as the second input to black box U, and subsequently XOR α or β, respectively, onto the first n bits of the output.
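To make the construction concrete, here is a small Python sketch with illustrative parameters; the lookup tables f1, f2, f3 stand in for the network's internal random functions, and the final assertion checks the mask identified in the analysis below.

    import secrets

    n = 8  # half-block width in bits (illustrative)
    f1, f2, f3 = ({x: secrets.randbelow(2 ** n) for x in range(2 ** n)}
                  for _ in range(3))

    def U(l, r):
        # Three Feistel rounds: (l, r) -> (r, l XOR f_i(r)), as in Equation (8.4).
        for fi in (f1, f2, f3):
            l, r = r, l ^ fi[r]
        return l, r

    alpha, beta = 0, 1  # arbitrary distinct strings

    def f(a, b):
        y = alpha if a == 0 else beta
        return U(b, y)[0] ^ y  # V(b, y) XOR y, with V the first n output bits

    # Check the mask s = (1, f1(alpha) XOR f1(beta)) on all inputs.
    s_bit, s_arg = 1, f1[alpha] ^ f1[beta]
    assert all(f(a, b) == f(a ^ s_bit, b ^ s_arg)
               for a in (0, 1) for b in range(2 ** n))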
Let us now see why f almost satisfies the promise of Simon's problem (i.e. Equation (8.5)) when U is a Feistel network. For this, we require the following exercise.

Exercise 8.12. Using the expression for V above, verify that f(0, b) = f_2(b ⊕ f_1(α)) and f(1, b) = f_2(b ⊕ f_1(β)).

With this exercise in hand, we claim that the reverse implication of Equation (8.5) holds, i.e. if x = y ⊕ s, then f(x) = f(y). To see this, define mask s = (1, f_1(α) ⊕ f_1(β)). Then, for any input (0, b) ∈ {0,1} × {0,1}^n (the case of input (1, b) is analogous),

f((0, b) ⊕ s) = f(1, b ⊕ f_1(α) ⊕ f_1(β)) = f_2(b ⊕ f_1(α) ⊕ f_1(β) ⊕ f_1(β)) = f_2(b ⊕ f_1(α)) = f(0, b).

In other words,

∃s ∈ {0,1}^{n+1} such that, for all inputs (a, b), f(a, b) = f((a, b) ⊕ s). (8.6)
Thus, the reverse implication of Equation (8.5) holds. Unfortunately, the forward implication
of Equation (8.5) does not in general hold.
Exercise 8.13. Why is it not necessarily true that if U is a Feistel network and if f (a, b) =
f (c, d), then (a, b) = (c, d) ⊕ s? (Hint: Recall the fact that in Simon’s problem, the function f
was 2-to-1 assuming s ̸= 0n . Is f above 2-to-1 in general?)
Luckily, it turns out Simon’s algorithm does not need this forward direction of Equation (8.5)
to hold.
We can now outline the algorithm for breaking the security of the Feistel network. For this,
note that if U is not a Feistel network, but rather a random permutation, then the odds of
there existing an s satisfying Equation (8.6) can be shown to be negligible. Thus, intuitively,
given black box access to U , we can run Simon’s algorithm to see if we can compute a mask s
satisfying Equation (8.6); if we succeed, we conclude U is a Feistel network, and if we fail, we
conclude U is a random permutation.
This outline is more formally stated below; its analysis is left as an exercise. As a first step
in the analysis, one needs that given black-box access to U , one can simulate black-box access
to f , which you will now show.
Exercise 8.14. Given black-box access to U , i.e. the ability to map input |x, y⟩ to |x, y ⊕U (x)⟩,
show how to simulate black box access to f , i.e. how to simulate mapping |u, y⟩ 7→ |u, y ⊕ f (u)⟩.
The algorithm of [SS17] (see also [KM10]) is as follows. It returns either FEISTEL or RAN-
DOM to denote its guess as to the identity of U .
1. Initialize empty set S, whose purpose will be to store samples of strings y obtained via Simon's algorithm.

2. (Collect samples) Run the quantum sampling subroutine of Simon's algorithm with oracle f (simulated via black-box access to U, as in Exercise 8.14) a total of O(n) times, adding each measured string y to S.

3. (Compute candidate mask) Via Gaussian elimination over F_2, compute a non-zero candidate s satisfying s · y = 0 for all y ∈ S; if no such s exists, output RANDOM.

4. (Check if s is correct) Pick input (a, b) ∈ {0,1} × {0,1}^n uniformly at random, and check if Equation (8.6) holds. If yes, output FEISTEL.

5. Output RANDOM.
9 The Quantum Fourier Transform
“. . . in mathematics you don’t understand things. You just get used to them.”
— John von Neumann.
It is hard to overstate the impact of the Fourier transform (FT) on our everyday lives. Did
you listen to an MP3 sound file this week? You used the Fourier Transform. Download JPEG
photos of your friends off the internet? You used the Fourier Transform. Get an MRI recently?
You guessed it — Fourier Transform. So what is the Fourier transform, and why is it so useful?
In a nutshell, the Fourier Transform allows us to efficiently decompose a signal (say, a sound
wave) into its constituent frequencies. So, for example, that MP3 file which compresses your
favorite song down to a few megabytes — it roughly works by applying the Fourier Transform to
break down the song into its constituent frequencies, and then removing or filtering out the fre-
quencies whose absence you are unlikely to notice (e.g. very high or low frequencies). Moreover,
what makes this so applicable in practice is that the Fourier transform has a speedy near-linear
time implementation. In particular, the Discrete Fourier transform (DFT) of a vector |ψ⟩ ∈ C^N can be implemented via the Fast Fourier Transform (FFT) algorithm using O(N log N) field operations over C. These two properties (signal decomposition and highly efficient implemen-
tation) are the one-two punch which makes the Fourier Transform so ubiquitous in everyday
life.
It is thus perhaps fitting that, in the quantum setting, the Quantum Fourier Transform
(QFT) underpins one of the greatest achievements of the field — Shor’s quantum factoring
algorithm. The primary focus of this lecture is hence to introduce the QFT, its implementation,
and various applications. With these concepts under our belt, we will be ready to tackle the
quantum factoring algorithm.
Now, we should be clear that broadly speaking, the topic of Fourier transforms is a bit
confusing (this is roughly what the quote atop this lecture is alluding to). There are, in fact, four
Fourier transforms (at least from the perspective of signal processing), depending on whether
the input signal is continuous or discrete, finite/periodic or infinite/aperiodic. These go by the
names of Fourier Series (continuous, finite), Discrete Fourier Transform (discrete, finite), Fourier
Transform (continuous, infinite), and Discrete-Time Fourier Transform (discrete, infinite). (The
interested reader is referred to [Kul02] for further details.) Here, when we talk about the FFT
or QFT, we are referring to fast classical and quantum implementations, respectively, of the
DFT. Thus, our “signals” are discrete and finite, encoded as vectors |ψ⟩ ∈ CN . The DFT,
in turn, is a linear operator mapping C^N to C^N, representable as an N × N complex matrix.
For this reason, we begin with a general class of matrices from which the DFT arises — the
Vandermonde matrices.
Consider a geometric sequence (1, c, c^2, . . . , c^{N−1}) for some c ∈ C. Recall that the sum of the sequence entries, i.e. the geometric series Σ_{k=0}^{N−1} c^k, has a closed form. What happens if we embed such sequences as rows of a matrix? We might naively expect to get a structured matrix whose properties can, like the geometric series, be analyzed. Such matrices are called Vandermonde matrices.
Vandermonde matrices. As a naive first attempt for N = 3, fix any c ∈ C, and consider Vandermonde matrix

M_1 =
  1  c  c^2
  1  c  c^2
  1  c  c^2

This is not very interesting — since all rows are the same, M_1 is just a rank-1 embedding of the original sequence (1, c, c^2). But something interesting happens when we pick a different c for each row. For any distinct c_1, c_2, c_3 ∈ C, the matrix

M_2 =
  1  c_1  c_1^2
  1  c_2  c_2^2
  1  c_3  c_3^2

turns out to have linearly independent rows. Thus, it is full rank (and hence invertible, assuming M_2 is square).
Exercise 9.1. Suppose above that c_1 = c_2, but c_1 ≠ c_3. Is M_2 full rank? What if we add a fourth row with entries 1, c_4, c_4^2, for c_4 distinct from c_1, c_2, c_3?
To help give intuition as to the usefulness of Vandermonde matrices, one important application
(which we revisit at the end of this section) is evaluating polynomials p : C → C at the points
(in the case of M2 ) c1 , c2 , c3 ∈ C.
Exercise 9.3. Using the previous exercise, prove the Interpolation Theorem, which states: Any set of d + 1 point-value pairs {(c_i, p(c_i))}_{i=1}^{d+1} identifies a unique degree-d polynomial p, assuming all c_i are distinct.
Returning to the DFT. Since we can choose any complex values for the {c_i}, let us choose the "quintessential" set of N complex numbers, the Nth roots of unity. These are generated by taking powers of the "principal" Nth root of unity, ω_N := e^{2πi/N}, i.e.

ω_N^0, ω_N^1, . . . , ω_N^{N−1}.

For example, the principal 4th root of unity is ω_4 = e^{iπ/2} = i, and the 4th roots are (1, i, i^2 = −1, i^3 = −i).
Exercise 9.4. Draw the 4th roots of unity on the complex unit circle. More generally, what
do the N th roots of unity look like on the circle?
Exercise 9.5. Why are there only N Nth roots of unity? More precisely, why is the sequence (ω_N^0, ω_N^1, . . . , ω_N^{N−1}, ω_N^N) redundant?
Plugging the Nth roots of unity into a Vandermonde matrix, we finally arrive at the order-N DFT:

DFT_N =
  1  1          1              · · ·  1
  1  ω_N        ω_N^2          · · ·  ω_N^{N−1}
  1  ω_N^2      (ω_N^2)^2      · · ·  (ω_N^2)^{N−1}
  ⋮  ⋮          ⋮              ⋱      ⋮
  1  ω_N^{N−1}  (ω_N^{N−1})^2  · · ·  (ω_N^{N−1})^{N−1}
Exercise 9.8. Prove that the rows of DFTN are orthogonal. You may find it useful to look up
the closed formula for geometric series, which also applies to complex numbers.
Exercise 9.9. Prove that a square matrix is unitary if and only if its rows (equivalently,
columns) form an orthonormal set. Conclude that, up to scaling, DFTN is unitary.
Exercise 9.10. What is the correct scaling required to make DFTN unitary?
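As a quick numerical sanity check (a sketch, not part of the notes), one can build the normalized DFT_N as a Vandermonde matrix in the Nth roots of unity and verify the claims of Exercises 9.8 through 9.10 directly:

    import numpy as np

    def dft_matrix(N):
        omega = np.exp(2j * np.pi / N)
        j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
        return omega ** (j * k) / np.sqrt(N)  # row j is (1, w^j, w^{2j}, ...), normalized

    F = dft_matrix(4)
    print(np.allclose(F.conj().T @ F, np.eye(4)))  # True: normalized DFT_4 is unitary
    print(np.allclose(dft_matrix(2),
                      np.array([[1, 1], [1, -1]]) / np.sqrt(2)))  # True: DFT_2 = H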
Since DFTN is unitary, we can in principle apply it as a quantum gate to a quantum state!
In other words, given |ψ⟩ ∈ CN , Nature allows the mapping DFTN |ψ⟩ ∈ CN . The only problem
is that for an n-qubit system, the dimension N = 2n is exponential in n. So the question is:
Can we implement DFTN via a quantum circuit of size polynomial in the number of qubits,
n? In the next section, we show the answer is yes. It is this quantum implementation of the
(normalized) DFT which is henceforth denoted the Quantum Fourier Transform (QFT).
Exercise 9.11. The fastest classical implementation of the DFT is the Fast Fourier Transform (FFT), which requires O(N log N) time. We shall show in Section 9.2 that the quantum implementation of the DFT, the QFT, requires only time O(polylog(N)). Thus, on the face of it, it looks like quantum computers exponentially speed up computation of the DFT. Why is this a misleading claim? (Hint: How does the output of the QFT differ from that of the FFT?)
Finally, back to polynomial evaluation. Before closing this section, let us return to our aside
into polynomial evaluation.
Exercise 9.12. Using the FFT, argue that an arbitrary polynomial of degree N − 1 can be
evaluated at all N th roots of unity in just O(N log N ) time. What would be the naive runtime
for this if we do not appeal to the FFT?
Exercise 9.13. Given two degree N − 1 polynomials p and q, suppose we use the FFT (as in
the previous exercise) to evaluate p and q at the N th roots of unity, i.e. to obtain two lists of
point-value pairs
(1, p(1)), (ω_N, p(ω_N)), . . . , (ω_N^{N−1}, p(ω_N^{N−1})), and
(1, q(1)), (ω_N, q(ω_N)), . . . , (ω_N^{N−1}, q(ω_N^{N−1})).
Using the fact that for any (x, p(x)) and (x, q(x)), the tuple (x, p(x)q(x)) denotes the value of
the product of p and q on input x, show that the coefficients of polynomial pq(x) := p(x)q(x)
can be computed in time O(N log N ). (Hint: You will also need the Interpolation Theorem.)
How does this compare with the naive runtime for multiplying out two degree N −1 polynomials
p and q (i.e. without using the FFT)?
Exercise 9.14. Use the FFT-based algorithm you derived in the previous exercise to compute
the product of polynomials p(x) = x2 − x + 1 and q(x) = 2x2 + 3x. (Hint: Since N = 3 in this
case, you can do all computations by hand.)
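To check your hand computation, here is a short numpy sketch of the evaluate-multiply-interpolate scheme from Exercise 9.13 (coefficient lists are ordered lowest degree first):

    import numpy as np

    def poly_multiply(p, q):
        m = len(p) + len(q) - 1        # number of coefficients of pq
        N = 1 << (m - 1).bit_length()  # next power of two >= m
        vals = np.fft.fft(p, N) * np.fft.fft(q, N)      # pointwise products of evaluations
        return np.real_if_close(np.fft.ifft(vals)[:m])  # interpolate back to coefficients

    # p(x) = x^2 - x + 1 and q(x) = 2x^2 + 3x from Exercise 9.14:
    print(poly_multiply([1, -1, 1], [0, 3, 2]))  # coefficients of pq, lowest degree first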
The N=4 case. We already know that QFT2 = H (i.e. when n = 1), so what is QFT4
(n = 2)? Recall that any linear map is completely specified by its action on a basis, such as the
standard basis. Thus, since QFT4 |j⟩ extracts the jth column of QFT4 , we have:
QFT_4 |j⟩ = (1/2) Σ_{k=0}^{3} e^{2πij(k/4)} |k⟩, (9.1)
where we have intentionally grouped k/4 together for the following reason. Recall from your
programming courses that, typically, division of two integers a/b is not “cheap”, but division
by a power of 2, a/2k , is cheap — it simply shifts over the bit representation of a to the right
by k positions. For example, translating 3/2 to binary means we shift the bit representation of 3, 11_2, to the right one position to obtain 1.1_2, which should be interpreted as 1 · 2^0 + 1 · 2^{−1}.
Exercise 9.15. What is the binary representation of 14/2^9? Expand it out in terms of powers of 2.
Rewriting |k⟩ in binary as |k_1 k_2⟩, we thus have

QFT_4 |j⟩ = (1/2) Σ_{k=0}^{3} e^{2πij(0.k_1k_2)} |k_1 k_2⟩ (9.2)
          = (1/2) Σ_{k=0}^{3} e^{2πij(k_1·2^{−1} + k_2·2^{−2})} |k_1 k_2⟩ (9.3)
          = (1/2) ( Σ_{k_1=0}^{1} e^{2πijk_1·2^{−1}} |k_1⟩ ) ⊗ ( Σ_{k_2=0}^{1} e^{2πijk_2·2^{−2}} |k_2⟩ ) (9.4)
          = (1/2) ( |0⟩ + e^{2πi j/2} |1⟩ ) ⊗ ( |0⟩ + e^{2πi j/2^2} |1⟩ ) (9.5)
          = (1/2) ( |0⟩ + e^{2πi(0.j_2)} |1⟩ ) ⊗ ( |0⟩ + e^{2πi(0.j_1j_2)} |1⟩ ). (9.6)
Exercise 9.16. In Equation (9.6), why can we omit the bit j1 in the phase e2πi(0.j2 ) ?
Now comes the key observation: Applying the Hadamard to single qubit standard basis state
|j⟩ ∈ C2 yields
H|j⟩ = (1/√2) ( |0⟩ + e^{2πi(0.j)} |1⟩ ). (9.7)
For this, define the single-qubit phase gate R_θ := |0⟩⟨0| + e^{2πiθ}|1⟩⟨1| for θ ∈ R.

Exercise 9.18. Prove that R_θ is unitary for any θ ∈ R. What is its action on input |ψ⟩ = α|0⟩ + β|1⟩?
Exercise 9.19. Consider circuit (R1/4 ⊗ I)(H ⊗ H)|j⟩. Why does this not reproduce DFT4 |j⟩
(up to swapping the output registers)?
The exercise above highlights that in order to insert phase exp(2πi(0.0j_2)) into the first term (and hence first qubit) of Equation (9.8), the gate R_{1/4} must also depend on the second qubit state, |j_2⟩. We thus arrive at our QFT_4 circuit by adding a controlled-R_{1/4} gate: apply H to qubit 1, then R_{1/4} to qubit 1 controlled on qubit 2, then H to qubit 2 (we omit swapping the two output qubits for simplicity).
Exercise 9.20. Convince yourself that, up to swapping the output qubits, this is a correct
QFT4 circuit.
Figure 9.1: The QFTN circuit (omitting final reordering of output qubits) acting on n qubits,
where N = 2n .
Exercise 9.22. Note the output qubits of the circuit above are in the opposite order than
in Equation (9.12). Given the ability to perform the two-qubit SWAP gate, which maps any
bipartite standard basis state |i⟩|j⟩ to |j⟩|i⟩, show how to rearrange the output of the circuit to
match Equation (9.12).
Exercise 9.23. Prove that the following sequence of three CNOT gates implements the two-qubit SWAP gate: CNOT_{1→2}, then CNOT_{2→1}, then CNOT_{1→2}, where CNOT_{i→j} denotes control on qubit i and target on qubit j. Do you need to explicitly prove it for any choice of product input state |ψ⟩|ϕ⟩ ∈ C^4?
Exercise 9.24. How does the SWAP gate act on each of the four Bell basis states for C4 ?
Exercise 9.25. Show that the circuit of Figure 9.1, together with your use of postprocessing via SWAP gates, correctly computes QFT_N. This quantum circuit implementation of DFT_N is what we call here the QFT_N.

Exercise 9.26. What is the size of the circuit in Figure 9.1, where "size" is defined by the number of 1- and 2-qubit gates? (Hint: Think about arithmetic series.) How much overhead does the final postprocessing via SWAP gates incur, and does this change the overall circuit size estimate for Figure 9.1? Give your estimates in terms of both N and n.
Let us close this section by again stressing that, up to renormalization, DFTN and QFTN
perform precisely the same map. However, while the former is an abstract linear map, the latter
is an explicit quantum circuit implementation of DFTN which requires O(log2 N ) ∈ O(n2 ) gates.
In contrast, the best known classical implementation for the DFTN is the FFT, which requires
O(N log N ) time. Thus, it looks like QFTN gives an exponential speedup over the FFT. But
this is entirely misleading, as the FFT and QFT_N represent their output in entirely different formats — the former gives an explicit list of all N entries of the output vector |ψ⟩, whereas the latter gives |ψ⟩ as a physical quantum state on log N qubits, whose amplitudes cannot in general be individually recovered.
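The product structure of Equation (9.6) generalizes to arbitrary n, and this is what makes the small circuit of Figure 9.1 possible. The following brief numpy sketch (illustrative, not from the notes) implements that general product formula and checks it against the normalized DFT:

    import numpy as np

    def qft_state(j, n):
        # QFT_N |j> as the tensor product over t = 1..n of
        # (|0> + e^{2 pi i j / 2^t} |1>) / sqrt(2).
        state = np.array([1.0 + 0j])
        for t in range(1, n + 1):
            qubit = np.array([1.0, np.exp(2j * np.pi * j / 2 ** t)]) / np.sqrt(2)
            state = np.kron(state, qubit)
        return state

    n, N = 3, 8
    omega = np.exp(2j * np.pi / N)
    ok = all(np.allclose(qft_state(j, n),
                         np.array([omega ** (j * k) for k in range(N)]) / np.sqrt(N))
             for j in range(N))
    print(ok)  # True: the product formula reproduces the columns of normalized DFT_N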
Definition 9.27 (Quantum Phase Estimation (QPE)).

• Input: A quantum circuit implementing a unitary operator U ∈ L((C^2)^{⊗n}), and an eigenvector |ψ_θ⟩ satisfying U|ψ_θ⟩ = e^{2πiθ}|ψ_θ⟩, for some θ ∈ [0, 1).

• Output: An estimate of the phase θ.
Exercise 9.28. Prove that a normal operator U is unitary if and only if all of its eigenvalues have form e^{iθ}.
Exercise 9.29. How would you solve QPE classically, i.e. given full written matrix and vector
descriptions of U and |ψ⟩?
Despite its arguably dull appearance, QPE is an important problem. Not only is it at the
heart of the factoring algorithm, but QPE has applications throughout quantum algorithms,
from quantum walks to solving linear systems of equations. We will briefly comment on such
applications in Section 9.3.1. Before then, however, take a moment to break down the statement
of Definition 9.27 in your mind, and allow the inevitable myriad of questions to arise — Can
we solve QPE in general? What if θ has no finite representation? How do we magically get our
hands on an eigenvector |ψθ ⟩? And what if we cannot compute such an eigenvector efficiently?
2. Quantum random walks. A classical random walk is exactly what it sounds like —
given a graph G = (V, E) and a starting vertex v ∈ V , in each time step we pick a random
neighbor of our current vertex and “walk” there. A general framework for implementing
quantum walks uses QPE as follows. Given a vector |ψ⟩, as a subroutine in the quantum
walk, we would like to reflect about |ψ⟩, i.e. to apply operator Rψ = 2|ψ⟩⟨ψ| − I.
Exercise 9.30. Show that Rψ is indeed a reflection about |ψ⟩, i.e. that Rψ |ψ⟩ = |ψ⟩, and
for any |ψ ⊥ ⟩ orthogonal to |ψ⟩, Rψ |ψ ⊥ ⟩ = −|ψ ⊥ ⟩.
One approach for implementing Rψ , roughly, is to set up a unitary map U for which |ψ⟩
is an eigenvector. Then, to simulate application of Rψ to an arbitrary state |ϕ⟩, one uses
QPE to decompose |ϕ⟩ into its components: The component proportional to |ψ⟩, and
the component orthogonal to |ψ⟩. With this “on-the-fly” decomposition in hand, one can
“inject” a phase of +1 or −1 as needed, respectively.
3. Powers of unitaries and linear systems. Suppose we have access to a circuit implementation of unitary U, but want to instead apply √U. Can we do it using just U? QPE gives us an approach for this, based on the following exercise.

Exercise 9.31. If the kth eigenvalue of U is e^{iθ_k}, what is the kth eigenvalue of √U?
Thus, we can use QPE to, roughly, extract the eigenvalue phases of U into a separate register (i.e. as strings |θ_k⟩), and coherently apply the square root function to "update" or "customize" the eigenvalue phases to |√θ_k⟩. Undoing the QPE estimation circuit then effectively "reinserts" the eigenvalues back into the operator to simulate √U.
A striking application of this idea of “eigenvalue surgery” is to solve linear systems of
equations A|ψ⟩ = |b⟩ (i.e. given A, |b⟩, what is |ψ⟩?). The setup there is more complicated,
as the coefficient matrix A need not even be Hermitian (never mind unitary). But the
rough premise is similar — to simulate the inverse A−1 , one uses QPE to extract the
eigenvalues λi of A (via some appropriate unitary implementation of A), inverts the λi
“manually” to 1/λi , and then uses postselection to “reinsert” the edited eigenvalues “back
into A”.
9.3.2 Quantum algorithms for QPE
We now give quantum algorithms for QPE. For simplicity, we assume the desired phase θ ∈
[0, 1) can be represented exactly using n bits, i.e. in binary we write θ = (0.θ1 · · · θn )2 . As a
result, 2n θ = (θ1 · · · θn )2 . Our aim is, given U and |ψθ ⟩, to prepare a quantum n-qubit register
containing |θ1 · · · θn ⟩ = |2n θ⟩.
QFT_N |2^n θ⟩ = (1/√(2^n)) Σ_{k=0}^{2^n−1} e^{2πik(2^nθ)/2^n} |k⟩ = (1/√(2^n)) Σ_{k=0}^{2^n−1} e^{2πikθ} |k⟩. (9.13)

Thus, if we could prepare the right hand side above, applying QFT†_N would recover |2^n θ⟩, as desired. There are two ways to do this, depending on how much we want to assume about what we can do with U.
More restrictive: Assume we can apply controlled-U k gates for any integer k ≥ 0. The
following exercise suggests an approach for solving QPE in this setting.
Exercise 9.32. Show that for non-negative integer k, U k |ψθ ⟩ = e2πikθ |ψθ ⟩.
But now we claim it is easy to prepare the right-hand side of Equation (9.13): Prepare an equal
superposition over all |k⟩, and then apply the controlled-U k gate:
|0^n⟩|ψ_θ⟩ ↦ [H^{⊗n}] (1/√(2^n)) Σ_{k=0}^{2^n−1} |k⟩|ψ_θ⟩
          ↦ [c-U^k] (1/√(2^n)) Σ_{k=0}^{2^n−1} e^{2πikθ} |k⟩|ψ_θ⟩
          = (1/√(2^n)) Σ_{k=0}^{2^n−1} e^{2πik(2^nθ)/2^n} |k⟩|ψ_θ⟩
          ↦ [QFT†_N] |2^n θ⟩|ψ_θ⟩,
where the first map applies local Hadamards to the first register, the second the controlled-U k
gates with the first register as control and the second register as target, and the last the inverse
QFTN to the first register.
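The following Python sketch (an illustration under the stated assumption that θ has an exact n-bit expansion) simulates this chain of maps for a diagonal example U:

    import numpy as np

    def qpe_probs(U, psi, n):
        N = 2 ** n
        # After H^{tensor n} and the controlled-U^k gates, block k of the
        # joint state holds U^k |psi> / sqrt(N).
        state = np.zeros((N, len(psi)), dtype=complex)
        block = psi.astype(complex)
        for k in range(N):
            state[k] = block / np.sqrt(N)
            block = U @ block
        # Inverse QFT on the first register (conjugated, normalized DFT matrix).
        j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
        Finv = np.exp(-2j * np.pi * j * k / N) / np.sqrt(N)
        state = Finv @ state
        return np.sum(np.abs(state) ** 2, axis=1)  # outcome distribution, register 1

    n, theta = 4, 5 / 16  # theta has an exact 4-bit expansion: 0.0101
    U = np.diag([1.0, np.exp(2j * np.pi * theta)])
    probs = qpe_probs(U, np.array([0.0, 1.0]), n)  # |psi_theta> = |1>
    print(np.argmax(probs))  # 5, i.e. the first register holds |2^n theta>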
Less restrictive: Assume we can apply controlled-U^{2^k} gates for any integer k ≥ 0. Suppose
we cannot raise U to arbitrary powers, but at least we can raise it to powers of 2. Can we still
solve QPE? The answer is yes (albeit slightly more involved than the previous case), and is
given in the following exercise.
Exercise 9.33. Show that by leveraging the expansion in Equation (9.14) instead of Equa-
tion (9.13), one can still solve QPE, even with the restricted ability of raising U to powers of
2. (Hint: Each choice of a power of 2 will allow you to prepare a specific term in the tensor
product on the right hand side of Equation (9.14).)
Runtime and discussion. There are two major issues we have thus far swept under the rug:
What if θ is not representable in n bits, and what is the size of the circuit implementing QPE?
Irrational θ ∈ [0, 1). Suppose now θ’s binary representation is longer than n bits (potentially
infinitely long). Can we at least approximate θ to its n most significant bits? Yes, and in fact,
the exact same circuit as above works, with the following tweaks. Suppose we wish to compute
the n most significant bits of θ’s binary expansion with probability of success at least 1 − ϵ for
ϵ > 0. Then, one can show that running the same procedure as before suffices, so long as we are
willing to slightly expand the size of the first register to n + ⌈log(2 + 1/(2ϵ))⌉ qubits (all initialized to |0⟩ before the circuit starts).
Size of circuit. It is easy to see that the two dominant components of the QPE algorithm are
the QFTN circuit and the controlled-U k gate. We know the size of the first; but what about
the size of the second? Unfortunately, here we hit a snag — to get n bits of precision in our
estimate of θ, we need the gate c-U^{2^n}. But in general it will be impossible, given an arbitrary U, to compute U^k efficiently if k is superpolynomial in the input size.
Exercise 9.34. Suppose given an arbitrary unitary U on n qubits, we can compute U^{2^n} efficiently, i.e. with a quantum circuit of size polynomial in n. Prove that this implies quantum
computers can efficiently count the number of solutions to a SAT formula ϕ : {0, 1}n → {0, 1}
(formally, the complexity class BQP would contain #P). Since it is believed unlikely that quan-
tum computers can solve NP-hard problems (never mind #P problems), conclude it is unlikely
for such a powering circuit for U to exist.
Thus, for arbitrary U, the best we can hope to do is estimate θ within O(log n) bits of precision (since then the largest power of U we need is polynomial in n).
Exercise 9.35. Suppose we approximate θ = 0.θ_1θ_2 . . . ∈ [0, 1) by its first ⌈log n⌉ bits, i.e. θ̃ = 0.θ_1 . . . θ_{⌈log n⌉}. What is the largest the additive error θ − θ̃ can be?
The exercise above shows that logarithmically many bits of precision suffices to get inverse
polynomial additive error. Note that this suffices in certain applications, since (roughly speak-
ing) polynomial-size quantum circuits cannot distinguish states which are closer than an inverse
polynomial in distance (trace distance for density operators, which reduces to Euclidean distance
for pure state vectors) anyway.
Conclusion for QPE. We conclude our discussion on QPE by reiterating that, to estimate
θ within n bits of precision, we require the controlled-U^{2^k} gate for k ∈ O(n). The overall
runtime of QPE will thus scale with the size of the QFTN circuit and the powering-of-U circuit,
the latter of which generally scales exponentially with k (the precise size is naturally context
specific, i.e. depending on which U we have). Second, for the factoring algorithm, being able
to raise U to exponentially large powers will be important — luckily, as we will see there, the
specific U needed will implement a classical operation whose precise structure allows such fast
exponential powering.
10 Shor’s quantum factoring algorithm
“Euclid taught me that without assumptions there is no proof. Therefore, in any
argument, examine the assumptions.”
— Eric Temple Bell.
In the previous lecture, we introduced the Quantum Fourier Transform (QFT), and applied
it to the Quantum Phase Estimation (QPE) problem. In this lecture, we show how to use QPE
to derive a crown jewel of quantum computation — the quantum factoring algorithm. This
algorithm, developed by Peter Shor in 1994, forced the classical computer science community
to re-evaluate one of its core assumptions. Namely, in 1978, Rivest, Shamir, and Adleman
published the public-key cryptosystem known as RSA, which to this day remains widely used
in practice. And while the field had grown comfortable assuming RSA is secure, its Achilles
heel was hiding in plain sight — the abstract of the original RSA paper itself reads:
“The security of the system rests in part on the difficulty of factoring the published
divisor, n”.
Indeed, the standard assumption leading into the early 1990’s was that factoring is “hard”. And
this is precisely where Peter Shor made his mark — he showed that a polynomial-time quantum
computer could factor a composite integer N into its prime factors, thus breaking RSA.
At a high level, the factoring algorithm proceeds as follows: First, the problem of integer
factorization is reduced to the problem of order finding 1 . Then, the QPE algorithm is used
to sample from a certain desired distribution. Finally, these samples are postprocessed using
a classical algorithm known as continued fractions to solve the order finding problem. Before
getting into these details, however, let us begin by formally defining the factoring problem and
understanding its context.
Exercise 10.1. Consider the following brute force algorithm for finding a factor p_i: Iteratively try to divide N by each of the numbers in sequence (2, 3, 4, . . . , ⌊√N⌋), and output the first divisor found. Clearly, this algorithm requires O(√N) field operations over R. Why is this not a polynomial-time factoring algorithm? (Hint: What is the number of bits required to represent the input, N? Is an O(√N)-time algorithm polynomial with respect to this number of bits?)
1
For clarity, this reduction was known prior to Shor’s algorithm, due to the 1976 work of Miller [Mil76].
Exercise 10.2. Given the prime factors {p1 , . . . , pm }, show how to also compute the multiplic-
ities {k1 , . . . , km } using polylog(N ) field operations. Hint: Prove first that there are at most
polylog(N ) primes (including multiplicities) in the prime factorization of N .
A formal definition. Although the intuitive definition of the factoring problem is clear, recall
that formally the complexity classes P and NP are sets of decision problems, meaning roughly
that each possible input to the problem is either a YES or NO instance. Thus, we formally
define factoring as follows.
Exercise 10.4. Show how a subroutine for solving the decision problem FACTOR can be
bootstrapped to efficiently compute the smallest non-trivial prime factor of N .
Exercise 10.6. Why is FACTOR in co-NP? (Hint: Use the Fundamental Theorem of Arith-
metic.)
Combining the exercises above, we conclude that if FACTOR is NP-complete, then NP = co-NP.
The latter equality, however, is known to collapse the Polynomial-Time Hierarchy (PH) to its
first level, a consequence widely believed almost as unlikely as P = NP. Thus, FACTOR is
strongly expected not to be NP-complete.
NP-intermediate problems. While at first glance this may seem an unsatisfying state of affairs,
there is a silver lining: FACTOR is one of the few known natural candidate “NP-intermediate”
problems. Here, an NP-intermediate problem is a language in NP which is neither in P nor NP-
complete. Of course, no one can prove that FACTOR is NP-intermediate. However, what we
can prove (or more accurately, what Richard Ladner proved in 1975) is that, assuming P ̸= NP,
NP-intermediate problems do exist.
Extended Church-Turing thesis. The backbone on which essentially all of theoretical computer
science rests is the Church-Turing Thesis. This thesis posits that a Turing machine (TM) can
simulate any other model of computing. (Note this is a thesis, not a proven theorem.) In other
words, TMs tell us everything we need to know about computing. Except that the Church-
Turing thesis says nothing about the efficiency with which a TM can simulate other models.
For this, we rely on the (less commonly accepted) Extended Church-Turing Thesis, which states
that TMs can efficiently simulate any physically realizable model of computing.
The added constraints of “efficiency” and “physical realizability” of the Extended Church-
Turing Thesis imply that, thanks to the quantum factoring algorithm, we have arrived at an
exciting crossroads, at which one of the following three statements must be false:
1. The Extended Church-Turing Thesis is true.

2. FACTOR cannot be solved efficiently by any classical computer.

3. Large-scale universal quantum computers are physically realizable.
Exercise 10.8. Convince yourself that the truth of any two of these statements implies the
falsehood of the remaining statement. Where do the keywords “efficiently” and “physically
realizable” from the Extended Church-Turing Thesis fit into your arguments?
Exercise 10.10. As stated, ORDER is not a decision problem. How can it be converted to
one?
The reduction. We now show how FACTOR reduces to ORDER. We begin with a simple
observation, given as the following exercise. Define gcd(x, y) as the greatest common divisor of
x, y ∈ Z. Recall that x and y are co-prime if gcd(x, y) = 1.
The exercise above gives us a route for factoring a given N ∈ Z+ — namely, we “just” need
to find a non-trivial x satisfying x^2 ≡ 1 mod N. The only question is — where do we get our
hands on such an x? To solve this difficult-sounding problem, we resort to the most advanced
of algorithmic techniques — simply pick an x at random. More accurately, pick (uniformly at
random) an integer x in set ZN := {0, 1, . . . , N − 1} which is co-prime to N . It turns out that
with high probability, the order of x, i.e. the smallest r such that xr ≡ 1 mod N , allows us
to construct the solution to x2 ≡ 1 mod N we are looking for. Before formally proving that
random sampling of x works, let us see why the order r of x helps.
Exercise 10.12. Let r be an even positive integer satisfying xr ≡ 1 mod N . Give a y satisfying
y 2 ≡ 1 mod N . (Hint: Note that r is even by assumption.)
Exercise 10.13. Does your y from the previous exercise necessarily allow you to extract a fac-
tor of N ? (Hint: Is it necessarily true that y will be a non-trivial solution to y 2 ≡ 1 mod N ?)
The last exercise above suggests that simply picking an x with even order r will not suffice —
we will need slightly more structure.
Finding a non-trivial solution to x2 ≡ 1 mod N . The proof that random sampling can lead to
a non-trivial solution for x2 ≡ 1 mod N requires three well-known facts, which we first state.
Fact 10.14 (Fermat’s Little Theorem (FLT)). For any prime p and integer a, ap ≡ a mod p.
Moreover, if p does not divide a, then ap−1 ≡ 1 mod p.
Fact 10.15 (Chinese Remainder Theorem (CRT)). Let {m1 , . . . , mn } be a set of integers larger
than 1, each pair of which is co-prime. Then, for any integers {a1 , . . . , an }, the linear system
of equations
x ≡ a1 mod m1 (10.2)
x ≡ a2 mod m2 (10.3)
..
. (10.4)
x ≡ a_n mod m_n (10.5)

has a solution x ∈ Z, which moreover is unique modulo m_1 m_2 · · · m_n.

Fact 10.16. For any prime p, the set Z_p^* := {1, . . . , p − 1} has a generator g, i.e. an element g ∈ Z_p^* such that each e ∈ Z_p^* may be written e ≡ g^k mod p for a unique power k ∈ {1, . . . , p − 1}.
Exercise 10.17. Use Fermat’s Little Theorem to prove that there is a bijection between ele-
ments e and powers k in Fact 10.16 above. Conclude that picking e uniformly at random is
equivalent to picking k uniformly at random. (For this exercise, do not assume a priori that
k ∈ {1, . . . , p − 1} as stated in Fact 10.16.)
Exercise 10.18. Let r be the order of x modulo N, and consider r′ > r satisfying x^{r′} ≡ 1 mod N. Prove that r divides r′.
The main theorem of this section is the following. Its proof requires Lemma 10.25, which is
stated and proven subsequently.
Theorem 10.19. Let N be an odd, composite natural number. Define Z_N^* := {x ∈ Z_N | x is co-prime to N}. Choose x ∈ Z_N^* uniformly at random. Then, with probability at least 1/2, the order r of x is even and satisfies

x^{r/2} ≢ −1 mod N. (10.6)
Exercise 10.20. Why does Theorem 10.19 allow us to factor N with probability at least 1/2,
assuming we have an oracle for order-finding? In your answer, make sure you account for the
fact that Theorem 10.19 does not explicitly state that xr/2 ̸≡ 1 mod N .
Proof of Theorem 10.19. We prove the claim for the special case relevant to RSA (Section 10.3),
i.e. N = pq for distinct primes p and q. The proof can be generalized to arbitrary N = p_1^{k_1} · · · p_m^{k_m} for distinct primes p_1, . . . , p_m.
To begin, the only thing we know about N is its prime factorization N = pq, so let us start by rephrasing what it means to "choose x ∈ Z_N^* uniformly at random" in terms of "local" factors p and q.

Exercise 10.21. Show that choosing x ∈ Z_N^* uniformly at random is equivalent to choosing a pair (a, b) ∈ S_pq uniformly at random, for set S_pq := {(a, b) | a ∈ Z_p^* and b ∈ Z_q^*}, where x ≡ a mod p and x ≡ b mod q. (Hint: Use the Chinese Remainder Theorem.)

Thus, assume we equivalently sample (a, b) uniformly at random from S_pq. We now have to bound the probability that r is even and x^{r/2} ∉ {±1} modulo N.
Focus: Largest powers of 2. Recall that xr ≡ 1 mod pq, x ≡ a mod p, and x ≡ b mod q.
Defining ra and rb as the orders of a modulo p and b modulo q, respectively, we thus have
xra ≡ ara ≡ 1 mod p, and xrb ≡ brb ≡ 1 mod q.
To bound the probability of our bad events (i.e. r is odd or xr/2 ∈ {±1} modulo N ), it turns out
the right place to look is the exact power of 2 which appears in each of the prime decompositions
of ra and rb , respectively. Specifically, we proceed by showing two claims:
(A) Either bad event causes ra and rb to have precisely the same power of 2 in their prime
factorizations.
(B) But since (a, b) is drawn uniformly at random from Spq , the odds that ra and rb have this
same factor of 2 is precisely 1/2.
Formally, define t_a and t_b as the exact powers of 2 which divide r_a and r_b, respectively (e.g. r_a = 2^{t_a} p_2^{k_2} p_3^{k_3} · · · is the prime factorization of r_a, with p_2, p_3, . . . being distinct primes larger than 2). Let us first prove point (A). Consider the first bad event, r is odd. By Exercise 10.23,
ra and rb divide r, and hence are both odd. Thus, ta = tb = 0. The second bad event is when
r is even but xr/2 ≡ −1 mod N . We know from Exercise 10.23 that ra and rb both divide r.
Thus, ta and tb are at most the power of 2 in the prime decomposition of r. We show matching
lower bounds. Since xr/2 ≡ −1 mod N , we have xr/2 ≡ −1 mod p and xr/2 ≡ −1 mod q.
But by definition xra ≡ ara ≡ 1 mod p and xrb ≡ brb ≡ 1 mod q, so ra and rb cannot divide
r/2. (Otherwise, for example, xr/2 ≡ 1 mod p.) Since ra and rb divide r but not r/2, we obtain
our desired lower bound — ta and tb must be at least the power of 2 in the prime decomposition
of r. This concludes the proof of point (A). Point (B) now follows from Lemma 10.25, as the
following exercise shows.
Figure 10.1: The QPE circuit: the first register |0^n⟩ receives H^{⊗n}, then controls U^k applied to the second register |ψ⟩, after which QFT†_{2^n} is applied to the first register.
Exercise 10.24. Confirm that Point (B) follows from Lemma 10.25.
Lemma 10.25. Fix an odd prime p, and choose a ∈ Zp∗ uniformly at random. Let ra be the
order of a. Then, with probability 1/2, the largest power of 2 which divides p − 1 also divides
ra .
Proof. Let t be the largest integer such that 2^t divides p − 1.
The proof strategy exploits Exercise 10.17, which recall states that choosing a ∈ Zp∗ uniformly
at random is equivalent to choosing k ∈ {1, . . . , p − 1} uniformly at random (where g k ≡ a
mod p for fixed generator g of Zp ). In particular, since Zp∗ has even size, this means that k is
odd with probability 1/2. The key point is that, as we show next, k is odd if and only if 2t
divides ra , yielding the desired claim regarding ra .
• (Case 1: k odd) We claim 2^t divides r_a. Since

g^{kr_a} ≡ a^{r_a} ≡ 1 ≡ g^{p−1} mod p,

where the last equivalence follows from FLT, we conclude p − 1 divides kr_a. But p − 1 is
even and k is odd, implying the factor 2t which divides p − 1 does not divide k, but rather
ra , as claimed.
• (Case 2: k even) We claim 2^t does not divide r_a. To show this, note

g^{kr_a} ≡ a^{r_a} ≡ 1 ≡ g^{p−1} ≡ g^{k(p−1)/2} mod p,
where the third equivalence uses FLT, and the last that k is even. We conclude that ra
divides (p − 1)/2. But p − 1 has factor 2t , meaning (p − 1)/2 has factor 2t−1 , and thus the
largest power of 2 appearing in the prime factorization of ra is also at most 2t−1 . Thus,
2t does not divide ra , as claimed.
Challenge 1: The choice of U. Recalling that (x, N) is the input to ORDER, we shall run QPE on unitary U_x defined via action

U_x |y⟩ := |xy mod N⟩ for y ∈ Z_N. (10.7)
Exercise 10.27. Let V be a d × d complex matrix. Prove that V is unitary if and only if V maps any given orthonormal basis B ⊆ C^d to an orthonormal basis B′ ⊆ C^d. Can you give an outer product representation of V?
Exercise 10.28. Using the previous exercise, prove that U_x is unitary. (Hint: Use the fact that for any x co-prime to N, there exists an inverse x^{−1} ∈ Z_N^* such that xx^{−1} ≡ 1 mod N.)
The claim now is that certain eigenvalues of Ux encode the order r of x. Let us “design” the
corresponding eigenvectors from scratch to build intuition. First, what do we know, and why
did we choose U_x the way we did? Well, we know x^r ≡ 1 mod N, and the operation y ↦ xy mod N induced by U_x allows us to iteratively produce the elements of the infinite sequence (1, x, x^2, x^3, . . . , x^{r−1}, 1, x, . . .). It follows that a first guess at an eigenvector of U_x is
|η⟩ := (1/√r) Σ_{j=0}^{r−1} |x^j mod N⟩. (10.8)
Exercise 10.29. Show that Ux |η⟩ = |η⟩, i.e. |η⟩ has eigenvalue 1.
Unfortunately, an eigenvalue of 1 is not very exciting. However, we are only a tweak away from
what we want. Recall the other setting in which we have a “cyclic” or “wraparound” property:
The complex unit circle. In particular, for principal rth root of unity ω_r := exp(2πi/r), we have ω_r^r = 1. Thus, let us try injecting rth roots of unity as relative phases into |η⟩ to obtain:
|ϕ⟩ := (1/√r) Σ_{j=0}^{r−1} e^{2πij/r} |x^j mod N⟩. (10.9)
Exercise 10.30. Show that U_x |ϕ⟩ = e^{−2πi/r} |ϕ⟩, i.e. |ϕ⟩ has eigenvalue e^{−2πi/r}.
We conclude that running QPE with Ux on |ϕ⟩ would yield an approximation to 1/r, from
which we could in principle recover r, as desired. There’s only one problem — how do we
prepare |ϕ⟩? Indeed, to prepare |ϕ⟩, it would seem we need to know r, which is precisely what
we had originally set out to find!
Exercise 10.31. In Lecture 9, we assumed in the definition of QPE that the input phase θ ∈ [0, 1), whereas above we have θ = −1/r. Show how a simple modification to the QPE algorithm allows us to nevertheless extract 1/r.
Challenge 2: How to prepare |ϕ⟩. Unfortunately, it is not known how to construct |ϕ⟩ directly.
There is, however, a slick way around this problem. Let us extend |ϕ⟩ to an orthonormal basis
B := {|ϕ_k⟩} for the Hilbert space H which U_x acts on, where we define |ϕ_1⟩ := |ϕ⟩. Then, any unit vector |ψ⟩ ∈ H may be written |ψ⟩ = Σ_k α_k |ϕ_k⟩ with Σ_k |α_k|^2 = 1. Since QPE is unitary
(omitting the final measurement), by linearity we have
|0^n⟩|ψ⟩ = Σ_k α_k |0^n⟩|ϕ_k⟩ ↦ [QPE] Σ_k α_k |θ_k⟩|ϕ_k⟩, (10.10)

where |θ_k⟩ denotes the phase estimate corresponding to eigenvector |ϕ_k⟩.
Exercise 10.32. Suppose we measure the first register of the right side of Equation (10.10).
With what probability do we obtain our desired outcome θ1 corresponding to |ϕ1 ⟩?
The takeaway is that we need not prepare |ϕ1 ⟩ exactly; rather, it may be easier to prepare some
|ψ⟩ which has non-zero amplitude α1 on |ϕ1 ⟩. It turns out this is not only easy, but trivial.
Indeed, there is a clever choice of the first r basis vectors |ϕk ⟩ in B which satisfies
(1/√r) Σ_{k=0}^{r−1} |ϕ_k⟩ = |1 mod N⟩. (10.11)
(Note that |1 mod N ⟩ is indeed trivial to prepare.) So what are |ϕ2 ⟩, . . . , |ϕr ⟩? In particular,
how do we systematically construct states orthonormal to |ϕ1 ⟩? Observe that |ϕ1 ⟩ is essentially
a column from a Vandermonde matrix, obtained by raising the principal rth root of unity ωr
to increasing powers. Thus, we know precisely how to generate r − 1 more vectors orthonormal
to |ϕ_1⟩ — simply consider the Vandermonde matrix whose kth (for k ∈ {0, . . . , r − 1}) row exponentiates ω_r^k to increasing powers (up to r − 1, inclusive). Formally, define
|ϕ_k⟩ := (1/√r) Σ_{j=0}^{r−1} e^{2πijk/r} |x^j mod N⟩. (10.12)
Exercise 10.34. Prove that U_x |ϕ_k⟩ = e^{−2πik/r} |ϕ_k⟩.
Exercise 10.35. Show that defining |ϕk ⟩ as in Equation (10.12) yields Equation (10.11).
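These claims are easy to verify numerically. The following sketch (with the illustrative choice N = 15, x = 7, for which the order is r = 4) checks Exercise 10.34 and Equation (10.11) directly:

    import numpy as np

    N, x, r = 15, 7, 4  # 7^4 = 2401 = 1 (mod 15), so the order of 7 is 4

    # U_x as a permutation matrix on C^N: |y> -> |xy mod N>.
    U = np.zeros((N, N))
    for y in range(N):
        U[(x * y) % N, y] = 1

    def phi(k):  # |phi_k> from Equation (10.12)
        v = np.zeros(N, dtype=complex)
        for j in range(r):
            v[pow(x, j, N)] += np.exp(2j * np.pi * j * k / r)
        return v / np.sqrt(r)

    print(all(np.allclose(U @ phi(k), np.exp(-2j * np.pi * k / r) * phi(k))
              for k in range(r)))                       # True: Exercise 10.34
    print(np.allclose(sum(phi(k) for k in range(r)) / np.sqrt(r),
                      np.eye(N)[1]))                    # True: Equation (10.11)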
Plugging the exercise above into Equations (10.10) and (10.11), we have that if we run QPE on
|1 mod N ⟩ and measure the first register, with probability 1/r we get (an approximation to)
phase k/r for some k ∈ [r − 1]. This is almost what we want, except for the pesky k variable
in the numerator. So we must settle for something less — the ability to sample uniformly at
random from set S := {1/r, 2/r, . . . , (r − 1)/r}. Is this enough to extract r? Incredibly, the
answer is yes, and will be covered in Section 10.2.3.
Exercise 10.36. Consider the following naive algorithm for extracting r. Run QPE repeatedly
t times to collect t samples from S, for t some polynomial in the input size, log N . Let t∗ denote
the smallest sample drawn. Return 1/t∗ as the guess for r. Intuitively, how large might t have
to be for this algorithm to extract r with (say) constant probability?
Challenge 3: Precision and the controlled-U k operation. Before discussing how sampling
from S allows us to extract r, we must address a key issue we have thus far swept under the
rug: Can we even efficiently run the QPE algorithm to the precision required for sampling from
S, i.e. to represent k/r for k ∈ [r − 1]? To analyze this, recall that the input size to FACTOR
is the number of bits needed to encode N , i.e. n := ⌈log N ⌉ + 1 bits.
Exercise 10.37. Consider arbitrary k, r ∈ Z+ for k < r, which can both be expressed exactly
using n bits. How many bits are required to express the ratio k/r as a real number, i.e. as
a sequence of bits 0.b1 b2 b3 . . . bm ? (Hint: Does k/r necessarily have a finite binary expansion,
even if k and r have finite expansions?)
As the exercise above suggests, even if k and r have finite binary expansions, it is not necessarily
true that k/r has a finite expansion. Thus, a priori it is not clear what precision we should
require from QPE. Naively, one might “guess” that since kr requires at most 2n + 1 bits to
represent, perhaps k/r can be sufficiently well-approximated also using 2n + 1 bits (at least
close enough for us to recover k and r as integers). Indeed, it turns out that 2n + 1 bits of QPE
precision will suffice for the continued fractions algorithm of Section 10.2.3 to extract k and r.
To obtain 2n + 1 bits of precision in QPE with probability at least 1 − ϵ for fixed ϵ > 0, however, we require two things: Expanding the first register of Figure 10.1 to n′ = 2n + 1 + ⌈log(2 + 1/(2ϵ))⌉ qubits (which is just a constant factor overhead), and the ability to raise U to power 2^{n′} ∈ O(2^{2n}) in time poly(n) (this smells like trouble). Stated differently, the latter requires us to compute x^k y mod N for k ∈ O(2^{2n}) in time poly(n). As luck would have it, there is a well-known classical
algorithm which accomplishes essentially this, known as square-and-multiply.
Basic idea. We wish to compute a^b mod M (in our case, a = x and M = N). The basic idea is most easily seen when b is a power of 2, e.g. b = 2^n. In this case, note that (trivially) a^b = (a²)^{b/2}. In other words, we can cut the exponent in half simply by
performing a single squaring of a. So let us repeat this tradeoff, each time squaring the result of
the previous square and cutting the remaining exponent in half. How many times can we repeat
this before it terminates? Precisely log b = n times, hence yielding an exponential speedup over
brute force, which would have required b operations.
General case. When b is not a power of 2, the principle is similar, but the recursive step requires two cases instead of one:

a^b = (a²)^{b/2}          if b is even,
a^b = a · (a²)^{(b−1)/2}  if b is odd.    (10.13)
The square-and-multiply algorithm repeatedly applies this recursive step until b = 0, at which
point it terminates. Note that since we are working mod M , in order to prevent the expression
being computed from blowing up exponentially quickly, we may apply the mod operation after
each application of the recursive rule.
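To make this concrete, here is a minimal Python sketch of square-and-multiply (the function name and the iterative phrasing of the recursion are our own; Python's built-in pow(a, b, M) computes the same quantity):

    def square_and_multiply(a: int, b: int, M: int) -> int:
        """Compute a^b mod M via the recursion in Equation (10.13), iteratively."""
        result = 1
        a %= M
        while b > 0:
            if b % 2 == 1:                  # odd case: peel off one factor of a
                result = (result * a) % M
            a = (a * a) % M                 # square a, halving the remaining exponent
            b //= 2                         # each pass consumes one bit of b
        return result

    assert square_and_multiply(7, 2**20 + 3, 15) == pow(7, 2**20 + 3, 15)

Each loop iteration consumes one bit of b, which is the source of the exponential speedup over the naive b multiplications.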
Exercise 10.38. Asymptotically, how many operations does the algorithm require in the gen-
eral case?
10.2.3 Postprocessing via continued fractions
In Section 10.2.2, we used QPE to approximately sample from set S := {1/r, 2/r, . . . , (r − 1)/r}.
More specifically, defining n := ⌈log N ⌉ + 1, each time we ran the QPE subroutine, we obtained
(with probability at least 1 − ϵ) the 2n + 1 most significant bits of the binary expansion of
some random element k/r of S. Given access to such a sample x, it thus remains to extract a
guess for r. To do so, we run the continued fractions algorithm, for which we first overcome the
following minor obstacle.
A minor obstacle. If we are to have any hope of extracting r given k/r, we must have k
and r co-prime (in other words, k/r is already in lowest terms). A priori, there is no way
of guaranteeing this, since k is drawn uniformly at random from {0, . . . , r − 1}. However, the
uniform sampling is simultaneously the key to the solution. Namely, given positive integer r,
the number of positive integers less than r which are co-prime to r has been studied since
1763, and goes by the name of the Euler totient function ϕ(r). The totient function scales
as ϕ(r) ∈ Ω(r/ log log r) for r > 2. Thus, a randomly chosen k in {0, . . . , r − 1} is co-prime
to r with probability scaling as 1/ log log r ≥ 1/ log log N . It follows that repeating the QPE
sampling step O(log log N ) times (i.e. logarithmic in the encoding size of N ) suffices to obtain
k/r with k co-prime to r with high probability. Assuming we have such a k/r, we can now
apply continued fractions.
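As a quick numerical sanity check of this co-primality argument (with a hypothetical choice of r picked purely for illustration):

    from math import gcd

    r = 2 * 3 * 5 * 7 * 11   # a hypothetical order with many small prime factors
    coprime = sum(1 for k in range(1, r) if gcd(k, r) == 1)
    print(coprime / r)       # phi(r)/r ~ 0.208: a constant fraction of the k
                             # are co-prime, consistent with the Omega(1/log log r) scaling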
The continued fractions algorithm. In a nutshell, the continued fractions expansion is just
another representation for a given rational number a/b for a, b ∈ Z+ . Specifically, given a/b,
the algorithm outputs a finite sequence of integers C := (c0, c1, . . . , cm), such that

a/b = c0 + 1/(c1 + 1/(c2 + 1/(··· + 1/cm))).    (10.14)
Why should this representation of a/b be useful? First, instead of viewing C as representing
a single rational number, note we may view it as encoding a list of m rational numbers, each
defined via the continued fraction subsequence (formally, kth convergent) Ck := (c0 , . . . , ck ).
Recall now that what QPE gave us was not a random sample k/r, but rather the first 2n + 1
bits of k/r, denoted x. Thus, x itself is rational, and is “close” to k/r. Amazingly, the following theorem tells us that the continued fractions expansion of x will yield a sequence of convergents, one of which is precisely k/r.

Theorem 10.39. Let x ∈ Q, and suppose k/r ∈ Q satisfies

|k/r − x| ≤ 1/(2r²).    (10.15)

Then k/r appears as one of the convergents of the continued fractions expansion (c0, . . . , cM) of x.
Exercise 10.40. Show that since x is the 2n + 1 most significant bits of k/r, Equation (10.15)
indeed holds, and thus we can apply Theorem 10.39.
This almost gives us what we want — instead of an approximation x to k/r, we now actually have
k/r exactly. There are three questions to be answered: (1) How large is M in Theorem 10.39?
(2) How do we compute the continued fractions expansion of x? (3) Given the convergent of x
which yields k/r, how do we extract r (recall we don’t know k)? For (1), it turns out since k
and r are n-bit integers, one can choose M ∈ O(n). The answers to the other two questions we
give below.
Computing the continued fractions expansion. The algorithm essentially “implements” Equa-
tion (10.14) in a “greedy” fashion. Let us demonstrate with an explicit example, the expansion
of 23/9:
23/9 —split→ 2 + 5/9 —invert→ 2 + 1/(9/5) —split→ 2 + 1/(1 + 4/5) —invert→ 2 + 1/(1 + 1/(5/4)) —split→ 2 + 1/(1 + 1/(1 + 1/4)).
As demonstrated above, the algorithm repeatedly applies the split step (extract the integer component) and the invert step (flip the fractional component) until the final fraction is of the form 1/z for some z ∈ Z.
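The split/invert loop is only a few lines of code; the following Python sketch (function name our own; exact rational arithmetic via Fraction) reproduces the expansion of 23/9:

    from fractions import Fraction

    def continued_fractions(x: Fraction) -> list:
        """Return the continued fractions expansion (c0, c1, ..., cm) of x > 0."""
        expansion = []
        while True:
            c, rem = divmod(x.numerator, x.denominator)  # split step
            expansion.append(c)
            if rem == 0:
                return expansion
            x = Fraction(x.denominator, rem)             # invert step

    print(continued_fractions(Fraction(23, 9)))          # [2, 1, 1, 4]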
Exercise 10.42. Prove that the continued fractions expansion for any rational number is indeed
finite.
Theorem 10.43. Given any continued fractions expansion (or more generally, some kth convergent) Ck := (c0, c1, . . . , ck) where ci ∈ Z⁺ for all i ∈ {0, . . . , k}, define ai and bi recursively as follows:

a0 := c0, b0 := 1, a1 := c1c0 + 1, b1 := c1, and for i ≥ 2, ai := ci a_{i−1} + a_{i−2} and bi := ci b_{i−1} + b_{i−2}.

Then, the expansion Ck is equivalent to rational number ak/bk, where ak and bk are co-prime.
Combining Theorems 10.39 and 10.43, we hence have that trying all m possible convergents in
the expansion C for x (which recall is a 2n + 1-bit estimate of k/r) will eventually allow us to
recover both k and r.
Exercise 10.44. Apply Theorem 10.43 to C = (1, 2, 3, 4). What are a3 and b3 ? Confirm that
they are co-prime.
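In code, the recursion of Theorem 10.43 (with our own variable names, and the formal initial values a_{−1} = 1, b_{−1} = 0) also lets us check Exercise 10.44:

    from math import gcd

    def convergents(C):
        """Given expansion C = (c0, ..., cm), return the list of pairs (ak, bk)."""
        a_prev, b_prev = 1, 0            # formal (a_{-1}, b_{-1})
        a, b = C[0], 1                   # (a_0, b_0)
        pairs = [(a, b)]
        for c in C[1:]:
            a, a_prev = c * a + a_prev, a
            b, b_prev = c * b + b_prev, b
            pairs.append((a, b))
        return pairs

    a3, b3 = convergents([1, 2, 3, 4])[-1]
    print(a3, b3, gcd(a3, b3))           # 43 30 1: C equals 43/30, with a3, b3 co-prime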
The RSA cryptosystem. RSA, named after its founders Rivest, Shamir, and Adleman, is a public
key cryptosystem still widely used today. Roughly, in a public key cryptosystem, Alice wishes
to allow the public to send her private messages. For this, she creates a pair of keys: One public
key P , one private key S. The public key P is distributed openly to the world; anyone can use
P to encrypt a given message M , obtaining ciphertext E(M ) for transmission to Alice. Upon
receipt of any such E(M ), Alice uses her private key S, which is only known to her, to decrypt
E(M ) and recover M .
What we thus need to specify for RSA are the public and private keys P and S, respectively, as well as the encryption and decryption algorithms. These are given as follows.

• How Alice creates her keys. (1) Alice picks two large primes p and q, and sets N := pq. (2) She chooses an integer e co-prime to ϕ(N) = (p − 1)(q − 1). (3) She computes d such that ed ≡ 1 mod ϕ(N). The public key is P = (e, N), and the private key is S = (d, N).
• How the public encrypts messages given P = (e, N ). Suppose Bob wishes to encrypt
string M ∈ {0, 1}⌊log N ⌋ . Viewing M as the binary encoding of an integer, he computes
ciphertext
E(M) := M^e mod N.    (10.16)
Note this can be done efficiently even for large e via the square-and-multiply algorithm.
• How Alice decrypts ciphertext E(M ). Given ciphertext E(M ), Alice recovers M by
computing
M = (E(M))^d mod N.    (10.17)
The proof of this equality is not difficult, but is omitted to avoid sidetracking the discus-
sion.
• How factoring allows one to completely break RSA. Suppose adversary Eve can
factor N efficiently into primes p and q. Then, since e is public knowledge, she can
recompute d such that ed ≡ 1 mod ϕ(N ) (Step 3 of Alice’s creation protocol), giving her
Alice’s private key S = (d, N ). Thus, not only can Eve decrypt any incoming ciphertext
to Alice, but she in fact has complete knowledge of Alice’s secret key.
In sum, what makes RSA classically “hard” to break is that given just N = pq, it is not clear
how to compute ϕ(N ) = (p − 1)(q − 1). But given the factors p and q, recovering ϕ(N ), and
hence the private key d, is easy.
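As a toy illustration (tiny, insecure parameters chosen purely for readability; real RSA uses primes of a thousand or more bits), the entire pipeline, including Eve's attack, fits in a few lines of Python:

    p, q = 61, 53                       # toy primes, known only to Alice
    N, phi = p * q, (p - 1) * (q - 1)   # N = 3233 is public; phi(N) = 3120 is not
    e = 17                              # public exponent, co-prime to phi(N)
    d = pow(e, -1, phi)                 # private exponent: e*d = 1 mod phi(N) (Python 3.8+)

    M = 1234                            # Bob's message, encoded as an integer < N
    ciphertext = pow(M, e, N)           # encryption, Equation (10.16)
    assert pow(ciphertext, d, N) == M   # decryption, Equation (10.17)

    # Eve's attack: given the factors p, q of N (e.g. via Shor), recompute d.
    assert pow(e, -1, (p - 1) * (q - 1)) == d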
11 Grover search and approximate counting
“The way to find a needle in a haystack is to sit down.”
— Beryl Markham.
“After that, work and hope. But never hope more than you work.”
— Also Beryl Markham.
Shor’s factoring algorithm and Grover’s search algorithm are like soccer teammates on differ-
ent ends of the field — the former, like a striker, is highly specialized to do one thing efficiently
(i.e. score), while the latter, akin to a sweeper, has the broad job of neutralizing all threats
which slip past the rest of the team. Indeed, Shor’s algorithm gives an exponential speedup for
the specific problem of factoring, whereas Grover search can be thrown at just about anything
to yield a quadratic speedup.
Discovered by Lov Grover in 1996, Grover search is more specifically a quantum algorithm for
solving the following general problem: Given query access to an oracle containing N items, one
of which is “marked”, find the marked item. For example, the “oracle” could be implemented
by a 3-SAT formula ϕ, indexed by n-bit assignments x, and the aim is to find a satisfying
assignment x (which would be considered “marked”). Grover's algorithm solves this problem with high probability using O(√N) queries to the database (which turns out to be optimal for a quantum algorithm), whereas a classical algorithm would require Ω(N) queries in the worst case. Thus, Grover search solves 3-SAT in O(√(2^n)) time.
We begin in Section 11.1 by defining the unstructured search problem. Section 11.2 then gives
Grover’s algorithm, and Section 11.3 discusses a recent approach for bootstrapping Grover’s
algorithm for approximate counting.
• Input: Query access to an oracle U_f for f : {0, 1}^n → {0, 1} an unknown Boolean function, meaning the ability to compute (at unit cost) the map U_f|x⟩|y⟩ = |x⟩|y ⊕ f(x)⟩ for x ∈ {0, 1}^n and y ∈ {0, 1}.

• Output: An x ∈ {0, 1}^n satisfying f(x) = 1, if one exists.

Note that no assumptions about U_f are made, other than the requirement that we have superposition query access to the input/output behavior of f.
Exercise 11.2. SEARCH is often alternatively formulated as follows: The oracle U_f is replaced with an unknown string z ∈ {0, 1}^{2^n}, such that each query to U_f is equivalent to accessing a single bit of z. The output is to compute the OR function on string z, i.e. OR(z) = z_1 ∨ z_2 ∨ ··· ∨ z_{2^n}. Explain why this formulation of SEARCH is equivalent to Definition 11.1.
Claim 11.3 (Exponential-Time Hypothesis (ETH)). There exists a constant ϵ > 0 such that
3-SAT requires time Ω(2ϵn ) on a deterministic Turing machine.
While the runtime of Grover's algorithm does not contradict ETH, if one does believe ETH, then it is perhaps not too surprising that O(√(2^n)) turns out to be the optimal worst-case runtime for a black-box quantum search algorithm.
Is ETH true? Again, this is not clear, but assuming the truth of the ETH has led to matching
runtime lower bounds in an area known as fine-grained complexity. Roughly, the latter aims to
pin down the precise complexity of problems which are known to be in P (e.g. is the optimal worst-case runtime, say, O(n³) or O(n²)?). For example, in the Orthogonal Vectors Problem (OV), given sets of vectors A, B ⊆ {0, 1}^d with |A| = |B| = n, one is asked whether there exist |v⟩ ∈ A and |w⟩ ∈ B such that ⟨v|w⟩ = 0.
It turns out that, assuming a stronger variant of ETH known as the Strong Exponential Time
Hypothesis, the naive polynomial runtime of Exercise 11.4 is the best possible. The interested
reader is referred to the accessible survey of Bringmann [Bri19] for details.
Phase kickback. The starting point is the phase kickback trick used in conjunction with oracle U_f. Recall this means using the mapping |x⟩|−⟩ ↦ (−1)^{f(x)}|x⟩|−⟩, or for brevity (omitting the ancilla qubit |−⟩), |x⟩ ↦ (−1)^{f(x)}|x⟩.
[Figure: three 2D diagrams in the plane spanned by |A⟩ and |B⟩.]
Figure 11.1: (Left) A reflection of |ψ⟩ about |B⟩ to |ψ′⟩. (Middle) A reflection of |ψ′⟩ about |ψ⟩, yielding |ψ′′⟩. For clarity, the angle 2θ is between |ψ′′⟩ and |ψ⟩. (Right) The final state |ψ^K⟩ = (RψRB)^K|ψ⟩, at angle η from |A⟩, after running the Grover iterate K times.
It will be remarkably helpful to visualize phase kickback geometrically in the case of SEARCH.
This is done, roughly, by considering superpositions over marked and unmarked items (for
clarity, a “marked” (“unmarked”) item x satisfies f (x) = 1 (f (x) = 0)). For this, let A, B ⊆
{0, 1}^n be the sets of marked and unmarked items, respectively, and define

|A⟩ := (1/√|A|) Σ_{x∈A} |x⟩  and  |B⟩ := (1/√|B|) Σ_{x∈B} |x⟩.    (11.3)
Thus, |A⟩ (|B⟩) is an equal superposition over all marked (unmarked) items. The geometric interpretation now follows by checking that phase kickback acts as U_f|A⟩ = −|A⟩ and U_f|B⟩ = |B⟩.
In words, restricted to the 2D space defined by Span(|A⟩, |B⟩), Uf acts as a reflection about
|B⟩, as depicted in Figure 11.1. And this is no coincidence — digging more deeply into this
view will lead us directly to Grover’s algorithm (though, for clarity, this geometric view was
only discovered after Grover’s original work). To see this, let us remind ourselves of the second
tool we have at our disposal — preparing some initial start state |ψ⟩. Ideally, this state |ψ⟩
should also lie in the span of |A⟩ and |B⟩, so that it “fits” into the 2D picture of Figure 11.1.
Exercise 11.6. Take a moment to guess what might be a “reasonable” choice of |ψ⟩, given our
current state of knowledge. What do we know about the location of the marked items? What
choice of |ψ⟩ might lie in the span of |A⟩ and |B⟩?
The start state, |ψ⟩. Since a priori we have no reason to believe any particular item x is
marked, a naive choice of start state is
|ψ⟩ = (1/√N) Σ_{x∈{0,1}^n} |x⟩ = √(|A|/N) |A⟩ + √(1 − |A|/N) |B⟩.    (11.4)
In other words, |ψ⟩ ∈ Span(|A⟩, |B⟩), as desired, and we may depict it as in Figure 11.1 (Middle).
The angle θ therein is given in the following exercise.
Exercise 11.8. Show that for start state |ψ⟩ in Equation (11.4), cos θ = √(1 − |A|/N).
The goal. To guide the rest of the algorithm, we now pause and step back to restate our
goal in the 2D picture of Figure 11.1. Given the ability to prepare |ψ⟩, we wish to rotate |ψ⟩ counter-clockwise up to |A⟩, and subsequently measure in the standard basis, thus obtaining
some marked item x ∈ A. Since this is just a rotation map, it is certainly possible, in that there
exists a 2n × 2n unitary matrix performing this rotation. The question is: Can this mapping be
computed using poly(n) queries to Uf (and ideally, poly(n) auxiliary gates)?
Two reflections make a rotation. As the saying goes, beggars can’t be choosers, and indeed
to attain our goal, we must make do with what few tools the generality of SEARCH affords us:
We can reflect about |B⟩, and we can prepare |ψ⟩. The key observation is that since we can
efficiently prepare |ψ⟩, i.e. |ψ⟩ = H ⊗n |0n ⟩, we can also reflect about |ψ⟩.
Exercise 11.9. In Lecture 9, we saw that for |ψ⟩, a reflection about |ψ⟩ is achieved by unitary
Uψ = 2|ψ⟩⟨ψ| − I. Show how to implement Uψ for our choice of |ψ⟩ from Equation (11.4). (Hint: Begin by showing how to reflect about |0^n⟩, i.e. by implementing U_{0^n} = 2|0^n⟩⟨0^n| − I.)
Geometrically, this means we can map |ψ ′ ⟩ to |ψ ′′ ⟩ in Figure 11.1. In other words, by first
reflecting about |B⟩, and then about |ψ⟩, we can effect a counter-clockwise rotation of 2θ in
Figure 11.1. Magic! Formally, we shall repeatedly apply the pair of reflections (often dubbed
the “Grover iterate”)
(2|ψ⟩⟨ψ| − I)(2|B⟩⟨B| − I) =: Rψ RB , (11.5)
where RB is effected by querying the oracle Uf , and Rψ by undoing the preparation procedure
for |ψ⟩, reflecting about |0n ⟩, and then re-doing the preparation for |ψ⟩ (as per Exercise 11.9).
The number of iterations required. We can now state Grover’s algorithm as follows:
1. Prepare |ψ⟩ = H ⊗n |0n ⟩.
2. Apply the Grover iterate, Rψ RB , K times.
3. Measure in the standard basis to obtain string x ∈ {0, 1}n .
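For concreteness, here is a minimal state-vector simulation of these three steps in Python/NumPy (exponential memory, so purely illustrative; it exploits that, for the uniform |ψ⟩, the reflection Rψ maps a vector v entrywise to 2·mean(v) − v, and it uses the standard iteration count K ≈ (π/4)√(N/|A|) rather than the careful bound of Equation (11.6) below):

    import numpy as np

    def grover(f, n, K, seed=0):
        """Simulate Grover search for f: {0, ..., 2^n - 1} -> {0, 1}."""
        rng = np.random.default_rng(seed)
        N = 2 ** n
        marked = np.array([f(x) for x in range(N)], dtype=bool)
        psi = np.full(N, 1 / np.sqrt(N))   # Step 1: |psi> = H^{otimes n}|0^n>
        for _ in range(K):                 # Step 2: apply the Grover iterate K times
            psi[marked] *= -1              #   R_B: oracle phase flip via phase kickback
            psi = 2 * psi.mean() - psi     #   R_psi = 2|psi><psi| - I
        return rng.choice(N, p=psi ** 2)   # Step 3: standard basis measurement

    n, w = 10, 42                          # hypothetical instance: single marked item w
    theta = np.arcsin(np.sqrt(1 / 2 ** n))
    K = int(np.pi / (4 * theta))           # ~25 iterations for N = 1024
    print(grover(lambda x: x == w, n, K))  # prints 42 with probability ~0.999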
If there are no solutions, i.e. M = 0, this procedure always outputs x satisfying f (x) = 0,
as expected. If M > 0, on the other hand, the question is what value to choose for K, the number of loop iterations. Note that it suffices for the algorithm to succeed with any fixed constant
success probability p, as then independently repeating the algorithm drives down the overall
error probability exponentially. To simplify the analysis, set p = 1/2. Without loss of generality,
we may assume |A| /N ≤ 1/2 (as otherwise classically choosing uniformly random inputs to Uf
yields success probability at least 1/2). By Exercise 11.8, we hence have θ ≤ π/4.
Exercise 11.10. Show that for Step 3 of Grover’s algorithm to output x ∈ {0, 1}n satisfying
f (x) = 1 with probability at least 1/2, it suffices in Figure 11.1 (Right) that |ψ K ⟩ makes angle
at most η ≤ π/4 with |A⟩. (Hint: Recall your high school formula that the overlap between
real vectors |v⟩ and |w⟩ equals ⟨v|w⟩ = ∥ |v⟩ ∥2 ∥ |w⟩ ∥2 cos η for η the angle between |v⟩ and |w⟩.)
Thus, by Exercise 11.8, we start with |ψ⟩ at θ = arccos(√(1 − |A|/N)) ≤ π/4, and we wish to end up at |ψ^K⟩ with η ∈ [−π/4, π/4].
Exercise 11.11. Given that |ψ⟩ starts at angle θ ≤ π/4, do we ever need to wrap around the
circle (when applying the Grover iterate) in order to land in range η ∈ [−π/4, π/4]?
Exercise 11.12. Using your answer from the previous exercise, show that setting

K = ⌈ (1/(2θ)) · (arccos(√(|A|/N)) − π/4) ⌉    (11.6)

suffices for |ψ^K⟩ to make angle at most π/4 with |A⟩.
Exercise 11.14. How many auxiliary gates (i.e. gates other than queries to Uf ) does Grover’s
algorithm use?
In closing, given query access to an oracle U_f for which |A| out of N = |A| + |B| items are marked, a marked item can be found quantumly using O(√(N/|A|)) queries. There is a slight
catch, however — the exact number of queries required, as given by Equation (11.6), requires
knowledge of |A|. Luckily, it turns out that not only do quantum algorithms allow us to check
if |A| > 0, but additionally to estimate |A| itself to within multiplicative error. This is known
as quantum approximate counting, covered next.
11.3 Approximate counting

Grover search finds a marked item; a natural follow-up question is to count the marked items. Formally, let M denote the number of x ∈ {0, 1}^n satisfying f(x) = 1. The classic result for approximate counting is Stockmeyer's
algorithm, which shows how to approximate M within a multiplicative factor of 2 in randomized
polynomial time, assuming one has access to an NP oracle. (This factor of 2 can then easily
be boosted to 1 + (1/p(|x|)) for any desired fixed polynomial p.) In contrast, in this section we
shall make a tradeoff — by adding quantum computation to the picture, we no longer need an NP oracle, but now the runtime is worst-case exponential: O(√(N/M)) to be precise (assuming
we want a constant probability of success). And this tradeoff in some sense is necessary —
in the worst case, approximating the Permanent of a matrix (a classic #P-complete problem
dating back to Valiant’s original 1979 paper on #P) within a constant multiplicative error is still
#P-hard [AA11]. And it is nowadays generally believed quantum computers cannot efficiently
solve NP-hard problems, never mind #P-hard problems.
Intuition
The basic idea. Let p = M/N , i.e. the fraction of satisfying assignments. Naively, there is a
simple classical algorithm for estimating p — simply pick x uniformly at random, and evaluate
f (x). By definition, this succeeds with probability p, and so 1/p trials are expected before a
satisfying assignment is found.
Exercise 11.15. Take a moment to Google for “geometric distribution”. Show how to model
the sampling experiment above as a geometric distribution. What is the expected value and
variance for this distribution? What is the probability that the number of trials needed to
see a success deviates significantly from 1/p? (Hint: Use Chebyshev’s inequality, which unlike
Markov’s inequality, takes the variance into account.)
Of course, in the worst case, 1/p ∈ O(N), and by now we are spoiled with getting faster O(√N) runtimes via Grover search. Thus, roughly, the quantum algorithm we discuss will
carefully mimic (a refinement of) the idea above in conjunction with Grover search. In the
remainder of this section, we state the algorithm, and sketch its proof of correctness. The
interested reader is referred to [AR20] for full details.
Exercise 11.16. More accurately, denoting the Grover iterate as G, we have

G^{(r−1)/2}|ψ⟩ = (sin(rθ)/√M) Σ_{x∈A} |x⟩ + (cos(rθ)/√(N−M)) Σ_{x∈B} |x⟩.    (11.8)

Confirm that the probability of extracting a marked item after these O(r) uses of G is p = sin²(rθ).
The beauty of [AR20] is now that we can forget about the word “quantum”, and simply think
of p as a probability arising from some abstract sampling experiment E with parameter r (we
henceforth write E(r) where appropriate). The question is: Given the ability to choose r in this
experiment E(r), how many runs of E do we need to estimate p (thus allowing us to extract θ,
which encodes the number of solutions M )?
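In code, this abstraction is literally a single biased coin flip; a sketch of (a classical stand-in for) E(r), which suffices for experimenting with the estimation logic below:

    import numpy as np

    rng = np.random.default_rng(1)

    def E(r, theta, shots=1):
        """Each run returns a marked item with probability sin^2(r*theta)."""
        return rng.random(shots) < np.sin(r * theta) ** 2

    theta = np.arcsin(np.sqrt(50 / 2 ** 16))   # hypothetical: M = 50 marked of N = 2^16
    print(E(5, theta, shots=1000).mean())      # empirical estimate of sin^2(5*theta)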
The high-level outline for achieving this with a quadratic speedup consists of two steps:
1. (Rough estimate) Repeat E(r) using exponentially increasing values of r until “sufficiently
many” marked items are found. This gives a rough estimate of K− ≤ θ ≤ K+ .
2. (Finetuning the estimate) Iteratively cut down the interval [K− , K+ ] to zoom in on θ.
The second step, in particular, will require a careful choice of r each time E(r) is run to avoid
cutting the candidate interval [K− , K+ ] too much, which in particular is a danger if θ ≈ K− or
θ ≈ K+ . This shall rely on the Steady Hands Lemma 11.20, to be stated shortly.
Theorem 11.17. Fix any desired ϵ, δ > 0. Given oracle access to U_f (as in SEARCH), there exists a quantum algorithm which:

1. outputs M̃ satisfying (1 − ϵ)M < M̃ < (1 + ϵ)M, and

2. succeeds with probability at least 1 − δ, using O(√(N/M) · (1/ϵ) · log(1/δ)) queries to U_f.
Statement of the algorithm. The algorithm is stated below. It assumes θ ≤ π/1000, which is
without loss of generality since we can always “extend” Uf with dummy indices x which always
lead to f (x) = 0. For now, we use abstract names ci to denote parameters in the algorithm, so
as to avoid excess detail obscuring the main pedagogical ideas.
1. Set t = 0.

2. (First loop) Repeat: increment t and run E(r) c2 times, with r growing exponentially in t (i.e. r ∝ c1^t), recording which runs return a marked item, until at least 1/3 of the recorded items are marked. Let t∗ denote the final value of t.

3. Set θmin := (5/8)(1/c1)^{t∗+1} and θmax := (5/8)(1/c1)^{t∗−1}, where recall c1 > 1.

4. Set t = 0.

5. (Second loop) Repeat until θmax/θmin ≤ 1 + ϵ/5: increment t and run E(r) c3 times, with r chosen as in the Steady Hands Lemma 11.20. If at least half of the runs return a marked item, increase the lower bound via

θmin := θmax/∆,    (11.9)

where ∆ := 0.1 + 0.9(θmax/θmin). Otherwise, decrease the upper bound via

θmax := ∆ · θmin.    (11.10)

6. Set θ̃ := θmax, and output M̃ := N sin²(θ̃).
Formally, setting parameters c1 = 12/11, c2 = 10^5 ln(120/δ), and c3 = 10^3 ln(100(0.9)^t/(δϵ)) suffices for the proof of Theorem 11.17.
Checkpoint 1: After first loop terminates. The claim here, as suggested by line 3 of the algorithm,
is that with probability at least 1 − (δ/2),
(5/8)(1/c1)^{t∗+1} = (5/8)(11/12)^{t∗+1} ≤ θ ≤ (5/8)(11/12)^{t∗−1} = (5/8)(1/c1)^{t∗−1}.    (11.11)
The claim is seen by focusing on a “threshold” t0 for t, around which the probability of seeing
at least 1/3 of the recorded items marked in Step 2(c) jumps from “insignificant” to “extremely
likely”. Formally, set t0 as the largest integer satisfying
θ ≤ (5/8)(11/12)^{t0}.    (11.12)
Then, one can show that if t < t0, the probability p of E(r) returning a marked item is at most

p = sin²(rθ) ≤ 0.33 · (12/11)^{2(1+t−t0)} < 1/3.    (11.13)
Thus, over m trials, when t < t0 we expect to see strictly less than 1/3 of the sampled items
being marked. Formally, one applies the Chernoff-Hoeffding bound to conclude that since the
number of trials is c2 = 10^5 ln(120/δ), the probability of seeing at least 1/3 of the samples being marked is “small”, i.e. at most δ/4.
Exercise 11.18. The Hoeffding bound states the following. Let X1 through Xn be independent random variables, each satisfying 0 ≤ Xi ≤ 1, whose arithmetic mean is X̄ = (X1 + ··· + Xn)/n. Then, the bound states

Pr[ |X̄ − E[X̄]| ≥ t ] ≤ 2e^{−2nt²}.    (11.14)
In words, we converge exponentially quickly to the true expectation of X̄ by drawing many samples and taking their arithmetic mean. Use the Hoeffding bound to show that if a biased coin flip lands HEADS with probability p − ϵ for fixed ϵ > 0, then it is highly unlikely over n coin flips to have at least pn HEADS instances.
Checkpoint 2: After second loop terminates. By line 5(d), we are guaranteed that if we reach
line 6, then
θmax/θmin ≤ 1 + ϵ/5,    (11.15)

implying any θ̃ ∈ [θmin, θmax] satisfies

(1 − ϵ/5) θ ≤ θ̃ ≤ (1 + ϵ/5) θ.    (11.16)
Exercise 11.19. Observing that line 6 of the algorithm chooses θ̃ = θmax, show that M̃ satisfies the first claim of Theorem 11.17.
Roughly, to show that we indeed reach line 6 eventually, observe that after line 3, we have
∆ = 0.1 + 0.9(θmax /θmin ) ≈ 0.1 + 0.9(1.19) > 1. Thus, intuitively each run of lines 5c and 5d
will eliminate a constant fraction of the search space, implying line 5(d) will eventually cause
us to exit the second loop, as desired. The only catch is that, each time we remove such a
fraction of the search space, we must ensure we do not accidentally “skip” θ, i.e. we must
always maintain θ ∈ [θmin , θmax ]. This is ensured by the following lemma, which shows how to
pick r in line 5(a) to avoid the situation θ ̸∈ [θmin , θmax ]. Note that the remaining probability of
failure of δ/2 in the algorithm arises by applying the union bound over all uses of the following
lemma.
Lemma 11.20 (Steady Hands Lemma). Assume 0 < θmin ≤ θ ≤ θmax ≤ π/1000, and that θmax/θmin ≤ 5/4. Then, there exists an odd integer r such that, running E(r) at least 1000 ln(1/δ) times and updating θmin and θmax according to the following rules preserves θmin ≤ θ ≤ θmax with probability at least 1 − δ: if at most half of the runs return a marked item, decrease the upper bound, i.e. set θmax := ∆ · θmin; otherwise, increase the lower bound, i.e. set θmin := θmax/∆. Further, r ∈ Θ(1/θ), where recall r controls the number of applications of the Grover iterate in E(r).
We will not prove this lemma explicitly, but the rough idea is to select¹ r satisfying

rθmin ≈ (π/2) · θmin/(θmax − θmin)  and  rθmax ≈ rθmin + π/2.    (11.17)
In words, this not only means r iterations of the Grover iterate “separate” the starting angles
θmin and θmax by an additive angle of approximately π/2 radians, but that one can additionally
show this choice of r aligns rθmin with “approximately” |B⟩ (unmarked items) and rθmax with
|A⟩ (marked items) (recall |A⟩ and |B⟩ have an angle of π/2 radians). Since before we apply
Lemma 11.20, we assume as a precondition that θmin ≤ θ ≤ θmax , this means that if θ ≈ θmin
(θ ≈ θmax ), after r iterations we have rθ ≈ rθmin ≈ 0 modulo 2π (rθ ≈ rθmax ≈ π/2 modulo 2π),
so we are likely to sample an unmarked (marked) element. Thus, repeating E(r) sufficiently
many times and applying a Chernoff bound will ensure that if θ is close to (say) threshold θmin ,
then Lemma 11.20 is smart enough to recognize this and to instead update threshold θmax .
Formally, if θmin ≤ θ ≤ θmax /∆ (Case 1 of Lemma 11.20), one can show E(r) outputs a
marked item with probability
p = sin²(rθ) ≤ 0.47 < 1/2.    (11.18)
Conversely, if ∆θmin ≤ θ ≤ θmax (Case 2 of Lemma 11.20), one can show E(r) outputs a marked
item with probability
p = sin²(rθ) ≥ 0.6 > 1/2.    (11.19)
Thus, by the Chernoff-Hoeffding bound, cases 1 and 2 of Lemma 11.20 will correctly update
θmax or θmin , respectively, with high probability.
Query complexity. Finally, let us sketch the number of queries to Uf each stage of the algorithm
requires.
Exercise 11.21. Recall the first loop is run at most t0 + 1 times, for t0 defined in Equation (11.12). Recalling that each run of the loop uses c2 = 10^5 ln(120/δ) calls to U_f, show that the total number of queries required by the first loop scales as

O((1/θ) log(1/δ)) ∈ O(√(N/M) log(1/δ)).    (11.20)
A similar analysis shows the second loop requires O(√(N/M) · (1/ϵ) · log(1/δ)) queries, which dominates the first loop's runtime, and hence leads to the claimed overall query cost of Theorem 11.17.
¹ Formally, one selects r as the closest integer to 2πk/θmin, where k is the closest integer to θmin/(4(θmax − θmin)).
12 Stabilizers and the Gottesman-Knill
theorem
“So the problem is not so much to see what nobody has yet seen, as to think what
nobody has yet thought, concerning that which everybody sees.”
— Arthur Schopenhauer, English translation.
According to internet lore, when asked on a physics exam how one should compute the height of a building given just a barometer, a young Niels Bohr allegedly did not give the “expected”
answer (e.g. use the barometer to measure pressure at the base versus the top of the building),
but rattled off a string of alternate answers (e.g. drop the barometer from the top of the building
and measure the time to impact, measure the length of the shadow made by the barometer from
atop the roof, etc). While this anecdote is likely more fiction than fact, both it and the quote
atop the lecture reveal an important truth — it is often helpful to consider multiple perspectives
on a problem.
In particular, in this course, we have used explicit representations of quantum states, i.e. throughout our analyses, we tracked the full state vector |ψ⟩ ∈ C^{2^n}. Naturally, this incurred an exponential classical overhead. But perhaps this overhead is just an artifact of our chosen representation of |ψ⟩? Specifically, we ask: Can quantum states arising in a computation be represented implicitly, so that the computation may be simulated classically with only polynomial overhead?
Remarkably, the answer is sometimes yes. In this lecture, we introduce a prominent framework
for achieving this: The stabilizer formalism. To be clear, the stabilizer formalism does not
allow efficient simulation of arbitrary quantum circuits. However, what it does allow is efficient
classical simulation of all quantum circuits consisting of so-called Clifford gates — this is the
content of the celebrated Gottesman-Knill theorem, a proof sketch of which is the main goal of
this chapter.
We begin in Section 12.1 by defining the stabilizer formalism, and stating the Gottesman-
Knill theorem. Section 12.2 then sketches a proof of the theorem. Note that there is vastly more one can say about stabilizers, most importantly in the study of quantum error-correcting codes; however, for reasons of brevity, here we focus on the computational aspect of efficiently simulating (certain classes of) quantum circuits.
12.1.1 Intuition
We build intuition via a sequence of failed attempts (i.e. always cherish the mistakes you make).
n
Attempt 1: Expansion in the Pauli basis. Just as |ψ⟩ ∈ C^{2^n} can be written in terms of a basis for C^{2^n}, the density operator |ψ⟩⟨ψ| ∈ Herm(C^{2^n}) can be written in terms of an operator basis for Herm(C^{2^n}). For example, any H ∈ Herm(C²) can be written as H = aI + bX + cY + dZ for real coefficients a, b, c, d ∈ R.
Exercise 12.1. Show that |0⟩⟨0| = (1/2)(I + Z). (Hint: Do not write out matrices; use spectral decompositions.) What pure state does (1/2)(I − X) equal?
Exercise 12.2. Show that any single qubit density operator ρ may be written ρ = (1/2)(I + rX X + rY Y + rZ Z) for Bloch vector r = (rX, rY, rZ) ∈ R³ satisfying ∥r∥₂ ≤ 1. What Bloch vector does the maximally mixed state have? How about a pure state ρ = |ψ⟩⟨ψ|?
Thus, any ρ ∈ D(C²) can alternatively be expressed via real vector r ∈ R³. Does this help? Unfortunately not, as we now explore.
Attempt 2: Casting “shadows²” of Pauli basis elements. Attempt 1 used many bits (O(4^n) of them, to be precise) to describe one state. Attempt 2 asks the opposite: Can we use few bits to describe many states? Consider, for example, our operator basis element Z ⊗ Z ⊗ Z ∈ Herm(C⁸). Henceforth, we omit the tensor product and write ZZZ for short.
Exercise 12.6. Prove that ZZZ has precisely two eigenspaces, E1 and E−1 , corresponding to
eigenvalues 1 and −1, respectively, each of dimension 4.
¹ This is actually a sleight of hand — by appropriately “reshaping” matrices A and B into vectors vA and vB (roughly, vA is the rows of A concatenated into one big vector, likewise for B), the Hilbert-Schmidt inner product of A and B is actually just the usual inner product of vectors vA and vB.
² In recent years, the concepts of “shadow tomography” and “classical shadows” have been studied in the quantum computing literature. These are distinct from our use of the term “shadow” here.
In words, we may view ZZZ as “succinctly representing” the 4-dimensional spaces E1 and E−1, of which we henceforth focus on E1. More generally, Z^{⊗n} also has precisely two eigenspaces E1 and E−1, each of dimension 2^{n−1}. Thus, by writing down just O(n) bits (encoding Z^{⊗n} requires specifying the Pauli operator at each of the n sites), we can succinctly represent a 2^{n−1}-dimensional space, E1! This certainly seems interesting, but unfortunately the “shadow” cast by Z^{⊗n} on C^{2^n} is “too wide” — we capture too many states (i.e. all of E1). Can we bring the size of this “shadow” under control?
Exercise 12.7. Confirm that ZZZ|000⟩ = |000⟩ and ZZZ|101⟩ = |101⟩, and thus |000⟩, |101⟩ ∈
E1 . What other two standard basis states are in E1 ? What is E1 more generally, expressed as
the span of standard basis states, for arbitrary n?
Attempt 3: Cutting down shadows with set intersections. The natural approach for cutting
down the size of a set E1 is to intersect it with another set, E1′ . So consider XX and ZZ.
Similar to Exercise 12.7,
E1(XX) = Span(H1 ⊗ H2|x⟩ : x ∈ {0, 1}² has even Hamming weight) = Span(|++⟩, |−−⟩),
E1(ZZ) = Span(|x⟩ : x ∈ {0, 1}² has even Hamming weight) = Span(|00⟩, |11⟩).
Thus, by carefully taking the intersection of two 2-dimensional spaces, E1(XX) and E1(ZZ), we obtain a 1-dimensional space. In this particular example, the Bell state |ϕ⁺⟩ = (|00⟩ + |11⟩)/√2 is the unique state (up to scaling) in the joint 1-eigenspace of XX and ZZ.
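This intersection is easy to verify numerically (anticipating the projectors ΠXX and ΠZZ defined in Equation (12.7) below):

    import numpy as np

    X = np.array([[0, 1], [1, 0]])
    Z = np.array([[1, 0], [0, -1]])
    XX, ZZ = np.kron(X, X), np.kron(Z, Z)

    # The +1-eigenspace projectors commute, so their product projects onto E1(S).
    P = ((np.eye(4) + XX) / 2) @ ((np.eye(4) + ZZ) / 2)
    print(np.linalg.matrix_rank(P))                  # 1: a one-dimensional space

    phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)   # |phi+> = (|00> + |11>)/sqrt(2)
    print(np.allclose(P @ phi_plus, phi_plus))       # True: |phi+> spans E1(S)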
But hold on, you ask — what black magic is this? How can we hope to characterize the
types of 1-eigenspaces whose intersection gives rise to an interesting joint 1-eigenspace? We
will get to the answer shortly, but for now, note that it is crucial that XX and ZZ commute
(i.e. [XX, ZZ] = 0), and thus simultaneously diagonalize (i.e. there exists a basis with respect
to which both XX and ZZ are diagonal). The following exercise explicitly reveals this.
Exercise 12.9. Recall the Bell basis {|ϕ⁺⟩, |ϕ⁻⟩, |ψ⁺⟩, |ψ⁻⟩}. Show that each Bell basis vector is an eigenvector of both XX and ZZ, and compute the corresponding eigenvalues. Conclude that XX and ZZ indeed simultaneously diagonalize, and that |ϕ⁺⟩ is their unique joint 1-eigenvector.
Theorem 12.10 (Gottesman-Knill Theorem). Any quantum circuit consisting solely of the
following ingredients can be efficiently simulated classically via the stabilizer formalism:
• The circuit takes as input the all-zeroes state |0^n⟩.
• The following gates are allowed: H, CNOT, X, Y, Z, and the phase gate

S = [1 0; 0 i].    (12.3)
• The final state is measured in the standard basis, i.e. via observable Z ⊗n .
We have intentionally stated the theorem in a simpler form; from this, slightly more general
statements can be extracted, as demonstrated by the following exercises (neither of which require
any knowledge of how the theorem is proven!).
Exercise 12.11. Show that Theorem 12.10 holds even if we start with an arbitrary string x ∈ {0, 1}^n, or measure an arbitrary Pauli observable ⊗_{i=1}^n σi for σi ∈ {I, X, Y, Z} at the end.
Exercise 12.12. Show that Theorem 12.10 holds even if we allow intermediate measurements
in the standard basis, where all measured qubits are subsequently used as classical controls.
(Hint: Use the principle of deferred measurement.)
The Pauli group. The name of the stabilizer game is to consider joint 1-eigenspaces of Pauli
strings. For this, we recall some basic properties: For any σi ̸= σj ∈ {X, Y, Z}, we have σi2 = I,
σi σj = −σj σi (i.e. Pauli operators anti-commute), and σi σj ∝ σk for some σk ∈ {X, Y, Z}
(where the proportionality constant is from set {±1, ±i}).
Exercise 12.14. Show that, up to multiplicative factor in {±1, ±i}, the product of any two
n-bit Pauli strings is another n-bit Pauli string.
Exercise 12.15. Show that, for any Pauli string g, there exists a Pauli string g ′ such that
gg ′ = I (where I is the Pauli string of identity operators).
Thus, the set of all Pauli strings up to scalars {±1, ±i}, i.e.

Gn := { c · σ1 ⊗ ··· ⊗ σn | σi ∈ {I, X, Y, Z} and c ∈ {±1, ±i} },    (12.5)
forms a group under multiplication. For any subset S ⊆ Gn , let ⟨S⟩ ⊆ Gn denote the subgroup
generated by S (i.e. any element of Gn which may be obtained by taking products of elements
in S).
Joint 1-eigenspaces of Pauli strings. As in Section 12.1.1, we now wish to choose a set of
generators S ⊆ Gn , so that the joint 1-eigenspace of all Pauli strings in S is non-empty, i.e.
E1 (S) ̸= ∅ (equivalently, dim(E1 (S)) ̸= 0).
Exercise 12.16. Fix any S ⊆ Gn . Show that for any |ψ⟩, |ψ⟩ ∈ E1 (S) if and only if
|ψ⟩ ∈ E1 (⟨S⟩).
Thus, as far as E1 is concerned, we may interchangeably talk about a generating set S and the
subgroup ⟨S⟩ it generates. The set S is then called the stabilizer of E1 (S).
Of course, since E1 (S) is implicitly an intersection of sets, the larger S gets, the smaller
E1 (S) gets. Thus, we may find ourselves in the predicament that if S is not chosen carefully,
E1 (S) = ∅, as you now show.
Exercise 12.18. Show that S = {XX, Y Y, ZZ} has E1 (S) = ∅. Contrast this with the case of
S = {XX, ZZ} from Section 12.1.1.
How to pick the generating set S. Luckily, there is a clean characterization of which S indeed
produce a non-empty joint 1-eigenspace E1 (S).
Theorem 12.19 (Non-trivial generating set). Let S ⊆ Gn. Then, E1(S) ̸= ∅ if and only if

1. all elements of S pairwise commute, and

2. −I ̸∈ ⟨S⟩.
That these are necessary criteria is not difficult to see: We already saw Condition 2’s necessity
in Exercise 12.17, and now we confirm the same for Condition 1 via the following exercises.
Exercise 12.20. Fix any S ⊆ Gn and any A, B ∈ S. Show that if [A, B] ̸= 0, then AB = −BA.
In other words, all elements of S either pairwise commute or anti-commute.
Exercise 12.21. Fix any S ⊆ Gn and any A, B ∈ S, such that [A, B] ̸= 0, but that there exists
|ψ⟩ ∈ E1 (S). Show that |ψ⟩ = −|ψ⟩, i.e. |ψ⟩ must be the zero vector.
What remains is to show that the criteria in Theorem 12.19 are also sufficient to ensure
E1 (S) ̸= ∅. Interestingly, one can show a stronger, quantitative version of this statement.
Lemma 12.22 (Stabilizer dimension). For any 1 ≤ k ≤ n, let S = {g1, . . . , gk} ⊆ Gn be an independent³ set of generators satisfying the conditions of Theorem 12.19. Then,

dim(E1(S)) = 2^{n−k}.    (12.6)
In other words, not only is it true that increasing S by 1 element shrinks E1 (S), but each added
generator cuts the dimension of the encoded space down by a multiplicative factor of 2. Let us
now sketch the proof of this lemma.
Proof sketch (Lemma 12.22). The basic approach is rather slick, and best visualized via pizza — if we partition a pizza into 2^k equal slices, one of which represents E1(S), then our “E1(S) pizza slice” must span a 1/2^k-fraction of the pizza. If we now replace “pizza” with C^{2^n} and “E1(S) pizza slice” with E1(S), we obtain dim(E1(S)) = 2^n/2^k = 2^{n−k}, as claimed.
So how do we partition our Hilbert space pizza into equal-sized spaces? Consider again our
example of stabilizer S = {XX, ZZ}. To begin, let’s write down the projector onto our space
of interest, E1 (S). For this, define
Π±XX := (I ± XX)/2  and  Π±ZZ := (I ± ZZ)/2.    (12.7)
Exercise 12.23. Show that ΠXX and ΠZZ project onto E1 (XX) and E1 (ZZ), respectively.
The naive way to obtain then the projector onto E1 (S) is to consider ΠXX ΠZZ . However, given
two projectors P1 and P2 onto spaces S1 and S2 , it is not in general true that the projector
onto S1 ∩ S2 is P1 P2 . Luckily, here we can leverage the requirement in Lemma 12.22 that all
generators commute, i.e. [P1 , P2 ] = 0. In the commuting case, it is true that P1 P2 projects onto
S1 ∩ S2 . We conclude that since [ΠXX , ΠZZ ] = 0, ΠXX ΠZZ projects onto E1 (S).
With the projector onto E1(S) in hand, it remains to devise projectors onto the remaining 2^k − 1 = 3 (recall k = 2 for S = {XX, ZZ}) slices of the pizza we claimed existed.
Exercise 12.24. Show that ΠXX ΠZZ , ΠXX Π−ZZ , Π−XX ΠZZ , and Π−XX Π−ZZ project onto
orthogonal subspaces.
Exercise 12.25. Show that ΠXX ΠZZ + ΠXX Π−ZZ + Π−XX ΠZZ + Π−XX Π−ZZ = I.
We conclude that ΠXX ΠZZ , ΠXX Π−ZZ , Π−XX ΠZZ , and Π−XX Π−ZZ indeed partition C4 into
4 orthogonal spaces. This part of the proof generalizes easily to all n and k. The last piece of
the puzzle is to show that all these orthogonal spaces have equal dimension, i.e. are pizza slices
of equal size.
Exercise 12.26. Show that conjugation by unitary preserves rank, i.e. for any unitary U and
normal operator A, rank(U AU † ) = rank(A). Does the same result hold when A is not normal,
e.g. when A is not square?
Exercise 12.27. Define U1 = X ⊗ I ∈ U C4 . Show that U1 ΠXX ΠZZ U1† = ΠXX Π−ZZ .
Conclude that ΠXX ΠZZ and ΠXX Π−ZZ project onto spaces of the same dimension.
³ By independent, we mean no generator gi can be produced as a product of the remaining gj (and their inverses), i.e. for all i, gi ̸∈ ⟨{g1, . . . , gi−1, gi+1, . . . , gk}⟩.
Exercise 12.28. Find the remaining unitary operators U2, U3 ∈ U(C⁴) such that U2(ΠXXΠZZ)U2† = Π−XXΠZZ and U3(ΠXXΠZZ)U3† = Π−XXΠ−ZZ.
Combining these exercises, we have that ΠXX ΠZZ , ΠXX Π−ZZ , Π−XX ΠZZ , and Π−XX Π−ZZ
indeed partition C4 into equal-dimensional orthogonal subspaces. We conclude that ΠXX ΠZZ
projects onto a 1-dimensional space, as claimed.
Exercise 12.29. Compare Lemma 12.22 with Exercise 12.8. Conclude that the Bell state |ϕ+ ⟩
is the unique state stabilized by S = {XX, ZZ}. Can you give the stabilizers for the remaining
three Bell states?
Exercise 12.30. Use Lemma 12.22 as an alternate way of showing Exercise 12.18. (Use the fact that XX, Y Y, ZZ are independent generators.)
This completes our proof sketch. To formalize this for arbitrary n and k, the main hurdle is
systematically finding unitaries analogous to U1 , U2 , U3 (as in Exercise 12.27), given an arbitrary
set of generators satisfying the conditions of Lemma 12.22. This can be accomplished without
too much work; however, it goes via the additional formalism of the check matrix, which we
omit for reasons of brevity.
Exercise 12.31. Give a stabilizer S such that E1 (S) = {|0⟩⊗n }. According to Lemma 12.22,
how many generators must S have? (Hint: Do the exercise for the n = 1 case first.)
Gate simulation. Normally, for a state |ψ⟩, we would apply a unitary U directly via U|ψ⟩. However, now we are representing |ψ⟩ implicitly via a stabilizer S. How does the action of U on |ψ⟩ translate to S? The answer is actually simple: For any g ∈ S,

U|ψ⟩ = Ug|ψ⟩ = (UgU†)U|ψ⟩,    (12.10)

where the first equality holds since |ψ⟩ ∈ E1(S). Thus, U|ψ⟩ is stabilized by

USU† = { UgU† | g ∈ S }.    (12.11)
Exercise 12.32. Set g = X ⊗n ∈ Gn . Show that H ⊗n g(H † )⊗n ∈ Gn . Generalize your proof to
arbitrary g ∈ Gn .
Formally, the set of unitaries U mapping Pauli strings to Pauli strings, i.e. which satisfy UgU† ∈ Gn for all g ∈ Gn, is called the Clifford group. Note that to generate the Clifford group, it actually suffices to use just H, CNOT, and S (i.e. the Pauli operators listed in Theorem 12.10 are not necessary; they are stated for convenience).
In order to now efficiently simulate gates on our stabilized state, it suffices to explicitly write down how the Clifford group generators {H, S, CNOT} act on Pauli strings:

HXH = Z    (12.12)
HZH = X    (12.13)
SXS† = Y    (12.14)
SZS† = Z    (12.15)
CNOT(X1 ⊗ I2)CNOT = X1 ⊗ X2    (12.16)
CNOT(I1 ⊗ X2)CNOT = I1 ⊗ X2    (12.17)
CNOT(Z1 ⊗ I2)CNOT = Z1 ⊗ I2    (12.18)
CNOT(I1 ⊗ Z2)CNOT = Z1 ⊗ Z2,    (12.19)
where we assume the CNOT acts on qubit 1 as control and qubit 2 as target. For example, the Pauli string XZXZ is mapped under CNOT12 to

CNOT12(X1Z2X3Z4)CNOT12 = (CNOT12(X1I2I3I4)CNOT12)(CNOT12(I1Z2X3Z4)CNOT12)
                       = (X1X2I3I4)(Z1Z2X3Z4)
                       = −Y1Y2X3Z4,

where the last equality follows since XZ = −iY.
Exercise 12.33. Equations (12.12) through (12.19) do not explicitly describe the action of H, S, or CNOT on Pauli Y. How can we recover the action on Y immediately using just the action on X and Z? Use your insight to compute CNOT(I1 ⊗ Y2)CNOT.
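These conjugation rules are precisely what a classical simulator maintains. Below is a minimal sketch in Python (our own encoding, using the standard symplectic representation in which each qubit carries bits (x, z) with X = (1,0), Z = (0,1), Y = (1,1), plus a global sign bit; the update rules reproduce Equations (12.12)–(12.19), with Y handled automatically as in Exercise 12.33):

    import numpy as np

    class PauliString:
        """A signed Pauli string, stored as a sign bit plus per-qubit (x, z) bits."""
        def __init__(self, label):                   # e.g. "XZXZ"
            enc = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
            self.x = np.array([enc[p][0] for p in label], dtype=np.uint8)
            self.z = np.array([enc[p][1] for p in label], dtype=np.uint8)
            self.sign = 0                            # 0 encodes +1, 1 encodes -1

        def h(self, q):                              # conjugate by H on qubit q
            self.sign ^= int(self.x[q] & self.z[q])  # Y picks up a minus sign
            self.x[q], self.z[q] = self.z[q], self.x[q]

        def s(self, q):                              # conjugate by phase gate S on qubit q
            self.sign ^= int(self.x[q] & self.z[q])
            self.z[q] ^= self.x[q]

        def cnot(self, c, t):                        # conjugate by CNOT (control c, target t)
            self.sign ^= int(self.x[c] & self.z[t] & (self.x[t] ^ self.z[c] ^ 1))
            self.x[t] ^= self.x[c]
            self.z[c] ^= self.z[t]

        def __str__(self):
            dec = {(0, 0): "I", (1, 0): "X", (1, 1): "Y", (0, 1): "Z"}
            body = "".join(dec[(int(a), int(b))] for a, b in zip(self.x, self.z))
            return ("-" if self.sign else "+") + body

    g = PauliString("XZXZ")
    g.cnot(0, 1)
    print(g)                                         # -YYXZ, matching the example above

Since each of the n generators is updated in O(1) time at the (one or two) qubits a gate touches, a Clifford circuit costs O(n) classical time per gate in this representation — the heart of the Gottesman-Knill theorem.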
Measurement simulation. The last tool we need for Theorem 12.10 is measurement in the stan-
dard basis. For this, suppose we have an n-qubit system stabilized by n generators ⟨g1 , . . . , gn ⟩,
so that the encoded state |ψ⟩ is unique (up to phase) by Lemma 12.22. We wish to measure a
subset of the qubits in the standard (i.e. Z) basis; without loss of generality, assume these are
the first k qubits, so that the relevant observable is Z^{⊗k} ⊗ I^{⊗(n−k)}. Recall the projectors onto the +1 and −1 eigenspaces of Z := Z^{⊗k} are (respectively)

Π+ := (I + Z)/2  and  Π− := (I − Z)/2,    (12.20)
where Π+ (respectively, Π− ) projects onto the span of all even (respectively, odd) Hamming
weight k-bit strings. We may now fully characterize the measurement probabilities of Π+ and
Π− by, again, looking solely at the generators ⟨g1, . . . , gn⟩. There are two cases to consider:
• (Case 1: For all i ∈ [n], [Z, gi ] = 0.) In this case, either Z ∈ ⟨g1 , . . . , gn ⟩ or −Z ∈
⟨g1 , . . . , gn ⟩, as you now show.
Exercise 12.34. Show that Z|ψ⟩ is stabilized by ⟨g1 , . . . , gn ⟩. Since the stabilized sub-
space is dimension 1, conclude that Z|ψ⟩ ∝ |ψ⟩ (Aside: Can Z|ψ⟩ = 0?)
Exercise 12.35. Why can we now conclude that Z|ψ⟩ = ±|ψ⟩, and why does this yield
the desired claim?
• (Case 2: For some i ∈ [n], [Z, gi ] ̸= 0.) In this case, you will show that ⟨ψ|Π+ |ψ⟩ =
⟨ψ|Π− |ψ⟩ = 1/2, i.e. both outcomes ±1 occur with equal probability.
Exercise 12.36. Suppose [Z, gi] ̸= 0. Then, by Exercise 12.20, {Z, gi} = 0, i.e. Z and gi anti-commute. Using this, show that

⟨ψ|Π+|ψ⟩ = ⟨ψ| (I + Z)/2 |ψ⟩ = ⟨ψ| (I − Z)/2 |ψ⟩ = ⟨ψ|Π−|ψ⟩.    (12.21)

(Hint: Use the fact all gi are Hermitian. Why is this a necessary condition for ⟨g1, . . . , gn⟩ to be non-trivial?)
Post-measurement states. It is also easy to update our stabilizer to reflect the post-measurement
state. Namely, in Case 1, precisely one of Z or −Z is in ⟨g1 , . . . , gn ⟩ already, so no update to the
stabilizer is necessary. (Intuitively, in Case 1, the measurement outcome occurs with probability
1, meaning |ψ⟩ is not disturbed upon measurement. Thus, its stabilizer need not change.) As for
Case 2, we claim that if we assume that [Z, gj ] = 0 for all j ̸= i, then whenever we get outcome
+1 (respectively, −1), the postmeasurement state is stabilized by g1 , . . . gi−1 , Z, gi+1 , . . . , gn
(respectively, g1 , . . . gi−1 , −Z, gi+1 , . . . , gn ).
Exercise 12.37. Show the claim above. Where do we use the fact that [Z, gj ] = 0 for all j ̸= i?
Exercise 12.38. Why can we assume without loss of generality that [Z, gj] = 0 for all j ̸= i? (Hint: Suppose {gi, Z} = {gj, Z} = 0 for i ̸= j. What can we say about [gigj, Z]?)
13 Independent reading - Quantum money
13.1 Quantum money
And now, we have arrived at the end of this course. As the topic of the final lecture, I felt it would
be fitting to cover the first known suggested use of quantum information — quantum money.
Indeed, quantum information processing was not first envisioned to solve fancy problems such
as factoring or simulation of physical systems. Rather, in the early 1970’s, Stephen Wiesner
had the idea of using four single-qubit quantum states, |0⟩, |1⟩, |+⟩, |−⟩, to implement an
unforgeable notion of money. As the story goes (more accurately, as Gilles Brassard told the
story at QIP 2010 in Zurich, and as far as my memory recalls), Wiesner actually proposed
his idea as part of a university assignment or project of some sort. His professor did not
understand it, and thus it was largely cast aside, only to be published about a decade later
(1983, to be precise [Wie83]). In the meantime, however, Wiesner’s idea of conjugate coding has
triumphantly stood the test of time. For example, by happy coincidence, Wiesner happened to
share his ideas with Charlie Bennett and Gilles Brassard, who went on to propose their now-
famous BB84 protocol in 1984 [BB84] for quantum key distribution (also based on conjugate
coding).
Our focus. There is more to quantum money than we will be able to cover in this lecture, as with most topics in this course. Here is what we will focus on. Wiesner's original scheme turns
out to indeed be secure, meaning: Given a quantum banknote |ψ⟩ consisting of n conjugate
coding states, the probability of creating a duplicate copy of |ψ⟩ is exponentially small in n, i.e.
precisely (3/4)n . (Remarkably, it was not until 2012 that this was explicitly shown rigorously by
Molina, Vidick, and Watrous [MVW13] using semidefinite programming.) Note this security is
information-theoretic, meaning no computational assumptions are required, other than having
the adversary obey the laws of quantum mechanics.
However, security is a fickle thing, and it turns out that if we change the rules of engagement
ever so slightly, Wiesner's scheme is no longer secure. In particular, as with current-day banknotes, a bank may wish to verify that a claimed quantum banknote |ψ̃⟩ is genuine. It turns
out that, if the bank does what any normal person would, namely return a banknote which
passes the authenticity test and confiscate a banknote which fails the authenticity test, then
Wiesner’s scheme can be broken. (Here, the key point is we are now allowing multiple rounds
of interaction with an entity, the bank, which performs as its authenticity test the projective
measurement M = {|ψ⟩⟨ψ|, I − |ψ⟩⟨ψ|}, for |ψ⟩ the genuine banknote in question.)
The required reading for this lecture is the adaptive attack of Nagaj, Sattath, Brodutch, and Unruh [NSBU16] on Wiesner's scheme; in particular, you are required to read sections 1 to 3 inclusive of this paper.
Motivation. There are a few reasons for the setup of this final lecture. First, the study
of quantum money highlights how careful one need be in formally modelling security in a
cryptographic context. Second, Reference [NSBU16] will teach you a neat new trick: Applying
the quantum Zeno effect via the Elitzur-Vaidman bomb tester, which is worth having in your
toolkit. Finally, the aim of this course is to get you to become independent learners in the field,
and so leaving you with an independent reading lecture is arguably quite appropriate. With
this in mind, we thus say: Goodbye, farewell, and leave you with some suggestions for further reading:
• Griffiths [Gri04]: For those interested in looking beyond our “quantum mechanics sand-
box” to what undergraduate quantum mechanics is “really about”.
• Nielsen and Chuang [NC00]: The course reference textbook, with much material we did
not manage to cover.
Bibliography
[AA11] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics.
In Forty-third Annual ACM Symposium on Theory of Computing, STOC ’11, pages
333–342, 2011.
[AR20] Scott Aaronson and Patrick Rall. Quantum Approximate Counting, Simplified,
pages 24–32. 2020.
[BB84] C. H. Bennett and G. Brassard. Quantum cryptography: Public key distribution
and coin tossing. In IEEE International Conference on Computers, Systems, and
Signal Processing, volume 175, page 8, 1984.
[Bri19] Karl Bringmann. Fine-Grained Complexity Theory (Tutorial). In Rolf Niedermeier
and Christophe Paul, editors, 36th International Symposium on Theoretical Aspects
of Computer Science (STACS 2019), volume 126 of Leibniz International Proceed-
ings in Informatics (LIPIcs), pages 4:1–4:7, Dagstuhl, Germany, 2019. Schloss
Dagstuhl–Leibniz-Zentrum fuer Informatik.
[Gri04] D. J. Griffiths. Introduction to Quantum Mechanics (2nd Edition). Pearson Pren-
tice Hall, 2004.
[KLLNP16] M. Kaplan, Gaëtan L., A. Leverrier, and M. Naya-Plascencia. Breaking sym-
metric cryptosystems using quantum period finding. In Advances in Cryptology -
CRYPTO 2016., volume 9815 of Lecture Notes in Computer Science, 2016.
[KM10] H. Kuwakado and M. Morii. Quantum distinguisher between the 3-round feistel
cipher and the random permutation. In 2010 IEEE International Symposium on
Information Theory, pages 2682–2685, June 2010.
[Kul02] S. R. Kulkarni. Chapter 4: Frequency domain and Fourier transforms. https:
//www.princeton.edu/~cuff/ele201/kulkarni_text/frequency.pdf, 2002.
[Mil76] Gary L. Miller. Riemann’s hypothesis and tests for primality. Journal of Computer
and System Sciences, 13(3):300 – 317, 1976.
[MVW13] Abel Molina, Thomas Vidick, and John Watrous. Optimal counterfeiting attacks
and generalizations for Wiesner’s quantum money. In Kazuo Iwama, Yasuhito
Kawano, and Mio Murao, editors, Theory of Quantum Computation, Communica-
tion, and Cryptography, volume 7582 of Lecture Notes in Computer Science, pages
45–64. Springer Berlin Heidelberg, 2013.
[NC00] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information.
Cambridge University Press, 2000.
[NSBU16] D. Nagaj, O. Sattath, A. Brodutch, and D. Unruh. An adaptive attack on Wiesner’s
quantum money. Quantum Information & Computation, 16(11&12):1048–1070,
2016.
[SS17] T. Santoli and C. Schaffner. Using Simon’s algorithm to attack symmetric-key
cryptographic primitives. Quantum Information & Computation, 17(1&2):65–78,
2017.
[Wie83] S. Wiesner. Conjugate coding. SIGACT News, 15(1):78–88, 1983.