Graph Theory and Additive Combinatorics
Yufei Zhao
Contents
Preface
Notation and Conventions
1 Forbidding a Subgraph
1.1 Forbidding a Triangle: Mantel’s Theorem
1.2 Forbidding a Clique: Turán’s Theorem
1.3 Turán Density and Supersaturation
1.4 Forbidding a Complete Bipartite Graph: Kővári–Sós–Turán Theorem
1.5 Forbidding a General Subgraph: Erdős–Stone–Simonovits Theorem
1.6 Forbidding a Cycle
1.7 Forbidding a Sparse Bipartite Graph: Dependent Random Choice
1.8 Lower Bound Constructions: Overview
1.9 Randomized Constructions
1.10 Algebraic Constructions
1.11 Randomized Algebraic Constructions
3 Pseudorandom Graphs
3.1 Quasirandom Graphs
3.2 Expander Mixing Lemma
References
Index
Preface
Lecture videos
A complete set of video lectures from my Fall 2019 class is available for free through MIT
OpenCourseWare and YouTube (search for Graph Theory and Additive Combinatorics and
MIT OCW). The lecture videos are a useful resource and complement this book.
A core thread throughout the book is the connection bridging graph theory and additive
combinatorics. The book opens with Schur’s theorem, which is an early example whose proof
illustrates this connection. Graph theoretic perspectives are presented throughout the book.
Here are some of the topics and questions considered in this book:
Chapter 1: Forbidding a subgraph
What is the maximum number of edges in a triangle-free graph on 𝑛 vertices? What if we instead forbid some other subgraph? This is known as the Turán problem.
Chapter 2: Graph regularity method
Szemerédi introduced this powerful tool that provides an approximate structural description for every large graph.
Chapter 3: Pseudorandom graphs
What does it mean for some graph to resemble a random graph?
Chapter 4: Graph limits
In what sense can a sequence of graphs, increasing in size, converge to some limit object?
Chapter 5: Graph homomorphism inequalities
What are possible relationships between subgraph densities?
Chapter 6: Forbidding a 3-term arithmetic progression
Roth’s theorem and Fourier analysis in additive combinatorics.
Chapter 7: Structure of set addition
What can one say about a set of integers 𝐴 with small sumset 𝐴 + 𝐴 = {𝑎 + 𝑏 : 𝑎, 𝑏 ∈ 𝐴}?
Freiman’s theorem is a foundational result that gives an answer.
Chapter 8: Sum-product problem
Can a set 𝐴 simultaneously have both a small sumset 𝐴 + 𝐴 and a small product set 𝐴 · 𝐴?
Chapter 9: Progressions in sparse pseudorandom sets
Key ideas in the proof of the Green–Tao theorem. How can we apply a dense setting result,
namely Szemerédi’s theorem, to a sparse set?
For a more detailed list of topics, see the highlights and summary boxes at the beginning
and the end of each chapter.
The book is roughly divided into two parts, with graph theory the focus of Chapters 1 to 5
and additive combinatorics the focus of Chapters 6 to 9. These are not disjoint and separate
subjects. Rather, graph theory and additive combinatorics are interleaved throughout the
book. We emphasize their interactions. Each chapter can be enjoyed independently as there
are very few dependencies between chapters, though one gets the most out of the book by
appreciating the connections.
When short on time, readers and instructors may skip the more technical or advanced topics and proofs, including: (Chapter 1) the proofs of the
Erdős–Stone–Simonovits theorem, the 𝐾𝑠,𝑡 construction, randomized algebraic construction;
(Chapter 2) the proof of the graph counting lemma, induced graph removal and strong
regularity, hypergraph regularity and removal; (Chapter 3) quasirandom groups, quasirandom
Cayley graphs; (Chapter 4) most technical proofs on graph limits; (Chapter 5) Hölder, entropy;
(Chapter 6) arithmetic regularity and popular common difference; (Chapter 7) proofs in the
later part of the chapter if short on time; (Chapter 9) proof details.
For a class focused on one part of the book, one may wish to explore further topics as
suggested in Further Reading at the end of each chapter.
Prerequisites
The prerequisites are minimal—primarily mathematical maturity and an interest in combina-
torics. Some basic concepts from abstract algebra, analysis, and linear algebra are assumed.
Exercises
The book contains around 150 carefully selected exercises. They are scattered throughout
each chapter. Some exercises are embedded in the middle of a section—these exercises are
meant as routine tests of understanding of the concepts just discussed. For example, they
sometimes ask you to fill in missing proof details or think about easy generalizations and
extensions. The exercises at the end of each section are carefully selected problems that
reinforce the techniques discussed in the chapter. Hopefully they are all interesting. Most
of them are intended to test your mastery of the techniques taught in the chapter. Many of
these end-of-chapter exercises are quite challenging, with starred problems intended to be
more difficult but still do-able by a strong student given the techniques taught. Many of these
exercises are adapted from lemmas and results from research papers (I apologize for omitting
references for the exercises, so that they can be used as homework assignments).
Spending time with the exercises is essential for mastering the techniques. I used many
of these exercises in my classes. My students often told me that they thought that they
had understood the material after a lecture, only to discover their incomplete mastery when
confronted with the exercises. Struggling with these exercises led them to newfound insight.
Further reading
This is a massive and rapidly expanding subject. The book is intended to be introductory
and enticing rather than comprehensive. Each chapter concludes with recommendations for
further reading for anyone who wishes to learn more. Additionally, references are given
generously throughout the text for anyone who wishes to dive deeper and read the original
sources.
Acknowledgements
I thank all my teachers and mentors who have taught me the subject starting from when
I was a graduate student, with a special shoutout to my PhD advisor Jacob Fox for his
dedicated mentorship. I first encountered this subject at the University of Cambridge, when
I took a Part III class on extremal graph theory taught by David Conlon. Over the years,
I learned a lot from various researchers thanks to their carefully and insightfully written
lecture notes scattered on the web, in particular by David Conlon, Tim Gowers, Andrew
Granville, Ben Green, Choongbum Lee, László Lovász, Imre Ruzsa, Asaf Shapira, Adam
Sheffer, K. Soundararajan, Terry Tao, and Jacques Verstraete.
This book arose from a one-semester course that I taught at MIT in Fall 2017, 2019,
and 2021. I thank all my amazing and dedicated students who kept their interest in my
teaching — they were instrumental in motivating me to complete this book project. Students
from the 2017 and 2019 classes took notes based on my lectures, which I subsequently
rewrote and revised into this book. My 2021 class used an early draft of this book and gave
valuable comments and feedback. There are many students whom I wish to thank, and here
is my attempt at listing them (my apologies to anyone whose name I inadvertently omitted):
Dhroova Aiylam, Ganesh Ajjanagadde, Shyan Akmal, Ryan Alweiss, Morris Ang Jie Jun,
Adam Ardeishar, Matt Babbitt, Yonah Borns-Weil, Matthew Brennan, Brynmor Chapman,
Evan Chen, Byron Chin, Ahmed Chowdhury Zawad, Anlong Chua, Travis Dillon, Jonathan
Figueroa Rodriguez, Christian Gaetz, Shengwen Gan, Jiyang Gao, Yibo Gao, Swapnil Garg,
Benjamin Gunby, Meghal Gupta, Kaarel Haenni, Milan Haiman, Linus Hamilton, Carina
Hong Letong, Vishesh Jain, Pakawut Jiradilok, Sujay Kazi, Younhun Kim, Elena Kim,
Dain Kim, Yael Kirkpatrick, Daishi Kiyohara, Frederic Koehler, Keiran Lewellen, Anqi Li,
Jerry Li, Allen Liu, Michael Ma, Nitya Mani, Olga Medrano, Holden Mui, Eshaan Nichani,
Yuchong Pan, Minjae Park, Alan Peng, Saranesh Prembabu, Michael Ren, Dhruv Rohatgi,
Diego Roque, Ashwin Sah, Maya Sankar, Mehtaab Sawhney, Carl Schildkraut, Tristan Shin,
Mihir Singhal, Tomasz Slusarczyk, Albert Soh, Kevin Sun, Sarah Tammen, Jonathan Tidor,
Paxton Turner, Danielle Wang, Hong Wang, Nicole Wein, Jake Wellens, Chris Xu, Max
Wenqiang Xu, Yinzhan Xu, Zixuan Xu, Lisa Yang, Yuan Yao, Richard Yi, Hung-Hsun Yu,
Lingxian Zhang, Kai Zheng, Yunkun Zhou. Additionally, I would like to thank Thomas
Bloom and Zilin Jiang for carefully reading the book draft and sending in many suggestions
for corrections and improvements.
The title page illustration (with the bridge) was drawn by my friend Anne Ma.
I also wish to acknowledge research funding support during the writing of this book,
including from the National Science Foundation, the Sloan Research Foundation, as well
as support from MIT including the Solomon Buchsbaum Research Fund, the Class of 1956
Career Development Professorship, and the Edmund F. Kelly Research Award.
Finally, I am grateful to all my students, colleagues, friends, and family for their encour-
agement throughout the writing of the book, and most importantly to Lu for her unwavering
support through the whole process, especially in the late stages of the book writing, which
coincided with the arrival of our baby daughter Andi.
Yufei Zhao
Cambridge, MA
February 2022
http://yufeizhao.com/
yufeiz@mit.edu
Notation and Conventions
We use standard notation in this book. The comments here are mostly for clarification. You
should skip this section and return to it only as needed.
Sets
We write [𝑵] := {1, 2, . . . , 𝑁 }. Also N := {1, 2, . . . }.
Given a finite set 𝑆 and a positive integer 𝑟, we write \binom{𝑺}{𝒓} for the set of 𝑟-element subsets of 𝑆.
If 𝑆 is a finite set and 𝑓 is a function on 𝑆, we use the expectation notation E_{𝒙∈𝑺} 𝒇(𝒙), or more simply E_𝒙 𝒇(𝒙) (or even E 𝒇 if there is no confusion), to mean the average |𝑆|^{−1} \sum_{𝑥∈𝑆} 𝑓(𝑥).
We also use the symbol E for its usual meaning as the expectation for some random variable.
Graphs
We write a graph as 𝑮 = (𝑽, 𝑬), where 𝑉 is a finite set of vertices, and 𝐸 is the set of edges. Each edge is an unordered pair of distinct vertices. Formally, 𝐸 ⊆ \binom{𝑉}{2}.
Given a graph 𝐺, we write 𝑽 (𝑮) for the set of vertices, and 𝑬 (𝑮) for the set of edges,
and denote their cardinalities by 𝒗(𝑮) := |𝑉 (𝐺)| and 𝒆(𝑮) := |𝐸 (𝐺)|.
In a graph 𝐺, the neighborhood of a vertex 𝑥, denoted 𝑵𝑮 (𝒙) (or simply 𝑁 (𝑥) if there is
no confusion), is the set of vertices 𝑦 such that 𝑥𝑦 is an edge. The degree of 𝑥 is the number
of neighbors of 𝑥, denoted deg𝑮 (𝒙) := |𝑁𝐺 (𝑥)| (or simply written as deg(𝒙)).
Given a graph 𝐺, for each 𝐴 ⊆ 𝑉 (𝐺), we write 𝒆(𝑨) to denote the number of edges with
both endpoints in 𝐴. Given 𝐴, 𝐵 ⊆ 𝑉 (𝐺) (not necessarily disjoint), we write
𝒆(𝑨, 𝑩) := |{(𝑎, 𝑏) ∈ 𝐴 × 𝐵 : 𝑎𝑏 ∈ 𝐸 (𝐺)}| .
Note that when 𝐴 and 𝐵 are disjoint, 𝑒( 𝐴, 𝐵) is the number of the edges between 𝐴 and 𝐵.
On the other hand, 𝑒( 𝐴, 𝐴) = 2𝑒( 𝐴) as each edge within 𝐴 is counted twice.
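As a quick illustration of these counts, here is a minimal Python sketch (not part of the book; the small example graph is made up):

```python
from itertools import combinations

# A small example graph on vertices 1..5, given by its edge set.
edges = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)}

def is_edge(u, v):
    return (u, v) in edges or (v, u) in edges

def e_within(A):
    # e(A): number of edges with both endpoints in A
    return sum(1 for u, v in combinations(A, 2) if is_edge(u, v))

def e_between(A, B):
    # e(A, B): ordered pairs (a, b) in A x B with ab an edge
    return sum(1 for a in A for b in B if a != b and is_edge(a, b))

A, B = {1, 2, 3}, {3, 4, 5}
print(e_within(A))                         # 3
print(e_between(A, B))                     # edges between A and B, with multiplicity
print(e_between(A, A) == 2 * e_within(A))  # True: e(A, A) = 2 e(A)
```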
Here are some standard graphs:
• 𝑲𝒓 is the complete graph on 𝑟 vertices, also known as an 𝒓-clique;
• 𝑲𝒔,𝒕 is the complete bipartite graph with 𝑠 vertices in one vertex part and 𝑡 vertices in
the other vertex part;
• 𝑲𝒓 ,𝒔,𝒕 is a complete tripartite graph with vertex parts having sizes 𝑟, 𝑠, 𝑡 respectively
(e.g., 𝐾1,1,1 = 𝐾3 ); and so on analogously for complete multipartite graphs with more
parts;
• 𝑪ℓ (ℓ ≥ 3) is a cycle with ℓ vertices and ℓ edges.
Some examples are shown below.
[Figure: the graphs 𝐾4 , 𝐾3,2 , 𝐾3,2,2 , and 𝐶8 .]
Given two graphs 𝐻 and 𝐺, we say that 𝐻 is a subgraph of 𝐺 if one can delete some
vertices and edges from 𝐺 to obtain a graph isomorphic to 𝐻 (example below). A copy of
𝐻 in 𝐺 is a subgraph of 𝐺 that is isomorphic to 𝐻. A labeled copy of 𝐻 in 𝐺 is a subgraph of
𝐺 isomorphic to 𝐻 where we also specify the isomorphism from 𝐻. Equivalently, a labeled
copy of 𝐻 in 𝐺 is an injective graph homomorphism from 𝐻 to 𝐺. For example, if 𝐺 has 𝑞
copies of 𝐾3 , then 𝐺 has 6𝑞 labeled copies of 𝐾3 .
We say that 𝐻 is an induced subgraph of 𝐺 if one can delete some vertices of 𝐺 (when
we delete a vertex, we also remove all edges incident to the vertex) to obtain 𝐻—note that
in particular we are not allowed to remove additional edges other than those incident to a
deleted vertex. If 𝑆 ⊆ 𝑉 (𝐺), we write 𝑮[𝑺] to denote the subgraph of 𝐺 induced by the
vertex set 𝑆, i.e., 𝐺 [𝑆] is the subgraph with vertex set 𝑆 and keeping all the edges from 𝐺
among 𝑆.
As an example, the following graph contains the 4-cycle as an induced subgraph. It contains
the 5-cycle as a subgraph but not as an induced subgraph.
In this book, when we say 𝑯-free, we always mean not containing 𝐻 as a subgraph. On
the other hand, we say induced 𝑯-free to mean not containing 𝐻 as an induced subgraph.
Given two graphs 𝐹 and 𝐺, a graph homomorphism is a map 𝜙 : 𝑉 (𝐹) → 𝑉 (𝐺) (not
necessarily injective) such that 𝜙(𝑢)𝜙(𝑣) ∈ 𝐸 (𝐺) whenever 𝑢𝑣 ∈ 𝐸 (𝐹). In other words, 𝜙 is
a map of vertices that sends edges to edges. A key difference between a copy of 𝐹 in 𝐺 and
a graph homomorphism from 𝐹 to 𝐺 is that the latter does not have to be an injective map
of vertices.
The chromatic number 𝝌(𝑮) of a graph 𝐺 is the smallest number of colors needed to
color the vertices of 𝐺 so that no two adjacent vertices receive the same color (such a
coloring is called a proper coloring).
The adjacency matrix of a graph 𝐺 = (𝑉, 𝐸) is a 𝑣(𝐺) × 𝑣(𝐺) matrix whose rows and
columns both are indexed by 𝑉, and such that the entry indexed by (𝑢, 𝑣) ∈ 𝑉 × 𝑉 is 1 if
𝑢𝑣 ∈ 𝐸 and 0 if 𝑢𝑣 ∉ 𝐸.
An 𝒓-uniform hypergraph (also called an 𝒓-graph for short) consists of a finite vertex set 𝑉 along with an edge set 𝐸 ⊆ \binom{𝑉}{𝑟}. Each edge of the 𝑟-graph is an 𝑟-element subset of vertices.
Asymptotics
We use the following standard asymptotic notation. Given nonnegative quantities 𝑓 and 𝑔, in
each item below, the various notations have the same meaning (as some parameter, usually
𝑛, tends to infinity)
• 𝒇 ≲ 𝒈, 𝒇 = 𝑶 (𝒈), 𝒈 = 𝛀( 𝒇 ), 𝑓 ≤ 𝐶𝑔 for some constant 𝐶 > 0
• 𝒇 = 𝒐(𝒈), 𝑓 /𝑔 → 0
• 𝒇 = 𝚯(𝒈), 𝒇 ≍ 𝒈, 𝑔 ≲ 𝑓 ≲ 𝑔
• 𝒇 ∼ 𝒈, 𝑓 = (1 + 𝑜(1))𝑔
Subscripts (e.g., 𝑶 𝒔 ( ), ≲𝒔 ) are used to emphasize that the hidden constants may depend
on the subscripted parameters. For example, 𝑓 (𝑠, 𝑥) ≲𝑠 𝑔(𝑠, 𝑥) means that for every 𝑠 there
is some constant 𝐶𝑠 so that 𝑓 (𝑠, 𝑥) ≤ 𝐶𝑠 𝑔(𝑠, 𝑥) for all 𝑥.
We avoid using ≪ since this notation carries different meanings in different communities
and by different authors. In analytic number theory, 𝑓 ≪ 𝑔 is standard for 𝑓 = 𝑂 (𝑔) (this
is called Vinogradov notation). In combinatorics and probability, 𝑓 ≪ 𝑔 sometimes means
𝑓 = 𝑜(𝑔), and sometimes means that 𝑓 is sufficiently small depending on 𝑔.
When asymptotic notation is used in the hypothesis of a statement, it should be interpreted
as being applied to a sequence rather than a single object. For example, given functions 𝑓
and 𝑔, we write
if 𝑓 (𝐺) = 𝑜(1), then 𝑔(𝐺) = 𝑜(1)
to mean
whenever a sequence 𝐺 𝑛 satisfies 𝑓 (𝐺 𝑛 ) = 𝑜(1), then 𝑔(𝐺 𝑛 ) = 𝑜(1),
which is also equivalent to
for every 𝜀 > 0 there is some 𝛿 > 0 such that if | 𝑓 (𝐺)| ≤ 𝛿 then |𝑔(𝐺)| ≤ 𝜀.
0
Appetizer: Triangles and Equations
Chapter Highlights
• Schur’s theorem on monochromatic solutions to 𝑥 + 𝑦 = 𝑧 and its graph theoretic proof
• Problems and results on progressions (e.g., Szemerédi’s theorem, the Green–Tao theorem)
• Introduction to the connection between graph theory and additive combinatorics
The finitary formulation leads to quantitative questions. For example, how large does 𝑁 (𝑟)
have to be as a function of 𝑟? Questions of this type are often quite difficult to resolve, even
approximately. There are lots of open questions concerning quantitative bounds.
Proof that the above two formulations of Schur’s theorem are equivalent. First, the finitary
version (Theorem 0.1.2) of Schur’s theorem easily implies the infinitary version (Theorem 0.1.1). Indeed, in the infinitary version, given a coloring of the positive integers, we
can consider the colorings of the first 𝑁 (𝑟) integers and use the finitary statement to find a
monochromatic solution.
To prove that the infinitary version implies the finitary version, we use a diagonalization
argument. Fix 𝑟, and suppose that for every 𝑁 there is some coloring 𝜙 𝑁 : [𝑁] → [𝑟] that
avoids monochromatic solutions to 𝑥 + 𝑦 = 𝑧. We can take an infinite subsequence of (𝜙 𝑁 )
such that, for every 𝑘 ∈ N, the value of 𝜙 𝑁 (𝑘) stabilizes to a constant as 𝑁 increases along this
subsequence (we can do this by repeatedly restricting to convergent infinite subsequences).
Then the 𝜙 𝑁 ’s, along this subsequence, converge pointwise to some coloring 𝜙 : N → [𝑟]
avoiding monochromatic solutions to 𝑥 + 𝑦 = 𝑧, but 𝜙 contradicts the infinitary statement. □
Proof assuming Schur’s theorem (Theorem 0.1.2). Let (Z/𝑝Z)^× denote the group of nonzero residues mod 𝑝 under multiplication. Let 𝐻 = {𝑥^𝑛 : 𝑥 ∈ (Z/𝑝Z)^×} be the subgroup of 𝑛-th powers in (Z/𝑝Z)^×. Since (Z/𝑝Z)^× is a cyclic group of order 𝑝 − 1 (due to the existence of primitive roots mod 𝑝, a fact from elementary number theory), the index of 𝐻 in (Z/𝑝Z)^× is equal to gcd(𝑛, 𝑝 − 1) ≤ 𝑛. So the cosets of 𝐻 partition {1, 2, . . . , 𝑝 − 1} into ≤ 𝑛 sets.
Viewing each of the ≤ 𝑛 cosets of 𝐻 as a “color”, by the finitary statement of Schur’s theorem (Theorem 0.1.2), for 𝑝 large enough as a function of 𝑛, there exists a solution to
𝑥 + 𝑦 = 𝑧 in Z
in some coset of 𝐻, say 𝑥, 𝑦, 𝑧 ∈ 𝑎𝐻 for some 𝑎 ∈ (Z/𝑝Z)^×. Since 𝐻 consists of 𝑛-th powers, we have 𝑥 = 𝑎𝑋^𝑛, 𝑦 = 𝑎𝑌^𝑛, and 𝑧 = 𝑎𝑍^𝑛 for some 𝑋, 𝑌, 𝑍 ∈ (Z/𝑝Z)^×. Thus
𝑎𝑋^𝑛 + 𝑎𝑌^𝑛 ≡ 𝑎𝑍^𝑛 (mod 𝑝).
Since 𝑎 ∈ (Z/𝑝Z)^× is invertible mod 𝑝, we have 𝑋^𝑛 + 𝑌^𝑛 ≡ 𝑍^𝑛 (mod 𝑝) as desired. □
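As a quick numerical sanity check of this corollary, here is a minimal Python sketch (not part of the text; the exponent 𝑛 = 3 and the particular primes tested are chosen only for illustration):

```python
def has_fermat_solution_mod_p(n, p):
    """Search for nonzero X, Y, Z with X^n + Y^n = Z^n (mod p)."""
    nth_powers = {pow(x, n, p) for x in range(1, p)}  # the subgroup H of n-th powers
    return any((a + b) % p in nth_powers for a in nth_powers for b in nth_powers)

# The corollary says: for fixed n, every large enough prime p admits a solution.
for p in [7, 13, 31, 61, 101, 103]:
    print(p, has_fermat_solution_mod_p(3, p))  # a few small primes may print False
```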
Ramsey’s theorem
Now let us prove Schur’s theorem (Theorem 0.1.2) by deducing it from an analogous result
about edge-coloring of a complete graph.
We write 𝐾 𝑁 for the complete graph on 𝑁 vertices.
Proof. Define
𝑁1 = 3, and 𝑁𝑟 = 𝑟(𝑁_{𝑟−1} − 1) + 2 for all 𝑟 ≥ 2. (0.1)
We show by induction on 𝑟 that every coloring of the edges of 𝐾_{𝑁𝑟} by 𝑟 colors has a monochromatic triangle. The case 𝑟 = 1 holds trivially.
Suppose the claim is true for 𝑟 − 1 colors. Consider any edge-coloring of 𝐾_{𝑁𝑟} using 𝑟 colors. Pick an arbitrary vertex 𝑣. Of the 𝑁𝑟 − 1 = 𝑟(𝑁_{𝑟−1} − 1) + 1 edges incident to 𝑣, by the pigeonhole principle, at least 𝑁_{𝑟−1} edges incident to 𝑣 have the same color, say red. Let
𝑉0 be the vertices joined to 𝑣 by a red edge.
[Figure: the vertex 𝑣 and its red neighborhood 𝑉0 .]
If there is a red edge inside 𝑉0 , we obtain a red triangle. Otherwise, at most 𝑟 − 1 colors appear on the edges inside 𝑉0 , which has |𝑉0 | ≥ 𝑁_{𝑟−1} vertices, so we have a monochromatic triangle inside 𝑉0 by the induction hypothesis. □
Exercise 0.1.5. Show that 𝑁𝑟 from (0.1) satisfies 𝑁𝑟 = 1 + 𝑟! \sum_{𝑖=0}^{𝑟} 1/𝑖! = ⌈𝑟!𝑒⌉.
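A quick numerical check of the two expressions in Exercise 0.1.5 (a small Python sketch, not a proof; the range of 𝑟 is arbitrary):

```python
from math import factorial

def N_recursive(r):
    # N_1 = 3, and N_r = r (N_{r-1} - 1) + 2 for r >= 2, as in (0.1)
    N = 3
    for k in range(2, r + 1):
        N = k * (N - 1) + 2
    return N

def N_closed(r):
    # 1 + r! * sum_{i=0}^{r} 1/i!, computed exactly with integer arithmetic
    return 1 + sum(factorial(r) // factorial(i) for i in range(r + 1))

for r in range(1, 8):
    print(r, N_recursive(r), N_closed(r))  # the last two columns agree
```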
Proof. Label the vertices by elements of {0, 1}^𝑟 . Assign an edge color 𝑖 if 𝑖 is the smallest index such that the two endpoint vertices differ on coordinate 𝑖. This coloring does not have monochromatic triangles. Indeed, if 𝑥, 𝑦, 𝑧 formed a monochromatic triangle with color 𝑖, then 𝑥_𝑖 , 𝑦_𝑖 , 𝑧_𝑖 ∈ {0, 1} would have to be pairwise distinct, which is impossible. □
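The coloring in this proof is easy to verify by computer for small 𝑟 (a minimal Python sketch, not part of the text):

```python
from itertools import combinations, product

def color(x, y):
    # color of edge xy: the smallest index where x and y differ
    return next(i for i in range(len(x)) if x[i] != y[i])

def has_monochromatic_triangle(r):
    vertices = list(product([0, 1], repeat=r))  # 2^r vertices labeled by {0,1}^r
    for x, y, z in combinations(vertices, 3):
        if color(x, y) == color(y, z) == color(x, z):
            return True
    return False

for r in range(1, 5):
    print(r, has_monochromatic_triangle(r))  # always False, as the proof shows
```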
Schur (1916) had actually given an even better lower bound: see Exercise 0.1.14. One of
Erdős’ favorite problems asks whether there is an exponential upper bound. This is a major open problem in Ramsey theory, and it is related to other important topics in combinatorics
such as the Shannon capacity of graphs (see, e.g., the survey by Nešetřil & Rosenfeld 2001).
Open problem 0.1.13 (Multicolor triangle Ramsey numbers: exponential upper bound)
Is there a constant 𝐶 > 0 so that if 𝑁 ≥ 𝐶 𝑟 , then every edge-coloring of 𝐾 𝑁 using 𝑟
colors contains a monochromatic triangle?
Exercise 0.1.15 (Upper bound on Ramsey numbers). Let 𝑠 and 𝑡 be positive integers. Show that if the edges of a complete graph on \binom{𝑠+𝑡−2}{𝑠−1} vertices are colored with red and blue, then there must be either a red 𝐾𝑠 or a blue 𝐾𝑡 .
0.2 Progressions
Additive combinatorics describes a rapidly growing body of mathematics motivated by
simple-to-state questions about addition and multiplication of integers (the name “additive
combinatorics” became popular in the 2000’s, when the field witnessed a rapid explosion
thanks to the groundbreaking works of Gowers, Green, Tao, and others; previously the area
was more commonly known as “combinatorial number theory”). The problems and methods
in additive combinatorics are deep and far-reaching, connecting many different areas of
mathematics such as graph theory, harmonic analysis, ergodic theory, discrete geometry, and
model theory.
Here we highlight some important developments in additive combinatorics, particularly
concerning progressions. The ideas behind these developments form some of the core themes
of this book.
Note that having arbitrarily long arithmetic progressions is very different from having
infinitely long arithmetic progressions, as seen in the next exercise.
Exercise 0.2.2. Show that Z may be colored using two colors so that it contains no infinitely long monochromatic arithmetic progressions.
Erdős & Turán (1936) conjectured a stronger statement, that any subset of the integers
with positive density contains arbitrarily long arithmetic progressions. This conjecture was famously proved by Szemerédi (1975), and the result is now known as Szemerédi’s theorem.
Szemerédi’s theorem is deep and intricate. This important work led to many subsequent
developments in additive combinatorics. Several different proofs of Szemerédi’s theorem
have since been discovered, and some of them have blossomed into rich areas of mathematical
research. Here are some of the most influential modern proofs of Szemerédi’s theorem (in
historical order):
• The ergodic theoretic approach by Furstenberg (1977);
• Higher-order Fourier analysis by Gowers (2001);
• The hypergraph regularity method, developed independently by Rödl et al. (2005) and Gowers (2001).
Another modern proof of Szemerédi’s theorem results from the density Hales–Jewett
theorem, which was originally proved by Furstenberg & Katznelson (1978) using ergodic
theory. Subsequently a new combinatorial proof was found in the first successful Polymath
Project (Polymath 2012), an online collaborative project initiated by Gowers.
Each approach has its own advantages and disadvantages. For example, the ergodic
approach led to multidimensional and polynomial generalizations of Szemerédi’s theorem,
which we discuss below. On the other hand, the ergodic approach does not give any concrete
quantitative bounds. Fourier analysis and its generalizations produce the best quantitative
bounds to Szemerédi’s theorem. They also led to deep results about counting patterns in the
prime numbers. However, there appear to be difficulties and obstructions to extending Fourier analysis to higher dimensions.
The relationships between these different approaches to Szemerédi’s theorem are not yet fully understood.
For example, the theorem implies that every subset of Z² of positive upper density contains a 𝑘 × 𝑘 axis-aligned square grid for every 𝑘.
There is also a polynomial extension of Szemerédi’s theorem. Let us first state a special
case, originally conjectured by Lovász and proved independently by Furstenberg (1977) and
Sárkőzy (1978).
In other words, the set always contains {𝑥, 𝑥 + 𝑦²} for some 𝑥 ∈ Z and 𝑦 ∈ Z_{>0}. What
about other polynomial patterns? The following polynomial generalization was proved by
Bergelson & Leibman (1996).
In a landmark work, Green and Tao proved that the prime numbers contain arbitrarily long arithmetic progressions. Their theorem is considered one of the most celebrated mathematical achievements of this century.
We will discuss the Green–Tao theorem in Chapter 9. The theorem has been extended
to polynomial progressions (Tao & Ziegler 2008) and to higher dimensions (Tao & Ziegler
2015; also see Fox & Zhao 2015).
Roth originally proved his result using Fourier analysis (also called the Hardy–Littlewood
circle method in this context). We will see Roth’s proof in Chapter 6.
In the 1970’s, Szemerédi developed the graph regularity method. It is now a central
technique in extremal graph theory. Ruzsa & Szemerédi (1978) used the graph regularity
method to give a new graph theoretic proof of Roth’s theorem. We will see this proof as well
as other applications of the graph regularity method in Chapter 2.
Extremal graph theory, broadly speaking, concerns questions of the form: what is the
maximum (or minimum) possible number of some structure in a graph with certain prescribed
properties? A starting point (historically and also pedagogically) in extremal graph theory is
the following question:
Question 0.3.2
What is the maximum possible number of edges in an 𝑛-vertex triangle-free graph?
This question has a relatively simple answer, and it will be the first topic in the next
chapter. We will then explore related questions about the maximum number of edges in a
graph without some given subgraph.
Although Question 0.3.2 above sounds similar to Roth’s theorem, it does not actually allow
us to deduce Roth’s theorem. Instead, we need to consider the following question.
Question 0.3.3
What is the maximum number of edges in an 𝑛-vertex graph where every edge is contained
in a unique triangle?
The graph regularity method illustrates the dichotomy of structure and pseudorandomness
in graph theory. Some of the later chapters dive further into related concepts. Chapter 3
explores pseudorandom graphs—what does it mean for a graph to look random? Chapter 4
concerns graph limits, a convenient analytic language for capturing many important concepts in earlier chapters. Chapter 5 explores graph homomorphism inequalities, revisiting
questions from extremal graph theory with an analytic lens.
And then we switch gears (but not entirely) to some core topics in additive combinatorics.
Chapter 6 contains the Fourier analytic proof of Roth’s theorem. There will be many
thematic similarities between elements of the Fourier analytic proof and earlier topics.
Chapter 7 explores the structure of set addition. Here we prove Freiman’s theorem on sets
with small additive doubling, a cornerstone result in additive combinatorics. It also plays a
key role in Gowers’ proof of Szemerédi’s theorem, generalizing Fourier analysis to higher
order Fourier analysis, although we will not go into the latter topic in this book (see Further
Reading at the end of Chapter 7). In Chapter 8, we explore the sum-product problem,
which is closely connected to incidence geometry (and we will see another graph theoretic
proof there). In Chapter 9, we discuss the Green–Tao theorem and prove an extension of
Szemerédi’s theorem to sparse pseudorandom sets, which plays a central role in the proof of
the Green–Tao theorem.
I hope that you will enjoy this book. I have been studying this subject since I began
graduate school. I still think about these topics nearly every day. My goal is to organize and
distill the beautiful mathematics in this field as a friendly introduction.
The chapters have some logical dependencies, but not many. Each topic can be studied and enjoyed on its own, though you will gain a lot more by appreciating the overall themes and connections.
There is still a lot that we do not know. Perhaps you too will be intrigued by the boundless
open questions that are still waiting to be explored.
Further Reading
The book Ramsey Theory by Graham, Rothschild, & Spencer (1990) is a wonderful intro-
duction to the subject. It has beautiful accounts of theorems of Ramsey, van der Waerden,
Hales–Jewett, Schur, Rado, and others, that form the foundation of Ramsey theory.
For a survey of modern developments in additive combinatorics, check out the book review
by Green (2009a) of Additive Combinatorics by Tao & Vu (2006).
Chapter Summary
• Schur’s theorem. Every coloring of N using finitely many colors contains a monochro-
matic solution to 𝑥 + 𝑦 = 𝑧.
– Proof: set up a graph whose triangles correspond to solutions to 𝑥 + 𝑦 = 𝑧, and then
apply Ramsey’s theorem.
• Szemerédi’s theorem. Every subset of N with positive density contains arbitrarily long
arithmetic progressions.
– A foundational result that led to important developments in additive combinatorics.
– Several different proofs, each illustrating the dichotomy of structure and pseudorandomness in a different context.
– Extensions: multidimensional, polynomial, primes (Green–Tao).
1
Forbidding a Subgraph
Chapter Highlights
• Turán problem: determine the maximum number of edges in an 𝑛-vertex 𝐻-free graph
• Mantel and Turán’s theorems: 𝐾𝑟 -free
• Kővári–Sós–Turán theorem: 𝐾 𝑠,𝑡 -free
• Erdős–Stone–Simonovits theorem: 𝐻-free for general 𝐻
• Dependent random choice technique: 𝐻-free for a bounded degree bipartite 𝐻
• Lower bound constructions of 𝐻-free graphs for bipartite 𝐻
• Algebraic constructions: matching lower bounds for 𝐾2,2 , 𝐾3,3 , and 𝐾 𝑠,𝑡 for 𝑡 much larger
than 𝑠, and also for 𝐶4 , 𝐶6 , 𝐶10
• Randomized algebraic constructions
We will see the answer shortly. More generally, we can ask what happens if we replace “triangle” by an arbitrary subgraph. This is a foundational problem in extremal graph theory.
The Turán problem is one of the most basic problems in extremal graph theory. It is named
after Turán for his fundamental work on the subject. Research on this problem has led to
many important techniques. We will see a fairly satisfactory answer to the Turán problem for
non-bipartite graphs 𝐻. We also know the answer for a small number of bipartite graphs 𝐻.
However, for nearly all bipartite graphs 𝐻, much mystery remains.
In the first part of the chapter, we focus on techniques for upper bounding ex(𝑛, 𝐻). In
the last few sections, we turn our attention to lower bounding ex(𝑛, 𝐻) when 𝐻 is a bipartite
graph.
[Figure: the complete bipartite graph 𝐾4,4 .]
The graph 𝐾 ⌊𝑛/2⌋,⌈𝑛/2⌉ has ⌊𝑛/2⌋ ⌈𝑛/2⌉ = ⌊𝑛²/4⌋ edges (one can check this equality by separately considering even and odd 𝑛).
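For instance, if 𝑛 = 2𝑘 then ⌊𝑛/2⌋⌈𝑛/2⌉ = 𝑘² = 𝑛²/4 = ⌊𝑛²/4⌋, while if 𝑛 = 2𝑘 + 1 then ⌊𝑛/2⌋⌈𝑛/2⌉ = 𝑘(𝑘 + 1) = (𝑛² − 1)/4 = ⌊𝑛²/4⌋.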
Mantel (1907) proved that 𝐾 ⌊𝑛/2⌋,⌈𝑛/2⌉ has the greatest number of edges among all 𝑛-vertex triangle-free graphs.
First proof of Mantel’s theorem. Let 𝐺 = (𝑉, 𝐸) be a triangle-free graph with 𝑛 vertices and 𝑚 edges. For every edge 𝑥𝑦, the neighborhoods 𝑁 (𝑥) and 𝑁 (𝑦) must be disjoint, or else a common neighbor of 𝑥 and 𝑦 would form a triangle with them; hence deg 𝑥 + deg 𝑦 ≤ 𝑛. Summing this inequality over all edges gives
\sum_{𝑥𝑦 ∈ 𝐸} (deg 𝑥 + deg 𝑦) ≤ 𝑚𝑛.
[Figure: for an edge 𝑥𝑦, the neighborhoods 𝑁 (𝑥) and 𝑁 (𝑦) are disjoint.]
On the other hand, note that for each vertex 𝑥, the term deg 𝑥 appears once in the above
sum for each edge incident to 𝑥, and so it appears a total of deg 𝑥 times. We then apply the
Cauchy–Schwarz inequality to get
\sum_{𝑥𝑦 ∈ 𝐸} (deg 𝑥 + deg 𝑦) = \sum_{𝑥 ∈ 𝑉} (deg 𝑥)² ≥ \frac{1}{𝑛} \left( \sum_{𝑥 ∈ 𝑉} deg 𝑥 \right)^{2} = \frac{(2𝑚)²}{𝑛}.
Comparing the two inequalities, we obtain (2𝑚)²/𝑛 ≤ 𝑚𝑛, and hence 𝑚 ≤ 𝑛²/4. Since 𝑚 is an integer, we obtain 𝑚 ≤ ⌊𝑛²/4⌋, as claimed. □
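The bound is easy to confirm by brute force for very small 𝑛 (a short Python sketch, not part of the text; it enumerates every graph on 𝑛 ≤ 6 labeled vertices):

```python
from itertools import combinations

def max_triangle_free_edges(n):
    """Brute-force ex(n, K_3) by enumerating all graphs on n labeled vertices."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        adj = {v: set() for v in range(n)}
        for u, v in edges:
            adj[u].add(v); adj[v].add(u)
        # triangle-free means no edge uv has a common neighbor
        if all(not (adj[u] & adj[v]) for u, v in edges):
            best = max(best, len(edges))
    return best

for n in range(2, 7):
    print(n, max_triangle_free_edges(n), n * n // 4)  # the last two columns agree
```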
Second proof of Mantel’s theorem. Let 𝐺 = (𝑉, 𝐸) be a triangle-free graph. Let 𝑣 be a
vertex of maximum degree in 𝐺. Since 𝐺 is triangle-free, the neighborhood 𝑁 (𝑣) of 𝑣 is an
independent set.
[Figure: 𝐴 = 𝑁 (𝑣) is an independent set, and 𝐵 = 𝑉 ∖ 𝐴.]
Every edge of 𝐺 has at least one endpoint in 𝐵 := 𝑉 ∖ 𝑁 (𝑣), since 𝐴 := 𝑁 (𝑣) is an independent set. Moreover, every vertex has degree at most deg 𝑣 = |𝐴|. Hence
𝑒(𝐺) ≤ \sum_{𝑢 ∈ 𝐵} deg 𝑢 ≤ |𝐵| |𝐴| ≤ \left( \frac{|𝐴| + |𝐵|}{2} \right)^{2} = \frac{𝑛²}{4},
where 𝑛 = 𝑣(𝐺), and so 𝑒(𝐺) ≤ ⌊𝑛²/4⌋ since 𝑒(𝐺) is an integer. □
The next several exercises explore extensions of Mantel’s theorem. It is useful to revisit
the proof techniques.
Exercise 1.1.4 (Many triangles). Show that a graph with 𝑛 vertices and 𝑚 edges has at least
\frac{4𝑚}{3𝑛} \left( 𝑚 − \frac{𝑛²}{4} \right)
triangles.
Exercise 1.1.5. Prove that every 𝑛-vertex non-bipartite triangle-free graph has at most (𝑛 − 1)²/4 + 1 edges.
Exercise 1.1.6 (Stability). Let 𝐺 be an 𝑛-vertex triangle-free graph with at least ⌊𝑛²/4⌋ − 𝑘 edges. Prove that 𝐺 can be made bipartite by removing at most 𝑘 edges.
Exercise 1.1.7. Show that every 𝑛-vertex triangle-free graph with minimum degree
greater than 2𝑛/5 is bipartite.
Exercise 1.1.8∗. Prove that every 𝑛-vertex graph with at least ⌊𝑛²/4⌋ + 1 edges contains at least ⌊𝑛/2⌋ triangles.
Exercise 1.1.9∗. Let 𝐺 be an 𝑛-vertex graph with ⌊𝑛²/4⌋ − 𝑘 edges (here 𝑘 ∈ Z) and 𝑡 triangles. Prove that 𝐺 can be made bipartite by removing at most 𝑘 + 6𝑡/𝑛 edges, and that this constant 6 is best possible.
Exercise 1.1.10∗. Prove that every 𝑛-vertex graph with at least ⌊𝑛²/4⌋ + 1 edges contains some edge in at least (1/6 − 𝑜(1))𝑛 triangles, and that this constant 1/6 is best possible.
Even when 𝑛 is not divisible by 𝑟, the difference between 𝑒(𝑇𝑛,𝑟 ) and (1 − 1/𝑟)𝑛²/2 is 𝑂 (𝑛𝑟). As we are generally interested in the regime when 𝑟 is fixed, this difference is a negligible lower order contribution. That is,
ex(𝑛, 𝐾𝑟+1 ) = \left( 1 − \frac{1}{𝑟} − 𝑜(1) \right) \frac{𝑛²}{2}, for fixed 𝑟 as 𝑛 → ∞.
Every 𝑟-partite graph is automatically 𝐾𝑟+1 -free. Let us first consider an easy special case
of the problem.
Proof. Suppose we have an 𝑛-vertex 𝑟-partite graph with the maximum possible number of
edges. It should be a complete 𝑟-partite graph. If there were two vertex parts 𝐴 and 𝐵 with
| 𝐴| + 2 ≤ |𝐵|, then moving a vertex from 𝐵 (the larger part) to 𝐴 (the smaller part) would
increase the number of edges by (| 𝐴| + 1) (|𝐵| − 1) − | 𝐴| |𝐵| = |𝐵| − | 𝐴| − 1 > 0. Thus all
the vertex parts must have sizes within one of each other. The Turán graph 𝑇𝑛,𝑟 is the unique
such graph. □
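A minimal Python sketch (not from the book) that builds the Turán graph 𝑇𝑛,𝑟 by equitable partition and compares its edge count with the bound (1 − 1/𝑟)𝑛²/2 discussed above:

```python
def turan_graph_edges(n, r):
    """Edge count of T_{n,r}: n vertices split into r parts of nearly equal size."""
    part_sizes = [n // r + (1 if i < n % r else 0) for i in range(r)]
    # complete multipartite: all pairs except those inside a single part
    return n * (n - 1) // 2 - sum(s * (s - 1) // 2 for s in part_sizes)

for n, r in [(10, 3), (11, 3), (20, 4), (21, 4)]:
    e = turan_graph_edges(n, r)
    print(n, r, e, (1 - 1 / r) * n * n / 2)  # e is close to (1 - 1/r) n^2 / 2
```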
We will see three proofs of Turán’s theorem. The first proof extends our second proof of
Mantel’s theorem.
First proof of Turán’s theorem. We prove by induction on 𝑟. The case 𝑟 = 1 is trivial as a
𝐾2 -free graph is empty. Now assume 𝑟 > 1 and that ex(𝑛, 𝐾𝑟 ) = 𝑒(𝑇𝑛,𝑟 −1 ) for every 𝑛.
Let 𝐺 = (𝑉, 𝐸) be a 𝐾𝑟+1 -free graph. Let 𝑣 be a vertex of maximum degree in 𝐺. Since 𝐺
is 𝐾𝑟+1 -free, the neighborhood 𝐴 = 𝑁 (𝑣) of 𝑣 is 𝐾𝑟 -free. So by the induction hypothesis,
𝑒( 𝐴) ≤ ex(| 𝐴| , 𝐾𝑟 ) = 𝑒(𝑇| 𝐴|,𝑟 −1 ).
[Figure: 𝐴 = 𝑁 (𝑣), which is 𝐾𝑟 -free, and 𝐵 = 𝑉 ∖ 𝐴.]
Thus
𝑒(𝐺) = 𝑒( 𝐴) + 𝑒( 𝐴, 𝐵) + 𝑒(𝐵) ≤ 𝑒(𝑇_{| 𝐴|,𝑟−1}) + | 𝐴| |𝐵| ≤ 𝑒(𝑇𝑛,𝑟 ),
where 𝑒( 𝐴, 𝐵) + 𝑒(𝐵) ≤ \sum_{𝑦 ∈ 𝐵} deg(𝑦) ≤ | 𝐴| |𝐵| because every vertex has degree at most deg 𝑣 = | 𝐴|, and the final step follows from the observation that 𝑒(𝑇_{| 𝐴|,𝑟−1}) + | 𝐴| |𝐵| is the number of edges in an 𝑛-vertex 𝑟-partite graph (with a part of size |𝐵| and the remaining vertices equitably partitioned into 𝑟 − 1 parts) and Lemma 1.2.7.
To have equality in every step above, 𝐵 must be an independent set (or else \sum_{𝑦 ∈ 𝐵} deg(𝑦) < | 𝐴| |𝐵|) and 𝐴 must induce 𝑇_{| 𝐴|,𝑟−1}, so that 𝐺 is 𝑟-partite. We knew from Lemma 1.2.7 that the Turán graph 𝑇𝑛,𝑟 uniquely maximizes the number of edges among 𝑟-partite graphs. □
The second proof starts out similarly to our first proof of Mantel’s theorem. Recall that in
Mantel’s theorem, the initial observation was that in a triangle-free graph, given an edge, its
two endpoints must have no common neighbors (or else they form a triangle). Generalizing,
in a 𝐾4 -free graph, given a triangle, its three vertices have no common neighbor. The rest of
the proof proceeds somewhat differently from earlier. Instead of summing over all edges as
we did before, we remove the triangle and apply induction to the rest of the graph.
Second proof of Turán’s theorem. We fix 𝑟 and proceed by induction on 𝑛. The statement
is trivial for 𝑛 ≤ 𝑟, as the Turán graph is the complete graph 𝐾𝑛 = 𝑇𝑛,𝑟 and thus maximizes
the number of edges.
Now, assume that 𝑛 > 𝑟 and that Turán’s theorem holds for all graphs on fewer than 𝑛
vertices. Let 𝐺 = (𝑉, 𝐸) be an 𝑛-vertex 𝐾𝑟+1 -free graph with the maximum possible number
of edges. By the maximality assumption, 𝐺 contains 𝐾𝑟 as a subgraph, since otherwise we
could add an edge to 𝐺 and it would still be 𝐾𝑟+1 -free. Let 𝐴 be the vertex set of an 𝑟-clique
in 𝐺, and let 𝐵 := 𝑉 \ 𝐴.
[Figure: the 𝑟-clique 𝐴 (drawn for 𝑟 = 3) and the remaining vertices 𝐵.]
We have
𝑒(𝐺) = 𝑒( 𝐴) + 𝑒( 𝐴, 𝐵) + 𝑒(𝐵) ≤ \binom{𝑟}{2} + (𝑟 − 1)(𝑛 − 𝑟) + 𝑒(𝑇_{𝑛−𝑟,𝑟}) = 𝑒(𝑇𝑛,𝑟 ),
where the inequality uses the induction hypothesis on 𝐺 [𝐵], which is 𝐾𝑟+1 -free, and the final
equality can be seen by removing a 𝐾𝑟 from 𝑇𝑛,𝑟 .
Finally, let us check when equality occurs. To have equality in every step above, the subgraph induced on 𝐵 must be 𝑇_{𝑛−𝑟,𝑟} by induction. To have 𝑒( 𝐴) = \binom{𝑟}{2}, 𝐴 must induce a clique. To have 𝑒( 𝐴, 𝐵) = (𝑟 − 1)(𝑛 − 𝑟), every vertex of 𝐵 must be adjacent to all but one vertex in 𝐴. Also, two vertices 𝑥, 𝑦 lying in distinct parts of 𝐺 [𝐵] ≅ 𝑇_{𝑛−𝑟,𝑟} cannot “miss” the same vertex 𝑣 of 𝐴, or else 𝐴 ∪ {𝑥, 𝑦} \ {𝑣} would be a 𝐾𝑟+1 -clique. This then forces 𝐺 to be 𝑇𝑛,𝑟 . □
The third proof uses a method known as Zykov symmetrization. The idea here is that if a 𝐾𝑟+1 -free graph is not a Turán graph, then we should be able to make some local modifications (namely replacing a vertex by a clone of another vertex) to get another 𝐾𝑟+1 -free graph with strictly more edges.
Third proof of Turán’s theorem. As before, let 𝐺 be an 𝑛-vertex, 𝐾𝑟+1 -free graph with the
maximum possible number of edges.
We claim that if 𝑥 and 𝑦 are non-adjacent vertices, then deg 𝑥 = deg 𝑦. Indeed, suppose
deg 𝑥 > deg 𝑦. We can modify 𝐺 by removing 𝑦 and adding in a clone of 𝑥 (a new vertex 𝑥 ′
with the same neighborhood as 𝑥 but not adjacent to 𝑥), as illustrated below.
[Figure: 𝑦 is removed and replaced by a clone 𝑥′ of 𝑥.]
The resulting graph would still be 𝐾𝑟+1 -free (since a clique cannot contain both 𝑥 and its
clone) and has strictly more edges than 𝐺, thereby contradicting the assumption that 𝐺 has
the maximum possible number of edges.
Suppose 𝑥 is non-adjacent to both 𝑦 and 𝑧 in 𝐺. We claim that 𝑦 and 𝑧 must be non-adjacent.
18 Forbidding a Subgraph
We just saw that deg 𝑥 = deg 𝑦 = deg 𝑧. If 𝑦𝑧 is an edge, then by deleting 𝑦 and 𝑧 from 𝐺 and
adding two clones of 𝑥, we obtain a 𝐾𝑟+1 -free graph with one more edge than 𝐺. This would
contradict the maximality of 𝐺.
[Figure: 𝑦 and 𝑧 are removed and replaced by two clones 𝑥′, 𝑥′′ of 𝑥.]
Therefore non-adjacency is an equivalence relation on the vertices, so 𝐺 is a complete multipartite graph; being 𝐾𝑟+1 -free, it has at most 𝑟 parts, and by Lemma 1.2.7 its number of edges is at most 𝑒(𝑇𝑛,𝑟 ), with equality only for the Turán graph 𝑇𝑛,𝑟 . □
Exercise 1.2.8. Let 𝐺 be a 𝐾𝑟+1 -free graph. Prove that there exists an 𝑟-partite graph 𝐻
on the same vertex set as 𝐺 such that deg𝐻 (𝑥) ≥ deg𝐺 (𝑥) for every vertex 𝑥 (here
deg𝐻 (𝑥) is the degree of 𝑥 in 𝐻, and likewise with deg𝐺 (𝑥) for 𝐺). Give another proof of
Turán’s theorem from this fact.
The following exercise is an extension of Exercise 1.1.6.
Exercise 1.2.9∗ (Stability). Let 𝐺 be an 𝑛-vertex 𝐾𝑟+1 -free graph with at least 𝑒(𝑇𝑛,𝑟 ) − 𝑘
edges, where 𝑇𝑛,𝑟 is the Turán graph. Prove that 𝐺 can be made 𝑟-partite by removing at
most 𝑘 edges.
The next exercise is a neat geometric application of Turán’s theorem.
Exercise 1.2.10. Let 𝑆 be a set of 𝑛 points in the plane, with the property that no two points are at distance greater than 1. Show that 𝑆 has at most ⌊𝑛²/3⌋ pairs of points at distance greater than 1/√2. Also, show that the bound ⌊𝑛²/3⌋ is tight (i.e., cannot be improved).
Turán density
In this chapter, we will define the edge density of a graph 𝐺 to be
𝑒(𝐺) \Big/ \binom{𝑣(𝐺)}{2}.
So the edge density of a clique is 1. Later in the book, we will consider a different normalization 2𝑒(𝐺)/𝑣(𝐺)² for edge density, which is more convenient for other purposes. When 𝑣(𝐺) is large, there is no significant difference between the two choices.
Next, we use an averaging/sampling argument to show that ex(𝑛, 𝐻)/\binom{𝑛}{2} is non-increasing in 𝑛.
Proposition 1.3.1 (Monotonicity of Turán numbers)
For every graph 𝐻 and positive integer 𝑛,
\frac{ex(𝑛 + 1, 𝐻)}{\binom{𝑛+1}{2}} ≤ \frac{ex(𝑛, 𝐻)}{\binom{𝑛}{2}}.
Proof. Let 𝐺 be an 𝐻-free graph on 𝑛 + 1 vertices. For each 𝑛-vertex subset 𝑆 of 𝑉 (𝐺), since 𝐺 [𝑆] is also 𝐻-free, we have
\frac{𝑒(𝐺 [𝑆])}{\binom{𝑛}{2}} ≤ \frac{ex(𝑛, 𝐻)}{\binom{𝑛}{2}}.
Varying 𝑆 uniformly over all 𝑛-vertex subsets of 𝑉 (𝐺), the left-hand side averages to the edge density of 𝐺 by linearity of expectations (check this). It follows that
\frac{𝑒(𝐺)}{\binom{𝑛+1}{2}} ≤ \frac{ex(𝑛, 𝐻)}{\binom{𝑛}{2}}. □
The exact value of 𝜋(𝐻) is known in very few cases. It is a major open problem to determine 𝜋(𝐻) when 𝐻 is the complete 3-uniform hypergraph on 4 vertices (also known as a tetrahedron), and more generally when 𝐻 is a complete hypergraph.
Supersaturation
We know from Mantel’s theorem that any 𝑛-vertex graph 𝐺 with > 𝑛²/4 edges must contain a triangle. What if 𝐺 has a lot more edges? It turns out that 𝐺 must have a lot of triangles. In particular, an 𝑛-vertex graph with > (1/4 + 𝜀)𝑛² edges must have at least 𝛿𝑛³ triangles for some constant 𝛿 > 0 depending on 𝜀 > 0. This is indeed a lot of triangles, since there can only be at most 𝑂(𝑛³) triangles no matter what. (Exercise 1.1.4 asks you to give a more precise quantitative lower bound on the number of triangles. The optimal dependence of 𝛿 on 𝜀 is a difficult problem that we will discuss in Chapter 5.)
It turns out there is a general phenomenon in combinatorics where once some density
crosses an existence threshold (e.g., the Turán density is the threshold for 𝐻-freeness), it
will be possible to find not just one copy of the desired object, but in fact lots and lots of
copies. This fundamental principle, called supersaturation, is useful for many applications,
including in our upcoming determination of 𝜋(𝐻) for general 𝐻.
Equivalently: every 𝑛-vertex graph with 𝑜(𝑛^{𝑣(𝐻)}) copies of 𝐻 has edge density ≤ 𝜋(𝐻) + 𝑜(1) (here 𝐻 is fixed). The sampling argument in the proof below is useful in many applications.
Proof. By the definition of the Turán density, there exists some 𝑛0 (depending on 𝐻 and 𝜀) such that every 𝑛0-vertex graph with at least (𝜋(𝐻) + 𝜀/2)\binom{𝑛0}{2} edges contains 𝐻 as a subgraph.
Let 𝑛 ≥ 𝑛0 and let 𝐺 be an 𝑛-vertex graph with at least (𝜋(𝐻) + 𝜀)\binom{𝑛}{2} edges. Let 𝑆 be an 𝑛0-element subset of 𝑉 (𝐺), chosen uniformly at random. Let 𝑋 denote the edge density of 𝐺 [𝑆]. By averaging, E𝑋 equals the edge density of 𝐺, and so E𝑋 ≥ 𝜋(𝐻) + 𝜀. Then 𝑋 ≥ 𝜋(𝐻) + 𝜀/2 with probability ≥ 𝜀/2 (or else E𝑋 could not be as large as 𝜋(𝐻) + 𝜀). So, from the previous paragraph, we know that with probability ≥ 𝜀/2, 𝐺 [𝑆] contains a copy of 𝐻. This gives us ≥ (𝜀/2)\binom{𝑛}{𝑛0} copies of 𝐻, but each copy of 𝐻 may be counted up to \binom{𝑛−𝑣(𝐻)}{𝑛0−𝑣(𝐻)} times. Thus the number of copies of 𝐻 in 𝐺 is
≥ (𝜀/2)\binom{𝑛}{𝑛0} \Big/ \binom{𝑛−𝑣(𝐻)}{𝑛0−𝑣(𝐻)} = Ω_{𝐻,𝜀}(𝑛^{𝑣(𝐻)}). □
Exercise 1.3.6 (Density Ramsey). Prove that for every 𝑠 and 𝑟, there is some constant
𝑐 > 0 so that for every sufficiently large 𝑛, if the edges of 𝐾𝑛 are colored using 𝑟 colors,
then at least 𝑐 fraction of all copies of 𝐾𝑠 are monochromatic.
Zarankiewicz (1951) originally asked a related problem: determine the maximum number
of 1’s in an 𝑚 × 𝑛 matrix without an 𝑠 × 𝑡 submatrix with all entries 1.
The main theorem of this section is the fundamental result due to Kővári, Sós, & Turán
(1954). We will refer to it as the KST theorem, which stands both for its discoverers, as well
as for the forbidden subgraph 𝐾𝑠,𝑡 .
Proof. Let 𝐺 be a 𝐾𝑠,𝑡 -free graph with 𝑛 vertices and 𝑚 edges. Let
𝑋 = number of copies of 𝐾𝑠,1 in 𝐺.
(When 𝑠 = 1, we set 𝑋 = 2𝑒(𝐺).) The strategy is to count 𝑋 in two ways. First we count 𝐾𝑠,1
by first embedding the “left” 𝑠 vertices of 𝐾𝑠,1 . Then we count 𝐾𝑠,1 by first embedding the
“right” single vertex of 𝐾𝑠,1 .
Upper bound on 𝑋. Since 𝐺 is 𝐾𝑠,𝑡 -free, every 𝑠-vertex subset of 𝐺 has ≤ 𝑡 − 1 common neighbors. Therefore,
𝑋 ≤ \binom{𝑛}{𝑠}(𝑡 − 1).
Lower bound on 𝑋. For each vertex 𝑣 of 𝐺, there are exactly \binom{deg 𝑣}{𝑠} ways to pick 𝑠 of its neighbors, and each such choice gives a copy of 𝐾𝑠,1 , so 𝑋 = \sum_{𝑣} \binom{deg 𝑣}{𝑠}. To obtain a lower bound on this quantity in terms of the number of edges 𝑚 of 𝐺, we use a standard trick of viewing \binom{𝑥}{𝑠} as a convex function on the reals, namely, letting
𝑓𝑠 (𝑥) = 𝑥(𝑥 − 1) · · · (𝑥 − 𝑠 + 1)/𝑠! if 𝑥 ≥ 𝑠 − 1, and 𝑓𝑠 (𝑥) = 0 if 𝑥 < 𝑠 − 1.
Then 𝑓𝑠 (𝑥) = \binom{𝑥}{𝑠} for all nonnegative integers 𝑥. Furthermore 𝑓𝑠 is a convex function. Since
the average degree of 𝐺 is 2𝑚/𝑛, it follows by convexity that
𝑋 = \sum_{𝑣 ∈ 𝑉(𝐺)} 𝑓𝑠 (deg 𝑣) ≥ 𝑛 𝑓𝑠\!\left( \frac{2𝑚}{𝑛} \right).
(It would be a sloppy mistake to lower bound 𝑋 by 𝑛\binom{2𝑚/𝑛}{𝑠}.)
Combining the upper bound and the lower bound. We find that
𝑛 𝑓𝑠\!\left( \frac{2𝑚}{𝑛} \right) ≤ 𝑋 ≤ \binom{𝑛}{𝑠}(𝑡 − 1).
Since 𝑓𝑠 (𝑥) = (1 + 𝑜(1))𝑥^𝑠/𝑠! for 𝑥 → ∞ and fixed 𝑠, we find that, as 𝑛 → ∞,
\frac{𝑛}{𝑠!} \left( \frac{2𝑚}{𝑛} \right)^{𝑠} ≤ (1 + 𝑜(1)) \frac{𝑛^𝑠}{𝑠!}(𝑡 − 1).
Therefore,
𝑚 ≤ \left( \frac{(𝑡 − 1)^{1/𝑠}}{2} + 𝑜(1) \right) 𝑛^{2−1/𝑠}. □
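The double count at the heart of this proof is easy to verify directly on small graphs. Here is a minimal Python sketch (not from the book; the random example graph and the choice 𝑠 = 2 are just for illustration):

```python
from itertools import combinations
from math import comb
import random

# A small random graph on n vertices (illustration only).
n, s = 8, 2
random.seed(0)
adj = {v: set() for v in range(n)}
for u, v in combinations(range(n), 2):
    if random.random() < 0.5:
        adj[u].add(v); adj[v].add(u)

# Count copies of K_{s,1} by choosing the "right" vertex first ...
count_right_first = sum(comb(len(adj[v]), s) for v in range(n))

# ... and by choosing the "left" s-set first, via its common neighborhood.
count_left_first = sum(
    len(set.intersection(*(adj[v] for v in S)))
    for S in combinations(range(n), s)
)

print(count_right_first, count_left_first)  # equal: the two counts of X agree
```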
The final bound in the proof gives us a somewhat more precise estimate than stated in
Theorem 1.4.2. Let us record it here for future reference.
It has been long conjectured that the KST theorem is tight up to a constant factor.
In the final sections of this chapter, we will produce some constructions showing that
Conjecture 1.4.4 is true for 𝐾2,𝑡 and 𝐾3,𝑡 . We also know that the conjecture is true if 𝑡 is much
larger than 𝑠. The first open case of the conjecture is 𝐾4,4 .
Corollary 1.4.5
For every bipartite graph 𝐻, there exists some constant 𝑐 > 0 so that ex(𝑛, 𝐻) = 𝑂𝐻 (𝑛^{2−𝑐}).
Proof. Suppose the two vertex parts of 𝐻 have sizes 𝑠 and 𝑡, with 𝑠 ≤ 𝑡. Then 𝐻 ⊆ 𝐾𝑠,𝑡 , so every 𝑛-vertex 𝐻-free graph is also 𝐾𝑠,𝑡 -free, and thus has 𝑂𝑠,𝑡 (𝑛^{2−1/𝑠}) edges. □
In particular, the Turán density 𝜋(𝐻) of every bipartite graph 𝐻 is zero.
The KST theorem gives a constant 𝑐 in the above corollary that depends on the number of
vertices on the smaller part of 𝐻. In Section 1.7, we will use the dependent random choice
technique to give a proof of the corollary showing that 𝑐 only has to depend on the maximum
degree of 𝐻.
In other words, given 𝑛 distinct points in the plane, at most how many pairs of these points
can be exactly distance 1 apart? We can draw a graph with these 𝑛 points as vertices, with
edges joining points exactly unit distance apart.
To get a feeling for the problem, let us play with some constructions. For small values of
𝑛, it is not hard to check by hand that the following configurations are optimal.
[Figure: optimal unit-distance configurations for 𝑛 = 3, 4, 5, 6, 7.]
What about for larger values of 𝑛? If we line up the 𝑛 points equally spaced on a line, we
get 𝑛 − 1 unit distances.
We can be a bit more efficient by chaining up triangles. The following construction gives us
2𝑛 − 3 unit distances.
The construction for 𝑛 = 6 looks like it was obtained by copying and translating a unit
triangle. We can generalize this idea to obtain a recursive construction. Let 𝑓 (𝑛) denote the
maximum number of unit distances formed by 𝑛 points in the plane. Given a configuration
𝑃 with ⌊𝑛/2⌋ points that has 𝑓 ( ⌊𝑛/2⌋) unit distances, we can copy 𝑃 and translate it by a
generic unit vector to get 𝑃′ . The configuration 𝑃 ∪ 𝑃′ has at least 2 𝑓 ( ⌊𝑛/2⌋) + ⌊𝑛/2⌋ unit
distances. We can solve the recursion to get 𝑓 (𝑛) ≳ 𝑛 log 𝑛. Now we take a different approach
to obtain an even better construction.
[Figure: a configuration 𝑃 and its translate 𝑃′ by a generic unit vector.]
Take a square grid with ⌊√𝑛⌋ × ⌊√𝑛⌋ vertices. Instead of choosing the distance between adjacent points as the unit distance, we can scale the configuration so that √𝑟 becomes the “unit” distance for some integer 𝑟. As an illustration, here is an example of a 5 × 5 grid with 𝑟 = 10. [Figure: a 5 × 5 grid with pairs of points at distance √10 joined.]
It turns out that by choosing the optimal 𝑟 as a function of 𝑛, we can get at least
𝑛^{1+𝑐/\log\log 𝑛}
unit distances, where 𝑐 > 0 is some absolute constant. The proof uses analytic number theory,
which we omit as it would take us too far afield. The basic idea is to choose 𝑟 to be a product
of many distinct primes that are congruent to 1 modulo 4, so that 𝑟 can be represented as a
sum of two squares in many different ways, and then estimate the number of such ways.
It is conjectured that the last construction above is close to optimal.
The KST theorem can be used to prove the following upper bound on the number of unit
distances.
Proof. Every unit distance graph is 𝐾2,3 -free. Indeed, for every pair of distinct points, there
are at most two other points that are at unit distance from both points.
Let 𝑔(𝑛) denote the answer. The asymptotically best construction for the minimum number of distinct distances is also a square grid, same as earlier. It can be shown that a square grid with ⌊√𝑛⌋ × ⌊√𝑛⌋ points has on the order of 𝑛/\sqrt{\log 𝑛} distinct distances. This is conjectured to be optimal (i.e., 𝑔(𝑛) ≲ 𝑛/\sqrt{\log 𝑛}).
Let 𝑓 (𝑛) denote the maximum number of unit distances among 𝑛 points in the plane, as before. We have 𝑓 (𝑛)𝑔(𝑛) ≥ \binom{𝑛}{2}, since each distance occurs at most 𝑓 (𝑛) times. So an upper bound on 𝑓 (𝑛) gives a lower bound on 𝑔(𝑛) (but not conversely).
A breakthrough on the distinct distances problem was obtained by Guth & Katz (2015). In other words, 𝑔(𝑛) ≳ 𝑛/log 𝑛, thereby matching the upper bound example up to a factor of 𝑂(\sqrt{\log 𝑛}). The Guth–Katz proof is quite sophisticated. It uses tools ranging from the polynomial method to algebraic geometry.
Exercises
Exercise 1.4.11. Show that a 𝐶4 -free bipartite graph between two vertex parts of sizes 𝑎 and 𝑏 has at most 𝑎𝑏^{1/2} + 𝑏 edges.
Exercise 1.4.12 (Density KST). Prove that for every pair of positive integers 𝑠 ≤ 𝑡, there are constants 𝐶, 𝑐 > 0 such that every 𝑛-vertex graph with 𝑝\binom{𝑛}{2} edges contains at least 𝑐𝑝^{𝑠𝑡}𝑛^{𝑠+𝑡} copies of 𝐾𝑠,𝑡 , provided that 𝑝 ≥ 𝐶𝑛^{−1/𝑠}.
The next exercise asks you to think about the quantitative dependencies in the proof of the
KST theorem.
Exercise 1.4.13. Show that for every 𝜀 > 0, there exists 𝛿 > 0 such that every graph with 𝑛 vertices and at least 𝜀𝑛² edges contains a copy of 𝐾𝑠,𝑡 where 𝑠 ≥ 𝛿 log 𝑛 and 𝑡 ≥ 𝑛^{0.99}.
The next exercise illustrates a bad definition of density of a subset of Z² (it always ends up being either 0 or 1).
Exercise 1.4.14 (How not to define density). Let 𝑆 ⊆ Z². Define
𝑑𝑘 (𝑆) = \max_{𝐴,𝐵 ⊆ Z,\ |𝐴|=|𝐵|=𝑘} \frac{|𝑆 ∩ ( 𝐴 × 𝐵)|}{| 𝐴| |𝐵|}.
Remark 1.5.2 (History). Erdős & Stone (1946) proved this result when 𝐻 is a complete
multipartite graph. Erdős & Simonovits (1966) observed that the general case follows as a
quick corollary. The proof given here is due to Erdős (1971).
Example 1.5.3. When 𝐻 = 𝐾𝑟+1 , 𝜒(𝐻) = 𝑟 + 1, and so Theorem 1.5.1 agrees with Turán’s
theorem.
Example 1.5.4. When 𝐻 is the Petersen graph, below, which has chromatic number 3,
Theorem 1.5.1 tells us that ex(𝑛, 𝐻) = (1/4 + 𝑜(1))𝑛². The Turán density of the Petersen
graph is the same as that of a triangle, which may be somewhat surprising since the Petersen
graph seems more complicated than the triangle.
[Figure: the Petersen graph with a proper 3-coloring of its vertices.]
In other words, using the notation 𝐾𝑟 [𝑠] for the 𝒔-blow-up of 𝐾𝑟 , obtained by replacing each vertex of 𝐾𝑟 by 𝑠 duplicates of itself (so that 𝐾𝑟 [𝑠] = 𝐻 in the above theorem statement), the Erdős–Stone theorem says that
𝜋(𝐾𝑟 [𝑠]) = 𝜋(𝐾𝑟 ) = 1 − \frac{1}{𝑟 − 1}.
In Section 1.3, we saw supersaturation (Theorem 1.3.4): when the edge density is significantly above the Turán density threshold 𝜋(𝐻), one finds not just a single copy of 𝐻 but actually many copies. The Erdős–Stone theorem can be viewed in this light: above edge density 𝜋(𝐻), one finds a large blow-up of 𝐻.
The proof uses the following hypergraph extension of the KST theorem, which we will
prove later in the section.
Recall the hypergraph Turán problem (Remark 1.3.3). Given an 𝑟-uniform hypergraph 𝐻
(also known as an 𝑟-graph), we write ex(𝑛, 𝐻) to be the maximum number of edges in an
𝐻-free 𝑟-graph.
The analogue of a complete bipartite graph for an 𝑟-graph is a complete 𝑟-partite 𝑟-graph 𝑲^{(𝒓)}_{𝒔1,...,𝒔𝒓}. Its vertex set consists of disjoint vertex parts 𝑉1 , . . . , 𝑉𝑟 with |𝑉𝑖 | = 𝑠𝑖 for each 𝑖. Every 𝑟-tuple in 𝑉1 × · · · × 𝑉𝑟 is an edge.
Proof of the Erdős–Stone theorem (Theorem 1.5.5). We already saw the lower bound to
ex(𝑛, 𝐻) using a Turán graph. It remains to prove an upper bound.
Let 𝐺 be an 𝐻-free graph (where 𝐻 = 𝐾𝑠,...,𝑠 is the complete 𝑟-partite graph in the
theorem). Let 𝐺^{(𝑟)} be the 𝑟-graph with the same vertex set as 𝐺 and whose edges are the 𝑟-cliques in 𝐺. Note that 𝐺^{(𝑟)} is 𝐾^{(𝑟)}_{𝑠,...,𝑠}-free, or else a copy of 𝐾^{(𝑟)}_{𝑠,...,𝑠} in 𝐺^{(𝑟)} would be supported by a copy of 𝐻 in 𝐺. Thus, by the hypergraph KST theorem (Theorem 1.5.6), 𝐺^{(𝑟)} has 𝑜(𝑛^𝑟) edges. So 𝐺 has 𝑜(𝑛^𝑟) copies of 𝐾𝑟 , and thus by the supersaturation theorem
quoted above, the edge density of 𝐺 is at most 𝜋(𝐾𝑟 ) + 𝑜(1), which equals 1 − 1/(𝑟 − 1) + 𝑜(1)
by Turán’s theorem. □
In Section 2.6, we will give another proof of the Erdős–Stone–Simonovits theorem using
the graph regularity method.
Hypergraph KST
To help keep notation simple, we first consider what happens for 3-uniform hypergraphs.
Recall that the KST theorem (Theorem 1.4.2) was proved by counting the number of
copies of 𝐾𝑠,1 in the graph in two different ways. For 3-graphs, we instead count the number
of copies of 𝐾𝑠,1,1
(3)
in two different ways, one of which uses the KST theorem for 𝐾𝑠,𝑠 -free
graphs.
Proof. Let 𝐺 be a 𝐾^{(3)}_{𝑠,𝑠,𝑠}-free 3-graph with 𝑛 vertices and 𝑚 edges. Let 𝑋 denote the number of copies of 𝐾^{(3)}_{𝑠,1,1} in 𝐺 (when 𝑠 = 1, we count each copy three times).
Upper bound on 𝑋. Given a set 𝑆 of 𝑠 vertices, consider the set 𝑇 of all unordered pairs of distinct vertices that would form a 𝐾^{(3)}_{𝑠,1,1} with 𝑆 (i.e., every triple formed by combining a pair in 𝑇 and a vertex of 𝑆 is an edge of 𝐺). Note that 𝑇 is the edge-set of a graph on the same 𝑛 vertices. If 𝑇 contains a 𝐾𝑠,𝑠 , then together with 𝑆 we would have a 𝐾^{(3)}_{𝑠,𝑠,𝑠}. Thus 𝑇 is 𝐾𝑠,𝑠 -free, and hence by Theorem 1.4.2, |𝑇 | = 𝑂𝑠 (𝑛^{2−1/𝑠}). Hence
𝑋 ≲𝑠 \binom{𝑛}{𝑠} 𝑛^{2−1/𝑠} ≲𝑠 𝑛^{𝑠+2−1/𝑠}.
Lower bound on 𝑋. We write deg(𝑢, 𝑣) for the number of edges in 𝐺 containing both 𝑢 and 𝑣. Then, summing over all unordered pairs of distinct vertices 𝑢, 𝑣 in 𝐺, we have
𝑋 = \sum_{𝑢,𝑣} \binom{deg(𝑢, 𝑣)}{𝑠}.
And hence, comparing with the upper bound on 𝑋 via convexity as in the proof of Theorem 1.4.2 (note that \sum_{𝑢,𝑣} deg(𝑢, 𝑣) = 3𝑚),
𝑚 = 𝑂𝑠 (𝑛^{3−1/𝑠²}). □
We can iterate further, using the same technique, to prove an analogous result for every uniformity, thereby giving us the statement (Theorem 1.5.6) used in our proof of the
Erdős–Stone–Simonovits theorem earlier. Feel free to skip reading the next proof if you feel
comfortable with generalizing the above proof to 𝑟-graphs.
Proof. We proceed by induction on 𝑟; the base case 𝑟 = 2 is the KST theorem (Theorem 1.4.2). Let 𝐺 be a 𝐾^{(𝑟)}_{𝑠,...,𝑠}-free 𝑟-graph with 𝑛 vertices and 𝑚 edges, and let 𝑋 denote the number of copies of 𝐾^{(𝑟)}_{𝑠,1,...,1} in 𝐺; here 𝐾^{(𝑟)}_{𝑠,...,𝑠} is the complete 𝑟-partite 𝑟-graph with 𝑠 vertices in each of the 𝑟 parts.
Upper bound on 𝑋. Given a set 𝑆 of 𝑠 vertices, consider the set 𝑇 of all unordered (𝑟 − 1)-tuples of vertices that would form a 𝐾^{(𝑟)}_{𝑠,1,...,1} with 𝑆 (with 𝑆 in one part, and the 𝑟 − 1 new vertices each in its own part). Note that 𝑇 is the edge-set of an (𝑟 − 1)-graph on the same 𝑛 vertices. If 𝑇 contains a 𝐾^{(𝑟−1)}_{𝑠,...,𝑠}, then together with 𝑆 we would have a 𝐾^{(𝑟)}_{𝑠,...,𝑠}. Thus 𝑇 is 𝐾^{(𝑟−1)}_{𝑠,...,𝑠}-free, and by the induction hypothesis, |𝑇 | = 𝑂𝑟,𝑠 (𝑛^{𝑟−1−𝑠^{−𝑟+2}}). Hence
𝑋 ≲𝑟,𝑠 \binom{𝑛}{𝑠} 𝑛^{𝑟−1−𝑠^{−𝑟+2}} ≲𝑟,𝑠 𝑛^{𝑟+𝑠−1−𝑠^{−𝑟+2}}.
Lower bound on 𝑋. Given a set 𝑈 of vertices, we write deg 𝑈 for the number of edges containing all vertices in 𝑈. Then
𝑋 = \sum_{𝑈 ∈ \binom{𝑉(𝐺)}{𝑟−1}} \binom{deg 𝑈}{𝑠}.
Let 𝑓𝑠 (𝑥) be defined as in the previous proof. Since the average of deg 𝑈 over all (𝑟 − 1)-element subsets 𝑈 is 𝑟𝑚/\binom{𝑛}{𝑟−1}, we have
𝑋 = \sum_{𝑈 ∈ \binom{𝑉(𝐺)}{𝑟−1}} 𝑓𝑠 (deg 𝑈) ≥ \binom{𝑛}{𝑟−1} 𝑓𝑠\!\left( \frac{𝑟𝑚}{\binom{𝑛}{𝑟−1}} \right).
And hence, comparing with the upper bound on 𝑋 as in the proof of Theorem 1.4.2,
𝑚 = 𝑂𝑟,𝑠 (𝑛^{𝑟−𝑠^{−𝑟+1}}). □
Exercise 1.5.10 (Forbidding a multipartite complete hypergraph with unbalanced parts). Prove that for every sequence of positive integers 𝑠1 , . . . , 𝑠𝑟 , there exists 𝐶 so that
ex(𝑛, 𝐾^{(𝑟)}_{𝑠1,...,𝑠𝑟}) ≤ 𝐶𝑛^{𝑟−1/(𝑠1 ··· 𝑠_{𝑟−1})}.
Exercise 1.5.11 (Erdős–Stone for hypergraphs). Let 𝐻 be an 𝑟-graph. Show that 𝜋(𝐻 [𝑠]) =
𝜋(𝐻), where 𝐻 [𝑠], the 𝑠-blow-up of 𝐻, is obtained by replacing every vertex of 𝐻 by 𝑠
duplicates of itself.
Odd cycles
First let us consider forbidding odd cycles. Let 𝑘 be a positive integer. Then 𝐶2𝑘+1 has
chromatic number 3, and so the Erdős–Stone–Simonovits theorem (Theorem 1.5.1) tells us
that
ex(𝑛, 𝐶2𝑘+1 ) = (1 + 𝑜(1)) \frac{𝑛²}{4}.
In fact, an even stronger statement is true. If 𝑛 is large enough (as a function of 𝑘), then the
complete bipartite graph 𝐾 ⌊𝑛/2⌋,⌈𝑛/2⌉ is always the extremal graph, just like in the triangle
case.
We will not prove this theorem. See Füredi & Gunderson (2015) for a more recent proof.
More generally, Simonovits (1974) developed a stability method for exactly determining
the Turán number of non-bipartite color-critical graphs.
Remark 1.6.4 (Tightness). We will see in Section 1.10 a matching lower bound construction
(up to constant factors) for 𝑘 = 2, 3, 5. For all other values of 𝑘, it is open whether a matching
lower bound construction exists.
Instead of proving the above theorem, we will prove a weaker result, stated below. This
weaker result has a short and neat proof, which hopefully gives some intuition as to why the
above theorem should be true.
Proof. Color every vertex with red or blue independently and uniformly at random. Then the
expected number of non-monochromatic edges is 𝑒(𝐺)/2. Hence there exists a coloring that
has at least 𝑒(𝐺)/2 non-monochromatic edges, and these edges form the desired bipartite
subgraph. □
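A minimal sketch of this random-coloring argument in Python (not from the book; it simply retries random colorings until one achieves the guaranteed bound, which happens with probability bounded away from zero):

```python
import random

def large_bipartite_subgraph(vertices, edges):
    """Return a 2-coloring whose non-monochromatic edges number >= e(G)/2."""
    while True:
        color = {v: random.randrange(2) for v in vertices}
        crossing = [e for e in edges if color[e[0]] != color[e[1]]]
        if 2 * len(crossing) >= len(edges):  # expected value is exactly e(G)/2
            return color, crossing

# Example: the 5-cycle has 5 edges; we obtain a bipartite subgraph with >= 3 edges.
vertices = range(5)
edges = [(i, (i + 1) % 5) for i in range(5)]
color, crossing = large_bipartite_subgraph(vertices, edges)
print(len(crossing))
```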
Lemma 1.6.7 (Large average degree implies subgraph with large minimum degree)
Let 𝑡 ∈ R. Every graph with average degree 2𝑡 has a subgraph with minimum degree
greater than 𝑡.
Proof. Let 𝐺 be a graph with average degree 2𝑡. Removing a vertex of degree at most 𝑡
cannot decrease the average degree, since the total degree goes down by at most 2𝑡 and so
the post-deletion graph has average degree at least (2𝑒(𝐺) − 2𝑡)/(𝑣(𝐺) − 1), which is at
least 2𝑒(𝐺)/𝑣(𝐺) since 2𝑒(𝐺)/𝑣(𝐺) ≥ 2𝑡. Let us repeatedly delete vertices of degree at
most 𝑡 in the remaining graph, until every vertex has degree more than 𝑡. This algorithm
must terminate with a non-empty graph since we cannot ever drop below 2𝑡 vertices in this
process (as such a graph would have average degree less than 2𝑡). □
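The greedy deletion in this proof is easy to implement. Here is a minimal Python sketch (not from the book; the adjacency-set representation and the small example graph are my own):

```python
def min_degree_subgraph(adj, t):
    """Repeatedly delete vertices of degree <= t; adj maps vertex -> set of neighbors.
    If the input has average degree at least 2t, the result is a non-empty subgraph
    with minimum degree greater than t, as in the lemma."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if len(adj[v]) <= t:
                for u in adj[v]:
                    adj[u].discard(v)
                del adj[v]
                changed = True
    return adj

# Example: a triangle with a pendant vertex has average degree 2 = 2t with t = 1.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(min_degree_subgraph(adj, 1)))  # [0, 1, 2]: the triangle remains
```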
Proof of Theorem 1.6.5. The idea is to use a breadth-first search. Suppose 𝐺 contains no
even cycles of length at most 2𝑘. Applying Lemma 1.6.6 followed by Lemma 1.6.7, we find
a bipartite subgraph 𝐺 ′ of 𝐺 with minimum degree > 𝑡 := 𝑒(𝐺)/(2𝑣(𝐺)). Let 𝑢 be an
arbitrary vertex of 𝐺 ′ . For each 𝑖 = 0, 1, . . . , 𝑘, let 𝐴𝑖 denote the set of vertices at distance
exactly 𝑖 from 𝑢.
[Figure: breadth-first search layers 𝐴0 = {𝑢}, 𝐴1 , 𝐴2 , . . . in 𝐺′.]
gives a significant improvement when the maximum degree of 𝐻 is small. The proof introduces an important probabilistic technique known as dependent random choice.
Theorem 1.7.1 (Bounded degree bipartite graph: Turán number upper bound)
Let 𝐻 be a bipartite graph with vertex bipartition 𝐴 ∪ 𝐵 such that every vertex in 𝐴 has
degree at most 𝑟. Then there exists a constant 𝐶 = 𝐶𝐻 such that for all 𝑛,
ex(𝑛, 𝐻) ≤ 𝐶𝑛2−1/𝑟 .
Remark 1.7.2 (History). The result was first proved by Füredi (1991). The proof given here
is due to Alon, Krivelevich, & Sudakov (2003a). For more applications of the dependent
random choice technique see the survey by Fox & Sudakov (2011).
Remark 1.7.3 (Tightness). The exponent 2 − 1/𝑟 is best possible as a function of 𝑟. Indeed,
we will see in the following section that for every 𝑟 there exists some 𝑠 so that ex(𝑛, 𝐾𝑟 ,𝑠 ) ≥
𝑐𝑛2−1/𝑟 for some 𝑐 = 𝑐(𝑟, 𝑠) > 0.
On the other hand, for specific graphs 𝐺, Theorem 1.7.1 may not be tight. For example,
ex(𝑛, 𝐶6 ) = Θ(𝑛4/3 ), whereas Theorem 1.7.1 only tells us that ex(𝑛, 𝐶6 ) = 𝑂 (𝑛3/2 ).
Given a graph 𝐺 with many edges, we wish to find a large subset 𝑈 of vertices such
that every 𝑟-vertex subset of 𝑈 has many common neighbors in 𝐺 (even the case 𝑟 = 2 is
interesting). Once such a 𝑈 is found, we can then embed the 𝐵-vertices of 𝐻 into 𝑈. It will
then be easy to embed the vertices of 𝐴. The tricky part is to find such a 𝑈.
Remark 1.7.4 (Intuition). We want to host a party so that each pair of party-goers has many
common friends (here 𝐺 is the friendship graph). Whom should we invite? Inviting people
uniformly at random is not a good idea (why?). Perhaps we can pick some random individual
(Alice) to host a party inviting all her friends. Alice’s friends are expected to share some
common friends—at least they all know Alice.
We can take a step further, and pick a few people at random (Alice, Bob, Carol, David)
and have them host a party and invite all their common friends. This will likely be an even
more sociable crowd. At least all the party goers will know all the hosts, and likely even
more. As long as the social network is not too sparse, there should be lots of invitees.
Some invitees (e.g., Zack) might feel a bit out of place at the party—maybe they don’t have
many common friends with other party-goers (they all know the hosts but maybe Zack doesn’t
know many others). To prevent such awkwardness, the hosts will cancel Zack’s invitation.
There shouldn’t be too many people like Zack. The party must go on.
Here is the technical statement that we will prove. While there are many parameters, the
specific details are less important compared to the proof technique. This is quite a tricky
proof.
Remark 1.7.6 (Parameters). In the theorem statement, 𝑡 is an auxiliary parameter that does not appear in the conclusion. While one can optimize for 𝑡, it is instructive and convenient to leave it as is. The theorem is generally applied to graphs with at least 𝑛^{2−𝑐} edges, for some small 𝑐 > 0, and we can play with the parameters to get |𝑈| and 𝑚 both as large as desired.
Proof. We say that an 𝑟-element subset of 𝑉 (𝐺) is “bad” if it has at most 𝑚 common
neighbors in 𝐺.
Let 𝑢 1 , . . . , 𝑢 𝑡 be vertices chosen uniformly and independently at random from 𝑉 (𝐺)
(these vertices are chosen “with replacement”, i.e., they can repeat). Let 𝐴 be their common
neighborhood. (Keep in mind that 𝑢 1 , . . . , 𝑢 𝑡 , 𝐴 are random. It may be a bit confusing in this
proof what is random and what is not.)
Each fixed vertex 𝑣 ∈ 𝑉 (𝐺) has probability (deg(𝑣)/𝑛)^𝑡 of being adjacent to all of 𝑢1 , . . . , 𝑢𝑡 , and so by linearity of expectations and convexity,
\[
\mathbb{E}|A| = \sum_{v \in V(G)} \mathbb{P}(v \in A) = \sum_{v \in V(G)} \left(\frac{\deg(v)}{n}\right)^{t} \ge n \left(\frac{1}{n} \sum_{v \in V(G)} \frac{\deg(v)}{n}\right)^{t} \ge n\alpha^{t}.
\]
Let 𝑈 be obtained from 𝐴 by deleting an element from each bad 𝑟-vertex subset. So 𝑈 has
no bad 𝑟-vertex subsets. Also
\[
\mathbb{E}|U| \ge \mathbb{E}|A| - \mathbb{E}[\text{the number of bad } r\text{-vertex subsets of } A] \ge n\alpha^{t} - \binom{n}{r}\left(\frac{m}{n}\right)^{t}.
\]
Thus there exists some 𝑈 with at least this size, with the property that all its 𝑟-vertex subsets
have more than 𝑚 common neighbors. □
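The proof translates directly into a small randomized procedure. The sketch below (illustrative; the graph and the parameters 𝑡, 𝑟, 𝑚 in the usage example are made up) carries out one round of dependent random choice: pick 𝑡 random hosts, take their common neighborhood 𝐴, and delete one vertex from every bad 𝑟-subset of 𝐴.

```python
import random
from itertools import combinations

def dependent_random_choice(adj, t, r, m, seed=0):
    """One round of dependent random choice on a graph {vertex: set of neighbors}.

    Choose t vertices uniformly at random (with repetition), let A be their
    common neighborhood, and remove one vertex from every r-subset of A having
    at most m common neighbors.  Every r-subset of the returned set U therefore
    has more than m common neighbors.
    """
    rng = random.Random(seed)
    V = list(adj)
    hosts = [rng.choice(V) for _ in range(t)]
    A = set(V)
    for u in hosts:
        A &= adj[u]
    U = set(A)
    for S in combinations(sorted(A), r):
        if not set(S) <= U:
            continue  # already destroyed by an earlier deletion
        common = set(V)
        for v in S:
            common &= adj[v]
        if len(common) <= m:
            U.discard(S[0])  # delete one vertex of this bad r-subset
    return U

# Toy usage on a dense random graph (hypothetical parameters t = 2, r = 2, m = 1).
rng, n = random.Random(1), 30
adj = {v: set() for v in range(n)}
for u, v in combinations(range(n), 2):
    if rng.random() < 0.5:
        adj[u].add(v)
        adj[v].add(u)
U = dependent_random_choice(adj, t=2, r=2, m=1)
print("|U| =", len(U), "and every pair in U has at least 2 common neighbors")
```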
Now we are ready to show Theorem 1.7.1, which recall says that for a bipartite graph
𝐻 with vertex bipartition 𝐴 ∪ 𝐵 such that every vertex in 𝐴 has degree at most 𝑟, one has
ex(𝑛, 𝐻) = 𝑂 𝐻 (𝑛2−1/𝑟 ).
Proof of Theorem 1.7.1. Let 𝐺 be a graph with 𝑛 vertices and at least 𝐶𝑛^{2−1/𝑟} edges. By choosing 𝐶 large enough (depending only on |𝐴| + |𝐵|), we have
\[
n\left(2Cn^{-1/r}\right)^{r} - \binom{n}{r}\left(\frac{|A| + |B|}{n}\right)^{r} \ge |B|.
\]
We want to show that 𝐺 contains 𝐻 as a subgraph. By dependent random choice (Theo-
rem 1.7.5 applied with 𝑡 = 𝑟), we can embed the 𝐵-vertices of 𝐻 into 𝐺 so that every 𝑟-vertex
subset of 𝐵 (now viewed as a subset of 𝑉 (𝐺)) has > | 𝐴| + |𝐵| common neighbors.
Next, we embed the vertices of 𝐴 one at a time. Suppose we need to embed 𝑣 ∈ 𝐴 (some
previous vertices of 𝐴 may have already been embedded at this point). Note that 𝑣 has at
≤ 𝑟 neighbors in 𝐵, and these ≤ 𝑟 vertices in 𝐵 have > | 𝐴| + |𝐵| common neighbors in 𝐺.
While some of these common neighbors may have already been used up in earlier steps to
embed vertices of 𝐻, there are enough of them that they cannot all be used up, and thus we
can embed 𝑣 to some remaining common neighbor. This process ends with an embedding of
𝐻 into 𝐺. □
Exercise 1.7.7. Let 𝐻 be a bipartite graph with vertex bipartition 𝐴 ∪ 𝐵, such that 𝑟
vertices in 𝐴 are complete to 𝐵, and all remaining vertices in 𝐴 have degree at most 𝑟.
Prove that there is some constant 𝐶 = 𝐶𝐻 such that ex(𝑛, 𝐻) ≤ 𝐶𝑛2−1/𝑟 for all 𝑛.
Exercise 1.7.8. Let 𝜀 > 0. Show that, for sufficiently large 𝑛, every 𝐾4 -free graph with 𝑛
vertices and at least 𝜀𝑛2 edges contains an independent set of size at least 𝑛1− 𝜀 .
(b) We say that a graph 𝐻 is 𝒓-degenerate if its vertices can be ordered so that every
vertex has at most 𝑟 neighbors that appear before it in the ordering. Show that
for every 𝑟-degenerate bipartite graph 𝐻 there is some constant 𝐶 > 0 so that
ex(𝑛, 𝐻) ≤ 𝐶𝑛2−𝑐/𝑟 , where 𝑐 is the same absolute constant from part (a) (𝑐 should
not depend on 𝐻 or 𝑟).
Randomized constructions
The idea is to take a random graph at a density that gives a small number of copies of 𝐻,
and then destroy these copies of 𝐻 by removing some edges from the random graph. The
resulting graph is then 𝐻-free. This method is easy to implement and applies quite generally
to all 𝐻. For example, it will be shown that
\[
\operatorname{ex}(n, H) = \Omega_H\Big(n^{2 - \frac{v(H)-2}{e(H)-1}}\Big).
\]
However, bounds arising from this method are usually not tight.
Algebraic constructions
The idea is to use algebraic geometry over a finite field to construct a graph. Its vertices
correspond to geometric objects such as points or lines. Its edges correspond to incidences
or other algebraic relations. These constructions sometimes give tight bounds. They work
for a small number of graphs 𝐻, and usually require a different ad hoc idea for each 𝐻. They
work rarely, but when they do, they can appear quite mysterious, or even magical. Many
important tight lower bounds on bipartite extremal numbers arise this way. In particular it
will be shown that
ex(𝑛, 𝐾𝑠,𝑡 ) = Ω𝑠,𝑡 𝑛2−1/𝑠 whenever 𝑡 ≥ (𝑠 − 1)! + 1,
thereby matching the KST theorem (Theorem 1.4.2) for such 𝑠, 𝑡. Also, it will be shown that
ex(𝑛, 𝐶2𝑘 ) = Ω𝑘 𝑛1+1/𝑘 whenever 𝑘 ∈ {2, 3, 5},
thereby matching Theorem 1.6.3 for these values of 𝑘.
Proof. Let 𝐺 be an instance of the Erdős–Rényi random graph 𝐺(𝑛, 𝑝), with
\[
p = \frac{1}{4}\, n^{-\frac{v(H)-2}{e(H)-1}}
\]
(chosen with hindsight). We have $\mathbb{E}\, e(G) = p\binom{n}{2}$. Let 𝑋 denote the number of copies of 𝐻 in 𝐺. Then, our choice of 𝑝 ensures that
\[
\mathbb{E} X \le p^{e(H)} n^{v(H)} \le \frac{p}{2}\binom{n}{2} = \frac{1}{2}\,\mathbb{E}\, e(G).
\]
Thus
\[
\mathbb{E}[e(G) - X] \ge \frac{p}{2}\binom{n}{2} \gtrsim n^{2 - \frac{v(H)-2}{e(H)-1}}.
\]
Take a graph 𝐺 such that 𝑒(𝐺) − 𝑋 is at least its expectation. Remove one edge from each copy of 𝐻 in 𝐺, and we get an 𝐻-free graph with at least $e(G) - X \gtrsim n^{2 - \frac{v(H)-2}{e(H)-1}}$ edges. □
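To make the alteration argument concrete, here is a small Monte Carlo sketch (illustrative only; it brute-forces copies of 𝐻, so it is restricted to tiny 𝑛, and it takes 𝐻 = 𝐾3 for simplicity): sample 𝐺(𝑛, 𝑝) with the 𝑝 from the proof and delete one edge from every copy of 𝐻.

```python
import random
from itertools import combinations

def triangle_free_by_alteration(n, p, seed=0):
    """Sample G(n, p), then delete one edge from every triangle (H = K_3).

    The surviving graph is triangle-free, and at most one edge is removed per
    copy of H, so it keeps at least e(G) - X edges as in the proof above.
    """
    rng = random.Random(seed)
    edges = {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}
    for a, b, c in combinations(range(n), 3):
        tri = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in edges for e in tri):
            edges.discard(tri[0])  # destroy this copy of H
    return edges

# For H = K_3 we have v(H) = 3 and e(H) = 3, so p = (1/4) n^{-(3-2)/(3-1)} = n^{-1/2}/4,
# and the guaranteed number of surviving edges is on the order of n^{3/2}.
n = 60
p = 0.25 * n ** -0.5
print(len(triangle_free_by_alteration(n, p)), "edges in a triangle-free graph on", n, "vertices")
```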
For some graphs 𝐻, we can bootstrap Theorem 1.9.1 to give an even better lower bound.
For example, if 𝐻 is a certain graph containing 𝐾4,4 (pictured in the original; figure omitted here),
then 𝑣(𝐻) = 10 and 𝑒(𝐻) = 20, so applying Theorem 1.9.1 directly gives
ex(𝑛, 𝐻) ≳ 𝑛2−8/19 .
On the other hand, any 𝐾4,4 -free graph is automatically 𝐻-free. Applying Theorem 1.9.1 to
𝐾4,4 (8-vertex 16-edge) actually gives a better lower bound (2 − 6/15 > 2 − 8/19):
ex(𝑛, 𝐻) ≥ ex(𝑛, 𝐾4,4 ) ≳ 𝑛2−6/15 .
In general, given 𝐻, we should apply Theorem 1.9.1 to the subgraph of 𝐻 with the
maximum (𝑒(𝐻) − 1)/(𝑣(𝐻) − 2) ratio. This gives the following corollary, which sometimes
gives a better lower bound than directly applying Theorem 1.9.1.
When 𝑡 is large compared to 𝑠, the exponents in the two bounds above are close to each other
(but never equal). When 𝑡 = 𝑠, the above bounds specialize to
\[
n^{2 - \frac{2}{s+1}} \lesssim \operatorname{ex}(n, K_{s,s}) \lesssim n^{2 - \frac{1}{s}}.
\]
In particular, for 𝑠 = 2,
\[
n^{4/3} \lesssim \operatorname{ex}(n, K_{2,2}) \lesssim n^{3/2}.
\]
It turns out that the upper bound is tight. We will show this in the next section using an
algebraic construction.
Exercise 1.9.5. Show that if 𝐻 is a bipartite graph containing a cycle of length 2𝑘, then
ex(𝑛, 𝐻) ≳_𝐻 𝑛^{1+1/(2𝑘−1)}.
Exercise 1.9.6. Find a graph 𝐻 with 𝜒(𝐻) = 3 and ex(𝑛, 𝐻) > 𝑛²/4 + 𝑛^{1.99} for all sufficiently large 𝑛.
𝐾2,2 -free
We begin by constructing 𝐾2,2 -free graphs with the number of edges matching the KST theo-
rem. The construction is due to Erdős, Rényi, & Sós (1966) and Brown (1966) independently.
Before giving the proof of Theorem 1.10.1, let us first sketch the geometric intuition.
Given a set of points P and a set of lines L, the point-line incidence graph is the bipartite
graph with two vertex parts P and L, where 𝑝 ∈ P and ℓ ∈ L are adjacent if 𝑝 ∈ ℓ.
A point-line incidence graph is 𝐶4 -free. Indeed, a 𝐶4 would correspond to two lines both
passing through two distinct points, which is impossible.
We want to construct a set of points and a set of lines so that there are many incidences. To do this, we take all points and all lines in the finite field plane F_𝑝². There are 𝑝² points and 𝑝² + 𝑝 lines. Since every line contains 𝑝 points, the graph has around 𝑝³ edges on around 2𝑝² vertices, which is on the order of 𝑛^{3/2} edges for 𝑛 ≍ 𝑝² vertices.
Remark 1.10.4 (Large gaps between primes). The above result already follows from the
prime number theorem, which says that the number of primes up to 𝑁 is (1 + 𝑜(1))𝑁/log 𝑁.
The best quantitative result, due to Baker, Harman, & Pintz (2001), says that there exists a
prime in [𝑁 − 𝑁^{0.525}, 𝑁] for all sufficiently large 𝑁. Cramér’s conjecture, which is wide open and based on a random model of the primes, speculates that the 𝑜(𝑁) in Theorem 1.10.3 may be replaced by 𝑂((log 𝑁)²). An easier claim is Bertrand’s postulate, which says that there is a
prime between 𝑁 and 2𝑁 for every 𝑁, and this already suffices for proving ex(𝑛, 𝐾2,2 ) ≳ 𝑛3/2 .
To get a better constant in the above construction, we optimize somewhat by using the
same vertices to represent both points and lines. This pairing of points and lines is known as
polarity in projective geometry, and this construction is known as the polarity graph (usually this refers to the projective plane version of the construction).
Proof of Theorem 1.10.1. Let 𝑝 denote the largest prime such that 𝑝² − 1 ≤ 𝑛. Then 𝑝 = (1 − 𝑜(1))√𝑛 by Theorem 1.10.3. Let 𝐺 be a graph with vertex set 𝑉 (𝐺) = F_𝑝² \ {(0, 0)} and
an edge between (𝑥, 𝑦) and (𝑎, 𝑏) if and only if 𝑎𝑥 + 𝑏𝑦 = 1 in F 𝑝 .
For any two distinct vertices (𝑎, 𝑏) and (𝑎 ′ , 𝑏 ′ ) in 𝑉 (𝐺), they have at most one common
neighbor since there is at most one solution to the system 𝑎𝑥 + 𝑏𝑦 = 1 and 𝑎 ′ 𝑥 + 𝑏 ′ 𝑦 = 1.
Therefore, 𝐺 is 𝐾2,2 -free. (This is where we use the fact that two lines intersect in at most
one point.)
For every (𝑎, 𝑏) ∈ 𝑉 (𝐺), there are exactly 𝑝 vertices (𝑥, 𝑦) satisfying 𝑎𝑥 + 𝑏𝑦 = 1.
However, one of those vertices could be (𝑎, 𝑏) itself. So every vertex in 𝐺 has degree 𝑝 or
𝑝 − 1. Hence 𝐺 has at least ( 𝑝 2 − 1) ( 𝑝 − 1)/2 = (1/2 − 𝑜(1))𝑛3/2 edges. □
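The construction is concrete enough to generate and test directly for a small prime. The sketch below (illustrative, not the book's code) builds the graph over F_𝑝 and verifies 𝐾2,2-freeness by checking that every pair of vertices has at most one common neighbor.

```python
from itertools import combinations

def erdos_renyi_sos_graph(p):
    """Vertices: F_p^2 minus the origin; (x, y) ~ (a, b) iff a*x + b*y = 1 (mod p)."""
    V = [(x, y) for x in range(p) for y in range(p) if (x, y) != (0, 0)]
    adj = {v: set() for v in V}
    for (x, y), (a, b) in combinations(V, 2):
        if (a * x + b * y) % p == 1:
            adj[(x, y)].add((a, b))
            adj[(a, b)].add((x, y))
    return adj

p = 7
adj = erdos_renyi_sos_graph(p)
# Any two distinct vertices (a, b), (a', b') have at most one common neighbor (x, y),
# since the linear system a*x + b*y = 1, a'*x + b'*y = 1 has at most one solution.
assert all(len(adj[u] & adj[v]) <= 1 for u, v in combinations(adj, 2))
edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(p * p - 1, "vertices,", edges, "edges; lower bound (p^2-1)(p-1)/2 =",
      (p * p - 1) * (p - 1) // 2)
```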
𝐾3,3 -free
Next, we construct 𝐾3,3 -free graphs with the number of edges matching the KST theorem.
This construction is due to Brown (1966).
Consider the incidence between points in 3-dimensional space and unit spheres. This graph is 𝐾3,3-free since no three unit spheres can share three distinct common points. Again,
one needs to do this over a finite field to attain the desired bounds, but it is easier to visualize
the setup in Euclidean space, where it is clearly true.
Proof sketch. Let 𝑝 be the largest prime less than 𝑛^{1/3}. Fix a nonzero element 𝑑 ∈ F_𝑝, which we take to be a quadratic residue if 𝑝 ≡ 3 (mod 4) and a quadratic non-residue if 𝑝 ≢ 3 (mod 4). Construct a graph 𝐺 with vertex set 𝑉 (𝐺) = F_𝑝³, and an edge between (𝑥, 𝑦, 𝑧) and
(𝑎, 𝑏, 𝑐) ∈ 𝑉 (𝐺) if and only if
(𝑎 − 𝑥) 2 + (𝑏 − 𝑦) 2 + (𝑐 − 𝑧) 2 = 𝑑.
It turns out that each vertex has (1 − 𝑜(1)) 𝑝 2 neighbors (the intuition here is that, for a
fixed (𝑎, 𝑏, 𝑐), if we choose 𝑥, 𝑦, 𝑧 ∈ F 𝑝 independently and uniformly at random, then the
resulting sum (𝑎 − 𝑥)² + (𝑏 − 𝑦)² + (𝑐 − 𝑧)² is roughly uniformly distributed, and hence equals 𝑑 with probability close to 1/𝑝). It remains to show that the graph is 𝐾3,3-free.
To see this, think about how one might prove this claim in R3 via algebraic manipulations.
We compute the radical planes between pairs of spheres as well as the intersections of these
radical planes (i.e., the radical axis). The claim boils down to the fact that no sphere has three
collinear points, which is true due to the quadratic (non)residue hypothesis on 𝑑. The details
are omitted.
Thus 𝐺 is a 𝐾3,3 -free graph on 𝑝 3 ≤ 𝑛 vertices and with at least (1/2 − 𝑜(1)) 𝑝 5 =
(1/2 − 𝑜(1))𝑛5/3 edges. □
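For experimentation, the sphere construction can also be generated for a very small prime; the sketch below (purely exploratory—the prime is far too small for the asymptotics above, and the code merely reports whether each choice of 𝑑 happens to give a 𝐾3,3-free graph) tests 𝐾3,3-freeness by brute force.

```python
from itertools import combinations

def sphere_graph(p, d):
    """Vertices F_p^3; (x,y,z) ~ (a,b,c) iff (a-x)^2 + (b-y)^2 + (c-z)^2 = d (mod p)."""
    V = [(x, y, z) for x in range(p) for y in range(p) for z in range(p)]
    adj = {v: set() for v in V}
    for u, v in combinations(V, 2):
        if sum((s - t) ** 2 for s, t in zip(u, v)) % p == d:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def has_k33(adj):
    """Is there a triple of vertices with >= 3 common neighbors outside the
    triple (equivalently, a K_{3,3} subgraph)?"""
    for S in combinations(adj, 3):
        common = (adj[S[0]] & adj[S[1]] & adj[S[2]]) - set(S)
        if len(common) >= 3:
            return True
    return False

p = 3  # tiny prime, just to keep the brute-force check instantaneous
for d in range(1, p):
    adj = sphere_graph(p, d)
    print("d =", d, " degree of the origin =", len(adj[(0, 0, 0)]),
          " K_{3,3}-free:", not has_k33(adj))
```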
It is unknown if the above ideas can be extended to construct 𝐾4,4 -free graphs with Ω(𝑛7/4 )
edges. It is a major open problem to determine the asymptotics of ex(𝑛, 𝐾4,4 ).
𝐾𝑠,𝑡 -free
Now we present a substantial generalization of the above constructions, due to Kollár, Rónyai,
& Szabó (1996) and Alon, Rónyai, & Szabó (1999). It gives a matching lower bound (up to
a constant factor) to the KST theorem for 𝐾𝑠,𝑡 whenever 𝑡 is sufficiently large compared to 𝑠.
Proposition 1.10.10
NormGraph 𝑝,𝑠 is 𝐾 𝑠,𝑠!+1 -free for all 𝑠 ≥ 2.
We wish to upper bound the number of common neighbors of a set of 𝑠 vertices. This amounts to showing that a certain system of algebraic equations cannot have too many
solutions. We quote without proof the following key algebraic result from Kollár, Rónyai, &
Szabó (1996), which can be proved using algebraic geometry.
Theorem 1.10.11
Let F be any field and 𝑎 𝑖 𝑗 , 𝑏 𝑖 ∈ F such that 𝑎 𝑖 𝑗 ≠ 𝑎 𝑖′ 𝑗 for all 𝑖 ≠ 𝑖 ′ . Then the system of
equations
\begin{align*}
(x_1 - a_{11})(x_2 - a_{12}) \cdots (x_s - a_{1s}) &= b_1 \\
(x_1 - a_{21})(x_2 - a_{22}) \cdots (x_s - a_{2s}) &= b_2 \\
&\ \,\vdots \\
(x_1 - a_{s1})(x_2 - a_{s2}) \cdots (x_s - a_{ss}) &= b_s
\end{align*}
has at most 𝑠! solutions (𝑥 1 , . . . , 𝑥 𝑠 ) ∈ F𝑠 .
Remark 1.10.12 (Special case 𝒃 = 0). Consider the special case when all the 𝑏𝑖 are 0. In
this case, since the 𝑎 𝑖 𝑗 are distinct for each fixed 𝑗, every solution to the system corresponds
to a permutation 𝜋 : [𝑠] → [𝑠], setting 𝑥𝑖 = 𝑎 𝑖 𝜋 (𝑖) . So there are exactly 𝑠! solutions in
this special case. The difficult part of the theorem says that the number of solutions cannot
increase if we move 𝑏 away from the origin.
Proof of Proposition 1.10.10. Consider distinct 𝑦 1 , 𝑦 2 , . . . , 𝑦 𝑠 ∈ F 𝑝 𝑠 . We wish to bound the
number of common neighbors 𝑥. Recall that in a field with characteristic 𝑝, we have the
identity (𝑥 + 𝑦) 𝑝 = 𝑥 𝑝 + 𝑦 𝑝 for all 𝑥, 𝑦. So
\[
1 = N(x + y_i) = (x + y_i)(x + y_i)^{p} \cdots (x + y_i)^{p^{s-1}} = (x + y_i)\,(x^{p} + y_i^{p}) \cdots (x^{p^{s-1}} + y_i^{p^{s-1}})
\]
for all 1 ≤ 𝑖 ≤ 𝑠. By Theorem 1.10.11, these 𝑠 equations (as 𝑖 ranges over [𝑠]) have at most
𝑠! solutions in 𝑥. Note the hypothesis of Theorem 1.10.11 is satisfied since 𝑦 𝑖𝑝 = 𝑦 𝑝𝑗 if and
only if 𝑦 𝑖 = 𝑦 𝑗 in F 𝑝 𝑠 . □
Now we modify the norm graph construction to forbid 𝐾𝑠, (𝑠−1)!+1 , thereby yielding Theo-
rem 1.10.7.
Construction 1.10.13 (Projective norm graph)
Let ProjNormGraph 𝒑,𝒔 be the graph with vertex set F 𝑝 𝑠−1 × F×𝑝 , where two vertices
(𝑋, 𝑥), (𝑌 , 𝑦) ∈ F 𝑝 𝑠−1 × F×𝑝 are adjacent if and only if
𝑁 (𝑋 + 𝑌 ) = 𝑥𝑦.
In ProjNormGraph_{𝑝,𝑠}, every vertex (𝑋, 𝑥) has degree 𝑝^{𝑠−1} − 1 since its neighbors are (𝑌, 𝑁(𝑋 + 𝑌)/𝑥) for all 𝑌 ≠ −𝑋. There are (𝑝^{𝑠−1} − 1)𝑝^{𝑠−1}(𝑝 − 1)/2 edges. As earlier, it
remains to show that this graph is 𝐾𝑠, (𝑠−1)!+1 -free. Once we know this, by taking 𝑝 to be the
largest prime satisfying 𝑝 𝑠−1 ( 𝑝 − 1) ≤ 𝑛, we obtain the desired lower bound
\[
\operatorname{ex}\big(n, K_{s,(s-1)!+1}\big) \ge \frac{1}{2}\,(p^{s-1} - 1)\, p^{s-1}(p - 1) \ge \left(\frac{1}{2} - o(1)\right) n^{2 - 1/s}.
\]
Proposition 1.10.14
ProjNormGraph 𝑝,𝑠 is 𝐾 𝑠, (𝑠−1)!+1 -free.
Proof. Fix distinct (𝑌1 , 𝑦 1 ), . . . , (𝑌𝑠 , 𝑦 𝑠 ) ∈ F 𝑝 𝑠−1 × F×𝑝 . We wish to show that there are at most
(𝑠 − 1)! solutions (𝑋, 𝑥) ∈ F 𝑝 𝑠−1 × F×𝑝 to the system of equations
𝑁 (𝑋 + 𝑌𝑖 ) = 𝑥𝑦 𝑖 , 𝑖 = 1, . . . , 𝑠.
Assume this system has at least one solution. Then if 𝑌𝑖 = 𝑌 𝑗 with 𝑖 ≠ 𝑗 we must have
that 𝑦 𝑖 = 𝑦 𝑗 . Therefore all the 𝑌𝑖 are distinct. For each 𝑖 < 𝑠, dividing 𝑁 (𝑋 + 𝑌𝑖 ) = 𝑥𝑦 𝑖 by
𝑁 (𝑋 + 𝑌𝑠 ) = 𝑥𝑦 𝑠 gives
\[
N\!\left(\frac{X + Y_i}{X + Y_s}\right) = \frac{y_i}{y_s}, \qquad i = 1, \dots, s - 1.
\]
Dividing both sides by 𝑁 (𝑌𝑖 − 𝑌𝑠 ) gives
\[
N\!\left(\frac{1}{X + Y_s} + \frac{1}{Y_i - Y_s}\right) = \frac{y_i}{N(Y_i - Y_s)\, y_s}, \qquad i = 1, \dots, s - 1.
\]
Now apply Theorem 1.10.11 (same as in the proof of Proposition 1.10.10). We deduce
that there are at most (𝑠 − 1)! choices for 𝑋, and each such 𝑋 automatically determines
𝑥 = 𝑁 (𝑋 + 𝑌1 )/𝑦 1 . Thus there are at most (𝑠 − 1)! solutions (𝑋, 𝑥). □
𝐶4 , 𝐶6 , 𝐶10 -free
Finally, let us turn to constructions of 𝐶2𝑘-free graphs. We had mentioned in Section 1.6 that ex(𝑛, 𝐶2𝑘) = 𝑂_𝑘(𝑛^{1+1/𝑘}). We saw a matching lower bound construction for 4-cycles. Now we give matching constructions for 6-cycles and 10-cycles. (It remains an open problem for
other cycle lengths.)
Theorem 1.10.15 (Tight lower bound for avoiding 𝐶2𝑘 for 𝑘 ∈ {2, 3, 5})
Let 𝑘 ∈ {2, 3, 5}. Then there is a constant 𝑐 > 0 such that for every 𝑛,
ex(𝑛, 𝐶2𝑘 ) ≥ 𝑐𝑛1+1/𝑘 .
Remark 1.10.16 (History). The existence of such 𝐶2𝑘 -free graphs for 𝑘 ∈ {3, 5} is due to
Benson (1966) and Singleton (1966). The construction given here is due to Wenger (1991),
with a simplified description due to Conlon (2021).
The following construction generalizes the point-line incidence graph construction earlier
for the 𝐶4 -free graph in Theorem 1.10.1. Here we consider a special set of lines in F𝑞𝑘 , whereas
previously for 𝐶4 we took all lines in F2𝑞 .
We have |L| = 𝑞 𝑘 , since to specify a line in L we can provide a point with first coordinate
equal to zero, along with a choice of 𝑡 ∈ F𝑞 giving the direction of the line. So the graph 𝐺 𝑞,𝑘
has 𝑛 = 2𝑞 𝑘 vertices. Since each line contains exactly 𝑞 points, there are exactly 𝑞 𝑘+1 ≍ 𝑛1+1/𝑘
edges in the graph. It remains to show that this graph is 𝐶2𝑘 -free whenever 𝑘 ∈ {2, 3, 5}.
Then Theorem 1.10.15 would follow after the usual trick of taking 𝑞 to be the largest prime
with 2𝑞 𝑘 < 𝑛.
[Figure: a hypothetical 2𝑘-cycle alternating between points 𝑝1 , . . . , 𝑝𝑘 and lines ℓ1 , . . . , ℓ𝑘 .]
Then
\[
p_{i+1} - p_i = a_i\,(1, t_i, \dots, t_i^{k-1})
\]
for some 𝑎𝑖 ∈ F_𝑞 \ {0}. Thus (recall that 𝑝_{𝑘+1} = 𝑝_1)
\[
\sum_{i=1}^{k} a_i\,(1, t_i, \dots, t_i^{k-1}) = \sum_{i=1}^{k} (p_{i+1} - p_i) = 0. \tag{1.3}
\]
The vectors $(1, t_i, \dots, t_i^{k-1})$, 𝑖 = 1, . . . , 𝑘, after deleting duplicates, are linearly independent. One way to see this is via the Vandermonde determinant
\[
\det\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{k-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{k-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_k & x_k^2 & \cdots & x_k^{k-1}
\end{pmatrix}
= \prod_{i < j} (x_j - x_i).
\]
For (1.3) to hold, each vector (1, 𝑡𝑖 , . . . , 𝑡 𝑖𝑘−1 ) must appear at least twice in the sum, with
their coefficients 𝑎 𝑖 adding up to zero.
Since the lines ℓ1 , . . . , ℓ𝑘 are distinct, for each 𝑖 = 1, . . . , 𝑘 (indices taken mod 𝑘), the
lines ℓ𝑖 and ℓ𝑖+1 cannot be parallel. So 𝑡 𝑖 ≠ 𝑡 𝑖+1 . When 𝑘 ∈ {2, 3, 5} it is impossible to select
𝑡 1 , . . . , 𝑡 𝑘 with no equal consecutive terms (including wrap-around) and so that each value is
repeated at least twice. Therefore the 2𝑘-cycle cannot exist. (Why does the argument fail for
𝐶8 -freeness?) □
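The graph 𝐺_{𝑞,𝑘} is also easy to generate explicitly. The sketch below (illustrative; it represents each line by its set of 𝑞 points, following the parameterization described above: a base point with first coordinate zero plus the direction (1, 𝑡, . . . , 𝑡^{𝑘−1})) builds the incidence graph for 𝑘 = 2 and confirms it is 𝐶4-free.

```python
from itertools import combinations, product

def wenger_lines(q, k):
    """Lines of G_{q,k}: for each t in F_q, the direction (1, t, ..., t^(k-1)),
    translated through every base point with first coordinate 0.  Each line is
    represented as a frozenset of its q points in F_q^k."""
    lines = []
    for base in product(range(q), repeat=k - 1):
        b = (0,) + base
        for t in range(q):
            direction = tuple(pow(t, i, q) for i in range(k))
            lines.append(frozenset(
                tuple((b[i] + a * direction[i]) % q for i in range(k))
                for a in range(q)))
    return lines

q, k = 5, 2
lines = wenger_lines(q, k)
assert len(set(lines)) == q ** k and all(len(l) == q for l in lines)
# No C_4: two lines meet in at most one point, and two points lie on at most
# one line of the family, so the bipartite incidence graph has no 4-cycle.
assert all(len(l1 & l2) <= 1 for l1, l2 in combinations(lines, 2))
on_lines = {}
for l in lines:
    for pt in l:
        on_lines.setdefault(pt, set()).add(l)
assert all(len(on_lines[u] & on_lines[v]) <= 1 for u, v in combinations(on_lines, 2))
print(len(on_lines) + len(lines), "vertices and", sum(map(len, lines)), "edges")
```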
The algebraic constructions in the previous section can be abstractly described as follows.
Take a graph whose vertices are points in some algebraic set (e.g., some finite field geometry),
with two vertices 𝑥 and 𝑦 being adjacent if some algebraic relationship such as 𝑓 (𝑥, 𝑦) = 0
is satisfied. Previously, this 𝑓 was carefully chosen by hand. The new idea is to take 𝑓 to be a random polynomial.
We illustrate this technique by giving another proof of the tightness of the KST bound on
extremal numbers for 𝐾𝑠,𝑡 when 𝑡 is large compared to 𝑠.
The construction we present here has a worse dependence of 𝑡 on 𝑠 than in Theorem 1.10.7.
The main purpose of this section is to illustrate the technique of randomized algebraic
constructions. Bukh (2021) later gave a significant extension of this technique which shows that ex(𝑛, 𝐾𝑠,𝑡) = Ω_𝑠(𝑛^{2−1/𝑠}) for some 𝑡 close to 9^𝑠, improving on Theorem 1.10.7, which required 𝑡 > (𝑠 − 1)!.
Proof idea. Take a random polynomial 𝑓 (𝑋1 , . . . , 𝑋𝑠 , 𝑌1 , . . . , 𝑌𝑠 ) symmetric in the 𝑋 and 𝑌
variables (i.e., 𝑓 (𝑋, 𝑌 ) = 𝑓 (𝑌 , 𝑋)), but otherwise uniformly chosen among all polynomials
with degree up to 𝑑 with coefficients in F𝑞 . Consider a graph with vertex set F𝑞𝑠 and where
𝑋 and 𝑌 are adjacent if 𝑓 (𝑋, 𝑌 ) = 0.
Given an 𝑠-vertex set 𝑈, let 𝑍𝑈 denote the set of common neighbors of 𝑈. It is an algebraic
set: the common zeros of the polynomials 𝑓 (𝑋, 𝑦), 𝑦 ∈ 𝑈. Due to the Lang–Weil bound
from algebraic geometry, 𝑍𝑈 is either bounded in size, |𝑍𝑈 | ≤ 𝐶 (the zero dimensional case),
or it must be quite large, say, |𝑍𝑈 | > 𝑞/2 (the positive dimensional case). This is unlike an
Erdős–Rényi random graph.
One can then deduce, using Markov’s inequality, that
\[
\mathbb{P}(|Z_U| > C) = \mathbb{P}\!\left(|Z_U| > \frac{q}{2}\right) \le \frac{\mathbb{E}[|Z_U|^k]}{(q/2)^k} = \frac{O_k(1)}{(q/2)^k},
\]
which is quite small (much smaller compared to an Erdős–Rényi random graph). So typically very few sets 𝑈 have |𝑍𝑈| > 𝐶. By deleting these bad 𝑈’s from the vertex set of the graph, we obtain a 𝐾_{𝑠,𝐶+1}-free graph with around 𝑞^𝑠 vertices and on the order of 𝑞^{2𝑠−1} edges. ■
Now we begin the actual proof. Let 𝑞 be the largest prime power satisfying 𝑞 𝑠 ≤ 𝑛. Due
to prime gaps (Theorem 1.10.3), we have 𝑞 = (1 − 𝑜(1))𝑛1/𝑠 . So it suffices to construct a
𝐾𝑠,𝑡 -free graph on 𝑞 𝑠 vertices with (1/2 − 𝑜(1))𝑞 2𝑠−1 edges.
Let 𝑑 = 𝑠² + 𝑠 (the reason for this choice will come up later). Let
𝑓 ∈ F𝑞 [𝑋1 , 𝑋2 , . . . , 𝑋𝑠 , 𝑌1 , 𝑌2 , . . . , 𝑌𝑠 ] ≤𝑑
be a polynomial chosen uniformly at random among all polynomials with degree at most
𝑑 in each of 𝑋 = (𝑋1 , 𝑋2 , . . . , 𝑋𝑠 ) and 𝑌 = (𝑌1 , 𝑌2 , . . . , 𝑌𝑠 ) and furthermore satisfying 𝑓 (𝑋, 𝑌 ) = 𝑓 (𝑌 , 𝑋); that is, the coefficients $a_{i_1,\dots,i_s,j_1,\dots,j_s} \in \mathbb{F}_q$ of the monomials $X_1^{i_1}\cdots X_s^{i_s}\, Y_1^{j_1}\cdots Y_s^{j_s}$ are chosen subject to $a_{i_1,\dots,i_s,j_1,\dots,j_s} = a_{j_1,\dots,j_s,i_1,\dots,i_s}$
but otherwise independently and uniformly at random.
Let 𝐺 be the graph with vertex set F𝑞𝑠 , with distinct 𝑥, 𝑦 ∈ F𝑞𝑠 adjacent if and only if
𝑓 (𝑥, 𝑦) = 0.
Then 𝐺 is a random graph. The next two lemmas show that 𝐺 behaves in some ways like
a random graph with edges independently appearing with probability 1/𝑞. Indeed, the next
lemma shows that every pair of vertices form an edge with probability 1/𝑞.
Proof. Note that resampling the constant term of 𝑓 does not change its distribution. Thus,
𝑓 (𝑢, 𝑣) is uniformly distributed in F𝑞 for a fixed (𝑢, 𝑣). Hence 𝑓 (𝑢, 𝑣) takes each value with
probability 1/𝑞. □
More generally, we show below that the expected occurrence of small subgraphs mirrors
that of the usual random graph with independent edges. We write $\binom{U}{2}$ for the set of unordered pairs of elements of 𝑈.
Proof. We first perform multivariate Lagrange interpolation to show that ( 𝑓 (𝑢, 𝑣)) {𝑢,𝑣 } can
take all possible values. For each pair 𝑢, 𝑣 ∈ 𝑊 with 𝑢 ≠ 𝑣, we can find some polynomial
ℓ𝑢,𝑣 ∈ F[𝑋1 , . . . , 𝑋𝑠 ] of degree at most 1 such that ℓ𝑢,𝑣 (𝑢) = 1 and ℓ𝑢,𝑣 (𝑣) = 0. For each
𝑢 ∈ 𝑊, let
\[
q_u(X) = \prod_{v \in W \setminus \{u\}} \ell_{u,v}(X) \in \mathbb{F}_q[X_1, \dots, X_s],
\]
which has degree ≤ |𝑊 | − 1 ≤ 𝑑. It satisfies 𝑞 𝑢 (𝑢) = 1, and 𝑞 𝑢 (𝑣) = 0 for all 𝑣 ∈ 𝑊 \ {𝑢}.
Let
\[
p(X, Y) = \sum_{\{u,v\} \in \binom{W}{2}} c_{u,v}\,\big(q_u(X)\, q_v(Y) + q_v(X)\, q_u(Y)\big)
\]
with 𝑐 𝑢,𝑣 ∈ F𝑞 . Note that 𝑝(𝑋, 𝑌 ) = 𝑝(𝑌 , 𝑋). Also, 𝑝(𝑢, 𝑣) = 𝑐 𝑢,𝑣 for all distinct 𝑢, 𝑣 ∈ 𝑊.
Now let each 𝑐 𝑢,𝑣 ∈ F𝑞 above be chosen independently and uniformly at random. So
𝑝(𝑋, 𝑌 ) is a random polynomial. Note that 𝑓 (𝑋, 𝑌 ) and 𝑝(𝑋, 𝑌 ) are independent random
polynomials both with degree at most 𝑑 in each of 𝑋 and 𝑌 . Since 𝑓 is chosen uniformly
at random, it has the same distribution as 𝑓 + 𝑝. Since $(p(u,v))_{\{u,v\}} = (c_{u,v})_{\{u,v\}} \in \mathbb{F}_q^{\binom{|W|}{2}}$ is uniformly distributed, the same must be true for $(f(u,v))_{\{u,v\}}$ as well. □
Now fix 𝑈 ⊆ F𝑞𝑠 with |𝑈| = 𝑠. We want to show that it is rare for 𝑈 to have many common
neighbors. We will use the method of moments. Let
𝑍𝑈 = the set of common neighbors of 𝑈
= {𝑥 ∈ F𝑞𝑠 \ 𝑈 : 𝑓 (𝑥, 𝑢) = 0 for all 𝑢 ∈ 𝑈}.
Then using Lemma 1.11.3, for any 𝑘 ≤ 𝑠² + 1,
\begin{align*}
\mathbb{E}[|Z_U|^k] &= \mathbb{E}\Big[\Big(\sum_{v \in \mathbb{F}_q^s \setminus U} 1\{v \in Z_U\}\Big)^{k}\Big] \\
&= \sum_{v^{(1)}, \dots, v^{(k)} \in \mathbb{F}_q^s \setminus U} \mathbb{E}\big[1\{v^{(1)}, \dots, v^{(k)} \in Z_U\}\big] \\
&= \sum_{v^{(1)}, \dots, v^{(k)} \in \mathbb{F}_q^s \setminus U} q^{-|U| \cdot \#\{v^{(1)}, \dots, v^{(k)}\}},
\end{align*}
with the final step due to Lemma 1.11.3 applied with 𝑊 = 𝑈 ∪ {𝑣^{(1)}, . . . , 𝑣^{(𝑘)}}, which has cardinality ≤ |𝑈| + 𝑘 ≤ 𝑠 + 𝑠² + 1 = 𝑑 + 1. Note that #{𝑣^{(1)}, . . . , 𝑣^{(𝑘)}} counts distinct elements in the set. Thus, continuing the above calculation,
\[
= \sum_{r \le k} \binom{q^s - |U|}{r}\, q^{-rs}\, \#\{\text{surjections } [k] \to [r]\} = O_k(1).
\]
Applying the above with 𝑘 = 𝑠² + 1 and using Markov’s inequality, we get
\[
\mathbb{P}(|Z_U| \ge \lambda) = \mathbb{P}\big(|Z_U|^{s^2+1} \ge \lambda^{s^2+1}\big) \le \frac{\mathbb{E}|Z_U|^{s^2+1}}{\lambda^{s^2+1}} \le \frac{O_s(1)}{\lambda^{s^2+1}}. \tag{1.4}
\]
Remark 1.11.4. All the probabilistic arguments up to this point would be identical had
we used a random graph with independent edges appearing with probability 𝑝. In both
settings, the |𝑍𝑈 | above is a random variable with constant order expectation. However,
their distributions are extremely different, as we will soon see. For a random graph with
independent edges, |𝑍𝑈 | behaves like a Poisson random variable, and consequently, for any
constant 𝑡, P(|𝑍𝑈 | ≥ 𝑡) is bounded from below by a constant. Consequently, many 𝑠-element
sets of vertices are expected to have at least 𝑡 common neighbors, and so this method will not
work. However, this is not the case with the random algebraic construction. It is impossible
for |𝑍𝑈 | to take on certain ranges of values—if |𝑍𝑈 | is somewhat large, then it must be very
large.
Note that 𝑍𝑈 is defined by 𝑠 polynomial equations. The next result tells us that the number
of points on such an algebraic variety must be either bounded or at least around 𝑞.
The lemma can be deduced from the following important result from algebraic geometry
due to Lang & Weil (1954), which says that the number of points of an 𝑟-dimensional
algebraic variety in F𝑞𝑠 is roughly 𝑞 𝑟 , as long as certain irreducibility hypotheses are satisfied.
We include here the statement of the Lang–Weil bound. Here $\overline{\mathbb{F}_q}$ denotes the algebraic closure of F_𝑞.
The two cases in Lemma 1.11.5 then correspond to the zero dimensional case and the
positive dimensional case, though some care is needed to deal with what happens if the
variety is reducible in the field closure. We refer the reader to Bukh (2015) for details on how
to deduce Lemma 1.11.5 from the Lang–Weil bound.
Now, continuing our proof of Theorem 1.11.1. Recall 𝑍𝑈 = {𝑥 ∈ F𝑞𝑠 \ 𝑈 : 𝑓 (𝑥, 𝑢) =
0 for all 𝑢 ∈ 𝑈}. Apply Lemma 1.11.5 to the polynomials 𝑓 (𝑋, 𝑢), 𝑢 ∈ 𝑈. Then for large
enough 𝑞 there exists a constant 𝐶 from Lemma 1.11.5 such that either |𝑍𝑈| ≤ 𝐶 (bounded) or |𝑍𝑈| > 𝑞/2 (very large). Thus, by (1.4),
\[
\mathbb{P}(|Z_U| > C) = \mathbb{P}\!\left(|Z_U| > \frac{q}{2}\right) \le \frac{O_s(1)}{(q/2)^{s^2+1}}.
\]
So the expected number of 𝑠-element subsets 𝑈 with |𝑍𝑈| > 𝐶 is
\[
\le \binom{q^s}{s} \cdot \frac{O_s(1)}{(q/2)^{s^2+1}} = O_s(1/q).
\]
Remove from 𝐺 a vertex from every 𝑠-element 𝑈 with |𝑍𝑈 | > 𝐶. Then the resulting graph is
𝐾𝑠, ⌈𝐶 ⌉+1 -free. Since we remove at most 𝑞 𝑠 edges for each deleted vertex, the expected number
of remaining edges is at least
\[
\frac{1}{q}\binom{q^s}{2} - O_s(q^{s-1}) = \left(\frac{1}{2} - o(1)\right) q^{2s-1}.
\]
Finally, given 𝑛, we can take the largest prime 𝑞 satisfying 𝑞 𝑠 ≤ 𝑛 to finish the proof of
Theorem 1.11.1.
Further Reading
Graph theory is a huge subject. There are many important topics that are quite far from
the main theme of this book. For a standard introduction to the subject (especially on more
classical aspects), several excellent graph theory textbooks are available: Bollobás (1998),
Bondy & Murty (2008), Diestel (2017), West (1996). The three-volume Combinatorial
Optimization by Schrijver (2003) is also an excellent reference for graph theory, with a focus
on combinatorial algorithms.
The following surveys discuss in more depth various topics encountered in this chapter:
• The History of Degenerate (Bipartite) Extremal Graph problems by Füredi & Si-
monovits (2013);
• Hypergraph Turán Problems by Keevash (2011);
• Dependent Random Choice by Fox & Sudakov (2011).
Chapter Summary
• Turán number ex(𝑛, 𝐻) = the maximum number of edges in an 𝑛-vertex 𝐻-free graph.
• Turán’s theorem. Among all 𝑛-vertex 𝐾𝑟+1 -free graphs, the Turán graph 𝑇𝑛,𝑟 (a complete
𝑟-partite graph with nearly equal sized parts) uniquely maximizes the number of edges.
• Erdős–Stone–Simonovits Theorem. For any fixed graph 𝐻,
\[
\operatorname{ex}(n, H) = \left(1 - \frac{1}{\chi(H) - 1} + o(1)\right)\binom{n}{2}.
\]
• Supersaturation (from one copy to many copies): an 𝑛-vertex graph with ≥ ex(𝑛, 𝐻)+𝜀𝑛2
edges has ≥ 𝛿𝑛 𝑣 (𝐻 ) copies of 𝐻, for some constant 𝛿 > 0 only depending on 𝜀 > 0, and
provided that 𝑛 is sufficiently large.
• Kővári–Sós–Turán theorem. For fixed 𝑠 ≤ 𝑡,
ex(𝑛, 𝐾 𝑠,𝑡 ) = 𝑂 𝑠,𝑡 (𝑛2−1/𝑠 ).
– Tight for 𝐾2,2 , 𝐾3,3 , and more generally, for 𝐾 𝑠,𝑡 with 𝑡 much larger than 𝑠 (algebraic
constructions).
– Conjectured to be tight in general.
• Even cycles. For any integer 𝑘 ≥ 2, ex(𝑛, 𝐶2𝑘) = 𝑂_𝑘(𝑛^{1+1/𝑘}) (we only proved a weaker statement in this book).
Chapter Highlights
• Szemerédi’s graph regularity lemma: partitioning an arbitrary graph into a bounded num-
ber of parts with random-like edges between parts
• Graph regularity method: recipe and applications
• Graph removal lemma
• Roth’s theorem: a graph theoretic proof using the triangle removal lemma
• Strong regularity and induced graph removal lemma
• Graph property testing
• Hypergraph removal lemma and Szemerédi’s theorem
of parameters in the statements and proofs below, and rather, focus on the main ideas and
techniques.
Many students experience a steep learning curve when studying the regularity method.
The technical details can obscure the underlying intuition. Also, the style of arguments may
be quite different from the type of combinatorial proofs they encountered earlier in their
studies (e.g., the type of proofs from earlier in this book). Section 2.7 contains important
exercises on applying the graph regularity method, which are essential for understanding the
material.
We allow 𝑋 and 𝑌 to overlap in the definition above. For intuition, it is mostly fine to
picture the bipartite setting, where 𝑋 and 𝑌 are automatically disjoint.
What should it mean for a graph to be “random-like”? We will explore the concept of
pseudorandom graphs in depth in Chapter 3. Given vertex sets 𝑋 and 𝑌 , we would like
the edge density between them to not change much even if we restrict 𝑋 and 𝑌 to smaller
subsets. Intuitively, this says that the edges are somewhat evenly distributed.
We need the hypotheses | 𝐴| ≥ 𝜀 |𝑈| and |𝐵| ≥ 𝜀 |𝑊 | since the definition would be too
restrictive otherwise. For example, by taking 𝐴 = {𝑥} and 𝐵 = {𝑦}, 𝑑 ( 𝐴, 𝐵) could end up
being both 0 (if 𝑥𝑦 ∉ 𝐸) and 1 (if 𝑥𝑦 ∈ 𝐸).
Remark 2.1.3 (Different roles of 𝜀 ). The 𝜀 in | 𝐴| ≥ 𝜀 |𝑈| and |𝐵| ≥ 𝜀 |𝑊 | plays a different
role from the 𝜀 in |𝑑 ( 𝐴, 𝐵) − 𝑑 (𝑈, 𝑊)| ≤ 𝜀. However, it is usually not important to distinguish
these 𝜀’s. So we use only one 𝜀 for convenience of notation.
The “random-like” intuition is justified as random graphs indeed satisfy the above property.
(This can be proved by the Chernoff bound; more on this in the next chapter.)
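Because the definition quantifies over finitely many subsets, 𝜀-regularity of a pair can be checked by brute force on very small examples. The sketch below (illustrative only; the search is exponential in |𝑋| + |𝑌|, and the example graphs are made up) computes edge densities and looks for a witnessing pair of subsets.

```python
from itertools import chain, combinations

def density(edges, A, B):
    """d(A, B) = e(A, B) / (|A||B|), where e(A, B) counts pairs in A x B that are
    edges (A and B are taken to be disjoint here, as in the bipartite picture)."""
    if not A or not B:
        return 0.0
    return sum((a, b) in edges or (b, a) in edges for a in A for b in B) / (len(A) * len(B))

def subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

def is_eps_regular(edges, X, Y, eps):
    """Check eps-regularity of (X, Y): every A in X with |A| >= eps|X| and B in Y
    with |B| >= eps|Y| must satisfy |d(A, B) - d(X, Y)| <= eps."""
    d = density(edges, X, Y)
    for A in subsets(X):
        if len(A) < eps * len(X):
            continue
        for B in subsets(Y):
            if len(B) < eps * len(Y):
                continue
            if abs(density(edges, A, B) - d) > eps:
                return False, (A, B)  # a witnessing irregular pair of subsets
    return True, None

# A complete bipartite pair is eps-regular for every eps, while a "half graph"
# pattern between X and Y is far from regular.
X, Y = [0, 1, 2, 3], [4, 5, 6, 7]
complete = {(x, y) for x in X for y in Y}
half = {(X[i], Y[j]) for i in range(4) for j in range(4) if i <= j}
print(is_eps_regular(complete, X, Y, 0.25)[0])  # True
print(is_eps_regular(half, X, Y, 0.25)[0])      # False
```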
The following exercises can help you check your understanding of 𝜀-regularity.
Exercise 2.1.4 (Basic inheritance of regularity). Let 𝐺 be a graph and 𝑋, 𝑌 ⊆ 𝑉 (𝐺). If
(𝑋, 𝑌 ) is an 𝜀𝜂-regular pair, then (𝑋 ′ , 𝑌 ′ ) is 𝜀-regular for all 𝑋 ′ ⊆ 𝑋 with |𝑋 ′ | ≥ 𝜂 |𝑋 | and
𝑌 ′ ⊆ 𝑌 with |𝑌 ′ | ≥ 𝜂 |𝑌 |.
Exercise 2.1.5 (An alternate definition of regular pairs). Let 𝐺 be a graph and 𝑋, 𝑌 ⊆
𝑉 (𝐺). Say that (𝑋, 𝑌 ) is 𝜺-homogeneous if for all 𝐴 ⊆ 𝑋 and 𝐵 ⊆ 𝑌 , one has
|𝑒( 𝐴, 𝐵) − | 𝐴| |𝐵| 𝑑 (𝑋, 𝑌 )| ≤ 𝜀 |𝑋 | |𝑌 | .
Show that if (𝑋, 𝑌 ) is 𝜀-regular, then it is 𝜀-homogeneous. Also, show that if (𝑋, 𝑌 ) is
𝜀 3 -homogeneous, then it is 𝜀-regular.
Exercise 2.1.6 (Robustness of regularity). Prove that for every 𝜀 ′ > 𝜀 > 0, there exists
𝛿 > 0 so that given an 𝜀-regular pair (𝑋, 𝑌 ) in some graph, if we modify the graph by
adding/deleting ≤ 𝛿 |𝑋 | vertices to/from 𝑋, adding/deleting ≤ 𝛿 |𝑌 | vertices to/from 𝑌 , and
adding/deleting ≤ 𝛿|𝑋||𝑌| edges, then the resulting new pair (𝑋, 𝑌) is still 𝜀′-regular.
Next, let us define what it means for a vertex partition to be 𝜀-regular.
In other words, all but at most an 𝜀-fraction of pairs of vertices of 𝐺 lie between 𝜀-regular parts.
Remark 2.1.8. When |𝑉1| = · · · = |𝑉𝑘|, the inequality says that at most 𝜀𝑘² of the pairs (𝑉𝑖, 𝑉𝑗) are not 𝜀-regular.
Also, note that the summation includes 𝑖 = 𝑗. If none of the 𝑉𝑖 ’s are too large, say |𝑉𝑖| ≤ 𝜀𝑛 for each 𝑖, then the terms with 𝑖 = 𝑗 contribute $\le \sum_i |V_i|^2 \le \varepsilon n \sum_i |V_i| = \varepsilon n^2$, which is negligible.
We are now ready to state Szemerédi’s graph regularity lemma.
Since the edge density is always between 0 and 1, we have 0 ≤ 𝑞(P) ≤ 1 for all
partitions P. The following lemmas show that the energy cannot decrease upon refinement,
and furthermore, it must increase substantially at each step of the algorithm above.
We have
\[
\mathbb{E}[Z^2] = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{|U_i||W_j|}{|U||W|}\, d(U_i, W_j)^2 = \frac{n^2}{|U||W|}\, q(\mathcal{P}_U, \mathcal{P}_W).
\]
Proof. The conclusion follows by applying Lemma 2.1.11 to each pair of parts of P. In more detail, let P = {𝑉1, . . . , 𝑉𝑚}, and suppose that P′ refines each 𝑉𝑖 into a partition P′_{𝑉𝑖} = {𝑉′_{𝑖1}, . . . , 𝑉′_{𝑖𝑘_𝑖}} of 𝑉𝑖, so that P′ = P′_{𝑉1} ∪ · · · ∪ P′_{𝑉𝑚}. We then have
\[
q(\mathcal{P}) = \sum_{i,j} q(V_i, V_j) \le \sum_{i,j} q(\mathcal{P}'_{V_i}, \mathcal{P}'_{V_j}) = q(\mathcal{P}'). \qquad \square
\]
Proof. Let
\[
R = \{(i, j) \in [k]^2 : (V_i, V_j) \text{ is } \varepsilon\text{-regular}\} \qquad \text{and} \qquad \overline{R} = [k]^2 \setminus R.
\]
For each pair (𝑉𝑖 , 𝑉 𝑗 ) that is not 𝜀-regular, find a pair 𝐴𝑖, 𝑗 ⊆ 𝑉𝑖 and 𝐵𝑖, 𝑗 ⊆ 𝑉 𝑗 that witnesses
the irregularity. Do this simultaneously for all (𝑖, 𝑗) ∈ 𝑅. Note for 𝑖 ≠ 𝑗, we can take
𝐴𝑖, 𝑗 = 𝐵 𝑗,𝑖 due to symmetry. When 𝑖 = 𝑗, we should allow for the possibility of 𝐴𝑖,𝑖 and 𝐵𝑖,𝑖
to be distinct.
Figure 2.1 In the proof of Lemma 2.1.14, we refine the partition by taking a
common refinement using witnesses of irregular pairs.
Let Q be a common refinement of P by all the 𝐴𝑖, 𝑗 and 𝐵𝑖, 𝑗 (i.e., the parts of Q are
maximal subsets that are not “cut up” into small pieces by any element of P or by the 𝐴𝑖, 𝑗
and 𝐵𝑖, 𝑗 ; intuitively, imagine regions of a Venn diagram). See Figure 2.1 for an illustration.
There are ≤ 𝑘 + 1 such distinct non-empty sets inside each 𝑉𝑖 . So Q refines each 𝑉𝑖 into at
most 2^{𝑘+1} parts. Let Q𝑖 be the partition of 𝑉𝑖 given by Q. Then, using the monotonicity of
energy under refinements (Lemma 2.1.11) for the pairs in 𝑅 and the energy boost from the witnessing sets (Lemma 2.1.13) for the pairs in $\overline{R}$,
\begin{align*}
q(\mathcal{Q}) &= \sum_{(i,j) \in [k]^2} q(\mathcal{Q}_i, \mathcal{Q}_j)
= \sum_{(i,j) \in R} q(\mathcal{Q}_i, \mathcal{Q}_j) + \sum_{(i,j) \in \overline{R}} q(\mathcal{Q}_i, \mathcal{Q}_j) \\
&\ge \sum_{(i,j) \in [k]^2} q(V_i, V_j) + \varepsilon^4 \sum_{(i,j) \in \overline{R}} \frac{|V_i||V_j|}{n^2}.
\end{align*}
The first sum equals 𝑞(P), and the second sum is > 𝜀⁵ since P is not 𝜀-regular. This gives the desired inequality. □
Remark 2.1.15 (Refinements should be done simultaneously). Here is a subtle point in
the above proof. The refinement Q must be obtained in a single step by refining P using all
the witnessing sets 𝐴𝑖, 𝑗 simultaneously. If instead we pick out a pair 𝐴𝑖, 𝑗 ⊆ 𝑉𝑖 and 𝐴 𝑗,𝑖 ⊆ 𝑉 𝑗 ,
refine the partition using just this pair, and then iterate using another irregular pair (𝑉𝑖′ , 𝑉 𝑗 ′ ),
the energy boost step would not work. This is because 𝜀-regularity (or lack thereof) is not
well-preserved under taking refinements.
Proof of the graph regularity lemma (Theorem 2.1.9). Start with a trivial partition of the
vertex set of the graph. Repeatedly apply Lemma 2.1.14 whenever the current partition is
not 𝜀-regular. By Lemma 2.1.14, the energy of the partition increases by more than 𝜀 5 at
each iteration. Since the energy of the partition is ≤ 1, we must stop after < 𝜀 −5 iterations,
terminating in an 𝜀-regular partition.
If a partition has 𝑘 parts, then Lemma 2.1.14 produces a refinement with ≤ 𝑘2^{𝑘+1} parts. We start with a trivial partition with one part, and then refine < 𝜀^{−5} times. Observe the crude bound 𝑘2^{𝑘+1} ≤ 2^{2^{𝑘}}. So the total number of parts at the end is ≤ tower(⌈2𝜀^{−5}⌉), where
\[
\operatorname{tower}(k) := \underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{\text{height } k}. \qquad \square
\]
Remark 2.1.16 (The proof does not guarantee that the partition becomes “more regular”
after each step.). Let us stress what the proof is not saying. It is not saying that the partition
gets more and more regular under each refinement. Also, it is not saying that the partition gets more regular as the energy gets higher. Rather, the energy simply bounds the number of iterations.
The bound on the number of parts guaranteed by the proof is a constant for each fixed
𝜀 > 0, but it grows extremely quickly as 𝜀 gets smaller. Is the poor quantitative dependence
somehow due to a suboptimal proof strategy? Surprisingly, the tower-type bound is necessary,
as shown by Gowers (1997).
We do not include the proof here; see Moshkovitz & Shapira (2016) for a short proof.
The general idea is to construct a graph that roughly reverse engineers the proof of the
regularity lemma, so there is essentially a unique 𝜀-regular partition, which must have many
parts.
Remark 2.1.18 (Irregular pairs are necessary in the regularity lemma). Recall that in Definition 2.1.7 of an 𝜀-regular partition, we are allowed to have some irregular pairs. Are irregular pairs necessary? It turns out that we must permit them. Exercise 2.1.24 gives a canonical example (a “half graph”) where every regularity partition has irregular pairs.
The regularity lemma is quite flexible. For example, we can start with an arbitrary partition
of 𝑉 (𝐺) instead of the trivial partition in the proof, in order to obtain a partition that is a
refinement of a given partition. The exact same proof with this modification yields the
following.
Here is another strengthening of the regularity lemma. We impose the additional require-
ment that vertex parts should be as equal in size as possible. We say that a partition is
equitable if all part sizes are within one of each other; that is, ||𝑉𝑖| − |𝑉𝑗|| ≤ 1. In other
words, a partition of a set of size 𝑛 into 𝑘 parts is equitable if every part has size ⌊𝑛/𝑘⌋ or
⌈𝑛/𝑘⌉.
Remark 2.1.21. The lower bound 𝑚0 requirement on the number of parts is somewhat superficial. The reason for including it here is that it is often convenient to discard all the edges that lie within individual parts of the partition, and since there are at most 𝑛²/𝑘 such edges, they contribute negligibly if the number of parts 𝑘 is not too small, which is true if we require 𝑚0 ≥ 1/𝜀 in the equitable regularity lemma statement.
There are several ways to guarantee equitability. One method is sketched below. We
equitize the partition at every step of the refinement iteration, so that at each step in the proof,
we both obtain an energy increment and also end up with an equitable partition.
Proof sketch of the equitable regularity lemma (Theorem 2.1.20). Here is a modified al-
gorithm:
(1) Start with an arbitrary equitable partition of the graph into 𝑚 0 parts.
(2) While the current equitable partition P is not 𝜀-regular:
(a) (Refinement/energy boost) Refine the partition using pairs that witness irreg-
ularity (as in the earlier proof). The new partition P ′ divides each part of P
into ≤ 2^{|P|} parts.
(b) (Equitization) Modify P ′ into an equitable partition by arbitrarily chopping
each part of P ′ into parts of size |𝑉 (𝐺)| /𝑚 (for some appropriately chosen
𝑚 = 𝑚(|P ′ | , 𝜀)) plus some leftover pieces, which are then combined together
and then divided into parts of size |𝑉 (𝐺)| /𝑚.
The refinement step (2)(a) increases energy by ≥ 𝜀⁵ as before. The energy might go down in the equitization step (2)(b), but it should not decrease by much, provided that the 𝑚 chosen in that step is large enough (say, 𝑚 = 100|P′|𝜀^{−5}). So overall, we still have an energy increment of ≥ 𝜀⁵/2 at each step, and hence the process still terminates after 𝑂(𝜀^{−5}) steps.
The total number of parts at the end is ≤ 𝑚 0 tower(𝑂 (𝜀 −5 )). □
Exercise 2.1.22. Complete the details in the above proof sketch.
Exercise 2.1.23 (Making each part 𝜀 -regular to nearly all other parts). Prove that for
all 𝜀 > 0 and 𝑚 0 , there exists a constant 𝑀 so that every graph has an equitable vertex
partition into 𝑘 parts, with 𝑚 0 ≤ 𝑘 ≤ 𝑀, such that each part is 𝜀-regular with all but at
most 𝜀𝑘 other parts.
The important example in the next exercise shows why we must allow irregular pairs in
the graph regularity lemma.
Exercise 2.1.24 (Unavoidability of irregular pairs). Let the half-graph 𝐻𝑛 be the bipartite
graph on 2𝑛 vertices {𝑎 1 , . . . , 𝑎 𝑛 , 𝑏 1 , . . . , 𝑏 𝑛 } with edges {𝑎 𝑖 𝑏 𝑗 : 𝑖 ≤ 𝑗 }.
(a) For every 𝜀 > 0, explicitly construct an 𝜀-regular partition of 𝐻𝑛 into 𝑂 (1/𝜀) parts.
(b) Show that there is some 𝑐 > 0 such that for every 𝜀 ∈ (0, 𝑐), every positive integer
𝑘 and sufficiently large multiple 𝑛 of 𝑘, every partition of the vertices of 𝐻𝑛 into 𝑘
equal-sized parts contains at least 𝑐𝑘 pairs of parts which are not 𝜀-regular.
The next exercise should remind you of the iteration technique from the proof of the graph
regularity lemma.
Exercise 2.1.25 (Existence of a regular pair of subsets). Show that there is some absolute constant 𝐶 > 0 such that for every 0 < 𝜀 < 1/2, every graph on 𝑛 vertices contains an 𝜀-regular pair of vertex subsets each with size at least 𝛿𝑛, where 𝛿 = 2^{−𝜀^{−𝐶}}.
This exercise asks for two different proofs of the following theorem.
Given a graph 𝐺, we say that 𝑋 ⊆ 𝑉 (𝐺) is 𝜺-regular if the pair (𝑋, 𝑋) is 𝜀-regular; that
is, for all 𝐴, 𝐵 ⊆ 𝑋 with | 𝐴| , |𝐵| ≥ 𝜀 |𝑋 |, one has |𝑑 ( 𝐴, 𝐵) − 𝑑 (𝑋, 𝑋)| ≤ 𝜀.
𝑋, 𝑌 , 𝑍 with the same edge densities between parts. By comparing 𝐺 to its random model
approximation, we expect the number of triples (𝑥, 𝑦, 𝑧) ∈ 𝑋 × 𝑌 × 𝑍 forming a triangle in
𝐺 to be roughly
𝑑 (𝑋, 𝑌 )𝑑 (𝑋, 𝑍)𝑑 (𝑌 , 𝑍)|𝑋 ||𝑌 ||𝑍 |.
The triangle counting lemma makes this intuition precise.
Remark 2.2.2. The vertex sets 𝑋, 𝑌 , 𝑍 do not have to be disjoint, but one does not lose
any generality by assuming that they are disjoint in this statement. Indeed, starting with
𝑋, 𝑌 , 𝑍 ⊆ 𝑉 (𝐺), one can always create an auxiliary tripartite graph 𝐺 ′ with vertex parts
being disjoint replicas of 𝑋, 𝑌 , 𝑍 and the edge relations in 𝑋 × 𝑌 being the same for 𝐺 and
𝐺′, and likewise for 𝑋 × 𝑍 and 𝑌 × 𝑍. Under this auxiliary construction, a triple in 𝑋 × 𝑌 × 𝑍 forms a triangle in 𝐺 if and only if it forms a triangle in 𝐺′.
Now we show that in an 𝜀-regular pair (𝑋, 𝑌 ), almost all vertices of 𝑋 have roughly the
same number of neighbors in 𝑌 (the next lemma only states a lower bound on degree, but the
same argument also gives an analogous upper bound).
Proof. Let 𝐴 be the subset of vertices in 𝑋 with < (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝑌 | neighbors in 𝑌 . Then
𝑑 ( 𝐴, 𝑌 ) < 𝑑 (𝑋, 𝑌 ) − 𝜀, and thus | 𝐴| < 𝜀 |𝑋 | by Definition 2.1.2 as (𝑋, 𝑌 ) is an 𝜀-regular
pair. The other claim is similar. □
Proof of Theorem 2.2.1. By Lemma 2.2.3, we can find 𝑋 ′ ⊆ 𝑋 with |𝑋 ′ | ≥ (1 − 2𝜀) |𝑋 |
such that every vertex 𝑥 ∈ 𝑋 ′ has ≥ (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝑌 | neighbors in 𝑌 and ≥ (𝑑 (𝑋, 𝑍) − 𝜀)|𝑍 |
neighbors in 𝑍. Write 𝑁𝑌 (𝑥) = 𝑁 (𝑥) ∩ 𝑌 and 𝑁 𝑍 (𝑥) = 𝑁 (𝑥) ∩ 𝑍.
For each such 𝑥 ∈ 𝑋 ′ , we have |𝑁𝑌 (𝑥)| ≥ (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝑌 | ≥ 𝜀|𝑌 |. Likewise, |𝑁 𝑍 (𝑥)| ≥
𝜀|𝑍 |. Since (𝑌 , 𝑍) is 𝜀-regular, the edge density between 𝑁𝑌 (𝑥) and 𝑁 𝑍 (𝑥) is ≥ 𝑑 (𝑌 , 𝑍) − 𝜀.
So for each 𝑥 ∈ 𝑋 ′ , the number of edges between 𝑁𝑌 (𝑥) and 𝑁 𝑍 (𝑥) is
≥ (𝑑 (𝑌 , 𝑍) − 𝜀)|𝑁𝑌 (𝑥)||𝑁 𝑍 (𝑥)| ≥ (𝑑 (𝑋, 𝑌 ) − 𝜀) (𝑑 (𝑋, 𝑍) − 𝜀) (𝑑 (𝑌 , 𝑍) − 𝜀)|𝑌 ||𝑍 |.
Multiplying by |𝑋 ′ | ≥ (1 − 2𝜀) |𝑋 |, we obtain the desired lower bound on the number of
triangles. □
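A quick numerical sanity check of the counting lemma (illustrative only; a random bipartite graph is 𝜀-regular only with high probability, which the sketch does not verify, and the densities and part sizes are made up): compare the number of triangles across three random bipartite graphs with the product of the pairwise densities.

```python
import random
from itertools import product

def random_tripartite(nx, ny, nz, dxy, dxz, dyz, seed=0):
    """Independent random bipartite graphs between X, Y, Z with the given densities."""
    rng = random.Random(seed)
    X, Y, Z = range(nx), range(ny), range(nz)
    exy = {(x, y) for x, y in product(X, Y) if rng.random() < dxy}
    exz = {(x, z) for x, z in product(X, Z) if rng.random() < dxz}
    eyz = {(y, z) for y, z in product(Y, Z) if rng.random() < dyz}
    return X, Y, Z, exy, exz, eyz

X, Y, Z, exy, exz, eyz = random_tripartite(40, 40, 40, 0.5, 0.6, 0.7)
triangles = sum((x, y) in exy and (x, z) in exz and (y, z) in eyz
                for x, y, z in product(X, Y, Z))
predicted = 0.5 * 0.6 * 0.7 * 40 ** 3
print(triangles, "triangles; d(X,Y) d(X,Z) d(Y,Z) |X||Y||Z| =", predicted)
```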
Remark 2.2.4. We only need the lower bound on the triangle count for our applications in
this chapter, but the same proof can also be modified to give an upper bound, which we leave
as an exercise.
subcubic number of triangles (i.e., asymptotically less than the maximum possible number)
and “few edges” means a subquadratic number of edges.
Because edges between the pairs described in (a) and (b) were removed, 𝑉𝑖 , 𝑉 𝑗 , 𝑉𝑘 satisfy the
hypotheses of the triangle counting lemma (Theorem 2.2.1),
\[
\#\{\text{triangles in } V_i \times V_j \times V_k\} \ge \left(1 - \frac{\varepsilon}{2}\right)\left(\frac{\varepsilon}{4}\right)^{3} |V_i|\,|V_j|\,|V_k| \ge \left(1 - \frac{\varepsilon}{2}\right)\left(\frac{\varepsilon}{4}\right)^{3} \left(\frac{\varepsilon n}{4m}\right)^{3},
\]
where the final step uses (c) above. Then as long as
\[
\delta < \frac{1}{6}\left(1 - \frac{\varepsilon}{2}\right)\left(\frac{\varepsilon}{4}\right)^{3}\left(\frac{\varepsilon}{4m}\right)^{3},
\]
we would contradict the hypothesis that the original graph has < 𝛿𝑛3 triangles (the extra
factor of 6 above is there to account for the possibility that 𝑉𝑖 = 𝑉 𝑗 = 𝑉𝑘 ). Since 𝑚 is bounded
for each fixed 𝜀, we see that 𝛿 can be chosen to depend only on 𝜀. □
The next corollary of the triangle removal lemma will soon be used to prove Roth’s
theorem. Here “diamond” refers to the following graph, consisting of two triangles sharing
an edge.
Proof. Let 𝐺 have 𝑚 edges. Because each edge lies in exactly one triangle, the number of
triangles in 𝐺 is 𝑚/3 = 𝑂 (𝑛2 ) = 𝑜(𝑛3 ). By the triangle removal lemma (see the statement
after Theorem 2.3.1), we can remove 𝑜(𝑛2 ) edges to make 𝐺 triangle-free. However, deleting
an edge removes at most one triangle from the graph by assumption, so 𝑚/3 edges need to
be removed to make 𝐺 triangle-free. Thus 𝑚 = 𝑜(𝑛2 ). □
Remark 2.3.4 (Quantitative dependencies in the triangle removal lemma). Since the above
proof of the triangle removal lemma applies the graph regularity lemma, the resulting
bounds from the proof are quite poor: it shows that one can pick 𝛿 = 1/tower(𝜀 −𝑂 (1) ).
Using a different but related method, Fox (2011) proved the triangle removal lemma with
a slightly better dependence 𝛿 = 1/tower(𝑂 (log(1/𝜀))). In the other direction, we know
that the triangle removal lemma does not hold with $\delta = \varepsilon^{c \log(1/\varepsilon)}$ for a sufficiently small constant 𝑐 > 0. The construction comes from the Behrend construction of large 3-AP-free sets that we will soon see in Section 2.5. Our knowledge of the quantitative dependence in Corollary 2.3.3 comes from the same source; specifically, we know that the 𝑜(𝑛²) can be sharpened to $n^2/e^{\Omega(\log^* n)}$ (where $\log^*$, the iterated logarithm function, is the number of iterations of log that one needs to take to bring a number to at most 1), but the statement is false if the 𝑜(𝑛²) is replaced by $n^2 e^{-C\sqrt{\log n}}$ for some sufficiently large constant 𝐶. It is a major open problem to close the gap between the upper and lower bounds in these problems.
The triangle removal lemma was historically first considered in the following equivalent
formulation.
Exercise 2.3.6. Deduce the (6, 3)-theorem from Corollary 2.3.3, and vice-versa.
The following conjectural extension of the (6, 3)-theorem is a major open problem in extremal combinatorics. The conjecture is attributed to Brown, Erdős, & Sós (1973).
• (𝑥, 𝑦) ∈ 𝑋 × 𝑌 whenever 𝑦 − 𝑥 ∈ 𝐴;
• (𝑦, 𝑧) ∈ 𝑌 × 𝑍 whenever 𝑧 − 𝑦 ∈ 𝐴;
• (𝑥, 𝑧) ∈ 𝑋 × 𝑍 whenever (𝑧 − 𝑥)/2 ∈ 𝐴.
(Note that one could relax the assumption 𝑑 > 0 to 𝑑 ≠ 0, allowing “negative” corners. As
shown in the first step in the proof below, the assumption 𝑑 > 0 is inconsequential.)
Remark 2.4.3 (History). The theorem is due to Ajtai & Szemerédi (1974), who originally
proved it by invoking the full power of Szemerédi’s theorem. Here we present a much simpler
proof using the triangle removal lemma due to Solymosi (2003).
Proof. First we show how to relax the assumption in the definition of a corner from 𝑑 > 0
to 𝑑 ≠ 0.
Let 𝐴 ⊆ [𝑁]² be a corner-free set. For each 𝑧 ∈ Z², let 𝐴𝑧 = 𝐴 ∩ (𝑧 − 𝐴). Then |𝐴𝑧| is the number of ways that one can write 𝑧 = 𝑎 + 𝑏 for some (𝑎, 𝑏) ∈ 𝐴 × 𝐴. So $\sum_{z \in [2N]^2} |A_z| = |A|^2$, so there is some 𝑧 ∈ [2𝑁]² with |𝐴𝑧| ≥ |𝐴|²/(2𝑁)². To show that |𝐴| = 𝑜(𝑁²), it suffices
to show that | 𝐴 𝑧 | = 𝑜(𝑁 2 ). Moreover, since 𝐴 𝑧 = 𝑧 − 𝐴 𝑧 , it being corner-free implies that it
does not contain three points {(𝑥, 𝑦), (𝑥 + 𝑑, 𝑦), (𝑥, 𝑦 + 𝑑)} with 𝑑 ≠ 0.
Write 𝐴 = 𝐴 𝑧 from now on. Build a tripartite graph 𝐺 with parts 𝑋 = {𝑥1 , . . . , 𝑥 𝑁 },
𝑌 = {𝑦 1 , . . . , 𝑦 𝑁 } and 𝑍 = {𝑧1 , . . . , 𝑧 2𝑁 }, where each vertex 𝑥 𝑖 corresponds to a vertical line
{𝑥 = 𝑖} ⊆ Z2 , each vertex 𝑦 𝑗 corresponds to a horizontal line {𝑦 = 𝑗 }, and each vertex 𝑧 𝑘
corresponds to a slanted line {𝑦 = −𝑥 + 𝑘 } with slope −1. Join two distinct vertices of 𝐺
with an edge if and only if the corresponding lines intersect at a point belonging to 𝐴. Then,
each triangle in the graph 𝐺 corresponds to a set of three lines of slopes 0, ∞, −1 pairwise
intersecting at a point of 𝐴.
Since 𝐴 is corner-free in the sense stated at the end of the previous paragraph, 𝑥 𝑖 , 𝑦 𝑗 , 𝑧 𝑘 form
a triangle in 𝐺 if and only if the three corresponding lines pass through the same point of 𝐴
(i.e., forming a trivial corner with 𝑑 = 0). Since there is exactly one line of each direction
passing through every point of 𝐴, it follows that each edge of 𝐺 belongs to exactly one
triangle. Thus, by Corollary 2.3.3, 3 | 𝐴| = 𝑒(𝐺) = 𝑜(𝑁 2 ). □
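The reduction in the proof is entirely explicit. The sketch below (illustrative; the set 𝐴 is a small made-up example) builds the tripartite graph of vertical, horizontal, and slanted lines from a set 𝐴 and confirms that every edge lies in exactly one triangle when 𝐴 is corner-free in the symmetric sense used above.

```python
from itertools import combinations

def corner_graph(A):
    """Tripartite graph from a set A of lattice points: vertices are the lines
    x = i, y = j, and x + y = k; two lines are adjacent iff they meet at a point
    of A.  Edges are returned as frozensets of two vertices."""
    edges = set()
    for (i, j) in A:
        x, y, z = ("x", i), ("y", j), ("z", i + j)
        edges |= {frozenset((x, y)), frozenset((x, z)), frozenset((y, z))}
    return edges

def triangles_per_edge(edges):
    verts = {v for e in edges for v in e}
    counts = {}
    for e in edges:
        u, v = tuple(e)
        counts[e] = sum(frozenset((u, w)) in edges and frozenset((v, w)) in edges
                        for w in verts - e)
    return counts

# A corner-free set (no (x, y), (x + d, y), (x, y + d) with d != 0): no two of
# its points share a row or a column, so no such configuration can occur.
A = {(1, 1), (2, 3), (3, 2), (4, 4)}
counts = triangles_per_edge(corner_graph(A))
print(len(counts), "edges;", "every edge in exactly one triangle:",
      all(c == 1 for c in counts.values()))
```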
The upper bound on corner-free sets actually implies Roth’s theorem, as shown below. So
we now have a second proof of Roth’s theorem (though, this second proof is secretly the
same as the first proof).
The rough idea is to first find a high dimensional sphere with many lattice points via the
pigeonhole principle. The sphere contains no 3-AP due to convexity. We then project these
lattice points onto Z in a way that creates no additional 3-APs. This is done by treating the
coordinates as the base-𝑞 expansion of an integer with some large 𝑞.
Proof. Let 𝑚 and 𝑑 be two positive integers depending on 𝑁 to be specified later. Consider the lattice points of 𝑋 = {0, 1, . . . , 𝑚 − 1}^𝑑 that lie on a sphere of radius √𝐿:
\[
X_L := \big\{(x_1, \dots, x_d) \in X : x_1^2 + \cdots + x_d^2 = L\big\}.
\]
Then $X = \bigcup_{L=1}^{dm^2} X_L$. So by the pigeonhole principle, there exists an 𝐿 ∈ [𝑑𝑚²] such that $|X_L| \ge m^d/(dm^2)$. Define the base-2𝑚 digital expansion
\[
\phi(x_1, \dots, x_d) := \sum_{i=1}^{d} x_i (2m)^{i-1}.
\]
Proof. In the proof of Theorem 2.4.1, starting from a 3-AP-free set 𝐴 ⊆ [𝑁], we constructed a graph with 6𝑁 + 3 vertices and (6𝑁 + 3)|𝐴| edges such that every edge lies in a unique triangle. Choosing 𝑁 = ⌊(𝑛 − 3)/6⌋ and letting 𝐴 be the Behrend construction of Theorem 2.5.1 with $|A| \ge N e^{-C\sqrt{\log N}}$, we obtain the desired graph. □
Remark 2.5.3 (More lower bounds from Behrend’s construction). The same graph construction also shows, after examining the proof of Corollary 2.3.3, that in the triangle removal lemma, Theorem 2.3.1, one cannot take $\delta = e^{-c(\log(1/\varepsilon))^2}$ if the constant 𝑐 > 0 is too small.
In Proposition 2.4.4 we deduced an upper bound $r_3(N)\cdot N \le r_{\llcorner}(2N)$ on corner-free sets using 3-AP-free sets. The Behrend construction then also gives a corner-free subset of [𝑁]² of size $\ge N^2 e^{-C\sqrt{\log N}}$.
Exercise 2.5.4 (Modifying Behrend’s construction). Prove that there is some constant 𝐶 > 0 so that for all 𝑁, there exists 𝐴 ⊆ [𝑁] with $|A| \ge N \exp(-C\sqrt{\log N})$ so that there do not exist 𝑤, 𝑦, 𝑥, 𝑧 ∈ 𝐴 not all equal and satisfying 𝑥 + 𝑦 + 𝑧 = 3𝑤.
Proof. We repeatedly apply the following statement, which is a simple consequence of the
definition of 𝜀-regularity (and a small extension of Lemma 2.2.3):
Given an 𝜀-regular pair (𝑋, 𝑌 ), and 𝐵 ⊆ 𝑌 with |𝐵| ≥ 𝜀 |𝑌 |, the number of vertices in 𝑋
with < (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝐵| neighbors in 𝐵 is < 𝜀 |𝑋 |.
The number of vertices 𝑥1 ∈ 𝑋1 with ≥ (𝑑1𝑖 − 𝜀)|𝑋𝑖| neighbors in 𝑋𝑖 for each 𝑖 = 2, 3, 4 is ≥ (1 − 3𝜀)|𝑋1|. Fix a choice of such an 𝑥1 ∈ 𝑋1. For each 𝑖 = 2, 3, 4, let 𝑌𝑖 be the neighbors of 𝑥1 in 𝑋𝑖, so that |𝑌𝑖| ≥ (𝑑1𝑖 − 𝜀)|𝑋𝑖|.
The number of vertices in 𝑌2 with ≥ (𝑑2𝑖 − 𝜀) |𝑌𝑖 | common neighbors in 𝑌𝑖 for each 𝑖 = 3, 4
is ≥ |𝑌2 | − 2𝜀 |𝑋2 | ≥ (𝑑12 − 3𝜀) |𝑋2 |. Fix a choice of such an 𝑥2 ∈ 𝑌2 . For each 𝑖 = 3, 4, let 𝑍𝑖
be the neighbors of 𝑥 2 in 𝑌𝑖 .
For each 𝑖 = 3, 4, |𝑍𝑖 | ≥ (𝑑1𝑖 − 𝜀) (𝑑2𝑖 − 𝜀) |𝑋𝑖 | ≥ 𝜀 |𝑋𝑖 |, and so
𝑒(𝑍3 , 𝑍4 ) ≥ (𝑑34 − 𝜀) |𝑍3 | |𝑍4 |
≥ (𝑑34 − 𝜀) · (𝑑13 − 𝜀) (𝑑23 − 𝜀) |𝑋3 | · (𝑑14 − 𝜀) (𝑑24 − 𝜀) |𝑋4 | .
Any edge between 𝑍3 and 𝑍4 forms a 𝐾4 together with 𝑥 1 and 𝑥 2 . Multiplying the above
quantity with the earlier lower bounds on the number of choices of 𝑥1 and 𝑥2 gives the
result. □
The same strategy works more generally for counting any graph. To find copies of 𝐻, we
embed vertices of 𝐻 one at a time.
Theorem 2.6.2 (Graph counting lemma)
For every graph 𝐻 and real 𝛿 > 0, there exists an 𝜀 > 0 such that the following is true.
Let 𝐺 be a graph, and 𝑋𝑖 ⊆ 𝑉 (𝐺) for each 𝑖 ∈ 𝑉 (𝐻) such that for each 𝑖 𝑗 ∈ 𝐸 (𝐻),
(𝑋𝑖 , 𝑋 𝑗 ) is an 𝜀-regular pair with edge density 𝑑𝑖 𝑗 := 𝑑 (𝑋𝑖 , 𝑋 𝑗 ) ≥ 𝛿. Then the number of
graph homomorphisms 𝐻 → 𝐺 where each 𝑖 ∈ 𝑉 (𝐻) is mapped to 𝑋𝑖 is
\[
\ge (1 - \delta) \prod_{ij \in E(H)} (d_{ij} - \delta) \prod_{i \in V(H)} |X_i|.
\]
Remark 2.6.3. (a) For a fixed 𝐻, as |𝑋𝑖 | → ∞ for each 𝑖, all but a negligible fraction of
such homomorphisms from 𝐻 are injective (i.e., yielding a copy of 𝐻 as a subgraph).
(b) It is useful (and in fact equivalent) to think about the setting where 𝐺 is a multipartite
graph with parts 𝑋𝑖 , as illustrated below.
In the multipartite setting, we see that the graph counting lemma can be adapted to variants
such as counting induced copies of 𝐻. Indeed, an induced copy of 𝐻 is the same as a 𝑣(𝐻)-
clique in an auxiliary graph 𝐺 ′ obtained by replacing the bipartite graph in 𝐺 between 𝑋𝑖
and 𝑋 𝑗 by its complementary bipartite graph between 𝑋𝑖 and 𝑋 𝑗 for each 𝑖 𝑗 ∉ 𝐸 (𝐻).
(c) We will see a different proof in Section 4.5 using the language of graphons. There,
instead of embedding 𝐻 one vertex at a time, we compare the density of 𝐻 and 𝐻 \ {𝑒}.
We establish the following stronger statement, which has the additional advantage that one
can choose the regularity parameter 𝜀 to depend on the maximum degree of 𝐻 rather than
𝐻 itself. You may wish to skip reading the proof, as it is notationally rather heavy. The main
ideas were already illustrated in the 𝐾4 counting lemma.
Furthermore, if |𝑋𝑖 | ≥ 𝑣(𝐻)/𝜀 for each 𝑖, then there exists such a homomorphism 𝐻 → 𝐺
that is injective (i.e., an embedding of 𝐻 as a subgraph).
Proof. Let us order and label the vertices of 𝐻 by 1, . . . , 𝑣(𝐻) arbitrarily. We will select
vertices 𝑥1 ∈ 𝑋1 , 𝑥2 ∈ 𝑋2 , . . . in order. The idea is to always make sure that they have enough
neighbors in 𝐺 so that there are many ways to continue the embedding of 𝐻. We say that a
partial embedding 𝑥 1 , . . . , 𝑥 𝑠−1 (here partial embedding means that 𝑥𝑖 𝑥 𝑗 ∈ 𝐸 (𝐺) whenever
𝑖𝑗 ∈ 𝐸(𝐻) for all the 𝑥𝑖 ’s chosen so far) is abundant if for each 𝑗 ≥ 𝑠, the number of valid extensions 𝑥𝑗 ∈ 𝑋𝑗 (meaning that 𝑥𝑖𝑥𝑗 ∈ 𝐸(𝐺) whenever 𝑖 < 𝑠 and 𝑖𝑗 ∈ 𝐸(𝐻)) is at least $|X_j| \prod_{i < s \,:\, ij \in E(H)} (d_{ij} - \varepsilon)$.
For each 𝑠 = 1, 2, . . . , 𝑣(𝐻) in order, suppose we have already fixed an abundant partial
embedding 𝑥1 , . . . , 𝑥 𝑠−1 . For each 𝑗 ≥ 𝑠, let
𝑌 𝑗 = {𝑥 𝑗 ∈ 𝑋 𝑗 : 𝑥 𝑖 𝑥 𝑗 ∈ 𝐸 (𝐺) whenever 𝑖 < 𝑠 and 𝑖 𝑗 ∈ 𝐸 (𝐻)}
be the set of valid extensions of the 𝑗-th vertex in 𝑋 𝑗 given the partial embeddings of
𝑥1 , . . . , 𝑥 𝑠−1 , so that the abundance hypothesis gives
\[
|Y_j| \ge |X_j| \prod_{\substack{i < s \\ ij \in E(H)}} (d_{ij} - \varepsilon) \ge \big(\varepsilon^{1/\Delta}\big)^{|\{i < s \,:\, ij \in E(H)\}|}\, |X_j| \ge \varepsilon |X_j|.
\]
Thus, as in the proof of Proposition 2.6.1 for 𝐾4 , the number of choices 𝑥 𝑠 ∈ 𝑋𝑠 that would
extend 𝑥1 , . . . , 𝑥 𝑠−1 to an abundant partial embedding is
\[
\ge |Y_s| - |\{i > s : si \in E(H)\}|\, \varepsilon |X_s|
\ge |X_s| \prod_{\substack{i < s \\ is \in E(H)}} (d_{is} - \varepsilon) - |\{i > s : si \in E(H)\}|\, \varepsilon |X_s|. \tag{$\dagger$}
\]
Otherwise we can absorb the second term into the product and obtain
\[
(\dagger) \ge |X_s| \prod_{\substack{i < s \\ is \in E(H)}} (d_{is} - \varepsilon) - (\Delta - 1)\varepsilon |X_s| \ge |X_s| \prod_{\substack{i < s \\ is \in E(H)}} (d_{is} - \Delta \varepsilon^{1/\Delta}).
\]
Fix such a choice of 𝑥 𝑠 . And now we move onto embedding the next vertex 𝑥 𝑠+1 .
Multiplying together these lower bounds for the number of choices of each 𝑥 𝑠 over all
𝑠 = 1, . . . , 𝑣(𝐻), we obtain the lower bound on the number of homomorphisms 𝐻 → 𝐺.
Finally, note that in both cases (†) ≥ 𝜀 |𝑋𝑠 |, and so if |𝑋𝑠 | ≥ 𝑣(𝐻)/𝜀, then (†) ≥ 𝑣(𝐻) and
so we can choose each 𝑥 𝑠 to be distinct from the previously embedded vertices 𝑥1 , . . . , 𝑥 𝑠−1 ,
thereby yielding an injective homomorphism. □
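To see the counting lemma in action numerically, here is a small sketch (our own illustration, not from the text): we build a random tripartite graph with prescribed densities between the parts (random bipartite graphs are ε-regular with high probability) and compare the number of homomorphisms of H = K_3, with vertex i mapped into X_i, against the product bound. All names and parameter values below are ours.

```python
import random

random.seed(0)

# Part sizes |X_1|, |X_2|, |X_3| and target densities d_{ij} between the parts.
sizes = {1: 30, 2: 30, 3: 30}
dens = {(1, 2): 0.5, (1, 3): 0.4, (2, 3): 0.6}

# Independent random bipartite graphs between the parts.
edges = {
    (i, j): {(x, y) for x in range(sizes[i]) for y in range(sizes[j])
             if random.random() < d}
    for (i, j), d in dens.items()
}

# Homomorphisms K_3 -> G with vertex i mapped into X_i: triples whose three
# cross pairs are all edges.
hom_count = sum(
    1
    for x1 in range(sizes[1]) for x2 in range(sizes[2]) for x3 in range(sizes[3])
    if (x1, x2) in edges[(1, 2)]
    and (x1, x3) in edges[(1, 3)]
    and (x2, x3) in edges[(2, 3)]
)

# The counting lemma predicts roughly prod_{ij} d_{ij} * prod_i |X_i| homomorphisms.
prediction = 1.0
for d in dens.values():
    prediction *= d
for s in sizes.values():
    prediction *= s

print(hom_count, prediction)  # the two counts should be close
```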
The next exercise asks you to show that, if H is bipartite, then one can prove the H-removal lemma without using regularity, thereby obtaining a much better bound.
Exercise 2.6.6 (Removal lemma for bipartite graphs with polynomial bounds). Prove
that for every bipartite graph 𝐻, there is a constant 𝐶 such that for every 𝜀 > 0, every
n-vertex graph with fewer than ε^C n^{v(H)} copies of H can be made H-free by removing at most εn² edges.
Erdős–Stone–Simonovits theorem
As another application, let us give a different proof of the Erdős–Stone–Simonovits theorem
from Section 1.5, restated below, which gives the asymptotics (up to a +𝑜(𝑛2 ) error term)
for ex(𝑛, 𝐻), the maximum number of edges in an 𝑛-vertex 𝐻-free graph. We saw a proof in
Section 1.5 using supersaturation and the hypergraph KST theorem. The proof below follows
the partition-clean-count strategy in Remark 2.3.2 combined with an application of Turán’s
theorem. A common feature of many regularity applications is that they “boost” an exact
extremal graph theoretic result (e.g., Turán’s theorem) to an asymptotic result involving more
complex derived structures (e.g., from the existence of a copy of 𝐾𝑟 to embedding a complete
𝑟-partite graph).
[Figure: the graph H embedded into G′ across the parts V_{i_1}, V_{i_2}, V_{i_3}.]
By Turán’s theorem (Corollary 1.2.6), 𝐺 ′ contains a copy of 𝐾 𝜒 (𝐻 ) . Suppose that the 𝜒(𝐻)
vertices of this 𝐾 𝜒 (𝐻 ) land in 𝑉𝑖1 , · · · , 𝑉𝑖𝜒 (𝐻) (allowing repeated indices). Since each pair of
these sets is 𝜂-regular, has edge density ≥ 𝜀/8, and each has size ≥ 𝜀𝑛/(8𝑚), applying the
graph counting lemma, Theorem 2.6.2, we see that as long as 𝜂 is sufficiently small in terms
of 𝜀 and 𝐻, and 𝑛 is sufficiently large, there exists an injective embedding of 𝐻 into 𝐺 ′
where the vertices of 𝐻 in the 𝑟-th color class are mapped into 𝑉𝑖𝑟 . So 𝐺 contains 𝐻 as a
subgraph. □
Exercise 2.7.2 (Ramsey’s theorem in a nearly complete graph). Show that for every 𝐻
there exists some 𝛿 > 0 such that for all sufficiently large 𝑛, if 𝐺 is an 𝑛-vertex graph with
average degree at least (1 − 𝛿)𝑛 and the edges of 𝐺 are colored using 2 colors, then there
is a monochromatic copy of 𝐻.
Exercise 2.7.3 (Nearly homogeneous subset). Show that for every 𝐻 and 𝜀 > 0 there
exists 𝛿 > 0 such that every graph on 𝑛 vertices without an induced copy of 𝐻 contains an
induced subgraph on at least 𝛿𝑛 vertices whose edge density is at most 𝜀 or at least 1 − 𝜀.
Exercise 2.7.4 (Ramsey numbers of bounded degree graphs). Show that for every Δ
there exists a constant 𝐶Δ so that if 𝐻 is a graph with maximum degree at most Δ, then every
2-edge-coloring of a complete graph on at least 𝐶Δ 𝑣(𝐻) vertices contains a monochromatic
copy of 𝐻.
Exercise 2.7.6∗ (Induced Ramsey). Show that for every graph 𝐻 there is some graph 𝐺
such that if the edges of 𝐺 are colored with two colors, then some induced subgraph of 𝐺
is a monochromatic copy of 𝐻.
Exercise 2.7.7∗ (Finding a degree-regular subgraph). Show that for every 𝛼 > 0, there
exists 𝛽 > 0 such that every graph on 𝑛 vertices with at least 𝛼𝑛2 edges contains a 𝑑-regular
subgraph for some 𝑑 ≥ 𝛽𝑛 (here 𝑑-regular refers to every vertex having degree 𝑑).
Remark 2.8.2. Given two graphs on the same vertex set, the minimum number of edges
that one needs to add/delete to obtain the second graph from the first graph is called the edit
distance between the two graphs. The induced graph removal lemma can be rephrased as
saying that every graph with few induced copies of 𝐻 is close in edit distance to an induced
𝐻-free graph.
Unlike the previous graph removal lemma, for the induced version, it is important that
we allow both adding and deleting edges. The statement would be false if we only allow
edge deletion but not addition. For example, suppose 𝐺 = 𝐾𝑛 \ 𝐾3 (i.e., a complete graph on
𝑛 vertices with three edges of a single triangle removed). If 𝐻 is an empty graph on three
vertices, then 𝐺 has exactly one induced copy of 𝐻, but 𝐺 cannot be made induced 𝐻-free
by only deleting edges.
To see why the earlier proof of the graph removal lemma (Theorem 2.6.5) does not apply
in a straightforward way to prove the induced graph removal lemma, let us attempt to follow
the earlier strategy and see where things go wrong.
First we apply the graph regularity lemma. Then we need to clean up the graph. In the
induced graph removal lemma, edges and non-edges play symmetric roles. We can handle
low density pairs (edge density less than 𝜀) by removing edges between such pairs. Naturally,
for the induced graph removal lemma, we also need to handle high density pairs (density
more than 1 − 𝜀), and we can add all the edges between such pairs. However, it is not clear
what to do with irregular pairs. Earlier, we just removed all edges between irregular pairs. The
problem is that this may create many induced copies of 𝐻 that were not present previously
(see illustration below). Likewise, we cannot simply add all edges between irregular pairs.
[Figure: deleting all edges across an irregular pair.]
Perhaps we can always find a regularity partition without irregular pairs? Unfortunately, this
is false, as shown in Exercise 2.1.24. One must allow for the possibility of irregular pairs.
Remark 2.8.4. One should think of the sequence 𝜀 1 , 𝜀 2 , . . . as rapidly decreasing. This
strong regularity lemma outputs a refining pair of partitions P and Q such that P is regular,
Q is extremely regular, and P and Q are close to each other (as captured by 𝑞(P) ≤ 𝑞(Q) ≤
𝑞(P) + 𝜀0 ; see Lemma 2.8.7 below). A key point here is that we demand Q to be extremely
regular relative to the number of parts of P. The more parts P has, the more regular Q should
be.
Proof. We repeatedly apply the following version of Szemerédi’s regularity lemma:
Theorem 2.1.19 (restated): For all 𝜀 > 0, there exists an integer 𝑀0 = 𝑀0 (𝜀) so that for
all partitions P of 𝑉 (𝐺), there exists a refinement P ′ of P with each part in P refined into
≤ 𝑀0 parts so that P ′ is 𝜀-regular.
By iteratively applying the above regularity partition, we obtain a sequence of partitions
P0 , P1 , . . . of 𝑉 (𝐺) starting with P0 = {𝑉 (𝐺)} being the trivial partition. Each P𝑖+1 is
ε_{|P_i|}-regular and refines P_i. The regularity lemma guarantees that we can have |P_{i+1}| ≤ |P_i| · M_0(ε_{|P_i|}).
Since 0 ≤ q(·) ≤ 1, there exists i ≤ 1/ε_0 so that q(P_{i+1}) ≤ q(P_i) + ε_0. Then setting P = P_i
and Q = P𝑖+1 satisfies the desired requirements. Indeed, the number of parts of Q is bounded
by a function of the sequence (𝜀0 , 𝜀1 , . . . ) since there are a bounded number of iterations
and each iteration produced a refining partition with a bounded number of parts. □
Remark 2.8.5 (Bounds in the strong regularity lemma). The bound on M produced by the proof depends on the sequence (ε_0, ε_1, . . . ). In the application below, we use ε_i = ε_0/poly(i). Then the size of M is comparable to applying M_0 to ε_0 in succession 1/ε_0 times. Note that M_0 is a tower function, and this makes M a tower function iterated roughly 1/ε_0 times. This iterated tower function is called the wowzer function: wowzer(k) := tower(tower(· · · (tower(k)) · · · )) (with k applications of tower). The wowzer function is one step up from the tower function in the Ackermann hierarchy. It grows extremely quickly.
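For a concrete feel of these growth rates, here is a short sketch (our own, using the simplified convention tower(1) = wowzer(1) = 2, which differs slightly from the definition above):

```python
def tower(k: int) -> int:
    """tower(k) = 2^2^...^2 with k twos."""
    return 2 if k == 1 else 2 ** tower(k - 1)

def wowzer(k: int) -> int:
    """Iterate tower: wowzer(1) = 2 and wowzer(k) = tower(wowzer(k - 1))."""
    return 2 if k == 1 else tower(wowzer(k - 1))

print([tower(k) for k in range(1, 5)])  # [2, 4, 16, 65536]
print(wowzer(3))                        # = tower(tower(2)) = tower(4) = 65536
# wowzer(4) = tower(65536) already has unimaginably many digits.
```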
Remark 2.8.6 (Equitability). We can further ensure that the parts have nearly equal size.
This can be done by adapting the ideas sketched in the proof sketch of Theorem 2.1.20.
The following lemma explains the significance of the inequality 𝑞(Q) ≤ 𝑞(P) + 𝜀 from
earlier.
Proof. Let x, y ∈ V(G) be chosen uniformly at random. As in the proof of Lemma 2.1.11, we have q(P) = E[Z_P²], where Z_P = d(V_x, V_y). Likewise, q(Q) = E[Z_Q²], where Z_Q = d(W_x, W_y).
We have
$$q(Q) - q(P) = \mathbb{E}[Z_Q^2] - \mathbb{E}[Z_P^2] = \mathbb{E}[(Z_Q - Z_P)^2],$$
where the final step above is a “Pythagorean identity.”
[Figure: the step functions Z_P, Z_Q, and Z_Q − Z_P over the partition V_1, V_2, V_3 and its refinement W_1, W_2, W_3.]
Remark 2.8.10. It is significant that all (rather than nearly all) pairs (𝑊𝑖 , 𝑊 𝑗 ) are regular.
We will need this fact in our applications below.
Proof sketch. Here we show how to prove a slightly weaker result where 𝑖 ≤ 𝑗 in (b) is
replaced by 𝑖 < 𝑗. In other words, this proof does not promise that each 𝑊𝑖 is 𝜀 𝑘 -regular. To
obtain the stronger conclusion as stated (requiring each 𝑊𝑖 to be regular with itself), we can
adapt the ideas in Exercise 2.1.27. We omit the details.
By decreasing the 𝜀𝑖 ’s if needed (we can do this since a smaller sequence of 𝜀𝑖 ’s yields a
stronger conclusion), we may assume that 𝜀𝑖 ≤ 1/(10𝑖 2 ) and 𝜀 𝑖 ≤ 𝜀 0 /4 for every 𝑖 ≥ 1.
Let us apply the strong regularity lemma, Theorem 2.8.3, with equitable partitions (see
above Remark 2.8.6). That is, we have (we make the simplifying assumption that all partitions
are exactly equitable, to avoid unimportant technicalities):
• an equitable 𝜀0 -regular partition P = {𝑉1 , . . . , 𝑉𝑘 } of 𝑉 (𝐺) and
• an equitable 𝜀 𝑘 -regular partition Q refining P
satisfying
• 𝑞(Q) ≤ 𝑞(P) + 𝜀03 /8, and
• |Q| ≤ 𝑀 = 𝑀 (𝜀0 , 𝜀1 , . . . ).
Inside each part 𝑉𝑖 , let us choose a part 𝑊𝑖 of Q uniformly at random. Since |Q| ≤ 𝑀,
the equitability assumption implies that each part of Q has size ≥ 𝛿𝑛 for some constant
𝛿 = 𝛿(𝜀0 , 𝜀1 , . . . ). So (a) is satisfied.
Since Q is 𝜀 𝑘 -regular, all but an 𝜀 𝑘 -fraction of pairs of parts of Q are 𝜀 𝑘 -regular. Summing
over all 𝑖 < 𝑗, using linearity of expectations, the expected number of pairs (𝑊𝑖 , 𝑊 𝑗 ) that
are not 𝜀 𝑘 -regular is ≤ 𝜀 𝑘 𝑘 2 ≤ 1/10. It follows that with probability ≥ 9/10, (𝑊𝑖 , 𝑊 𝑗 ) is
𝜀 𝑘 -regular for all 𝑖 < 𝑗, so (b) is satisfied (this argument ignores 𝑖 = 𝑗 as mentioned at the
beginning of the proof).
Let X denote the number of pairs (i, j) ∈ [k]² with |d(V_i, V_j) − d(W_i, W_j)| > ε_0. Since q(Q) ≤ q(P) + (ε_0/2)³, by Lemma 2.8.7 and linearity of expectations, E X ≤ (ε_0/2)k². So by Markov's inequality, X ≤ ε_0 k² with probability ≥ 1/2, so that (c) is satisfied.
It follows that (a), (b), and (c) are all satisfied simultaneously with probability ≥ 1 − 1/10 − 1/2 > 0. Therefore,
there exist valid choices of 𝑊𝑖 ’s. □
(a) (𝑊𝑖 , 𝑊 𝑗 ) is 𝜀 ′ -regular for every 𝑖 ≤ 𝑗, with some sufficiently small constant 𝜀 ′ > 0
depending on 𝜀 and 𝐻,
(b) |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8 for all but < εk²/8 pairs (i, j) ∈ [k]², and
(c) |𝑊𝑖 | ≥ 𝛿0 𝑛, for some constant 𝛿0 depending only on 𝜀 and 𝐻.
Now we clean the graph. For each pair 𝑖 ≤ 𝑗 (including 𝑖 = 𝑗),
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≤ 𝜀/8, then remove all edges between (𝑉𝑖 , 𝑉 𝑗 ), and
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≥ 1 − 𝜀/8, then add all edges between (𝑉𝑖 , 𝑉 𝑗 ).
Note that we are not simply adding/removing edges within each pair (W_i, W_j), but rather across all of (V_i, V_j). To bound the number of edges added/deleted, recall (b) from the previous paragraph. If d(W_i, W_j) ≤ ε/8 and |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8, then d(V_i, V_j) ≤ ε/4, and the number of edges in all such (V_i, V_j) is at most εn²/4. Likewise for d(W_i, W_j) ≥ 1 − ε/8. For the remaining < εk²/8 pairs (i, j) not satisfying |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8, the total number of edges among all such pairs is at most εn²/8. All together, we added/deleted < εn² edges
from 𝐺. Call the resulting graph 𝐺 ′ . There are no irregular pairs (𝑊𝑖 , 𝑊 𝑗 ) for us to worry
about.
It remains to show that 𝐺 ′ is induced 𝐻-free. Suppose otherwise. Let us count induced
copies of 𝐻 in 𝐺 as in the proof of the graph removal lemma, Theorem 2.6.5. We have
some induced copy of 𝐻 in 𝐺 ′ , with each vertex 𝑣 ∈ 𝑉 (𝐻) embedded in 𝑉𝜙 (𝑣) for some
𝜙 : 𝑉 (𝐻) → [𝑘].
Consider a pair of distinct vertices u, v of H. If uv ∈ E(H), there must be an edge in G′ between V_{φ(u)} and V_{φ(v)} (here φ(u) and φ(v) are not necessarily different). So we must not have deleted all the edges in G between V_{φ(u)} and V_{φ(v)} in the cleaning step. By the cleaning algorithm above, this means that d_G(W_{φ(u)}, W_{φ(v)}) > ε/8.
Likewise, if uv ∉ E(H) for a pair of distinct u, v ∈ V(H), we have d_G(W_{φ(u)}, W_{φ(v)}) < 1 − ε/8.
Since (W_i, W_j) is ε′-regular in G for every i ≤ j, provided that ε′ is small enough (in terms of ε and H), the graph counting lemma (Theorem 2.6.2, with the induced variation as in Remark 2.6.3(b)) applied to G gives
$$\#\{\text{induced copies of } H \text{ in } G\} \ge (1 - \varepsilon) \Big( \frac{\varepsilon}{10} \Big)^{\binom{v(H)}{2}} (\delta_0 n)^{v(H)} =: \delta n^{v(H)}$$
(recall |W_i| ≥ δ_0 n). Setting δ as above, this contradicts the hypothesis that G has < δn^{v(H)} induced copies of H. Thus G′ must be induced H-free. □
Remark 2.8.12. The presence of ℎ0 may seem a bit strange at first. In the next section, we
will see a reformulation of this theorem in the language of property testing, where ℎ0 comes
up naturally.
Proof. The proof is mostly the same as the proof of the induced graph removal lemma that
we just saw. The main tricky issue here is how to choose the regularity parameter 𝜀 ′ for every
pair (𝑊𝑖 , 𝑊 𝑗 ) in condition (a) of the earlier proof. Previously, we did not use the full strength
of Theorem 2.8.9, which allowed 𝜀 ′ to depend on 𝑘, but now we are going to use it. Recall
that we had to make sure that this 𝜀 ′ was chosen to be small enough for the 𝐻-counting
lemma to work. Now that there are possibly infinitely many graphs in H , we cannot naively
choose 𝜀 ′ to be sufficiently small. The main point of the proof is to reduce the problem to a
finite subset of H for each 𝑘.
Define a template T to be an edge-coloring of the looped k-clique (i.e., a complete graph on k vertices together with a loop at every vertex) where each edge is colored by one of
{white, black, gray}. We say that a graph 𝐻 is compatible with a template 𝑇 if there exists a
map 𝜙 : 𝑉 (𝐻) → 𝑉 (𝑇) such that for every distinct pair 𝑢, 𝑣 of vertices of 𝐻:
• if 𝑢𝑣 ∈ 𝐸 (𝐻), then 𝜙(𝑢)𝜙(𝑣) is colored black or gray in 𝑇; and
• if 𝑢𝑣 ∉ 𝐸 (𝐻), then 𝜙(𝑢)𝜙(𝑣) is colored white or gray in 𝑇.
That is, a black edge in a template means an edge of 𝐻, a white edge means a non-edge of
𝐻, and a gray edge is a wildcard. An example is shown below.
[Figure: an example of a graph H, a map φ, and a template T whose edges are colored black, gray, and white.]
As another example, every graph is compatible with every completely gray template.
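To make the compatibility condition concrete, here is a brute-force sketch (our own code; the function and variable names are ours): it searches over all maps φ : V(H) → [k] and checks the black/white/gray constraints.

```python
import itertools

def compatible(H_vertices, H_edges, k, color):
    """Return True if H is compatible with the template given by `color`,
    where color[(i, j)] in {'black', 'white', 'gray'} for 1 <= i <= j <= k
    (loops included). H_edges is a set of frozensets {u, v}."""
    def col(i, j):
        return color[(min(i, j), max(i, j))]

    for phi in itertools.product(range(1, k + 1), repeat=len(H_vertices)):
        assignment = dict(zip(H_vertices, phi))
        ok = True
        for u, v in itertools.combinations(H_vertices, 2):
            c = col(assignment[u], assignment[v])
            if frozenset((u, v)) in H_edges:
                if c == 'white':      # an edge of H may not land on a white template edge
                    ok = False
                    break
            else:
                if c == 'black':      # a non-edge of H may not land on a black template edge
                    ok = False
                    break
        if ok:
            return True
    return False

# Example: H is the path a - b - c; template on k = 2 vertices.
H_vertices = ['a', 'b', 'c']
H_edges = {frozenset(('a', 'b')), frozenset(('b', 'c'))}
template = {(1, 1): 'white', (1, 2): 'black', (2, 2): 'gray'}
print(compatible(H_vertices, H_edges, 2, template))   # True (map a, c to 1 and b to 2)
```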
For every template 𝑇, pick some representative 𝐻𝑇 ∈ H compatible with 𝑇, as long as
such a representative exists (and ignore 𝑇 otherwise). A graph in H is allowed to be the
representative of more than one template. Let H_k be the set of all H ∈ H that arise as the
representative of some 𝑘-vertex template. Note that H𝑘 is finite since there are finitely many
𝑘-vertex templates. We can pick each 𝜀 𝑘 > 0 to be small enough so that the conclusion of
the counting step later can be guaranteed for all elements of H𝑘 .
Now we proceed nearly identically as in the proof of the induced removal lemma, Theo-
rem 2.8.1, that we just saw. In applying Theorem 2.8.9 to obtain the partition 𝑉1 ∪ · · · ∪ 𝑉𝑘
and finding 𝑊𝑖 ⊆ 𝑉𝑖 , we ensure the following condition instead of the earlier (a):
(a) (𝑊𝑖 , 𝑊 𝑗 ) is 𝜀 𝑘 -regular for every 𝑖 ≤ 𝑗.
We set ℎ0 to be the maximum number of vertices of a graph in H𝑘 .
Now we do the cleaning step. Along the way, we create a 𝑘-vertex template 𝑇 with vertex
set [k] corresponding to the parts {V_1, . . . , V_k} of the partition. For each 1 ≤ i ≤ j ≤ k,
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≤ 𝜀/4, then remove all edges between (𝑉𝑖 , 𝑉 𝑗 ) from 𝐺, and color the edge
𝑖 𝑗 in template 𝑇 white;
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≥ 1 − 𝜀/4, then add all edges between (𝑉𝑖 , 𝑉 𝑗 ), and color the edge 𝑖 𝑗 in
template 𝑇 black;
• otherwise, color the edge ij in the template T gray.
Finally, suppose some induced 𝐻 ∈ H remains in 𝐺 ′ . Due to our cleaning procedure, 𝐻
must be compatible with the template 𝑇. Then the representative 𝐻𝑇 ∈ H𝑘 of 𝑇 is a graph on
at most ℎ0 vertices, and furthermore, the counting lemma guarantees that, provided 𝜀 𝑘 > 0
is small enough (subject to a finite number of pre-chosen constraints, one for each element
of H𝑘 ), the number of copies of 𝐻𝑇 in 𝐺 is ≥ 𝛿𝑛𝑣 (𝐻𝑇 ) for some constant 𝛿 > 0 that only
depends on 𝜀 and H . This contradicts the hypothesis, and thus 𝐺 ′ is induced H -free. □
All the techniques above work nearly verbatim for a generalization to colored graphs.
The induced graph removal lemma corresponds to the special case 𝑟 = 2, with the two
colors representing edges and non-edges respectively.
these 𝐾 vertices, then output that 𝐺 is triangle-free; else output that 𝐺 is 𝜀-far from
triangle-free.
Probabilistic guarantees.
(a) If the input graph 𝐺 is triangle-free, then the algorithm always correctly outputs
that 𝐺 is triangle-free;
(b) If the input graph 𝐺 is 𝜀-far from triangle-free, then with probability ≥ 0.99 the
algorithm outputs that 𝐺 is 𝜀-far from triangle-free;
(c) We do not make any guarantees when the input graph is neither triangle-free nor
𝜀-far from triangle-free.
Remark 2.9.2. This is an example of a one-sided tester, meaning that it always (non-
probabilistically) outputs a correct answer when 𝐺 satisfies property P and only has a
probabilistic guarantee when 𝐺 does not satisfy property P. (In contrast, a two-sided tester
would have probabilistic guarantees for both situations.)
For a one-sided tester, there is nothing special about the number 0.99 above in (b). It can
be any positive constant 𝛿 > 0. If we run the algorithm 𝑚 times, then the probability of
success improves from ≥ 𝛿 to ≥ 1 − (1 − 𝛿) 𝑚 , which can be made arbitrarily close to 1 if we
choose 𝑚 large enough.
The probabilistic guarantee turns out to be essentially a rephrasing of the triangle removal
lemma.
Proof. If the graph G is triangle-free, the algorithm clearly always outputs correctly. On the other hand, if G is ε-far from triangle-free, then by the triangle removal lemma (Theorem 2.3.1), G has ≥ $\delta\binom{n}{3}$ triangles for some constant δ = δ(ε) > 0. If we sample three vertices from G uniformly at random, then they form a triangle with probability ≥ δ. And if we run K/3 independent trials, then the probability that we see a triangle is ≥ 1 − (1 − δ)^{K/3}, which is ≥ 0.99 as long as K is a sufficiently large constant (depending on δ, which in turn depends on ε).
In the algorithm as stated in the theorem, K vertices are sampled without replacement. Above we had K/3 independent trials of picking a triple of vertices at random. But this difference hardly matters. We can couple the two processes by adding additional random vertices to the latter process until we see K distinct vertices. □
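Here is a sketch of the tester in code (our own illustration; in practice the sample size K would be the constant supplied by the triangle removal lemma, and the value used below is arbitrary):

```python
import itertools
import random

def looks_triangle_free(adj, K, rng=random):
    """One-sided tester sketch: sample K vertices and report 'triangle-free'
    iff the induced subgraph on the sample contains no triangle.
    `adj` maps each vertex to the set of its neighbors."""
    vertices = list(adj)
    sample = rng.sample(vertices, min(K, len(vertices)))
    for x, y, z in itertools.combinations(sample, 3):
        if y in adj[x] and z in adj[x] and z in adj[y]:
            return False          # found a triangle: certainly not triangle-free
    return True                   # no triangle among the sampled vertices

# Toy usage: a complete graph (far from triangle-free) versus a star (triangle-free).
n = 200
complete = {v: set(range(n)) - {v} for v in range(n)}
star = {0: set(range(1, n)), **{v: {0} for v in range(1, n)}}
print(looks_triangle_free(complete, K=30))  # False: every sampled triple is a triangle
print(looks_triangle_free(star, K=30))      # always True
```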
Just as how the guarantee of the above algorithm is essentially a rephrasing of the triangle
removal lemma, other graph removal lemmas can be rephrased as graph property testing
theorems. For the infinite induced graph removal lemma, Theorem 2.8.11, we can rephrase
the result in terms of graph property testing for hereditary properties.
A graph property P is hereditary if it is closed under vertex-deletion: if 𝐺 ∈ P, then
every induced subgraph of 𝐺 is in P. Here are some examples of hereditary graph properties:
𝐻-free, induced 𝐻-free, planar, 3-colorable, perfect. Every hereditary property P can be
characterized as the set of induced H-free graphs for some (possibly infinite) family of graphs
H ; we can take H = {𝐻 : 𝐻 ∉ P}.
Recall Szemerédi’s theorem says that for every fixed 𝑘 ≥ 3, every 𝑘-AP-free subset of
[𝑁] has size 𝑜(𝑁). We will prove it as a corollary of the hypergraph removal lemma for
Corollary 2.10.2
If 𝐺 is a 3-graph such that every edge is contained in a unique tetrahedron (i.e., a clique
on four vertices), then 𝐺 has 𝑜(𝑛3 ) edges.
Can this result be used to prove the hypergraph removal lemma? Unfortunately, no.
Recall that our graph regularity recipe (Remark 2.3.2) involves three steps: partition, clean,
and count. It turns out that no counting lemma is possible for the above notion of 3-graph
regularity.
The notion of 𝜀-regularity is supposed to model pseudorandomness. So why don’t we
try truly random hypergraphs and see what happens? Let us consider two different random
3-graph constructions:
(a) First pick constants 𝑝, 𝑞 ∈ [0, 1] . Build a random graph 𝐺 (2) = G(𝑛, 𝑝), an ordinary
Erdős–Rényi graph. Then construct 𝐺 (3) by including each triangle of 𝐺 (2) as an
edge of 𝐺 (3) with probability 𝑞. Call this 3-graph 𝑋.
(b) For each possible edge (i.e. triple of vertices), include the edge with probability 𝑝 3 𝑞,
independent of all other edges. Call this 3-graph 𝑌 .
The edge density in both X and Y is close to p³q, even when restricted to linearly sized
triples of vertex subsets. So both graphs satisfy our above notion of 𝜀-regularity with high
probability. However, we can compute the tetrahedron densities in both of these graphs and
see that they do not match.
The tetrahedron density in X is around q⁴ times the K_4 density in the underlying random graph G^{(2)}. The K_4 density in G^{(2)} is around p⁶. So the tetrahedron density in X is around p⁶q⁴.
On the other hand, the tetrahedron density in Y is around (p³q)⁴, different from the p⁶q⁴
earlier. So we should not expect a counting lemma with this notion of 𝜀-regularity. (Unless
the 3-graph we are counting is linear, as in the exercise below.)
Exercise 2.11.3. Under the notion of 3-graph regularity in Definition 2.11.1, formulate
and prove an 𝐻-counting lemma for every linear 3-graph 𝐻. Here a hypergraph is said to
be linear if every pair of its edges intersects in at most one vertex.
As hinted by the first random hypergraph above, a more useful notion of hypergraph
regularity should involve both vertex subsets as well as subsets of vertex-pairs (i.e., an
underlying 2-graph).
Given a 3-graph 𝐺, a regularity decomposition will consist of
(1) a partition of the vertex pairs $\binom{V}{2}$ into 2-graphs $G^{(2)}_1 \cup \cdots \cup G^{(2)}_l$ so that G sits in a random-like way
on top of most triples of these 2-graphs (we won’t try to make it precise), and
(2) a partition of 𝑉 that gives an extremely regular partition for all 2-graphs 𝐺 1(2) , . . . , 𝐺 𝑙(2)
(this should be somewhat reminiscent of the strong graph regularity lemma from
Section 2.8).
For such a decomposition to be applicable, it should come with a corresponding counting
lemma.
There are several ways to make the above notions precise. Certain formulations make the regularity lemma easier to prove but the counting lemma harder, and vice versa.
The interested readers should consult Rödl et al. (2005), Gowers (2007) (see Gowers (2006)
for an exposition of the case of 3-uniform hypergraphs), and Tao (2006) for three different
approaches to the hypergraph regularity lemma.
Remark 2.11.4 (Quantitative bounds). Whereas the proof of the graph regularity lemma
gives tower-type bounds tower(𝜀 −𝑂 (1) ), the proof of the 3-graph regularity lemma has
wowzer-type bounds. The 4-graph regularity lemma moves us one more step up in the Ack-
ermann hierarchy (i.e., iterating wowzer), and so on. Just as with the tower-type lower bound
(Theorem 2.1.17) for the graph regularity lemma, Ackermann type bounds are necessary for
hypergraph regularity as well (Moshkovitz & Shapira 2019).
Further Reading
For surveys on the graph regularity method and applications, see Komlós & Simonovits
(1996) and Komlós, Shokoufandeh, Simonovits, & Szemerédi (2002).
The survey Graph Removal Lemmas by Conlon & Fox (2013) discusses many variants,
extensions, and proof techniques of graph removal lemmas.
For a well-motivated introduction to the hypergraph regularity lemma, see the article
Quasirandomness, Counting and Regularity for 3-Uniform Hypergraphs by Gowers (2006).
Chapter Summary
• Szemerédi’s graph regularity lemma. For every 𝜀 > 0, there exists a constant 𝑀 such
that every graph has an 𝜀-regular partition into at most 𝑀 parts.
– Proof method: energy increment.
• Regularity method recipe: partition, clean, count.
• Graph counting lemma. The number of copies of 𝐻 among 𝜀-regular parts is similar to
random.
• Graph removal lemma. Fix 𝐻. Every 𝑛-vertex graph with 𝑜(𝑛 𝑣 (𝐻 ) ) copies of 𝐻 can be
made 𝐻-free by removing 𝑜(𝑛2 ) edges.
• Roth’s theorem can be proved by applying the triangle removal lemma to a graph whose
triangles correspond to 3-APs.
• Szemerédi’s theorem follows from the hypergraph removal lemma, whose proof uses
the hypergraph regularity method (not covered in this book).
• Induced removal lemma. Fix H. Every n-vertex graph with o(n^{v(H)}) induced copies of H can be made induced H-free by adding/removing o(n²) edges.
– Proof uses a strong regularity lemma, which involves iterating the earlier graph
regularity lemma.
• Every hereditary graph property is testable.
– One can distinguish graphs that have property P from those that are 𝜀-far from property
P (far in the sense of edit distance ≥ εn²) by sampling a subgraph induced by a constant
number of random vertices.
– The probabilistic guarantee is essentially equivalent to removal lemmas.
3
Pseudorandom Graphs
Chapter Highlights
• Equivalent notions of graph quasirandomness
• Role of eigenvalues in pseudorandomness
• Expander mixing lemma
• Eigenvalues of abelian Cayley graphs and the Fourier transform
• Quasirandom groups and representation theory
• Quasirandom Cayley graphs and Grothendieck’s inequality
• Alon–Boppana bound on the second eigenvalue of a 𝑑-regular graph
In the previous chapter on the graph regularity method, we saw that every graph can
be partitioned into a bounded number of vertex parts so that the graph looks “random-
like” between most pairs of parts. In this chapter, we dive further into how a graph can be
random-like.
Pseudorandomness is a concept prevalent in combinatorics, theoretical computer science,
and in many other areas. It specifies how a non-random object can behave like a truly random
object.
Example 3.0.1 (Pseudorandom generators). Suppose you want to generate a random num-
ber on a computer. In most systems and programming languages, you can do this easily with
a single command (e.g., rand()). The output is not actually truly random. Instead, the output
came from a pseudorandom generator, which is some function/algorithm that takes a seed as
input, and passes it through some sophisticated function, so that there is no practical way to
distinguish the output from a truly random object. In other words, the output is not actually
truly random, but for all practical purposes the output cannot be distinguished from a truly
random output.
Example 3.0.2 (Primes). In number theory, the prime numbers behave like a random se-
quence in many ways. The celebrated Riemann hypothesis and its generalizations give quanti-
tative predictions about how closely the primes behave in a certain specific way like a random
sequence. There is also something called Cramér’s random model for the primes that allows
one to make predictions about the asymptotic density of certain patterns in the primes (e.g.,
how many twin primes up to 𝑁 are there?). Empirical data support these predictions, and
they have been proved in certain cases. Nevertheless, there are still notorious open problems
such as the twin prime and Goldbach conjectures. Despite their pseudorandom behavior, the
primes are not random!
Example 3.0.3 (Normal numbers). It is very much believed that the digits of 𝜋 behave in
a random-like way, where every digit or block of digits appear with frequency similar to
that of a truly random number. Such numbers are called normal. It is widely believed that numbers such as √2, π, and e are normal, but proofs remain elusive. Again, the digits of π
are deterministic, not random, but they are believed to behave pseudorandomly. On the other
hand, nearly all real numbers are normal, with the exceptions occupying only a measure zero
subset of the reals.
Coming back to graph theory. The Erdős–Rényi random graph 𝑮 (𝒏, 𝒑) is a random
𝑛-vertex graph where each edge appears with probability 𝑝 independently. Now, given some
specific graph (perhaps an instance of the random graph, or perhaps generated via some
other means), we can ask whether this graph, for the purpose of some intended application,
behaves similarly to a typical random graph. What are some useful ways to measure
the pseudorandomness of a graph? This is the main theme that we explore in this chapter.
Remark 3.1.3 (Single graph vs. a sequence of graphs). Strictly speaking, it does not make
sense to say whether a single graph is quasirandom, but we will abuse the definition as such
when it is clear that the graph we are referring to is part of a sequence.
Remark 3.1.4 (C4 condition). The C4 condition is surprising. It says that the 4-cycle density,
a single statistic, is equivalent to all the other quasirandomness conditions.
We will soon see below in Proposition 3.1.14 that the C4 condition can be replaced by the equivalent
condition that the number of labeled 4-cycles is ( 𝑝 4 + 𝑜(1))𝑛4 (rather than at most this
quantity).
Remark 3.1.5 (Checking quasirandomness). The discrepancy conditions are hard to verify
since they involve checking exponentially many sets. The other conditions can all be checked
in time polynomial in the size of the graph. So the equivalence gives us an algorithmically
efficient way to certify the discrepancy condition.
Remark 3.1.6 (Quantitative equivalences). Rather than stating these properties for a se-
quence of graphs using a decaying error term 𝑜(1), we can state a quantitative quasirandom-
ness hypothesis for a specific graph using an error tolerance parameter 𝜀. For example, we
can restate the discrepancy condition as follows.
DISC(𝜀): For all 𝑋, 𝑌 ⊆ 𝑉 (𝐺), |𝑒(𝑋, 𝑌 ) − 𝑝 |𝑋 | |𝑌 || < 𝜀𝑛2 .
Similar statements can be made for other quasirandom graph notions. The proof below
shows that these notions are equivalent up to a polynomial change in 𝜀; that is, for each pair
of properties, Prop1(𝜀) implies Prop2(𝐶𝜀 𝑐 ) for some constants 𝐶, 𝑐 > 0.
The following statement says that the 4-cycle density is always roughly at least as much as
random. Later in Chapter 5, we will see Sidorenko’s conjecture, which says that all bipartite
graphs have this property.
As a consequence, the C4 condition is equivalent to saying that the number of labeled
4-cycles is ( 𝑝 4 + 𝑜(1))𝑛4 (rather than at most).
Remark 3.1.15. Since all but 𝑂 (𝑛3 ) such closed walks use four distinct vertices, the above
statement implies that the number of labeled 4-cycles is at least ( 𝑝 4 − 𝑜(1))𝑛4 .
Proof. The number of closed walks of length 4 is
$$|\{(w, x, y, z) \text{ closed walk}\}| = \sum_{w, y} |\{x : w \sim x \sim y\}|^2$$
$$\ge \frac{1}{n^2} \Big( \sum_{w, y} |\{x : w \sim x \sim y\}| \Big)^2 = \frac{1}{n^2} \Big( \sum_{x} |\{(w, y) : w \sim x \sim y\}| \Big)^2 = \frac{1}{n^2} \Big( \sum_{x} (\deg x)^2 \Big)^2$$
$$\ge \frac{1}{n^4} \Big( \sum_{x} \deg x \Big)^4 = (2e(G))^4 / n^4 \ge p^4 n^4.$$
Here both inequality steps are due to Cauchy–Schwarz. Each line of the calculation can be accompanied by a pictorial depiction of what is being counted by the inner sum. These diagrams are a useful way to keep track of graph inequalities, especially when dealing with much larger graphs, where the algebraic expressions get unwieldy. Note that each application of the Cauchy–Schwarz inequality corresponds to “folding” the graph along a line of reflection. □
We shall prove the equivalences of Theorem 3.1.1 in the following way:
[Diagram: the chain of implications proved below among the properties DISC′, DISC, COUNT, C4, CODEG, and EIG.]
Proof that DISC implies DISC′. Take Y = X in DISC. (Note that e(X, X) = 2e(X) and $\binom{|X|}{2} = |X|^2/2 - O(n)$.) □
Proof that DISC′ implies DISC. We have the following “polarization identity”, together with a proof by picture (recall 2e(X) = e(X, X)):
$$e(X, Y) = e(X \cup Y) + e(X \cap Y) - e(X \setminus Y) - e(Y \setminus X).$$
[Figure: a proof by picture of the polarization identity, drawn in terms of the regions X ∩ Y, X \ Y, and Y \ X.]
We also have (below, the O(n³) error term is due to walks of length 4 that use repeated vertices)
$$\sum_{u, v} \operatorname{codeg}(u, v)^2 = \#\{\text{labeled } C_4\} + O(n^3) \le p^4 n^4 + o(n^4).$$
Thus, by the Cauchy–Schwarz inequality,
$$\frac{1}{n^2} \Big( \sum_{u, v} \big| \operatorname{codeg}(u, v) - p^2 n \big| \Big)^2 \le \sum_{u, v} \big( \operatorname{codeg}(u, v) - p^2 n \big)^2$$
$$= \sum_{u, v} \operatorname{codeg}(u, v)^2 - 2 p^2 n \sum_{u, v} \operatorname{codeg}(u, v) + p^4 n^4$$
$$\le p^4 n^4 - 2 p^2 n \cdot p^2 n^3 + p^4 n^4 + o(n^4) = o(n^4). \qquad \square$$
Remark 3.1.16. These calculations share the spirit of the second moment method in proba-
bilistic combinatorics. The condition C4 says that the variance of the codegree of two random
vertices is small.
Exercise 3.1.17. Show that if we modify the CODEG condition to
$$\sum_{u, v \in V(G)} \big( \operatorname{codeg}(u, v) - p^2 n \big) = o(n^3),$$
Proof that CODEG implies DISC. First, CODEG together with the assumption on the edge density implies that the degrees are nearly regular:
$$\sum_{x \in V(G)} (\deg x - pn)^2 = \sum_{u, v \in V(G)} \operatorname{codeg}(u, v) - 2pn \sum_{x \in V(G)} \deg x + p^2 n^3 = p^2 n^3 - 2 p^2 n^3 + p^2 n^3 + o(n^3) = o(n^3). \tag{3.1}$$
Now we bound the expression in DISC. We have
$$\frac{1}{n} \big( e(X, Y) - p|X||Y| \big)^2 = \frac{1}{n} \Big( \sum_{x \in X} \big( \deg(x, Y) - p|Y| \big) \Big)^2 \le \sum_{x \in X} \big( \deg(x, Y) - p|Y| \big)^2.$$
The above Cauchy–Schwarz step turned all the summands nonnegative, which allows us to expand the domain of summation from X to all of V = V(G) in the next step. Continuing,
$$\le \sum_{x \in V} \big( \deg(x, Y) - p|Y| \big)^2 = \sum_{x \in V} \deg(x, Y)^2 - 2p|Y| \sum_{x \in V} \deg(x, Y) + p^2 n |Y|^2$$
$$= \sum_{y, y' \in Y} \operatorname{codeg}(y, y') - 2p|Y| \sum_{y \in Y} \deg y + p^2 n |Y|^2$$
$$= |Y|^2 p^2 n - 2p|Y| \cdot |Y| p n + p^2 n |Y|^2 + o(n^3) \qquad \text{[by CODEG and (3.1)]}$$
$$= o(n^3). \qquad \square$$
Finally, let us consider the graph spectrum, which are eigenvalues of the graph adja-
cency matrix, accounting for eigenvalue multiplicities. Eigenvalues are core to the study of
pseudorandomness and they will play a central role in the rest of this chapter.
In this book, when we talk about the eigenvalues of a graph, we always mean the
eigenvalues of the adjacency matrix of the graph. In other contexts, it may be useful to
consider other related matrices, such as the Laplacian matrix, or a normalized adjacency
matrix.
We will generally only consider real symmetric matrices, whose eigenvalues are always
all real (Hermitian matrices also have this property). Our usual convention is to list all the
eigenvalues in order (including multiplicities): 𝜆1 ≥ 𝜆2 ≥ · · · ≥ 𝜆 𝑛 . We refer to 𝜆1 as the
top eigenvalue (or largest eigenvalue), and 𝜆𝑖 as the 𝒊-th eigenvalue (or the 𝒊-th largest
eigenvalue). The second eigenvalue plays an important role. We write 𝜆𝑖 ( 𝐴) for the 𝑖-th
eigenvalue of the matrix 𝐴 and 𝜆𝑖 (𝐺) = 𝜆𝑖 ( 𝐴𝐺 ) where 𝐴𝐺 is the adjacency matrix of 𝐺.
Remark 3.1.18 (Linear algebra review). For every 𝑛 × 𝑛 real symmetric matrix 𝐴 with
eigenvalues 𝜆1 ≥ · · · ≥ 𝜆 𝑛 , we can choose an eigenvector 𝑣 𝑖 ∈ R𝑛 for each eigenvalue 𝜆𝑖
(so that 𝐴𝑣 𝑖 = 𝜆𝑖 𝑣 𝑖 ) and such that {𝑣 1 , . . . , 𝑣 𝑛 } is an orthogonal basis of R𝑛 (this is false for
general non-symmetric matrices).
The Courant–Fischer min-max theorem is an important characterization of eigenvalues
in terms of a variational problem. Here we only state some consequences most useful for us.
We have
$$\lambda_1 = \max_{v \in \mathbb{R}^n \setminus \{0\}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
Once we have fixed a choice of an eigenvector v_1 for the top eigenvalue λ_1, we have
$$\lambda_2 = \max_{\substack{v \perp v_1 \\ v \in \mathbb{R}^n \setminus \{0\}}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
Proof. Let $\mathbf{1} \in \mathbb{R}^n$ be the all-1 vector. By the Courant–Fischer min-max theorem, the adjacency matrix A of the graph G has top eigenvalue
$$\lambda_1 = \sup_{\substack{x \in \mathbb{R}^n \\ x \ne 0}} \frac{\langle x, Ax \rangle}{\langle x, x \rangle} \ge \frac{\langle \mathbf{1}, A\mathbf{1} \rangle}{\langle \mathbf{1}, \mathbf{1} \rangle} = \frac{2e(G)}{v(G)} = \operatorname{avgdeg}(G). \qquad \square$$
Proof that C4 implies EIG. Again writing A for the adjacency matrix,
$$\sum_{i=1}^n \lambda_i^4 = \operatorname{tr} A^4 = \#\{\text{closed walks of length } 4\} \le p^4 n^4 + o(n^4).$$
On the other hand, by Lemma 3.1.20 above, we have 𝜆1 ≥ 𝑝𝑛 + 𝑜(𝑛). So we must have
𝜆1 = 𝑝𝑛 + 𝑜(𝑛) and max𝑖 ≥2 |𝜆𝑖 | = 𝑜(𝑛). □
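As a numerical sanity check (our own sketch), the spectrum of a sample of G(n, p) indeed has λ₁ close to pn, all other eigenvalues of order √n, and tr A⁴ close to p⁴n⁴:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 0.4

A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1)
A = A + A.T

eigenvalues = np.sort(np.linalg.eigvalsh(A))[::-1]
print(eigenvalues[0], p * n)                        # top eigenvalue is close to pn
print(np.max(np.abs(eigenvalues[1:])), np.sqrt(n))  # the rest are on the order of sqrt(n)
print(np.sum(eigenvalues ** 4), p ** 4 * n ** 4)    # tr A^4 is close to p^4 n^4
```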
This completes all the implications in the proof of Theorem 3.1.1.
Additional remarks
Remark 3.1.21 (Forcing graphs). The C4 hypothesis says that having 4-cycle density
asymptotically the same as random implies quasirandomness. Which other graphs besides
𝐶4 have this property?
Chung, Graham, & Wilson (1989) called a graph 𝐹 forcing if every graph with edge
density 𝑝 + 𝑜(1) and 𝐹-density 𝑝 𝑒 (𝐹 ) + 𝑜(1) (i.e., asymptotically the same as random) is
automatically quasirandom. Theorem 3.1.1 implies that 𝐶4 is forcing. Here is a conjectural
characterization of forcing graphs (Skokan & Thoma 2004; Conlon, Fox, & Sudakov 2010).
We will revisit this conjecture in Chapter 5 where we will reformulate it using the language
of graphons.
More generally, one says that a family of graphs F is forcing if having 𝐹-density being
𝑝 𝑒 (𝐹 ) + 𝑜(1) for each 𝐹 ∈ F implies quasirandomness. So {𝐾2 , 𝐶4 } is forcing. It seems to
be a difficult problem to classify forcing families.
Even though many other graphs can potentially play the role of the 4-cycle, the 4-cycle
nevertheless occupies an important role in the study of quasirandomness. The 4-cycle comes
up naturally in the proofs, as we will see below. It also is closely tied to other impor-
tant pseudorandomness measurements such as the Gowers 𝑈 2 uniformity norm in additive
combinatorics.
Let us formulate a bipartite analogue of Theorem 3.1.1 since we will need it later. It is
easy to adapt the above proofs to the bipartite version—we encourage the readers to think
about the differences between the two settings.
Remark 3.1.23 (Eigenvalues of bipartite graphs). Given a bipartite graph G with vertex bipartition V ∪ W, we can write its adjacency matrix as
$$A = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \tag{3.2}$$
where B is a |V| × |W| matrix with rows indexed by V and columns indexed by W. The eigenvalues λ_1 ≥ · · · ≥ λ_n of A always satisfy
$$\lambda_i = -\lambda_{n+1-i} \quad \text{for every } 1 \le i \le n.$$
In other words, the eigenvalues are symmetric around zero. One way to see this is that if x = (v, w) is an eigenvector of A with eigenvalue λ, where v ∈ R^V is the restriction of x to the first |V| coordinates, and w is the restriction of x to the last |W| coordinates, then
$$\begin{pmatrix} \lambda v \\ \lambda w \end{pmatrix} = \lambda x = Ax = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} Bw \\ B^{\intercal} v \end{pmatrix},$$
so that
$$Bw = \lambda v \quad \text{and} \quad B^{\intercal} v = \lambda w.$$
Then the vector x′ = (v, −w) satisfies
$$Ax' = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \begin{pmatrix} v \\ -w \end{pmatrix} = \begin{pmatrix} -Bw \\ B^{\intercal} v \end{pmatrix} = \begin{pmatrix} -\lambda v \\ \lambda w \end{pmatrix} = -\lambda x'.$$
So we can pair each eigenvalue of A with its negation.
Exercise 3.1.24. Using the notation from (3.2), show that the positive eigenvalues of the
adjacency matrix 𝐴 coincide with the positive singular values of 𝐵 (the singular values of
𝐵 are also the positive square roots of the eigenvalues of 𝐵 ⊺ 𝐵).
[Figure: a graph G and the bipartite graph G × K_2.]
Exercise 3.1.27. Show that a graph 𝐺 satisfies each property in Theorem 3.1.1 if and
only if 𝐺 × 𝐾2 satisfies the corresponding bipartite property in Theorem 3.1.25.
Like earlier, random bipartite graphs are bipartite quasirandom. The proof (omitted) is
essentially the same as Proposition 3.1.8 and Corollary 3.1.9.
Remark 3.1.29 (Sparse graphs). We stated quasirandom properties so far only for graphs
of constant order density (i.e., 𝑝 is a constant). Let us think about what happens if we allow
𝑝 = 𝑝 𝑛 to depend on 𝑛 and decaying to zero as 𝑛 → ∞. Such graphs are sometimes called
sparse (although some other authors reserve the word “sparse” for bounded degree graphs).
Theorems 3.1.1 and 3.1.25 as stated do hold for a constant 𝑝 = 0, but the results are not as
informative as we would like. For example, the error tolerance on the DISC is 𝑜(𝑛2 ), which
does not tell us much since the graph already has much fewer edges due to its sparseness
anyway.
To remedy the situation, the natural thing to do is to adjust the error tolerance relative to
the edge density 𝑝 = 𝑝 𝑛 → 0. Here are some representative examples (all of these properties
should also depend on 𝑝):
SparseDISC |𝑒(𝑋, 𝑌 ) − 𝑝 |𝑋 | |𝑌 || = 𝑜( 𝑝𝑛2 ) for all 𝑋, 𝑌 ⊆ 𝑉 (𝐺).
SparseCOUNT 𝐻 The number of labeled copies of 𝐻 is (1 + 𝑜(1)) 𝑝 𝑒 (𝐻 ) 𝑛𝑣 (𝐻 ) .
SparseC4 The number of labeled 4-cycles is at most (1 + 𝑜(1)) 𝑝 4 𝑛4 .
Exercise 3.1.31∗ (Quasirandomness through fixed sized subsets). Fix 𝑝 ∈ [0, 1]. Let
(𝐺 𝑛 ) be a sequence of graphs with 𝑣(𝐺 𝑛 ) = 𝑛 (here 𝑛 → ∞ along a subsequence of
integers).
(a) Fix a single α ∈ (0, 1). Suppose
$$e(S) = \frac{p \alpha^2 n^2}{2} + o(n^2) \quad \text{for all } S \subseteq V(G) \text{ with } |S| = \lfloor \alpha n \rfloor.$$
Prove that G is quasirandom.
(b) Fix a single 𝛼 ∈ (0, 1/2). Suppose
𝑒(𝑆, 𝑉 (𝐺) \ 𝑆) = 𝑝𝛼(1 − 𝛼)𝑛2 + 𝑜(𝑛2 ) for all 𝑆 ⊆ 𝑉 (𝐺) with |𝑆| = ⌊𝛼𝑛⌋ .
Prove that 𝐺 is quasirandom. Furthermore, show that the conclusion is false for
𝛼 = 1/2.
Exercise 3.1.32 (Quasirandomness and regularity partitions). Fix 𝑝 ∈ [0, 1]. Let (𝐺 𝑛 )
be a sequence of graphs with 𝑣(𝐺 𝑛 ) → ∞. Suppose that for every 𝜀 > 0, there exists
𝑀 = 𝑀 (𝜀) so that each 𝐺 𝑛 has an 𝜀-regular partition where all but 𝜀-fraction of vertex
pairs lie between pairs of parts with edge density 𝑝 + 𝑜(1) (as 𝑛 → ∞). Prove that 𝐺 𝑛 is
quasirandom.
Exercise 3.1.33∗ (Triangle counts on induced subgraphs). Fix 𝑝 ∈ (0, 1]. Let (𝐺 𝑛 ) be
a sequence of graphs with 𝑣(𝐺 𝑛 ) = 𝑛. Let 𝐺 = 𝐺 𝑛 . Suppose that for every 𝑆 ⊆ 𝑉 (𝐺),
the number of triangles in the induced subgraph G[S] is $p^3 \binom{|S|}{3} + o(n^3)$. Prove that G is
quasirandom.
Exercise 3.1.34∗ (Perfect matchings). Prove that there are constant 𝛽, 𝜀 > 0 such that
for every positive even integer 𝑛 and real 𝑝 ≥ 𝑛 −𝛽 , if 𝐺 is an 𝑛-vertex graph where every
vertex has degree (1 ± 𝜀) 𝑝𝑛 (meaning within 𝜀 𝑝𝑛 of 𝑝𝑛) and every pair of vertices has
codegree (1 ± 𝜀) 𝑝 2 𝑛, then 𝐺 has a perfect matching.
Remark 3.2.2 (Notation). Rather than saying “an (𝑛, 7, 6)-graph” we prefer to say “an
(𝑛, 𝑑, 𝜆)-graph with 𝑑 = 7 and 𝜆 = 6” for clarity as the name “(𝑛, 𝑑, 𝜆)” is quite standard
and recognizable.
Remark 3.2.3 (Linear algebra review). The operator norm of a matrix $A \in \mathbb{R}^{m \times n}$ is defined by
$$\|A\| = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{|Ax|}{|x|} = \sup_{\substack{x \in \mathbb{R}^n \setminus \{0\} \\ y \in \mathbb{R}^m \setminus \{0\}}} \frac{\langle y, Ax \rangle}{|x| \, |y|}.$$
Here $|x| = \sqrt{\langle x, x \rangle}$ denotes the length of the vector x. The operator norm of A is the maximum ratio by which A can amplify the length of a vector. If A is a real symmetric matrix, then
$$\|A\| = \max_i |\lambda_i(A)|.$$
For general matrices, the operator norm of A equals the largest singular value of A.
Here is the main result of this section.
On the left-hand side, (𝑑/𝑛) |𝑋 | |𝑌 | is the number of edges that one should expect between
𝑋 and 𝑌 purely based on the edge density 𝑑/𝑛 of the graph and the sizes of 𝑋 and 𝑌 . Note
that unlike the discrepancy condition (DISC) from quasirandom graphs (Theorem 3.1.1),
the error bound on the right-hand side depends on the sizes of 𝑋 and 𝑌. We can apply the
expander mixing lemma to small subsets 𝑋 and 𝑌 and still obtain useful estimates on 𝑒(𝑋, 𝑌 ),
unlike the dense quasirandom graph conditions.
Proof. Let J be the n × n all-1 matrix. Since the all-1 vector $\mathbf{1} \in \mathbb{R}^n$ is an eigenvector of A_G with eigenvalue d, we see that $\mathbf{1}$ is an eigenvector of $A_G - \frac{d}{n} J$ with eigenvalue 0. Any other eigenvector v of A_G, with v ⊥ 1, satisfies Jv = 0, and thus v is also an eigenvector of $A_G - \frac{d}{n} J$ with the same eigenvalue as in A_G. Therefore, the eigenvalues of $A_G - \frac{d}{n} J$ are obtained by taking the eigenvalues of A_G and then replacing the top eigenvalue d by zero. All the other eigenvalues of $A_G - \frac{d}{n} J$ are therefore at most λ in absolute value, so $\|A_G - \frac{d}{n} J\| \le \lambda$. Therefore,
$$\Big| e(X, Y) - \frac{d}{n} |X| |Y| \Big| = \Big| \Big\langle \mathbf{1}_X, \Big( A_G - \frac{d}{n} J \Big) \mathbf{1}_Y \Big\rangle \Big| \le \Big\| A_G - \frac{d}{n} J \Big\| \, |\mathbf{1}_X| \, |\mathbf{1}_Y| \le \lambda \sqrt{|X| |Y|}. \qquad \square$$
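A quick numerical check of the lemma (our own sketch, using the Petersen graph, a 3-regular graph on 10 vertices with λ = 2):

```python
import itertools
import numpy as np

# Petersen graph as the Kneser graph K(5, 2): vertices are the 2-element subsets
# of {0,...,4}, adjacent exactly when disjoint. Its spectrum is {3, 1 (x5), -2 (x4)},
# so it is an (n, d, lambda)-graph with n = 10, d = 3, lambda = 2.
V = list(itertools.combinations(range(5), 2))
A = np.array([[1.0 if not set(u) & set(v) else 0.0 for v in V] for u in V])

n, d = len(V), int(A[0].sum())
lam = sorted(abs(np.linalg.eigvalsh(A)))[-2]

X = [0, 1, 2, 3]            # two arbitrary vertex subsets (indices into V)
Y = [2, 3, 4, 5, 6, 7]
e_XY = sum(A[x, y] for x in X for y in Y)

print(abs(e_XY - d / n * len(X) * len(Y)), lam * np.sqrt(len(X) * len(Y)))
# The first number (the discrepancy) is at most the second (the mixing bound).
```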
Exercise 3.2.5. Prove the following strengthening of the expander mixing lemma.
We also have a bipartite analogue (the nomenclature used here is less standard). Recall
from Remark 3.1.23 that the eigenvalues of a bipartite graph are symmetric around zero.
In other words, a graph with edge-expansion ratio at least ℎ has the property that for every
nonempty subset of vertices 𝑆 with |𝑆| ≤ |𝑉 | /2, there are at least ℎ |𝑆| edges leaving 𝑆.
Cheeger’s inequality, stated below, tells us that among 𝑑-regular graphs for a fixed 𝑑,
having spectral gap bounded away from zero is equivalent to having edge-expansion ratio
bounded away from zero. Cheeger (1970) originally developed this inequality for Riemannian
manifolds. The graph theoretic analogue was proved by Dodziuk (1984), and independently
by Alon & Milman (1985) and Alon (1986).
The two bounds of Cheeger’s inequality are tight up to constant factors. For the lower
bound, taking G to be the skeleton of the d-dimensional cube with vertex set {0, 1}^d gives h = 1 (achieved by a (d − 1)-dimensional subcube) and κ = 2. For the upper bound, taking G to be an n-cycle gives h = 2/(n/2) = Θ(1/n) while d = 2 and κ = 2 − 2 cos(2π/n) = Θ(1/n²).
We call a family of 𝑑-regular graphs expanders if there is some constant 𝜅 0 > 0 so that
each graph in the family has spectral gap ≥ 𝜅0 ; by Cheeger’s inequality, this is equivalent to
the existence of some ℎ0 > 0 so that each graph in the family has edge expansion ratio ≥ ℎ0 .
Expander graphs are important objects in mathematics and computer science. For example,
expander graphs have rapid mixing properties, which are useful for designing efficient Monte
Carlo algorithms for sampling and estimation.
The following direction of Cheeger’s inequality is easier to prove. It is similar to the
expander mixing lemma.
Exercise 3.2.14 (Spectral gap implies expansion). Prove the 𝜅/2 ≤ ℎ part of Cheeger’s
inequality.
The other direction, $h \le \sqrt{2 d \kappa}$, is more difficult and interesting. The proof is outlined in
the following exercise.
Exercise 3.2.15 (Expansion implies spectral gap). Let 𝐺 = (𝑉, 𝐸) be a connected 𝑑-
regular graph with spectral gap 𝜅. Let 𝑥 = (𝑥 𝑣 ) 𝑣 ∈𝑉 ∈ R𝑉 be an eigenvector associated to
the second largest eigenvalue 𝜆2 = 𝑑 − 𝜅 of the adjacency matrix of 𝐺. Assume that 𝑥 𝑣 > 0
on at most half of the vertex set (or else we replace 𝑥 by −𝑥). Let 𝑦 = (𝑦 𝑣 ) 𝑣 ∈𝑉 ∈ R𝑉 be
obtained from 𝑥 by replacing all its negative coordinates by zero.
(a) Prove that
$$d - \frac{\langle y, Ay \rangle}{\langle y, y \rangle} \le \kappa.$$
Hint: Recall that $\lambda_2 x_v = \sum_{u \sim v} x_u$.
(b) Let
$$\Theta = \sum_{uv \in E} \big| y_u^2 - y_v^2 \big|.$$
Prove that
$$\Theta^2 \le 2d \big( d \langle y, y \rangle - \langle y, Ay \rangle \big) \langle y, y \rangle.$$
Hint: $y_u^2 - y_v^2 = (y_u - y_v)(y_u + y_v)$. Apply Cauchy–Schwarz.
Exercises
Exercise 3.2.16 (Independence numbers). Prove that every independent set in an (n, d, λ)-graph has size at most nλ/(d + λ).
Exercise 3.2.17 (Diameter). Prove that the diameter of an (𝑛, 𝑑, 𝜆)–graph is at most
⌈log 𝑛/log(𝑑/𝜆)⌉. (The diameter of a graph is the maximum distance between a pair of
vertices.)
Exercise 3.2.18 (Counting cliques). For each part below, prove that for every 𝜀 > 0, there
exists 𝛿 > 0 such that the conclusion holds for every (𝑛, 𝑑, 𝜆)-graph 𝐺 with 𝑑 = 𝑝𝑛.
(a) If 𝜆 ≤ 𝛿 𝑝 2 𝑛, then the number of triangles of 𝐺 is within a 1 ± 𝜀 factor of 𝑝 3 𝑛3 .
(b*) If 𝜆 ≤ 𝛿 𝑝 3 𝑛, then the number of 𝐾4 ’s in 𝐺 is within a 1 ± 𝜀 factor of 𝑝 6 𝑛4 .
In this section, we only consider abelian groups, specifically Z/𝑝Z for concreteness (though
everything here generalizes easily to all finite abelian groups). For abelian groups, we write
the group operation additively as 𝑔 + 𝑠. So edges join elements whose difference lies in 𝑆.
Remark 3.3.2. In later sections when we consider a non-abelian group Γ, one needs to
make a choice whether to define edges by left- or right-multiplication (i.e., 𝑔𝑠 or 𝑠𝑔; we
chose 𝑔𝑠 here). It does not matter which choice one makes (as long as one is consistent) since
the resulting Cayley graphs are isomorphic (why?). However, some careful bookkeeping is
sometimes required to make sure that later computations are consistent with the initial choice.
Example 3.3.3. Cay(Z/𝑛Z, {−1, 1}) is a cycle of length 𝑛. The graph for 𝑛 = 8 is shown
below.
Here is an explicitly constructed family of quasirandom graphs with edge density 1/2 +
𝑜(1).
Example 3.3.6. The Paley graphs for 𝑝 = 5 and 𝑝 = 13 are shown below.
[Figure: the Paley graphs Cay(Z/5Z, {±1}) and Cay(Z/13Z, {±1, ±3, ±4}).]
Remark 3.3.7 (Quadratic residues). Here we recall some facts from elementary number theory. For every odd prime p, the set $S = \{a^2 : a \in \mathbb{F}_p^\times\}$ of quadratic residues is a multiplicative subgroup of $\mathbb{F}_p^\times$ with index two. In particular, |S| = (p − 1)/2. We have −1 ∈ S if and only if p ≡ 1 (mod 4) (which is required to define a Cayley graph, as the generating set needs to be symmetric in the sense that S = −S).
We will show that Paley graphs are quasirandom by verifying the EIG condition, which
says that all eigenvalues, except the top one, are small. Here is a general formula for computing
the eigenvalues of any Cayley graph on Z/𝑝Z.
Remark 3.3.9 (Eigenvalues and the Fourier transform). The coordinates of the eigenvec-
For each g ∈ Z/nZ,
$$(A v_j)_g = \sum_{s \in S} (v_j)_{g+s} = \sum_{s \in S} \frac{\omega^{j(g+s)}}{\sqrt{n}} = \Big( \sum_{s \in S} \omega^{js} \Big) \frac{\omega^{jg}}{\sqrt{n}} = \lambda_j (v_j)_g.$$
So $A v_j = \lambda_j v_j$.
Next we check that {v_0, . . . , v_{n−1}} is an orthonormal basis. We have the inner product
$$\langle v_j, v_k \rangle = \frac{1}{n} \Big( 1 \cdot 1 + \overline{\omega^{j}} \omega^{k} + \overline{\omega^{2j}} \omega^{2k} + \cdots + \overline{\omega^{(n-1)j}} \omega^{(n-1)k} \Big) = \frac{1}{n} \Big( 1 + \omega^{k-j} + \omega^{2(k-j)} + \cdots + \omega^{(n-1)(k-j)} \Big) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases}$$
For the j ≠ k case, we use that $\sum_{i=0}^{m-1} \zeta^i = 0$ for any m-th root of unity ζ ≠ 1. So {v_0, . . . , v_{n−1}} is an orthonormal basis. □
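The eigenvalue formula is easy to verify numerically (our own sketch): build the adjacency matrix of Cay(Z/nZ, S) for a symmetric set S and compare its spectrum with the character sums λ_j = Σ_{s∈S} ω^{js}.

```python
import numpy as np

n = 12
S = {1, 3, n - 3, n - 1}          # a symmetric generating set: S = -S, 0 not in S

# Adjacency matrix of Cay(Z/nZ, S): g is adjacent to g + s for each s in S.
A = np.array([[1.0 if (j - i) % n in S else 0.0 for j in range(n)] for i in range(n)])

omega = np.exp(2j * np.pi / n)
formula = sorted(np.real(sum(omega ** (j * s) for s in S)) for j in range(n))
spectrum = sorted(np.linalg.eigvalsh(A))

print(np.allclose(formula, spectrum))   # True: the spectrum is {sum_{s in S} omega^{js}}
```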
Remark 3.3.10 (Real vs complex eigenbases). The adjacency matrix of a graph is a real
symmetric matrix, so all its eigenvalues are real, and it always has a real orthogonal eigenbasis.
The eigenbasis given in Theorem 3.3.8 is complex, but it can always be made real. Looking at the formulas in Theorem 3.3.8, we have λ_j = λ_{n−j}, and v_j is the complex conjugate of v_{n−j}. So we can form a real orthogonal eigenbasis by replacing, for each j ∉ {0, n/2}, the pair (v_j, v_{n−j}) by $\big( (v_j + v_{n-j})/\sqrt{2},\ i(v_j - v_{n-j})/\sqrt{2} \big)$. Equivalently, we can separate the real and imaginary parts of each v_j, which are both eigenvectors with eigenvalue λ_j. All the real eigenvalues and eigenvectors can be expressed in terms of sines and cosines.
Remark 3.3.11 (Every abelian Cayley graph has an eigenbasis independent of the generators). The above theorem and its proof generalize to all finite abelian groups, not just Z/nZ. For every finite abelian group Γ, we have a set $\widehat{\Gamma}$ of characters, where each character is a homomorphism χ : Γ → C^×. Then $\widehat{\Gamma}$ turns out to be a group isomorphic to Γ (one can check this by first writing Γ as a direct product of cyclic groups). For each $\chi \in \widehat{\Gamma}$, define the vector $v_\chi \in \mathbb{C}^\Gamma$ by setting the coordinate at g ∈ Γ to be $\chi(g)/\sqrt{|\Gamma|}$. Then $\{v_\chi : \chi \in \widehat{\Gamma}\}$ is an orthonormal eigenbasis for the adjacency matrix of every Cayley graph on Γ. The eigenvalue corresponding to $v_\chi$ is $\lambda_\chi(S) = \sum_{s \in S} \chi(s)$. Up to normalization, $\lambda_\chi(S)$ is the Fourier transform of the indicator function of S on the abelian group Γ (Theorem 3.3.8 is a special case of this construction). In particular, this eigenbasis $\{v_\chi : \chi \in \widehat{\Gamma}\}$ depends only on the finite abelian group and not on the generating set S. In other words, we have a simultaneous diagonalization for all adjacency matrices of Cayley graphs on a fixed finite abelian group.
If Γ is a non-abelian group, then there does not exist a simultaneous eigenbasis for all Cayley graphs on Γ. There is a corresponding theory of non-abelian Fourier analysis, which uses group representation theory. We will discuss more about non-abelian Cayley graphs in Section 3.4.
Now we apply the above formula to compute eigenvalues of Paley graphs. In particular,
the following tells us that Paley graphs satisfy the quasirandomness condition EIG from
Theorem 3.1.1.
Proof. Applying Theorem 3.3.8, we see that the eigenvalues are given by, for j = 0, 1, . . . , p − 1,
$$\lambda_j = \sum_{s \in S} \omega^{js} = \frac{1}{2} \Big( {-1} + \sum_{x \in \mathbb{F}_p} \omega^{j x^2} \Big),$$
since each quadratic residue 𝑠 appears as 𝑥 2 for exactly two non-zero 𝑥. Clearly 𝜆0 = ( 𝑝−1)/2.
For j ≠ 0, the next result shows that the inner sum on the right-hand side is ±√p (note that the above sum is real when p ≡ 1 (mod 4) since S = −S and so the sum equals its own complex conjugate; alternatively, the sum must be real since all eigenvalues of a symmetric matrix are real). □
Remark 3.3.13. Since the trace of the adjacency matrix is zero, and equals the sum of the eigenvalues, we see that the non-top eigenvalues are equally split between $(\sqrt{p} - 1)/2$ and $(-\sqrt{p} - 1)/2$.
Proof. We have
$$\Big| \sum_{x \in \mathbb{F}_p} \omega^{j x^2} \Big|^2 = \sum_{x, y \in \mathbb{Z}/p\mathbb{Z}} \omega^{j((x+y)^2 - x^2)} = \sum_{x, y \in \mathbb{Z}/p\mathbb{Z}} \omega^{j(2xy + y^2)}.$$
For each fixed y, the inner sum over x vanishes unless 2jy ≡ 0 (mod p), that is, unless y = 0, in which case it equals p. Hence the double sum equals p.
Recall the Legendre symbol:
$$\left( \frac{a}{p} \right) = \begin{cases} 0 & \text{if } a \equiv 0 \pmod{p}, \\ 1 & \text{if } a \text{ is a nonzero quadratic residue mod } p, \\ -1 & \text{if } a \text{ is a quadratic nonresidue mod } p. \end{cases}$$
Exercise 3.3.17. Prove that in a Paley graph of order p, every clique has size at most √p.
Exercise 3.3.18 (No spectral gap if too few generators). Prove that for every 𝜀 > 0 there
is some 𝑐 > 0 such that for every 𝑆 ⊆ Z/𝑛Z with 0 ∉ 𝑆 = −𝑆 and |𝑆| ≤ 𝑐 log 𝑛, the second
largest eigenvalue of the adjacency matrix of Cay(Z/𝑛Z, 𝑆) is at least (1 − 𝜀) |𝑆|.
Exercise 3.3.19∗. Let 𝑝 be a prime and let 𝑆 be a multiplicative subgroup of F×𝑝 . Suppose
−1 ∈ S. Prove that all eigenvalues of the adjacency matrix of Cay(Z/pZ, S), other than the top one, are at most √p in absolute value.
Remark 3.4.2 (Representations of finite groups). We need some basic concepts from group
representation theory in this section—mostly just some definitions. Feel free to skip this
remark if you have already seen group representations before.
Given a finite group Γ, it is often useful to study its actions as linear transformations on
some vector space. For example, if Γ is a cyclic or dihedral group, it is natural to think of
elements of Γ as rotations and reflections of a plane, which are linear transformations on R².
The theory turns out to be much nicer over C than R since C is algebraically closed. We are
interested in ways that Γ can be represented as a group of linear transformations acting on
some C𝑑 .
A representation of a finite group Γ is a group homomorphism 𝜌 : Γ → GL(𝑉), where
𝑉 is a complex vector space (everything will take place over C) and GL(𝑉) is the group of
invertible linear transformations of 𝑉. We sometimes omit 𝜌 from the notation and just say
that 𝑉 is a representation of Γ, and also that Γ acts on 𝑉 (via 𝜌). For each 𝑔 ∈ Γ and 𝑣 ∈ 𝑉,
we write 𝑔𝑣 = 𝜌(𝑔)𝑣 for the image of the 𝑔-action on 𝑣. We write dim 𝜌 = dim 𝑉 for the
dimension of the representation.
The fact that 𝜌 : Γ → GL(𝑉) is a group homomorphism means that the action of Γ on 𝑉 is
compatible with group operations in Γ in the following sense: if 𝑔, ℎ ∈ Γ, then the expression
𝑔ℎ𝑥 does not depend on whether we first apply ℎ to 𝑥 and then 𝑔 to ℎ𝑥, or if we first multiply
𝑔 and ℎ in Γ and then apply their product 𝑔ℎ to 𝑥.
For example, suppose Γ is a subgroup of permutations of [𝑛], with each element 𝑔 ∈ Γ
viewed as a permutation 𝑔 : [𝑛] → [𝑛]. We can define a representation of Γ on C𝑛 by letting
Γ permute the coordinates: for any 𝑥 = (𝑥 1 , . . . , 𝑥 𝑛 ) ∈ C𝑛 , set 𝑔𝑥 = (𝑥 𝑔 (1) , . . . , 𝑥 𝑔 (𝑛) ). As
an element of GL(𝑛, C), 𝜌(𝑔) is the 𝑛 × 𝑛 permutation matrix of the permutation 𝑔, and
𝑔𝑥 = 𝜌(𝑔)𝑥 for each 𝑥 ∈ C𝑛 .
We say that the representation 𝑉 of Γ is trivial if 𝑔𝑣 = 𝑣 for all 𝑔 ∈ Γ and 𝑣 ∈ 𝑉, and
non-trivial otherwise.
We say that a subspace 𝑊 of 𝑉 is 𝚪-invariant if 𝑔𝑤 ∈ 𝑊 for all 𝑤 ∈ 𝑊. In other words,
the image of 𝑊 under Γ is contained in 𝑊 (and actually must equal 𝑊 due to the invertibility
of group elements). Then 𝑊 is a representation of Γ, and we call it a subrepresentation of 𝑉.
For an introduction to group representation theory, see any standard textbook, such as the classic Linear Representations of Finite Groups by Serre (1977). Also, the lecture notes Representation Theory of Finite Groups, and Applications by Wigderson (2012) are a friendly introduction with applications to combinatorics and theoretical computer science.
Recall from Definition 3.2.1 that an (𝒏, 𝒅, 𝝀)-graph is an 𝑛-vertex 𝑑-regular graph all of
whose eigenvalues, except the top one, are at most 𝜆 in absolute value.
The main theorem of this section, below, says that a group with no small non-trivial
representations always produces quasirandom Cayley graphs (Gowers 2008).
Therefore
$$|\mu| \le \sqrt{\frac{d(n - d)}{K}} < \sqrt{\frac{dn}{K}}. \qquad \square$$
The above proof can be modified to prove a bipartite version, which will be useful for
certain applications.
Given a finite group Γ and a subset 𝑆 ⊆ Γ (not necessarily symmetric), we define the
bipartite Cayley graph BiCay(𝚪, 𝑺) as the bipartite graph with vertex set Γ on both parts,
with an edge joining 𝑔 on the left with 𝑔𝑠 on the right for every 𝑔 ∈ Γ and 𝑠 ∈ 𝑆.
In other words, the second largest eigenvalue of the adjacency matrix of this bipartite Cayley graph is less than $\sqrt{nd/K}$.
Exercise 3.4.8. Prove Theorem 3.4.7.
As an application of the expander mixing lemma, we show that in a quasirandom group,
the number of solutions to 𝑥𝑦 = 𝑧 with 𝑥, 𝑦, 𝑧 lying in three given sets 𝑋, 𝑌 , 𝑍 ⊆ Γ is close to
what one should predict from density alone. Note that the right-hand side expression below
is relatively small if 𝐾 2 is large compared to |𝑋 | |𝑌 | |𝑍 | /|Γ| 3 (e.g., if 𝑋, 𝑌 , 𝑍 each occupy at
least a constant proportion of the group, and 𝐾 tends to infinity).
[Figure: solutions to xy = z as edges of BiCay(Γ, Y) between X ⊆ Γ on the left and Z ⊆ Γ on the right; x is joined to z exactly when y = x⁻¹z ∈ Y.]
By Theorem 3.4.7, BiCay(Γ, Y) is a bipartite-(n, d, λ)-graph with n = |Γ|, d = |Y|, and some $\lambda < \sqrt{|\Gamma| |Y| / K}$. The above inequality then follows from applying the bipartite expander mixing lemma, Theorem 3.2.9, to BiCay(Γ, Y). □
Proof. If there is no solution to 𝑥𝑦 = 𝑧, then the left-hand side of the inequality in Theo-
rem 3.4.9 is |𝑋 | |𝑌 | |𝑍 | /|Γ|. Rearranging gives the result. □
The above result already shows that all product-free subsets of a quasirandom group must
be small. This sharply contrasts the abelian setting. For example, in Z/𝑛Z (written additively),
there is a sum-free subset of size around 𝑛/3 consisting of all group elements strictly between
𝑛/3 and 2𝑛/3.
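As a quick illustration, the following short Python check (an illustrative sketch; the values of 𝑛 are arbitrary choices) verifies that the middle-third construction is sum-free in Z/𝑛Z and has size about 𝑛/3.

```python
# Check that {x : n/3 < x < 2n/3} is sum-free in Z/nZ, i.e., it contains
# no solution to a + b = c (mod n) with a, b, c all in the set.
def middle_third(n):
    return {x for x in range(n) if n < 3 * x < 2 * n}

def is_sum_free(A, n):
    return all((a + b) % n not in A for a in A for b in A)

for n in [10, 30, 31, 100, 101]:   # arbitrary test values
    A = middle_third(n)
    print(n, len(A), round(len(A) / n, 3), is_sum_free(A, n))
```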
Exercise 3.4.11 (Growth and expansion in quasirandom groups). Let Γ be a finite group
with no non-trivial representations of dimension less than 𝐾. Let 𝑋, 𝑌 , 𝑍 ⊆ Γ. Suppose
|𝑋 | |𝑌 | |𝑍 | ≥ |Γ| 3 /𝐾. Then 𝑋𝑌 𝑍 = Γ (i.e., every element of Γ can be expressed as 𝑥𝑦𝑧 for
some (𝑥, 𝑦, 𝑧) ∈ 𝑋 × 𝑌 × 𝑍).
Recall that the special linear group SL(2, 𝑝) is the group of 2 × 2 matrices (under multi-
plication) with determinant 1:
SL(2, 𝑝) = { ( 𝑎 𝑏 ; 𝑐 𝑑 ) : 𝑎, 𝑏, 𝑐, 𝑑 ∈ F_𝑝 , 𝑎𝑑 − 𝑏𝑐 = 1 },
where ( 𝑎 𝑏 ; 𝑐 𝑑 ) denotes the 2 × 2 matrix with rows (𝑎, 𝑏) and (𝑐, 𝑑).
The projective special linear group PSL(2, 𝑝) is a quotient of SL(2, 𝑝) by all scalars; that is,
PSL(2, 𝑝) = SL(2, 𝑝)/{±𝐼} .
The following result is due to Frobenius.
Proof. The claim is trivial for 𝑝 = 2, so we can assume that 𝑝 is odd. It suffices to prove the
claim for SL(2, 𝑝). Indeed, any non-trivial representation of PSL(2, 𝑝) can be made into a
representation of SL(2, 𝑝) by first passing through the quotient SL(2, 𝑝) → SL(2, 𝑝)/{±𝐼} =
PSL(2, 𝑝).
Now suppose 𝜌 is a non-trivial representation of SL(2, 𝑝). The group SL(2, 𝑝) is generated
by the elements (Exercise: check!)
𝑔 = ( 1 1 ; 0 1 )   and   ℎ = ( 1 0 ; −1 1 ).
These two elements are conjugate in SL(2, 𝑝) via 𝑧 = ( 1 1 ; −1 0 ), as 𝑔𝑧 = 𝑧ℎ. If 𝜌(𝑔) = 𝐼,
then 𝜌(ℎ) = 𝐼 by conjugation, and 𝜌 would be trivial since 𝑔 and ℎ generate the group. So,
𝜌(𝑔) ≠ 𝐼. Since 𝑔 𝑝 = 𝐼, we have 𝜌(𝑔) 𝑝 = 𝐼. So 𝜌(𝑔) is diagonalizable (here we use that a
matrix is diagonalizable if and only if its minimal polynomial has distinct roots, and that the
minimal polynomial of 𝜌(𝑔) divides 𝑋 𝑝 − 1). Since 𝜌(𝑔) ≠ 𝐼, 𝜌(𝑔) has an eigenvalue 𝜆 ≠ 1.
Since 𝜌(𝑔) 𝑝 = 𝐼, 𝜆 is a primitive 𝑝-th root of unity.
For every 𝑎 ∈ F_𝑝^×, 𝑔 is conjugate to
( 𝑎 0 ; 0 𝑎^{−1} ) ( 1 1 ; 0 1 ) ( 𝑎^{−1} 0 ; 0 𝑎 ) = ( 1 𝑎² ; 0 1 ) = 𝑔^{𝑎²}.
Thus 𝜌(𝑔) is conjugate to 𝜌(𝑔)^{𝑎²}. Hence these two matrices have the same set of eigenvalues.
So 𝜆^{𝑎²} is an eigenvalue of 𝜌(𝑔) for every 𝑎 ∈ F_𝑝^×, and by ranging over all 𝑎 ∈ F_𝑝^×, this gives
( 𝑝 − 1)/2 distinct eigenvalues of 𝜌(𝑔) (recall that 𝜆 is a primitive 𝑝-th root of unity, and 𝑎²
ranges over the ( 𝑝 − 1)/2 quadratic residues of F_𝑝^×). It follows that dim 𝜌 ≥ ( 𝑝 − 1)/2. □
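The matrix identities used in this proof can also be verified numerically; the following Python sketch (with an arbitrarily chosen small prime) checks that 𝑔𝑧 = 𝑧ℎ, that 𝑔 has order 𝑝, and that conjugating 𝑔 by diag(𝑎, 𝑎^{−1}) gives 𝑔^{𝑎²}.

```python
# Verify the matrix identities from the proof over F_p (p is an arbitrary small prime).
p = 13

def mat_mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

def mat_pow(A, e):
    R = [[1, 0], [0, 1]]
    for _ in range(e):
        R = mat_mult(R, A)
    return R

g = [[1, 1], [0, 1]]
h = [[1, 0], [p - 1, 1]]          # -1 is represented as p - 1
z = [[1, 1], [p - 1, 0]]

assert mat_mult(g, z) == mat_mult(z, h)           # g and h are conjugate via z
assert mat_pow(g, p) == [[1, 0], [0, 1]]          # g has order p

for a in range(1, p):
    a_inv = pow(a, p - 2, p)                      # inverse of a in F_p
    d = [[a, 0], [0, a_inv]]
    d_inv = [[a_inv, 0], [0, a]]
    assert mat_mult(mat_mult(d, g), d_inv) == mat_pow(g, (a * a) % p)

print("all identities verified for p =", p)
```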
Applying Corollary 3.4.10 with Theorem 3.4.13 yields the following corollary (Gowers
2008). Note that the order of PSL(2, 𝑝) is ( 𝑝 3 − 𝑝)/2.
Before Gowers’ work, it was not known whether every order 𝑛 group has a product-free
subset of size ≥ 𝑐𝑛 for some absolute constant 𝑐 > 0 (this was Question 3.4.1, asked by
Babai and Sós). Gowers’ result shows that the answer is no.
In the other direction, Kedlaya (1997; 1998) showed that every finite group of order 𝑛
has a product-free subset of size ≳ 𝑛11/14 . In fact, he showed that if the group has a proper
subgroup 𝐻 of index 𝑚, then there is a product-free subset that is a union of ≳ 𝑚 1/2 cosets
of 𝐻.
To see that REP implies QUOTIENT, note that any non-trivial representation of Γ/𝐻 is
automatically a representation of Γ after passing through the quotient. Furthermore, every
non-trivial abelian group has a non-trivial 1-dimensional representation, and every group
of order 𝑚 > 1 has a non-trivial representation of dimension < √𝑚. For the proof of the
converse, see Gowers (2008, Theorem 4.8). (This implication has an exponential dependence
of parameters.)
Remark 3.4.17 (Non-abelian Fourier analysis). (This is an advanced remark and can be
skipped over.) Section 3.3 discussed the Fourier transform on finite abelian groups. The topic
of this section can be alternatively viewed through the lens of the non-abelian Fourier
transform. We refer to Wigderson (2012) for a tutorial on the non-abelian Fourier transform
from a combinatorial perspective.
Let us give here the recipe for computing the eigenvalues and an orthonormal basis of
eigenvectors of Cay(Γ, 𝑆).
For each irreducible representation 𝜌 of Γ (always working over C), let
𝑀_𝜌 := ∑_{𝑠∈𝑆} 𝜌(𝑠),
viewed as a (dim 𝜌) × (dim 𝜌) matrix over C. Then 𝑀_𝜌 has dim 𝜌 eigenvalues 𝜆_{𝜌,1}, . . . , 𝜆_{𝜌,dim 𝜌}.
The eigenvalues of the adjacency matrix of Cay(Γ, 𝑆) are then obtained by listing each 𝜆_{𝜌,𝑖}
with multiplicity dim 𝜌, ranging over all irreducible representations 𝜌 and all 1 ≤ 𝑖 ≤ dim 𝜌.
To emphasize, the eigenvalues always come in bundles with multiplicities determined by
the dimensions of the irreducible representations of Γ (although it is possible for there to be
additional coalescence of eigenvalues).
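This recipe can be checked numerically on a small example. The sketch below (an illustration; the connection set 𝑆 is an arbitrary symmetric choice) uses Γ = 𝑆_3, whose irreducible representations are the trivial representation, the sign representation, and a 2-dimensional representation obtained by restricting the permutation action on C³ to the plane orthogonal to the all-1 vector.

```python
import itertools
import numpy as np

perms = list(itertools.permutations(range(3)))        # elements of S_3
index = {g: i for i, g in enumerate(perms)}

def compose(g, h):                                    # (g h)(i) = g(h(i))
    return tuple(g[h[i]] for i in range(3))

def sign(g):                                          # parity of the permutation
    inv = sum(1 for i in range(3) for j in range(i + 1, 3) if g[i] > g[j])
    return 1 if inv % 2 == 0 else -1

# An arbitrary symmetric connection set: two transpositions and the two 3-cycles.
S = [(1, 0, 2), (0, 2, 1), (1, 2, 0), (2, 0, 1)]

A = np.zeros((6, 6))                                  # adjacency matrix of Cay(S_3, S)
for g in perms:
    for s in S:
        A[index[g], index[compose(g, s)]] += 1

# 2-dimensional irrep: permutation matrices restricted to the plane orthogonal to (1,1,1).
Q = np.linalg.qr(np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]]))[0]
def rho(g):
    P = np.zeros((3, 3))
    for i in range(3):
        P[g[i], i] = 1.0                              # permutation matrix of g
    return Q.T @ P @ Q

M = sum(rho(s) for s in S)                            # M_rho for the 2-dim irrep
bundle = [len(S), sum(sign(s) for s in S)] + list(np.linalg.eigvals(M).real) * 2
print(sorted(np.round(np.linalg.eigvalsh(A), 6)))     # spectrum of the Cayley graph
print(sorted(np.round(np.real(bundle), 6)))           # same multiset, assembled from the irreps
```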
One can additionally recover a system of eigenvectors of Cay(Γ, 𝑆). For each eigenvector
𝑣 with eigenvalue 𝜆 of 𝑀_𝜌, and every 𝑤 ∈ C^{dim 𝜌}, set 𝑥^{𝜌,𝑣,𝑤} ∈ C^Γ with coordinates
𝑥^{𝜌,𝑣,𝑤}_𝑔 = ⟨𝜌(𝑔)𝑣, 𝑤⟩
for all 𝑔 ∈ Γ. Then 𝑥^{𝜌,𝑣,𝑤} is an eigenvector of Cay(Γ, 𝑆) with eigenvalue 𝜆. Now let 𝜌 range over
all irreducible representations of Γ, let 𝑣 range over an orthonormal basis of eigenvectors
of 𝑀_𝜌 (with 𝜆 the corresponding eigenvalue), and let 𝑤 range over an orthonormal basis of
C^{dim 𝜌}; then 𝑥^{𝜌,𝑣,𝑤} ranges over an orthogonal system of eigenvectors of
Cay(Γ, 𝑆). The eigenvalue associated to 𝑥^{𝜌,𝑣,𝑤} is 𝜆.
A basic theorem in representation theory tells us that the regular representation decom-
poses into a direct sum of dim 𝜌 copies of 𝜌 ranging over every irreducible representation
𝜌 of Γ. This decomposition then corresponds to a block diagonalization (simultaneously for
all 𝑆) of the adjacency matrix of Cay(Γ, 𝑆) into blocks 𝑀𝜌 , repeated dim 𝜌 times, for each
𝜌. The above statement comes from interpreting this block diagonalization.
The matrix 𝑀𝜌 , appropriately normalized, is the non-abelian Fourier transform of the
indicator vector of 𝑆 at 𝜌. Many basic and important formulas for Fourier analysis over
abelian groups, e.g., inversion and Parseval (which we will see in Chapter 6), have non-abelian
analogs.
In Section 3.1, we saw that when 𝑑 grows linearly in 𝑛, these two conditions are equivalent.
Proof. In an (𝑛, 𝑑, 𝜆)-graph with 𝜆 ≤ 𝜀𝑑, the expander mixing lemma (Theorem 3.2.4) gives,
for all vertex subsets 𝑋 and 𝑌,
|𝑒(𝑋, 𝑌) − (𝑑/𝑛) |𝑋| |𝑌|| ≤ 𝜆 √(|𝑋| |𝑌|) ≤ 𝜀𝑑 √(|𝑋| |𝑌|) ≤ 𝜀𝑑𝑛.
So the graph satisfies SparseDISC(𝜀). □
The converse fails badly. Consider the disjoint union of a large random 𝑑-regular graph
and a 𝐾 𝑑+1 (here 𝑑 = 𝑜(𝑛)).
This graph satisfies SparseDISC(𝑜(1)) since it is satisfied by the large component, and the
small component 𝐾 𝑑+1 contributes negligibly to discrepancy due to its size. On the other
hand, each connected component contributes an eigenvalue of 𝑑 (by taking the all-1 vector
supported on each component), and so SparseEIG(𝜀) fails for any 𝜀 < 1.
The main result of this section is that despite the above example, if we restrict ourselves
to Cayley graphs (abelian or non-abelian), SparseDISC(𝜀) and SparseEIG(𝜀) are always
equivalent up to a linear change in 𝜀. This result is due to Conlon & Zhao (2017).
As in Section 3.4, we prove the above result more generally for vertex-transitive graphs
(see Definition 3.4.5).
Grothendieck’s inequality
The proof of the above theorem leads us to the following important inequality from functional
analysis due to Grothendieck (1953).
Given a matrix 𝐴 = (𝑎_{𝑖,𝑗}) ∈ R^{𝑚×𝑛}, we can consider its ℓ_∞ → ℓ_1 norm
sup_{∥𝑦∥_∞ ≤ 1} ∥𝐴𝑦∥_{ℓ_1},
which can also be written as (exercise: check! Also see Lemma 4.5.3 for a related fact about
the cut norm of graphons)
sup_{𝑥 ∈ {−1,1}^𝑚, 𝑦 ∈ {−1,1}^𝑛} ⟨𝑥, 𝐴𝑦⟩ = sup_{𝑥_1,...,𝑥_𝑚, 𝑦_1,...,𝑦_𝑛 ∈ {−1,1}} ∑_{𝑖=1}^{𝑚} ∑_{𝑗=1}^{𝑛} 𝑎_{𝑖,𝑗} 𝑥_𝑖 𝑦_𝑗 .    (3.3)
Now relax (3.3) by allowing the signs 𝑥_𝑖 and 𝑦_𝑗 to be replaced by vectors:
sup ∑_{𝑖=1}^{𝑚} ∑_{𝑗=1}^{𝑛} 𝑎_{𝑖,𝑗} ⟨𝑥_𝑖, 𝑦_𝑗⟩ ,    (3.4)
where the supremum is taken over vectors 𝑥_1, . . . , 𝑥_𝑚, 𝑦_1, . . . , 𝑦_𝑛 in the unit ball of some
real Hilbert space, whose norm is denoted by ∥ ∥. Without loss of generality, we can
assume that these vectors lie in R^{𝑚+𝑛} with the usual Euclidean norm (here 𝑚 + 𝑛 dimensions
are enough since 𝑥_1, . . . , 𝑥_𝑚, 𝑦_1, . . . , 𝑦_𝑛 span a real subspace of dimension at most 𝑚 + 𝑛).
We always have
(3.3) ≤ (3.4)
by restricting the vectors in (3.4) to R. There are efficient algorithms (both in theory and in
practice) using semidefinite programming to solve (3.4), whereas no efficient algorithm is
believed to exist for computing (3.3) (Alon & Naor 2006).
Grothendieck’s inequality says that this semidefinite relaxation never loses more than a
constant factor.
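The identification of the ℓ_∞ → ℓ_1 norm with (3.3) (the "exercise: check!" above) can be confirmed by brute force on small matrices; here is a minimal sketch (the matrix size and random seed are arbitrary choices).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 5
A = rng.standard_normal((m, n))

# sup over ||y||_inf <= 1 of ||A y||_1: the objective is convex in y,
# so the supremum over the cube is attained at a sign vector y.
norm_via_y = max(np.abs(A @ np.array(y)).sum()
                 for y in itertools.product([-1, 1], repeat=n))

# The bilinear form (3.3), brute-forced over all sign vectors x and y.
bilinear = max(np.array(x) @ A @ np.array(y)
               for x in itertools.product([-1, 1], repeat=m)
               for y in itertools.product([-1, 1], repeat=n))

print(norm_via_y, bilinear)    # the two quantities agree
```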
Remark 3.5.6. The optimal constant 𝐾 is known as the real Grothendieck’s constant. Its
exact value is unknown. It is known to lie within [1.676, 1.783]. There is also a complex ver-
sion of Grothendieck's inequality, where the left-hand side uses a complex Hilbert space (with
an absolute value placed around the final sum). The corresponding complex Grothendieck's
constant is known to lie within [1.338, 1.405].
We will not prove Grothendieck’s inequality here. See Alon & Naor (2006) for three proofs
of the inequality, along with algorithmic discussions.
The final step follows from Grothendieck’s inequality (applied with 𝐾 ≤ 2) along with (3.5).
This completes the proof of SparseEIG(8𝜀). □
We will see two different proofs. The first proof (Nilli 1991) constructs an eigenvector
explicitly. The second proof (only for Corollary 3.6.3) uses the trace method to bound
moments of the eigenvalues via counting closed walks.
[Figure: the test vector 𝑥 in the proof of Lemma 3.6.4. Vertices are grouped into 𝑉_0, 𝑉_1, 𝑉_2, 𝑉_3, . . . according to their distance from the edge 𝑠𝑡, and 𝑥_𝑣 takes the values 1, (𝑑 − 1)^{−1/2}, (𝑑 − 1)^{−1}, (𝑑 − 1)^{−3/2}, . . . on these sets.]
Proof. Let 𝐿 = 𝑑𝐼 − 𝐴 (this is called the Laplacian matrix of 𝐺). The claim can be rephrased
as an upper bound on ⟨𝑥, 𝐿𝑥⟩ /⟨𝑥, 𝑥⟩. Here is an important and convenient formula (it can be
easily proved by expanding):
⟨𝑥, 𝐿𝑥⟩ = ∑_{𝑢𝑣 ∈ 𝐸} (𝑥_𝑢 − 𝑥_𝑣)².
Since 𝑥 𝑣 is constant for all 𝑣 in the same 𝑉𝑖 , we only need to consider edges spanning
consecutive 𝑉𝑖 ’s. Using the formula for 𝑥, we obtain
⟨𝑥, 𝐿𝑥⟩ = ∑_{𝑖=0}^{𝑟−1} 𝑒(𝑉_𝑖, 𝑉_{𝑖+1}) ( 1/(𝑑 − 1)^{𝑖/2} − 1/(𝑑 − 1)^{(𝑖+1)/2} )² + 𝑒(𝑉_𝑟, 𝑉_{𝑟+1})/(𝑑 − 1)^𝑟 .
For each 𝑖 ≥ 0, each vertex in 𝑉𝑖 has at most 𝑑−1 neighbors in 𝑉𝑖+1 , so 𝑒(𝑉𝑖 , 𝑉𝑖+1 ) ≤ (𝑑−1) |𝑉𝑖 |.
Thus continuing from above,
≤ ∑_{𝑖=0}^{𝑟−1} |𝑉_𝑖| (𝑑 − 1) ( 1/(𝑑 − 1)^{𝑖/2} − 1/(𝑑 − 1)^{(𝑖+1)/2} )² + |𝑉_𝑟| (𝑑 − 1)/(𝑑 − 1)^𝑟
= (√(𝑑 − 1) − 1)² ∑_{𝑖=0}^{𝑟−1} |𝑉_𝑖|/(𝑑 − 1)^𝑖 + |𝑉_𝑟| (𝑑 − 1)/(𝑑 − 1)^𝑟
= (𝑑 − 2√(𝑑 − 1)) ∑_{𝑖=0}^{𝑟} |𝑉_𝑖|/(𝑑 − 1)^𝑖 + (2√(𝑑 − 1) − 1) |𝑉_𝑟|/(𝑑 − 1)^𝑟 .
We have |𝑉𝑖+1 | ≤ (𝑑 − 1) |𝑉𝑖 | for every 𝑖 ≥ 0, so that |𝑉𝑟 | (𝑑 − 1) −𝑟 ≤ |𝑉𝑖 | (𝑑 − 1) −𝑖 for each
𝑖 ≤ 𝑟. So continuing,
≤ ( 𝑑 − 2√(𝑑 − 1) + (2√(𝑑 − 1) − 1)/(𝑟 + 1) ) ∑_{𝑖=0}^{𝑟} |𝑉_𝑖|/(𝑑 − 1)^𝑖
= ( 𝑑 − 2√(𝑑 − 1) + (2√(𝑑 − 1) − 1)/(𝑟 + 1) ) ⟨𝑥, 𝑥⟩ ,
since ⟨𝑥, 𝑥⟩ = ∑_{𝑖=0}^{𝑟} |𝑉_𝑖|/(𝑑 − 1)^𝑖.
It follows that
⟨𝑥, 𝐴𝑥⟩/⟨𝑥, 𝑥⟩ = 𝑑 − ⟨𝑥, 𝐿𝑥⟩/⟨𝑥, 𝑥⟩ ≥ 2√(𝑑 − 1) − (2√(𝑑 − 1) − 1)/(𝑟 + 1)
≥ ( 1 − 1/(𝑟 + 1) ) 2√(𝑑 − 1). □
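The inequality just proved can be tested numerically. The sketch below (using the networkx library; the parameters 𝑑, 𝑛, 𝑟 are arbitrary choices) builds the test vector on a random 𝑑-regular graph and checks the bound on the Rayleigh quotient.

```python
import networkx as nx
import numpy as np

d, n, r = 4, 2000, 5                                   # arbitrary parameters
G = nx.random_regular_graph(d, n, seed=1)
A = nx.to_numpy_array(G, nodelist=range(n))

s, t = next(iter(G.edges()))                           # a fixed edge st
ds = nx.single_source_shortest_path_length(G, s)
dt = nx.single_source_shortest_path_length(G, t)

x = np.zeros(n)
for v in range(n):
    i = min(ds.get(v, n), dt.get(v, n))                # distance of v from the edge st
    if i <= r:
        x[v] = (d - 1) ** (-i / 2)                     # x_v = (d-1)^{-i/2} on V_i

rayleigh = x @ A @ x / (x @ x)
bound = (1 - 1 / (r + 1)) * 2 * np.sqrt(d - 1)
print(rayleigh, ">=", bound, rayleigh >= bound)
```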
Proof of the Alon–Boppana bound (Theorem 3.6.2). Let 𝑉 = 𝑉 (𝐺). Let 1 be the all-1’s
vector, which is an eigenvector with eigenvalue 𝑑. To prove the theorem, it suffices to exhibit
a nonzero vector 𝑧 ⊥ 1 such that
⟨𝑧, 𝐴𝑧⟩/⟨𝑧, 𝑧⟩ ≥ 2√(𝑑 − 1) − 𝑜(1).
Let 𝑟 be an arbitrary positive integer. When 𝑛 is sufficiently large, there exist two edges 𝑠𝑡
and 𝑠′𝑡′ in the graph at distance at least 2𝑟 + 2 apart (indeed, the number of vertices within
distance 𝑘 of an edge is at most 2(1 + (𝑑 − 1) + (𝑑 − 1)² + · · · + (𝑑 − 1)^𝑘), which does not grow with 𝑛). Let 𝑥 ∈ R^𝑉
be the vector constructed as in Lemma 3.6.4 for 𝑠𝑡, and let 𝑦 ∈ R𝑉 be the corresponding
vector constructed for 𝑠′ 𝑡 ′ . Recall that 𝑥 is supported on vertices within distance 𝑟 from 𝑠𝑡,
and likewise with 𝑦 and 𝑠′ 𝑡 ′ . Since 𝑠𝑡 and 𝑠′ 𝑡 ′ are at distance at least 2𝑟 + 2 apart, the support
of 𝑥 is at distance at least 2 from the support of 𝑦. Thus
⟨𝑥, 𝑦⟩ = 0 and ⟨𝑥, 𝐴𝑦⟩ = 0.
Choose a constant 𝑐 ∈ R such that 𝑧 = 𝑥 − 𝑐𝑦 has sum of its entries equal to zero (this is
possible since ⟨𝑦, 1⟩ > 0). Then
⟨𝑧, 𝑧⟩ = ⟨𝑥, 𝑥⟩ + 𝑐2 ⟨𝑦, 𝑦⟩
and so by Lemma 3.6.4
⟨𝑧, 𝐴𝑧⟩ = ⟨𝑥, 𝐴𝑥⟩ + 𝑐² ⟨𝑦, 𝐴𝑦⟩
≥ ( 1 − 1/(𝑟 + 1) ) 2√(𝑑 − 1) ( ⟨𝑥, 𝑥⟩ + 𝑐² ⟨𝑦, 𝑦⟩ )
= ( 1 − 1/(𝑟 + 1) ) 2√(𝑑 − 1) ⟨𝑧, 𝑧⟩ .
Taking 𝑟 → ∞ as 𝑛 → ∞ gives the theorem. □
Remark 3.6.5. The above proof cleverly considers distance from an edge rather than from a
single vertex. This is important for a rather subtle reason. Why does the proof fail if we had
instead considered distance from a vertex?
Now let us give another proof—actually we will only prove the slightly weaker statement
of Corollary 3.6.3, which is equivalent to
max {|𝜆_2| , |𝜆_𝑛|} ≥ 2√(𝑑 − 1) − 𝑜(1).    (3.6)
As a warmup, let us first prove (3.6) with √𝑑 − 𝑜(1) on the right-hand side. We have
𝑑𝑛 = 2𝑒(𝐺) = tr 𝐴² = ∑_{𝑖=1}^{𝑛} 𝜆_𝑖² ≤ 𝑑² + (𝑛 − 1) max {|𝜆_2| , |𝜆_𝑛|}².
So
max {|𝜆_2| , |𝜆_𝑛|} ≥ √( 𝑑(𝑛 − 𝑑)/(𝑛 − 1) ) = √𝑑 − 𝑜(1)
as 𝑛 → ∞ for fixed 𝑑.
To prove (3.6), we consider higher moments tr 𝐴 𝑘 . This is a useful technique, sometimes
called the trace method or the moment method.
The quantity tr 𝐴^{2𝑘} counts the number of closed walks of length 2𝑘 on 𝐺. Let T_𝑑 denote the infinite 𝑑-regular
tree. Observe that
# closed length-2𝑘 walks in 𝐺 starting from a fixed vertex
≥ # closed length-2𝑘 walks in T𝑑 starting from a fixed vertex.
Indeed, at each vertex, for both 𝐺 and T𝑑 , we can label its 𝑑 incident edges arbitrarily from
1 to 𝑑 (the labels assigned from the two endpoints of the same edge do not have to match).
Then every closed length-2𝑘 walk in T𝑑 corresponds to a distinct closed length-2𝑘 walk in
𝐺 by tracing the same outgoing edges at each step (why?). Note that not all closed walks in
𝐺 arise this way (e.g., walks that go around cycles in 𝐺).
The number of closed walks of length 2𝑘 in the infinite 𝑑-regular tree T_𝑑 starting at a fixed
root is at least (𝑑 − 1)^𝑘 𝐶_𝑘, where 𝐶_𝑘 = \binom{2𝑘}{𝑘}/(𝑘 + 1) is the 𝑘-th Catalan number. To see this, note
that each step in the walk is either “away from the root” or “towards the root.” We record a
sequence by denoting steps of the former type by + and of the latter type by −.
[Figure: a closed walk in T_𝑑 encoded by the sequence + + + − + − − + + + − − − −, where + marks a step away from the root and − a step towards the root.]
Then the number of valid arrangements of 𝑘 +'s and 𝑘 −'s is exactly the Catalan number 𝐶_𝑘,
as the only constraint is that there can never be more −'s than +'s up to
any point in the sequence. Finally, there are at least 𝑑 − 1 choices for where to step in the
walk at any + (there are 𝑑 choices at the root), and exactly one choice for each −.
Thus, the number of closed walks of length 2𝑘 in 𝐺 is at least
tr 𝐴^{2𝑘} ≥ 𝑛 (𝑑 − 1)^𝑘 𝐶_𝑘 ≥ \frac{𝑛}{𝑘 + 1} \binom{2𝑘}{𝑘} (𝑑 − 1)^𝑘 .
On the other hand, we have
tr 𝐴^{2𝑘} = ∑_{𝑖=1}^{𝑛} 𝜆_𝑖^{2𝑘} ≤ 𝑑^{2𝑘} + (𝑛 − 1) max {|𝜆_2| , |𝜆_𝑛|}^{2𝑘} .
Thus,
max {|𝜆_2| , |𝜆_𝑛|}^{2𝑘} ≥ \frac{1}{𝑘 + 1} \binom{2𝑘}{𝑘} (𝑑 − 1)^𝑘 − \frac{𝑑^{2𝑘}}{𝑛 − 1} .
The term \frac{1}{𝑘 + 1} \binom{2𝑘}{𝑘} is (2 − 𝑜(1))^{2𝑘} as 𝑘 → ∞. Letting 𝑘 → ∞ slowly (e.g., 𝑘 = 𝑜(log 𝑛)) as
𝑛 → ∞ gives us max {|𝜆_2| , |𝜆_𝑛|} ≥ 2√(𝑑 − 1) − 𝑜(1). □
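The tree-walk count used in the proof can be computed exactly by a small dynamic program over the distance from the root; the sketch below compares it with the lower bound (𝑑 − 1)^𝑘 𝐶_𝑘.

```python
from math import comb

def closed_walks_in_tree(d, k):
    # walks[j] = number of walks of the current length from the root of T_d
    # that end at distance j from the root.
    walks = [0] * (2 * k + 1)
    walks[0] = 1
    for _ in range(2 * k):
        new = [0] * (2 * k + 1)
        for j, w in enumerate(walks):
            if w == 0:
                continue
            if j == 0:
                new[1] += d * w                 # d ways to step away from the root
            else:
                new[j + 1] += (d - 1) * w       # d - 1 ways to step further away
                new[j - 1] += w                 # one way to step back towards the root
        walks = new
    return walks[0]

d = 3                                           # arbitrary degree
for k in range(1, 8):
    catalan = comb(2 * k, k) // (k + 1)
    print(k, closed_walks_in_tree(d, k), ">=", (d - 1) ** k * catalan)
```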
Remark 3.6.6. The infinite 𝑑-regular tree T_𝑑 is the universal cover of all 𝑑-regular graphs
(this fact is used in the first step of the argument). The spectral radius of T_𝑑 is 2√(𝑑 − 1),
which is the fundamental reason why this number arises in the Alon–Boppana bound.
Graphs with 𝜆_2 ≈ 2√(𝑑 − 1)
Let us return to Question 3.6.1: what is the smallest possible 𝜆2 for 𝑛-vertex 𝑑-regular graphs,
with 𝑑 fixed and 𝑛 large? Is the Alon–Boppana bound tight? (The answer is yes.)
Alon’s second eigenvalue conjecture says that random 𝑑-regular graphs match the Alon–
Boppana bound. This was proved by Friedman (2008). We will not present the proof, as it is
quite a difficult result.
In other words, the above theorem says that random 𝑑-regular graphs on 𝑛 vertices satisfy,
with probability 1 − 𝑜(1) (for fixed 𝑑 ≥ 3 and 𝑛 → ∞),
max {|𝜆_2| , |𝜆_𝑛|} ≤ 2√(𝑑 − 1) + 𝑜(1).
Can we get ≤ 2√(𝑑 − 1) exactly, without an error term? This leads us to one of the biggest
open problems of the field.
While it is not too hard to construct small Ramanujan graphs (e.g., 𝐾 𝑑+1 has eigenvalues
𝜆1 = 𝑑 and 𝜆2 = · · · = 𝜆 𝑛 = −1), it is a major open problem to construct infinitely many
𝑑-regular Ramanujan graphs for each 𝑑.
The term “Ramanujan graph” was coined by Lubotzky, Phillips, & Sarnak (1988), who
constructed infinite families of 𝑑-regular Ramanujan graphs when 𝑑 − 1 is an odd prime.
The same result was independently proved by Margulis (1988). The proof of the eigenvalue
bounds uses deep results from number theory, namely solutions to the Ramanujan conjecture
(hence the name). These constructions were later extended by Morgenstern (1994) whenever
𝑑 − 1 is a prime power. The current state of Conjecture 3.6.9 is given below, and it remains
open for all other 𝑑, with the smallest open case being 𝑑 = 7.
All known results are based on explicit constructions using Cayley graphs on PSL(2, 𝑞)
or related groups. We refer the reader to the book Davidoff, Sarnak, & Valette (2003) for a
gentle exposition of the construction.
Theorem 3.6.7 says that random 𝑑-regular graphs are “nearly-Ramanujan.” Empirical
evidence suggests that for each fixed 𝑑, a uniform random 𝑛-vertex 𝑑-regular graph is
Ramanujan with probability bounded away from 0 and 1, for large 𝑛.
If this were true, it would prove Conjecture 3.6.9 on the existence of Ramanujan graphs.
However, no rigorous results are known in this vein.
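A small-scale version of this experiment (not the empirical studies alluded to above; the parameters are arbitrary choices) takes only a few lines.

```python
import networkx as nx
import numpy as np

# Fraction of random d-regular graphs on n vertices that are Ramanujan,
# i.e., max{|lambda_2|, |lambda_n|} <= 2 sqrt(d - 1).
d, n, trials = 3, 500, 50
threshold = 2 * np.sqrt(d - 1)

count = 0
for seed in range(trials):
    G = nx.random_regular_graph(d, n, seed=seed)
    eigs = np.linalg.eigvalsh(nx.to_numpy_array(G))    # eigenvalues in ascending order
    if max(abs(eigs[0]), abs(eigs[-2])) <= threshold:
        count += 1
print(count / trials)
```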
One can formulate a bipartite analog.
Exercise 3.6.14 (Alon–Boppana bound with multiplicity). Prove that for every positive
integer 𝑑 and real 𝜀 > 0, there is some constant 𝑐 > 0 so that every 𝑛-vertex 𝑑-regular
graph has at least 𝑐𝑛 eigenvalues greater than 2√(𝑑 − 1) − 𝜀.
Exercise 3.6.15∗ (Net removal decreases top eigenvalue). Show that for every 𝑑 and 𝑟,
there is some 𝜀 > 0 such that if 𝐺 is a 𝑑-regular graph, and 𝑆 ⊆ 𝑉 (𝐺) is such that every
vertex of 𝐺 is within distance 𝑟 of 𝑆, then the top eigenvalue of the adjacency matrix of
𝐺 − 𝑆 (i.e., remove 𝑆 and its incident edges from 𝐺) is at most 𝑑 − 𝜀.
Further Reading
The survey Pseudo-random Graphs by Krivelevich & Sudakov (2006) discusses many com-
binatorial aspects of this topic.
Expander graphs are a large and intensely studied topic, partly due to many important
applications in computer science. Here are two important survey articles:
• Expander Graphs and Their Applications by Hoory, Linial, & Wigderson (2006);
• Expander Graphs in Pure and Applied Mathematics by Lubotzky (2012).
For spectral graph theory, see the book Spectral Graph Theory by Chung (1997), or the
book draft Spectral and Algebraic Graph Theory by Spielman.
The book Elementary Number Theory, Group Theory and Ramanujan Graphs by Davidoff,
Sarnak, & Valette (2003) gives a gentle introduction to the construction of Ramanujan graphs.
The breakthrough by Marcus, Spielman, & Srivastava (2015) constructing bipartite Ra-
manujan graphs via interlacing polynomials is an instant classic.
Chapter Summary
• We are interested in quantifying how a given graph can be similar to a random graph.
• The Chung–Graham–Wilson quasirandom graphs theorem says that several notions
are equivalent, notably:
– DISC: edge discrepancy (cf. the 𝜀-regular pair from Chapter 2),
– C4 : 4-cycle count close to random, and
– EIG: all eigenvalues (except the largest) small.
These equivalences only apply to graphs at constant order edge density. Some of the
implications break down for sparser graphs.
• An (𝒏, 𝒅, 𝝀)-graph is an 𝑛-vertex 𝑑-regular graph all of whose adjacency matrix eigenval-
ues are ≤ 𝜆 in absolute value except the top one (which must be 𝑑). The second eigenvalue
plays an important role in pseudorandomness.
• Expander mixing lemma. An (𝑛, 𝑑, 𝜆)-graph satisfies
|𝑒(𝑋, 𝑌) − (𝑑/𝑛) |𝑋| |𝑌|| ≤ 𝜆 √(|𝑋| |𝑌|) for all 𝑋, 𝑌 ⊆ 𝑉 (𝐺).
• The eigenvalues of an abelian Cayley graph Cay(Γ, 𝑆) can be computed via the Fourier
transform of the indicator function 1_𝑆. For example, using a Gauss sum, one can deduce that the Paley graph
(generated by quadratic residues in Z/𝑝Z) is quasirandom.
• A non-abelian group with no small non-trivial representations is called a quasirandom
group.
– Every Cayley graph on a quasirandom group is a quasirandom graph.
– There are no large product-free sets in a quasirandom group.
– Example of quasirandom group: PSL(2, 𝑝), which has order ( 𝑝 3 − 𝑝)/2, and all non-
trivial representations have dimension ≥ ( 𝑝 − 1)/2.
• Among vertex-transitive graphs (which includes all Cayley graphs), the sparse ana-
logues of the discrepancy property (SparseDISC) and small second eigenvalue property
(SparseEIG) are equivalent up to a linear change of the error tolerance parameter. This
equivalence is false for general graphs.
– The proof applies Grothendieck's inequality, which says that the semidefinite relaxation
of the ℓ_∞ → ℓ_1 norm (equivalent to the cut norm) gives a constant factor approximation.
• Alon–Boppana second eigenvalue bound. Every 𝑑-regular graph has second largest
adjacency matrix eigenvalue ≥ 2√(𝑑 − 1) − 𝑜(1), with 𝑑 fixed as the number of
vertices goes to infinity.
– Two spectral proof methods: (1) constructing a test vector and (2) the trace/moment method.
– The constant 2√(𝑑 − 1) is best possible, as a random 𝑑-regular graph is typically an
(𝑛, 𝑑, 𝜆)-graph with 𝜆 = 2√(𝑑 − 1) + 𝑜(1) (Friedman's theorem).
– A Ramanujan graph is an (𝑛, 𝑑, 𝜆)-graph with 𝜆 = 2√(𝑑 − 1). It is conjectured that for
every 𝑑 ≥ 3, there exist infinitely many 𝑑-regular Ramanujan graphs (this is known to
hold when 𝑑 − 1 is a prime power). A bipartite version of this conjecture is true.
4
Graph limits
Chapter Highlights
• An analytic language for studying dense graphs
• Convergence and limit for a sequence of graphs
• Compactness of the graphon space with respect to the cut metric
• Applications of compactness
• Equivalence of cut metric convergence and left-convergence
The theory of graph limits was developed by Lovász and his collaborators in a series
of works starting around 2003. The researchers were motivated by questions about very
large graphs from several different angles, including from combinatorics, statistical physics,
computer science, and applied math. Graph limits give an analytic framework for analyzing
large graphs. The theory offers both a convenient mathematical language as well as powerful
theorems.
Motivation
Suppose we live in a hypothetical world where we only had access to rational numbers and
had no language for irrational numbers. We are given the following optimization problem:
minimize 𝑥 3 − 𝑥 subject to 0 ≤ 𝑥 ≤ 1.
The minimum occurs at 𝑥 = 1/√3, but this answer does not make sense over the rationals.
With only access to rationals, we can state a progressively improving sequence of answers
that converge to the optimum. This is rather cumbersome. It is much easier to write down a
single real number expressing the answer.
Now consider an analogous question for graphs. Fix some real 𝑝 ∈ [0, 1]. We want to
minimize (# closed walks of length 4)/𝑛4
among 𝑛-vertex graphs with ≥ 𝑝𝑛2 /2 edges.
We know from Proposition 3.1.14 that every 𝑛-vertex graph with edge density ≥ 𝑝 has at least
𝑛4 𝑝 4 closed walks of length 4. On the other hand, every sequence of quasirandom graphs
with edge density 𝑝 + 𝑜(1) has 𝑝 4 𝑛4 + 𝑜(𝑛4 ) closed walks of length 4. It follows that the
minimum (or rather, infimum) is 𝑝 4 , and is attained not by any single graph, but rather by a
sequence of quasirandom graphs.
One of the purposes of graph limits is to provide an easy-to-use mathematical object that
captures the limit of such graph sequences. The central object in the theory of graph limits
is called a graphon (the word comes from combining graph and function), to be defined
shortly. Graphons can be viewed as an analytic generalization of graphs.
Here are some questions that we will consider:
(1) What does it mean for a sequence of graphs (or graphons) to converge?
(2) Are different notions of convergence equivalent?
(3) Does every convergent sequence of graphs (or graphons) have a limit?
Note that it is possible to talk about convergence without a limit. In a first real analysis
course, one learns about a Cauchy sequence in a metric space (X, 𝑑), which is some
sequence 𝑥1 , 𝑥2 , · · · ∈ X such that for every 𝜀 > 0, there is some 𝑁 so that 𝑑 (𝑥 𝑚 , 𝑥 𝑛 ) < 𝜀
for all 𝑚, 𝑛 ≥ 𝑁. For instance, one can have a Cauchy sequence without a limit in Q. A
metric space is complete if every Cauchy sequence has a limit. The completion of X is some
complete metric space X̃ such that X is isometrically embedded in X̃ as a dense subset. The
completion of X is in some sense the smallest complete space containing X. For example, R
is the completion of Q. Intuitively, the completion of a space fills in all of its gaps. A basic
result in analysis says that every space has a unique completion.
Here is a key result about graph limits that we will prove:
The space of graphons is compact, and is the completion of the set of graphs.
To make this statement precise, we also need to define a notion of similarity (i.e., distance)
between graphs, and also between graphons. We will see two different notions, one based
on the cut metric, and another based on subgraph densities. Another important result in the
theory of graph limits is that these two notions are equivalent. We will prove it at the end of
the chapter once we have developed some tools.
4.1 Graphons
Here is the central object in the theory of dense graph limits.
Remark 4.1.2. More generally, we can consider an arbitrary probability space Ω and study
symmetric measurable functions Ω × Ω → [0, 1]. In practice, we do not lose much by
restricting to [0, 1].
We will also sometimes consider symmetric measurable functions [0, 1] 2 → R (e.g.,
arising as the difference between two graphons). Such an object is sometimes called a kernel
in the literature.
Remark 4.1.3 (Measure theoretic technicalities). We try to sweep measure theoretic tech-
nicalities under the rug in order to focus on key ideas. If you have not seen measure theory
before, do not worry. Just view “measure” as lengths of intervals or areas of boxes (or count-
able unions thereof) in the most natural sense. We always ignore measure zero differences.
For example, we shall treat two graphons as the same if they only differ on a measure zero
subset of the domain.
More generally, we can encode nonnegative vertex and edge weights in a graphon.
Example 4.1.6 (Half-graph). Consider the bipartite graph on 2𝑛 vertices, with one vertex
part {𝑣 1 , . . . , 𝑣 𝑛 } and the other vertex part {𝑤 1 , . . . , 𝑤 𝑛 }, and edges 𝑣 𝑖 𝑤 𝑗 whenever 𝑖 ≤ 𝑗. Its
adjacency matrix and associated graphon are illustrated below.
[Figure: the half-graph on 12 vertices (𝑛 = 6) and its 12 × 12 adjacency matrix.]
The associated graphon is
𝑊 (𝑥, 𝑦) = 1 if 𝑥 + 𝑦 ≤ 1/2 or 𝑥 + 𝑦 ≥ 3/2, and 𝑊 (𝑥, 𝑦) = 0 otherwise.
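For concreteness, here is a minimal sketch of the associated step graphon of the half-graph, assuming the usual construction in which vertex 𝑖 of an 𝑁-vertex graph corresponds to the interval [(𝑖 − 1)/𝑁, 𝑖/𝑁).

```python
import numpy as np

def half_graph_adjacency(n):
    # Vertices 0..n-1 are v_1..v_n and n..2n-1 are w_1..w_n; v_i ~ w_j iff i <= j.
    A = np.zeros((2 * n, 2 * n), dtype=int)
    for i in range(n):
        for j in range(i, n):
            A[i, n + j] = A[n + j, i] = 1
    return A

def associated_graphon(A):
    # Step function equal to A[i, j] on the box [i/N, (i+1)/N) x [j/N, (j+1)/N).
    N = len(A)
    def W(x, y):
        return A[min(int(x * N), N - 1), min(int(y * N), N - 1)]
    return W

W = associated_graphon(half_graph_adjacency(6))
grid = (np.arange(24) + 0.5) / 24                 # sample points avoiding box boundaries
print(np.array([[W(x, y) for y in grid] for x in grid]))   # a pixelated half-graph picture
```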
In general, pointwise convergence turns out to be too restrictive. We will need a more
flexible notion of convergence, which we will discuss more in depth in the next section. Let
us first give some more examples to motivate subsequent definitions.
Example 4.1.7 (Quasirandom graphs). Let 𝐺 𝑛 be a sequence of quasirandom graphs with
edge density approaching 1/2, and 𝑣(𝐺 𝑛 ) → ∞. The constant graphon 𝑊 ≡ 1/2 seems like
a reasonable candidate for its limit, and later we will see that this is indeed the case.
[Figure: a sequence of quasirandom graphs with edge density approaching 1/2 converging to the constant 1/2 graphon.]
Example 4.1.8 (Stochastic block model). Consider an 𝑛-vertex graph with two types of
vertices: red and blue. Half of the vertices are red and half of the vertices are blue. Two red
vertices are adjacent with probability 𝑝 𝑟 , two blue vertices are adjacent with probability 𝑝 𝑏 ,
and finally, a red vertex and a blue vertex are adjacent with probability 𝑝 𝑟 𝑏 , all independently.
Then as 𝑛 → ∞, the graphs converge to the step graphon shown below.
[Figure: graphs drawn from the two-block stochastic block model converge to the 2 × 2 step graphon with values 𝑝_𝑟, 𝑝_{𝑟𝑏}, 𝑝_{𝑟𝑏}, 𝑝_𝑏 on its blocks.]
The above examples suggest that the limiting graphon looks like a blurry image of the
adjacency matrix. However, there is an important caveat as illustrated in the next example.
Example 4.1.9 (Checkerboard). Consider the 2𝑛×2𝑛 “checkerboard” graphon shown below
(for 𝑛 = 4).
[Figure: the checkerboard graphon for 𝑛 = 4, together with the corresponding bipartite graph on vertices 1, . . . , 8 with the two parts interleaved.]
Since the 0’s and 1’s in the adjacency matrix are evenly spaced, one might suspect that
this sequence converges to the constant 1/2 graphon. However, this is not so. The checker-
board graphon is associated to the complete bipartite graph 𝐾𝑛,𝑛 , with the two vertex parts
interleaved. By relabeling the vertices, we see that below is another representation of the
associated graphon of the same graph.
[Figure: the same graph with its vertices relabeled so that each part is contiguous, and the resulting associated graphon.]
After this relabeling, the associated graphon is the same for all 𝑛. So the graphon shown on the right, which is also 𝑊_{𝐾_2},
must be the limit of the sequence, and not the constant 1/2 graphon.
This example tells us that we must be careful about the possibility of rearranging vertices
when studying graph limits.
A graphon is an infinite dimensional object. We would like some ways to measure the
similarity between two graphons. We will explain two different approaches:
• cut distance, and
• homomorphism densities.
One of the main results in the theory of graph limits is that these two approaches are
equivalent—we will show this later in the chapter.
∥𝑾∥_∞ := sup{ 𝑡 : 𝑊^{−1}([𝑡, ∞)) has positive measure }.
(This is not simply the supremum of 𝑊; the definition should be invariant under measure
zero changes of 𝑊.)
Let 𝐺 and 𝐺 ′ be two graphs sharing a common vertex set. Let 𝑊𝐺 and 𝑊𝐺 ′ be their
associated graphons (using the same ordering of vertices when constructing the graphons).
Then 𝐺 and 𝐺 ′ are 𝜀-close in cut norm (see (4.1)) if and only if
∥𝑊𝐺 − 𝑊𝐺 ′ ∥ □ ≤ 𝜀.
(There is a subtlety in this claim that is worth thinking about: should we be worried about
sets 𝑆, 𝑇 ⊆ [0, 1] in Definition 4.2.1 of cut norm that contain fractions of some intervals that
represent vertices? See Lemma 4.5.3 for a reformulation of the cut norm that may shed some
light.)
We need a concept for an analog of a vertex set permutation for graphons. We write
𝝀(𝑨) := the Lebesgue measure of 𝐴.
Intuitively, this is the “length” or “area” of 𝐴. We will always be referring to Lebesgue
measurable sets (measure theoretic technicalities are not central to the discussions here, so
feel free to ignore them).
Example 4.2.3. For any constant 𝛼 ∈ R, the function 𝜙(𝑥) = 𝑥 + 𝛼 mod 1 is measure
preserving (this map rotates the circle R/Z by 𝛼).
A more interesting example is 𝜙(𝑥) = 2𝑥 mod 1, illustrated below.
[Figure: the graph of 𝜙(𝑥) = 2𝑥 mod 1 on [0, 1], together with a set 𝐴 and its preimage 𝜙^{−1}(𝐴).]
This map is also measure preserving. This might not seem to be the case at first, since 𝜙 seems
to shrink some intervals by half. However, the definition of measure preserving actually says
𝜆(𝜙^{−1}(𝐴)) = 𝜆(𝐴) and not 𝜆(𝜙(𝐴)) = 𝜆(𝐴). For any interval [𝑎, 𝑏] ⊆ [0, 1], we have
𝜙^{−1}([𝑎, 𝑏]) = [𝑎/2, 𝑏/2] ∪ [1/2 + 𝑎/2, 1/2 + 𝑏/2], which does have the same measure as
[𝑎, 𝑏]. This map is 2-to-1, and it is not invertible.
Given 𝑊 : [0, 1] 2 → R and an invertible measure preserving map 𝜙 : [0, 1] → [0, 1], we
write
𝑾 𝝓 (𝒙, 𝒚) := 𝑊 (𝜙(𝑥), 𝜙(𝑦)).
Intuitively, this operation relabels the vertex set.
The cut distance between two graphons 𝑊 and 𝑈 is defined by
𝜹_□(𝑾, 𝑼) := inf_𝜙 ∥𝑊 − 𝑈^𝜙∥_□ ,
where the infimum is taken over all invertible measure preserving maps 𝜙 : [0, 1] →
[0, 1]. Define the cut distance between two graphs 𝐺 and 𝐺′ by the cut distance of their
associated graphons:
𝜹□ (𝑮, 𝑮 ′ ) := 𝛿□ (𝑊𝐺 , 𝑊𝐺 ′ ).
Likewise, we can also define the cut distance between a graph and a graphon 𝑈:
𝜹□ (𝑮, 𝑼) := 𝛿□ (𝑊𝐺 , 𝑈).
Space of graphons
We can form a metric space by identifying graphons at cut distance zero (i.e., treating
two such graphons as the same point).
One of the main goals of this chapter is to prove this theorem and show its applications.
The compactness of graphon space is related to the graph regularity lemma. In fact,
we will use the regularity method to prove compactness. Both compactness and the graph
regularity lemma tell us that despite the infinite variability of graphs, every graph can be
𝜀-approximated by a graph from a finite set of templates.
We close this section with the following observation.
Proof. Let 𝜀 > 0. It suffices to show that for every graphon 𝑊 there exists a graph 𝐺 such
that 𝛿□ (𝐺, 𝑊) < 𝜀.
We approximate 𝑊 in several steps, illustrated below.
First, by rounding down the values of 𝑊 (𝑥, 𝑦), we construct a graphon 𝑊1 whose values
are all integer multiples of 𝜀/3, such that
∥𝑊 − 𝑊1 ∥ ∞ ≤ 𝜀/3.
Next, since every Lebesgue measurable subset of [0, 1] 2 can be arbitrarily well approx-
imated using a union of boxes, we can find a step graphon 𝑊2 approximating 𝑊1 in 𝐿 1
norm:
∥𝑊1 − 𝑊2 ∥ 1 ≤ 𝜀/3.
Finally, by replacing each block of 𝑊2 by a sufficiently large quasirandom (bipartite) graph
of edge density equal to the value of 𝑊2 , we find a graph 𝐺 so that
∥𝑊2 − 𝑊𝐺 ∥ □ ≤ 𝜀/3.
Then 𝛿□ (𝑊, 𝐺) < 𝜀. □
Remark 4.2.9. In the above proof, to obtain ∥𝑊1 − 𝑊2 ∥ 1 ≤ 𝜀/3, the number of steps of 𝑊2
cannot be uniformly bounded as a function of 𝜀 (i.e., it must depend on 𝑊 as well—think
about what happens for a random graph). Consequently the number of vertices of the final
graph 𝐺 produced by this proof is not bounded by a function of 𝜀.
Later on, we will see a different proof showing that for every 𝜀 > 0, there is some
𝑁 (𝜀) so that every graphon lies within cut distance 𝜀 of some graph with ≤ 𝑁 (𝜀) vertices
(Proposition 4.8.1).
Since every compact metric space is complete, we have the following corollary.
Exercise 4.2.11 (Zero-one valued graphons). Let 𝑊 be a {0, 1}-valued graphon. Sup-
pose graphons 𝑊𝑛 satisfy ∥𝑊𝑛 − 𝑊 ∥ □ → 0 as 𝑛 → ∞. Show that ∥𝑊𝑛 − 𝑊 ∥ 1 → 0 as
𝑛 → ∞.
This definition agrees with Definition 4.3.1 for the triangle density in graphs. Indeed, for
every graph 𝐺, the triangle density in 𝐺 equals the triangle density in the associated graphon
𝑊𝐺 ; that is, 𝑡 (𝐾3 , 𝑊𝐺 ) = 𝑡 (𝐾3 , 𝐺).
Note that for all graphs 𝐹 and 𝐺, letting 𝑊𝐺 be the graphon associated to 𝐺,
𝑡 (𝐹, 𝐺) = 𝑡 (𝐹, 𝑊𝐺 ). (4.2)
So the two definitions of 𝐹-density agree.
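For small graphs, the homomorphism density 𝑡(𝐹, 𝐺) = hom(𝐹, 𝐺)/𝑣(𝐺)^{𝑣(𝐹)} can be computed by brute force; a short sketch (the choice 𝐺 = 𝐶_5 is arbitrary):

```python
import itertools
import numpy as np

def hom_density(F_edges, F_vertices, A):
    # t(F, G): fraction of maps V(F) -> V(G) sending every edge of F to an edge of G.
    n = len(A)
    count = sum(all(A[phi[u], phi[v]] for u, v in F_edges)
                for phi in itertools.product(range(n), repeat=F_vertices))
    return count / n ** F_vertices

n = 5
A = np.zeros((n, n), dtype=int)                  # adjacency matrix of the 5-cycle
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

print(hom_density([(0, 1)], 2, A))                    # edge density of C_5: 2*5/25 = 0.4
print(hom_density([(0, 1), (1, 2), (0, 2)], 3, A))    # C_5 is triangle-free, so 0.0
```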
One usually has 𝑣(𝐺 𝑛 ) → ∞, but it is not strictly necessary for this definition. Note
that when 𝑣(𝐺 𝑛 ) → ∞, homomorphism densities and subgraph densities coincide (see
Remark 4.3.3).
It turns out that left-convergence is equivalent to convergence in cut metric. This founda-
tional result in graph limits is due to Borgs, Chayes, Lovász, Sós, & Vesztergombi (2008).
The implication that convergence in cut metric implies left-convergence is easier; it follows
from the counting lemma (Section 4.5). The converse is more difficult, and we will establish
it at the end of the chapter.
This allows us to talk about convergent sequences of graphs or graphons without spec-
ifying whether we are referring to left-convergence or convergence in cut metric. However,
since a major goal of this chapter is to prove the equivalence between these two notions, we
will be more specific about the notion of convergence.
From the compactness of the space of graphons and the equivalence of convergence
(actually only needing the easier implication), we will be able to quickly deduce the existence
of limit for a left-convergent sequence, which was first proved by Lovász & Szegedy (2006).
Note that the following statement does not require knowledge of the cut metric.
Remark 4.3.9. One can artificially define a metric that coincides with left-convergence. Let
(𝐹𝑛 ) 𝑛≥1 enumerate over all graphs. One can define a distance between graphons 𝑈 and 𝑊 by
∑_{𝑘 ≥ 1} 2^{−𝑘} |𝑡(𝐹_𝑘, 𝑊) − 𝑡(𝐹_𝑘, 𝑈)| .
We see that a sequence of graphons converges under this notion of distance if and only if
it is left-convergent. This shows that left-convergence defines a metric topology on the space
of graphons, but in practice the above distance is pretty useless.
Exercise 4.3.10 (Counting Eulerian orientations). Define 𝑊 : [0, 1] 2 → R by 𝑊 (𝑥, 𝑦) =
2 cos(2𝜋(𝑥 − 𝑦)). Let 𝐹 be a graph. Show that 𝑡 (𝐹, 𝑊) is the number of ways to orient all
edges of 𝐹 so that every vertex has the same number of incoming edges as outgoing edges.
with all the numbers lying in [0, 1], and subject to 𝑞_𝑟 + 𝑞_𝑏 = 1. We form an 𝑛-vertex random
graph as follows:
(1) Color each vertex red with probability 𝑞 𝑟 and blue with probability 𝑞 𝑏 , independently
at random. These vertex colors are “hidden states” and are not part of the data of
the output random graph (this step is slightly different from Example 4.1.8 in an
unimportant way);
(2) For every pair of vertices, independently place an edge between them with probability
• 𝑝 𝑟𝑟 if both vertices are red,
• 𝑝 𝑏𝑏 if both vertices are blue, and
• 𝑝 𝑟 𝑏 if one vertex is red and the other is blue.
One can easily generalize the above to a 𝒌-block model, where vertices have 𝑘 hidden
states, with 𝑞 1 , . . . , 𝑞 𝑘 (adding up to 1) being the vertex state probabilities, and a symmetric
𝑘 × 𝑘 matrix ( 𝑝 𝑖 𝑗 )1≤𝑖, 𝑗 ≤ 𝑘 of edge probabilities for pairs of vertices between various states.
𝑊-random graph
The 𝑊-random graph is a further generalization. The stochastic block model corresponds to
step graphons 𝑊.
[Figure: a graphon 𝑊 with sampled points 𝑥_1, . . . , 𝑥_5; the values 𝑊(𝑥_𝑖, 𝑥_𝑗) give the edge probabilities of the 𝑊-random graph.]
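A minimal sampler for G(𝑛, 𝑊), following the two-step description in the proof below (uniform points 𝑥_𝑖, then independent edges with probability 𝑊(𝑥_𝑖, 𝑥_𝑗)); the particular step graphon and its block values are arbitrary choices.

```python
import numpy as np

def sample_W_random_graph(n, W, rng):
    # x_1, ..., x_n uniform in [0,1]; edge ij present with probability W(x_i, x_j).
    x = rng.uniform(size=n)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.uniform() < W(x[i], x[j]):
                A[i, j] = A[j, i] = 1
    return A

def W_block(x, y, p_r=0.8, p_b=0.3, p_rb=0.1):   # two-block step graphon (arbitrary values)
    if x < 0.5 and y < 0.5:
        return p_r
    if x >= 0.5 and y >= 0.5:
        return p_b
    return p_rb

rng = np.random.default_rng(0)
A = sample_W_random_graph(1000, W_block, rng)
print(A.sum() / (1000 * 999))    # edge density, close to (p_r + p_b + 2 p_rb)/4 = 0.325
```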
Remark 4.4.3. The theorem does not require each 𝐺 𝑛 to be sampled independently. For
example, we can construct the sequence of random graphs, with 𝐺 𝑛 distributed as G(𝑛, 𝑊),
by revealing one vertex at a time without resampling the previous vertices and edges. In this
case, each 𝐺 𝑛 is a subgraph of the next graph 𝐺 𝑛+1 .
We will need the following standard result about concentration of Lipschitz functions. This
can be proved using Azuma’s inequality (e.g., see Chapter 7 of The Probabilistic Method by
Alon & Spencer).
Let us show that the 𝐹-density in a 𝑊-random graph rarely differs significantly from
𝑡 (𝐹, 𝑊).
Proof. Recall from Remark 4.3.3 that the injective homomorphism density 𝑡 inj (𝐹, 𝐺) is
defined to be the fraction of injective maps 𝑉 (𝐹) → 𝑉 (𝐺) that carry every edge of 𝐹 to an
edge of 𝐺. We will first prove that
P( |𝑡_inj(𝐹, G(𝑛, 𝑊)) − 𝑡(𝐹, 𝑊)| > 𝜀 ) ≤ 2 exp( −𝜀²𝑛 / (2𝑣(𝐹)²) ).    (4.5)
Let 𝑦 1 , . . . , 𝑦 𝑛 , and 𝑧𝑖 𝑗 for each 1 ≤ 𝑖 < 𝑗 ≤ 𝑛, be independent uniform random variables in
[0, 1]. Let 𝐺 be the graph on vertices {1, . . . , 𝑛} with an edge between 𝑖 and 𝑗 if and only if
𝑧𝑖 𝑗 ≤ 𝑊 (𝑦 𝑖 , 𝑦 𝑗 ), for every 𝑖 < 𝑗. Then 𝐺 has the same distribution as G(𝑛, 𝑊). Let us group
variables 𝑦 𝑖 , 𝑧𝑖 𝑗 into 𝑥1 , 𝑥2 , . . . , 𝑥 𝑛 where
𝑥1 = (𝑦 1 ), 𝑥2 = (𝑦 2 , 𝑧12 ), 𝑥3 = (𝑦 3 , 𝑧13 , 𝑧23 ), 𝑥4 = (𝑦 4 , 𝑧14 , 𝑧24 , 𝑧34 ), ....
This amounts to exposing the graph 𝐺 one vertex at a time. Define the function 𝑓 (𝑥1 , . . . , 𝑥 𝑛 ) =
𝑡 inj (𝐹, 𝐺). Note that E 𝑓 = E 𝑡 inj (𝐹, G(𝑛, 𝑊)) = 𝑡 (𝐹, 𝑊) by linearity of expectations (in this
step, it is important that we are using the injective variant of homomorphism densities). Note
that changing a single coordinate 𝑥_𝑖 changes the value of 𝑓 by at most 𝑣(𝐹)/𝑛, since
exactly a 𝑣(𝐹)/𝑛 fraction of injective maps 𝑉 (𝐹) → 𝑉 (𝐺) includes a fixed 𝑣 ∈ 𝑉 (𝐺) in the
image. Then (4.5) follows from the bounded differences inequality, Theorem 4.4.4.
To deduce the theorem from (4.5), recall from Remark 4.3.3 that
|𝑡(𝐹, 𝐺) − 𝑡_inj(𝐹, 𝐺)| ≤ 𝑣(𝐹)²/(2𝑣(𝐺)).
If 𝜀 < 𝑣(𝐹)²/𝑛, then the right-hand side of (4.4) is at least 2𝑒^{−𝜀/8} ≥ 1, and so the inequality
trivially holds. Otherwise, |𝑡(𝐹, G(𝑛, 𝑊)) − 𝑡(𝐹, 𝑊)| > 𝜀 implies |𝑡_inj(𝐹, G(𝑛, 𝑊)) − 𝑡(𝐹, 𝑊)| >
𝜀 − 𝑣(𝐹)²/(2𝑛) ≥ 𝜀/2, and then we can apply (4.5) to conclude. □
Theorem 4.4.2 then follows from the Borel–Cantelli lemma, stated below, applied to
Theorem 4.4.5 with a union bound over all rational 𝜀 > 0.
Qualitatively, the counting lemma tells us that for every graph 𝐹, the function 𝑡 (𝐹, ·) is
continuous in (W̃_0, 𝛿_□), the graphon space with respect to the cut metric. It implies the easier
direction of the equivalence in Theorem 4.3.7, namely that convergence in cut metric implies
left-convergence.
In the rest of this section, we prove Theorem 4.5.1. It suffices to prove that
|𝑡 (𝐹, 𝑊) − 𝑡 (𝐹, 𝑈)| ≤ |𝐸 (𝐹)| ∥𝑊 − 𝑈 ∥ □ . (4.6)
Indeed, for every invertible measure preserving map 𝜙 : [0, 1] → [0, 1], we have 𝑡 (𝐹, 𝑈) =
𝑡 (𝐹, 𝑈 𝜙 ). By considering the above inequality with 𝑈 replaced by 𝑈 𝜙 , and taking the infimum
over all 𝑈 𝜙 , we obtain Theorem 4.5.1.
The following reformulation of the cut norm is often quite useful.
Proof. We want to show (left-hand side below is how we defined the cut norm in Defini-
tion 4.2.1)
sup_{𝑆,𝑇 ⊆ [0,1] measurable} | ∫_{[0,1]²} 𝑊(𝑥, 𝑦) 1_𝑆(𝑥) 1_𝑇(𝑦) 𝑑𝑥 𝑑𝑦 | = sup_{𝑢,𝑣 : [0,1]→[0,1] measurable} | ∫_{[0,1]²} 𝑊(𝑥, 𝑦) 𝑢(𝑥) 𝑣(𝑦) 𝑑𝑥 𝑑𝑦 | .
The right-hand side is at least as large as the left-hand side since we can take 𝑢 = 1𝑆 and
𝑣 = 1𝑇 . On the other hand, the integral on the right-hand side is bilinear in 𝑢 and 𝑣, and so it is
always possible to change 𝑢 and 𝑣 to {0, 1}-valued functions without decreasing the value of
the integral (e.g., think about what is the best choice for 𝑣 with 𝑢 held fixed, and vice versa).
If 𝑢 and 𝑣 are restricted to {0, 1}-valued functions, then the two sides are identical. □
As a warm up, let us illustrate the proof of the triangle counting lemma, which has all the
ideas of the general proof but with simpler notation. As illustrated below, the main idea to
“replace” 𝑊 by 𝑈 on the triangle one at a time using the cut norm.
[Figure: replacing 𝑊 by 𝑈 on the three edges of the triangle, one edge at a time.]
For graphons 𝑊_{12}, 𝑊_{13}, 𝑊_{23}, write 𝑡(𝑊_{12}, 𝑊_{13}, 𝑊_{23}) = ∫_{[0,1]³} 𝑊_{12}(𝑥, 𝑦) 𝑊_{13}(𝑥, 𝑧) 𝑊_{23}(𝑦, 𝑧) 𝑑𝑥 𝑑𝑦 𝑑𝑧, so that
𝑡(𝐾_3, 𝑊) = 𝑡(𝑊, 𝑊, 𝑊) and 𝑡(𝐾_3, 𝑈) = 𝑡(𝑈, 𝑈, 𝑈).
Observe that 𝑡 (𝑊12 , 𝑊13 , 𝑊23 ) is trilinear in 𝑊12 , 𝑊13 , 𝑊23 . We have
𝑡(𝑊, 𝑊, 𝑊) − 𝑡(𝑈, 𝑊, 𝑊) = ∫_{[0,1]³} (𝑊 − 𝑈)(𝑥, 𝑦) 𝑊(𝑥, 𝑧) 𝑊(𝑦, 𝑧) 𝑑𝑥 𝑑𝑦 𝑑𝑧.
For any fixed 𝑧, note that 𝑥 ↦→ 𝑊 (𝑥, 𝑧) and 𝑦 ↦→ 𝑊 (𝑦, 𝑧) are both measurable functions
[0, 1] → [0, 1]. So applying Lemma 4.5.3 gives
| ∫_{[0,1]²} (𝑊 − 𝑈)(𝑥, 𝑦) 𝑊(𝑥, 𝑧) 𝑊(𝑦, 𝑧) 𝑑𝑥 𝑑𝑦 | ≤ ∥𝑊 − 𝑈∥_□
for every 𝑧. Now integrating over all 𝑧 and applying the triangle inequality, we obtain
|𝑡 (𝑊, 𝑊, 𝑊) − 𝑡 (𝑈, 𝑊, 𝑊)| ≤ ∥𝑊 − 𝑈 ∥ □ .
We have similar inequalities in the other two coordinates. We can write
𝑡 (𝑊, 𝑊, 𝑊) − 𝑡 (𝑈, 𝑈, 𝑈) = 𝑡 (𝑊, 𝑊, 𝑊 − 𝑈) + 𝑡 (𝑊, 𝑊 − 𝑈, 𝑈) + 𝑡 (𝑊 − 𝑈, 𝑈, 𝑈).
Each term on the right-hand side is at most ∥𝑊 − 𝑈∥_□ in absolute value, by the above applied
in the appropriate coordinate. So the result follows. □
The above proof generalizes in a straightforward way to a general graph counting lemma.
Proof. Given a collection of graphons 𝑊𝑒 indexed by the edges 𝑒 of 𝐹, define
𝑡_𝐹(𝑊_𝑒 : 𝑒 ∈ 𝐸(𝐹)) = ∫_{[0,1]^{𝑉(𝐹)}} ∏_{𝑖𝑗 ∈ 𝐸(𝐹)} 𝑊_{𝑖𝑗}(𝑥_𝑖, 𝑥_𝑗) ∏_{𝑖 ∈ 𝑉(𝐹)} 𝑑𝑥_𝑖 .
Remark 4.6.2 (Interpreting weak regularity). Given 𝐴, 𝐵 ⊆ 𝑉 (𝐺), suppose we only knew
how many vertices from 𝐴 and 𝐵 lie in each part of the partition (and not specifically which
vertices), and we are asked to predict the number of edges between 𝐴 and 𝐵. Then the sum
above is the number of edges between 𝐴 and 𝐵 that one would naturally expect based on the
edge densities between vertex parts. Being weak regular says that this prediction is roughly
correct.
Weak regularity is more “global” compared to the notion of an 𝜀-regular partition from
Chapter 2. Here 𝐴 and 𝐵 have size a constant order fraction of the entire vertex set, rather
than subsets of individual parts of the partition. The edge densities between certain pairs
𝐴 ∩ 𝑉𝑖 and 𝐵 ∩ 𝑉 𝑗 could differ significantly from that of 𝑉𝑖 and 𝑉 𝑗 . All we ask is that on
average these discrepancies mostly cancel out.
The following weak regularity lemma was proved by Frieze & Kannan (1999), initially
motivated by algorithmic applications that we will mention in Remark 4.6.11.
Remark 4.6.5. The stepping operator is the orthogonal projection in the Hilbert space
𝐿 2 ([0, 1] 2 ) onto the subspace of functions constant on each step 𝑆𝑖 × 𝑆 𝑗 . It can also be
viewed as the conditional expectation with respect to the 𝜎-algebra generated by 𝑆𝑖 × 𝑆 𝑗 .
Remark 4.6.8. Technically speaking, Theorem 4.6.3 does not follow from Theorem 4.6.7
since the partition of [0, 1] for 𝑊𝐺 could split intervals corresponding to individual vertices of
𝐺. However, the proofs of the two claims are exactly the same. Alternatively, one can allow a
more flexible definition of a graphon as a symmetric measurable function 𝑊 : Ω×Ω → [0, 1],
and then take Ω to be the discrete probability space 𝑉 (𝐺) endowed with the uniform measure.
Like the proof of the regularity lemma in Section 2.1, we use an energy increment strategy.
Recall from Definition 2.1.10 that the energy of a vertex partition is the mean-squared edge-
density between parts. Given a graphon 𝑊, we define the energy of a measurable partition
P = {𝑆1 , . . . , 𝑆 𝑘 } of [0, 1] by
∥𝑊_P∥_2² = ∫_{[0,1]²} 𝑊_P(𝑥, 𝑦)² 𝑑𝑥 𝑑𝑦 = ∑_{𝑖,𝑗=1}^{𝑘} 𝜆(𝑆_𝑖) 𝜆(𝑆_𝑗) (average of 𝑊 on 𝑆_𝑖 × 𝑆_𝑗)² .
Proof. Because ∥𝑊 − 𝑊 P ∥ □ > 𝜀, there exist measurable subsets 𝑆, 𝑇 ⊆ [0, 1] such that
|⟨𝑊 − 𝑊 P , 1𝑆×𝑇 ⟩| > 𝜀.
Let P ′ be the refinement of P by introducing 𝑆 and 𝑇, dividing each part of P into ≤ 4
sub-parts. We know that
⟨𝑊 P , 𝑊 P ⟩ = ⟨𝑊 P ′ , 𝑊 P ⟩
because 𝑊 P is constant on each step of P, and P ′ is a refinement of P. Thus,
⟨𝑊 P ′ − 𝑊 P , 𝑊 P ⟩ = 0.
By the Pythagorean Theorem (in the Hilbert space 𝐿 2 ( [0, 1] 2 )),
∥𝑊 P ′ ∥ 22 = ∥𝑊 P ∥ 22 + ∥𝑊 P ′ − 𝑊 P ∥ 22 . (4.7)
Note that ⟨𝑊 P ′ , 1𝑆×𝑇 ⟩ = ⟨𝑊, 1𝑆×𝑇 ⟩ since 𝑆 and 𝑇 are both unions of parts of the partition
P ′ . So, by the Cauchy–Schwarz inequality,
∥𝑊 P ′ − 𝑊 P ∥ 2 ≥ |⟨𝑊 P ′ − 𝑊 P , 1𝑆×𝑇 ⟩| = |⟨𝑊 − 𝑊 P , 1𝑆×𝑇 ⟩| > 𝜀.
So by (4.7), we have ∥𝑊 P ′ ∥ 22 > ∥𝑊 P ∥ 22 + 𝜀 2 , as claimed. □
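The stepping operator and the identities in this proof are easy to check numerically, with a symmetric matrix playing the role of a graphon (a sketch; the matrix and partitions are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 60
W = rng.uniform(size=(N, N))
W = (W + W.T) / 2                                  # a symmetric "graphon" on N points

def step(W, parts):
    # W_P: replace W by its average on each block S x T of the partition.
    WP = np.zeros_like(W)
    for S in parts:
        for T in parts:
            WP[np.ix_(S, T)] = W[np.ix_(S, T)].mean()
    return WP

P = [list(range(0, 20)), list(range(20, 45)), list(range(45, 60))]
P_ref = [list(range(0, 10)), list(range(10, 20)), list(range(20, 45)),
         list(range(45, 50)), list(range(50, 60))]   # a refinement of P

WP, WPr = step(W, P), step(W, P_ref)
energy = lambda U: (U ** 2).mean()                 # normalized squared L^2 norm

print(np.allclose((WPr * WP).mean(), (WP * WP).mean()))         # <W_P', W_P> = <W_P, W_P>
print(np.allclose(energy(WPr), energy(WP) + energy(WPr - WP)))  # Pythagorean identity
print(energy(WP) <= energy(WPr) <= energy(W))                   # energy is monotone
```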
We will prove the following slight generalization of Theorem 4.6.7, allowing an arbitrary
starting partition (this will be useful later).
This proposition specifically tells us that starting with any given partition, the regularity
argument still works.
Proof. Starting with 𝑖 = 0:
The rest of the exercise shows how to recover a regularity partition from the above
approximation.
(b) Show that the stepping operator is contractive with respect to the cut norm, in the sense
that if 𝑊 : [0, 1] 2 → R is a measurable symmetric function, then ∥𝑊 P ∥ □ ≤ ∥𝑊 ∥ □ .
(c) Let P be a partition of [0, 1] into measurable sets. Let 𝑈 be a graphon that is constant
on 𝑆 × 𝑇 for each 𝑆, 𝑇 ∈ P. Show that for every graphon 𝑊, one has
∥𝑊 − 𝑊 P ∥ □ ≤ 2 ∥𝑊 − 𝑈 ∥ □ .
(d) Use (a) and (c) to give a different proof of the weak regularity lemma (with slightly
worse bounds than the one given in class): show that for every 𝜀 > 0 and every
graphon 𝑊, there exists a partition P of [0, 1] into 2^{𝑂(1/𝜀²)} measurable sets such that
∥𝑊 − 𝑊_P∥_□ ≤ 𝜀.
Exercise 4.6.13∗ (Second neighborhood distance). Let 0 < 𝜀 < 1/2. Let 𝑊 be a
graphon. Define 𝜏𝑊 , 𝑥 : [0, 1] → [0, 1] by
𝜏_{𝑊,𝑥}(𝑧) = ∫_{[0,1]} 𝑊(𝑥, 𝑦) 𝑊(𝑦, 𝑧) 𝑑𝑦.
(This models the second neighborhood of 𝑥.) Prove that if a finite set 𝑆 ⊆ [0, 1] satisfies
∥𝜏_{𝑊,𝑠} − 𝜏_{𝑊,𝑡}∥_1 > 𝜀 for all distinct 𝑠, 𝑡 ∈ 𝑆,
then |𝑆| ≤ (1/𝜀)^{𝐶/𝜀²}, where 𝐶 is some absolute constant.
Remark 4.7.2. The above definition is sufficient for our purposes. In order to give a more
formal definition of a martingale, we need to introduce the notion of a filtration. See any
standard measure theory based introduction to probability (Williams (1991, Chapters 10–11)
has a particularly lucid discussion of martingales and their convergence theorem discussed
below). This martingale is indexed by integers, and hence called “discrete-time.” There are
also continuous-time martingales (e.g., Brownian motion), which we will not discuss here.
Example 4.7.3 (Partial sum of independent mean zero random variables). Let 𝑍1 , 𝑍2 , . . .
be a sequence of independent mean zero random variables (e.g., ±1 with equal probability).
Then 𝑋𝑛 = 𝑍1 + · · · + 𝑍 𝑛 , 𝑛 ≥ 0, is a martingale.
Example 4.7.4 (Betting strategy). Consider any betting strategy in a “fair” casino, where
the expected value of each bet is zero. Let 𝑋𝑛 be the balance after 𝑛 rounds of betting.
Then 𝑋𝑛 is a martingale regardless of the betting strategy. So every betting strategy has zero
expected gain after 𝑛 rounds. Also see the optional stopping theorem for a more general
statement (e.g., Williams (1991, Chapter 10)).
The original meaning of the word "martingale" refers to the following betting strategy on a
sequence of fair coin tosses. Each round the bettor is allowed to bet an arbitrary amount 𝑍:
if heads, the bettor gains 𝑍 dollars, and if tails the bettor loses 𝑍 dollars.
Start by betting 1 dollar. If one wins, stop. If one loses, then double one's bet for the next
coin toss. And then repeat (i.e., keep doubling one's bet until the first win, at which point one
stops).
A “fallacy” is that this strategy always results in a final net gain of $1, the supposed reason
being that with probability 1 one eventually sees a head. This initially appears to contradict
the earlier claim that all betting strategies have zero expected gain. Thankfully there is no
contradiction. In real life, one starts with a finite budget and could possibly go bankrupt with
this betting strategy, thereby leading to a forced stop. In the optional stopping theorem, there
are some boundedness hypotheses that are violated by the above strategy.
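A quick simulation of the doubling strategy with a finite budget (the budget and number of trials are arbitrary choices) illustrates the point: the average net gain is zero, because the frequent small wins are balanced by rare large losses.

```python
import random

def doubling_strategy(budget, rng):
    # Bet 1, double after each loss, and stop at the first win or when the
    # next bet can no longer be covered by the remaining balance.
    balance, bet = budget, 1
    while bet <= balance:
        if rng.random() < 0.5:        # heads: win the current bet and stop
            return balance + bet - budget
        balance -= bet                # tails: lose the bet and double it
        bet *= 2
    return balance - budget           # forced stop: a large net loss

rng = random.Random(0)
budget, trials = 1000, 200_000
gains = [doubling_strategy(budget, rng) for _ in range(trials)]
print(sum(gains) / trials)            # approximately zero
print(min(gains), max(gains))         # rare large losses versus frequent +1 gains
```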
The following construction of martingales is most relevant for our purposes.
Example 4.7.5 (Doob martingale). Let 𝑋 be some “hidden” random variable. Partial infor-
mation is revealed about 𝑋 gradually over time. For example, 𝑋 is some fixed function of
some random inputs. So the exact value of 𝑋 is unknown but its distribution can be derived
from the distribution of the inputs. Initially one does not know any of the inputs. Over time,
some of the inputs are revealed. Let
𝑋𝑛 = E[𝑋 | all information revealed up to time 𝑛].
Then 𝑋0 , 𝑋1 , . . . is a martingale (why?). Informally, 𝑋𝑛 is the best guess (in expectation) of 𝑋
based on all the information available up to time 𝑛. We have 𝑋0 = E𝑋 (when no information
is revealed). All information is revealed as 𝑛 → ∞, and the martingale 𝑋𝑛 converges to the
random variable 𝑋 with probability 1.
Here is a real-life example. Let 𝑋 ∈ {0, 1} be whether a candidate wins in a presidential
election. Let 𝑋𝑛 be the inferred probability that the candidate wins, given all the information
known at time 𝑡 𝑛 . Then 𝑋𝑛 converges to the “truth”, a {0, 1}-value, eventually becoming
deterministic when the election result is finalized.
Then 𝑋𝑛 is a martingale. At time 𝑡 𝑛 , knowing 𝑋𝑛 , if the expectation for 𝑋𝑛+1 (conditioned
on everything known at time 𝑡 𝑛 ) were different from 𝑋𝑛 , then one should have adjusted 𝑋𝑛
accordingly in the first place.
The precise notion of “information” in the above formula can be formalized using the
notion of filtration in probability theory.
In other words, if 𝑋0 , 𝑋1 , . . . is a martingale with 𝑋𝑛 ∈ [0, 1] for every 𝑛, then the sequence
is convergent with probability 1.
Remark 4.7.7. The proof actually shows that the boundedness condition can be replaced by
the weaker 𝐿 1 -boundedness condition sup𝑛 E |𝑋𝑛 | < ∞. Even more generally, a hypothesis
called “uniform integrability” is enough.
Some boundedness condition is necessary. For example, in Example 4.7.3, a running sum
of independent uniform ±1 steps is an unbounded martingale, and it does not converge.
Proof. If a sequence 𝑋_0, 𝑋_1, · · · ∈ [0, 1] does not converge, then there exists a pair of rational
numbers 0 < 𝑎 < 𝑏 < 1 such that 𝑋𝑛 “up-crosses” [𝑎, 𝑏] infinitely many times, meaning that
there is an infinite sequence 𝑠1 < 𝑡1 < 𝑠2 < 𝑡2 < · · · such that 𝑋𝑠𝑖 < 𝑎 < 𝑏 < 𝑋𝑡𝑖 for all 𝑖.
[Figure: a sequence up-crossing the interval [𝑎, 𝑏] at times 𝑠_1 < 𝑡_1 < 𝑠_2 < 𝑡_2 < 𝑠_3 < 𝑡_3.]
We will show that for each 𝑎 < 𝑏, the probability that a bounded martingale 𝑋0 , 𝑋1 , · · · ∈
[0, 1] up-crosses [𝑎, 𝑏] infinitely many times is zero. Then, by taking a union of all countably
many such pairs (𝑎, 𝑏) of rationals, we deduce that the martingale converges with probability
1.
Consider the following betting strategy. Imagine that 𝑋𝑛 is a stock price. At any time, if
𝑋𝑛 dips below 𝑎, we buy and hold one share until 𝑋𝑛 reaches above 𝑏, at which point we
sell this share. (Note that we always hold either zero or one share–we do not buy more until
we have sold the currently held share). Start with a budget of 𝑌0 = 1 (so we will never go
bankrupt). Let 𝑌𝑛 be the value of our portfolio (cash on hand plus the value of the share if
held) at time 𝑛. Then 𝑌𝑛 is a martingale (why?). So E𝑌𝑛 = 𝑌0 = 1. Also 𝑌𝑛 ≥ 0 for all 𝑛. If
one buys and sells at least 𝑘 times up to time 𝑛, then 𝑌𝑛 ≥ 𝑘 (𝑏 − 𝑎) (this is only the net profit
from buying and selling; the actual 𝑌𝑛 may be higher due to the initial cash balance and the
value of the current share held). So, by Markov’s inequality, for every 𝑛,
P(≥ 𝑘 up-crossings up to time 𝑛) ≤ P(𝑌_𝑛 ≥ 𝑘(𝑏 − 𝑎)) ≤ E𝑌_𝑛/(𝑘(𝑏 − 𝑎)) = 1/(𝑘(𝑏 − 𝑎)).
By the monotone convergence theorem,
P(≥ 𝑘 up-crossings) = lim_{𝑛→∞} P(≥ 𝑘 up-crossings up to time 𝑛) ≤ 1/(𝑘(𝑏 − 𝑎)).
Letting 𝑘 → ∞, the probability of having infinitely many up-crossings is zero. □
Quick applications
The compactness of (W̃_0, 𝛿_□) is a powerful statement. We will use it to prove the equivalence
of cut metric convergence and left-convergence in the next section. Right now, let us show
how to use compactness to deduce the existence of limits for a left-convergent sequence of
graphons.
Proof of Theorem 4.3.8 (existence of limit for a left-convergent sequence of graphons). Let
𝑊1 , 𝑊2 , . . . be a sequence of graphons such that the sequence of 𝐹-densities {𝑡 (𝐹, 𝑊𝑛 )} 𝑛
converges for every graph 𝐹. Since (W̃_0, 𝛿_□) is a compact metric space by Theorem 4.2.7,
it is also sequentially compact, and so there is a subsequence (𝑛_𝑖)_{𝑖=1}^{∞} and a graphon 𝑊 such
that 𝛿□ (𝑊𝑛𝑖 , 𝑊) → 0 as 𝑖 → ∞. Fix any graph 𝐹. By the counting lemma, Theorem 4.5.1, it
follows that 𝑡 (𝐹, 𝑊𝑛𝑖 ) → 𝑡 (𝐹, 𝑊). But by assumption, the sequence {𝑡 (𝐹, 𝑊𝑛 )} 𝑛 converges.
Therefore 𝑡 (𝐹, 𝑊𝑛 ) → 𝑡 (𝐹, 𝑊) as 𝑛 → ∞. Thus 𝑊𝑛 left-converges to 𝑊. □
Let us now examine a different aspect of compactness. Recall that by definition, a set is
compact if every open cover has a finite subcover.
Recall from Theorem 4.2.8 that the set of graphs is dense in the space of graphons with
respect to the cut metric. This was proved by showing that for every 𝜀 > 0 and graphon
𝑊, one can find a graph 𝐺 such that 𝛿□ (𝐺, 𝑊) < 𝜀. However, the size of 𝐺 produced by
this proof depends on both 𝜀 and 𝑊, since the proof proceeds by first taking a discrete 𝐿 1
approximation of 𝑊, which could involve an unbounded number of steps to approximate. In
contrast, we show below that the number of vertices of 𝐺 needs to depend only on 𝜀 and not
on 𝑊.
Proof. Let 𝜀 > 0. For a graph 𝐺, define the open 𝜀-ball (with respect to the cut metric)
around 𝐺:
𝐵_𝜀(𝐺) = {𝑊 ∈ W̃_0 : 𝛿_□(𝐺, 𝑊) < 𝜀}.
Since every graphon lies within cut distance 𝜀 from some graph (Theorem 4.2.8), the balls
𝐵_𝜀(𝐺) cover W̃_0 as 𝐺 ranges over all graphs. By compactness, this open cover has a finite
subcover, and let 𝑁 be the maximum number of vertices in graphs 𝐺 of this subcover. Then
every graphon lies within cut distance 𝜀 of a graph on at most 𝑁 vertices. □
The following exercise asks to make the above proof quantitative.
Exercise 4.8.2. Show that for every 𝜀 > 0, every graphon lies within cut distance at most
𝜀 from some graph on at most 𝐶^{1/𝜀²} vertices, where 𝐶 is some absolute constant.
Hint: Use the weak regularity lemma.
Remark 4.8.3 (Ineffective bounds from compactness). Arguments using compactness usu-
ally do not generate quantitative bounds, meaning, for example, the proof of Proposition 4.8.1
does not give any specific function 𝑛(𝜀), only that such a function always exists. In cases where
one does not have an explicit bound, we call the bound ineffective. Ineffective bounds also
often arise from arguments involving ergodic theory and non-standard analysis. Sometimes a
different argument can be found that generates a quantitative bound (e.g., Exercise 4.8.2), but
it is not always known how to do this. Here we illustrate a simple example of a compactness
application (unrelated to dense graph limits) that gives an ineffective bound, but it remains
an open problem to make the bound effective.
This example concerns bounded degree graphs. It is sometimes called a “regularity lemma”
for bounded degree graphs, but it is very different from the regularity lemmas we have
encountered so far.
A rooted graph (𝐺, 𝑣) consists of a graph 𝐺 with a vertex 𝑣 ∈ 𝑉(𝐺) designated as the
root. Given a graph 𝐺 and positive integer 𝑟, we can obtain a random rooted graph by first
picking a vertex 𝑣 of 𝐺 as the root uniformly at random, and then removing all vertices more
than distance 𝑟 from 𝑣. We define the 𝒓-neighborhood-profile of 𝐺 to be the probability
distribution on rooted graphs generated by this process.
Recall that the total variation distance between two probability distributions 𝜇 and 𝜆 is
defined by
𝑑_TV(𝜇, 𝜆) = sup_𝐸 |𝜇(𝐸) − 𝜆(𝐸)| ,
where 𝐸 ranges over all events. In the case of two discrete probability distributions 𝜇
and 𝜆, the above definition can be written as half the ℓ_1 distance between the two probability
distributions:
𝑑_TV(𝜇, 𝜆) = (1/2) ∑_𝑥 |𝜇(𝑥) − 𝜆(𝑥)| .
The following is an unpublished observation of Alon.
Proof. Let 𝒢 = 𝒢_{Δ,𝑟} be the set of all possible rooted graphs with maximum degree Δ and
radius at most 𝑟 around the root. Then |𝒢| < ∞. The 𝑟-neighborhood-profile 𝑝_𝐺 of any
graph 𝐺 with maximum degree at most Δ can be represented as a point 𝑝_𝐺 ∈ [0, 1]^𝒢 with coordinate sum 1, and let
𝐴 = {𝑝_𝐺 : graph 𝐺 with maximum degree at most Δ} ⊆ [0, 1]^𝒢 be the set of all points that can arise this way. Since [0, 1]^𝒢
is compact, the closure of 𝐴 is compact. Since the union of the open 𝜀-neighborhoods (with
respect to 𝑑𝑇𝑉 ) of 𝑝 𝐺 , ranging over all graphs 𝐺, covers the closure of 𝐴, by compactness
there is some finite subcover. This subcover is a finite collection X of graphs so that for every
graph 𝐺, 𝑝_𝐺 lies within total variation distance 𝜀 of some 𝑝_{𝐺′} with 𝐺′ ∈ X. We conclude by
letting 𝑁 be the maximum number of vertices of a graph from X. □
Despite the short proof using compactness, it remains an open problem to make the above
result quantitative.
Open problem 4.8.5 (Effective “regularity lemma” for bounded degree graphs)
Find some specific 𝑁 (𝜀, Δ, 𝑟) so that Theorem 4.8.4 holds.
Remark 4.9.2. The result is reminiscent of results from probability theory on the uniqueness
of moments, which roughly says that if two “sufficiently well-behaved” real random variables
𝑋 and 𝑌 share the same moments (i.e., 𝔼[𝑋^𝑘] = 𝔼[𝑌^𝑘] for all nonnegative integers 𝑘),
then 𝑋 and 𝑌 must be identically distributed. One needs some technical conditions for the
conclusion to hold. For example, Carleman's condition says that if the moments of 𝑋 satisfy
$$\sum_{k=1}^{\infty} \mathbb{E}[X^{2k}]^{-1/(2k)} = \infty,$$
then the distribution of 𝑋 is uniquely determined by its moments.
This sufficient condition holds as long as the 𝑘-th moment of 𝑋 does not grow too quickly
with 𝑘. It holds for many distributions in practice.
We need some preparation before proving the uniqueness of moments theorem.
Proof. Let 𝑓(𝑥₁, …, 𝑥_𝑘) denote the expression inside the absolute value, so that 𝔼𝑓 = 0. Also,
𝑓 changes by at most $2(k-1)/\binom{k}{2} = 4/k$ whenever we change exactly one coordinate.
By the bounded differences inequality, Theorem 4.4.4, we obtain
$$\mathbb{P}(|f| \ge \varepsilon) \le 2\exp\left(\frac{-2\varepsilon^2}{k(4/k)^2}\right) = 2e^{-k\varepsilon^2/8}. \qquad \square$$
Let us now consider a variation of the 𝑊-random graph model from Section 4.4. Let
𝑥 1 , . . . , 𝑥 𝑘 ∈ [0, 1] be chosen independently and uniformly at random. Let H(𝑘, 𝑊) be an
edge-weighted random graph on vertex set [𝑘] with edge 𝑖 𝑗 having weight 𝑊 (𝑥𝑖 , 𝑥 𝑗 ), for
each 1 ≤ 𝑖 < 𝑗 ≤ 𝑘. Note that this definition makes sense for any symmetric measurable
𝑊 : [0, 1] 2 → R. Furthermore, when 𝑊 is a graphon, the 𝑊-random graph G(𝑘, 𝑊) can be
obtained by independently sampling each edge of H(𝑘, 𝑊) with probability equal to its edge
weight. We shall study the joint distributions of G(𝑘, 𝑊) and H(𝑘, 𝑊) coupled through the
above two-step process.
[Figure: a graphon 𝑊 sampled at random points 𝑥₁, …, 𝑥₅, the resulting edge-weighted graph H(5, 𝑊), and the graph G(5, 𝑊) obtained by keeping each edge independently with probability equal to its weight.]
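To make the two-step coupling concrete, here is a minimal Python sketch (an illustration, not the book's construction): it samples 𝑥₁, …, 𝑥_𝑘 uniformly, records the edge weights 𝑊(𝑥ᵢ, 𝑥ⱼ) as H(𝑘, 𝑊), and then keeps each edge independently with probability equal to its weight to obtain G(𝑘, 𝑊). The function name sample_H_and_G and the example graphon are ad hoc.

```python
import random

def sample_H_and_G(W, k, rng=random):
    """Sample the coupled pair H(k, W) (edge weights) and G(k, W) (a graph)
    from a graphon W : [0,1]^2 -> [0,1]."""
    xs = [rng.random() for _ in range(k)]      # x_1, ..., x_k uniform in [0, 1]
    H, G = {}, set()
    for i in range(k):
        for j in range(i + 1, k):
            w = W(xs[i], xs[j])                # edge weight of ij in H(k, W)
            H[(i, j)] = w
            if rng.random() < w:               # keep edge ij in G(k, W)
                G.add((i, j))
    return H, G

# example with the graphon W(x, y) = (x + y) / 2
H, G = sample_H_and_G(lambda x, y: (x + y) / 2, k=5)
```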
Similar to Definition 4.2.4 of the cut distance 𝛿□, define the distance based on the 𝐿¹ norm:
$$\delta_1(W, U) := \inf_\phi \|W - U^\phi\|_1,$$
where the infimum is taken over all invertible measure preserving maps 𝜙 : [0, 1] → [0, 1].
Since ∥·∥□ ≤ ∥·∥₁, we have 𝛿□ ≤ 𝛿₁.
Proof. First we prove the result for step graphons 𝑊. In this case, with probability 1, the
fraction of vertices of H(𝑘, 𝑊) that fall in each step of 𝑊 converges to the length of that step
by the law of large numbers. After sorting the vertices of H(𝑘, 𝑊) accordingly, the graphon
associated to H(𝑘, 𝑊) is obtained from 𝑊 by changing the step sizes by 𝑜(1) as 𝑘 → ∞, and
then zeroing out the diagonal blocks, as illustrated below. Then H(𝑘, 𝑊) converges to 𝑊
pointwise almost everywhere as 𝑘 → ∞. In particular, 𝛿₁(H(𝑘, 𝑊), 𝑊) → 0.
[Figure: the step graphon 𝑊 and the graphon associated to H(𝑘, 𝑊).]
Now let 𝑊 be any graphon. For any other graphon 𝑊 ′ , by using the same random vertices
for H(𝑘, 𝑊) and H(𝑘, 𝑊 ′ ), the two random graphs are coupled so that with probability 1,
∥H(𝑘, 𝑊) − H(𝑘, 𝑊 ′ )∥ 1 = ∥H(𝑘, 𝑊 − 𝑊 ′ )∥ 1 = ∥𝑊 − 𝑊 ′ ∥ 1 + 𝑜(1) as 𝑘 → ∞
by Lemma 4.9.3 applied to 𝑈 (𝑥, 𝑦) = |𝑊 (𝑥, 𝑦) − 𝑊 ′ (𝑥, 𝑦)|.
For every 𝜀 > 0, we can find some step graphon 𝑊 ′ so that ∥𝑊 − 𝑊 ′ ∥ 1 ≤ 𝜀 (by approx-
imating the Lebesgue measure using boxes). We saw earlier that 𝛿1 (H(𝑘, 𝑊 ′ ), 𝑊 ′ ) → 0. It
follows that with probability 1,
𝛿1 (H(𝑘, 𝑊), 𝑊) ≤ ∥H(𝑘, 𝑊) − H(𝑘, 𝑊 ′ )∥ 1 + 𝛿1 (H(𝑘, 𝑊 ′ ), 𝑊 ′ ) + ∥𝑊 ′ − 𝑊 ∥ 1
= 2 ∥𝑊 ′ − 𝑊 ∥ 1 + 𝑜(1) ≤ 2𝜀 + 𝑜(1)
as 𝑘 → ∞. Since 𝜀 > 0 can be chosen to be arbitrarily small, we have 𝛿1 (H(𝑘, 𝑊), 𝑊) → 0
with probability 1. □
Proof of Theorem 4.9.1 (uniqueness of moments). By inclusion-exclusion, for any 𝑘-vertex labeled graph 𝐹,
$$\Pr[\mathsf{G}(k,W) = F \text{ as labeled graphs}] = \sum_{F'} (-1)^{e(F') - e(F)} \Pr[\mathsf{G}(k,W) \supseteq F' \text{ as labeled graphs}],$$
where the sum ranges over all graphs 𝐹′ with 𝑉(𝐹′) = 𝑉(𝐹) and 𝐸(𝐹′) ⊇ 𝐸(𝐹). Since
$$t(F', W) = \Pr[\mathsf{G}(k,W) \supseteq F' \text{ as labeled graphs}],$$
we see that the distribution of G(𝑘, 𝑊) is determined by the values of 𝑡(𝐹, 𝑊) over all 𝐹.
Since 𝑡(𝐹, 𝑊) = 𝑡(𝐹, 𝑈) for all 𝐹, the random graphs G(𝑘, 𝑊) and G(𝑘, 𝑈) are identically distributed.
Exercise 4.9.7. Prove the inverse counting lemma Corollary 4.9.6 using the compactness
of the graphon space (Theorem 4.2.7) and the uniqueness of moments (Theorem 4.9.1).
Hint: Consider a hypothetical sequence of counterexamples.
Remark 4.9.8. The inverse counting lemma was first proved by Borgs, Chayes, Lovász, Sós,
& Vesztergombi (2008) in the following quantitative form:
Exercise 4.9.10. Prove that there exists a function 𝑓 : (0, 1] → (0, 1] such that for all
graphons 𝑈 and 𝑊, there exists a graph 𝐹 with
$$\frac{|t(F, U) - t(F, W)|}{e(F)} \ge f(\delta_\square(U, W)).$$
Exercise 4.9.11∗ (Generalized maximum cut). For symmetric measurable functions 𝑊, 𝑈 : [0, 1]² → ℝ, define
$$C(W, U) := \sup_\phi \langle W, U^\phi \rangle = \sup_\phi \int W(x, y)\, U(\phi(x), \phi(y)) \, dx \, dy,$$
where 𝜙 ranges over all invertible measure preserving maps [0, 1] → [0, 1]. Extend the
definition of C(·, ·) to graphs via C(𝐺, ·) := C(𝑊_𝐺, ·) and so on.
(a) Is C(𝑈, 𝑊) continuous jointly in (𝑈, 𝑊) with respect to the cut norm? Is it contin-
uous in 𝑈 if 𝑊 is held fixed?
(b) Show that if 𝑊1 and 𝑊2 are graphons such that C(𝑊1 , 𝑈) = C(𝑊2 , 𝑈) for all
graphons 𝑈, then 𝛿□ (𝑊1 , 𝑊2 ) = 0.
(c) Let 𝐺 1 , 𝐺 2 , . . . be a sequence of graphs such that C(𝐺 𝑛 , 𝑈) converges as 𝑛 → ∞
for every graphon 𝑈. Show that 𝐺 1 , 𝐺 2 , . . . is convergent.
(d) Can the hypothesis in (c) be replaced by “C(𝐺 𝑛 , 𝐻) converges as 𝑛 → ∞ for every
graph 𝐻”?
Further Reading
The book Large Networks and Graph Limits by Lovász (2012) is the authoritative reference
on the subject. His survey article titled Very Large Graphs (2009) also gives an excellent
overview.
One particularly striking application of the theory of dense graph limits is to large de-
viations for random graphs by Chatterjee & Varadhan (2011). See the survey article An
Introduction to Large Deviations for Random Graphs by Chatterjee (2016) as well as his
book (Chatterjee 2017).
Chapter Summary
• The cut distance between graphons is defined by $\delta_\square(W, U) := \inf_\phi \|W - U^\phi\|_\square$, where the infimum is taken over all invertible measure preserving maps 𝜙 : [0, 1] → [0, 1].
• Given a sequence of graphons (or graphs) 𝑊₁, 𝑊₂, …, we say that it
– converges in cut metric if it is a Cauchy sequence with respect to the cut metric 𝛿□;
– left-converges if the homomorphism density 𝑡(𝐹, 𝑊ₙ) converges for every fixed graph 𝐹 as 𝑛 → ∞.
• The graphon space is compact under the cut metric.
– Proof uses the weak regularity lemma and the martingale convergence theorem.
– Compactness has powerful consequences.
• Convergence in cut metric and left-convergence are equivalent for a sequence of graphons.
– (⇒) follows from a counting lemma.
– (⇐) was proved here using compactness.
5 Graph Homomorphism Inequalities
Chapter Highlights
• A suite of techniques for proving inequalities between subgraph densities
• The maximum/minimum triangle density in a graph of given edge density.
• How to apply Cauchy–Schwarz and Hölder inequalities
• Lagrangian method (another proof of Turán’s theorem, and linear inequalities between
clique densities)
• Entropy method (and applications to Sidorenko’s conjecture)
In other words, the conjecture says that for a fixed bipartite graph 𝐹, the 𝐹-density in
a graph of a given edge density is asymptotically minimized by a random graph. We will
develop techniques in this chapter to prove several interesting special cases of Sidorenko’s
conjecture.
Sidorenko’s conjecture has the equivalent graphon formulation: for every bipartite graph 𝐹
and graphon 𝑊,
𝑡 (𝐹, 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 𝑒 (𝐹 ) .
Note that equality occurs when 𝑊 ≡ 𝑝, the constant graphon. One can think of Sidorenko's
conjecture as a separate problem for each 𝐹, asking to minimize 𝑡(𝐹, 𝑊) among graphons
𝑊 with ∫𝑊 ≥ 𝑝. Whether the constant graphon is the unique minimizer is the subject of an
even stronger conjecture known as the forcing conjecture.
By translating back and forth between graph limits and sequences of graphs, being forcing
is equivalent to the quasirandomness condition. Thus any forcing graph can play the role
of 𝐶4 in Theorem 3.1.1. This is what led Chung, Graham, and Wilson to consider forcing
graphs. In particular, 𝐶4 is forcing.
Exercise 5.0.10. Prove the “only if” direction of the forcing conjecture.
Exercise 5.0.12 (Forcing and stability). Show that a graph 𝐹 is forcing if and only if for
every 𝜀 > 0, there exists 𝛿 > 0 such that if a graph 𝐺 satisfies 𝑡 (𝐹, 𝐺) ≤ 𝑡 (𝐾2 , 𝐺) 𝑒 (𝐹 ) + 𝛿,
then 𝛿□ (𝐺, 𝑝) ≤ 𝜀.
The following exercise shows that to prove a graph is Sidorenko, we do not lose anything
by giving away a constant factor. The proof is a quick and neat application of the tensor
power trick.
Exercise 5.0.13 (Tensor power trick). Let 𝐹 be a bipartite graph. Suppose there is some
constant 𝑐 > 0 such that
𝑡 (𝐹, 𝐺) ≥ 𝑐 𝑡 (𝐾2 , 𝐺) 𝑒 (𝐹 ) for all graphs 𝐺.
Show that 𝐹 is Sidorenko.
For a given 𝑝 ∈ [0, 1], the set {𝑡 (𝐾3 , 𝑊) : 𝑡 (𝐾2 , 𝑊) = 𝑝} is a closed interval. Indeed,
if 𝑊0 achieves the minimum triangle density, and 𝑊1 achieves the maximum, then their
linear interpolation 𝑊𝑡 = (1 − 𝑡)𝑊0 + 𝑡𝑊1 , ranging over 0 ≤ 𝑡 ≤ 1, must have triangle
density continuously interpolating between those of 𝑊0 and 𝑊1 , and therefore achieves every
intermediate value.
[Figure: the edge-triangle region, with 𝑡(𝐾₂, 𝑊) on the horizontal axis and 𝑡(𝐾₃, 𝑊) on the vertical axis. The upper boundary is the curve 𝑦 = 𝑥^{3/2}; the lower boundary passes through the points (1 − 1/𝑘, (1 − 1/𝑘)(1 − 2/𝑘)), e.g., (2/3, 2/9), (3/4, 3/8), (4/5, 12/25).]
Figure 5.1 The top figure shows the edge-triangle region. This region is often
depicted as in the bottom figure, which better highlights the concave scallops on the
lower boundary but is a less accurate plot.
This inequality is asymptotically tight for 𝐺 being a clique on a subset of vertices. The
equivalent graphon inequality $t(K_3, W) \le t(K_2, W)^{3/2}$ attains equality for the clique graphon
$$W(x, y) = \begin{cases} 1 & \text{if } x, y \le a, \\ 0 & \text{otherwise.} \end{cases} \qquad (5.3)$$
Proof. Assume at least one 𝑎ᵢ is positive, or else both sides equal zero. Then
$$\frac{\mathrm{LHS}}{\mathrm{RHS}} = \sum_{i=1}^{n} \left(\frac{a_i}{a_1 + \cdots + a_n}\right)^t \le \sum_{i=1}^{n} \frac{a_i}{a_1 + \cdots + a_n} = 1. \qquad \square$$
Remark 5.1.4. We will see additional proofs of Theorem 5.1.2 not invoking eigenvalues
later in Exercise 5.2.14 and in Section 5.3. Theorem 5.1.2 is an inequality in “physical space”
(as opposed to going into the “frequency space” of the spectrum), and it is a good idea to
think about how to prove it while staying in the physical space.
More generally, the clique graphon (5.3) also maximizes 𝐾𝑟 -densities among all graphons
of given edge density.
Proof. There exist integers 𝑎, 𝑏 ≥ 0 such that 𝑘 = 3𝑎 + 2𝑏 (e.g., take 𝑎 = 1 if 𝑘 is odd and
𝑎 = 0 if 𝑘 is even). Then 𝑎𝐾₃ + 𝑏𝐾₂ (a disjoint union of 𝑎 triangles and 𝑏 disjoint edges) is
a subgraph of 𝐾_𝑘. So
$$t(K_k, W) \le t(aK_3 + bK_2, W) = t(K_3, W)^a\, t(K_2, W)^b \le t(K_2, W)^{3a/2 + b} = t(K_2, W)^{k/2}. \qquad \square$$
Remark 5.1.6 (Kruskal–Katona theorem). Thanks to a theorem of Kruskal (1963) and
Katona (1968), the exact answer to the following non-asymptotic question is completely
known:
What is the maximum number of copies of 𝐾_𝑘 in an 𝑛-vertex graph with 𝑚 edges?
When $m = \binom{a}{2}$ for some integer 𝑎, the optimal graph is a clique on 𝑎 vertices. More
generally, for any value of 𝑚, the optimal graph is obtained by adding edges in colexicographic
order:
12, 13, 23, 14, 24, 34, 15, 25, 35, 45, . . .
This is stronger than Theorem 5.1.5, which only gives an asymptotically tight answer as
𝑛 → ∞. The full Kruskal–Katona theorem also answers:
What is the maximum number of 𝑘-cliques in an 𝑟-graph with 𝑛 vertices and 𝑚 edges?
When $m = \binom{a}{r}$, the optimal 𝑟-graph is a clique on 𝑎 vertices. (An asymptotic version of
this statement can be proved using techniques in Section 5.3.) More generally, the optimal
𝑟-graph is obtained by adding the edges in colexicographic order. For example, for 3-graphs,
the edges should be added in the following order:
123, 124, 134, 234, 125, 135, 235, 145, 245, 345, . . .
Here 𝑎 1 . . . 𝑎 𝑟 < 𝑏 1 . . . 𝑏 𝑟 in colexicographic order if 𝑎 𝑖 < 𝑏 𝑖 at the last 𝑖 where 𝑎 𝑖 ≠ 𝑏 𝑖 (i.e.,
dictionary order when read from right to left). Here we sort the elements of each 𝑟-tuple in
increasing order.
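As a small illustration (not from the text), the following Python snippet lists 𝑟-element subsets in colexicographic order by sorting on the reversed tuples; the function name colex_order is ad hoc.

```python
from itertools import combinations

def colex_order(r, m):
    """r-element subsets of {1, ..., m} in colexicographic order:
    compare by the largest element first, then the next largest, and so on."""
    return sorted(combinations(range(1, m + 1), r), key=lambda s: tuple(reversed(s)))

print(colex_order(2, 5))
# [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5), (4, 5)]
print(colex_order(3, 5)[:4])
# [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
```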
The Kruskal–Katona theorem can be proved by a compression/shifting argument. The
idea is to repeatedly modify the graph so that we eventually end up at the optimal graph. At
each step, we “push” all the edges towards a clique along some “direction” in a way that does
not reduce the number of 𝑘-cliques in the graph.
the edge-triangle region, as illustrated in Figure 5.1 on page 165. (Recall that 𝐾 𝑘 is associated
to the same graphon as a complete 𝑘-partite graph with equal parts.)
Now suppose the given edge density 𝑝 lies strictly between 1 − 1/(𝑘 − 1) and 1 − 1/𝑘
for some integer 𝑘 ≥ 2. To obtain the graphon with edge density 𝑝 and minimum triangle
density, we start with 𝐾_𝑘 with all vertices having equal weight, and then shrink the
relative weight of exactly one of the 𝑘 vertices (while keeping the remaining 𝑘 − 1 vertices
at equal weight). For example, the graphon illustrated below is obtained by
starting with 𝐾4 and shrinking the weight on one vertex.
[Figure: the step graphon of a vertex-weighted 𝐾₄ with parts 𝐼₁, 𝐼₂, 𝐼₃, 𝐼₄: the value is 0 on each diagonal block 𝐼ᵢ × 𝐼ᵢ and 1 elsewhere, with 𝐼₄ shrunk relative to the other parts.]
During this process, the total edge density (accounting for vertex weights) decreases continuously
from 1 − 1/𝑘 to 1 − 1/(𝑘 − 1). At some point, the edge density equals 𝑝. This vertex-weighted
𝑘-clique 𝑊 turns out to minimize the triangle density among all graphons with edge density 𝑝.
The above claim is much more difficult to prove than the maximum triangle density result.
This theorem, stated below and due to Razborov (2008), was proved using an involved Cauchy–
Schwarz calculus that he called flag algebra. We will say a bit more about this method in
Section 5.2.
We will not prove this theorem in full here. See Lovász (2012, Section 16.3.2) for a
presentation of the proof of Theorem 5.1.7. Later in this Chapter, we give lower bounds that
match the edge-triangle region at the cliques. In particular, Theorem 5.4.4 will allow us to
determine the convex hull of the region.
The graphon described in Theorem 5.1.7 turns out to be not unique unless 𝑝 = 1 − 1/𝑘
for some positive integer 𝑘. Indeed, suppose 1 − 1/(𝑘 − 1) < 𝑝 < 1 − 1/𝑘. Let 𝐼1 , . . . , 𝐼 𝑘 be
the partition of [0, 1] into the intervals corresponding to the vertices of the vertex-weighted
𝑘-clique, with 𝐼1 , . . . , 𝐼 𝑘−1 all having equal length, and 𝐼 𝑘 strictly smaller length. Now replace
the graphon on 𝐼 𝑘−1 ∪ 𝐼 𝑘 by an arbitrary triangle-free graphon of the same edge density.
[Figure: the vertex-weighted 4-clique graphon with parts 𝐼₁, 𝐼₂, 𝐼₃, 𝐼₄, where the block on (𝐼₃ ∪ 𝐼₄)² has been replaced by an arbitrary triangle-free graphon of the same edge density.]
This operation does not change the edge-density or the triangle-density of the graphon
(check!). The non-uniqueness of the minimizer hints at the difficulty of the result.
This completes our discussion of the edge-triangle region (Figure 5.1 on page 165).
Theorem 5.1.7 was generalized from 𝐾3 to 𝐾4 (Nikiforov 2011), and then to all cliques 𝐾𝑟
(Reiher 2016). The construction for the minimizing graphon is the same as for the triangle
case.
5.2 Cauchy–Schwarz
We will apply the Cauchy–Schwarz inequality in the following form: given real-valued
functions 𝑓 and 𝑔 on the same space (always assuming the usual measurability assumptions
without further comments), we have
$$\left(\int_X fg\right)^2 \le \left(\int_X f^2\right)\left(\int_X g^2\right).$$
In practice, we will often apply the Cauchy–Schwarz inequality by changing the order of
integration, and separating an integral into an outer integral and an inner integral.
A typical application of the Cauchy–Schwarz inequality is demonstrated in the following
Note that in the final step, “expanding a square” has the effect of “duplicating a variable.”
It is useful to recognize expressions with duplicated variables that can be folded back into a
square.
Let us warm up by proving that 𝐾2,2 is Sidorenko. We actually already proved this statement
in Proposition 3.1.14 in the context of the Chung–Graham–Wilson theorem on quasirandom
graphs. We repeat the same calculations here to demonstrate the integral notation.
$$\ge \left(\int_{x,y} W(x,y)\right)^2 = t(K_2, W)^2. \qquad \square$$
Lemma 5.2.3
𝑡 (𝐾2,2 , 𝑊) ≥ 𝑡 (𝐾1,2 , 𝑊) 2 .
Although it was initially conjectured that all graphs are common, this turns out to be false.
In particular, 𝐾_𝑡 fails to be common for every 𝑡 ≥ 4 (Thomason 1989).
Proposition 5.2.7
Every Sidorenko graph is common.
Proof. Suppose 𝐹 is Sidorenko. Let 𝑝 = 𝑡(𝐾₂, 𝑊). Then $t(F, W) \ge p^{e(F)}$ and $t(F, 1-W) \ge t(K_2, 1-W)^{e(F)} = (1-p)^{e(F)}$. Adding up and using convexity,
$$t(F, W) + t(F, 1 - W) \ge p^{e(F)} + (1 - p)^{e(F)} \ge 2^{-e(F) + 1}. \qquad \square$$
The converse is false. The triangle is common but not Sidorenko (recall that every
Sidorenko graph is bipartite).
We also have the following lower bound on the minimum triangle density given edge
density (Goodman 1959).
Below is a plot of Goodman's bound against the true edge-triangle region from Figure 5.1
on page 165. The inequality is tight whenever 𝑊 is the graphon associated to 𝐾_𝑛, in which case 𝑡(𝐾₂, 𝑊) = 1 − 1/𝑛
and $t(K_3, W) = n(n-1)(n-2)/n^3 = (1 - 1/n)(1 - 2/n)$. In particular, Goodman's bound implies that
𝑡(𝐾₃, 𝑊) > 0 whenever 𝑡(𝐾₂, 𝑊) > 1/2, which we also saw from Mantel's theorem.
[Figure: the curve 𝑦 = 𝑥(2𝑥 − 1) plotted on the axes 𝑡(𝐾₂, 𝑊) (horizontal) and 𝑡(𝐾₃, 𝑊) (vertical).]
Figure 5.2 The Goodman lower bound on the triangle density from Theorem 5.2.8
plotted on top of the edge-triangle region (Figure 5.1 on page 165).
Thus, using 𝑢𝑣 ≥ 𝑢 + 𝑣 − 1 for 𝑢, 𝑣 ∈ [0, 1],
$$t(K_3, W) = \int_{x,y,z} W(x,y)W(x,z)W(y,z) \ge \int_{x,y,z} \big(W(x,y) + W(x,z) - 1\big)W(y,z) = 2\,t(K_{1,2}, W) - t(K_2, W) \ge 2\,t(K_2, W)^2 - t(K_2, W). \qquad \square$$
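As a quick sanity check (not part of the proof), the following Python sketch verifies Goodman's bound 𝑡(𝐾₃, 𝐺) ≥ 2𝑡(𝐾₂, 𝐺)² − 𝑡(𝐾₂, 𝐺) on a random graph, computing both densities from the adjacency matrix; the parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
A = np.triu((rng.random((n, n)) < 0.7).astype(float), 1)
A = A + A.T                                  # adjacency matrix of a random graph

t_edge = A.sum() / n**2                      # t(K2, G)
t_triangle = np.trace(A @ A @ A) / n**3      # t(K3, G): closed walks of length 3

print(t_triangle >= 2 * t_edge**2 - t_edge)  # True, as guaranteed by Goodman's bound
```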
Finally, let us demonstrate an application of the Cauchy–Schwarz inequality in the follow-
ing form, for nonnegative functions 𝑓 and 𝑔:
$$\left(\int fg\right)^2 \le \left(\int f^2 g\right)\left(\int g\right).$$
Recall that a graph 𝐹 is Sidorenko if 𝑡 (𝐹, 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 𝑒 (𝐹 ) for all graphons 𝑊 (Defini-
tion 5.0.4).
Theorem 5.2.9
The graph 𝐹 consisting of two paths of length 3 between two vertices 𝑤 and 𝑥, together with the edge 𝑤𝑥 (so that 𝑒(𝐹) = 7), is Sidorenko.
Proof. The idea is to "fold" the graph 𝐹 in half along the middle using the Cauchy–Schwarz
inequality. Using 𝑤 and 𝑥 to denote the two vertices in the middle, we have
$$t(F, W) = \int_{w,x} \left(\int_{y,z} W(w,y)W(y,z)W(z,x)\right)^2 W(w,x).$$
So, by the Cauchy–Schwarz inequality,
$$t(F, W)\, t(K_2, W) \ge \left(\int_{w,x,y,z} W(w,y)W(y,z)W(z,x)W(w,x)\right)^2 = t(C_4, W)^2 \ge t(K_2, W)^8,$$
with the last step due to Theorem 5.2.1. Therefore 𝑡 (𝐹, 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 7 and hence 𝐹 is
Sidorenko. □
Remark 5.2.10 (Flag algebra). The above examples were all simple enough to be found
by hand. As mentioned earlier, every application of the Cauchy–Schwarz inequality can be
rewritten in the form of a sum of squares. One could actually search for these sum-of-squares
proofs more systematically using a computer program. This idea, first introduced
by Razborov (2007), can be combined with other sophisticated methods to determine the
lower boundary of the edge-triangle region (Razborov 2008). Razborov coined the term flag
algebra to describe a formalization of such calculations. The technique is also sometimes
called graph algebra, Cauchy–Schwarz calculus, or sum-of-squares proofs.
Conceptually, the idea is that we are looking for all the ways to obtain nonnegative linear
combinations of squared expressions. In a typical application, one is asked to solve an
Here 𝑎, 𝑏, 𝑐 ∈ ℝ are constants (to be chosen). We can expand the above expression, and then, for instance, replace
$$\left(\int_{u,w} G_{x,y,z}(u, w)\right)^2 \quad \text{by} \quad \int_{u,w,u',w'} G_{x,y,z}(u, w)\, G_{x,y,z}(u', w').$$
Let us mention another nice result obtained using the flag algebra method.
What is the maximum possible number of induced copies of a given graph 𝐻 among all
𝑛-vertex graphs? (Pippenger & Golumbic 1975)
The optimal limiting density (as a fraction of $\binom{n}{v(H)}$, as 𝑛 → ∞) is called the inducibility
of the graph 𝐻. They conjectured that for every 𝑘 ≥ 5, the inducibility of a 𝑘-cycle is 𝑘!/(𝑘^𝑘 − 𝑘),
obtained by an iterated blow-up of a 𝑘-cycle (𝑘 = 5 illustrated below; in the limit there should
be infinitely many fractal-like iterations).
The conjecture for 5-cycles was proved by using flag algebra methods combined with addi-
tional “stability” methods (Balogh, Hu, Lidický, & Pfender 2016). The constant factor in the
following theorem is tight.
Although the flag algebra method has successfully solved several extremal problems, in
many interesting cases, the method does not give a tight bound. Nevertheless, for many open
extremal problems, such as the tetrahedron hypergraph Turán problem, the best known bound
comes from this approach.
Remark 5.2.13 (Incompleteness). Can every true linear inequality for graph homomor-
phism densities be proved via Cauchy–Schwarz/sum-of-squares?
Before giving the answer, we first discuss classical results about real polynomials. Suppose
𝑝(𝑥1 , . . . , 𝑥 𝑛 ) is a real polynomial such that 𝑝(𝑥 1 , . . . , 𝑥 𝑛 ) ≥ 0 for all 𝑥 1 , . . . , 𝑥 𝑛 ∈ R. Can
such a nonnegative polynomial always be written as a sum of squares? Hilbert (1888; 1893)
proved that the answer is yes for 𝑛 ≤ 2 and no in general for 𝑛 ≥ 3. The first explicit
counterexample was given by Motzkin (1967):
𝑝(𝑥, 𝑦) = 𝑥 4 𝑦 2 + 𝑥 2 𝑦 4 + 1 − 3𝑥 2 𝑦 2
is always nonnegative due to the AM–GM inequality, but it cannot be written as a sum of
squares of polynomials. Solving Hilbert's 17th problem, Artin (1927) proved that every polynomial
𝑝(𝑥₁, …, 𝑥ₙ) ≥ 0 can be written as a sum of squares of rational functions, meaning that
there is some nonzero polynomial 𝑞 such that 𝑝𝑞² can be written as a sum of squares of polynomials.
Exercise 5.2.16. Prove that 𝐾4− is common, where 𝐾4− is 𝐾4 with one edge removed.
Exercise 5.2.17. Prove that every path is Sidorenko, by extending the proof of Theo-
rem 5.3.4.
Exercise 5.2.18 (A lower bound on clique density). Show that for every positive integer
𝑟 ≥ 3, and graphon 𝑊, writing 𝑝 = 𝑡 (𝐾2 , 𝑊),
𝑡 (𝐾𝑟 , 𝑊) ≥ 𝑝(2𝑝 − 1) (3𝑝 − 2) · · · ((𝑟 − 1) 𝑝 − (𝑟 − 2)) .
Note that this inequality is tight when 𝑊 is the associated graphon of a clique.
Exercise 5.2.19 (Triangle vs. diamond). Prove there is a function 𝑓 : [0, 1] → [0, 1]
with 𝑓 (𝑥) ≥ 𝑥 2 and lim 𝑥→0 𝑓 (𝑥)/𝑥 2 = ∞ such that
𝑡 (𝐾4− , 𝑊) ≥ 𝑓 (𝑡 (𝐾3 , 𝑊))
for all graphons 𝑊. Here 𝐾4− is 𝐾4 with one edge removed.
Hint: Apply the triangle removal lemma
5.3 Hölder
Hölder’s inequality is a generalization of the Cauchy–Schwarz inequality. It says that given
𝑝 1 , . . . , 𝑝 𝑘 ≥ 1 with 1/𝑝 1 + · · · +1/𝑝 𝑘 = 1, and real-valued functions 𝑓1 , . . . , 𝑓 𝑘 on a common
space, we have
$$\int f_1 f_2 \cdots f_k \le \|f_1\|_{p_1} \cdots \|f_k\|_{p_k},$$
Lemma 5.3.2
𝑡 (𝐾𝑠,1 , 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 𝑠 .
Lemma 5.3.3
𝑡 (𝐾𝑠,𝑡 , 𝑊) ≥ 𝑡 (𝐾𝑠,1 , 𝑊) 𝑡 .
Theorem 5.3.4
The 3-edge path is Sidorenko.
Let us give two short proofs that both appeared as answers to a MathOverflow question
https://mathoverflow.net/q/189222. Later in Section 5.5 we will see another proof
using the entropy method.
The first proof is a special case of a more general technique by Sidorenko (1991).
[Figure: the 3-edge path 𝑤–𝑥–𝑦–𝑧.]
First proof that the 3-edge path is Sidorenko. Let 𝑃₄ be the 3-edge path and let 𝑊 be a graphon.
Let $g(x) = \int_y W(x,y)$, representing the "degree" of vertex 𝑥. We have
$$t(P_4, W) = \int_{w,x,y,z} W(x,w)W(x,y)W(z,y) = \int_{x,y,z} g(x)W(x,y)W(z,y) = \int_y \left(\int_x g(x)W(x,y)\right)\left(\int_x W(x,y)\right).$$
By the Cauchy–Schwarz inequality (applied for each fixed 𝑦 to the functions $\sqrt{g(x)W(x,y)}$ and $\sqrt{W(x,y)}$), followed by the Cauchy–Schwarz inequality in 𝑦,
$$t(P_4, W) \ge \int_y \left(\int_x \sqrt{g(x)}\, W(x,y)\right)^2 \ge \left(\int_{x,y} \sqrt{g(x)}\, W(x,y)\right)^2 = \left(\int_x g(x)^{3/2}\right)^2 \ge \left(\int_x g(x)\right)^3 = \left(\int_{x,y} W(x,y)\right)^3,$$
where the final inequality follows from the convexity of $t \mapsto t^{3/2}$. □
Note that
$$\int_{x,y} \frac{W(x,y)}{g(x)} = \int_x \frac{g(x)}{g(x)} = 1, \qquad \text{and similarly} \qquad \int_{x,y} \frac{W(x,y)}{g(y)} = 1.$$
So by Hölder's inequality,
$$t(P_4, W) = \int_{x,y} g(x)W(x,y)g(y) = \left(\int_{x,y} g(x)W(x,y)g(y)\right)\left(\int_{x,y} \frac{W(x,y)}{g(x)}\right)\left(\int_{x,y} \frac{W(x,y)}{g(y)}\right) \ge \left(\int_{x,y} W(x,y)\right)^3. \qquad \square$$
Note that a straightforward application of Hölder's inequality, when 𝑋, 𝑌, 𝑍 are probability
spaces (so that $\int_{x,y,z} f(x,y) = \int_{x,y} f(x,y)$), would yield
$$\int_{x,y,z} f(x,y)\, g(x,z)\, h(y,z) \le \|f\|_3 \|g\|_3 \|h\|_3.$$
Next, we apply the Cauchy–Schwarz inequality to the variable 𝑦 (this affects 𝑓 and ℎ while
leaving 𝑔 intact). Continuing the above inequality,
$$\le \int_z \left(\int_{x,y} f(x,y)^2\right)^{1/2} \left(\int_x g(x,z)^2\right)^{1/2} \left(\int_y h(y,z)^2\right)^{1/2}.$$
Finally, we apply the Cauchy–Schwarz inequality to the variable 𝑧 (this affects 𝑔 and ℎ while
leaving 𝑓 intact). Continuing the above inequality,
$$\le \left(\int_{x,y} f(x,y)^2\right)^{1/2} \left(\int_{x,z} g(x,z)^2\right)^{1/2} \left(\int_{y,z} h(y,z)^2\right)^{1/2}.$$
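The following minimal numerical sketch (not from the text) checks the resulting inequality ∫ 𝑓(𝑥,𝑦)𝑔(𝑥,𝑧)ℎ(𝑦,𝑧) ≤ ‖𝑓‖₂‖𝑔‖₂‖ℎ‖₂ on a finite probability space, with all integrals interpreted as averages; the variable names are ad hoc.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8                                  # each variable ranges over a uniform space of size m
f, g, h = (rng.random((m, m)) for _ in range(3))

lhs = np.einsum("xy,xz,yz->", f, g, h) / m**3          # average of f(x,y) g(x,z) h(y,z)
norm2 = lambda a: np.sqrt((a ** 2).mean())             # L^2 norm w.r.t. the averaging measure
print(lhs <= norm2(f) * norm2(g) * norm2(h) + 1e-12)   # True
```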
& Whitney (1949). See Exercise 5.3.9 below. It has important applications in combinatorics.
A powerful generalization known as Shearer’s entropy inequality will be discussed in
Section 5.5. Also see Exercise 5.5.19 for a strengthening of the projection inequalities.
Now let us state a more general form of Theorem 5.3.5, which can be proved using the
same techniques. The key point of the inequality in Theorem 5.3.5 is that each variable
(i.e., 𝑥, 𝑦, and 𝑧) is contained in exactly 2 of the factors (i.e., 𝑓 (𝑥, 𝑦), 𝑔(𝑥, 𝑧), and ℎ(𝑦, 𝑧)).
Everything works the same way as long as each variable is contained in exactly 𝑘 factors, as
long as we use 𝐿 𝑘 norms on the right-hand side.
For example,
$$\int_{u,v,w,x,y,z} f_1(u,v) f_2(v,w) f_3(w,z) f_4(x,y) f_5(y,z) f_6(z,u) f_7(u,x) f_8(v,x) f_9(w,y) \le \prod_{i=1}^{9} \|f_i\|_3.$$
Here the factors in the integral correspond to the edges of a 3-regular graph on the vertex set {𝑢, 𝑣, 𝑤, 𝑥, 𝑦, 𝑧}. In particular,
every variable lies in exactly 3 factors.
More generally, each function 𝑓ᵢ can take as input any number of variables, as long as
every variable appears in exactly 𝑘 of the functions. For example,
$$\int_{w,x,y,z} f(w,x,y)\, g(w,y,z)\, h(x,z) \le \|f\|_2 \|g\|_2 \|h\|_2.$$
Furthermore, if every 𝑋𝑖 is a probability space, then we can relax the hypothesis to “each
element of [𝑚] appears in at most 𝑘 different 𝐼𝑖 ’s.”
Exercise 5.3.8. Prove Theorem 5.3.7 by generalizing the proof of Theorem 5.3.5.
The next exercise generalizes the projection inequality from Remark 5.3.6. Also see
Exercise 5.5.19 for a strengthening.
Exercise 5.3.9 (Projection inequalities). Let 𝐼₁, …, 𝐼_ℓ ⊆ [𝑑] be such that each element of
[𝑑] appears in exactly 𝑘 of the 𝐼ᵢ's. Prove that for any compact body 𝐾 ⊆ ℝ^𝑑, with |·|
denoting volume in the appropriate dimension,
$$|K|^k \le |\pi_{I_1}(K)| \cdots |\pi_{I_\ell}(K)|.$$
The version of Theorem 5.3.7 with each 𝑋𝑖 being a probability space is useful for graphons.
In particular, since
$$\|W\|_k^k = \int W^k \le t(K_2, W),$$
The answer turns out to be 𝐺 = 𝐾 𝑑,𝑑 . We can also take 𝐺 to be a disjoint union of copies
of 𝐾 𝑑,𝑑 ’s, and this would not change 𝑖(𝐺) 1/𝑣 (𝐺) . This result, stated below, was shown by
Kahn (2001) for bipartite regular graphs 𝐺, and later extended by Zhao (2010) to all regular
graphs 𝐺.
The set of independent sets of 𝐺 is in bijection with the set of graph homomorphisms
from 𝐺 to the graph consisting of two adjacent vertices, one of which has a loop.
Indeed, a map between their vertex sets is a graph homomorphism if and only if the set of
vertices of 𝐺 mapped to the non-looped vertex is an independent set of 𝐺.
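Here is a minimal brute-force Python check of this bijection on a small example (an illustration, not from the text); the helper names hom_count and independent_set_count are ad hoc, and the target graph is encoded by its adjacency sets, with the looped vertex labeled 1.

```python
from itertools import product

# H: vertex 1 has a loop and is adjacent to vertex 0; vertex 0 has no loop.
H_adj = {0: {1}, 1: {0, 1}}

def hom_count(vertices, edges):
    """Number of homomorphisms from G = (vertices, edges) to H."""
    count = 0
    for values in product(H_adj, repeat=len(vertices)):
        phi = dict(zip(vertices, values))
        if all(phi[v] in H_adj[phi[u]] for u, v in edges):
            count += 1
    return count

def independent_set_count(vertices, edges):
    """Brute-force count of independent sets of G."""
    count = 0
    for bits in product([0, 1], repeat=len(vertices)):
        S = {v for v, b in zip(vertices, bits) if b}
        if all(not (u in S and v in S) for u, v in edges):
            count += 1
    return count

V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]  # G = C4
print(hom_count(V, E), independent_set_count(V, E))     # 7 7
```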
Let us first prove Theorem 5.3.14 for bipartite regular 𝐺. The following more general in-
equality was shown by Galvin & Tetali (2004). It implies the bipartite case of Theorem 5.3.14
by the above discussion.
Theorem 5.3.16
For any 𝑑-regular bipartite graph 𝐹 and any graphon 𝑊,
$$t(F, W) \le t(K_{d,d}, W)^{e(F)/d^2}.$$
Let us prove this theorem in the case 𝐹 = 𝐶6 to illustrate the technique more concretely.
The general proof is basically the same. Let
∫
𝑓 (𝑥 1 , 𝑥2 ) = 𝑊 (𝑥1 , 𝑦)𝑊 (𝑥 2 , 𝑦).
𝑦
This function should be thought of as the codegree of the vertices 𝑥₁ and 𝑥₂. Then, grouping the
factors in the integral according to their right endpoint, we have
[Figure: 𝐶₆ drawn with parts {𝑥₁, 𝑥₂, 𝑥₃} and {𝑦₁, 𝑦₂, 𝑦₃}.]
$$t(C_6, W) = \int_{x_1,x_2,x_3,y_1,y_2,y_3} W(x_1,y_1)W(x_2,y_1)W(x_1,y_2)W(x_3,y_2)W(x_2,y_3)W(x_3,y_3)$$
$$= \int_{x_1,x_2,x_3} \left(\int_{y_1} W(x_1,y_1)W(x_2,y_1)\right)\left(\int_{y_2} W(x_1,y_2)W(x_3,y_2)\right)\left(\int_{y_3} W(x_2,y_3)W(x_3,y_3)\right)$$
$$= \int_{x_1,x_2,x_3} f(x_1,x_2)\, f(x_1,x_3)\, f(x_2,x_3) \le \|f\|_2^3 = t(C_4, W)^{3/2} = t(K_{2,2}, W)^{3/2},$$
where the inequality is the generalized Hölder inequality (Theorem 5.3.5), and the last step uses
$$\|f\|_2^2 = \int_{x_1,x_2} f(x_1,x_2)^2 = \int_{x_1,x_2,y_1,y_2} W(x_1,y_1)W(x_2,y_1)W(x_1,y_2)W(x_2,y_2) = t(C_4, W).$$
[Figure: 𝐶₄ = 𝐾_{2,2} with parts {𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₂}.]
This proves Theorem 5.3.16 in the case 𝐹 = 𝐶6 . The theorem in general can be proved via
a similar calculation.
Exercise 5.3.17. Complete the proof of Theorem 5.3.16 by generalizing the above argu-
ment.
Remark 5.3.18. Kahn (2001) first proved the bipartite case of Theorem 5.3.14 using
Shearer’s entropy inequality, which we will see in Section 5.5. His technique was extended
by Galvin & Tetali (2004) to prove Theorem 5.3.15. The proof using generalized Hölder’s
inequality presented here was given by Lubetzky & Zhao (2017).
So far we proved Theorem 5.3.14 for bipartite regular graphs. To prove it for all regular
graphs, we apply the following inequality by Zhao (2010). Here 𝐺 × 𝐾2 (tensor product) is
the bipartite double cover of 𝐺. An example is illustrated below:
[Figure: a graph 𝐺 and its bipartite double cover 𝐺 × 𝐾₂.]
The vertex set of 𝐺 × 𝐾2 is 𝑉 (𝐺) × {0, 1}. Its vertices are labeled 𝑣 𝑖 with 𝑣 ∈ 𝑉 (𝐺) and
𝑖 ∈ {0, 1}. Its edges are 𝑢 0 𝑣 1 for all 𝑢𝑣 ∈ 𝐸 (𝐺). Note that 𝐺 × 𝐾2 is always a bipartite graph.
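A minimal Python sketch of this construction (not from the text; the function name is ad hoc):

```python
def bipartite_double_cover(vertices, edges):
    """Tensor product G x K2: vertices (v, i) for v in V(G), i in {0, 1},
    with an edge between (u, 0) and (v, 1) for every edge uv of G (both ways)."""
    new_vertices = [(v, i) for v in vertices for i in (0, 1)]
    new_edges = set()
    for u, v in edges:
        new_edges.add(((u, 0), (v, 1)))
        new_edges.add(((v, 0), (u, 1)))
    return new_vertices, sorted(new_edges)

# the double cover of the triangle K3 is a 6-cycle
V2, E2 = bipartite_double_cover([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
print(len(V2), len(E2))  # 6 6
```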
Assuming Theorem 5.3.19, we can now prove Theorem 5.3.14 by reducing the statement
to the bipartite case, which we proved earlier. Indeed, for every 𝑑-regular graph 𝐺,
$$i(G) \le i(G \times K_2)^{1/2} \le i(K_{d,d})^{n/(2d)},$$
where the last step follows from applying Theorem 5.3.14 to the bipartite graph 𝐺 × 𝐾2 .
Proof of Theorem 5.3.19. Let 2𝐺 denote a disjoint union of two copies of 𝐺. Label its
vertices by 𝑣 𝑖 with 𝑣 ∈ 𝑉 and 𝑖 ∈ {0, 1} so that its edges are 𝑢 𝑖 𝑣 𝑖 with 𝑢𝑣 ∈ 𝐸 (𝐺) and
𝑖 ∈ {0, 1}. We will give an injection 𝜙 : 𝐼 (2𝐺) → 𝐼 (𝐺 × 𝐾2 ). Recall that 𝐼 (𝐺) is the set of
independent sets of 𝐺. The injection would imply 𝑖(𝐺) 2 = 𝑖(2𝐺) ≤ 𝑖(𝐺 × 𝐾2 ) as desired.
Fix an arbitrary order on all subsets of 𝑉 (𝐺). Let 𝑆 be an independent set of 2𝐺. Let
𝐸 bad (𝑆) := {𝑢𝑣 ∈ 𝐸 (𝐺) : 𝑢 0 , 𝑣 1 ∈ 𝑆}.
Note that 𝐸 bad (𝑆) is a bipartite subgraph of 𝐺, since each edge of 𝐸 bad has exactly one
endpoint in {𝑣 ∈ 𝑉 (𝐺) : 𝑣 0 ∈ 𝑆} but not both (or else 𝑆 would not be independent). Let 𝐴
denote the first subset (in the previously fixed ordering) of 𝑉 (𝐺) such that all edges in 𝐸 bad (𝑆)
have one vertex in 𝐴 and the other outside 𝐴. Define 𝜙(𝑆) to be the subset of 𝑉 (𝐺) × {0, 1}
obtained by “swapping” the pairs in 𝐴. That is, for all 𝑣 ∈ 𝐴, 𝑣 𝑖 ∈ 𝜙(𝑆) if and only if 𝑣 1−𝑖 ∈ 𝑆
for each 𝑖 ∈ {0, 1}, and for all 𝑣 ∉ 𝐴, 𝑣 𝑖 ∈ 𝜙(𝑆) if and only if 𝑣 𝑖 ∈ 𝑆 for each 𝑖 ∈ {0, 1}. It is
not hard to verify that 𝜙(𝑆) is an independent set in 𝐺 × 𝐾2 . The swapping procedure fixes
the “bad” edges.
[Figure: an independent set of 2𝐺 and the corresponding independent set of 𝐺 × 𝐾₂ obtained by swapping along 𝐴.]
It remains to verify that 𝜙 is an injection. For every 𝑆 ∈ 𝐼 (2𝐺), once we know 𝑇 = 𝜙(𝑆),
we can recover 𝑆 by first setting
$$E'_{\mathrm{bad}}(T) = \{uv \in E(G) : u_i, v_i \in T \text{ for some } i \in \{0,1\}\},$$
so that $E_{\mathrm{bad}}(S) = E'_{\mathrm{bad}}(T)$, and then finding 𝐴 as earlier and swapping the pairs in 𝐴 back.
(Remark: it follows that 𝑇 ∈ 𝐼(𝐺 × 𝐾₂) lies in the image of 𝜙 if and only if $E'_{\mathrm{bad}}(T)$ is
bipartite.) □
Remark 5.3.20 (Reverse Sidorenko). Does Theorem 5.3.15 generalize to all regular graphs
𝐺 in the way that Theorem 5.3.14 does? Unfortunately, no. For example, when 𝐻 consists of two
isolated looped vertices (two vertices, each with a loop, and no edge between them), hom(𝐺, 𝐻) = 2^{𝑐(𝐺)}, with 𝑐(𝐺) being the number of connected components
of 𝐺. So hom(𝐺, 𝐻)^{1/𝑣(𝐺)} is maximized among 𝑑-regular graphs 𝐺 by 𝐺 = 𝐾_{𝑑+1}, which is
the connected 𝑑-regular graph with the fewest vertices.
Theorem 5.3.15 actually extends to every triangle-free regular graph 𝐺. Furthermore, for
every regular graph 𝐺 that contains a triangle, there is some graph 𝐻 for which the inequality in
Theorem 5.3.15 fails.
There are several interesting families of graphs 𝐻 where Theorem 5.3.15 is known to
extend to all regular graphs 𝐺. Notably, this is true for 𝐻 = 𝐾𝑞 , which is significant since
hom(𝐺, 𝐾𝑞 ) is the number of proper 𝑞-colorings of 𝐺.
There are also generalizations of the above to non-regular graphs. For example, for a graph
𝐺 without isolated vertices, letting 𝑑_𝑢 denote the degree of 𝑢 ∈ 𝑉(𝐺), we have
$$i(G) \le \prod_{uv \in E(G)} i(K_{d_u, d_v})^{1/(d_u d_v)}.$$
And similarly for the number of proper 𝑞-colorings. In fact, the results mentioned in this
remark about regular graphs are proved by induction on vertices of 𝐺, and thus require
considering the larger family of not necessarily regular graphs 𝐺.
The results discussed in this remark are due to Sah, Sawhney, Stoner, & Zhao (2019;
2020). The term reverse Sidorenko inequalities was introduced to describe inequalities such
as $t(F, W)^{1/e(F)} \le t(K_{d,d}, W)^{1/d^2}$, which mirror the inequality $t(F, W)^{1/e(F)} \ge t(K_2, W)$ in
Sidorenko’s conjecture. Also see the earlier survey by Zhao (2017) for discussions of related
results and open problems.
We already know through the quasirandom graph equivalences (Theorem 3.1.1) that 𝐶4 is
forcing. The following exercise generalizes this fact.
Exercise 5.3.21. Prove that 𝐾 𝑠,𝑡 is forcing whenever 𝑠, 𝑡 ≥ 2.
Exercise 5.3.22. Let 𝐹 be a bipartite graph with vertex bipartition 𝐴 ∪ 𝐵 such that every
vertex in 𝐵 has degree 𝑑. Let 𝑑_𝑢 denote the degree of 𝑢 in 𝐹. Prove that for every graphon 𝑊,
$$t(F, W) \le \prod_{uv \in E(F)} t(K_{d_u, d_v}, W)^{1/(d_u d_v)}.$$
Exercise 5.3.23 (Sidorenko for 3-edge path with vertex weights). Let 𝑊 : [0, 1]² → [0, ∞)
be a measurable function (not necessarily symmetric). Let 𝑝, 𝑞, 𝑟, 𝑠 : [0, 1] → [0, ∞) be
measurable functions. Prove that
$$\int_{w,x,y,z} p(w)q(x)r(y)s(z)\, W(x,w)W(x,y)W(z,y) \ge \left(\int_{x,y} \big(p(x)q(x)r(y)s(y)\big)^{1/3}\, W(x,y)\right)^3.$$
Exercise 5.3.24. For a graph 𝐺, let 𝑓_𝑞(𝐺) denote the number of maps 𝑓 : 𝑉(𝐺) → {0, 1, …, 𝑞}
such that 𝑓(𝑢) + 𝑓(𝑣) ≤ 𝑞 for every 𝑢𝑣 ∈ 𝐸(𝐺). Prove that for every 𝑛-vertex 𝑑-regular
graph 𝐺 (not necessarily bipartite),
$$f_q(G) \le f_q(K_{d,d})^{n/(2d)}.$$
5.4 Lagrangian
Another proof of Turán’s theorem
Here is another proof of Turán’s theorem due to Motzkin & Straus (1965). It can be viewed
as a continuous/analytic analogue of the Zykov symmetrization proof of Turán’s theorem
from Section 1.2 (the third proof there).
Proof. Let 𝐺 be a 𝐾_{𝑟+1}-free graph on vertex set [𝑛]. Consider the function
$$f(x_1, \ldots, x_n) = \sum_{ij \in E(G)} x_i x_j.$$
It is a useful tool for certain hypergraph Turán problems. The above proof of Turán’s theorem
shows that for every graph 𝐺, 𝜆(𝐺) = (1 − 1/𝜔(𝐺))/2, where 𝜔(𝐺) is the size of the largest
clique in 𝐺. A maximizing 𝑥 has coordinate 1/𝜔(𝐺) on vertices of the clique and zero
elsewhere.
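To illustrate the identity 𝜆(𝐺) = (1 − 1/𝜔(𝐺))/2 numerically, here is a rough Python sketch (not from the text): it estimates the Lagrangian by evaluating the quadratic form at random points of the simplex and at uniform weights on every vertex subset; the function name and parameters are ad hoc.

```python
import itertools
import random

def lagrangian_estimate(n, edges, samples=100000, rng=random):
    """Crude estimate of lambda(G) = max over the simplex of sum_{ij in E} x_i x_j."""
    def value(x):
        return sum(x[i] * x[j] for i, j in edges)

    best = 0.0
    for _ in range(samples):                       # random points of the simplex
        w = [rng.expovariate(1.0) for _ in range(n)]
        s = sum(w)
        best = max(best, value([wi / s for wi in w]))
    for k in range(1, n + 1):                      # uniform weights on each vertex subset
        for S in itertools.combinations(range(n), k):
            best = max(best, value([1.0 / k if i in S else 0.0 for i in range(n)]))
    return best

# 5-cycle plus one chord: omega(G) = 3, so lambda(G) should be (1 - 1/3)/2 = 1/3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
print(round(lagrangian_estimate(5, edges), 3))     # 0.333
```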
As an alternate but equivalent perspective, the above proof can be rephrased in terms of
maximizing the edge density among 𝐾_{𝑟+1}-free vertex-weighted graphs (with vertex weights
given by the vector 𝑥 above). The proof shifts weight between non-adjacent vertices without
decreasing the edge density, and this process preserves 𝐾_{𝑟+1}-freeness.
$$e_3(x_1, \ldots, x_n) = \sum_{1 \le i < j < k \le n} x_i x_j x_k, \qquad \ldots, \qquad e_n(x_1, \ldots, x_n) = x_1 \cdots x_n.$$
is true for every graph 𝐺 if and only if it is true with 𝐺 = 𝐾𝑛 for every positive integer 𝑛.
More explicitly, the above inequality holds for all graphs 𝐺 if and only if
$$\sum_{r=1}^{\ell} c_r \cdot \frac{n(n-1)\cdots(n-r+1)}{n^r} \ge 0 \qquad \text{for every } n \in \mathbb{N}.$$
Since this is a single variable polynomial in 𝑛, it is usually easy to check this inequality. We
will see some examples right after the proof.
Proof. The only non-trivial direction is the “if” implication. Suppose the displayed inequality
holds for all cliques 𝐺. Let 𝐺 be an arbitrary graph with vertex set [𝑛]. Let
$$f(x_1, \ldots, x_n) = \sum_{r=1}^{\ell} r!\, c_r \sum_{\substack{\{i_1, \ldots, i_r\} \\ \text{an } r\text{-clique in } G}} x_{i_1} \cdots x_{i_r}.$$
So
$$f\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = \sum_{r=1}^{\ell} c_r\, t(K_r, G).$$
By compactness, we can assume that the minimum is attained at some 𝑥. Among all
minimizing 𝑥, choose one with the smallest support (i.e., the number of nonzero coordinates).
As in the previous proof, if 𝑖 𝑗 ∉ 𝐸 (𝐺) for some pair of distinct 𝑥𝑖 , 𝑥 𝑗 > 0, then, replacing
(𝑥𝑖 , 𝑥 𝑗 ) by (𝑠, 𝑥𝑖 + 𝑥 𝑗 − 𝑠), 𝑓 changes linearly in 𝑠. Since 𝑓 is already minimized at 𝑥, it must
stay constant as 𝑠 changes. So we can replace (𝑥𝑖 , 𝑥 𝑗 ) by (𝑥𝑖 + 𝑥 𝑗 , 0), which keeps 𝑓 the same
while decreasing the number of nonzero coordinates of 𝑥. Thus the support of 𝑥 is a clique
in 𝐺. Suppose 𝑥 is supported on the first 𝑘 coordinates. Then 𝑓 is a linear combination of
elementary symmetric polynomials in 𝑥₁, …, 𝑥_𝑘. By Lemma 5.4.3, 𝑥₁ = ⋯ = 𝑥_𝑘 = 1/𝑘.
Then $f(x) = \sum_{r=1}^{\ell} c_r\, t(K_r, K_k) \ge 0$ by hypothesis. □
Remark 5.4.5. This proof technique can be adapted to show the stronger result that among
all graphs 𝐺 with a given number of vertices, the quantity $\sum_{r=1}^{\ell} c_r\, t(K_r, G)$ is minimized
when 𝐺 is a complete multipartite graph. Compare with the Zykov symmetrization proof of Turán's
theorem (Theorem 1.2.4).
The theorem only considers linear inequalities between clique densities. The statement
fails in general for inequalities with other graph densities (why?).
Theorem 5.4.4 can be equivalently stated in terms of the convex hull of the region of all
possible clique density tuples.
Exercise 5.4.9. For each graph 𝐹, let 𝑐 𝐹 ∈ R be such that 𝑐 𝐹 ≥ 0 whenever 𝐹 is not
a clique (no restrictions when 𝐹 is a clique). Assume that 𝑐 𝐹 ≠ 0 for finitely many 𝐹’s.
Prove that the inequality
$$\sum_F c_F\, t_{\mathrm{inj}}(F, G) \ge 0$$
is true for every graph 𝐺 if and only if it is true with 𝐺 = 𝐾𝑛 for every positive integer 𝑛.
Exercise 5.4.10 (Cliquey edges). Let 𝑛, 𝑟, 𝑡 be nonnegative integers. Show that every
𝑛-vertex graph with at least $(1 - \frac{1}{r})\frac{n^2}{2} + t$ edges contains at least 𝑟𝑡 edges that each belong to a
copy of 𝐾_{𝑟+1}.
Hint: Rephrase the statement as a linear inequality between the number of edges and the number of cliquey edges in every graph.
Exercise 5.4.11 (A hypergraph Turán density). Let 𝐹 be the 3-graph with 10 vertices and
6 edges illustrated below (lines denote edges). Prove that the hypergraph Turán density of
𝐹 is 2/9.
Exercise 5.4.12∗ (Maximizing 𝐾1,2 density). Prove that, for every 𝑝 ∈ [0, 1], among all
graphons 𝑊 with 𝑡 (𝐾2 , 𝑊) = 𝑝, the maximum possible value of 𝑡 (𝐾1,2 , 𝑊) is attained by
either a “clique” or a “hub” graphon, illustrated below.
[Figure: the clique graphon 𝑊(𝑥, 𝑦) = 1_{max{𝑥,𝑦} ≤ 𝑎} and the hub graphon 𝑊(𝑥, 𝑦) = 1_{min{𝑥,𝑦} ≤ 𝑎}.]
5.5 Entropy
In this section, we explain how to use entropy to prove certain graph homomorphism in-
equalities.
Entropy basics
Proof. The function 𝑓(𝑥) = −𝑥 log₂ 𝑥 is concave for 𝑥 ∈ [0, 1]. Writing 𝑝_𝑠 = ℙ(𝑋 = 𝑠), we have, by concavity,
$$H(X) = \sum_{s \in S} f(p_s) \le |S| \cdot f\!\left(\frac{1}{|S|}\sum_{s \in S} p_s\right) = |S| \cdot f\!\left(\frac{1}{|S|}\right) = \log_2 |S|. \qquad \square$$
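As a small numerical companion (not from the text), the following Python snippet computes entropy for finitely supported distributions and checks the uniform bound H(𝑋) ≤ log₂|𝑆| as well as subadditivity H(𝑋, 𝑌) ≤ H(𝑋) + H(𝑌); the distributions are toy examples.

```python
from math import log2

def H(dist):
    """Shannon entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# uniform bound: H(X) <= log2 |S|
px = {"a": 0.5, "b": 0.25, "c": 0.25}
print(H(px), log2(3))                        # 1.5  1.584...

# subadditivity: H(X, Y) <= H(X) + H(Y) for a correlated pair
pxy = {("a", 0): 0.4, ("a", 1): 0.1, ("b", 0): 0.1, ("b", 1): 0.4}
pX, pY = {"a": 0.5, "b": 0.5}, {0: 0.5, 1: 0.5}
print(H(pxy), H(pX) + H(pY))                 # 1.721...  2.0
```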
We write 𝑯(𝑿, 𝒀) for the entropy of the joint random variable (𝑋, 𝑌). This means that
$$H(X, Y) := -\sum_{(x,y)} \mathbb{P}(X = x, Y = y) \log_2 \mathbb{P}(X = x, Y = y).$$
In particular,
𝐻 (𝑋, 𝑌 ) = 𝐻 (𝑋) + 𝐻 (𝑌 ) if 𝑋 and 𝑌 are independent.
We can similarly define 𝐻 (𝑋, 𝑌 , 𝑍), and so on.
= 𝐻 (𝑋, 𝑌 ) − 𝐻 (𝑌 ). □
$$\ge f\left(\sum_{x,y} p(x,y) \cdot \frac{p(x)\,p(y)}{p(x,y)}\right) = f(1) = 0.$$
More generally, by iterating the above inequality for two random variables, we have
𝐻 (𝑋1 , . . . , 𝑋𝑛 ) ≤ 𝐻 (𝑋1 , . . . , 𝑋𝑛−1 ) + 𝐻 (𝑋𝑛 )
≤ 𝐻 (𝑋1 , . . . , 𝑋𝑛−2 ) + 𝐻 (𝑋𝑛−1 ) + 𝐻 (𝑋𝑛 )
≤ · · · ≤ 𝐻 (𝑋1 ) + · · · + 𝐻 (𝑋𝑛 ). □
Remark 5.5.7. The nonnegative quantity
𝐼 (𝑋; 𝑌 ) := 𝐻 (𝑋) + 𝐻 (𝑌 ) − 𝐻 (𝑋, 𝑌 )
is called mutual information. Intuitively, it measures the amount of common information
between 𝑋 and 𝑌 .
Theorem 5.5.10
The 3-edge path is Sidorenko.
Proof. Let 𝑃4 denote the 3-edge path and 𝐺 a graph. An element of Hom(𝑃4 , 𝐺) is a walk
of length three. We choose randomly a walk 𝑋𝑌 𝑍𝑊 in 𝐺 as follows:
• 𝑋𝑌 is a uniform random edge of 𝐺 (by this we mean first choosing an edge of 𝐺
uniformly at random, and then let 𝑋 be a uniformly chosen endpoint of this edge, and
then 𝑌 the other endpoint);
• 𝑍 is a uniform random neighbor of 𝑌 ;
• 𝑊 is a uniform random neighbor of 𝑍.
A key observation is that 𝑌 𝑍 is also distributed as a uniform random edge of 𝐺 (pause
and think about why). Indeed, conditioned on the choice of 𝑌 , the vertices 𝑋 and 𝑍 are both
independent and uniform neighbors of 𝑌 , so 𝑋𝑌 and 𝑌 𝑍 are identically distributed, and hence
𝑌 𝑍 is a uniform random edge of 𝐺.
Similarly, 𝑍𝑊 is distributed as uniform random edge.
Also, since 𝑋 and 𝑍 are conditionally independent given 𝑌
𝐻 (𝑍 |𝑋, 𝑌 ) = 𝐻 (𝑍 |𝑌 ) and 𝐻 (𝑊 |𝑋, 𝑌 , 𝑍) = 𝐻 (𝑊 |𝑍).
Furthermore,
𝐻 (𝑌 |𝑋) = 𝐻 (𝑍 |𝑌 ) = 𝐻 (𝑊 |𝑍)
This proves (5.6), and thus shows that 𝑃₄ is Sidorenko. Indeed, by the uniform bound,
$$\log_2 \hom(P_4, G) \ge H(X, Y, Z, W) \ge 3\log_2(2e(G)) - 2\log_2 v(G),$$
and hence
$$t(P_4, G) = \frac{\hom(P_4, G)}{v(G)^4} \ge \left(\frac{2e(G)}{v(G)^2}\right)^3 = t(K_2, G)^3. \qquad \square$$
Let us outline how to extend the above proof strategy from the 3-edge path to any tree 𝑇.
Define a 𝑻-branching random walk in a graph 𝐺 to be a random Φ ∈ Hom(𝑇, 𝐺) defined
by fixing an arbitrary root 𝑣 of 𝑇 (the choice of 𝑣 will not matter in the end). Then set Φ(𝑣)
to be a random vertex of 𝐺 with each vertex of 𝐺 chosen proportional to its degree. Then
extend Φ to a random homomorphism 𝑇 → 𝐺 one vertex at a time: if 𝑢 ∈ 𝑉 (𝑇) is already
mapped to Φ(𝑢) and its neighbor 𝑤 ∈ 𝑉 (𝑇) has not yet been mapped, then set Φ(𝑤) to
be a uniform random neighbor of Φ(𝑢), independent of all previous choices. The resulting
random Φ ∈ Hom(𝑇, 𝐺) has the following properties:
• for each edge of 𝑇, its image under Φ is a uniform random edge of 𝐺 and with the two
possible edge orientations equally likely; and
• for each vertex 𝑣 of 𝑇, conditioned on Φ(𝑣), the neighbors of 𝑣 in 𝑇 are mapped by Φ
to conditionally independent and uniform neighbors of Φ(𝑣) in 𝐺.
Furthermore, as in the proof of Theorem 5.5.10,
𝐻 (Φ) = 𝑒(𝑇) log2 (2𝑒(𝐺)) − (𝑒(𝑇) − 1)𝐻 (Φ(𝑣))
≥ 𝑒(𝑇) log2 (2𝑒(𝐺)) − (𝑒(𝑇) − 1) log2 𝑣(𝐺). (5.7)
(Exercise: fill in the details.) Together with the uniform bound 𝐻 (Φ) ≤ log2 hom(𝑇, 𝐺), we
proved the following.
Theorem 5.5.11
Every tree is Sidorenko.
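As a brute-force numerical check of the tree case (an illustration, not from the text), the following Python sketch computes homomorphism densities by enumeration and verifies 𝑡(𝑇, 𝐺) ≥ 𝑡(𝐾₂, 𝐺)^{𝑒(𝑇)} for the 3-edge path on a small random graph; the helper t_density is ad hoc.

```python
import random
from itertools import product

def t_density(F_edges, F_nvert, adj, n):
    """Homomorphism density t(F, G) = hom(F, G) / n^{v(F)} by brute force."""
    hom = sum(
        all(phi[v] in adj[phi[u]] for u, v in F_edges)
        for phi in product(range(n), repeat=F_nvert)
    )
    return hom / n**F_nvert

# a small random graph G
rng = random.Random(1)
n = 7
adj = {v: set() for v in range(n)}
for u in range(n):
    for v in range(u + 1, n):
        if rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)

t_path = t_density([(0, 1), (1, 2), (2, 3)], 4, adj, n)   # T = 3-edge path
t_edge = t_density([(0, 1)], 2, adj, n)                   # K2
print(t_path >= t_edge**3)                                # True
```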
We saw earlier that 𝐾𝑠,𝑡 is Sidorenko, which can be proved by two applications of Hölder’s
inequality (see Section 5.3). Here let us give another proof using entropy. This entropy proof
is subtler than the earlier Hölder’s inequality proof, but it will soon lead us more naturally to
the next generalization.
Theorem 5.5.12
Every complete bipartite graph is Sidorenko.
Let us demonstrate the proof for 𝐾2,2 for concreteness. The same proof extends to all 𝐾𝑠,𝑡 .
[Figure: 𝐾_{2,2} with parts {𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₂}.]
Proof that 𝐾2,2 is Sidorenko. As earlier, we construct a random element of Hom(𝐾2,2 , 𝐺).
Pick a random (𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 ) ∈ 𝑉 (𝐺) 4 with 𝑋𝑖𝑌 𝑗 ∈ 𝐸 (𝐺) for all 𝑖, 𝑗 as follows:
• 𝑋1𝑌1 is a uniform random edge;
• 𝑌2 is a uniform random neighbor of 𝑋1 ;
• 𝑋2 is a conditionally independent copy of 𝑋1 given (𝑌1 , 𝑌2 ).
The last point deserves some attention. It does not say that we choose a uniform random
common neighbor of 𝑌1 and 𝑌2 , as one might naively attempt. Instead, one can think of
the first two steps as defining the 𝐾1,2 -branching random walk for (𝑋1 , 𝑌1 , 𝑌2 ). Under this
distribution, we can first sample (𝑌1 , 𝑌2 ) according to its marginal, and then produce two
conditionally independent copies of 𝑋1 (with the second copy now called 𝑋2 ).
We have
$$H(X_1, X_2, Y_1, Y_2) = H(Y_1, Y_2) + H(X_1, X_2 \mid Y_1, Y_2) \qquad \text{[chain rule]}$$
$$= H(Y_1, Y_2) + 2H(X_1 \mid Y_1, Y_2) \qquad \text{[cond. indep.]}$$
$$= 2H(X_1, Y_1, Y_2) - H(Y_1, Y_2) \qquad \text{[chain rule]}$$
$$\ge 2\big(2\log_2(2e(G)) - \log_2 v(G)\big) - H(Y_1, Y_2) \qquad \text{[(5.7)]}$$
$$\ge 2\big(2\log_2(2e(G)) - \log_2 v(G)\big) - 2\log_2 v(G) \qquad \text{[uniform bound]}$$
$$= 4\log_2(2e(G)) - 4\log_2 v(G).$$
Together with the uniform bound 𝐻 (𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 ) ≤ log2 hom(𝐾2,2 , 𝐺), we deduce that 𝐾2,2
is Sidorenko. □
Exercise 5.5.13. Complete the proof of Theorem 5.5.12 for general 𝐾 𝑠,𝑡 .
The following result was first proved by Conlon, Fox, & Sudakov (2010) using the de-
pendent random choice technique. The entropy proof was found later by Li & Szegedy
(2011).
Theorem 5.5.14
Let 𝐹 be a bipartite graph that has a vertex adjacent to all vertices in the other part. Then
𝐹 is Sidorenko.
Let us illustrate the proof for the following graph 𝐹. The proof extends to the general case.
[Figure: the bipartite graph 𝐹 with parts {𝑥₀, 𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₂, 𝑦₃}, where 𝑥₀ is adjacent to 𝑦₁, 𝑦₂, 𝑦₃, 𝑥₁ is adjacent to 𝑦₁, 𝑦₂, and 𝑥₂ is adjacent to 𝑦₂, 𝑦₃.]
Proof that the above graph is Sidorenko. Pick (𝑋0 , 𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 , 𝑌3 ) ∈ 𝑉 (𝐺) 6 randomly as
follows:
• 𝑋0𝑌1 is a uniform random edge;
• 𝑌2 and 𝑌3 are independent uniform random neighbors of 𝑋0 ;
• 𝑋1 is a conditionally independent copy of 𝑋0 given (𝑌1 , 𝑌2 );
• 𝑋2 is a conditionally independent copy of 𝑋0 given (𝑌2 , 𝑌3 ).
We have the following properties:
• 𝑋0 , 𝑋1 , 𝑋2 are conditionally independent given (𝑌1 , 𝑌2 , 𝑌3 );
• 𝑋1 and (𝑋0 , 𝑌3 , 𝑋2 ) are conditionally independent given (𝑌1 , 𝑌2 );
• The distribution of (𝑋0 , 𝑌1 , 𝑌2 ) is identical to the distribution of (𝑋1 , 𝑌1 , 𝑌2 ).
So (the 1st and 4th steps by chain rule, and the 2nd and 3rd steps by conditional independence)
𝐻 (𝑋0 , 𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 , 𝑋1 , 𝑋2 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋1 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋2 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋1 |𝑌1 , 𝑌2 ) + 𝐻 (𝑋2 |𝑌2 , 𝑌3 ) + 𝐻 (𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 , 𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋1 , 𝑌1 , 𝑌2 ) + 𝐻 (𝑋2 , 𝑌2 , 𝑌3 ) − 𝐻 (𝑌1 , 𝑌2 ) − 𝐻 (𝑌2 , 𝑌3 ).
By (5.7),
𝐻 (𝑋0 , 𝑌1 , 𝑌2 , 𝑌3 ) ≥ 3 log2 (2𝑒(𝐺)) − 2 log2 𝑣(𝐺),
𝐻 (𝑋1 , 𝑌1 , 𝑌2 ) ≥ 2 log2 (2𝑒(𝐺)) − log2 𝑣(𝐺),
and 𝐻 (𝑋2 , 𝑌2 , 𝑌3 ) ≥ 2 log2 (2𝑒(𝐺)) − log2 𝑣(𝐺).
And by the uniform bound,
𝐻 (𝑌1 , 𝑌2 ) = 𝐻 (𝑌2 , 𝑌3 ) ≤ 2 log2 𝑣(𝐺).
Putting everything together, we have
log2 hom(𝐹, 𝐺) ≥ 𝐻 (𝑋0 , 𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 , 𝑌3 ) ≥ 7 log2 (2𝑒(𝐺)) − 8 log2 𝑣(𝐺).
Thereby verifying (5.6), showing that 𝐹 is Sidorenko. □
(Where did we use the assumption that 𝐹 has a vertex complete to the other part?)
Exercise 5.5.15. Complete the proof of Theorem 5.5.14.
Shearer’s inequality
Another important tool in the entropy method is Shearer’s inequality, which is a powerful
generalization of subadditivity. Before stating it in full generality, let us first see a simple
instance of Shearer’s lemma.
Exercise 5.5.18. Prove Theorem 5.5.17 by generalizing the proof of Theorem 5.5.16.
Shearer’s entropy inequality is related to the generalized Hölder inequality from Sec-
tion 5.3. It is a significant generalization of the projection inequality discussed in Re-
mark 5.3.6. See Friedgut (2004) for more on these connections.
The next exercise asks you to prove a strengthening of the projection inequalities (Re-
mark 5.3.6 and Exercise 5.3.9) by mimicking the entropy proof of Shearer’s entropy inequal-
ity. The result is due to Bollobás & Thomason (1995), though their original proof does not
use the entropy method.
Exercise 5.5.19 (Box theorem). For each 𝐼 ⊆ [𝑑], write 𝜋_𝐼 : ℝ^𝑑 → ℝ^𝐼 to denote the
projection obtained by omitting the coordinates outside 𝐼. Show that for every compact body
𝐾 ⊆ ℝ^𝑑, there exists a box 𝐵 = [𝑎₁, 𝑏₁] × ⋯ × [𝑎_𝑑, 𝑏_𝑑] ⊆ ℝ^𝑑 such that |𝐵| = |𝐾| and
|𝜋_𝐼(𝐵)| ≤ |𝜋_𝐼(𝐾)| for every 𝐼 ⊆ [𝑑] (here |·| denotes volume).
Use this result to give another proof of the projection inequality from Exercise 5.3.9.
Hint: First prove it for 𝐾 being a union of grid boxes. Then extend it to general 𝐾 via compactness.
Let us use the entropy method to give another proof of Theorem 5.3.15, restated below.
The proof below is based on (with some further simplifications) the entropy proofs of
Galvin & Tetali (2004), which was in turn based on the proof by Kahn (2001) for independent
sets.
Proof. Let us first illustrate the proof for 𝐹 being the following graph
[Figure: the 2-regular bipartite graph 𝐹 with parts {𝑥₁, 𝑥₂, 𝑥₃} and {𝑦₁, 𝑦₂, 𝑦₃}, where 𝑦₁ ∼ 𝑥₁, 𝑥₂; 𝑦₂ ∼ 𝑥₁, 𝑥₃; 𝑦₃ ∼ 𝑥₂, 𝑥₃ (a 6-cycle).]
In the final step, we use that 𝑋3 and 𝑌1 are conditionally independent given 𝑋1 and 𝑋2 (why?),
along with two other analogous statements. A more general statement is that if 𝑆 ⊆ 𝑉 (𝐹), then
the restrictions to the different connected components of 𝐹 − 𝑆 are conditionally independent
given (𝑋𝑠 ) 𝑠∈𝑆 .
To complete the proof, it remains to show
𝐻 (𝑋1 , 𝑋2 ) + 2𝐻 (𝑌1 |𝑋1 , 𝑋2 ) ≤ log2 hom(𝐾2,2 , 𝐺),
𝐻 (𝑋1 , 𝑋3 ) + 2𝐻 (𝑌2 |𝑋1 , 𝑋3 ) ≤ log2 hom(𝐾2,2 , 𝐺),
and 𝐻 (𝑋2 , 𝑋3 ) + 2𝐻 (𝑌3 |𝑋2 , 𝑋3 ) ≤ log2 hom(𝐾2,2 , 𝐺).
They are analogous so let us just show the first inequality. Let 𝑌1′ be a conditionally indepen-
dent copy of 𝑌1 given (𝑋1 , 𝑋2 ). Then (𝑋1 , 𝑋2 , 𝑌1 , 𝑌1′ ) is the image of a homomorphism from
𝐾2,2 to 𝐺 (though not necessarily chosen uniformly).
[Figure: 𝐾_{2,2} with parts {𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₁′}.]
Thus we have
𝐻 (𝑋1 , 𝑋2 ) + 2𝐻 (𝑌1 |𝑋1 , 𝑋2 ) = 𝐻 (𝑋1 , 𝑋2 ) + 𝐻 (𝑌1 , 𝑌1′ |𝑋1 , 𝑋2 )
= 𝐻 (𝑋1 , 𝑋2 , 𝑌1 , 𝑌1′ ) [chain rule]
≤ log2 hom(𝐾2,2 , 𝐺) [uniform bound]
Now we prove the general case. Let 𝐹 be a 𝑑-regular bipartite graph with vertex bipartition 𝐴 ∪ 𝐵, and let Φ ∈ Hom(𝐹, 𝐺) be chosen uniformly at random. For each 𝑣 ∈ 𝑉(𝐹), let 𝑋_𝑣 = Φ(𝑣). For each 𝑆 ⊆ 𝑉(𝐹),
write 𝑋_𝑆 := (𝑋_𝑣)_{𝑣∈𝑆}. We have
$$d \log_2 \hom(F, G) = d\, H(\Phi) = d\, H(X_A) + d\, H(X_B \mid X_A) \qquad \text{[chain rule]}$$
$$\le \sum_{b \in B} H(X_{N(b)}) + d \sum_{b \in B} H(X_b \mid X_A) \qquad \text{[Shearer]}$$
$$\le \sum_{b \in B} H(X_{N(b)}) + d \sum_{b \in B} H(X_b \mid X_{N(b)}) \qquad \text{[conditioning on less]}$$
$$= \sum_{b \in B} \Big( H(X_{N(b)}) + d\, H(X_b \mid X_{N(b)}) \Big).$$
For each 𝑏 ∈ 𝐵, let $X_b^{(1)}, \ldots, X_b^{(d)}$ be conditionally independent copies of 𝑋_𝑏 given 𝑋_{𝑁(𝑏)}. We have
$$H(X_{N(b)}) + d\, H(X_b \mid X_{N(b)}) = H(X_{N(b)}) + H(X_b^{(1)}, \ldots, X_b^{(d)} \mid X_{N(b)})$$
$$= H(X_b^{(1)}, \ldots, X_b^{(d)}, X_{N(b)}) \qquad \text{[chain rule]}$$
$$\le \log_2 \hom(K_{d,d}, G). \qquad \text{[uniform bound]}$$
Summing over 𝑏 ∈ 𝐵 and using |𝐵| = 𝑣(𝐹)/2 gives $\hom(F, G) \le \hom(K_{d,d}, G)^{v(F)/(2d)}$, as desired. □
Further Reading
The book Large Networks and Graph Limits by Lovász (2012) contains an excellent treatment
of graph homomorphism inequalities in Section 2.1 and Chapter 16.
The survey Flag Algebras: An Interim Report by Razborov (2013) contains a survey of
results obtained using the flag algebra method.
For combinatorial applications of the entropy method, see the surveys
• Entropy and Counting by Radhakrishnan (2003), and
• Three Tutorial Lectures on Entropy and Counting by Galvin (2014).
Chapter Summary
• Many problems in extremal graph theory can be phrased in terms of graph homomorphism
inequalities.
– Homomorphism density inequalities are undecidable in general.
– Many open problems remain, such as Sidorenko’s conjecture, which says that if 𝐹 is
bipartite, then 𝑡 (𝐹, 𝐺) ≥ 𝑡 (𝐾2 , 𝐺) 𝑒 (𝐹 ) for all graphs 𝐺.
• The set of all possible (edge, triangle) density pairs is known.
– For a given edge density, the maximum triangle density is attained by a clique.
– For a given edge density, the minimum triangle density is given by a certain multipartite
graph. (We did not prove this result in full and only established the convex hull in
Section 5.4.)
• Cauchy–Schwarz and Hölder inequalities are versatile tools.
– Simple applications of Cauchy–Schwarz inequalities can often be recognized by “re-
flection symmetries” in a graph that can be “folded in half.”
– Flag algebra leads to computerized searches of Cauchy–Schwarz proofs of subgraph
density inequalities.
– The generalized Hölder inequality tells us, as an example, that
$$\int_{x,y,z} f(x,y)\, g(x,z)\, h(y,z) \le \|f\|_2 \|g\|_2 \|h\|_2.$$
It can be proved by repeated applications of Hölder’s inequality, once for each variable.
The inequality is related to Shearer’s entropy inequality, an example of which says
that for joint random variables 𝑋, 𝑌 , 𝑍,
2𝐻 (𝑋, 𝑌 , 𝑍) ≤ 𝐻 (𝑋, 𝑌 ) + 𝐻 (𝑋, 𝑍) + 𝐻 (𝑌 , 𝑍).
• The Lagrangian method relaxes an optimization problem on graphs to one about vertex-weighted
graphs, and then argues by shifting weights between vertices. We used the method to prove
– Turán's theorem (again);
– that a linear inequality between clique densities in 𝐺 is true for every graph 𝐺 if and only if it holds whenever
𝐺 is a clique.
• The entropy method can be used to establish various cases of Sidorenko’s conjecture,
including for trees, as well as for a bipartite graph with one vertex complete to the other
side.
6 Forbidding 3-Term Arithmetic Progressions
Chapter Highlights
• Fourier analytic proof of Roth’s theorem
• Finite field model in additive combinatorics: F𝑛𝑝 as a model for the integers
• Basics of discrete Fourier analysis
• Density increment argument in the proof of Roth’s theorem
• The polynomial method proof of Roth’s theorem in F3𝑛
• Arithmetic analogue of the regularity lemma, and application to Roth’s theorem with
popular difference
In this chapter, we study Roth’s theorem, which says that every 3-AP-free subset of [𝑁]
has size 𝑜(𝑁).
Previously, in Section 2.4, we gave a proof of Roth’s theorem using the graph regularity
lemma. The main goal of this chapter is to give a Fourier analytic proof of Roth’s theorem.
This is also Roth’s original proof (1953).
We begin by proving Roth’s theorem in the finite field model. That is, we first prove an
analogue of Roth’s theorem in F3𝑛 . The finite field vector space serves as a fruitful playground
for many additive combinatorics problems. Techniques such as Fourier analysis are often
simpler to carry out in the finite field model. After we develop the techniques in the finite
field model, we then prove Roth’s theorem in the integers. It can be a good idea to first try out
ideas in the finite field model before bringing them to the integers, as there may be additional
technical difficulties in the integers.
Later in Section 6.5, we will see a completely different proof of Roth’s theorem in F3𝑛
using the polynomial method, which gives significantly better quantitative bounds. This
proof surprised many people at the time of its discovery. However, unlike Fourier analysis,
this polynomial method technique only applies to the finite field setting, and it is unknown
how to apply it to the integers.
There is an interesting parallel between the Fourier analytic method in this chapter and the
graph regularity method from Chapter 2. In Section 6.6, we develop an arithmetic regularity lemma
and use it in Section 6.7 to prove a strengthening of Roth’s theorem showing popular com-
mon differences.
$$\widehat{f}(r) := \mathbb{E}_{x \in \mathbb{F}_p^n} f(x)\, \omega^{-r \cdot x} = \frac{1}{p^n} \sum_{x \in \mathbb{F}_p^n} f(x)\, \omega^{-r \cdot x},$$
where 𝑟 · 𝑥 = 𝑟₁𝑥₁ + ⋯ + 𝑟ₙ𝑥ₙ.
In particular, $\widehat{f}(0) = \mathbb{E} f$ is the average of 𝑓. This value often plays a special role compared
to the other values $\widehat{f}(r)$.
To simplify notation, it is generally understood that the variables being averaged or summed
over are varying uniformly in the domain F𝑛𝑝 .
Let us now state several important properties of the Fourier transform. We will see that all
these properties are consequences of the orthogonality of the Fourier basis.
The next result allows us to write 𝑓 in terms of b 𝑓.
The next result tells us that the Fourier transform preserves inner products.
Remark 6.1.4 (History/naming). The names Parseval and Plancherel are often used interchangeably
in practice to refer to the unitarity of the Fourier transform (i.e., the above
theorem). Parseval derived the identity for the Fourier series of a periodic function on ℝ,
whereas Plancherel derived it for the Fourier transform on ℝ.
As is nowadays standard in additive combinatorics, we adopt the following convention
for the Fourier transform in finite abelian groups:
average in physical space (𝔼 𝑓), and sum in frequency (Fourier) space ($\sum \widehat{f}$).
For example, following this convention, we define an “averaging” inner product for functions
𝑓 , 𝑔 : F𝑛𝑝 → C by
Proof. We have
$$\widehat{f * g}(r) = \mathbb{E}_x (f * g)(x)\, \omega^{-r \cdot x} = \mathbb{E}_x \mathbb{E}_{y,z : y+z=x} f(y) g(z)\, \omega^{-r \cdot (y+z)} = \mathbb{E}_{y,z} f(y) g(z)\, \omega^{-r \cdot (y+z)} = \big(\mathbb{E}_y f(y)\, \omega^{-r \cdot y}\big)\big(\mathbb{E}_z g(z)\, \omega^{-r \cdot z}\big) = \widehat{f}(r)\, \widehat{g}(r). \qquad \square$$
By repeated applications of the convolution identity, we have
$$(f_1 * \cdots * f_k)^\wedge = \widehat{f_1}\, \widehat{f_2} \cdots \widehat{f_k}$$
(here we write $f^\wedge$ for $\widehat{f}$ for typographical reasons).
Now we introduce a quantity relevant to Roth’s theorem on 3-APs.
We will give two proofs of this proposition. The first proof is more mechanically straight-
forward. It is similar to the proof of the convolution identity earlier. The second proof directly
applies the convolution identity, and may be a bit more abstract/conceptual.
First proof. We expand the left-hand side using the formula for Fourier inversion:
$$\mathbb{E}_{x,y}\, f(x)\, g(x+y)\, h(x+2y) = \mathbb{E}_{x,y} \left(\sum_{r_1} \widehat{f}(r_1) \omega^{r_1 \cdot x}\right)\left(\sum_{r_2} \widehat{g}(r_2) \omega^{r_2 \cdot (x+y)}\right)\left(\sum_{r_3} \widehat{h}(r_3) \omega^{r_3 \cdot (x+2y)}\right)$$
$$= \sum_{r_1, r_2, r_3} \widehat{f}(r_1)\, \widehat{g}(r_2)\, \widehat{h}(r_3)\, \mathbb{E}_x \omega^{x \cdot (r_1 + r_2 + r_3)}\, \mathbb{E}_y \omega^{y \cdot (r_2 + 2r_3)}$$
$$= \sum_{r_1, r_2, r_3} \widehat{f}(r_1)\, \widehat{g}(r_2)\, \widehat{h}(r_3)\, 1_{r_1 + r_2 + r_3 = 0}\, 1_{r_2 + 2r_3 = 0}$$
$$= \sum_{r} \widehat{f}(r)\, \widehat{g}(-2r)\, \widehat{h}(r).$$
$$= \sum_r \widehat{f}(r)\, \widehat{g_1}(r)\, \widehat{h}(r) \qquad \text{[convolution identity]}$$
$$= \sum_r \widehat{f}(r)\, \widehat{g}(-2r)\, \widehat{h}(r). \qquad \square$$
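The following Python sketch (not from the text) implements the Fourier transform on 𝔽_pⁿ with ω = e^{2πi/p} (the convention presumed above) and numerically verifies the identity Λ₃(1_A) = Σ_r 1̂_A(r)³ in 𝔽₃² for an arbitrary set A; the helper names are ad hoc.

```python
import numpy as np
from itertools import product

p, n = 3, 2
omega = np.exp(2j * np.pi / p)
group = list(product(range(p), repeat=n))        # the group F_p^n

def fourier(f):
    """f-hat(r) = E_x f(x) * omega^{-r.x} for f given as a dict on F_p^n."""
    return {r: sum(f[x] * omega ** (-(sum(ri * xi for ri, xi in zip(r, x)) % p))
                   for x in group) / p**n
            for r in group}

A = {(0, 0), (0, 1), (1, 2), (2, 1)}             # an arbitrary subset of F_3^2
one_A = {x: 1.0 if x in A else 0.0 for x in group}
hat = fourier(one_A)

# Lambda_3(1_A): average of 1_A(x) 1_A(y) 1_A(z) over solutions of x + y + z = 0
direct = sum(one_A[x] * one_A[y] * one_A[tuple((-xi - yi) % p for xi, yi in zip(x, y))]
             for x in group for y in group) / p**(2 * n)
via_fourier = sum(hat[r] ** 3 for r in group)
print(abs(direct - via_fourier.real) < 1e-12)    # True
```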
Remark 6.1.10. In the following section, we will work in $\mathbb{F}_3^n$. Since −2 = 1 in 𝔽₃ (and
so 𝑔₁ = 𝑔 above), the proof looks even simpler. In particular, by Fourier inversion and the
convolution identity,
$$\Lambda_3(1_A) = 3^{-2n}\, \big|\{(x, y, z) \in A^3 : x + y + z = 0\}\big| = (1_A * 1_A * 1_A)(0) = \sum_r (1_A * 1_A * 1_A)^\wedge(r) = \sum_r \widehat{1_A}(r)^3. \qquad (6.4)$$
When 𝐴 = −𝐴, the eigenvalues of the adjacency matrix of the Cayley graph Cay(𝔽₃ⁿ, 𝐴) are
$3^n\, \widehat{1_A}(r)$, 𝑟 ∈ 𝔽₃ⁿ (recall from Section 3.3 that the eigenvalues of abelian Cayley graphs are
given by the Fourier transform). The quantity $3^{2n} \Lambda_3(1_A)$ is the number of closed walks of
length 3 in the Cayley graph Cay(𝔽₃ⁿ, 𝐴). So the above identity says that the number of
closed walks of length 3 in Cay(𝔽₃ⁿ, 𝐴) equals the third moment of the eigenvalues of the
adjacency matrix, which is a general fact for every graph. (When 𝐴 ≠ −𝐴, we can consider
the directed or bipartite version of this argument.)
The following exercise generalizes the above identity.
Exercise 6.1.11. Let 𝑎 1 , . . . , 𝑎 𝑘 be nonzero integers, none divisible by the prime 𝑝. Let
𝑓1 , . . . , 𝑓 𝑘 : F𝑛𝑝 → C. Show that
∑︁
E 𝑥1 ,..., 𝑥𝑘 ∈F𝑛𝑝 :𝑎1 𝑥1 +···+𝑎𝑘 𝑥𝑘 =0 𝑓1 (𝑥 1 ) · · · 𝑓 𝑘 (𝑥 𝑘 ) = b
𝑓1 (𝑎 1 𝑟) · · · b
𝑓 𝑘 (𝑎 𝑘 𝑟).
𝑟 ∈F𝑛𝑝
Remark 6.2.2 (General finite fields). We work in F3𝑛 mainly for convenience. The argument
presented in this section also shows that for every odd prime 𝑝, there is some constant 𝐶 𝑝 so
that every 3-AP-free subset of F𝑛𝑝 has size ≤ 𝐶 𝑝 𝑝 𝑛 /𝑛.
In F3𝑛 , there are several equivalent interpretations of 𝑥, 𝑦, 𝑧 ∈ F3𝑛 forming a 3-AP (allowing
the possibility for a trivial 3-AP with 𝑥 = 𝑦 = 𝑧):
• (𝑥, 𝑦, 𝑧) = (𝑥, 𝑥 + 𝑑, 𝑥 + 2𝑑) for some 𝑑;
• 𝑥 − 2𝑦 + 𝑧 = 0;
• 𝑥 + 𝑦 + 𝑧 = 0;
• 𝑥, 𝑦, 𝑧 are three distinct points of a line in F3𝑛 or are all equal;
• for each 𝑖, the 𝑖-th coordinates of 𝑥, 𝑦, 𝑧 are all distinct or all equal.
Remark 6.2.3 (SET card game). The card game SET comes with a deck of 81 cards (see
Figure 6.1 on the next page). Each card one of three possibilities in each of the following
four features:
• Number: 1, 2, 3;
• Symbol: diamond, squiggle, oval;
• Shading: solid, striped, open;
• Color: red, green, purple.
Each of the 34 = 81 combinations appears exactly once as a card.
In this game, a combination of three cards is called a “set” if each of the four features
6.2 Roth’s Theorem in the Finite Field Model 207
shows up as all identical or all distinct among the three cards. For the example, the three cards
shown below form a “set”: number (all distinct), symbol (all distinct), shading (all striped),
color (all red).
In a standard play of the game, the dealer lays down twelve cards on the table until some
player finds a “set”, in which case the player keeps the three cards of the “set” as their score,
then dealer replenishes the table by laying down more cards. If no set is found, then the dealer
continues to lay down more cards until a set is found.
The cards of the game correspond to points of F43 . A “set” is precisely a 3-AP. The cap set
problem in F43 asks for the number of cards without a “set.” The size of the maximum cap set
in F43 is 20 (Pellegrino 1970).
208 Forbidding 3-Term Arithmetic Progressions
Λ3 ( 𝑓 ) − (E 𝑓 ) 3 ≤ max | b
𝑓 (𝑟)| ∥ 𝑓 ∥ 22 .
𝑟≠0
Since E 𝑓 = b
𝑓 (0), we have
∑︁ ∑︁
Λ3 ( 𝑓 ) − (E 𝑓 ) 3 ≤ |b
𝑓 (𝑟)| 3 ≤ max | b
𝑓 (𝑟)| · |b
𝑓 (𝑟)| 2 = max | b
𝑓 (𝑟)| ∥ 𝑓 ∥ 22 .
𝑟≠0 𝑟≠0
𝑟≠0 𝑟
Proof. Since 𝐴 is 3-AP-free, Λ3 ( 𝐴) = | 𝐴| /32𝑛 = 𝛼/3𝑛 , as all 3-APs are trivial (i.e., with
common difference zero). By the counting lemma, Lemma 6.2.4,
𝛼3 − 𝑛 = 𝛼3 − Λ3 (1 𝐴) ≤ max |b1 𝐴 (𝑟)| ∥1 𝐴 ∥ 22 = max |b
𝛼
1 𝐴 (𝑟)|𝛼.
3 𝑟≠0 𝑟≠0
6.2 Roth’s Theorem in the Finite Field Model 209
Proof. We have
𝛼0 + 𝛼1 𝜔 + 𝛼2 𝜔2
1c𝐴 (𝑟) = E 𝑥 1 𝐴 (𝑥)𝜔 −𝑟 · 𝑥 =
3
where 𝛼0 , 𝛼1 , 𝛼2 are densities of 𝐴 on the cosets of 𝑟 ⊥ . We want to show that one of 𝛼0 , 𝛼1 , 𝛼2
is significantly larger than 𝛼. This is easy to check directly, but let us introduce a trick that
we will also use later in the integer setting.
We have 𝛼 = (𝛼0 + 𝛼1 + 𝛼2 )/3. By the triangle inequality,
3𝛿 ≤ 𝛼0 + 𝛼1 𝜔 + 𝛼2 𝜔2
= (𝛼0 − 𝛼) + (𝛼1 − 𝛼)𝜔 + (𝛼2 − 𝛼)𝜔2
≤ |𝛼0 − 𝛼| + |𝛼1 − 𝛼| + |𝛼2 − 𝛼|
∑︁
2
= |𝛼 𝑗 − 𝛼| + (𝛼 𝑗 − 𝛼) .
𝑗=0
We now view this hyperplane 𝐻 as F3𝑛−1 (we may need to select a new origin for 𝐻 if
0 ∉ 𝐻). The restriction of 𝐴 to 𝐻 (i.e., 𝐴 ∩ 𝐻) is now a 3-AP-free subset of 𝐻. The density
increased from 𝛼 to 𝛼 + 𝛼2 /4. Next we iterate this density increment.
Remark 6.2.9 (Translation invariance). It is important that the pattern we are forbidding
(3-AP) is translation-invariant. What is wrong with the argument if instead we forbid the
pattern 𝑥 + 𝑦 = 𝑧? Note that {𝑥 ∈ F3𝑛 : 𝑥1 = 2} avoids solutions to 𝑥 + 𝑦 = 𝑧, and this set has
density 1/3.
This is just shy of the bound 𝛼 = 𝑂 (1/𝑛) that we aim to prove. So let us re-do the density
increment analysis more carefully to analyze how quickly 𝛼𝑖 grows.
Each round, 𝛼𝑖 increases by at least 𝛼2 /4. So it takes ≤ ⌈4/𝛼⌉ initial rounds for 𝛼𝑖 to
double. Once 𝛼𝑖 ≥ 2𝛼, it then increases by at least 𝛼𝑖2 /4 each round afterwards, so it takes
≤ ⌈1/𝛼𝑖 ⌉ ≤ ⌈1/𝛼⌉ additional
rounds
for the density to double again. And so on: the 𝑘-th
doubling time is at most 42−𝑘 /𝛼 . Since the density is always at least 𝛼, the density can
double at most log2 (1/𝛼) times. So the total number of rounds is at most
∑︁ 42− 𝑗
1
=𝑂 .
𝑗 ≤log (1/𝛼)
𝛼 𝛼
2
Suppose the process terminates after 𝑚 steps with density 𝛼𝑚 . Then, examining the
hypothesis of Lemma 6.2.8, we find that the size of the final subspace |𝑉𝑚 | = 3𝑛−𝑚 is less
than 𝛼𝑚
−2
≤ 𝛼 −2 . So 𝑛 ≤ 𝑚 + 𝑂 (log(1/𝛼)) ≤ 𝑂 (1/𝛼). Thus 𝛼 = | 𝐴| /𝑁 = 𝑂 (1/𝑛). This
completes the proof of Roth’s theorem in F3𝑛 (Theorem 6.2.1).
Remark 6.2.10 (Quantitative bounds). Edel (2004) obtained a cap set of size ≥ 2.21𝑛
for sufficiently large 𝑛. This is obtained by constructing a cap set in F480 3 of size 𝑚 =
2327 (273 + 37 776 ) ≥ 2.21480 , which then implies, by a product construction, a cap set in F480𝑘
3
of size 𝑚 𝑘 for each positive integer 𝑘.
It was an open problem of great interest whether an upper bound of the form 𝑐 𝑛 , with
constant 𝑐 < 3, was possible on the size of cap sets in F3𝑛 . With significant effort, the
Fourier analytic strategy above was extended to prove an upper bound of the form 3𝑛 /𝑛1+𝑐
(Bateman & Katz 2012). So it came as quite a shock to the community when a very short
polynomial method proof was discovered, giving an upper bound 𝑂 (2.76𝑛 ) (Croot, Lev, &
Pach 2017; Ellenberg & Gijswijt 2017). We will discuss this proof in Section 6.5. However,
the polynomial method proof appears to be specific to the finite field model, and it is not
known how to extend the strategy to the integers.
The following exercise shows why the above strategy does not generalize to 4-APs at least
in a straightforward manner.
Exercise 6.2.11 (Fourier uniformity does not control 4-AP counts). Let
𝐴 = {𝑥 ∈ F5𝑛 : 𝑥 · 𝑥 = 0}.
Prove that:
(a) | 𝐴| = (5−1 + 𝑜(1))5𝑛 and | 1c𝐴 (𝑟)| = 𝑜(1) for all 𝑟 ≠ 0;
(b) |{(𝑥, 𝑦) ∈ F5𝑛 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦, 𝑥 + 3𝑦 ∈ 𝐴}| ≠ (5−4 + 𝑜(1))52𝑛 .
Hint: First write 1 𝐴 as an exponential sum. Compare with the Gauss sum from Theorem 3.3.14.
6.2 Roth’s Theorem in the Finite Field Model 211
Exercise 6.2.12 (Linearity testing). Show that for every prime 𝑝 there is some 𝐶 𝑝 > 0
such that if 𝑓 : F𝑛𝑝 → F 𝑝 satisfies
P 𝑥,𝑦 ∈F𝑛𝑝 ( 𝑓 (𝑥) + 𝑓 (𝑦) = 𝑓 (𝑥 + 𝑦)) = 1 − 𝜀
then there exists some 𝑎 ∈ F𝑛𝑝 such that
P 𝑥 ∈F𝑛𝑝 ( 𝑓 (𝑥) = 𝑎 · 𝑥) ≥ 1 − 𝐶 𝑝 𝜀.
In the above P expressions 𝑥 and 𝑦 are chosen i.i.d. uniform from F𝑛𝑝 .
The following exercises introduce Gowers uniformity norms. Gowers (2001) used them
to prove Szemerédi’s theorem by extending the Fourier analytic proof strategy of Roth’s
theorem to what is now called higher order Fourier analysis.
The 𝑈 2 norm in the following exercise plays a role similar to Fourier analysis.
Exercise 6.2.13 (Gowers 𝑈 2 uniformity norm). Let 𝑓 : F𝑛𝑝 → C, define
1/4
∥ 𝒇 ∥𝑼 2 := E 𝑥,𝑦,𝑦 ′ ∈F𝑛𝑝 𝑓 (𝑥) 𝑓 (𝑥 + 𝑦) 𝑓 (𝑥 + 𝑦 ′ ) 𝑓 (𝑥 + 𝑦 + 𝑦 ′ ) .
(a) Show that the expectation above is always a nonnegative real number, so that the
above expression is well defined. Also, show that ∥ 𝑓 ∥𝑈 2 ≥ |E 𝑓 |.
(b) (Gowers Cauchy–Schwarz) For 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 : F𝑛𝑝 → C, let
(The second inequality gives a so-called “inverse theorem” for the 𝑈 2 norm: if
∥ 𝑓 ∥𝑈 2 ≥ 𝛿 then | b
𝑓 (𝑟)| ≥ 𝛿2 for some 𝑟 ∈ F𝑛𝑝 . Informally, if 𝑓 is not 𝑈 2 -uniform,
then 𝑓 correlates with some exponential phase function of the form 𝑥 ↦→ 𝜔𝑟 · 𝑥 .)
The inadequacy of Fourier analysis towards understanding 4-APs is remedied by the 𝑈 3
norm, which is significantly more mysterious than the 𝑈 2 norm. Some easier properties of
the 𝑈 3 norm are given in the exercise below. Understanding properties of functions with large
𝑈 3 norm (known as the inverse problem) lies at the heart of quadratic Fourier analysis,
212 Forbidding 3-Term Arithmetic Progressions
which we do not discuss in this book (see Further Reading). The structure of set addition,
which is the topic of the next chapter, plays a central role in this theory.
Exercise 6.2.14 (Gowers 𝑈 3 uniformity norm). Let 𝑓 : F𝑛𝑝 → C. Define
∥ 𝒇 ∥𝑼 3 := E 𝑥,𝑦1 ,𝑦2 ,𝑦3 𝑓 (𝑥) 𝑓 (𝑥 + 𝑦 1 ) 𝑓 (𝑥 + 𝑦 2 ) 𝑓 (𝑥 + 𝑦 3 ) · · ·
1/8
· 𝑓 (𝑥 + 𝑦 1 + 𝑦 2 ) 𝑓 (𝑥 + 𝑦 1 + 𝑦 3 ) 𝑓 (𝑥 + 𝑦 2 + 𝑦 3 ) 𝑓 (𝑥 + 𝑦 1 + 𝑦 2 + 𝑦 3 ) .
Alternatively, for each 𝑦 ∈ F𝑛𝑝 , define the multiplicative finite difference Δ 𝑦 𝑓 : F𝑛𝑝 → C by
Δ 𝑦 𝑓 (𝑥) := 𝑓 (𝑥) 𝑓 (𝑥 + 𝑦), we can rewrite the above expression in terms of the 𝑈 2 uniformity
norm from Exercise 6.2.13 as
8 4
∥ 𝑓 ∥𝑈 3 = E 𝑦 ∈F𝑛 Δ𝑦 𝑓 .
𝑝 𝑈2
(a) (Monotonicity) Verify that the above two definitions for ∥ 𝑓 ∥𝑈 3 coincides and give
well defined nonnegative real numbers. Also, show that
∥ 𝑓 ∥𝑈 2 ≤ ∥ 𝑓 ∥𝑈 3 .
(b) (Separation of norms) Let 𝑝 be odd and 𝑓 : F𝑛𝑝 → C be defined by 𝑓 (𝑥) = 𝑒 2 𝜋𝑖 𝑥· 𝑥/ 𝑝 .
Prove that ∥ 𝑓 ∥𝑈 3 = 1 and ∥ 𝑓 ∥𝑈 2 = 𝑝 −𝑛/4 .
(c) (Triangle inequality) Prove that
∥ 𝑓 + 𝑔∥𝑈 3 ≤ ∥ 𝑓 ∥𝑈 3 + ∥𝑔∥𝑈 3 .
Conclude that ∥ ∥𝑈 3 is a norm.
(d) (𝑈 3 norm controls 4-APs) Let 𝑝 ≥ 5 be a prime, and 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 : F𝑛𝑝 → C all
taking values in the unit disk. We write
Λ( 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 ) := E 𝑥,𝑦 ∈F𝑛𝑝 𝑓1 (𝑥) 𝑓2 (𝑥 + 𝑦) 𝑓3 (𝑥 + 2𝑦) 𝑓4 (𝑥 + 3𝑦).
Prove that
|Λ( 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 )| ≤ min ∥ 𝑓𝑠 ∥𝑈 3 .
𝑠
where
𝒆(𝒕) := exp(2𝜋𝑖𝑡), 𝑡 ∈ R.
Note the normalization conventions: we sum in the physical space Z (there is no sensible
way to average in Z) and average in the frequency space R/Z.
and
𝚲3 ( 𝒇 ) := Λ( 𝑓 , 𝑓 , 𝑓 ).
Then for any finite set 𝐴 of integers,
Λ3 ( 𝐴) = |{(𝑥, 𝑦) : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}|
counts the number of 3-APs in 𝐴, where each non-trivial 3-AP is counted twice, forward and
backward, and each trivial 3-AP is counted once.
Exercise 6.3.9. Show that if a finite set 𝐴 of integers contains 𝛽 | 𝐴| 2 solutions (𝑎, 𝑏, 𝑐) ∈
𝐴3 to 𝑎 +2𝑏 = 3𝑐, then it contains at least 𝛽2 | 𝐴| 3 solutions (𝑎, 𝑏, 𝑐, 𝑑) ∈ 𝐴4 to 𝑎 + 𝑏 = 𝑐 + 𝑑.
The proof of Roth’s theorem in F3𝑛 proceeded by density increment when restricting to
subspaces. An important difference between F3𝑛 and Z is that Z has no subspaces (more on
this later). Instead, we will proceed in Z by restricting to subprogressions. In this section, by
a progression we mean an arithmetic progression.
We have the following analogue of Lemma 6.2.4. It says that if 𝑓 and 𝑔 are “Fourier-close,”,
then they have similar 3-AP counts. We write
! 1/2
∑︁
𝒇 ∥ ∞ := sup | b
∥b 𝑓 (𝜃)| and ∥ 𝒇 ∥ ℓ2 := | 𝑓 (𝑥)| 2
.
𝜃 𝑥 ∈Z
6.4 Roth’s Theorem in the Integers 215
Proof. We have
Λ3 ( 𝑓 ) − Λ3 (𝑔) = Λ( 𝑓 − 𝑔, 𝑓 , 𝑓 ) + Λ(𝑔, 𝑓 − 𝑔, 𝑓 ) + Λ(𝑔, 𝑔, 𝑓 − 𝑔).
Let us bound the first term on the right-hand side. We have
|Λ( 𝑓 − 𝑔, 𝑓 , 𝑓 )|
∫ 1
= (𝑓 − 𝑔)(𝜃) b
𝑓 (−2𝜃) b
𝑓 (𝜃) 𝑑𝜃 [Prop. 6.3.6]
∫
0
1
≤ ∥
𝑓 − 𝑔∥ ∞ b
𝑓 (−2𝜃) b
𝑓 (𝜃) 𝑑𝜃 [Triangle ineq.]
0
∫ 1 1/2 ∫ 1 1/2
≤ ∥ b b
2 2
𝑓 − 𝑔∥ ∞ 𝑓 (−2𝜃) 𝑑𝜃 𝑓 (𝜃) 𝑑𝜃 [Cauchy-Schwarz]
0 0
≤ ∥
𝑓 − 𝑔∥ ∞ ∥ 𝑓 ∥ ℓ22 . [Parseval]
Proof. Since 𝐴 is 3-AP-free, the quantity 1 𝐴 (𝑥)1 𝐴 (𝑥 + 𝑦)1 𝐴 (𝑥 + 2𝑦) is nonzero only for
trivial 3-APs (here trivial means 𝑦 = 0). Thus
Λ3 (1 𝐴) = | 𝐴| = 𝛼𝑁.
On the other hand, a 3-AP in [𝑁] can be counted by counting pairs of integers with the same
parity to form the first and third element of the 3-AP, yielding,
Λ3 (1 [ 𝑁 ] ) = ⌊𝑁/2⌋ 2 + ⌈𝑁/2⌉ 2 ≥ 𝑁 2 /2.
Now apply the counting lemma (Proposition 6.4.2) to 𝑓 = 1 𝐴 and 𝑔 = 𝛼1 [ 𝑁 ] . We have
∥1 𝐴 ∥ ℓ22 = | 𝐴| = 𝛼𝑁 and ∥𝛼1 [ 𝑁 ] ∥ ℓ22 = 𝛼2 𝑁. So
𝛼3 𝑁 2
− 𝛼𝑁 ≤ 𝛼3 Λ3 (1 [ 𝑁 ] ) − Λ3 (1 𝐴) ≤ 3𝛼𝑁 (1 𝐴 − 𝛼1 [ 𝑁 ] ) ∧ .
2 ∞
Proof. Let 𝑚 = ⌊1/𝛿⌋. By the pigeonhole principle, among the 𝑚 + 1 numbers 0, 𝜃, · · · , 𝑚𝜃,
we can find 0 ≤ 𝑖 < 𝑗 ≤ 𝑚 such that the fractional parts of 𝑖𝜃 and 𝑗 𝜃 differ by at most 𝛿. Set
𝑑 = |𝑖 − 𝑗 |. Then ∥𝑑𝜃 ∥ R/Z ≤ 𝛿, as desired. □
Given 𝜃, we now partition [𝑁] into subprogressions with roughly constant 𝑒(𝑥𝜃) inside
each progression. The constants appearing in rest of this argument are mostly unimportant.
√ √
Proof. By Lemma 6.4.4, there is a positive integer 𝑑 < 𝑁 such that ∥𝑑𝜃 ∥ R/Z ≤ 1/ 𝑁.
Partition [𝑁] greedily into progressions with common difference 𝑑 of lengths between 𝑁 1/3
and 2𝑁 1/3 . Then, for two elements 𝑥, 𝑦 within the same progression 𝑃𝑖 , we have
|𝑒(𝑥𝜃) − 𝑒(𝑦𝜃)| ≤ |𝑃𝑖 | |𝑒(𝑑𝜃) − 1| ≤ 2𝑁 1/3 · 2𝜋 · 𝑁 −1/2 ≤ 𝜂.
Here we use the inequality |𝑒(𝑑𝜃) − 1| ≤ 2𝜋 ∥𝑑𝜃 ∥ R/Z from the fact that the length of a chord
on a circle is at most the length of the corresponding arc. □
We can now apply this lemma to obtain a density increment.
Next, apply Lemma 6.4.5 with 𝜂 = 𝛼2 /20 (the hypothesis 𝑁 ≥ (4𝜋/𝜂) 6 is satisfied since
(16/𝛼) 12 ≥ (80𝜋/𝛼2 ) 6 = (4𝜋/𝜂) 6 ) to obtain a partition 𝑃1 , . . . , 𝑃 𝑘 of [𝑁] satisfying 𝑁 1/3 ≤
|𝑃𝑖 | ≤ 2𝑁 1/3 and
𝛼2
|𝑒(𝑥𝜃) − 𝑒(𝑦𝜃)| ≤ for all 𝑖 and 𝑥, 𝑦 ∈ 𝑃𝑖 .
20
218 Forbidding 3-Term Arithmetic Progressions
So on each 𝑃𝑖 ,
∑︁ ∑︁ 𝛼2
(1 𝐴 − 𝛼) (𝑥)𝑒(𝑥𝜃) ≤ (1 𝐴 − 𝛼) (𝑥) + |𝑃𝑖 |.
𝑥 ∈ 𝑃𝑖 𝑥 ∈ 𝑃𝑖
20
Thus
𝛼2 ∑︁
𝑁
𝑁≤ (1 𝐴 − 𝛼) (𝑥)𝑒(𝑥𝜃)
10 𝑥=1
∑︁
𝑘 ∑︁
≤ (1 𝐴 − 𝛼) (𝑥)𝑒(𝑥𝜃) .
!
𝑖=1 𝑥 ∈ 𝑃𝑖
∑︁
𝑘 ∑︁ 𝛼2
≤ (1 𝐴 − 𝛼) (𝑥) + |𝑃𝑖 |
𝑖=1 𝑥∈𝑃
20
𝑖
∑︁
𝑘 ∑︁ 𝛼2
= (1 𝐴 − 𝛼) (𝑥) + 𝑁
𝑖=1 𝑥 ∈ 𝑃𝑖
20
Thus
𝛼2 ∑︁
𝑘 ∑︁
𝑁≤ (1 𝐴 − 𝛼) (𝑥)
20 𝑖=1 𝑥 ∈ 𝑃 𝑖
and hence
𝛼2 ∑︁ ∑︁
𝑘 𝑘
|𝑃𝑖 | ≤ | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 | .
20 𝑖=1 𝑖=1
We want to show that there exists some 𝑃𝑖 such that 𝐴 has a density increment when restricted
to 𝑃𝑖 . The following trick is convenient. Note that
𝛼2 ∑︁ ∑︁
𝑘 𝑘
|𝑃𝑖 | ≤ | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |
20 𝑖=1 𝑖=1
∑︁
𝑘
= | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 | + (| 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |) ,
𝑖=1
as the newly added terms in the final step sum to zero. Thus there exists an 𝑖 such that
𝛼2
|𝑃𝑖 | ≤ | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 | + (| 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |) .
20
Since |𝑡| + 𝑡 is 2𝑡 for 𝑡 > 0 and 0 for 𝑡 ≤ 0, we deduce
𝛼2
|𝑃𝑖 | ≤ 2(| 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |),
20
which yields
𝛼2
| 𝐴 ∩ 𝑃𝑖 | ≥ 𝛼 + |𝑃𝑖 |. □
40
6.4 Roth’s Theorem in the Integers 219
Rearranging gives
𝑁 ≤ (16/𝛼) 12·3 ≤ (16/𝛼) 𝑒
𝑚 𝑂 (1/𝛼)
.
Therefore
| 𝐴| 1
=𝛼=𝑂 .
𝑁 log log 𝑁
This completes the proof of Roth’s theorem (Theorem 6.4.1). □
We saw that the proofs in F3𝑛 and Z have largely the same set of ideas, but the proof in Z
is somewhat more technically involved. The finite field model is often a good sandbox to try
out Fourier analytic ideas.
Remark 6.4.7 (Bohr sets). Let us compare the results in F3𝑛 and [𝑁]. Write 𝑁 = 3𝑛 for
the size of the ambient space in both cases, for comparison. We obtained an upper bound
of 𝑂 (𝑁/log 𝑁) for 3-AP-free sets in F3𝑛 and 𝑂 (𝑁/log log 𝑁) in [𝑁] ⊆ Z. Where does the
difference in quantitative bounds stem from?
In the density increment step for F3𝑛 , at each step, we pass down to a subset which had size
a constant factor (namely 1/3) of the original one. However, in [𝑁], each iteration gives us
a subprogression which has size equal to the cube root of the previous subprogression. The
extra log for Roth’s theorem in the integers comes from this rapid reduction in the sizes of
the subprogressions.
Can we do better? Perhaps by passing down to subsets of [𝑁] that look more like subspaces?
220 Forbidding 3-Term Arithmetic Progressions
Indeed, this is possible. Bourgain (1999) used Bohr sets to prove an improved bound of
𝑁/(log 𝑁) 1/2+𝑜(1) on Roth’s theorem. Given 𝜃 1 , . . . , 𝜃 𝑘 , and some 𝜀 > 0, a Bohr set has the
form
𝑥 ∈ [𝑁] : ∥𝑥𝜃 𝑗 ∥ R/Z ≤ 𝜀 for each 𝑗 = 1, . . . , 𝑘 .
To see why this is analogous to subspaces, note that we can define a subspace of F3𝑛 as a set
of the following form
𝑥 ∈ F3𝑛 : 𝑟 𝑗 · 𝑥 = 0 for each 𝑗 = 1, . . . , 𝑘 .
where 𝑟 1 , . . . , 𝑟 𝑘 ∈ F3𝑛 \ {0}. Bohr sets are used widely in additive combinatorics, and in
nearly all subsequent work on Roth’s theorem in the integers, including the proof of the
current best bound 𝑁/(log 𝑁) 1+𝑐 for some constant 𝑐 > 0 (Bloom & Sisask 2020).
We will see Bohr sets again in the proof of Freiman’s theorem in Chapter 7.
The next exercise is analogous to Exercise 6.2.11, which was in F5𝑛 .
Exercise 6.4.8∗ (Fourier uniformity does not control 4-AP counts). Fix 0 < 𝛼 < 1. Let
𝑁 be a prime. Let
𝐴 = 𝑥 ∈ [𝑁] : 𝑥 2 mod 𝑁 < 𝛼𝑁 .
Viewing 𝐴 ⊆ Z/𝑁Z, prove that, as 𝑁 → ∞ with fixed 𝛼,
(a) | 𝐴| = (𝛼 + 𝑜(1))𝑁 and max𝑟≠0 | 1c𝐴 (𝑟)| = 𝑜(1);
(b) |(𝑥, 𝑦) ∈ Z/𝑁Z : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦, 𝑥 + 3𝑦 ∈ 𝐴| ≠ (𝛼4 + 𝑜(1))𝑁 2 .
those of the form 𝐹 (𝑥, 𝑦, 𝑧) = 𝑓 (𝑥)𝑔(𝑦)ℎ(𝑧). This is a standard and important notion (which
comes with a lot of mystery), but it is not the one that we shall use.
Proof. Let 𝐹𝑎 be the restriction of 𝐹 to the “slice” {(𝑥, 𝑦, 𝑧) ∈ 𝐴 × 𝐴 × 𝐴 : 𝑥 = 𝑎}; that is,
(
𝐹 (𝑥, 𝑦, 𝑧) if 𝑥 = 𝑎,
𝐹𝑎 (𝑥, 𝑦, 𝑧) =
0 if𝑥 ≠ 𝑎.
𝐹𝑎
Then 𝐹𝑎 has slice rank ≤ 1 since 𝐹𝑎 (𝑥, 𝑦, 𝑧) = 𝛿 𝑎 (𝑥)𝐹 (𝑎, 𝑦, 𝑧), where 𝛿 𝑎 denotes the function
Í
taking value 1 at 𝑎 and 0 elsewhere. Thus 𝐹 = 𝑎∈ 𝐴 𝐹𝑎 has slice rank at most | 𝐴|. □
For the next lemma, we need the following fact from linear algebra.
Proof. Form a 𝑘 × 𝑛 matrix 𝑀 whose rows form a basis of this 𝑘-dimensional subspace
𝑊. Then 𝑀 has rank 𝑘. So it has some invertible 𝑘 × 𝑘 submatrix with columns 𝑆 ⊆ [𝑛]
with |𝑆| = 𝑘. Then for every 𝑧 ∈ F𝑆 , there is some linear combination of the rows whose
coordinates on 𝑆 are identical to those of 𝑧. In particular, there is some vector in the 𝑘-
dimensional subspace 𝑊 whose 𝑆-coordinates are all nonzero. □
A diagonal matrix with nonzero diagonal entries has full rank. We show that a similar
statement holds true for the slice rank.
Lemma 6.5.5 (Slice rank of a diagonal)
Suppose 𝐹 : 𝐴 × 𝐴 × 𝐴 → F satisfies 𝐹 (𝑥, 𝑦, 𝑧) ≠ 0 if and only if 𝑥 = 𝑦 = 𝑧. Then 𝐹 has
slice rank | 𝐴|.
Proof. From Lemma 6.5.3, we already know that the slice rank of 𝐹 is ≤ | 𝐴|. It remains to
prove that the slice rank of 𝐹 is is ≥ | 𝐴|.
Suppose 𝐹 (𝑥, 𝑦, 𝑧) can be written as a sum of functions of the form
𝑓 (𝑥)𝑔(𝑦, 𝑧), 𝑓 (𝑦)𝑔(𝑥, 𝑧), and 𝑓 (𝑧)𝑔(𝑥, 𝑦),
222 Forbidding 3-Term Arithmetic Progressions
with 𝑚 1 summands of the first type, 𝑚 2 of the second type, and 𝑚 3 of the third type. By
Lemma 6.5.4, there is some function ℎ : 𝐴 → F that is orthogonal to all the 𝑓 ’s from the
Í
third type of summands (i.e., 𝑥 ∈ 𝐴 𝑓 (𝑥)ℎ(𝑥) = 0), and such that |supp ℎ| ≥ | 𝐴| − 𝑚 3 . Let
∑︁
𝐺 (𝑥, 𝑦) = 𝐹 (𝑥, 𝑦, 𝑧)ℎ(𝑧).
𝑧∈ 𝐴
Only summands of the first two types remain. Each summand of the first type turns into a
rank 1 function (in the matrix sense of the rank)
∑︁
(𝑥, 𝑦) ↦→ 𝑓 (𝑥)𝑔(𝑦, 𝑧)ℎ(𝑧) = 𝑓 (𝑥)e
𝑔 (𝑦)
𝑧
for some new function e 𝑔 : 𝐴 → F. Similarly with functions of the second type. So 𝐺 (viewed
as an | 𝐴| × | 𝐴| matrix) has rank ≤ 𝑚 1 + 𝑚 2 . On the other hand,
(
ℎ(𝑥) if 𝑥 = 𝑦,
𝐺 (𝑥, 𝑦) =
0 if 𝑥 ≠ 𝑦.
This 𝐺 has rank |supp ℎ| ≥ | 𝐴| − 𝑚 3 . Combining, we get
| 𝐴| − 𝑚 3 ≤ rank 𝐺 ≤ 𝑚 1 + 𝑚 2 .
So 𝑚 1 + 𝑚 2 + 𝑚 3 ≥ | 𝐴|. This shows that the slice rank of 𝐹 is ≥ | 𝐴|. □
Now we prove an upper bound on the slice rank by invoking magical powers of polynomials.
If we expand the right-hand side, we obtain a polynomial in 3𝑛 variables with degree 2𝑛.
This is a sum of monomials, each of the form
𝑥1𝑖1 · · · 𝑥 𝑛𝑖𝑛 𝑦 1𝑗1 · · · 𝑦 𝑛𝑗𝑛 𝑧1𝑘1 · · · 𝑧 𝑛𝑘𝑛 ,
6.5 Polynomial Method 223
∑︁
𝑗1 +···+ 𝑗𝑛 ≤ 3
Each summand has slice rank at most 1. The number of summands in the first sum is precisely
the number of triples of nonnegative integers 𝑎, 𝑏, 𝑐 with 𝑎 + 𝑏 + 𝑐 = 𝑛 and 𝑏 + 2𝑐 ≤ 2𝑛/3
(𝑎, 𝑏, 𝑐 correspond to the numbers of 𝑖 ∗ ’s that are equal to 0, 1, 2 respectively) . The lemma
then follows. □
Here is a standard estimate. The proof is similar to that of the Chernoff bound.
Proof. Let 𝑥 ∈ [0, 1]. The sum equals to the coefficients of all the monomials 𝑥 𝑘 with
𝑘 ≤ 2𝑛/3 in the expansion of (1 + 𝑥 + 𝑥 2 ) 𝑛 . By deleting contributions 𝑥 𝑘 with 𝑘 > 2𝑛/3 and
using 𝑥 2𝑛/3 ≤ 𝑥 𝑘 whenever 𝑘 ≤ 2𝑛/3, we have
∑︁ 𝑛! (1 + 𝑥 + 𝑥 2 ) 𝑛
≤ .
𝑎,𝑏,𝑐≥0
𝑎!𝑏!𝑐! 𝑥 2𝑛/3
𝑎+𝑏+𝑐=𝑛
𝑏+2𝑐≤2𝑛/3
has slice rank | 𝐴|. On the other hand, by Lemmas 6.5.6 and 6.5.7, 𝐹 has slice rank ≤ 3(2.76) 𝑛 .
So | 𝐴| ≤ 3(2.76) 𝑛 . □
It is straightforward to extend the above proof from F3 to any other fixed F 𝑝 , resulting:
Finally, the proof technique in this section seems specific to the finite field model. It is an
intriguing open problem to apply the polynomial method for Roth’s theorem in the integers.
Due to the Behrend example (Section 2.5), we cannot expect power-saving bounds in the
integers.
Exercise 6.5.12 (Tricolor sum-free set). Let 𝑎 1 , . . . , 𝑎 𝑚 , 𝑏 1 , . . . , 𝑏 𝑚 , 𝑐 1 , . . . , 𝑐 𝑚 ∈ F2𝑛 .
Suppose that the equation 𝑎 𝑖 + 𝑏 𝑗 + 𝑐 𝑘 = 0 holds if and only if 𝑖 = 𝑗 = 𝑘. Show that there
is some constant 𝑐 > 0 such that 𝑚 ≤ (2 − 𝑐) 𝑛 for all sufficiently large 𝑛.
The following exercises explains how Fourier uniformity is analogous to the discrepancy-
type condition for 𝜀-regular pairs in the graph regularity lemma.
6.6 Arithmetic Regularity 225
Exercise 6.6.2 (Uniformity vs. discrepancy). Let 𝐴 ⊆ F𝑛𝑝 with | 𝐴| = 𝛼𝑝 𝑛 . We say that 𝐴
satisfies HyperplaneDISC(𝜂) if for every hyperplane 𝑊 of F𝑛𝑝 ,
|𝐴 ∩ 𝑊|
− 𝛼 ≤ 𝜂.
|𝑊 |
(a) Prove that if 𝐴 satisfies HyperplaneDISC(𝜀), then 𝐴 is 𝜀-uniform.
(b) Prove that if 𝐴 is 𝜀-uniform, then it satisfies HyperplaneDISC(( 𝑝 − 1)𝜀).
The proof is very similar to the proof of the graph regularity lemma in Chapter 2. Each
subspace 𝑊 induces a partition of the whole space F𝑛𝑝 into 𝑊-cosets, and we keep track the
energy (mean-squared density) of the partition. We show that if the conclusion of Theo-
rem 6.6.4 does not hold for the current 𝑊, then we can replace 𝑊 by a smaller subspace so
that the energy increases significantly. Since the energy is always bounded between 0 and 1,
there are at most a bounded number of iterations.
The next lemma is analogous to the energy boost lemma for irregular pairs in the proof of
graph regularity (Lemma 2.1.13).
Proof. By Lemma 6.6.7, for each coset 𝑊 ′ of 𝑊 on which 𝑓 is not 𝜀-uniform, we can find
some 𝑟 ∈ F𝑛𝑝 \ 𝑊 ⊥ so that replacing 𝑊 by its intersection with 𝑟 ⊥ increases its energy on 𝑊 ′
by more than 𝜀 2 . In other words,
| 𝐴 ∩ 𝑊 ′ |2
𝑞 𝐴∩𝑊 ′ (𝑊 ′ ∩ 𝑟 ⊥ ) > + 𝜀2 .
|𝑊 ′ | 2
6.6 Arithmetic Regularity 227
Let 𝑅 be a set of such 𝑟’s, one for each 𝑊-coset on which 𝑓 is not 𝜀-uniform (allowing some
𝑟’s to be chosen repeatedly).
Let 𝑈 = 𝑊 ∩ 𝑅 ⊥ . Then codim 𝑈 − codim 𝑊 ≤ |𝑅| ≤ |F 𝑝 /𝑊 | = 𝑝 codim 𝑊 .
Applying the monotonicity of energy (Lemma 6.6.6) on each 𝑊-coset and using the
observation in the first paragraph in this proof, we see the “local” energy of 𝑈 is more than
that of 𝑊 on by > 𝜀 2 on each of the > 𝜀-fraction of 𝑊-cosets on which 𝑓 is not 𝜀-uniform,
and is at least as great as that of 𝑊 on each of the remaining 𝑊-cosets. There the energy
increases by > 𝜀 2 when refining from 𝑊 to 𝑈. □
Proof of the arithmetic regularity lemma (Theorem 6.6.4). Starting with 𝑊0 = F𝑛𝑝 , we con-
struct a sequence of subspaces 𝑊0 ≥ 𝑊1 ≥ 𝑊2 ≥ · · · where each at step, unless 𝐴 is 𝜀-
uniform on all but ≤ 𝜀-fraction of 𝑊-cosets, then we apply Lemma 6.6.8 to find 𝑊𝑖+1 ≤ 𝑊𝑖 .
The energy increases by > 𝜀 3 at each iteration, so there are < 𝜀 −3 iterations. We have
codim 𝑊𝑖+1 ≤ codim 𝑊𝑖 + 𝑝 codim 𝑊𝑖 at each 𝑖, so the final 𝑊 = 𝑊𝑚 has codimension at most
some function of 𝑝 and 𝜀 (one can check that it is an exponential tower of 𝑝’s of height
𝑂 (𝜀 −3 )). This 𝑊 satisfies the desired properties. □
Remark 6.6.9 (Lower bound). Recall that Gowers (1997) showed that there exist graphs
whose 𝜀-regular partition requires at least tower(Ω(𝜀 −𝑐 )) parts (Theorem 2.1.17). There is a
similar tower-type lower bound for the arithmetic regularity lemma (Green 2005a; Hosseini,
Lovett, Moshkovitz, & Shapira 2016).
Remark 6.6.10 (Abelian groups). Green (2005a) also established an arithmetic regularity
lemma over arbitrary finite abelian groups. Instead of subspaces, one uses Bohr sets (see
Remark 6.4.7).
You may wish to skip ahead to Section 6.7 to see an application of the arithmetic regularity
lemma.
Remark 6.6.12. It is worth comparing Theorem 6.6.11 to the strong graph regularity lemma
(Theorem 2.8.3). It is important that the uniformity requirement on the pseudorandom piece
depends on the codim 𝑊.
In other more advanced applications, we would like 𝑓str to come from some structured
class of functions. For example, in higher order Fourier analysis, 𝑓str is a nilsequence.
Proof. Let 𝑘 0 = 0 and 𝑘 𝑖+1 = max{𝑘 𝑖 , ⌈𝜀 𝑘−2𝑖 ⌉} for each 𝑖 ≥ 0. Note that 𝑘 0 ≤ 𝑘 1 ≤ · · · .
Let us label the elements 𝑟 1 , 𝑟 2 , . . . , 𝑟 𝑝 𝑛 of F𝑛𝑝 so that
|b
𝑓 (𝑟 1 )| ≥ | b
𝑓 (𝑟 2 )| ≥ · · · .
By Parseval (Theorem 6.1.3), we have
∑︁
𝑝𝑛
|b
𝑓 (𝑟 𝑗 )| 2 = E 𝑓 2 ≤ 1.
𝑗=1
into
𝑓 = 𝑓str + 𝑓sml + 𝑓psr
according to the sizes of the Fourier coefficients. Roughly speaking, the large spectrum will
go into the structured piece 𝑓str , the very small spectrum will go into pseudorandom piece
𝑓psr , and the remaining middle terms will form the small piece 𝑓sml (which has small 𝐿 2 norm
by (6.8)).
Let 𝑊 = {𝑟 1 , . . . , 𝑟 𝑘𝑚 }⊥ and set
𝑓str = 𝑓𝑊 .
Then, by (6.6),
(
b if 𝑟 ∈ 𝑊 ⊥ ,
c
𝑓str (𝑟) =
𝑓 (𝑟)
0 if 𝑟 ∈ 𝑊 ⊥ .
Let us define 𝑓psr and 𝑓sml via their Fourier transform (and we can recover the functions via
the inverse Fourier transform). For each 𝑗 = 1, 2, . . . , 𝑝 𝑛 , set
(
b
𝑓 (𝑟 𝑗 ) if 𝑗 > 𝑘 𝑚+1 and 𝑟 𝑗 ∉ 𝑊 ⊥ ,
c
𝑓psr (𝑟 𝑗 ) =
0 otherwise.
6.6 Arithmetic Regularity 229
Exercise 6.6.13. Deduce Theorem 6.6.4 from Theorem 6.6.11 by using an appropriate
sequence 𝜀𝑖 and using the same 𝑊 guaranteed by Theorem 6.6.11.
Remark 6.6.14 (Spectral proof of the graph regularity lemma). The proof technique of
Theorem 6.6.11 can be adapted to give an alternate proof of the graph regularity lemma
(along with certain weak and strong variants). Instead of iteratively refining partitions and
tracking energy increments as we did in Chapter 2, we can first take a spectral decomposition
of the adjacency matrix 𝐴 of a graph:
∑︁
𝑛
𝐴= 𝜆𝑖 𝑣 𝑖 𝑣 𝑖⊺ ,
𝑖=1
for some appropriately chosen 𝑘 and 𝑘 similar to the proof of Theorem 6.6.11.
′
We have
∑︁
𝑛
𝜆2𝑖 = tr 𝐴2 ≤ 𝑛2 .
𝑖=1
√
So 𝜆𝑖 ≤ 𝑛/ 𝑖 for each 𝑖. We can guarantee that the spectral norm of 𝐴psr is small enough as
2
Í
a function of 𝑘 and 𝜀. Furthermore, we can guarantee that tr 𝐴sml = 𝑘<𝑖 ≤ 𝑘 ′ 𝜆2𝑖 ≤ 𝜀.
To turn 𝐴str into a vertex partition, we can use the approximate level sets of the top 𝑘
eigenvectors 𝑣 1 , . . . , 𝑣 𝑘 . Some bookkeeping calculations then shows that this is a regularity
partition. Intuitively, 𝐴psr provides us with regular pairs. Some of these regular pairs may not
stay regular after adding 𝐴sml , but since 𝐴sml has ≤ 𝜀 mass (in terms of 𝐿 2 norm), it destroys
at most a negligible fraction of regular pairs.
See Tao (2007a, Lemma 2.11) or Tao’s blog post The Spectral Proof of the Szemerédi
Regularity Lemma (2012) for more details of the proof.
230 Forbidding 3-Term Arithmetic Progressions
The following exercise is the arithmetic analogue of the existence of an 𝜀-regular vertex
subset in a graph (Theorem 2.1.26 and Exercise 2.1.27).
Exercise 6.6.15 (𝜀 -uniform subspace).
(a*) Prove that for every 0 < 𝜀 < 1/2 and 𝐴 ⊆ F2𝑛 , there exists a subspace 𝑊 ⊆ F2𝑛 (note
that 0 ∈ 𝑊) with codimension at most exp(𝐶/𝜀) such that 𝐴 is 𝜀-uniform on 𝑊.
Here 𝐶 is some absolute constant.
(b) Let 𝐴 = {𝑥 ∈ F3𝑛 : there exists 𝑖 such that 𝑥1 = · · · = 𝑥𝑖 = 0, 𝑥𝑖+1 = 1}. Prove that 𝐴
is not 𝑐-uniform on any positive dimensional subspace of F3𝑛 . Here 𝑐 > 0 is some
absolute constant.
In particular, Theorem 6.7.1 implies that every 3-AP-free subset of F3𝑛 has size 𝑜(3𝑛 ).
Exercise 6.7.2. Show that it is false that every 𝐴 ⊆ F3𝑛 with | 𝐴| = 𝛼3𝑛 , the number of
pairs (𝑥, 𝑦) ∈ F3𝑛 with 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴 is ≥ (𝛼3 − 𝑜(1))32𝑛 , where 𝑜(1) → 0 as 𝑛 → 0.
We will prove Theorem 6.7.1 via the next result, which concerns the number of 3-APs
with common difference coming from some subspace of bounded codimension, which is
picked via the arithmetic regularity lemma.
Proof. By the arithmetic regularity lemma (Theorem 6.6.4), there is some 𝑀 depending
only on 𝜀 and a subspace 𝑊 of F𝑛𝑝 of codimension ≤ 𝑀 so that 𝐴 is 𝜀-uniform on all but at
most 𝜀-fraction of 𝑊-cosets.
Let 𝑢 + 𝑊 be a 𝑊-coset on which 𝐴 is 𝜀-uniform. Denote the density of 𝐴 in 𝑢 + 𝑊 by
| 𝐴 ∩ (𝑢 + 𝑊)|
𝛼𝑢 = .
|𝑊 |
6.7 Popular Common Difference 231
Restricting ourselves inside 𝑢 + 𝑊 for a moment, by the 3-AP counting lemma Lemma 6.2.4,
the number of 3-APs of 𝐴 (including trivial ones) that are contained in 𝑢 + 𝑊 is
|{(𝑥, 𝑦) ∈ (𝑢 + 𝑊) × 𝑊 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}| ≥ (𝛼𝑢3 − 𝜀) |𝑊 | 2 .
Since 𝐴 is 𝜀-uniform on all but at most 𝜀-fraction of 𝑊-cosets, by varying 𝑢 + 𝑊 over all
such cosets, we find that the total number of 3-APs in 𝐴 with common difference in 𝑊 is
{(𝑥, 𝑦) ∈ F3𝑛 × 𝑊 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} ≥ (1 − 𝜀) (𝛼3 − 𝜀)3𝑛 |𝑊 | ≥ (𝛼3 − 2𝜀)3𝑛 |𝑊 | .
This proves the theorem (with 𝜀 replaced by 2𝜀). □
Exercise 6.7.4. Give another proof of Theorem 6.7.3 using Theorem 6.6.11 (arithmetic
regularity decomposition 𝑓 = 𝑓str + 𝑓psr + 𝑓sml ).
Proof of Theorem 6.7.1. First apply Theorem 6.7.3 with find a subspace 𝑊 of codimension
≤ 𝑀 = 𝑀 (𝜀). Choose 𝑛0 = 𝑀 + log3 (1/𝜀). So 𝑛 ≥ 𝑛0 guarantees |𝑊 | ≥ 1/𝜀.
We need to exclude 3-APs with common difference zero. We have
(𝛼3 − 𝜀)3𝑛 |𝑊 | ≤ {(𝑥, 𝑦) ∈ F3𝑛 × 𝑊 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}
= {(𝑥, 𝑦) ∈ F3𝑛 × (𝑊 \ {0}) : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} + | 𝐴| .
We have | 𝐴| ≤ 3𝑛 ≤ 𝜀3𝑛 |𝑊 |, so
(𝛼3 − 2𝜀)3𝑛 |𝑊 | ≤ {(𝑥, 𝑦) ∈ F3𝑛 × (𝑊 \ {0}) : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} .
By averaging, there exists 𝑦 ∈ 𝑊 \ {0} satisfying
{𝑥 ∈ F3𝑛 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} ≥ (𝛼3 − 2𝜀)3𝑛 .
This proves the theorem (with 𝜀 replaced by 2𝜀). □
By adapting the above proof strategy with Bohr sets, Green (2005a) proved that a Roth’s
theorem with popular differences in finite abelian groups of odd order, as well as in the
integers.
Theorem 6.7.5 (Roth’s theorem with popular difference in finite abelian groups)
For all 𝜀 > 0, there exists 𝑁0 = 𝑁0 (𝜀) such that for all finite abelian groups Γ of odd
order |Γ| ≥ 𝑁0 , and every 𝐴 ⊆ Γ with | 𝐴| = 𝛼 |Γ|, there exists 𝑦 ∈ Γ \ {0} such that
|{𝑥 ∈ Γ : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}| ≥ (𝛼3 − 𝜀) |Γ| .
See Tao’s blog post A Proof of Roth’s Theorem (2014) for a proof of Theorem 6.7.6 using
Bohr sets, following an arithmetic regularity decomposition in the spirit of Theorem 6.6.11.
232 Forbidding 3-Term Arithmetic Progressions
Remark 6.7.7 (Bounds). The above proof of Theorem 6.7.1 gives 𝑛0 = tower(𝜀 −𝑂 (1) ).
The bounds Theorems 6.7.5 and 6.7.6 are also tower-type. What is the smallest 𝑛0 (𝜀) for
which Theorem 6.7.1 holds? It turns out to be tower(Θ(log(1/𝜀))), as proved by Fox &
Pham (2019) over finite fields and Fox, Pham, & Zhao (2022) over the integers. Although it
had been known since Gowers (1997) that tower-type bounds are necessary for the regularity
lemmas themselves, Roth’s theorem with popular differences is the first regularity application
where a tower-type bound is shown to be indeed necessary.
Using quadratic Fourier analysis, Green & Tao (2010c) extended the popular difference
result over to 4-APs.
Theorem 6.7.8 (Popular difference for 4-APs)
For all 𝜀 > 0, there exists 𝑁0 = 𝑁0 (𝜀) such that for every 𝑁 ≥ 𝑁0 and 𝐴 ⊆ [𝑁] with
| 𝐴| = 𝛼𝑁, there exists 𝑦 ≠ 0 such that
|{𝑥 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦, 𝑥 + 3𝑦 ∈ 𝐴}| ≥ (𝛼4 − 𝜀)𝑁.
It may be a surprising that such a statement is false for APs of length 5 or longer. This was
shown by Bergelson, Host, & Kra (2005) with an appendix by Ruzsa giving a construction
that is a clever modification of the Behrend construction (Section 2.5).
Further Reading
Green has several excellent surveys and lecture notes:
• Finite Field Models in Additive Combinatorics (2005c) — For many additive combina-
torics problems, it is a good idea to first study them in the finite field setting (also see
the follow up by Wolf (2015)).
• Montreal Lecture Notes on Quadratic Fourier Analysis (2007a)— An introduction to
quadratic Fourier analysis and its application to the popular common difference theorem
for 4-APs in F5𝑛 .
• Lecture notes from his Cambridge course Additive Combinatorics (2009b).
Tao’s FOCS 2007 tutorial Structure and Randomness in Combinatorics (2007a) explains
many facets of arithmetic regularity and applications.
For more on algebraic methods in combinatorics (mostly pre-dating methods in Sec-
tion 6.5), see the books:
• Thirty-three Miniatures by Matoušek (2010);
• Linear Algebra Methods in Combinatorics by Babai & Frankl;
6.7 Popular Common Difference 233
Chapter Summary
• Basic tools of discrete Fourier analysis:
– Fourier transform,
– Fourier inversion formula,
– Parsevel / Plancheral identity (unitarity of the Fourier transform),
– convolution identity (Fourier transform converts convolutions to multiplication).
• The finite field model (e.g., F3𝑛 ) offers a convenient playground for Fourier analysis
in additive combinatorics. Many techniques can then be adapted to the integer setting,
although often with additional technicalities.
• Roth’s theorem. Using Fourier analysis, we proved that every 3-AP-free subset has size
at most
– 𝑂 (3𝑛 /𝑛) in F𝑛 , and
– 𝑂 (𝑁/log log 𝑁) in [𝑁] ⊆ Z.
• The Fourier analytic proof of Roth’s theorem (both in F3𝑛 and in Z) proceeds via a density
increment argument:
(1) A 3-AP-free set has a large Fourier coefficient;
(2) A large Fourier coefficient implies density increment on some hyperplane (in F3𝑛 ) or
subprogression (in Z);
(3) Iterate the density increment.
• Using the polynomial method, we showed that every 3-AP-free subset of F3𝑛 has size
𝑂 (2.76𝑛 ).
• Arithmetic regularity lemma. Given 𝐴 ⊆ F𝑛𝑝 , we can find a bounded codimensional
subspace so that 𝐴 is Fourier-uniform on almost all cosets.
– An application: Roth’s theorem with popular difference. For every 𝐴 ⊆ F3𝑛 , there is
some “popular 3-AP common difference” with frequency at least nearly as much as if
𝐴 were random.
7
Chapter Highlights
• Freiman’s theorem: structure of sets with small doubling
• Inequalities between sizes of sumsets: Ruzsa triangle inequality and Plünnecke’s inequality
• Ruzsa covering lemma
• Freiman homomorphisms: preserving partial additive structure
• Ruzsa modeling lemma
• Structure in iterated sumsets: Bogolyubov’s lemma
• Geometry of numbers: Minkowski’s second theorem
• Polynomial Freiman–Ruzsa conjecture
• Additive energy and the Balog–Szemerédi–Gowers theorem
Let 𝐴 and 𝐵 be finite subsets of some ambient abelian group. We define their sumset to
be
𝑨 + 𝑩 := {𝑎 + 𝑏 : 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵} .
Note that we view 𝐴 + 𝐵 as a set, and do not keep track of the number of ways that each
element can be written as 𝑎 + 𝑏.
The main goal of this chapter is to understand the following question.
One of the main goals of this chapter is to prove Freiman’s theorem, which is a deep and
foundational result in additive combinatorics. Freiman’s theorem tells us whenever 𝐴 + 𝐴 is
at most a constant factor larger than 𝐴, then 𝐴 must be a large fraction of some generalized
arithmetic progression.
Most of this chapter will be devoted towards proving Freiman’s theorem. We will see ideas
and tools from Fourier analysis, geometry of numbers, and additive combinatorics.
In Section 7.13, we will introduce the additive energy of a set, which is another way to
measure the additive structure of a set. We will see the Balog–Szemerédi–Gowers theorem,
which relates additive energy and doubling. This section can be read independently from the
earlier parts of the chapter.
These results on the structure of set addition are not only interesting on their own, but also
play a key role in Gowers’ proof (2001) of Szemerédi’s theorem (although we do not cover it
in this book; see Further Reading at the end of the chapter). Gowers’ deep and foundational
work shows how these topics in additive combinatorics are all highly connected.
235
236 Structure of Set Addition
Proof. Let 𝑛 = | 𝐴|. For the lower bound | 𝐴 + 𝐴| ≥ 2𝑛 − 1, note that if the elements of 𝐴 are
𝑎 1 < 𝑎 2 < · · · < 𝑎 𝑛 , then
𝑎1 + 𝑎1 < 𝑎1 + 𝑎2 < · · · < 𝑎1 + 𝑎 𝑛 < 𝑎2 + 𝑎 𝑛 < · · · < 𝑎 𝑛 + 𝑎 𝑛
are 2𝑛 − 1 distinct elements of 𝐴 + 𝐴. So | 𝐴 + 𝐴| ≥ 2𝑛 − 1. Equality is attained when 𝐴 is an
arithmetic progression.
The upper bound | 𝐴 + 𝐴| ≤ 𝑛+1 2
follows from that there are 𝑛+12
unordered pairs of
elements of 𝐴. We have equality when there are no nontrivial solutions to 𝑎 + 𝑏 = 𝑐 + 𝑑 in
𝐴, such as when 𝐴 consists of powers of twos. □
Exercise 7.1.2 (Sumsets in abelian groups). Show that if 𝐴 is a finite subset of an abelian
group, then | 𝐴 + 𝐴| ≥ | 𝐴|, with equality if and only if 𝐴 is the coset of some subgroup.
What can we say about 𝐴 if 𝐴 + 𝐴 is not too much larger than 𝐴?
One of the main results of this chapter, Freiman’s theorem, addresses the following ques-
tion.
−→
Z2 Z
We often abuse notation and use the term GAP to refer to the image of 𝜙, viewed as a set:
𝑎 0 + 𝑎 1 · [𝐿 1 ] + · · · + 𝑎 𝑑 · [𝐿 𝑑 ] = {𝑎 0 + 𝑎 1 𝑥1 + · · · + 𝑎 𝑑 𝑥 𝑑 : 𝑥 1 ∈ [𝐿 1 ], . . . , 𝑥 𝑑 ∈ [𝐿 𝑑 ]} .
Example 7.1.8. A proper GAP of dimension 𝑑 has doubling constant ≤ 2𝑑 .
Example 7.1.9. Let 𝑃 be a proper GAP of dimension 𝑑. Let 𝐴 ⊆ 𝑃 with | 𝐴| ≥ |𝑃| /𝐾. Then
𝐴 has doubling constant ≤ 𝐾2𝑑 .
While it is often easy to check that certain sets have small doubling, the inverse problem
is much more difficult. We would like to characterize all sets with small doubling. The
following foundational result by Freiman (1973) shows that all sets with bounded doubling
must look like Example 7.1.9.
Freiman’s theorem is a deep result. We will spend most the chapter proving it.
238 Structure of Set Addition
Remark 7.1.11 (Quantitative bounds). We will present a proof giving 𝑑 (𝐾) = exp(𝐾 𝑂 (1) )
and 𝑓 (𝐾) = exp(𝑑 (𝐾)), due to Ruzsa (1994). Chang (2002) showed that Freiman’s theorem
holds with 𝑑 (𝐾) = 𝐾 𝑂 (1) and 𝑓 (𝐾) = exp(𝑑 (𝐾)) (see Exercise 7.11.2). Schoen (2011)
further improved the bounds to 𝑑 (𝐾) = 𝐾 1+𝑜(1) and 𝑓 (𝐾) = exp(𝐾 1+𝑜(1) ). Sanders (2012,
2013) showed that if we change GAPs to “convex progressions” (see Section 7.12), then an
analogous theorem holds with 𝑑 (𝐾) = 𝐾 (log(2𝐾)) 𝑂 (1) and 𝑓 (𝐾) = exp(𝑑 (𝐾)).
It is easy to see that one cannot do better than 𝑑 (𝐾) ≤ 𝐾 − 1 and 𝑓 (𝐾) = 𝑒 𝑂 (𝐾 ) , by
considering a set without additive structure.
Also see Section 7.12 on the polynomial Freiman–Ruzsa conjecture for a variant of
Freiman’s theorem with much better quantitative dependencies.
Remark 7.1.12 (Making the GAP proper). The conclusion of Freiman’s theorem can be
strengthened to force the GAP to be proper, at the cost of potentially increasing 𝑑 (𝐾) and
𝑓 (𝐾). For example, it is known that every GAP of dimension 𝑑 is contained in some proper
3
GAP of dimension ≤ 𝑑 with at most 𝑑 𝑂 (𝑑 ) factor increase in the volume; see Tao & Vu
(2006, Theorem 3.40).
Remark 7.1.13 (History). Freiman’s original proof (1973) was quite complicated. Ruzsa
(1994) later found a simpler proof, which guided much of the subsequent work. We follow
Ruzsa’s presentation here. Theorem 7.1.10 is sometimes called the Freiman–Ruzsa theorem.
Freiman’s theorem was brought into further prominence due to the role it played in the new
proof of Szemerédi’s theorem by Gowers (2001).
Remark 7.1.14 (Freiman’s theorem in abelian groups). Green & Ruzsa (2007) proved a
generalization of Freiman’s theorem in an arbitrary abelian group. A coset progression is a
set of the form 𝑃 + 𝐻 where 𝑃 is a GAP and 𝐻 is a subgroup of the ambient abelian group.
Define the dimension of this coset progression to be the dimension of 𝑃, and its volume to
be |𝐻| vol 𝑃. Green & Ruzsa (2007) proved the following theorem.
Theorem 7.1.15 (Freiman’s theorem for general abelian groups)
Let 𝐴 be a subset of an abelian group satisfying | 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|. Then 𝐴 is contained
in a coset progression of dimension at most 𝑑 (𝐾) and volume at most 𝑓 (𝑘) | 𝐴|, where
𝑑 (𝐾) and 𝑓 (𝐾) are constants depending only on 𝐾.
Then 𝜙 is injective since we can recover (𝑎, 𝑑) from 𝜙(𝑎, 𝑑) = (𝑥, 𝑦) via 𝑑 = 𝑦 − 𝑥 and then
𝑎 = 𝑥 + b (𝑑). □
Remark 7.2.2. By replacing 𝐵 with −𝐵 and/or 𝐶 with −𝐶, Theorem 7.2.1 implies some
additional sumset inequalities:
| 𝐴| |𝐵 + 𝐶 | ≤ | 𝐴 + 𝐵| | 𝐴 − 𝐶 | ;
| 𝐴| |𝐵 + 𝐶 | ≤ | 𝐴 − 𝐵| | 𝐴 + 𝐶 | ;
| 𝐴| |𝐵 − 𝐶 | ≤ | 𝐴 + 𝐵| | 𝐴 + 𝐶 | .
However, this trick cannot be used to prove the similarly looking inequality
| 𝐴| |𝐵 + 𝐶 | ≤ | 𝐴 + 𝐵| | 𝐴 + 𝐶 | .
This inequality is also true, and we will prove it in the following section.
Remark 7.2.3 (Why is it called a triangle inequality?). If we define
| 𝐴 − 𝐵|
𝜌( 𝐴, 𝐵) := log √︁
| 𝐴| |𝐵|
(called a Ruzsa distance), then Theorem 7.2.1 can be rewritten as
𝜌(𝐵, 𝐶) ≤ 𝜌( 𝐴, 𝐵) + 𝜌( 𝐴, 𝐶).
This is why Theorem 7.2.1 is called a “triangle inequality.” However, one should not take
the name too seriously. The function 𝜌 is not a metric because 𝜌( 𝐴, 𝐴) ≠ 0 in general.
Exercise 7.2.4 (Iterated sumsets). Let 𝐴 be a finite ssubset of an abelian group satisfying
|2𝐴 − 2𝐴| ≤ 𝐾 | 𝐴| .
Prove that
|𝑚 𝐴 − 𝑚 𝐴| ≤ 𝐾 𝑚−1 | 𝐴| for every integer 𝑚 ≥ 2.
In the above exercise, we had to start with the assumption that |2𝐴 − 2𝐴| ≤ 𝐾 | 𝐴|. In
the next section, we bound the sizes of iterated sumsets starting with the weaker hypothesis
| 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|.
Remark 7.3.2 (History). Plünnecke (1970) proved a version of the theorem originally using
graph theoretic methods. Ruzsa (1989) gave a simpler version of Plünnecke’s proof and also
extended it from sums to differences. Nevertheless, Ruzsa’s proof was still quite long and
complex. It sets up a “commutative layered graph”, and uses tools from graph theory including
Menger’s theorem. Theorem 7.3.1 is sometimes called the Plünnecke–Ruzsa inequality. See
Ruzsa (2009, Chapter 1) or Tao & Vu (2006, Chapter 6) for an account of this proof.
In a surprising breakthrough, Petridis (2012) found a very short proof of the result, which
we present here.
We will prove the following more general statement. Theorem 7.3.1 is the special case
𝐴 = 𝐵.
Theorem 7.3.3 (Plünnecke’s inequality)
Let 𝐴 and 𝐵 be finite subsets of an abelian group satisfying
| 𝐴 + 𝐵| ≤ 𝐾 | 𝐴| .
Then for all integers 𝑚, 𝑛 ≥ 0,
|𝑚𝐵 − 𝑛𝐵| ≤ 𝐾 𝑚+𝑛 | 𝐴| .
Remark 7.3.5 (Interpretation as expansion ratios). We can interpret Lemma 7.3.4 in terms
of vertex expansion ratios inside the bipartite graph between two copies of the ambient
abelian group, with edges (𝑥, 𝑥 + 𝑏) ranging over all 𝑥 ∈ Γ and 𝑏 ∈ 𝐵. Every vertex subset 𝑋
on the left has neighbors 𝑋 + 𝐵 on the right, and thus has vertex expansion ratio |𝑋 + 𝐵| /|𝑋 |.
+𝐵
𝐴
𝑋+𝐵
𝑋
𝑋+𝐶
𝑋+𝐶 +𝐵
7.3 Sumset Calculus II: Plünnecke’s Inequality 241
We will apply Lemma 7.3.4 by choosing 𝑋 among all nonempty subsets of 𝐴 with the
minimum expansion ratio, so that the hypothesis of Lemma 7.3.4 is automatically satisfied.
The conclusion of Lemma 7.3.4 then says that a union of translates of 𝑋 has expansion ratio
at most that of 𝑋.
Proof of Theorem 7.3.3 given Lemma 7.3.4. Choose 𝑋 among all nonempty subsets of 𝐴
with the minimum |𝑋 + 𝐵| /|𝑋 | so that the hypothesis of Lemma 7.3.4 is satisfied. Also we
have
|𝑋 + 𝐵| | 𝐴 + 𝐵|
≤ ≤ 𝐾.
|𝑋 | | 𝐴|
For every integer 𝑛 ≥ 0, applying Lemma 7.3.4 with 𝐶 = 𝑛𝐵, we have
|𝑋 + (𝑛 + 1)𝐵| |𝑋 + 𝐵|
≤ ≤ 𝐾.
|𝑋 + 𝑛𝐵| |𝑋 |
So induction on 𝑛 yields, for all 𝑛 ≥ 0,
|𝑋 + 𝑛𝐵| ≤ 𝐾 𝑛 |𝑋 | .
Finally, applying the Ruzsa triangle inequality (Theorem 7.2.1), for all 𝑚, 𝑛 ≥ 0.
|𝑋 + 𝑚𝐵| |𝑋 + 𝑛𝐵|
|𝑚𝐵 − 𝑛𝐵| ≤ ≤ 𝐾 𝑚+𝑛 |𝑋 | ≤ 𝐾 𝑚+𝑛 | 𝐴| . □
|𝑋 |
Proof of Lemma 7.3.4. We will proceed by induction on |𝐶 |. For the base case |𝐶 | = 1, note
that 𝑋 + 𝐶 is a translate of 𝑋, so |𝑋 + 𝐶 + 𝐵| = |𝑋 + 𝐵| and |𝑋 + 𝐶 | = |𝑋 |.
Now for the induction step, assume that for some 𝐶,
|𝑋 + 𝐶 + 𝐵| |𝑋 + 𝐵|
≤ .
|𝑋 + 𝐶 | |𝑋 |
Now consider 𝐶 ∪ {𝑐} for some 𝑐 ∉ 𝐶. We wish to show that
|𝑋 + (𝐶 ∪ {𝑐}) + 𝐵| |𝑋 + 𝐵|
≤ .
|𝑋 + (𝐶 ∪ {𝑐})| |𝑋 |
By comparing the change in the left-hand side fraction, it suffices to show that
|𝑋 + 𝐵|
|(𝑋 + 𝑐 + 𝐵) \ (𝑋 + 𝐶 + 𝐵)| ≤ |(𝑋 + 𝑐) \ (𝑋 + 𝐶)| . (7.1)
|𝑋 |
Let
𝑌 = {𝑥 ∈ 𝑋 : 𝑥 + 𝑐 + 𝐵 ⊆ 𝑋 + 𝐶 + 𝐵} ⊆ 𝑋.
Then
|(𝑋 + 𝑐 + 𝐵) \ (𝑋 + 𝐶 + 𝐵)| ≤ |𝑋 + 𝐵| − |𝑌 + 𝐵| .
Furthermore, if 𝑥 ∈ 𝑋 satisfies 𝑥 + 𝑐 ∈ 𝑋 + 𝐶, then 𝑥 + 𝑐 + 𝐵 ⊆ 𝑋 + 𝐶 + 𝐵 and hence 𝑥 ∈ 𝑌 . So
|(𝑋 + 𝑐) \ (𝑋 + 𝐶)| ≥ |𝑋 | − |𝑌 | .
Thus, to prove (7.1), it suffices to show
|𝑋 + 𝐵|
|𝑋 + 𝐵| − |𝑌 + 𝐵| ≤ (|𝑋 | − |𝑌 |) ,
|𝑋 |
242 Structure of Set Addition
Exercise 7.3.8∗ (Loomis–Whitney for sumsets). Show that for every finite subsets 𝐴, 𝐵, 𝐶
in an abelian group, one has
| 𝐴 + 𝐵 + 𝐶 | 2 ≤ | 𝐴 + 𝐵| | 𝐴 + 𝐶 | |𝐵 + 𝐶 | .
Remark 7.4.2 (Geometric intuition). Imagine that 𝐵 is a unit ball in R𝑛 , and cardinality
above is replaced by volume. Given some region 𝑋 (the shaded region below), consider a
maximal set T of disjoint union balls with centers in 𝑋 (maximal in the sense that one cannot
add an additional ball without intersecting some other ball).
7.5 Freiman’s Theorem in Groups with Bounded Exponent 243
Then replacing each ball in T by a ball of radius 2 with the same center, (i.e., replacing
𝐵 by 𝐵 − 𝐵) the resulting balls must cover the region 𝑋 (which amounts to the conclusion
𝑋 ⊆ 𝑇 + 𝐵 − 𝐵), for otherwise at any uncovered point of 𝑋 we could have added an additional
non-overlapping ball in the previous step.
Similar arguments are important in analysis (e.g., the Vitali covering lemma).
Proof. Let 𝑇 ⊆ 𝑋 be a maximal subset such that 𝑡 + 𝐵 as 𝑡 ranges over 𝑇 are disjoint. Then
|𝑇 | |𝐵| = |𝑇 + 𝐵| ≤ |𝑋 + 𝐵| ≤ 𝐾 |𝐵| .
So |𝑇 | ≤ 𝐾.
By the maximality of 𝑇, for all 𝑥 ∈ 𝑋 there exists some 𝑡 ∈ 𝑇 such that (𝑡 + 𝐵) ∩ (𝑥 + 𝐵) ≠ ∅.
In other words, there exist 𝑡 ∈ 𝑇 and 𝑏, 𝑏 ′ ∈ 𝐵 such that 𝑡 + 𝑏 = 𝑥 + 𝑏 ′ . Hence 𝑥 ∈ 𝑇 + 𝐵 − 𝐵
for every 𝑥 ∈ 𝑋. Thus 𝑋 ⊆ 𝑇 + 𝐵 − 𝐵. □
The following “more efficient” covering lemma can be used to prove a better bound in
Freiman’s theorem.
Exercise 7.4.3∗ (Chang’s covering lemma). Let 𝐴 and 𝐵 be finite sets in an abelian group
satisfying
| 𝐴 + 𝐴| ≤ 𝐾 | 𝐴| and | 𝐴 + 𝐵| ≤ 𝐾 ′ |𝐵| .
Show that there exists some set 𝑋 in the abelian group so that
𝐴 ⊆ Σ𝑋 + 𝐵 − 𝐵 and |𝑋 | = 𝑂 (𝐾 log(𝐾𝐾 ′ )),
where Σ𝑋 denotes the set of all elements that can be written as the sum of a subset of
elements of 𝑋 (including zero as the sum of the empty set).
Hint: Try first finding 2𝐾 disjoint translates 𝑎 + 𝐵.
For example, F2𝑛 has exponent 2. The cyclic group Z/𝑁Z has exponent 𝑁. The integers Z
has infinite exponent.
We use ⟨𝑨⟩ to refer to the subgroup of a group 𝐺 generated by some subset 𝐴 of 𝐺. Then
the exponent of a group 𝐺 is sup 𝑥 ∈𝐺 |⟨𝑥⟩|. When the group is a vector space (e.g., F2𝑛 ), ⟨𝐴⟩
is the smallest subspace containing 𝐴.
Remark 7.5.5. This theorem is a converse of the observation that if 𝐴 is a large fraction of
a subgroup, then 𝐴 has small doubling.
Proof. By Plünnecke’s inequality (Theorem 7.3.1), we have
| 𝐴 + (2𝐴 − 𝐴)| = |3𝐴 − 𝐴| ≤ 𝐾 4 | 𝐴| .
By the Ruzsa covering lemma (Theorem 7.4.1 applied with 𝑋 = 2𝐴 − 𝐴 and 𝐵 = 𝐴), there
exists some 𝑇 ⊆ 2𝐴 − 𝐴 with |𝑇 | ≤ | 𝐴 + (2𝐴 − 𝐴)| /| 𝐴| ≤ 𝐾 4 such that
2𝐴 − 𝐴 ⊆ 𝑇 + 𝐴 − 𝐴.
Adding 𝐴 to both sides, we have,
3𝐴 − 𝐴 ⊆ 𝑇 + 2𝐴 − 𝐴 ⊆ 2𝑇 + 𝐴 − 𝐴.
Iterating, for any positive integer 𝑛, we have
(𝑛 + 1) 𝐴 − 𝐴 ⊆ 𝑛𝑇 + 𝐴 − 𝐴 ⊆ ⟨𝑇⟩ + 𝐴 − 𝐴.
7.6 Freiman Homomorphisms 245
Since we are in an abelian group with bounded exponent, every element of ⟨𝐴⟩ lies in 𝑛𝐴
for some n. Thus
Ø
⟨𝐴⟩ ⊆ (𝑛𝐴 + 𝐴 − 𝐴) ⊆ ⟨𝑇⟩ + 𝐴 − 𝐴.
𝑛≥1
Exercise 7.5.8∗ (Ball volume growth in an abelian Cayley graph). Show that there is
some absolute constant 𝐶 so that if 𝑆 is a finite subset of an abelian group, and 𝑘 is a
positive integer, then
|2𝑘 𝑆| ≤ 𝐶 |𝑆 | |𝑘 𝑆| .
The two sets are very similar from the point of view of additive structure. For example, the
obvious bijection between 𝐴 and 𝐵 has the nice property that any solution to the equation
𝑤 + 𝑥 = 𝑦 + 𝑧 in one set is automatically a solution in the other. Sometimes, in additive
combinatorics, it is a good idea to treat these two sets as isomorphic. Let us define this
246 Structure of Set Addition
notion formally and study what it means for a map between sets to partially preserve additive
structure.
Intuitively, the idea is that there are no wrap around additive relations mod 𝑁 if 𝐴 has
small diameter.
Proof. The mod 𝑁 map Z → Z/𝑁 is a group homomorphism, and hence automatically a
Freiman 𝑠-homomorphism. Now, if 𝑎 1 , . . . , 𝑎 𝑠 , 𝑎 1′ , . . . , 𝑎 ′𝑠 ∈ 𝐴 are such that
(𝑎 1 + · · · + 𝑎 𝑠 ) − (𝑎 1′ + · · · + 𝑎 ′𝑠 ) ≡ 0 (mod 𝑁),
then the left hand side, viewed as an integer, has absolute value less than 𝑁 (since 𝑎 𝑖 − 𝑎 𝑖′ <
𝑁/𝑠 for each 𝑖). Thus the left hand side must be 0 in Z. So the inverse of the mod 𝑁 map is a
Freiman 𝑠-homomorphism over 𝐴, and thus mod 𝑁 is a Freiman 𝑠-isomorphism. □
Proof. Choose any prime 𝑞 > max(𝑠 𝐴 − 𝑠 𝐴). For every choice of 𝜆 ∈ [𝑞 − 1], we define 𝜙𝜆
as the composition of functions as follows
mod 𝑞 ·𝜆 (mod 𝑞) −1
𝜙 = 𝜙𝜆 : Z −−−−→ Z/𝑞Z −−−−→ Z/𝑞Z −−−−−−−−−→ {0, 1, . . . , 𝑞 − 1} .
The first map is the mod 𝑞 map. The second map sends 𝑥 to 𝜆𝑥. The last map inverts the mod
𝑞 map Z → Z/𝑞Z.
If 𝜆 ∈ [𝑞 − 1] is chosen uniformly at random, then each nonzero integer is mapped to a
uniformly random element of [𝑞 − 1] under 𝜙𝜆 , and so is divisible by 𝑁 with probability
≤ 1/𝑁. Since there are fewer than 𝑁 nonzero elements in 𝑠 𝐴 − 𝑠 𝐴, there exists a choice of 𝜆
so that
𝑁 ∤ 𝜙𝜆 (𝑥) for any nonzero 𝑥 ∈ 𝑠 𝐴 − 𝑠 𝐴. (7.2)
Let us fix this 𝜆 from now on and write 𝜙 = 𝜙𝜆 .
Among the three functions whose composition defines 𝜙, the first map (i.e., mod 𝑞) and the
second map (·𝜆 in Z/𝑞Z) are group homomorphisms, and hence Freiman 𝑠-homomorphisms.
The last map is not a Freiman 𝑠-homomorphism, but it becomes one when restricted to an
interval of at most 𝑞/𝑠 elements (see Proposition 7.6.6). By the pigeonhole principle, we can
find an interval 𝐼 with
diam 𝐼 < 𝑞/𝑠
such that
𝐴′ = {𝑎 ∈ 𝐴 : 𝜙(𝑎) ∈ 𝐼}
has ≥ | 𝐴| /𝑠 elements. So 𝜙 sends 𝐴′ Freiman 𝑠-homomorphically to its image.
We further compose 𝜙 with the mod 𝑁 map to obtain
𝜙 mod 𝑞
𝜓 : Z −−→ {0, 1, . . . , 𝑞 − 1} −−−−→ Z/𝑁Z.
We claim that 𝜓 maps 𝐴′ Freiman 𝑠-isomorphically to its image. Indeed, we saw that 𝜓 is a
Freiman 𝑠-homomorphism when restricted to 𝐴′ (since both 𝜙| 𝐴′ and the mod 𝑁 map are).
Now suppose 𝑎 1 , . . . , 𝑎 𝑠 , 𝑎 1′ , . . . , 𝑎 ′𝑠 ∈ 𝐴′ satisfy
𝜓(𝑎 1 ) + · · · + 𝜓(𝑎 𝑠 ) = 𝜓(𝑎 1′ ) + · · · + 𝜓(𝑎 ′𝑠 ),
which is the same as saying that 𝑁 divides
𝑦 := 𝜙(𝑎 1 ) + · · · + 𝜙(𝑎 𝑠 ) − 𝜙(𝑎 1′ ) − · · · − 𝜙(𝑎 ′𝑠 ) ∈ Z.
By swapping (𝑎 1 , . . . , 𝑎 𝑠 ) with (𝑎 1′ , . . . , 𝑎 ′𝑠 ) if needed, we may assume that 𝑦 ≥ 0. Since
𝜙( 𝐴′ ) ⊆ 𝐼, we have |𝜙(𝑎 𝑖 ) − 𝜙(𝑎 𝑖′ )| ≤ diam 𝐼 < 𝑞/𝑠 for each 𝑖, and thus
0 ≤ 𝑦 < 𝑞.
7.8 Iterated Sumsets: Bogolyubov’s Lemma 249
Let
𝑥 = 𝑎 1 + · · · + 𝑎 𝑠 − 𝑎 1′ − · · · − 𝑎 ′𝑠 ∈ 𝑠 𝐴 − 𝑠 𝐴.
Since 𝜙 mod 𝑞 is a group homomorphism,
𝜙(𝑥) ≡ 𝜙(𝑎 1 ) + · · · + 𝜙(𝑎 𝑠 ) − 𝜙(𝑎 1′ ) − · · · − 𝜙(𝑎 ′𝑠 ) = 𝑦 (mod 𝑞).
Since
𝜙(𝑥), 𝑦 ∈ [0, 𝑞) ∩ Z and 𝜙(𝑥) ≡ 𝑦 (mod 𝑞),
we have 𝜙(𝑥) = 𝑦. Since 𝑁 divides 𝑦 = 𝜙(𝑥), and by (7.2), 𝑁 ∤ 𝜙(𝑥) for any nonzero
𝑥 ∈ 𝑠𝐴 − 𝑠𝐴, we must have 𝑥 = 0. Thus
𝑎 1 + · · · + 𝑎 𝑠 = 𝑎 1′ + · · · + 𝑎 ′𝑠 .
Hence 𝐴′ is a set of size ≥ | 𝐴| /𝑠 that is Freiman 𝑠-isomorphic via 𝜓 to its image in Z/𝑁Z. □
Exercise 7.7.4 (Modeling arbitrary sets of integers). Let 𝐴 ⊆ Z with | 𝐴| = 𝑛.
(a) Let 𝑝 be a prime. Show that there is some integer 𝑡 relatively prime to 𝑝 such that
∥𝑎𝑡/𝑝∥ R/Z ≤ 𝑝 −1/𝑛 for all 𝑎 ∈ 𝐴.
(b) Show that 𝐴 is Freiman 2-isomorphic to a subset of [𝑁] for some 𝑁 = (4 + 𝑜(1)) 𝑛 .
(c) Show that (b) cannot be improved to 𝑁 = 2𝑛−2 .
(You may use the fact that the smallest prime larger than 𝑚 has size 𝑚 + 𝑜(𝑚).)
Exercise 7.7.5 (Sumset with 3-AP-free set). Let 𝐴 and 𝐵 be 𝑛-element subsets of the
integers. Suppose 𝐴 is 3-AP free. Prove that | 𝐴 + 𝐵| ≥ 𝑛(log log 𝑛) 1/100 provided that 𝑛 is
sufficiently large.
Hint: Ruzsa triangle inequality, Plünnecke’s inequality, Ruzsa model lemma, Roth’s theorem
Exercise 7.7.6 (3-AP-free subsets of arbitrary sets of integers). Prove that there is√some
log 𝑛
constant 𝐶 > 0 so that every set of 𝑛 integers has a 3-AP-free subset of size ≥ 𝑛𝑒 −𝐶 .
The answer to the above question is no, as evidenced by the following example. (Niveau
is French for level.)
250 Structure of Set Addition
Example 7.8.2 (Niveau set). Let 𝐴 be the set of all points in F2𝑛 with Hamming weight
√
(number of 1 entries) at most (𝑛−𝑐 𝑛)/2. Note by the central limit theorem | 𝐴| = (𝛼+𝑜(1))2𝑛
for for some constant 𝛼 = 𝛼(𝑐) ∈ (0, 1). The sumset 𝐴 + 𝐴 consists of points in the boolean
√
cube whose Hamming weight is at most 𝑛 − 𝑐 𝑛 and thus does not contain any subspace of
√
codimension < 𝑐 𝑛, by Lemma 6.5.4.
It turns out that the iterated sumset 2𝐴 − 2𝐴 (same as 4𝐴 in F2𝑛 ) always contains a bounded
codimensional subspace. The intuition is that taking sumsets “smooths” out the structure of
a set, analogous to how convolutions in real analysis make functions more smooth.
𝑓∗𝑓
𝑓∗𝑓∗𝑓
𝑓∗𝑓∗𝑓∗𝑓
Recall some basic properties of the Fourier transform. Given 𝐴 ⊆ F𝑛𝑝 with | 𝐴| = 𝛼𝑝 𝑛 , we
have
1c𝐴 (0) = 𝛼,
and by Parseval’s identity
∑︁
| 1c𝐴 (𝑟)| 2 = E 𝑥 ∈F𝑛𝑝 |1 𝐴 (𝑥)| 2 = 𝛼.
𝑟 ∈F𝑛𝑝
Proof. Let
𝑓 = 1 𝐴 ∗ 1 𝐴 ∗ 1− 𝐴 ∗ 1− 𝐴,
which is supported on 2𝐴 − 2𝐴. By the convolution identity (Theorem 6.1.7), noting that
1d c
− 𝐴 (𝑟) = 1 𝐴 (𝑟), we have, for every 𝑟 ∈ F 𝑝 ,
𝑛
b
𝑓 (𝑟) = 1c𝐴 (𝑟) 2 1d 2 c 4
− 𝐴 (𝑟) = | 1 𝐴 (𝑟)| .
It suffices to find a subspace where 𝑓 is positive since 𝑓 (𝑥) > 0 implies 𝑥 ∈ 2𝐴 − 2𝐴. We
will take the subspace defined by large Fourier coefficients. Let
n o
𝑅 = 𝑟 ∈ F𝑛𝑝 \{0} : | 1c𝐴 (𝑟)| > 𝛼3/2 .
We can bound the size of 𝑅 using Parseval’s identity:
∑︁ ∑︁
|𝑅| 𝛼3 ≤ | 1c𝐴 (𝑟)| 2 < | 1c𝐴 (𝑟)| 2 = E 𝑥 |1 𝐴 (𝑥)| 2 = 𝛼.
𝑟 ∈𝑅 𝑟 ∈F𝑛𝑝
So
|𝑅| < 1/𝛼2 .
If 𝑟 ∉ 𝑅 ∪ {0}, then | 1c𝐴 (𝑟)| ≤ 𝛼3/2 . So, applying Parseval’s identity again,
∑︁ ∑︁
| 1c𝐴 (𝑟)| 4 ≤ max | 1c𝐴 (𝑟)| 2 | 1c𝐴 (𝑟)| 2
𝑟∉𝑅∪{0}
∑︁
𝑟∉𝑅∪{0} 𝑟∉𝑅∪{0}
<𝛼 3
| 1c𝐴 (𝑟)| = 𝛼3 E 𝑥 |1 𝐴 (𝑥)| 2 = 𝛼4 .
2
𝑟 ∈F𝑛𝑝
Bogolyubov’s lemma holds over Z/𝑁Z after replacing subspaces by Bohr sets. Note that
the dimension of a Bohr set of Z/𝑁Z corresponds to the codimension of a subspace in F𝑛𝑝 .
With the right setup, the proof is essentially identical to that of Theorem 7.8.3.
Given 𝑓 : Z/𝑁Z → C, we define its Fourier transform to be the function b 𝑓 : Z/𝑁Z → C
given by
b
𝑓 (𝑟) = E 𝑥 ∈Z/𝑁 Z 𝑓 (𝑥)𝜔 −𝑟 𝑥
where 𝜔 = exp(2𝜋𝑖/𝑁). Fourier inversion, Parseval’s identity, and the convolution identity
all work the same way.
Proof. Let
𝑓 = 1 𝐴 ∗ 1 𝐴 ∗ 1− 𝐴 ∗ 1− 𝐴,
which is supported on 2𝐴 − 2𝐴. By the convolution identity, for every 𝑟 ∈ Z/𝑁Z,
b
𝑓 (𝑟) = 1c𝐴 (𝑟)b
2
12− 𝐴 (𝑟) = | 1c𝐴 (𝑟)| 4 .
By Fourier inversion, we have (noting that 𝑓 is real-valued)
∑︁ ∑︁
𝑓 (𝑥) = b
𝑓 (𝑟)𝜔𝑟 𝑥 = | 1c𝐴 (𝑟)| 4 𝜔𝑟 𝑥 .
𝑟 ∈Z/𝑁 Z 𝑟 ∈Z/𝑁 Z
Let n o
𝑅 = 𝑟 ∈ Z/𝑁Z\{0} : | 1c𝐴 (𝑟)| > 𝛼3/2 .
So
|𝑅| < 1/𝛼2 .
We have
∑︁ ∑︁
| 1c𝐴 (𝑟)| 4 ≤ 𝛼3 | 1c𝐴 (𝑟)| 2 < 𝛼4 .
𝑟∉𝑅∪{0} 𝑟∉𝑅∪{0}
For all 𝑥 ∈ Bohr(𝑅, 1/4), every 𝑟 ∈ 𝑅 satisfies ∥𝑟𝑥/𝑁 ∥ R/Z ≤ 1/4, and so cos(2𝜋𝑟𝑥/𝑁) ≥ 0.
Thus every 𝑥 ∈ Bohr(𝑅, 1/4) satisfies
∑︁
𝑓 (𝑥) = | 1c𝐴 (𝑟)| 4 𝜔𝑟 · 𝑥
∑︁ ∑︁
𝑟 ∈Z/𝑁 Z
affine subspace (we do not always have a large subspace since the origin is not necessarily
even in 3𝐴).
A related phenomenon arises in Goldbach conjecture. Let 𝑃 denote the set of primes. The
still open Goldbach conjecture states that 𝑃 + 𝑃 contains all sufficiently large even integers.
On the other hand, Vinogradov (1937) showed that 𝑃 + 𝑃 + 𝑃 contains all sufficiently large
odd integers (also known as the weak or ternary Goldbach problem).
Our next goal is to find a large GAP in the Bohr set produced by Bogolyubov’s lemma. To
do this, we need some results from the geometry of numbers.
Exercise 7.8.7 (Bogolyubov with 3-fold sums). Let 𝐴 ⊆ F𝑛𝑝 with | 𝐴| = 𝛼𝑝 𝑛 . Prove that
𝐴 + 𝐴 + 𝐴 contains a translate of a subspace of codimension 𝑂 (𝛼 −3 ).
Given a lattice, there are many choices of a basis for the lattice. The determinant of a
lattice does not depend on the choice of a basis, and equals the volume of every fundamental
parallelepiped. Translations of the fundamental parallelepiped by lattice vectors tiles (i.e.,
partitions) the space.
An example of a lattice is illustrated below. Two different fundamental parallelepipeds are
shaded.
254 Structure of Set Addition
𝑣2
0 𝑣1
b2
𝜆2 𝐾
𝜆1 𝐾
b1 𝐾
0
In the next section, we will apply the following fundamental result from the geometry of
numbers (Minkowski 1896).
Proof. We have vol( 21 𝐾) = 2−𝑑 vol(𝐾) > det(Λ). By Blichfeldt’s theorem there exist distinct
𝑥, 𝑦 ∈ 21 𝐾 such that 𝑥 − 𝑦 ∈ Λ. The point 𝑥 − 𝑦 is the midpoint of 2𝑥 and −2𝑦, both of which lie
in 𝐾 (using that 𝐾 is centrally symmetric) and hence 𝑥 − 𝑦 lies in 𝐾 (since 𝐾 is convex). □
Note that Minkowski’s first theorem is tight for 𝐾 = [−1, 1] 𝑑 and Z𝑑 .
Proof of Minkowski’s second theorem (Theorem 7.9.4). The idea is to grow 𝐾 until we hit
a point of Λ, and then continue growing, but only in the complementary direction. However
rigorously carrying out this procedure is very tricky (and easy to get wrong).
In the argument below, 𝐾 is open (i.e., does not include the boundary). Fix a directional
basis b1 , . . . , b𝑑 . For each 1 ≤ 𝑗 ≤ 𝑑, define map 𝜙 𝑗 : 𝐾 → 𝐾 by sending each point 𝑥 ∈ 𝐾
to the center of mass of the ( 𝑗 − 1)-dimensional slice of 𝐾 which contains 𝑥 and is parallel
to spanR {b1 , . . . , b 𝑗 −1 }. In particular, 𝜙1 (𝑥) = 𝑥 for all 𝑥 ∈ 𝐾.
256 Structure of Set Addition
Define a function ψ : K → R^d by
  ψ(x) = ∑_{j=1}^{d} ((λ_j − λ_{j−1})/2) φ_j(x),
with the convention λ_0 = 0. Writing x = ∑_i x_i b_i, the slice through x parallel to span_R{b1, . . . , b_{j−1}} consists of the points whose b_i-coordinates agree with those of x for all i ≥ j, so
  φ_j(x) = ∑_{i ≥ j} x_i b_i + ∑_{i < j} c_{j,i}(x_j, . . . , x_d) b_i
for some continuous functions c_{j,i}. By examining the coefficient of each b_i, we find
  ψ(x) = ∑_{i=1}^{d} ( λ_i x_i/2 + ψ_i(x_{i+1}, . . . , x_d) ) b_i
for some continuous functions ψ_i(x_{i+1}, . . . , x_d), so its Jacobian matrix ∂ψ(x)/∂x with respect to the basis (b1, . . . , b_d) is upper triangular with diagonal (λ1/2, . . . , λ_d/2). Therefore
  vol ψ(K) = (λ1 · · · λ_d / 2^d) vol K.    (7.3)
For any distinct points x = ∑_i x_i b_i, y = ∑_i y_i b_i in K, let k be the largest index such that x_k ≠ y_k. Then φ_i(x) agrees with φ_i(y) for all i > k. So
  ψ(x) − ψ(y) = ∑_{j=1}^{d} (λ_j − λ_{j−1}) (φ_j(x) − φ_j(y))/2
             = ∑_{j=1}^{k} (λ_j − λ_{j−1}) (φ_j(x) − φ_j(y))/2 ∈ ∑_{j=1}^{k} (λ_j − λ_{j−1}) K = λ_k K.
The ∈ step is due to K being centrally symmetric and convex. The coefficient of b_k in ψ(x) − ψ(y) is λ_k(x_k − y_k)/2 ≠ 0, so ψ(x) − ψ(y) ∉ span_R{b1, b2, . . . , b_{k−1}}. But we just saw that ψ(x) − ψ(y) ∈ λ_k K. Recall that K is open, and λ_k K ∩ Λ is contained in span_R{b1, b2, . . . , b_{k−1}}. Thus ψ(x) − ψ(y) ∉ Λ.
So ψ(K) contains no two points separated by a nonzero lattice vector. By Blichfeldt's theorem (Theorem 7.9.6), we deduce vol ψ(K) ≤ det Λ. Combined with (7.3), this gives
  λ1 · · · λ_d vol K ≤ 2^d vol ψ(K) ≤ 2^d det Λ. □
Proof. By Plünnecke's theorem, we have |8A − 8A| ≤ K^{16}|A|. Let N be a prime with K^{16}|A| ≤ N ≤ 2K^{16}|A| (it exists by Bertrand's postulate). By the Ruzsa modeling lemma, some A′ ⊆ A with |A′| ≥ |A|/8 is Freiman 8-isomorphic to a subset B of Z/NZ.
Applying Bogolyubov's lemma on B ⊆ Z/NZ, with
  α = |B|/N = |A′|/N ≥ |A|/(8N) ≥ 1/(16K^{16}),
we deduce that 2B − 2B contains a Bohr set of dimension < 256K^{32} and width 1/4. By Theorem 7.10.1, 2B − 2B contains a proper GAP with dimension d < 256K^{32} and volume ≥ (4d)^{−d} N.
Since 𝐵 is Freiman 8-isomorphic to 𝐴′ , 2𝐵 − 2𝐵 is Freiman 2-isomorphic to 2𝐴′ − 2𝐴′
(why?). Note GAPs are preserved by Freiman 2-isomorphisms (why?). Hence, the proper
GAP in 2𝐵 − 2𝐵 is mapped to a proper GAP 𝑄 ⊆ 2𝐴′ − 2𝐴′ with the same dimension (≤ 𝑑)
and volume (≥ (4d)^{−d} N). We have
  |A| ≤ 8|A′| ≤ 8N ≤ 8(4d)^d |Q|.
Since Q ⊆ 2A′ − 2A′ ⊆ 2A − 2A, we have Q + A ⊆ 3A − 2A. By Plünnecke's inequality,
  |Q + A| ≤ |3A − 2A| ≤ K^5 |A| ≤ 8K^5 (4d)^d |Q|.
By the Ruzsa covering lemma, there exists a subset X of A with |X| ≤ 8K^5(4d)^d such that A ⊆ X + Q − Q. It remains to contain X + Q − Q in a GAP.
By using two elements in each direction, X is contained in a GAP of dimension |X| − 1 and volume ≤ 2^{|X|−1}. Since Q is a proper GAP with dimension d < 256K^{32} and volume ≤ |2A − 2A| ≤ K^4|A|, Q − Q is a GAP with dimension d and volume ≤ 2^d K^4|A|. It follows that A ⊆ X + Q − Q is contained in a GAP with
  dimension ≤ |X| − 1 + d ≤ 8(4d)^d K^5 + d − 1 = e^{K^{O(1)}}
and volume ≤ 2^{|X|−1} · 2^d K^4 |A|. □
The best current result says that in Conjecture 7.12.1 one can cover 𝐴 by exp((log 𝐾) 𝑂 (1) )
cosets of 𝑉 (Sanders 2012). This is called a quasipolynomial bound.
This conjecture has several equivalent forms. Here we give some highlights. For more
details, including proofs of equivalence, see the online note accompanying Green (2005c)
titled Notes on the Polynomial Freiman–Ruzsa Conjecture.
For example, here is a formulation where we just need to use one subspace to cover a large
fraction of 𝐴.
Conjecture 7.12.2 (Polynomial Freiman–Ruzsa in F_2^n)
If A ⊆ F_2^n and |A + A| ≤ K|A|, then there exists an affine subspace V ⊆ F_2^n with |V| ≤ |A| such that |V ∩ A| ≥ K^{−O(1)}|A|.
Proof of equivalence of Conjecture 7.12.1 and Conjecture 7.12.2. Conjecture 7.12.1 im-
plies Conjecture 7.12.2 since by the pigeonhole principle, at least one of the cosets of 𝑉
covers ≥ 𝐾 −𝑂 (1) fraction of 𝐴.
Now assume Conjecture 7.12.2. Let 𝐴 ⊆ F2𝑛 with | 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|. Let 𝑉 be as in
Conjecture 7.12.2. By the Ruzsa covering lemma (Theorem 7.4.1) with 𝑋 = 𝐴 and 𝐵 = 𝑉 ∩ 𝐴
we find 𝑇 ⊆ 𝑋 with |𝑇 | ≤ |𝑋 + 𝐵| /|𝑋 | ≤ | 𝐴 + 𝐴| /| 𝐴| ≤ 𝐾 such that 𝐴 ⊆ 𝑇 + 𝐵 − 𝐵 ⊆ 𝑇 +𝑉.
The conclusion of Conjecture 7.12.1 holds. □
Here is another attractive equivalent formulation of the polynomial Freiman–Ruzsa con-
jecture in F2𝑛 .
The 𝑈 3 norm plays a central role in Gowers’ proof of Szemerédi’s theorem for 4-APs (the
𝑈 3 norm is also discussed in Exercise 6.2.14).
If f : F_2^n → {−1, 1} is given by f(x) = (−1)^{q(x)}, where q is a quadratic polynomial in n variables over F_2 (e.g., x1 + x1x2 + · · · ), then it is not hard to check that the expression inside the expectation above is identically 1 (it comes from taking three finite differences of q). So ‖f‖_{U³} = 1. For proving Szemerédi's theorem for 4-APs, one would like a "1% inverse result" showing that any f : F_2^n → [−1, 1] satisfying ‖f‖_{U³} ≥ δ must correlate with some quadratic polynomial phase function (−1)^{q(x)}. Such a result is known, but it remains open whether the bounds can be made polynomial; this is one of the equivalent formulations of the polynomial Freiman–Ruzsa conjecture.
Remark 7.12.5 (Quantitative equivalence). It is known that the bounds in each of the
above conjectures are equivalent to each other up to a polynomial change. This means that
if one statement is true with conclusion ≤ 𝑓 (𝐾) then all the other statements are true with
conclusion ≤ 𝐶 𝑓 (𝐾) 𝐶 (appropriately interpreted) with some absolute constant 𝐶.
More generally, one can formulate the polynomial Freiman–Ruzsa conjecture in an arbi-
trary abelian group.
For both Conjecture 7.12.7 and Conjecture 7.12.9, the best current result uses exp((log 𝐾) 𝑂 (1) )
translates and dimension bound (log 𝐾) 𝑂 (1) (Sanders 2012, 2013).
Remark 7.13.2. The additive energy of 𝐴 counts 4-cycles in the bipartite Cayley graph with
generating set 𝐴. It is called an “energy” since we can write it as an 𝐿 2 quantity
  E(A) = ∑_x r_A(x)²,
where
  r_A(x) := |{(a, b) ∈ A × A : a + b = x}|
is the number of ways to write 𝑥 as the sum of two elements of 𝐴.
We have the easy bounds
  2|A|² − |A| ≤ E(A) ≤ |A|³.
The lower bound is due to trivial solutions 𝑎 + 𝑏 = 𝑎 + 𝑏 and 𝑎 + 𝑏 = 𝑏 + 𝑎. The lower bound
is tight for sets without non-trivial solutions to 𝑎 + 𝑏 = 𝑐 + 𝑑. The upper bound is due to 𝑑
being determined by 𝑎, 𝑏, 𝑐 when 𝑎 + 𝑏 = 𝑐 + 𝑑. It is tight when 𝐴 is a subgroup.
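As a small illustration (not in the original text), the following Python sketch computes E(A) from the counts r_A(x) and checks the easy bounds above on two example sets; the specific sets (powers of 2, which form a Sidon set, and an arithmetic progression) are our own choices.

```python
from collections import Counter

def additive_energy(A):
    r = Counter(a + b for a in A for b in A)   # r_A(x) = #{(a,b) in A x A : a + b = x}
    return sum(c * c for c in r.values())      # E(A) = sum_x r_A(x)^2

sidon = [2 ** i for i in range(8)]             # powers of 2: a Sidon set, so the lower bound is tight
ap = list(range(1, 9))                         # arithmetic progression: much larger energy
for A in (sidon, ap):
    n, E = len(A), additive_energy(A)
    assert 2 * n * n - n <= E <= n ** 3
    print(A, E)
```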
Here is the main question we explore in this section.
Question 7.13.3
What is the relationship between small doubling and large additive energy? (Both encode
some notion of “lots of additive structure.”)
We will prove a version of the theorem allowing two different sets. Given two finite sets
𝐴 and 𝐵 in an abelian group, define their additive energy to be
𝑬 (𝑨, 𝑩) := |{(𝑎, 𝑏, 𝑎 ′ , 𝑏 ′ ) ∈ 𝐴 × 𝐵 × 𝐴 × 𝐵 : 𝑎 + 𝑏 = 𝑎 ′ + 𝑏 ′ }| .
Then 𝐸 ( 𝐴, 𝐴) = 𝐸 ( 𝐴).
Proof that Theorem 7.13.7 implies Theorem 7.13.6. Suppose 𝐸 ( 𝐴) ≥ | 𝐴| 3 /𝐾. Apply The-
orem 7.13.7 with 𝐵 = 𝐴 to obtain 𝐴′ , 𝐵′ ⊆ 𝐴 with | 𝐴′ | , |𝐵′ | ≥ 𝐾 −𝑂 (1) | 𝐴| and | 𝐴′ + 𝐵′ | ≤
𝐾 𝑂 (1) | 𝐴|. Then by Corollary 7.3.6, a variant of the Ruzsa triangle inequality, we have
  |A′ + A′| ≤ |A′ + B′|²/|B′| ≤ K^{O(1)} |A|. □
We will prove Theorem 7.13.7 by setting up a graph.
Proof that Theorem 7.13.9 implies Theorem 7.13.7. Denote the number of ways to write 𝑥
as 𝑎 + 𝑏 by
𝒓 𝑨,𝑩 (𝒙) := |{(𝑎, 𝑏) ∈ 𝐴 × 𝐵 : 𝑎 + 𝑏 = 𝑥}| .
Consider the “popular sums”
  S = {x ∈ A + B : r_{A,B}(x) ≥ n/(2K)}.
Build a bipartite graph 𝐺 with bipartition 𝐴 ∪ 𝐵 such that (𝑎, 𝑏) ∈ 𝐴 × 𝐵 is an edge if and
only if 𝑎 + 𝑏 ∈ 𝑆.
We claim that 𝐺 has many edges, by showing that “unpopular sums” account for at most
half of 𝐸 ( 𝐴, 𝐵). Note that
  n³/K ≤ E(A, B) = ∑_{x∈S} r_{A,B}(x)² + ∑_{x∉S} r_{A,B}(x)².   (7.4)
Because r_{A,B}(x) < n/(2K) when x ∉ S, we can bound the second term as
  ∑_{x∉S} r_{A,B}(x)² ≤ (n/(2K)) ∑_{x∉S} r_{A,B}(x) ≤ (n/(2K)) |A||B| ≤ n³/(2K),
and substituting back into (7.4) yields
  n³/K ≤ ∑_{x∈S} r_{A,B}(x)² + n³/(2K),
and so
  ∑_{x∈S} r_{A,B}(x)² ≥ n³/(2K).
The proof uses the dependent random choice technique from Section 1.7. Instead of
quoting theorems from that section, let us prove the result from scratch.
[Figure: dependent random choice in the bipartite graph between 𝐴 and 𝐵: a random vertex 𝑣 ∈ 𝐵, its neighborhood 𝑈 = 𝑁(𝑣) ⊆ 𝐴, and a pair 𝑥, 𝑦 ∈ 𝑈.]
Proof. Say that a pair (x, y) ∈ A² is “unfriendly” if it has < εδ²|B|/2 common neighbors. Choose v ∈ B uniformly at random and let U = N(v) ⊆ A be its neighborhood. We have
  E|U| = E|N(v)| = e(G)/|B| ≥ δ|A|.
For each fixed pair (x, y) ∈ A², we have
  P(x, y ∈ U) = P(x, y ∈ N(v)) = codeg(x, y)/|B|.
So if (x, y) is unfriendly, then P(x, y ∈ U) < εδ²/2. Let X be the number of unfriendly pairs (x, y) ∈ U². Then
  E X = ∑_{(x,y)∈A² unfriendly} P(x, y ∈ U) < (εδ²/2) |A|².
Hence, we have
  E[|U|² − X/ε] ≥ (E|U|)² − (E X)/ε > (δ²/2) |A|².
So for some v ∈ B, the set U = N(v) satisfies
  |U|² − X/ε ≥ (δ²/2) |A|².
Then this U ⊆ A satisfies |U|² ≥ δ²|A|²/2, and so |U| ≥ δ|A|/2. Moreover, we have X ≤ ε|U|², so at most an ε-fraction of the pairs (x, y) ∈ U² are unfriendly. □
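To make the averaging step concrete, here is a small Python simulation (not from the book) that carries out dependent random choice on a random bipartite graph: it scans over v ∈ B, takes U = N(v), and checks that the best choice satisfies the two conclusions of the proof. The parameters (60 + 60 vertices, edge probability 0.3, ε = 0.1) are arbitrary.

```python
import itertools
import random

random.seed(0)
nA, nB, p, eps = 60, 60, 0.3, 0.1
A, B = range(nA), range(nB)
adj = {a: {b for b in B if random.random() < p} for a in A}     # random bipartite graph

delta = sum(len(adj[a]) for a in A) / (nA * nB)                 # edge density e(G)/(|A||B|)
thresh = eps * delta ** 2 * nB / 2                              # "unfriendly" pairs have < thresh common nbrs

def codeg(x, y):
    return len(adj[x] & adj[y])

best = None
for v in B:
    U = [a for a in A if v in adj[a]]
    X = sum(1 for x, y in itertools.product(U, U) if codeg(x, y) < thresh)
    score = len(U) ** 2 - X / eps
    if best is None or score > best[0]:
        best = (score, U, X)

_, U, X = best
assert len(U) >= delta * nA / 2        # |U| >= delta*|A|/2
assert X <= eps * len(U) ** 2          # at most an eps-fraction of pairs in U^2 are unfriendly
print(len(U), X)
```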
[Figure: the bipartite graph between 𝐴 and 𝐵 in the proof, with the subsets 𝐴′ ⊆ 𝐴 and 𝐵′ ⊆ 𝐵 and a path of length 3 from 𝑎 ∈ 𝐴′ to 𝑏 ∈ 𝐵′ through intermediate vertices.]
Proof of Theorem 7.13.9 (Graph BSG). Since 𝑒(𝐺) ≥ 𝑛2 /𝐾, we have | 𝐴| , |𝐵| ≥ 𝑛/𝐾. By
the path of length 3 lemma (Lemma 7.13.11), we can find 𝐴′ ⊆ 𝐴 and 𝐵′ ⊆ 𝐵 each with
size ≥ 𝐾 −𝑂 (1) 𝑛 such that for every (𝑎, 𝑏) ∈ 𝐴′ × 𝐵′ , there are ≥ 𝐾 −𝑂 (1) 𝑛2 paths 𝑎𝑏 1 𝑎 1 𝑏 in
𝐺 with 𝑎 1 ∈ 𝐴 and 𝑏 1 ∈ 𝐵. Then, with
𝑥 = 𝑎 + 𝑏1, 𝑦 = 𝑎1 + 𝑏1, 𝑧 = 𝑎 1 + 𝑏,
we have
𝑎 + 𝑏 = 𝑥 − 𝑦 + 𝑧.
This shows that every element of A′ + B′ can be written as x − y + z with x, y, z ∈ A +_G B in at least K^{−O(1)} n² ways (for a given (a, b) ∈ A′ × B′, these choices of x, y, z are genuinely distinct; why?). Thus
  K^{−O(1)} n² |A′ + B′| ≤ |A +_G B|³ ≤ K³ n³.
Therefore | 𝐴′ + 𝐵′ | ≤ 𝐾 𝑂 (1) 𝑛. □
Further Reading
See Ruzsa’s lecture notes Sumsets and Structure (2009) for a comprehensive introduction to
many topics related to set addition, including but not limited to Freiman’s theorem.
Sanders’ article The Structure of Set Addition Revisited (2013) provides a modern expo-
sition of Freiman’s theorem and his proof of the quasipolynomial Freiman–Ruzsa theorem.
Lovett’s article An Exposition of Sanders’ Quasi-Polynomial Freiman–Ruzsa Theorem (2015)
gives a gentle exposition of Sanders’ proof in F2𝑛 .
The methods discussed in this chapter play a central role in Gowers' proof of Szemerédi's theorem. The proof for 4-APs is especially worth studying; it contains many beautiful ideas and shows how the topics in this chapter and the previous chapter are closely linked.
See the original paper by Gowers (1998a) on Szemerédi’s theorem for 4-APs as well as
excellent lecture notes by Gowers (1998b), Green (2009b), and Soundararajan (2007).
Chapter Summary
• Freiman’s theorem. Every A ⊆ Z with |A + A| ≤ K|A| is contained in a generalized arithmetic progression (GAP) of dimension ≤ d(K) and volume ≤ f(K)|A|.
– Informally: a set with small doubling is contained in a small GAP.
– Up to constants, this gives a complete characterization of integer sets with bounded
doubling.
• Ruzsa triangle inequality. |A||B − C| ≤ |A − B||A − C|.
• Plünnecke’s inequality. |A + A| ≤ K|A| implies |mA − nA| ≤ K^{m+n}|A|.
• Ruzsa covering lemma. Idea: take a maximal collection of disjoint translates; their expansions must cover the entire space.
• Freiman’s theorem in groups with bounded exponent. A set with bounded doubling is
contained in a small subgroup.
• Freiman 𝑠-homomorphisms are maps preserving 𝑠-fold sums.
• Ruzsa modeling lemma. A set of integers with small doubling can be partially modeled
as a large fraction of a cyclic group via a Freiman isomorphism.
• Bogolyubov’s lemma. If 𝐴 is large, then 2𝐴 − 2𝐴 contains a large subspace (finite field
model) or GAP (cyclic group).
• A large Bohr set contains a large GAP. Proof uses Minkowski’s second theorem from
the geometry of numbers.
• Polynomial Freiman–Ruzsa conjecture: a central conjecture in additive combinatorics.
The finite field model version has several equivalent and attractive statements, one of
which says: if 𝐴 ⊆ F2𝑛 , and | 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|, then 𝐴 can be covered using 𝐾 𝑂 (1) translates
of some subspace with cardinality ≤ | 𝐴|.
• The additive energy 𝐸 ( 𝐴) of a set 𝐴 is the number of solutions to 𝑎 + 𝑏 = 𝑐 + 𝑑 in 𝐴.
• Balog–Szemerédi–Gowers theorem. If 𝐸 ( 𝐴) ≥ | 𝐴| 3 /𝐾, then 𝐴 has a subset 𝐴′ with
| 𝐴′ | ≥ 𝐾 −𝑂 (1) | 𝐴| and | 𝐴′ + 𝐴′ | ≤ 𝐾 𝑂 (1) | 𝐴′ |.
– Informally: a set with large additive energy contains a large subset with small doubling.
8
Sum-Product Problem
Chapter Highlights
• The sum-product problem: show either 𝐴 + 𝐴 or 𝐴 · 𝐴 must be large
• Erdős multiplication table problem
• Crossing number inequality: lower bound on the number of crossings in a graph drawing
• Szemerédi–Trotter theorem on point-line incidences
• Elekes’ sum-product bound using incidence geometry
• Solymosi’s sum-product bound via multiplicative energy
Arithmetic progressions have small additive doubling, while geometric progressions have
small multiplicative doubling. However, perhaps a set cannot simultaneously look both like
an arithmetic and a geometric progression.
Erdős & Szemerédi (1983) conjectured that at least one of 𝐴 + 𝐴 and 𝐴𝐴 is close to
quadratic size.
This is asking for the number of distinct entries that appear in the 𝑁 × 𝑁 multiplication
table.
   1   2   3   4   5   6   7   8   9  10  · · ·
   2   4   6   8  10  12  14  16  18  20  · · ·
   3   6   9  12  15  18  21  24  27  30  · · ·
   4   8  12  16  20  24  28  32  36  40  · · ·
   5  10  15  20  25  30  35  40  45  50  · · ·
   6  12  18  24  30  36  42  48  54  60  · · ·
   7  14  21  28  35  42  49  56  63  70  · · ·
   8  16  24  32  40  48  56  64  72  80  · · ·
   9  18  27  36  45  54  63  72  81  90  · · ·
  10  20  30  40  50  60  70  80  90 100  · · ·
   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋱
After much work, we now have a satisfactory answer. A precise estimate was given by
Ford (2008):
  |[N] · [N]| = Θ( N² / ((log N)^δ (log log N)^{3/2}) ),
where δ = 1 − (1 + log log 2)/log 2 ≈ 0.086. Here we give a short proof of some weaker estimates (Erdős 1955):
  (1 − o(1)) N²/(2 log N) ≤ |[N] · [N]| = o(N²).
This already shows that it is false that at least one of A + A and AA always has size ≥ c|A|². So we cannot remove the −o(1) term from the exponent in the sum-product conjecture.
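As a quick numerical illustration (not in the original text), the following Python sketch counts the distinct entries of the N × N multiplication table and compares them with N² and with the order of magnitude in Ford's theorem; the comparison with Ford's bound is only up to an unspecified constant factor.

```python
import math

delta = 1 - (1 + math.log(math.log(2))) / math.log(2)    # ~ 0.086
for N in (100, 500, 2000):
    products = {i * j for i in range(1, N + 1) for j in range(1, N + 1)}
    ford = N ** 2 / ((math.log(N)) ** delta * (math.log(math.log(N))) ** 1.5)
    print(N, len(products), round(len(products) / N ** 2, 3), round(len(products) / ford, 3))
```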
To prove Theorem 8.1.2, we apply the following fact from number theory due to Hardy &
Ramanujan (1917). A short probabilistic method proof was given by Turán (1934); also see
Alon & Spencer (2016, Section 4.2).
Proof of Theorem 8.1.2. First let us prove the upper bound. By the Hardy–Ramanujan theorem, for all but o(N²) of the pairs (i, j) ∈ [N]², the product ij has (2 + o(1)) log log N prime factors. On the other hand, by the Hardy–Ramanujan theorem again, all but o(N²) of the positive integers ≤ N² have (1 + o(1)) log log(N²) = (1 + o(1)) log log N prime factors, and so only o(N²) integers up to N² have enough prime factors to arise as one of these typical products. Hence |[N] · [N]| = o(N²). (Remark: this proof gives |[N] · [N]| = O(N²/log log N).)
Now let us prove the lower bound by giving a lower bound to the number of positive
integers ≤ 𝑁 2 of the form 𝑝𝑚, where 𝑝 is a prime in (𝑁 2/3 , 𝑁] and 𝑚 ≤ 𝑁. Every such 𝑛
has at most 2 such representations as 𝑝𝑚 since 𝑛 ≤ 𝑁 2 can have at most two prime factors
greater than 𝑁 2/3 . There are (1 + 𝑜(1))𝑁/log 𝑁 primes in (𝑁 2/3 , 𝑁] by the prime number
theorem. So the number of distinct such 𝑝𝑚 is ≥ (1/2 − 𝑜(1))𝑁 2 /log 𝑁. □
Remark 8.1.4. The lower bound (up to a constant factor) also follows from Solymosi’s
sum-product estimate that we will see later in Theorem 8.3.1.
Proof. For any connected planar graph 𝐺 = (𝑉, 𝐸) with at least one cycle, we have 3 |𝐹 | ≤
2 |𝐸 |, with |𝐹 | denoting the number of faces (including the outer face). The inequality follows
from double counting using that every face is adjacent to at least three edges and that every
edge is adjacent to at most two faces. By Euler’s formula, |𝑉 | − |𝐸 | + |𝐹 | = 2. Replacing |𝐹 |
using 3 |𝐹 | ≤ 2 |𝐸 |, we obtain |𝐸 | ≤ 3 |𝑉 | − 6. Therefore |𝐸 | ≤ 3 |𝑉 | holds for every planar
graph 𝐺 including ones that are not connected or do not have a cycle.
If an arbitrary graph G = (V, E) satisfies |E| > 3|V|, then any drawing of G can be made planar by deleting at most cr(G) edges, one for each crossing. It follows that |E| − cr(G) ≤ 3|V|. Therefore, the following inequality holds universally for all graphs G = (V, E):
  cr(G) ≥ |E| − 3|V|.   (8.1)
Now we apply a probabilistic method technique to “boost” the above inequality to denser
graphs. Let 𝐺 = (𝑉, 𝐸) be a graph with |𝐸 | ≥ 4 |𝑉 |. Let 𝑝 ∈ [0, 1] be some real number to
be determined and let 𝐺 ′ = (𝑉 ′ , 𝐸 ′ ) be a graph obtained by independently randomly keeping
each vertex of 𝐺 with probability 𝑝. By (8.1), we have cr(𝐺 ′ ) ≥ |𝐸 ′ | − 3 |𝑉 ′ | for every 𝐺 ′ .
Therefore the same inequality must hold if we take the expected values of both sides:
E cr(𝐺 ′ ) ≥ E |𝐸 ′ | − 3E |𝑉 ′ | .
We have E |𝐸 ′ | = 𝑝 2 |𝐸 | since an edge remains in 𝐺 ′ if and only if both of its endpoints
are kept. Similarly E |𝑉 ′ | = 𝑝 |𝑉 |. By keeping the same drawing, we get the inequality
𝑝 4 cr(𝐺) ≥ E cr(𝐺 ′ ). Therefore
  cr(G) ≥ p^{−2}|E| − 3p^{−3}|V|.
Finally set 𝑝 = 4 |𝑉 | /|𝐸 | ∈ [0, 1] (here we use |𝐸 | ≥ 4 |𝑉 |) to get cr(𝐺) ≳ |𝐸 | 3 /|𝑉 | 2 . □
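As a quick arithmetic check of the final substitution (not part of the original proof), plugging p = 4|V|/|E| into the previous inequality gives
  cr(G) ≥ |E|³/(16|V|²) − 3|E|³/(64|V|²) = |E|³/(64|V|²),
so under the hypothesis |E| ≥ 4|V| the implicit constant in cr(G) ≳ |E|³/|V|² may be taken to be 1/64.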
One trivial upper bound is |P||L|. We can get a better bound by using the fact that every two distinct points lie on at most one common line (equivalently, two distinct lines meet in at most one point).
Corollary 8.2.6
The number of point-line incidences between 𝑛 points and 𝑛 lines in R2 is 𝑂 (𝑛4/3 ).
We will see a short proof using the crossing number inequality due to Székely (1997).
Since the inequality is false over finite fields, any proof necessarily requires the topology of
the real plane (via the application of Euler’s theorem in the proof of the crossing number
inequality).
Example 8.2.7. The bounds in both Theorem 8.2.5 and Corollary 8.2.6 are best possible
up to a constant factor. Here is an example showing that Corollary 8.2.6 is tight. Let P = [k] × [2k²] and L = {y = mx + b : m ∈ [k], b ∈ [k²]}, so that |P| = 2k³ and |L| = k³ are both Θ(n). Then every line in L contains k points from P, so I(P, L) = k⁴ = Θ(n^{4/3}).
[Figure: the construction in Example 8.2.7 for k = 3; each of the three panels shows the lines y = mx + b with b ∈ [9] for one slope m, drawn through the grid of points starting at (1, 1).]
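The incidence count in Example 8.2.7 is easy to verify by brute force; here is a short Python sketch (not in the original text) doing so for the illustrative choice k = 6.

```python
k = 6
P = {(x, y) for x in range(1, k + 1) for y in range(1, 2 * k * k + 1)}      # |P| = 2k^3
L = [(m, b) for m in range(1, k + 1) for b in range(1, k * k + 1)]          # |L| = k^3

I = sum(1 for (m, b) in L for x in range(1, k + 1) if (x, m * x + b) in P)
assert I == k ** 4                       # every line y = m x + b contains exactly k points of P
n = len(P)
print(I, n, round(I / n ** (4 / 3), 3))  # I = Theta(n^{4/3})
```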
Proof of Theorem 8.2.5. We remove all lines in L containing at most one point of P. These lines contribute at most |L| incidences and thus do not affect the inequality we wish to prove.
Now assume that every line in L contains at least two points of P. Turn every point of P
into a vertex and each line in L into edges connecting consecutive points of P on the line.
This constructs a drawing of a graph 𝐺 = (𝑉, 𝐸) on the plane.
[Figure: turning the points P and lines L into a drawing of the graph 𝐺, with consecutive points on each line joined by edges.]
Assume that 𝐼 (P, L) ≥ 8 |P | holds (otherwise we are done as 𝐼 (P, L) ≲ |P |). Each line
in L with 𝑘 incidences has 𝑘 − 1 ≥ 𝑘/2 edges. So |𝐸 | ≥ 𝐼 (P, L)/2 ≥ 4 |𝑉 |. The crossing
number inequality (Theorem 8.2.3) gives
  cr(G) ≳ |E|³/|V|² ≳ I(P, L)³/|P|².
Moreover cr(G) ≤ |L|² since every pair of lines intersects in at most one point. Rearranging gives I(P, L) ≲ |P|^{2/3}|L|^{2/3}. (Remember the linear contributions |P| + |L| that need to be added back in due to the assumptions made earlier in the proof.) □
Now we are ready to prove the sum-product estimate in Theorem 8.2.1 for A ⊆ R:
  |A + A||AA| ≳ |A|^{5/2}.
Proof of Theorem 8.2.1. In R2 , consider a set of points
P = {(𝑥, 𝑦) : 𝑥 ∈ 𝐴 + 𝐴, 𝑦 ∈ 𝐴𝐴}
and a set of lines
L = {𝑦 = 𝑎(𝑥 − 𝑎 ′ ) : 𝑎, 𝑎 ′ ∈ 𝐴}.
For a line 𝑦 = 𝑎(𝑥 − 𝑎 ′ ) in L, (𝑎 ′ + 𝑏, 𝑎𝑏) ∈ P is on the line for all 𝑏 ∈ 𝐴, so each line in L
contains ≥ | 𝐴| incidences. By definition of P and L, we have
|P | = | 𝐴 + 𝐴| | 𝐴𝐴| and |L| = | 𝐴| 2 .
By the Szemerédi–Trotter theorem (Theorem 8.2.5),
  |A|³ = |A||L| ≤ I(P, L) ≲ |P|^{2/3}|L|^{2/3} + |P| + |L| ≲ |A + A|^{2/3}|AA|^{2/3}|A|^{4/3}.
The contributions from |P| + |L| are of lower order since |P| = |A + A||AA| ≤ |A|⁴ = |L|² and |L| = |A|² ≤ |A + A|²|AA|² = |P|². Rearranging the above inequality gives
  |A + A||AA| ≳ |A|^{5/2}. □
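As a numerical illustration (not from the original text), the following Python sketch compares |A + A| · |AA| with |A|^{5/2} for an arithmetic and a geometric progression of length 100; both comfortably exceed |A|^{5/2}, in line with the theorem.

```python
def sum_and_product_sets(A):
    return {a + b for a in A for b in A}, {a * b for a in A for b in A}

for A in (list(range(1, 101)),              # arithmetic progression: small A + A, large AA
          [2 ** i for i in range(100)]):    # geometric progression: large A + A, small AA
    S, P = sum_and_product_sets(A)
    print(len(A), len(S), len(P), len(S) * len(P), round(len(A) ** 2.5))
```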
In Section 1.4, we proved an 𝑂 (𝑛3/2 ) upper bound on the unit distance problem (Ques-
tion 1.4.6) using the extremal number of 𝐾2,3 . The next exercise gives an improved bound
(in fact the best known result to date).
Exercise 8.2.8 (Unit distance bound). Using the crossing number inequality, prove that among any n points in the plane, at most O(n^{4/3}) pairs of points are at distance exactly 1.
Write
𝑟 (𝑠) = |{(𝑎, 𝑏) ∈ 𝐴 × 𝐴 : 𝑠 = 𝑎/𝑏}| .
We have
  E_×(A) = ∑_{s ∈ A/A} r(s)².
[Figure: Solymosi's argument; the points of 𝐴 × 𝐴 are grouped by the lines ℓ1, . . . , ℓ_{𝑚+1} through the origin on which they lie, and the sumset 𝐿1 + 𝐿2 of the points on two consecutive lines lies strictly between those lines.]
The statement is false over non-prime fields, since we could take A to be a subfield. Informally, the above theorem says that a prime field does not have any approximate subrings.
Further Reading
Dvir’s survey Incidence Theorems and Their Applications (2012) discusses many interesting
related topics including incidence geometry and additive combinatorics together with their
applications to computer science.
Guth’s book The Polynomial Method in Combinatorics (2016) gives an in-depth discussion
of incidence geometry in R2 and R3 leading to a proof of the solution of the Erdős distinct
distances problem by Guth & Katz (2015).
Sheffer’s book Polynomial Methods and Incidence Theory (2022) provides an introduction
to incidence geometry and related topics.
Chapter Summary

9
Progressions in Sparse Pseudorandom Sets

Chapter Highlights
• The Green–Tao theorem: proof strategy
• A relative Szemerédi theorem and its proof: a central ingredient in the proof of the
Green–Tao theorem
• Transference principle: applying Szemerédi’s theorem as a black box to the sparse pseu-
dorandom setting
• A graph theoretic approach
• Dense model theorem: modeling a sparse set by a dense set
• Sparse triangle counting lemma
In this chapter we discuss a celebrated theorem by Green & Tao (2008) that settled a
folklore conjecture about primes.
The proof of this stunning result uses sophisticated ideas from both combinatorics and
number theory. As stated in the abstract of their paper:
[T]he main new ingredient of this paper . . . is a certain transference principle. This allows us to deduce from
Szemerédi’s theorem that any subset of a sufficiently pseudorandom set (or measure) of positive relative
density contains progressions of arbitrary length.
The main goal of this chapter is to explain what the above paragraph means. As Green
(2007b) writes (emphasis in original):
Our main advance, then, lies not in our understanding of the primes but rather in what we can say about
arithmetic progressions.
We will abstract away ingredients related to prime numbers (see Further Reading at the end
of the chapter) and instead focus on the central combinatorial result: a relative Szemerédi
theorem. We follow the graph theoretic approach by Conlon, Fox, & Zhao (2014, 2015),
which simplified both the hypotheses and the proof of the relative Szemerédi theorem.
In other words, every subset of primes with positive relative density contains arbitrarily
long arithmetic progressions.
Remark 9.1.4 (Residue biases in the primes and the 𝑊 -trick). There are certain local
biases that get in the way of pseudorandomness for primes. For example, all primes greater
than 2 are odd, all primes greater than 3 are not divisible by 3, and so on. In this way,
the primes look different from a subset of positive integers where each 𝑛 is included with
probability 1/log 𝑛 independently at random.
The 𝑾-trick corrects these residue class biases. Let w = w(N) be a function with w → ∞ slowly as N → ∞. Let W = ∏_{p ≤ w} p be the product of the primes up to w. The W-trick tells
us to only consider primes that are congruent to 1 mod 𝑊. The resulting set of “𝑊-tricked
primes” {𝑛 : 𝑛𝑊 + 1 is prime} does not have any bias modulo a small fixed prime. The
relative Szemerédi theorem should be applied to the 𝑊-tricked primes.
We shall not dwell on the analytic number theoretic arguments here. See Further Reading
at the end of the chapter for references. For example, Conlon, Fox, & Zhao (2014, Sections
8 and 9) gives an exposition of the construction of the “almost primes” and the proofs of its
properties.
The goal of the rest of the chapter is to state and prove the relative Szemerédi theorem.
We would like to formulate a result of the following form, where Z/𝑁Z is replaced by a
sparse pseudorandom host set 𝑆.
Relative Roth theorem (informal). If 𝑆 ⊆ Z/𝑁Z satisfies certain pseudorandomness
conditions, then every 3-AP-free subset of 𝑆 has size 𝑜(|𝑆|).
In what sense should 𝑆 behave pseudorandomly? It will be easiest to explain the pseudo-
random hypothesis using a graph.
Consider the following construction of a graph G_S that we saw earlier in the book (in particular in Sections 2.4 and 2.10).
[Figure: the tripartite graph G_S on vertex sets X, Y, Z, each a copy of Z/NZ, with x ∼ y iff 2x + y ∈ S, x ∼ z iff x − z ∈ S, and y ∼ z iff −y − 2z ∈ S.]
Here 𝐺 𝑆 is a tripartite graph with vertex sets 𝑋, 𝑌 , 𝑍, each being a copy of Z/𝑁Z. Its edges
are:
• (𝑥, 𝑦) ∈ 𝑋 × 𝑌 whenever 2𝑥 + 𝑦 ∈ 𝑆;
• (𝑥, 𝑧) ∈ 𝑋 × 𝑍 whenever 𝑥 − 𝑧 ∈ 𝑆;
• (𝑦, 𝑧) ∈ 𝑌 × 𝑍 whenever −𝑦 − 2𝑧 ∈ 𝑆.
This graph 𝐺 𝑆 is designed so that (𝑥, 𝑦, 𝑧) ∈ 𝑋 × 𝑌 × 𝑍 is a triangle if and only if
2𝑥 + 𝑦, 𝑥 − 𝑧, −𝑦 − 2𝑧 ∈ 𝑆.
Note that these three terms form a 3-AP with common difference −𝑥 − 𝑦 − 𝑧. So the triangles
in 𝐺 𝑆 precisely correspond to 3-APs in 𝑆 (it is an 𝑁-to-1 correspondence).
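Here is a small Python sketch (not in the original text) of this correspondence: it counts the triangles of G_S by brute force for a toy modulus and checks that their number is exactly N times the number of (ordered) 3-APs in S. The choices of N and S below are arbitrary.

```python
import itertools

N = 13                                  # a small odd modulus
S = {1, 3, 4, 7, 9, 11}                 # an arbitrary subset of Z/NZ

triangles = sum(1 for x, y, z in itertools.product(range(N), repeat=3)
                if (2 * x + y) % N in S and (x - z) % N in S and (-y - 2 * z) % N in S)

# ordered 3-APs in S: pairs (a, d) with a, a + d, a + 2d all in S (d = 0 allowed)
aps = sum(1 for a, d in itertools.product(range(N), repeat=2)
          if a in S and (a + d) % N in S and (a + 2 * d) % N in S)

assert triangles == N * aps             # the triangle-to-3-AP correspondence is N-to-1
print(triangles, aps)
```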
The following definition is a variation of homomorphism density from Section 4.3.
[Figure: a tripartite graph 𝐹 mapping into the tripartite graph 𝐺_𝑆 with parts 𝑋, 𝑌, 𝑍.]
Now we define the desired pseudorandomness hypothesis on S ⊆ Z/NZ, which says that the associated graph G_S has certain subgraph counts close to those of a random graph.
In other words, comparing the graph 𝐺 𝑆 to a random tripartite graph with the same edge
density 𝑝, these two graphs have approximately the same 𝐹-density whenever 𝐹 ⊆ 𝐾2,2,2 .
Alternatively, we can state the 3-linear forms condition explicitly without referring to
graphs. This is done by expanding the definition of 𝐺 𝑆 . Let 𝑥0 , 𝑥1 , 𝑦 0 , 𝑦 1 , 𝑧0 , 𝑧1 ∈ Z/𝑁Z be
chosen independently and uniformly at random. Then 𝑆 ⊆ Z/𝑁Z with |𝑆| = 𝑝𝑁 satisfies the
3-linear forms condition with tolerance 𝜀 if the probability that
  { −y_0 − 2z_0,  x_0 − z_0,  2x_0 + y_0,
    −y_1 − 2z_0,  x_1 − z_0,  2x_1 + y_0,
    −y_0 − 2z_1,  x_0 − z_1,  2x_0 + y_1,
    −y_1 − 2z_1,  x_1 − z_1,  2x_1 + y_1 } ⊆ S
lies in the interval (1 ± 𝜀) 𝑝 12 , and furthermore the same holds if we erase any subset of
the above 12 linear forms and also change the “12” in 𝑝 12 to the number of linear forms
remaining.
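For a genuinely random set S, the probability above is indeed about p^{12}. The following Monte Carlo sketch (not in the original text) estimates it for one random instance; only the full 12-form event is sampled here (the condition also requires every sub-collection of forms), and the agreement is only approximate because of sampling error.

```python
import random

random.seed(1)
N, p, trials = 10007, 0.5, 400000
S = {x for x in range(N) if random.random() < p}     # random set of density about p

hits = 0
for _ in range(trials):
    x0, x1, y0, y1, z0, z1 = (random.randrange(N) for _ in range(6))
    forms = [-y0 - 2 * z0, x0 - z0, 2 * x0 + y0,
             -y1 - 2 * z0, x1 - z0, 2 * x1 + y0,
             -y0 - 2 * z1, x0 - z1, 2 * x0 + y1,
             -y1 - 2 * z1, x1 - z1, 2 * x1 + y1]
    if all(f % N in S for f in forms):
        hits += 1

print(hits / trials, p ** 12)     # both are on the order of 2.4e-4
```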
Remark 9.2.4. This 𝐾2,2,2 condition is reminiscent of the 𝐶4 -count condition for the quasir-
andom graph in Theorem 3.1.1 by Chung, Graham, & Wilson (1989). Just as how 𝐶4 = 𝐾2,2
is a 2-blow-up of a single edge, 𝐾2,2,2 is a 2-blow-up of a triangle.
[Figure: a single edge, whose 2-blow-up is 𝐶4 = 𝐾2,2, and a triangle, whose 2-blow-up is 𝐾2,2,2.]
The 3-linear forms condition can be viewed as a “second moment” condition with respect to
triangles. It is needed in the proof of the sparse triangle counting lemma later.
We are now ready to state a precise formulation of the relative Roth theorem.
Remark 9.2.8 (History). The above formulations of relative Roth and Szemerédi theorems
are due to Conlon, Fox, & Zhao (2015). The original approach by Green & Tao (2008)
required in addition another technical hypothesis on 𝑆 known as the “correlation condition,”
which is no longer needed.
The threshold 𝐶𝑁 −1/(𝑘−1) is optimal up to the constant 𝐶. Indeed, the expected number
of 𝑘-APs in 𝑆 is 𝑂 ( 𝑝 𝑘 𝑁 2 ), which is less than half of E |𝑆| = 𝑝𝑁 if 𝑝 < 𝑐𝑁 −1/(𝑘−1) for
a sufficiently small constant 𝑐 > 0. One can delete from 𝑆 an element from each 𝑘-AP
contained in 𝑆. So with high probability, this process deletes at most half of 𝑆, and the
remaining subset of 𝑆 is 𝑘-AP-free.
The hypergraph container method gives another proof of the above result, plus much
more (Balogh, Morris, & Samotij 2015; Saxton & Thomason 2015). See the survey The
method of hypergraph containers by Balogh, Morris, & Samotij (2018) for more on this
topic.
Exercise 9.2.11 (Random sets and the linear forms condition). Let 𝑆 ⊆ Z/𝑁Z be a
random set where every element of Z/𝑁Z is included in 𝑆 independently with probability
𝑝.
Prove that there is some 𝑐 > 0 so that for every 𝜀 > 0 there is some 𝐶 > 0 so that as
long as 𝑝 > 𝐶𝑁 −𝑐 and 𝑁 is large enough, with probability at least 1 − 𝜀, 𝑆 satisfies the
3-linear forms condition with tolerance 𝜀 . What is the optimal 𝑐?
Hint: Use the second moment method; see Alon & Spencer (2016, Chapter 4).
Since the “dense model” B has size ≥ δN/2, by the counting version of Szemerédi's theorem, B has ≳_δ N² k-APs, and hence A has ≳_δ p^k N² k-APs by the sparse counting lemma. So in particular, A cannot be k-AP-free. This finishes the proof sketch of the relative Szemerédi theorem.
Now that we have seen the above outline, it remains to formulate and prove:
• a dense model theorem, and
• a sparse counting lemma.
We will focus on explaining the 3-AP case (i.e., relative Roth theorem) in the rest of this
chapter. The 3-AP setting is notationally simpler than that of 𝑘-AP. It is straightforward to
generalize the 3-AP proof to 𝑘-APs following the (𝑘 −1)-uniform hypergraph setup discussed
in the previous section.
This is essentially the graphon cut norm applied to the (not necessarily symmetric) function
Γ × Γ → R given by (𝑥, 𝑦) ↦→ 𝑓 (𝑥 + 𝑦).
As should be expected from the equivalence of DISC and EIG for quasirandom Cayley
graphs (Theorem 3.5.3), having small cut norm is equivalent to being Fourier uniform.
Exercise 9.4.1. Show that for all 𝑓 : Γ → R,
  c‖f̂‖_∞ ≤ ‖f‖_□ ≤ ‖f̂‖_∞,
where 𝑐 is some absolute constant (not depending on Γ or 𝑓 ).
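For a very small group one can verify the exercise numerically by brute force; the Python sketch below (not in the original text) computes ‖f‖_□ exactly by optimizing over all subsets A of Z/NZ (for each A, the optimal B keeps exactly those y for which g_A(y) = ∑_{x∈A} f(x + y) has a fixed sign) and compares it with max_r |f̂(r)|.

```python
import itertools
import numpy as np

N = 12
rng = np.random.default_rng(3)
f = rng.standard_normal(N)
f -= f.mean()                                               # a mean-zero test function on Z/NZ

def cut_norm(f):
    N = len(f)
    best = 0.0
    for A in itertools.product([0, 1], repeat=N):           # all subsets A of Z/NZ
        g = np.array([sum(f[(x + y) % N] for x in range(N) if A[x]) for y in range(N)])
        best = max(best, g[g > 0].sum(), -g[g < 0].sum())    # optimal B keeps one sign of g
    return best / N ** 2                                     # E_{x,y} normalization

fhat = np.fft.fft(f) / N                                     # f^(r) = E_x f(x) exp(-2 pi i r x / N)
print(cut_norm(f), np.max(np.abs(fhat)))                     # cut norm is at most ||f^||_inf
```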
Remark 9.4.2 (Generalizations to 𝑘 -APs). The above definition is tailored to 3-APs. For
4-APs, we should define the corresponding norm of 𝑓 as
  sup_{A,B,C ⊆ Γ×Γ} E_{x,y,z ∈ Γ} [f(x + y + z) 1_A(x, y) 1_B(x, z) 1_C(y, z)].
(The more obvious guess of using 1 𝐴 (𝑥)1 𝐵 (𝑦)1𝐶 (𝑧) instead of the above turns out to be insuf-
ficient for proving the relative Szemerédi theorem. A related issue in the context of hypergraph
regularity was discussed in Section 2.11.) The generalization to 𝑘-APs is straightforward.
However, for k ≥ 4, the above norm is no longer equivalent to Fourier uniformity. This is why we study the ‖f‖_□ norm instead of ‖f̂‖_∞ in this section.
Informally, the main result of this section says that if a sparse set 𝑆 is close to random
in normalized cut norm, then every subset 𝐴 ⊆ 𝑆 can be approximated by some dense
𝐵 ⊆ Z/𝑁Z in normalized cut norm.
Remark 9.4.4 (3-linear forms condition implies small cut norm). The cut norm hypothesis
is weaker than the 3-linear forms condition, as can be proved by two applications of the
Cauchy–Schwarz inequality (for example, see the proof of Lemma 9.5.2 in the next section).
In short, ‖ν − 1‖_□⁴ ≤ t(K_{2,2}, ν − 1).
Remark 9.4.5 (Set instead of function). We can replace the function 𝑔 by a random set
𝐵 ⊆ Γ where each 𝑥 ∈ Γ is included in 𝐵 with probability 𝑔(𝑥). By standard concentration
bounds, changing 𝑔 to 𝐵 induces a negligible effect on 𝜀 if Γ is large enough. It is important
here that 𝑔(𝑥) ∈ [0, 1] for all 𝑥 ∈ Γ.
So the above theorem says that, given a sparse pseudorandom host set S, any subset A ⊆ S can be modeled by a dense set B that is close to A with respect to the normalized cut norm.
It will be more natural to prove the above theorem a bit more generally where sets
𝐴 ⊆ 𝑆 ⊆ Γ are replaced by functional analogs. Since these are sparse sets, we should scale
indicator functions as follows:
𝑓 = 𝑝 −1 1 𝐴 and 𝜈 = 𝑝 −1 1𝑆 .
Then 𝑓 ≤ 𝜈 pointwise. Note that 𝑓 and 𝜈 take values in [0, 𝑝 −1 ], unlike 𝑔, which takes values
in [0, 1]. The normalization is such that E𝜈 = 1. Here is the main result of this section.
The rest of this section is devoted to proving the above theorem. First, we reformulate the
cut norm using convex geometry.
Let Φ denote the set of all functions Γ → R that can be written as a convex combination
of convolutions of the form 1 𝐴 ∗ 1 𝐵 or −1 𝐴 ∗ 1 𝐵 , where 𝐴, 𝐵 ⊆ Γ. In other words,
Φ = ConvexHull ({1 𝐴 ∗ 1 𝐵 : 𝐴, 𝐵 ⊆ Γ} ∪ {−1 𝐴 ∗ 1 𝐵 : 𝐴, 𝐵 ⊆ Γ}) .
Note that Φ is a centrally symmetric convex set of functions Γ → R.
we have
  ‖f‖_□ = sup_{A,B ⊆ Γ} |⟨f, 1_A ∗ 1_B⟩| = sup_{φ ∈ Φ} ⟨f, φ⟩.
Since Φ is a centrally symmetric convex body, ‖·‖_□ is indeed a norm. Its dual norm is thus given by, for any nonzero ψ : Γ → R,
  ‖ψ‖*_□ = sup_{f : Γ→R, ‖f‖_□ ≤ 1} ⟨f, ψ⟩ = inf{r > 0 : r^{−1}ψ ∈ Φ}.
In other words, Φ is the unit ball of the ‖·‖*_□ norm. The following inequality holds for all f, ψ : Γ → R:
  ⟨f, ψ⟩ ≤ ‖f‖_□ ‖ψ‖*_□.
Proof. The inequality is not affected if we multiply ψ and ψ′ each by a constant. So we can assume that ‖ψ‖*_□ = ‖ψ′‖*_□ = 1. Then ψ, ψ′ ∈ Φ. Hence ψψ′ ∈ Φ by Lemma 9.4.7. This implies that ‖ψψ′‖*_□ ≤ 1. □
We need two classical results from analysis and convex geometry.
We can show that ∥𝜓∥ ∗□ = 𝑂 𝜀 (1). As 𝑃 is a polynomial, by the triangle inequality and the
submultiplicativity of ∥ ∥ ∗□ , we find that ∥𝑃𝜓∥ ∗□ = 𝑂 𝜀 (1). And so
⟨𝜈 − 1, 𝑃𝜓⟩ ≤ ∥𝜈 − 1∥ □ ∥𝑃𝜓∥ ∗□ ≤ 𝛿 ∥𝑃𝜓∥ ∗□
can be made arbitrarily small by making 𝛿 small. We also have ⟨1, 𝑃𝜓⟩ ≈ ⟨1, 𝜓+ ⟩, which is
at most around 1. Together, we see that ⟨ 𝑓 , 𝜓⟩ is at most around 1, which would contradict
⟨ 𝑓 , 𝜓⟩ > 1 from earlier (assuming enough slack). ■
Proof of the dense model theorem (Theorem 9.4.6). We will show that the conclusion holds
with 𝛿 > 0 chosen to be sufficiently small as a function of 𝜀. We may assume that 0 < 𝜀 < 1/2.
We will prove the existence of a function 𝑔 : Γ → [0, 1 + 𝜀/2] such that ∥ 𝑓 − 𝑔∥ □ ≤ 𝜀/2.
(To obtain the function Γ → [0, 1] in the theorem, we can replace 𝑔 by min{𝑔, 1}.)
We are trying to prove that one can write f as g + g′ with
  g ∈ K := {functions Γ → [0, 1 + ε/2]}
and
  g′ ∈ K′ := {functions Γ → R with ‖·‖_□ ≤ ε/2}.
We can view the sets 𝐾 and 𝐾 ′ as convex bodies (both containing the origin) in the space of
all functions Γ → R. Our goal is to show that 𝑓 ∈ 𝐾 + 𝐾 ′ .
Let us assume the contrary. By the separating hyperplane theorem applied to 𝑓 ∉ 𝐾 + 𝐾 ′ ,
there exists a function 𝜓 : Γ → R (which is a normal vector to the separating hyperplane)
such that
(a) ⟨ 𝑓 , 𝜓⟩ > 1, and
(b) ⟨g + g′, ψ⟩ ≤ 1 for all g ∈ K and g′ ∈ K′.
Taking g = (1 + ε/2) 1_{ψ≥0} and g′ = 0 in (b), we have
  ⟨1, ψ₊⟩ ≤ 1/(1 + ε/2).   (9.1)
Here we write ψ₊ for the function ψ₊(x) := max{ψ(x), 0}.
On the other hand, setting g = 0, we have
  1 ≥ sup_{g′ ∈ K′} ⟨g′, ψ⟩ = sup_{‖g′‖_□ ≤ ε/2} ⟨g′, ψ⟩ = (ε/2) ‖ψ‖*_□.
So
  ‖ψ‖*_□ ≤ 2/ε.
Setting g = 0 and g′ = ±(ε/2) N 1_x for a single x ∈ Γ (i.e., g′ is supported on a single element of Γ), we have ‖g′‖_□ ≤ ε/2 and 1 ≥ ⟨g′, ψ⟩ = ±(ε/2) ψ(x). So |ψ(x)| ≤ 2/ε. This holds for every x ∈ Γ. Thus
  ‖ψ‖_∞ ≤ 2/ε.
By the Weierstrass polynomial approximation theorem, there exists some real polynomial P(x) = p_d x^d + · · · + p_1 x + p_0 such that
  |P(t) − max{t, 0}| ≤ ε/20 whenever |t| ≤ 2/ε.
[Figure: the polynomial P(t) approximating max{t, 0} on the interval |t| ≤ 2/ε.]
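As a numerical illustration of this approximation step (not in the original text), one can fit a polynomial to max{t, 0} on [−2/ε, 2/ε] and watch the uniform error decrease with the degree. A least-squares Chebyshev fit is used below purely for convenience; the theorem only needs existence, and a minimax approximation would give slightly better constants.

```python
import numpy as np

eps = 0.5
t = np.linspace(-2 / eps, 2 / eps, 4001)
target = np.maximum(t, 0.0)
for deg in (10, 30, 100):
    P = np.polynomial.Chebyshev.fit(t, target, deg)      # least-squares fit on the sampled grid
    print(deg, float(np.max(np.abs(P(t) - target))))     # approximate sup-norm error on [-2/eps, 2/eps]
```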
Set
  R = ∑_{i=0}^{d} |p_i| (2/ε)^i,
which is a constant that depends only on ε. (A more careful analysis gives R = exp(ε^{−O(1)}).)
Write Pψ : Γ → R to mean the function given by Pψ(x) = P(ψ(x)). By the triangle inequality and the submultiplicativity of ‖·‖*_□ (Lemma 9.4.8),
  ‖Pψ‖*_□ ≤ ∑_{i=0}^{d} |p_i| ‖ψ^i‖*_□ ≤ ∑_{i=0}^{d} |p_i| (‖ψ‖*_□)^i ≤ ∑_{i=0}^{d} |p_i| (2/ε)^i = R.
Let us choose
  δ = min{ε/(20R), 1}.
Then ‖ν − 1‖_□ ≤ δ implies that
  |⟨ν − 1, Pψ⟩| ≤ ‖ν − 1‖_□ ‖Pψ‖*_□ ≤ δR ≤ ε/20.   (9.2)
Earlier we showed that ‖ψ‖_∞ ≤ 2/ε, and also |P(t) − max{t, 0}| ≤ ε/20 whenever |t| ≤ 2/ε. Thus
  ‖Pψ − ψ₊‖_∞ ≤ ε/20.   (9.3)
Hence,
  ⟨ν, Pψ⟩ = ⟨1, Pψ⟩ + ⟨ν − 1, Pψ⟩
          ≤ ⟨1, Pψ⟩ + ε/20            [by (9.2)]
          ≤ ⟨1, ψ₊⟩ + ε/10            [by (9.3)]
          ≤ 1/(1 + ε/2) + ε/10.        [by (9.1)]
Also,
  ⟨ν − 1, 1⟩ ≤ ‖ν − 1‖_□ ≤ δ.
Thus
  ‖ν‖_1 = ⟨ν, 1⟩ = 1 + ⟨ν − 1, 1⟩ ≤ 1 + δ ≤ 2.
So by (9.3),
  ⟨ν, ψ₊ − Pψ⟩ ≤ ‖ν‖_1 ‖ψ₊ − Pψ‖_∞ ≤ 2 · (ε/20) ≤ ε/10.   (9.4)
For any tripartite graph F, we write t(F, f) for the F-density in f (and likewise with g and ν). For example,
  t(K₃, f) = E_{x,y,z} f(x, y) f(x, z) f(y, z)
and
  t(K_{2,1,1}, f) = E_{x,x′,y,z} f(x, y) f(x′, y) f(x, z) f(x′, z) f(y, z).
Throughout we assume that 𝜀 > 0 is sufficiently small, so that ≤ 𝜀 Ω(1) means ≤ 𝐶𝜀 𝑐 for
some absolute constants 𝑐, 𝐶 > 0 (which could change from line to line).
Here is the main result of this section, due to Conlon, Fox, & Zhao (2015).
You should now pause and review the proof of the “dense” triangle counting lemma from
Proposition 4.5.4, which says that if in addition we assume 0 ≤ 𝑓 ≤ 1 (that is, assuming
𝜈 = 1 identically), then
|𝑡 (𝐾3 , 𝑓 ) − 𝑡 (𝐾3 , 𝑔)| ≤ 3 ∥ 𝑓 − 𝑔∥ □ ≤ 3𝜀.
Roughly speaking, the proof of the dense triangle counting lemma proceeds by replacing 𝑓
by 𝑔 one edge at a time, each time incurring at most an ∥ 𝑓 − 𝑔∥ □ loss.
[Figure: the triangle densities being compared; the edges weighted by 𝑓 (majorized by 𝜈) are replaced by 𝑔 (majorized by 1) one edge at a time.]
Proof. The proof uses two applications of the Cauchy–Schwarz inequality. Let us write down
the proof in the case when none of the four 𝑓 ’s are replaced by 𝑔’s. The other cases are similar
(basically apply 𝑔 ≤ 1 instead of 𝑓 ≤ 𝜈 wherever appropriate).
Here is a figure illustrating the first application of the Cauchy–Schwarz inequality.
[Figure: the first application of the Cauchy–Schwarz inequality, illustrated with triangles whose edges are labeled 𝜈 − 1, 𝑓, and 𝜈.]
Here are the inequalities written out:
  ( E_{x,y,z,z′} (ν(x, y) − 1) f(x, z) f(x, z′) f(y, z) f(y, z′) )²
    = ( E_{y,z,z′} [ E_x (ν(x, y) − 1) f(x, z) f(x, z′) ] f(y, z) f(y, z′) )²
    ≤ ( E_{y,z,z′} [ E_x (ν(x, y) − 1) f(x, z) f(x, z′) ]² f(y, z) f(y, z′) ) ( E_{y,z,z′} f(y, z) f(y, z′) )
    ≤ ( E_{y,z,z′} [ E_x (ν(x, y) − 1) f(x, z) f(x, z′) ]² ν(y, z) ν(y, z′) ) ( E_{y,z,z′} ν(y, z) ν(y, z′) ).
Note that we are able to apply 𝑓 ≤ 𝜈 in the final step above due to the nonnegativity of the
square, which arose from the Cauchy–Schwarz inequality. We could not have applied 𝑓 ≤ 𝜈
at the very beginning.
The second factor above is at most 1 + 𝜀 due to the 3-linear forms condition. It remains to
show that the first factor is ≤ 𝜀 Ω(1) . The first factor expands to
E 𝑥, 𝑥 ′ ,𝑦,𝑧,𝑧 ′ (𝜈(𝑥, 𝑦) − 1)(𝜈(𝑥 ′ , 𝑦) − 1) 𝑓 (𝑥, 𝑧) 𝑓 (𝑥, 𝑧 ′ ) 𝑓 (𝑥 ′ , 𝑧) 𝑓 (𝑥 ′ , 𝑧 ′ )𝜈(𝑦, 𝑧)𝜈(𝑦, 𝑧 ′ ).
We can upper bound the above quantity as illustrated below, using a second application of
the Cauchy–Schwarz inequality.
[Figure: the second application of the Cauchy–Schwarz inequality, again illustrated with edge-labeled triangles.]
On the right-hand side, the first factor is ≤ 𝜀 Ω(1) by the 3-linear forms condition. Indeed,
|𝑡 (𝐹, 𝜈) − 1| ≤ 𝜀 for any 𝐹 ⊆ 𝐾2,2,2 . If we expand all the 𝜈 − 1 in the first factor above, then
it becomes an alternating sum of various 𝑡 (𝐹, 𝜈) ∈ [1 − 𝜀, 1 + 𝜀] with 𝐹 ⊆ 𝐾2,2,2 , with the
main contribution 1 from each term canceling each other out. The second factor is ≤ 1 + 𝜀
again by the 3-linear forms condition.
Putting everything together, this completes the proof of the lemma. □
Define ν∧, f∧, g∧ : X × Y → [0, ∞) by
  ν∧(x, y) := E_z ν(x, z) ν(y, z),
  f∧(x, y) := E_z f(x, z) f(y, z),
  g∧(x, y) := E_z g(x, z) g(y, z).
They represent codegrees. Even though 𝜈 and 𝑓 are possibly unbounded, the new weighted
graphs 𝜈∧ and 𝑓∧ behave like dense graphs because the sparseness is somehow smoothed out
(this is a key observation). On a first reading of the proof, you may wish to pretend that 𝜈∧
and 𝑓∧ are uniformly bounded above by 1 (in reality, we need to control the negligible bit of
𝜈 exceeding 1).
We have
  t(K₃, f) = ⟨f, f∧⟩ and t(K₃, g) = ⟨g, g∧⟩.
So
  t(K₃, f) − t(K₃, g) = ⟨f, f∧⟩ − ⟨g, g∧⟩ = ⟨f, f∧ − g∧⟩ + ⟨f − g, g∧⟩.
We have
  |⟨f − g, g∧⟩| ≤ ‖f − g‖_□ ≤ ε
by the same argument as in the dense triangle counting lemma (Proposition 4.5.4), as 0 ≤ g ≤ 1. So it remains to show |⟨f, f∧ − g∧⟩| ≤ ε^{Ω(1)}.
By the Cauchy–Schwarz inequality, we have
  ⟨f, f∧ − g∧⟩² = (E[f(f∧ − g∧)])² ≤ E[f(f∧ − g∧)²] · E f ≤ E[ν(f∧ − g∧)²] · E ν.
The second factor is E𝜈 ≤ 1 + 𝜀 by the 3-linear forms condition. So it remains to show that
E[𝜈( 𝑓∧ − 𝑔∧ ) 2 ] = ⟨𝜈, ( 𝑓∧ − 𝑔∧ ) 2 ⟩ ≤ 𝜀 Ω(1) .
By Lemma 9.5.2
⟨𝜈 − 1, ( 𝑓∧ − 𝑔∧ ) 2 ⟩ ≤ 𝜀 Ω(1)
(to see this inequality, first expand ( 𝑓∧ − 𝑔∧ ) 2 and then apply Lemma 9.5.2 term by term).
Thus
E[𝜈( 𝑓∧ − 𝑔∧ ) 2 ] ≤ E[( 𝑓∧ − 𝑔∧ ) 2 ] + 𝜀 Ω(1) .
Thus, to prove the induction step (as stated earlier) for the sparse triangle counting lemma, it
remains to prove the following.
Let us first sketch the idea of the proof of Lemma 9.5.3. Expanding, we have
LHS of (9.5) = ⟨ 𝑓∧ , 𝑓∧ ⟩ − ⟨ 𝑓∧ , 𝑔∧ ⟩ − ⟨𝑔∧ , 𝑓∧ ⟩ + ⟨𝑔∧ , 𝑔∧ ⟩. (9.6)
Each term represents some 4-cycle density.
So it suffices to show that each of the four terms above differs from ⟨𝑔∧ , 𝑔∧ ⟩ by ≤ 𝜀 Ω(1) .
We are trying to show that ⟨ 𝑓∧ , 𝑓∧ ⟩ ≈ ⟨𝑔∧ , 𝑔∧ ⟩. Expanding the second factor in each ⟨·, ·⟩, we
are trying to show that
  E_{x,y,z} f∧(x, y) f(x, z) f(y, z) ≈ E_{x,y,z} g∧(x, y) g(x, z) g(y, z).
However, this is just another instance of the sparse triangle counting lemma! And importantly,
this instance is easier than the one we started with. Indeed, we have ∥ 𝑓∧ − 𝑔∧ ∥ □ ≤ 𝜀 Ω(1) (this
can be proved by invoking the induction hypothesis). Furthermore, the first factor 𝑓∧ (𝑥, 𝑦)
now behaves more like a bounded function (corresponding to a dense graph rather than a
sparse graph). Let us pretend for a second that 𝑓∧ ≤ 1, ignoring the negligible part of 𝑓∧
exceeding 1. Then we have reduced the original problem to a new instance of the triangle
counting lemma, except that now 𝑓 ≤ 𝜈 on 𝑋 ×𝑌 has been replaced by 𝑓∧ ≤ 1 (this is the key
point where densification occurs). Lemma 9.5.3 then follows from the induction hypothesis
as we have reduced the sparsity of the pseudorandom host graph.
Coming back to the proof, as discussed earlier, while f∧ is not necessarily ≤ 1, it is almost so. We need to handle the error term arising from replacing f∧ by its capped version f̃∧ : X × Y → [0, 1], defined by
  f̃∧ = min{f∧, 1} pointwise.
We have
  0 ≤ f∧ − f̃∧ = max{f∧ − 1, 0} ≤ max{ν∧ − 1, 0} ≤ |ν∧ − 1|.   (9.7)
Also,
  (E|ν∧ − 1|)² ≤ E[(ν∧ − 1)²] = E ν∧² − 2 E ν∧ + 1 ≤ 3ε,   (9.8)
by the 3-linear forms condition, since E ν∧² and E ν∧ are both within ε of 1. So
  ⟨f∧, f∧⟩ − ⟨f̃∧, f∧⟩ = ⟨f∧ − f̃∧, f∧⟩ ≤ E[|ν∧ − 1| ν∧]
                       = E[|ν∧ − 1|(ν∧ − 1)] + E|ν∧ − 1|
                       ≤ E[(ν∧ − 1)²] + E|ν∧ − 1|
                       ≤ ε^{Ω(1)}.   [by (9.8)]   (9.9)
  |⟨f̃∧, f∧⟩ − ⟨g∧, g∧⟩| ≤ ε^{Ω(1)}.
Thus |⟨f∧, f∧⟩ − ⟨g∧, g∧⟩| ≤ ε^{Ω(1)}. Likewise, the other terms on the right-hand side of (9.6) are within ε^{Ω(1)} of ⟨g∧, g∧⟩ (Exercise!). The conclusion E[(f∧ − g∧)²] ≤ ε^{Ω(1)} then
follows. □
Exercise 9.5.5. State and prove a generalization of the sparse counting lemma to count
an arbitrary but fixed subgraph (replacing the triangle above). How about hypergraphs?
Exercise 9.6.2. Deduce the above version of Roth's theorem from the existence version (namely, that every 3-AP-free subset of [N] has size o(N)).
Proof of the relative Roth theorem (Theorem 9.2.5). Let 𝑝 = |𝑆| /𝑁. Define
𝜈 : Z/𝑁Z → [0, ∞) by 𝜈 = 𝑝 −1 1𝑆 .
Let 𝑋 = 𝑌 = 𝑍 = Z/𝑁Z. Consider the associated edge-weighted tripartite graph
𝜈 ′ : (𝑋 × 𝑌 ) ∪ (𝑋 × 𝑍) ∪ (𝑌 × 𝑍) → [0, ∞)
defined by, for 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌 , and 𝑧 ∈ 𝑍,
𝜈 ′ (𝑥, 𝑦) = 𝜈(2𝑥 + 𝑦), 𝜈 ′ (𝑥, 𝑧) = 𝜈(𝑥 − 𝑧), 𝜈 ′ (𝑦, 𝑧) = 𝜈(−𝑦 − 2𝑧).
Since 𝜈 satisfies the 3-linear forms condition (as a function on Z/𝑁Z), 𝜈 ′ also satisfies the
3-linear forms condition in the sense of Section 9.5. Likewise,
∥𝜈 − 1∥ □ = ∥𝜈 ′ − 1∥ □
where ‖ν − 1‖_□ on the left-hand side is in the sense of Section 9.4 and ‖ν′ − 1‖_□ is defined as in Section 9.5 with ν′ restricted to X × Y (the same would be true had we restricted to X × Z or Y × Z). Indeed,
  ‖ν − 1‖_□ = sup_{A⊆X, B⊆Y} E (ν(x + y) − 1) 1_A(x) 1_B(y),
whereas
  ‖ν′ − 1‖_□ = sup_{A⊆X, B⊆Y} E (ν′(x, y) − 1) 1_A(x) 1_B(y),
and these two expressions are equal to each other after a change of variables 𝑥 ↔ 2𝑥 (which
is a bijection as 𝑁 is odd).
By Lemma 9.5.2 (or simply two applications of the Cauchy–Schwarz inequality followed
by the 3-linear forms condition), we obtain
∥𝜈 − 1∥ □ ≤ 𝜀 Ω(1) .
Now suppose 𝐴 ⊆ 𝑆 and | 𝐴| ≥ 𝛿𝑁. Define 𝑓 : Z/𝑁Z → [0, ∞) by
𝑓 = 𝑝 −1 1 𝐴
so that 0 ≤ 𝑓 ≤ 𝜈 pointwise. Then by the dense model theorem (Theorem 9.4.6), there exists
a function 𝑔 : Z/𝑁Z → [0, 1] such that
∥ 𝑓 − 𝑔∥ □ ≤ 𝜂,
where 𝜂 = 𝜂(𝜀) is some quantity that tends to zero as 𝜀 → 0.
Define the associated edge-weighted tripartite graphs
𝑓 ′ , 𝑔 ′ : (𝑋 × 𝑌 ) ∪ (𝑋 × 𝑍) ∪ (𝑌 × 𝑍) → [0, ∞)
where, for 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌 , and 𝑧 ∈ 𝑍,
𝑓 ′ (𝑥, 𝑦) = 𝑓 (2𝑥 + 𝑦), 𝑓 ′ (𝑥, 𝑧) = 𝑓 (𝑥 − 𝑧), 𝑓 ′ (𝑦, 𝑧) = 𝑓 (−𝑦 − 2𝑧),
𝑔 ′ (𝑥, 𝑦) = 𝑔(2𝑥 + 𝑦), 𝑔 ′ (𝑥, 𝑧) = 𝑔(𝑥 − 𝑧), 𝑔 ′ (𝑦, 𝑧) = 𝑔(−𝑦 − 2𝑧).
Note that 𝑔 ′ takes values in [0, 1]. Then
∥ 𝑓 ′ − 𝑔 ′ ∥ □ = ∥ 𝑓 − 𝑔∥ □ ≤ 𝜂
when 𝑓 ′ − 𝑔 ′ is interpreted as restricted to 𝑋 × 𝑌 (and the same for 𝑋 × 𝑍 or 𝑌 × 𝑍). Thus
by the sparse triangle counting lemma (Theorem 9.5.1), we have
|𝑡 (𝐾3 , 𝑓 ′ ) − 𝑡 (𝐾3 , 𝑔 ′ )| ≤ 𝜂Ω(1) .
Note that
  t(K₃, f′) = E_{x,y,z} f′(x, y) f′(x, z) f′(y, z)
            = E_{x,y,z ∈ Z/NZ} f(2x + y) f(x − z) f(−y − 2z)
            = E_{x,d ∈ Z/NZ} f(x) f(x + d) f(x + 2d)
            = Λ₃(f).
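The change of variables in the last step is easy to confirm numerically; here is a brute-force Python check (not in the original text) for a toy modulus and a random function f.

```python
import itertools
import numpy as np

N = 15
rng = np.random.default_rng(7)
f = rng.random(N)

lhs = np.mean([f[(2 * x + y) % N] * f[(x - z) % N] * f[(-y - 2 * z) % N]
               for x, y, z in itertools.product(range(N), repeat=3)])
rhs = np.mean([f[a] * f[(a + d) % N] * f[(a + 2 * d) % N]
               for a, d in itertools.product(range(N), repeat=2)])
assert np.isclose(lhs, rhs)    # each 3-AP (a, a+d, a+2d) arises from exactly N triples (x, y, z)
print(lhs, rhs)
```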
Further Reading
The original paper by Green & Tao (2008) titled The Primes Contain Arbitrarily Long
Arithmetic Progressions is worth reading. Their follow-up paper Linear Equations in Primes
(2010a) substantially strengthens the result to asymptotically count the number of 𝑘-APs
in the primes, though the proof was conditional on several claims that were subsequently
proved, most notably the inverse theorem for Gowers uniformity norms (Green, Tao, &
Ziegler 2012).
A number of expository articles were written on this topic shortly after the breakthroughs:
Green (2007b, 2014), Tao (2007b), Kra (2006), Wolf (2013).
The graph-theoretic approach taken in this chapter is adapted from the article The Green–Tao Theorem: an Exposition by Conlon, Fox, & Zhao (2014). The article presents a full proof
of the Green–Tao theorem that incorporates various simplifications found since the original
work. The analytic number theoretic arguments, which were omitted from this chapter, can
also be found in that article.
Chapter Summary
• Green–Tao theorem. The primes contain arbitrarily long arithmetic progressions. Proof
strategy:
– Embed the primes in a slightly larger set, the “almost primes,” which enjoys certain
pseudorandomness properties.
– Show that every 𝑘-AP-free subset of such a pseudorandom set must have negligible
size.
• Relative Szemerédi theorem. If 𝑆 ⊆ Z/𝑁Z satisfies a 𝒌-linear forms condition, then
every 𝑘-AP-free subset of 𝑆 has size 𝑜(|𝑆|).
– The 3-linear forms condition is a pseudorandomness hypothesis. It says that the asso-
ciated tripartite graph has 𝐹-density close to random whenever 𝐹 ⊆ 𝐾2,2,2 .
• Proof of the relative Szemerédi theorem uses the transference principle to transfer
Szemerédi’s theorem from the dense setting to the sparse pseudorandom setting.
– First approximate 𝐴 ⊆ 𝑆 by a dense set 𝐵 ⊆ Z/𝑁Z (dense model theorem).
– Then show that the normalized count of 𝑘-APs in 𝐴 and 𝐵 are similar (sparse counting
lemma).
– Finally conclude using Szemerédi’s theorem that 𝐵 has many 𝑘-APs, and therefore so
must 𝐴.
• Dense model theorem. If a sparse set 𝑆 is close to random in normalized cut norm, then
every subset 𝐴 ⊆ 𝑆 can be approximated by some dense 𝐵 ⊆ Z/𝑁Z in normalized cut
norm.
• Sparse counting lemma. If two graphs (one sparse and one dense) are close in normalized cut norm, then they have similar triangle counts, provided that the sparse graph lies inside
a sparse pseudorandom graph satisfying the 3-linear forms condition (which says that the
densities of 𝐾2,2,2 and its subgraphs are close to random).
References
Gowers, W. T. (1998b)
Additive and combinatorial number theory, online lecture notes written by Jacques Verstraëte based on
a course given by W. T. Gowers, https://www.dpmms.cam.ac.uk/~wtg10/.
Gowers, W. T. (2001)
A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11, 465–588. MR:1844079
Gowers, W. T. (2006)
Quasirandomness, counting and regularity for 3-uniform hypergraphs, Combin. Probab. Comput. 15,
143–184. MR:2195580
Gowers, W. T. (2007)
Hypergraph regularity and the multidimensional Szemerédi theorem, Ann. of Math. 166, 897–946.
MR:2373376
Gowers, W. T. (2008)
Quasirandom groups, Combin. Probab. Comput. 17, 363–387. MR:2410393
Gowers, W. T. (2010)
Decompositions, approximate structure, transference, and the Hahn-Banach theorem, Bull. Lond. Math.
Soc. 42, 573–606. MR:2669681
Graham, Ronald L., Rothschild, Bruce L., & Spencer, Joel H. (1990)
Ramsey theory, second ed., Wiley. MR:1044995
Green, B. (2005a)
A Szemerédi-type regularity lemma in abelian groups, with applications, Geom. Funct. Anal. 15, 340–376.
MR:2153903
Green, Ben (2005b)
Roth’s theorem in the primes, Ann. of Math. (2) 161, 1609–1636. MR:2180408
Green, Ben (2005c)
Finite field models in additive combinatorics, Surveys in combinatorics 2005, Cambridge University
Press, pp. 1–27. MR:2187732
Green, Ben (2007a)
Montréal notes on quadratic Fourier analysis, Additive combinatorics, American Mathematical Society,
pp. 69–102. MR:2359469
Green, Ben (2007b)
Long arithmetic progressions of primes, Analytic Number Theory: A Tribute to Gauss and Dirichlet,
American Mathematical Society, pp. 149–167. MR:2362199
Green, Ben (2009a)
Additive combinatorics (book review), Bull. Amer. Math. Soc. 46, 489–497. MR:2507281
Green, Ben (2009b)
Additive combinatorics, lecture notes, http://people.maths.ox.ac.uk/greenbj/notes.html.
Green, Ben (2014)
Approximate algebraic structure, Proceedings of the International Congress of Mathematicians—Seoul
2014. Vol. 1, Kyung Moon Sa, pp. 341–367. MR:3728475
Green, Ben & Ruzsa, Imre Z. (2007)
Freiman’s theorem in an arbitrary abelian group, J. Lond. Math. Soc. 75, 163–175. MR:2302736
Green, Ben & Tao, Terence (2008)
The primes contain arbitrarily long arithmetic progressions, Ann. of Math. 167, 481–547. MR:2415379
Hosseini, Kaave, Lovett, Shachar, Moshkovitz, Guy, & Shapira, Asaf (2016)
An improved lower bound for arithmetic regularity, Math. Proc. Cambridge Philos. Soc. 161, 193–197.
MR:3530502
Ireland, Kenneth & Rosen, Michael (1990)
A classical introduction to modern number theory, second ed., Springer-Verlag. MR:1070716
Jordan, Herbert E. (1907)
Group-Characters of Various Types of Linear Groups, Amer. J. Math. 29, 387–405. MR:1506021
Kahn, Jeff (2001)
An entropy approach to the hard-core model on bipartite graphs, Combin. Probab. Comput. 10, 219–237.
MR:1841642
Katona, G. (1968)
A theorem of finite sets, Theory of graphs (Proc. Colloq., Tihany, 1966), pp. 187–207. MR:0290982
Kedlaya, Kiran S. (1997)
Large product-free subsets of finite groups, J. Combin. Theory Ser. A 77, 339–343. MR:1429085
Kedlaya, Kiran S. (1998)
Product-free subsets of groups, Amer. Math. Monthly 105, 900–906. MR:1656927
Keevash, Peter (2011)
Hypergraph Turán problems, Surveys in combinatorics 2011, Cambridge University Press, pp. 83–139.
MR:2866732
Khot, Subhash, Kindler, Guy, Mossel, Elchanan, & O’Donnell, Ryan (2007)
Optimal inapproximability results for MAX-CUT and other 2-variable CSPs?, SIAM J. Comput. 37,
319–357. MR:2306295
Kleinberg, Robert, Speyer, David E., & Sawin, Will (2018)
The growth of tri-colored sum-free sets, Discrete Anal., Paper No. 12, 10 pp. MR:3827120
Kollár, János, Rónyai, Lajos, & Szabó, Tibor (1996)
Norm-graphs and bipartite Turán numbers, Combinatorica 16, 399–406. MR:1417348
Komlós, J. & Simonovits, M. (1996)
Szemerédi’s regularity lemma and its applications in graph theory, Combinatorics, Paul Erdős is eighty,
Vol. 2 (Keszthely, 1993), János Bolyai Mathematical Society, pp. 295–352. MR:1395865
Komlós, János, Shokoufandeh, Ali, Simonovits, Miklós, & Szemerédi, Endre (2002)
The regularity lemma and its applications in graph theory, Theoretical aspects of computer science
(Tehran, 2000), Springer, pp. 84–112. MR:1966181
Konyagin, S. V. & Shkredov, I. D. (2015)
On sum sets of sets having small product set, Proc. Steklov Inst. Math. 290, 288–299. MR:3488800
Kővári, T., Sós, V. T., & Turán, P. (1954)
On a problem of K. Zarankiewicz, Colloq. Math. 3, 50–57. MR:65617
Kra, Bryna (2006)
The Green-Tao theorem on arithmetic progressions in the primes: an ergodic point of view, Bull. Amer.
Math. Soc. 43, 3–23. MR:2188173
Krivelevich, M. & Sudakov, B. (2006)
Pseudo-random graphs, More sets, graphs and numbers, Springer, pp. 199–262. MR:2223394
Margulis, G. A. (1988)
Explicit group-theoretic constructions of combinatorial schemes and their applications in the construction
of expanders and concentrators, Problemy Peredachi Informatsii 24, 51–60. MR:939574
Matiyasevich, Ju. V. (1970)
The Diophantineness of enumerable sets, Dokl. Akad. Nauk. SSSR. 191, 279–282. MR:0258744
Matoušek, Jiří (2010)
Thirty-three miniatures: Mathematical and algorithmic applications of linear algebra, American Math-
ematical Society. MR:2656313
Meshulam, Roy (1995)
On subsets of finite abelian groups with no 3-term arithmetic progressions, J. Combin. Theory Ser. A 71,
168–172. MR:1335785
Minkowski, Hermann (1896)
Geometrie der Zahlen, Teubner. MR:249269
Morgenstern, Moshe (1994)
Existence and explicit constructions of 𝑞 + 1 regular Ramanujan graphs for every prime power 𝑞, J.
Combin. Theory Ser. B 62, 44–62. MR:1290630
Moshkovitz, Guy & Shapira, Asaf (2016)
A short proof of Gowers’ lower bound for the regularity lemma, Combinatorica 36, 187–194. MR:3516883
Moshkovitz, Guy & Shapira, Asaf (2019)
A tight bound for hypergraph regularity, Geom. Funct. Anal. 29, 1531–1578. MR:4025519
Motzkin, T. S. (1967)
The arithmetic-geometric inequality, Inequalities (Proc. Sympos. Wright-Patterson Air Force Base, Ohio,
1965), Academic Press, pp. 205–224. MR:0223521
Motzkin, T. S. & Straus, E. G. (1965)
Maxima for graphs and a new proof of a theorem of Turán, Canadian J. Math. 17, 533–540. MR:175813
Mulholland, H. P. & Smith, C. A. B. (1959)
An inequality arising in genetical theory, Amer. Math. Monthly 66, 673–683. MR:110721
Nešetřil, Jaroslav & Rosenfeld, Moshe (2001)
I. Schur, C. E. Shannon and Ramsey numbers, a short story, Discrete Math. 229, 185–195. MR:1815606
Nikiforov, V. (2011)
The number of cliques in graphs of given order and size, Trans. Amer. Math. Soc. 363, 1599–1618.
MR:2737279
Nikolov, N. & Pyber, L. (2011)
Product decompositions of quasirandom groups and a Jordan type theorem, J. Eur. Math. Soc. (JEMS)
13, 1063–1077. MR:2800484
Nilli, A. (1991)
On the second eigenvalue of a graph, Discrete Math. 91, 207–210. MR:1124768
Pellegrino, Giuseppe (1970)
Sul massimo ordine delle calotte in 𝑆4,3 , Matematiche (Catania) 25, 149–157 (1971). MR:363952
Peluse, Sarah (2020)
Bounds for sets with no polynomial progressions, Forum Math. Pi 8, e16, 55 pp. MR:4199235
Sah, Ashwin, Sawhney, Mehtaab, Stoner, David, & Zhao, Yufei (2019)
The number of independent sets in an irregular graph, J. Combin. Theory Ser. B 138, 172–195.
MR:3979229
Sah, Ashwin, Sawhney, Mehtaab, Stoner, David, & Zhao, Yufei (2020)
A reverse Sidorenko inequality, Invent. Math. 221, 665–711. MR:4121160
Sah, Ashwin, Sawhney, Mehtaab, & Zhao, Yufei (2021)
Patterns without a popular difference, Discrete Anal., Paper No. 8, 30 pp. MR:4293329
Salem, R. & Spencer, D. C. (1942)
On sets of integers which contain no three terms in arithmetical progression, Proc. Natl. Acad. Sci. USA
28, 561–563. MR:7405
Sanders, Tom (2012)
On the Bogolyubov-Ruzsa lemma, Anal. PDE 5, 627–655. MR:2994508
Sanders, Tom (2013)
The structure theory of set addition revisited, Bull. Amer. Math. Soc. 50, 93–127. MR:2994996
Sárkőzy, A. (1978)
On difference sets of sequences of integers. I, Acta Math. Acad. Sci. Hungar. 31, 125–149. MR:466059
Saxton, David & Thomason, Andrew (2015)
Hypergraph containers, Invent. Math. 201, 925–992. MR:3385638
Schacht, Mathias (2016)
Extremal results for random discrete structures, Ann. of Math. 184, 333–365. MR:3548528
Schelp, Richard H. & Thomason, Andrew (1998)
A remark on the number of complete and empty subgraphs, Combin. Probab. Comput. 7, 217–219.
MR:1617934
Schoen, Tomasz (2011)
Near optimal bounds in Freiman’s theorem, Duke Math. J. 158, 1–12. MR:2794366
Schoen, Tomasz & Shkredov, Ilya D. (2014)
Roth’s theorem in many variables, Israel J. Math. 199, 287–308. MR:3219538
Schoen, Tomasz & Sisask, Olof (2016)
Roth’s theorem for four variables and additive structures in sums of sparse sets, Forum Math. Sigma 4,
e5, 28 pp. MR:3482282
Schrijver, Alexander (2003)
Combinatorial optimization: Polyhedra and efficiency, Springer-Verlag. MR:1956924
Schur, I. (1916)
Über die Kongruenz 𝑥^𝑚 + 𝑦^𝑚 ≡ 𝑧^𝑚 (mod 𝑝), Jber. Deutsch. Math.-Verein 25, 114–116.
Schur, J. (1907)
Untersuchungen über die Darstellung der endlichen Gruppen durch gebrochene lineare Substitutionen,
J. Reine Angew. Math. 132, 85–137. MR:1580715
Serre, Jean-Pierre (1977)
Linear representations of finite groups, Springer-Verlag. MR:0450380
Sheffer, Adam (2022)
Polynomial methods and incidence theory, Cambridge University Press. MR:4394303
Shkredov, I. D. (2006)
On a generalization of Szemerédi’s theorem, Proc. Lond. Math. Soc. 93, 723–760. MR:2266965
Sidorenko, A. F. (1991)
Inequalities for functionals generated by bipartite graphs, Diskret. Mat. 3, 50–65. MR:1138091
Sidorenko, Alexander (1993)
A correlation inequality for bipartite graphs, Graphs Combin. 9, 201–204. MR:1225933
Simonovits, M. (1974)
Extremal graph problems with symmetrical extremal graphs. Additional chromatic conditions, Discrete
Math. 7, 349–376. MR:337690
Singleton, Robert (1966)
On minimal graphs of maximum even girth, J. Combinatorial Theory 1, 306–332. MR:201347
Skokan, Jozef & Thoma, Lubos (2004)
Bipartite subgraphs and quasi-randomness, Graphs Combin. 20, 255–262. MR:2080111
Solymosi, József (2003)
Note on a generalization of Roth’s theorem, Discrete and computational geometry, Springer, pp. 825–827.
MR:2038505
Solymosi, József (2009)
Bounding multiplicative energy by the sumset, Adv. Math. 222, 402–408. MR:2538014
Soundararajan, K. (2007)
Additive combinatorics, online lecture notes, http://math.stanford.edu/~ksound/Notes.pdf.
Spielman, Daniel A. (2019)
Spectral and algebraic graph theory, textbook draft, http://cs-www.cs.yale.edu/homes/
spielman/sagt/.
Stein, Elias M. & Shakarchi, Rami (2003)
Fourier analysis: An introduction, Princeton University Press. MR:1970295
Sudakov, B., Szemerédi, E., & Vu, V. H. (2005)
On a question of Erdős and Moser, Duke Math. J. 129, 129–155. MR:2155059
Szegedy, Balázs (2015)
An information theoretic approach to Sidorenko’s conjecture. arXiv:1406.6738
Székely, László A. (1997)
Crossing numbers and hard Erdős problems in discrete geometry, Combin. Probab. Comput. 6, 353–358.
MR:1464571
Szemerédi, E. (1975)
On sets of integers containing no 𝑘 elements in arithmetic progression, Acta Arith. 27, 199–245.
MR:369312
Szemerédi, Endre & Trotter, William T., Jr. (1983)
Extremal problems in discrete geometry, Combinatorica 3, 381–392. MR:729791
Tao, Terence (2006)
A variant of the hypergraph removal lemma, J. Combin. Theory Ser. A 113, 1257–1280. MR:2259060
Tao, Terence (2007a)
Structure and randomness in combinatorics, 48th Annual IEEE Symposium on Foundations of Computer
Science (FOCS’07), pp. 3–15.
Tao, Terence (2007b)
The dichotomy between structure and randomness, arithmetic progressions, and the primes, International
Congress of Mathematicians. Vol. I, European Mathematical Society, pp. 581–608. MR:2334204
Wolf, J. (2015)
Finite field models in arithmetic combinatorics—ten years on, Finite Fields Appl. 32, 233–274.
MR:3293412
Wolf, Julia (2013)
Arithmetic and polynomial progressions in the primes [after Gowers, Green, Tao and Ziegler], Astérisque
352, 389–427. MR:3087352
Zarankiewicz, K. (1951)
Problem 101, Colloq. Math. 2, 201.
Zhao, Yufei (2010)
The number of independent sets in a regular graph, Combin. Probab. Comput. 19, 315–320. MR:2593625
Zhao, Yufei (2014)
An arithmetic transference proof of a relative Szemerédi theorem, Math. Proc. Cambridge Philos. Soc.
156, 255–261. MR:3177868
Zhao, Yufei (2017)
Extremal regular graphs: independent sets and graph homomorphisms, Amer. Math. Monthly 124,
827–843. MR:3722040
Index