Graph Theory and Additive Combinatorics
Yufei Zhao
Contents
Preface
Notation and Conventions
1 Forbidding a Subgraph
1.1 Forbidding a Triangle: Mantel’s Theorem
1.2 Forbidding a Clique: Turán’s Theorem
1.3 Turán Density and Supersaturation
1.4 Forbidding a Complete Bipartite Graph: Kővári–Sós–Turán Theorem
1.5 Forbidding a General Subgraph: Erdős–Stone–Simonovits Theorem
1.6 Forbidding a Cycle
1.7 Forbidding a Sparse Bipartite Graph: Dependent Random Choice
1.8 Lower Bound Constructions: Overview
1.9 Randomized Constructions
1.10 Algebraic Constructions
1.11 Randomized Algebraic Constructions
3 Pseudorandom Graphs
3.1 Quasirandom Graphs
3.2 Expander Mixing Lemma
References
Index
Preface
Lecture videos
A complete set of video lectures from my Fall 2019 class is available for free through MIT
OpenCourseWare and YouTube (search for Graph Theory and Additive Combinatorics and
MIT OCW). The lecture videos are a useful resource and complement this book.
A core thread throughout the book is the connection bridging graph theory and additive
combinatorics. The book opens with Schur’s theorem, which is an early example whose proof
illustrates this connection. Graph theoretic perspectives are presented throughout the book.
Here are some of the topics and questions considered in this book:
Chapter 1: Forbidding a subgraph
What is the maximum number of edges in a triangle-free graph on 𝑛 vertices? What if we instead forbid some other subgraph? This is known as the Turán problem.
Chapter 2: Graph regularity method
Szemerédi introduced this powerful tool that provides an approximate structural description for every large graph.
Chapter 3: Pseudorandom graphs
What does it mean for some graph to resemble a random graph?
Chapter 4: Graph limits
In what sense can a sequence of graphs, increasing in size, converge to some limit object?
Chapter 5: Graph homomorphism inequalities
What are possible relationships between subgraph densities?
Chapter 6: Forbidding a 3-term arithmetic progression
Roth’s theorem and Fourier analysis in additive combinatorics.
Chapter 7: Structure of set addition
What can one say about a set of integers 𝐴 with small sumset 𝐴 + 𝐴 = {𝑎 + 𝑏 : 𝑎, 𝑏 ∈ 𝐴}?
Freiman’s theorem is a foundational result that gives an answer.
Chapter 8: Sum-product problem
Can a set 𝐴 simultaneously have both a small sumset 𝐴 + 𝐴 and a small product set 𝐴 · 𝐴?
Chapter 9: Progressions in sparse pseudorandom sets
Key ideas in the proof of the Green–Tao theorem. How can we apply a dense setting result,
namely Szemerédi’s theorem, to a sparse set?
For a more detailed list of topics, see the highlights and summary boxes at the beginning
and the end of each chapter.
The book is roughly divided into two parts, with graph theory the focus of Chapters 1 to 5
and additive combinatorics the focus of Chapters 6 to 9. These are not disjoint and separate
subjects. Rather, graph theory and additive combinatorics are interleaved throughout the
book. We emphasize their interactions. Each chapter can be enjoyed independently as there
are very few dependencies between chapters, though one gets the most out of the book by
appreciating the connections.
When short on time, readers and instructors may skip the more technical or advanced topics and proofs, including: (Chapter 1) the proofs of the
Erdős–Stone–Simonovits theorem, the 𝐾𝑠,𝑡 construction, randomized algebraic construction;
(Chapter 2) the proof of the graph counting lemma, induced graph removal and strong
regularity, hypergraph regularity and removal; (Chapter 3) quasirandom groups, quasirandom
Cayley graphs; (Chapter 4) most technical proofs on graph limits; (Chapter 5) Hölder, entropy;
(Chapter 6) arithmetic regularity and popular common difference; (Chapter 7) proofs in the
later part of the chapter if short on time; (Chapter 9) proof details.
For a class focused on one part of the book, one may wish to explore further topics as
suggested in Further Reading at the end of each chapter.
Prerequisites
The prerequisites are minimal—primarily mathematical maturity and an interest in combina-
torics. Some basic concepts from abstract algebra, analysis, and linear algebra are assumed.
Exercises
The book contains around 150 carefully selected exercises. They are scattered throughout
each chapter. Some exercises are embedded in the middle of a section—these exercises are
meant as routine tests of understanding of the concepts just discussed. For example, they
sometimes ask you to fill in missing proof details or think about easy generalizations and
extensions. The exercises at the end of each section are carefully selected problems that
reinforce the techniques discussed in the chapter. Hopefully they are all interesting. Most
of them are intended to test your mastery of the techniques taught in the chapter. Many of
these end-of-chapter exercises are quite challenging, with starred problems intended to be
more difficult but still do-able by a strong student given the techniques taught. Many of these
exercises are adapted from lemmas and results from research papers (I apologize for omitting
references for the exercises, so that they can be used as homework assignments).
Spending time with the exercises is essential for mastering the techniques. I used many
of these exercises in my classes. My students often told me that they thought that they
had understood the material after a lecture, only to discover their incomplete mastery when
confronted with the exercises. Struggling with these exercises led them to newfound insight.
Further reading
This is a massive and rapidly expanding subject. The book is intended to be introductory
and enticing rather than comprehensive. Each chapter concludes with recommendations for
further reading for anyone who wishes to learn more. Additionally, references are given
generously throughout the text for anyone who wishes to dive deeper and read the original
sources.
Acknowledgements
I thank all my teachers and mentors who have taught me the subject starting from when
I was a graduate student, with a special shoutout to my PhD advisor Jacob Fox for his
dedicated mentorship. I first encountered this subject at the University of Cambridge, when
I took a Part III class on extremal graph theory taught by David Conlon. Over the years,
I learned a lot from various researchers thanks to their carefully and insightfully written
lecture notes scattered on the web, in particular by David Conlon, Tim Gowers, Andrew
Granville, Ben Green, Choongbum Lee, László Lovász, Imre Ruzsa, Asaf Shapira, Adam
Sheffer, K. Soundararajan, Terry Tao, and Jacques Verstraete.
This book arose from a one-semester course that I taught at MIT in Fall 2017, 2019,
and 2021. I thank all my amazing and dedicated students who kept their interest in my
teaching — they were instrumental in motivating me to complete this book project. Students
from the 2017 and 2019 classes took notes based on my lectures, which I subsequently
rewrote and revised into this book. My 2021 class used an early draft of this book and gave
valuable comments and feedback. There are many students whom I wish to thank, and here
is my attempt at listing them (my apologies to anyone whose name I inadvertently omitted):
Dhroova Aiylam, Ganesh Ajjanagadde, Shyan Akmal, Ryan Alweiss, Morris Ang Jie Jun,
Adam Ardeishar, Matt Babbitt, Yonah Borns-Weil, Matthew Brennan, Brynmor Chapman,
Evan Chen, Byron Chin, Ahmed Chowdhury Zawad, Anlong Chua, Travis Dillon, Jonathan
Figueroa Rodriguez, Christian Gaetz, Shengwen Gan, Jiyang Gao, Yibo Gao, Swapnil Garg,
Benjamin Gunby, Meghal Gupta, Kaarel Haenni, Milan Haiman, Linus Hamilton, Carina
Hong Letong, Vishesh Jain, Pakawut Jiradilok, Sujay Kazi, Younhun Kim, Elena Kim,
Dain Kim, Yael Kirkpatrick, Daishi Kiyohara, Frederic Koehler, Keiran Lewellen, Anqi Li,
Jerry Li, Allen Liu, Michael Ma, Nitya Mani, Olga Medrano, Holden Mui, Eshaan Nichani,
Yuchong Pan, Minjae Park, Alan Peng, Saranesh Prembabu, Michael Ren, Dhruv Rohatgi,
Diego Roque, Ashwin Sah, Maya Sankar, Mehtaab Sawhney, Carl Schildkraut, Tristan Shin,
Mihir Singhal, Tomasz Slusarczyk, Albert Soh, Kevin Sun, Sarah Tammen, Jonathan Tidor,
Paxton Turner, Danielle Wang, Hong Wang, Nicole Wein, Jake Wellens, Chris Xu, Max
Wenqiang Xu, Yinzhan Xu, Zixuan Xu, Lisa Yang, Yuan Yao, Richard Yi, Hung-Hsun Yu,
Lingxian Zhang, Kai Zheng, Yunkun Zhou. Additionally, I would like to thank Thomas
Bloom and Zilin Jiang for carefully reading the book draft and sending in many suggestions
for corrections and improvements.
The title page illustration (with the bridge) was drawn by my friend Anne Ma.
I also wish to acknowledge research funding support during the writing of this book,
including from the National Science Foundation, the Sloan Research Foundation, as well
as support from MIT including the Solomon Buchsbaum Research Fund, the Class of 1956
Career Development Professorship, and the Edmund F. Kelly Research Award.
Finally, I am grateful to all my students, colleagues, friends, and family for their encour-
agement throughout the writing of the book, and most importantly to Lu for her unwavering
support through the whole process, especially in the late stages of the book writing, which
coincided with the arrival of our baby daughter Andi.
Yufei Zhao
Cambridge, MA
February 2022
http://yufeizhao.com/
yufeiz@mit.edu
Notation and Conventions
We use standard notation in this book. The comments here are mostly for clarification. You
should skip this section and return to it only as needed.
Sets
We write [𝑵] := {1, 2, . . . , 𝑁 }. Also N := {1, 2, . . . }.
Given a finite set 𝑆 and a positive integer 𝑟, we write \binom{𝑺}{𝒓} for the set of 𝑟-element subsets of 𝑆.
If 𝑆 is a finite set and 𝑓 is a function on 𝑆, we use the expectation notation E_{𝒙∈𝑺} 𝒇(𝒙), or more simply E_𝒙 𝒇(𝒙) (or even E 𝒇 if there is no confusion), to mean the average |𝑆|^{−1} \sum_{𝑥∈𝑆} 𝑓(𝑥).
We also use the symbol E for its usual meaning as the expectation for some random variable.
Graphs
We write a graph as 𝑮 = (𝑽, 𝑬), where 𝑉 is a finite set of vertices, and 𝐸 is the set of edges. Each edge is an unordered pair of distinct vertices. Formally, 𝐸 ⊆ \binom{𝑉}{2}.
Given a graph 𝐺, we write 𝑽 (𝑮) for the set of vertices, and 𝑬 (𝑮) for the set of edges,
and denote their cardinalities by 𝒗(𝑮) := |𝑉 (𝐺)| and 𝒆(𝑮) := |𝐸 (𝐺)|.
In a graph 𝐺, the neighborhood of a vertex 𝑥, denoted 𝑵𝑮 (𝒙) (or simply 𝑁 (𝑥) if there is
no confusion), is the set of vertices 𝑦 such that 𝑥𝑦 is an edge. The degree of 𝑥 is the number
of neighbors of 𝑥, denoted deg𝑮 (𝒙) := |𝑁𝐺 (𝑥)| (or simply written as deg(𝒙)).
Given a graph 𝐺, for each 𝐴 ⊆ 𝑉 (𝐺), we write 𝒆(𝑨) to denote the number of edges with
both endpoints in 𝐴. Given 𝐴, 𝐵 ⊆ 𝑉 (𝐺) (not necessarily disjoint), we write
𝒆(𝑨, 𝑩) := |{(𝑎, 𝑏) ∈ 𝐴 × 𝐵 : 𝑎𝑏 ∈ 𝐸 (𝐺)}| .
Note that when 𝐴 and 𝐵 are disjoint, 𝑒( 𝐴, 𝐵) is the number of the edges between 𝐴 and 𝐵.
On the other hand, 𝑒( 𝐴, 𝐴) = 2𝑒( 𝐴) as each edge within 𝐴 is counted twice.
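As a quick illustration of these counts, here is a minimal Python sketch (not part of the book; the small example graph is made up):

```python
from itertools import combinations

# A small example graph on vertices 1..5, given by its edge set.
edges = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)}

def is_edge(u, v):
    return (u, v) in edges or (v, u) in edges

def e_within(A):
    # e(A): number of edges with both endpoints in A
    return sum(1 for u, v in combinations(A, 2) if is_edge(u, v))

def e_between(A, B):
    # e(A, B): ordered pairs (a, b) in A x B with ab an edge
    return sum(1 for a in A for b in B if a != b and is_edge(a, b))

A, B = {1, 2, 3}, {3, 4, 5}
print(e_within(A))                         # 3
print(e_between(A, B))                     # edges between A and B, with multiplicity
print(e_between(A, A) == 2 * e_within(A))  # True: e(A, A) = 2 e(A)
```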
Here are some standard graphs:
• 𝑲𝒓 is the complete graph on 𝑟 vertices, also known as an 𝒓-clique;
• 𝑲𝒔,𝒕 is the complete bipartite graph with 𝑠 vertices in one vertex part and 𝑡 vertices in
the other vertex part;
• 𝑲𝒓 ,𝒔,𝒕 is a complete tripartite graph with vertex parts having sizes 𝑟, 𝑠, 𝑡 respectively
(e.g., 𝐾1,1,1 = 𝐾3 ); and so on analogously for complete multipartite graphs with more
parts;
• 𝑪ℓ (ℓ ≥ 3) is a cycle with ℓ vertices and ℓ edges.
Some examples are shown below.
[Figure: the graphs 𝐾4 , 𝐾3,2 , 𝐾3,2,2 , and 𝐶8 .]
Given two graphs 𝐻 and 𝐺, we say that 𝐻 is a subgraph of 𝐺 if one can delete some
vertices and edges from 𝐺 to obtain a graph isomorphic to 𝐻 (example below). A copy of
𝐻 in 𝐺 is a subgraph of 𝐺 that is isomorphic to 𝐻. A labeled copy of 𝐻 in 𝐺 is a subgraph of
𝐺 isomorphic to 𝐻 where we also specify the isomorphism from 𝐻. Equivalently, a labeled
copy of 𝐻 in 𝐺 is an injective graph homomorphism from 𝐻 to 𝐺. For example, if 𝐺 has 𝑞
copies of 𝐾3 , then 𝐺 has 6𝑞 labeled copies of 𝐾3 .
We say that 𝐻 is an induced subgraph of 𝐺 if one can delete some vertices of 𝐺 (when
we delete a vertex, we also remove all edges incident to the vertex) to obtain 𝐻—note that
in particular we are not allowed to remove additional edges other than those incident to a
deleted vertex. If 𝑆 ⊆ 𝑉 (𝐺), we write 𝑮[𝑺] to denote the subgraph of 𝐺 induced by the
vertex set 𝑆, i.e., 𝐺 [𝑆] is the subgraph with vertex set 𝑆 and keeping all the edges from 𝐺
among 𝑆.
As an example, the following graph contains the 4-cycle as an induced subgraph. It contains
the 5-cycle as a subgraph but not as an induced subgraph.
In this book, when we say 𝑯-free, we always mean not containing 𝐻 as a subgraph. On
the other hand, we say induced 𝑯-free to mean not containing 𝐻 as an induced subgraph.
Given two graphs 𝐹 and 𝐺, a graph homomorphism is a map 𝜙 : 𝑉 (𝐹) → 𝑉 (𝐺) (not
necessarily injective) such that 𝜙(𝑢)𝜙(𝑣) ∈ 𝐸 (𝐺) whenever 𝑢𝑣 ∈ 𝐸 (𝐹). In other words, 𝜙 is
a map of vertices that sends edges to edges. A key difference between a copy of 𝐹 in 𝐺 and
a graph homomorphism from 𝐹 to 𝐺 is that the latter does not have to be an injective map
of vertices.
The chromatic number 𝝌(𝑮) of a graph 𝐺 is the smallest number of colors needed to
color the vertices of 𝐺 so that no two adjacent vertices receive the same color (such a
coloring is called a proper coloring).
The adjacency matrix of a graph 𝐺 = (𝑉, 𝐸) is a 𝑣(𝐺) × 𝑣(𝐺) matrix whose rows and
columns both are indexed by 𝑉, and such that the entry indexed by (𝑢, 𝑣) ∈ 𝑉 × 𝑉 is 1 if
𝑢𝑣 ∈ 𝐸 and 0 if 𝑢𝑣 ∉ 𝐸.
An 𝒓-uniform hypergraph (also called an 𝒓-graph for short) consists of a finite vertex set 𝑉 along with an edge set 𝐸 ⊆ \binom{𝑉}{𝑟}. Each edge of the 𝑟-graph is an 𝑟-element subset of vertices.
Asymptotics
We use the following standard asymptotic notation. Given nonnegative quantities 𝑓 and 𝑔, in
each item below, the various notations have the same meaning (as some parameter, usually
𝑛, tends to infinity)
• 𝒇 ≲ 𝒈, 𝒇 = 𝑶 (𝒈), 𝒈 = 𝛀( 𝒇 ), 𝑓 ≤ 𝐶𝑔 for some constant 𝐶 > 0
• 𝒇 = 𝒐(𝒈), 𝑓 /𝑔 → 0
• 𝒇 = 𝚯(𝒈), 𝒇 ≍ 𝒈, 𝑔 ≲ 𝑓 ≲ 𝑔
• 𝒇 ∼ 𝒈, 𝑓 = (1 + 𝑜(1))𝑔
Subscripts (e.g., 𝑶 𝒔 ( ), ≲𝒔 ) are used to emphasize that the hidden constants may depend
on the subscripted parameters. For example, 𝑓 (𝑠, 𝑥) ≲𝑠 𝑔(𝑠, 𝑥) means that for every 𝑠 there
is some constant 𝐶𝑠 so that 𝑓 (𝑠, 𝑥) ≤ 𝐶𝑠 𝑔(𝑠, 𝑥) for all 𝑥.
We avoid using ≪ since this notation carries different meanings in different communities
and by different authors. In analytic number theory, 𝑓 ≪ 𝑔 is standard for 𝑓 = 𝑂 (𝑔) (this
is called Vinogradov notation). In combinatorics and probability, 𝑓 ≪ 𝑔 sometimes means
𝑓 = 𝑜(𝑔), and sometimes means that 𝑓 is sufficiently small depending on 𝑔.
When asymptotic notation is used in the hypothesis of a statement, it should be interpreted
as being applied to a sequence rather than a single object. For example, given functions 𝑓
and 𝑔, we write
if 𝑓 (𝐺) = 𝑜(1), then 𝑔(𝐺) = 𝑜(1)
to mean
whenever a sequence 𝐺 𝑛 satisfies 𝑓 (𝐺 𝑛 ) = 𝑜(1), then 𝑔(𝐺 𝑛 ) = 𝑜(1),
which is also equivalent to
for every 𝜀 > 0 there is some 𝛿 > 0 such that if | 𝑓 (𝐺)| ≤ 𝛿 then |𝑔(𝐺)| ≤ 𝜀.
0
Appetizer: Triangles and Equations
Chapter Highlights
• Schur’s theorem on monochromatic solutions to 𝑥 + 𝑦 = 𝑧 and its graph theoretic proof
• Problems and results on progressions (e.g., Szemerédi’s theorem, the Green–Tao theorem)
• Introduction to the connection between graph theory and additive combinatorics
The finitary formulation leads to quantitative questions. For example, how large does 𝑁 (𝑟)
have to be as a function of 𝑟? Questions of this type are often quite difficult to resolve, even
approximately. There are lots of open questions concerning quantitative bounds.
Proof that the above two formulations of Schur’s theorem are equivalent. First, the finitary
version (Theorem 0.1.2) of Schur’s theorem easily implies the infinitary version (Theorem 0.1.1). Indeed, in the infinitary version, given a coloring of the positive integers, we
can consider the colorings of the first 𝑁 (𝑟) integers and use the finitary statement to find a
monochromatic solution.
To prove that the infinitary version implies the finitary version, we use a diagonalization
argument. Fix 𝑟, and suppose that for every 𝑁 there is some coloring 𝜙 𝑁 : [𝑁] → [𝑟] that
avoids monochromatic solutions to 𝑥 + 𝑦 = 𝑧. We can take an infinite subsequence of (𝜙 𝑁 )
such that, for every 𝑘 ∈ N, the value of 𝜙 𝑁 (𝑘) stabilizes to a constant as 𝑁 increases along this
subsequence (we can do this by repeatedly restricting to convergent infinite subsequences).
Then the 𝜙 𝑁 ’s, along this subsequence, converge pointwise to some coloring 𝜙 : N → [𝑟]
avoiding monochromatic solutions to 𝑥 + 𝑦 = 𝑧, but 𝜙 contradicts the infinitary statement. □
Proof assuming Schur’s theorem (Theorem 0.1.2). Let (Z/𝑝Z)^× denote the group of nonzero residues mod 𝑝 under multiplication. Let 𝐻 = {𝑥^𝑛 : 𝑥 ∈ (Z/𝑝Z)^×} be the subgroup of 𝑛-th powers in (Z/𝑝Z)^×. Since (Z/𝑝Z)^× is a cyclic group of order 𝑝 − 1 (due to the existence of primitive roots mod 𝑝, a fact from elementary number theory), the index of 𝐻 in (Z/𝑝Z)^× is equal to gcd(𝑛, 𝑝 − 1) ≤ 𝑛. So the cosets of 𝐻 partition {1, 2, . . . , 𝑝 − 1} into ≤ 𝑛 sets.
Viewing each of the ≤ 𝑛 cosets of 𝐻 as a “color”, by the finitary statement of Schur’s theorem (Theorem 0.1.2), for 𝑝 large enough as a function of 𝑛, there exists a solution to
𝑥 + 𝑦 = 𝑧 in Z
in some coset of 𝐻, say 𝑥, 𝑦, 𝑧 ∈ 𝑎𝐻 for some 𝑎 ∈ (Z/𝑝Z)^×. Since 𝐻 consists of 𝑛-th powers, we have 𝑥 = 𝑎𝑋^𝑛, 𝑦 = 𝑎𝑌^𝑛, and 𝑧 = 𝑎𝑍^𝑛 for some 𝑋, 𝑌, 𝑍 ∈ (Z/𝑝Z)^×. Thus
𝑎𝑋^𝑛 + 𝑎𝑌^𝑛 ≡ 𝑎𝑍^𝑛 (mod 𝑝).
Since 𝑎 ∈ (Z/𝑝Z)^× is invertible mod 𝑝, we have 𝑋^𝑛 + 𝑌^𝑛 ≡ 𝑍^𝑛 (mod 𝑝) as desired. □
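As a quick numerical sanity check of this corollary, here is a minimal Python sketch (not part of the text; the exponent 𝑛 = 3 and the particular primes tested are chosen only for illustration):

```python
def has_fermat_solution_mod_p(n, p):
    """Search for nonzero X, Y, Z with X^n + Y^n = Z^n (mod p)."""
    nth_powers = {pow(x, n, p) for x in range(1, p)}  # the subgroup H of n-th powers
    return any((a + b) % p in nth_powers for a in nth_powers for b in nth_powers)

# The corollary says: for fixed n, every large enough prime p admits a solution.
for p in [7, 13, 31, 61, 101, 103]:
    print(p, has_fermat_solution_mod_p(3, p))  # a few small primes may print False
```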
Ramsey’s theorem
Now let us prove Schur’s theorem (Theorem 0.1.2) by deducing it from an analogous result
about edge-coloring of a complete graph.
We write 𝐾 𝑁 for the complete graph on 𝑁 vertices.
Proof. Define
𝑁1 = 3, and 𝑁𝑟 = 𝑟(𝑁_{𝑟−1} − 1) + 2 for all 𝑟 ≥ 2. (0.1)
We show by induction on 𝑟 that every coloring of the edges of 𝐾_{𝑁𝑟} by 𝑟 colors has a monochromatic triangle. The case 𝑟 = 1 holds trivially.
Suppose the claim is true for 𝑟 − 1 colors. Consider any edge-coloring of 𝐾_{𝑁𝑟} using 𝑟 colors. Pick an arbitrary vertex 𝑣. Of the 𝑁𝑟 − 1 = 𝑟(𝑁_{𝑟−1} − 1) + 1 edges incident to 𝑣, by the pigeonhole principle, at least 𝑁_{𝑟−1} edges incident to 𝑣 have the same color, say red. Let
𝑉0 be the vertices joined to 𝑣 by a red edge.
[Figure: the vertex 𝑣 and its red neighborhood 𝑉0 .]
If there is a red edge inside 𝑉0 , we obtain a red triangle. Otherwise, at most 𝑟 − 1 colors appear on the edges inside 𝑉0 , which has |𝑉0 | ≥ 𝑁_{𝑟−1} vertices, so we have a monochromatic triangle inside 𝑉0 by the induction hypothesis. □
Exercise 0.1.5. Show that 𝑁𝑟 from (0.1) satisfies 𝑁𝑟 = 1 + 𝑟! \sum_{𝑖=0}^{𝑟} 1/𝑖! = ⌈𝑟!𝑒⌉.
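A quick numerical check of the two expressions in Exercise 0.1.5 (a small Python sketch, not a proof; the range of 𝑟 is arbitrary):

```python
from math import factorial

def N_recursive(r):
    # N_1 = 3, and N_r = r (N_{r-1} - 1) + 2 for r >= 2, as in (0.1)
    N = 3
    for k in range(2, r + 1):
        N = k * (N - 1) + 2
    return N

def N_closed(r):
    # 1 + r! * sum_{i=0}^{r} 1/i!, computed exactly with integer arithmetic
    return 1 + sum(factorial(r) // factorial(i) for i in range(r + 1))

for r in range(1, 8):
    print(r, N_recursive(r), N_closed(r))  # the last two columns agree
```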
Proof. Label the vertices by elements of {0, 1}^𝑟 . Assign an edge color 𝑖 if 𝑖 is the smallest index such that the two endpoint vertices differ on coordinate 𝑖. This coloring does not have monochromatic triangles. Indeed, if 𝑥, 𝑦, 𝑧 formed a monochromatic triangle with color 𝑖, then 𝑥_𝑖 , 𝑦_𝑖 , 𝑧_𝑖 ∈ {0, 1} would have to be pairwise distinct, which is impossible. □
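The coloring in this proof is easy to verify by computer for small 𝑟 (a minimal Python sketch, not part of the text):

```python
from itertools import combinations, product

def color(x, y):
    # color of edge xy: the smallest index where x and y differ
    return next(i for i in range(len(x)) if x[i] != y[i])

def has_monochromatic_triangle(r):
    vertices = list(product([0, 1], repeat=r))  # 2^r vertices labeled by {0,1}^r
    for x, y, z in combinations(vertices, 3):
        if color(x, y) == color(y, z) == color(x, z):
            return True
    return False

for r in range(1, 5):
    print(r, has_monochromatic_triangle(r))  # always False, as the proof shows
```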
Schur (1916) had actually given an even better lower bound: see Exercise 0.1.14. One of
Erdős’ favorite problems asks whether there is an exponential upper bound. This is a major open problem in Ramsey theory, and it is related to other important topics in combinatorics
such as the Shannon capacity of graphs (see, e.g., the survey by Nešetřil & Rosenfeld 2001).
Open problem 0.1.13 (Multicolor triangle Ramsey numbers: exponential upper bound)
Is there a constant 𝐶 > 0 so that if 𝑁 ≥ 𝐶 𝑟 , then every edge-coloring of 𝐾 𝑁 using 𝑟
colors contains a monochromatic triangle?
Exercise 0.1.15 (Upper bound on Ramsey numbers). Let 𝑠 and 𝑡 be positive integers. Show that if the edges of a complete graph on \binom{𝑠+𝑡−2}{𝑠−1} vertices are colored with red and blue, then there must be either a red 𝐾𝑠 or a blue 𝐾𝑡 .
0.2 Progressions
Additive combinatorics describes a rapidly growing body of mathematics motivated by
simple-to-state questions about addition and multiplication of integers (the name “additive
combinatorics” became popular in the 2000’s, when the field witnessed a rapid explosion
thanks to the groundbreaking works of Gowers, Green, Tao, and others; previously the area
was more commonly known as “combinatorial number theory”). The problems and methods
in additive combinatorics are deep and far-reaching, connecting many different areas of
mathematics such as graph theory, harmonic analysis, ergodic theory, discrete geometry, and
model theory.
Here we highlight some important developments in additive combinatorics, particularly
concerning progressions. The ideas behind these developments form some of the core themes
of this book.
Note that having arbitrarily long arithmetic progressions is very different from having
infinitely long arithmetic progressions, as seen in the next exercise.
Exercise 0.2.2. Show that Z may be colored using two colors so that it contains no infinitely long monochromatic arithmetic progressions.
Erdős & Turán (1936) conjectured a stronger statement, that any subset of the integers
with positive density contains arbitrarily long arithmetic progressions. This conjecture was famously proved by Szemerédi (1975), and the result is now known as Szemerédi’s theorem.
Szemerédi’s theorem is deep and intricate. This important work led to many subsequent
developments in additive combinatorics. Several different proofs of Szemerédi’s theorem
have since been discovered, and some of them have blossomed into rich areas of mathematical
research. Here are some of the most influential modern proofs of Szemerédi’s theorem (in
historical order):
• The ergodic theoretic approach by Furstenberg (1977);
• Higher-order Fourier analysis by Gowers (2001);
• The hypergraph regularity method, developed independently by Rödl et al. (2005) and Gowers (2001).
Another modern proof of Szemerédi’s theorem results from the density Hales–Jewett
theorem, which was originally proved by Furstenberg & Katznelson (1978) using ergodic
theory. Subsequently a new combinatorial proof was found in the first successful Polymath
Project (Polymath 2012), an online collaborative project initiated by Gowers.
Each approach has its own advantages and disadvantages. For example, the ergodic
approach led to multidimensional and polynomial generalizations of Szemerédi’s theorem,
which we discuss below. On the other hand, the ergodic approach does not give any concrete
quantitative bounds. Fourier analysis and its generalizations produce the best quantitative
bounds to Szemerédi’s theorem. They also led to deep results about counting patterns in the
prime numbers. However, there appear to be difficulties and obstructions to extending Fourier analysis to higher dimensions.
The relationships between these different approaches to Szemerédi’s theorem are not yet fully understood.
For example, the theorem implies that every subset of Z² of positive upper density contains a 𝑘 × 𝑘 axis-aligned square grid for every 𝑘.
There is also a polynomial extension of Szemerédi’s theorem. Let us first state a special
case, originally conjectured by Lovász and proved independently by Furstenberg (1977) and
Sárkőzy (1978).
In other words, the set always contains {𝑥, 𝑥 + 𝑦²} for some 𝑥 ∈ Z and 𝑦 ∈ Z_{>0}. What
about other polynomial patterns? The following polynomial generalization was proved by
Bergelson & Leibman (1996).
In a landmark work, Green and Tao proved that the prime numbers contain arbitrarily long arithmetic progressions. Their theorem is considered one of the most celebrated mathematical achievements of this century.
We will discuss the Green–Tao theorem in Chapter 9. The theorem has been extended
to polynomial progressions (Tao & Ziegler 2008) and to higher dimensions (Tao & Ziegler
2015; also see Fox & Zhao 2015).
Roth originally proved his result using Fourier analysis (also called the Hardy–Littlewood
circle method in this context). We will see Roth’s proof in Chapter 6.
In the 1970’s, Szemerédi developed the graph regularity method. It is now a central
technique in extremal graph theory. Ruzsa & Szemerédi (1978) used the graph regularity
method to give a new graph theoretic proof of Roth’s theorem. We will see this proof as well
as other applications of the graph regularity method in Chapter 2.
Extremal graph theory, broadly speaking, concerns questions of the form: what is the
maximum (or minimum) possible number of some structure in a graph with certain prescribed
properties? A starting point (historically and also pedagogically) in extremal graph theory is
the following question:
Question 0.3.2
What is the maximum possible number of edges in an 𝑛-vertex triangle-free graph?
This question has a relatively simple answer, and it will be the first topic in the next
chapter. We will then explore related questions about the maximum number of edges in a
graph without some given subgraph.
Although Question 0.3.2 above sounds similar to Roth’s theorem, it does not actually allow
us to deduce Roth’s theorem. Instead, we need to consider the following question.
Question 0.3.3
What is the maximum number of edges in an 𝑛-vertex graph where every edge is contained
in a unique triangle?
The graph regularity method illustrates the dichotomy of structure and pseudorandomness
in graph theory. Some of the later chapters dive further into related concepts. Chapter 3
explores pseudorandom graphs—what does it mean for a graph to look random? Chapter 4
concerns graph limits, a convenient analytic language for capturing many important concepts in earlier chapters. Chapter 5 explores graph homomorphism inequalities, revisiting
questions from extremal graph theory with an analytic lens.
And then we switch gears (but not entirely) to some core topics in additive combinatorics.
Chapter 6 contains the Fourier analytic proof of Roth’s theorem. There will be many
thematic similarities between elements of the Fourier analytic proof and earlier topics.
Chapter 7 explores the structure of set addition. Here we prove Freiman’s theorem on sets
with small additive doubling, a cornerstone result in additive combinatorics. It also plays a
key role in Gowers’ proof of Szemerédi’s theorem, generalizing Fourier analysis to higher
order Fourier analysis, although we will not go into the latter topic in this book (see Further
Reading at the end of Chapter 7). In Chapter 8, we explore the sum-product problem,
which is closely connected to incidence geometry (and we will see another graph theoretic
proof there). In Chapter 9, we discuss the Green–Tao theorem and prove an extension of
Szemerédi’s theorem to sparse pseudorandom sets, which plays a central role in the proof of
the Green–Tao theorem.
I hope that you will enjoy this book. I have been studying this subject since I began
graduate school. I still think about these topics nearly every day. My goal is to organize and
distill the beautiful mathematics in this field as a friendly introduction.
The chapters have some logical dependencies, but not many. Each topic can be studied and enjoyed on its own, though you will gain a lot more by appreciating the overall themes and connections.
There is still a lot that we do not know. Perhaps you too will be intrigued by the boundless
open questions that are still waiting to be explored.
Further Reading
The book Ramsey Theory by Graham, Rothschild, & Spencer (1990) is a wonderful intro-
duction to the subject. It has beautiful accounts of theorems of Ramsey, van der Waerden,
Hales–Jewett, Schur, Rado, and others, that form the foundation of Ramsey theory.
For a survey of modern developments in additive combinatorics, check out the book review
by Green (2009a) of Additive Combinatorics by Tao & Vu (2006).
Chapter Summary
• Schur’s theorem. Every coloring of N using finitely many colors contains a monochro-
matic solution to 𝑥 + 𝑦 = 𝑧.
– Proof: set up a graph whose triangles correspond to solutions to 𝑥 + 𝑦 = 𝑧, and then
apply Ramsey’s theorem.
• Szemerédi’s theorem. Every subset of N with positive density contains arbitrarily long
arithmetic progressions.
– A foundational result that led to important developments in additive combinatorics.
– Several different proofs, each illustrating the dichotomy of structure and pseudorandomness in a different context.
– Extensions: multidimensional, polynomial, primes (Green–Tao).
1
Forbidding a Subgraph
Chapter Highlights
• Turán problem: determine the maximum number of edges in an 𝑛-vertex 𝐻-free graph
• Mantel and Turán’s theorems: 𝐾𝑟 -free
• Kővári–Sós–Turán theorem: 𝐾 𝑠,𝑡 -free
• Erdős–Stone–Simonovits theorem: 𝐻-free for general 𝐻
• Dependent random choice technique: 𝐻-free for a bounded degree bipartite 𝐻
• Lower bound constructions of 𝐻-free graphs for bipartite 𝐻
• Algebraic constructions: matching lower bounds for 𝐾2,2 , 𝐾3,3 , and 𝐾 𝑠,𝑡 for 𝑡 much larger
than 𝑠, and also for 𝐶4 , 𝐶6 , 𝐶10
• Randomized algebraic constructions
We will see the answer shortly. More generally, we can ask what happens if we replace “triangle” by an arbitrary subgraph. This is a foundational problem in extremal graph theory.
The Turán problem is one of the most basic problems in extremal graph theory. It is named
after Turán for his fundamental work on the subject. Research on this problem has led to
many important techniques. We will see a fairly satisfactory answer to the Turán problem for
non-bipartite graphs 𝐻. We also know the answer for a small number of bipartite graphs 𝐻.
However, for nearly all bipartite graphs 𝐻, much mystery remains.
In the first part of the chapter, we focus on techniques for upper bounding ex(𝑛, 𝐻). In
the last few sections, we turn our attention to lower bounding ex(𝑛, 𝐻) when 𝐻 is a bipartite
graph.
[Figure: the complete bipartite graph 𝐾4,4 .]
The graph 𝐾 ⌊𝑛/2⌋,⌈𝑛/2⌉ has ⌊𝑛/2⌋ ⌈𝑛/2⌉ = ⌊𝑛²/4⌋ edges (one can check this equality by separately considering even and odd 𝑛).
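For instance, if 𝑛 = 2𝑘 then ⌊𝑛/2⌋⌈𝑛/2⌉ = 𝑘² = 𝑛²/4 = ⌊𝑛²/4⌋, while if 𝑛 = 2𝑘 + 1 then ⌊𝑛/2⌋⌈𝑛/2⌉ = 𝑘(𝑘 + 1) = (𝑛² − 1)/4 = ⌊𝑛²/4⌋.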
Mantel (1907) proved that 𝐾 ⌊𝑛/2⌋,⌈𝑛/2⌉ has the greatest number of edges among all 𝑛-vertex triangle-free graphs.
First proof of Mantel’s theorem. Let 𝐺 = (𝑉, 𝐸) be a triangle-free graph with 𝑛 vertices and 𝑚 edges. For every edge 𝑥𝑦, the neighborhoods 𝑁 (𝑥) and 𝑁 (𝑦) must be disjoint, or else a common neighbor of 𝑥 and 𝑦 would form a triangle with them; hence deg 𝑥 + deg 𝑦 ≤ 𝑛. Summing this inequality over all edges gives
\sum_{𝑥𝑦 ∈ 𝐸} (deg 𝑥 + deg 𝑦) ≤ 𝑚𝑛.
[Figure: for an edge 𝑥𝑦, the neighborhoods 𝑁 (𝑥) and 𝑁 (𝑦) are disjoint.]
On the other hand, note that for each vertex 𝑥, the term deg 𝑥 appears once in the above
sum for each edge incident to 𝑥, and so it appears a total of deg 𝑥 times. We then apply the
Cauchy–Schwarz inequality to get
\sum_{𝑥𝑦 ∈ 𝐸} (deg 𝑥 + deg 𝑦) = \sum_{𝑥 ∈ 𝑉} (deg 𝑥)² ≥ \frac{1}{𝑛} \left( \sum_{𝑥 ∈ 𝑉} deg 𝑥 \right)^{2} = \frac{(2𝑚)²}{𝑛}.
Comparing the two inequalities, we obtain (2𝑚)²/𝑛 ≤ 𝑚𝑛, and hence 𝑚 ≤ 𝑛²/4. Since 𝑚 is an integer, we obtain 𝑚 ≤ ⌊𝑛²/4⌋, as claimed. □
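The bound is easy to confirm by brute force for very small 𝑛 (a short Python sketch, not part of the text; it enumerates every graph on 𝑛 ≤ 6 labeled vertices):

```python
from itertools import combinations

def max_triangle_free_edges(n):
    """Brute-force ex(n, K_3) by enumerating all graphs on n labeled vertices."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        adj = {v: set() for v in range(n)}
        for u, v in edges:
            adj[u].add(v); adj[v].add(u)
        # triangle-free means no edge uv has a common neighbor
        if all(not (adj[u] & adj[v]) for u, v in edges):
            best = max(best, len(edges))
    return best

for n in range(2, 7):
    print(n, max_triangle_free_edges(n), n * n // 4)  # the last two columns agree
```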
Second proof of Mantel’s theorem. Let 𝐺 = (𝑉, 𝐸) be a triangle-free graph. Let 𝑣 be a
vertex of maximum degree in 𝐺. Since 𝐺 is triangle-free, the neighborhood 𝑁 (𝑣) of 𝑣 is an
independent set.
[Figure: 𝐴 = 𝑁 (𝑣) is an independent set, and 𝐵 = 𝑉 ∖ 𝐴.]
Every edge of 𝐺 has at least one endpoint in 𝐵 := 𝑉 ∖ 𝑁 (𝑣), since 𝐴 := 𝑁 (𝑣) is an independent set. Moreover, every vertex has degree at most deg 𝑣 = |𝐴|. Hence
𝑒(𝐺) ≤ \sum_{𝑢 ∈ 𝐵} deg 𝑢 ≤ |𝐵| |𝐴| ≤ \left( \frac{|𝐴| + |𝐵|}{2} \right)^{2} = \frac{𝑛²}{4},
where 𝑛 = 𝑣(𝐺), and so 𝑒(𝐺) ≤ ⌊𝑛²/4⌋ since 𝑒(𝐺) is an integer. □
The next several exercises explore extensions of Mantel’s theorem. It is useful to revisit
the proof techniques.
Exercise 1.1.4 (Many triangles). Show that a graph with 𝑛 vertices and 𝑚 edges has at least
\frac{4𝑚}{3𝑛} \left( 𝑚 − \frac{𝑛²}{4} \right)
triangles.
Exercise 1.1.5. Prove that every 𝑛-vertex non-bipartite triangle-free graph has at most (𝑛 − 1)²/4 + 1 edges.
Exercise 1.1.6 (Stability). Let 𝐺 be an 𝑛-vertex triangle-free graph with at least ⌊𝑛²/4⌋ − 𝑘 edges. Prove that 𝐺 can be made bipartite by removing at most 𝑘 edges.
Exercise 1.1.7. Show that every 𝑛-vertex triangle-free graph with minimum degree
greater than 2𝑛/5 is bipartite.
Exercise 1.1.8∗. Prove that every 𝑛-vertex graph with at least ⌊𝑛²/4⌋ + 1 edges contains at least ⌊𝑛/2⌋ triangles.
Exercise 1.1.9∗. Let 𝐺 be an 𝑛-vertex graph with ⌊𝑛²/4⌋ − 𝑘 edges (here 𝑘 ∈ Z) and 𝑡 triangles. Prove that 𝐺 can be made bipartite by removing at most 𝑘 + 6𝑡/𝑛 edges, and that this constant 6 is best possible.
Exercise 1.1.10∗. Prove that every 𝑛-vertex graph with at least ⌊𝑛²/4⌋ + 1 edges contains some edge in at least (1/6 − 𝑜(1))𝑛 triangles, and that this constant 1/6 is best possible.
Even when 𝑛 is not divisible by 𝑟, the difference between 𝑒(𝑇𝑛,𝑟 ) and (1 − 1/𝑟)𝑛²/2 is 𝑂 (𝑛𝑟). As we are generally interested in the regime when 𝑟 is fixed, this difference is a negligible lower order contribution. That is,
ex(𝑛, 𝐾𝑟+1 ) = \left( 1 − \frac{1}{𝑟} − 𝑜(1) \right) \frac{𝑛²}{2}, for fixed 𝑟 as 𝑛 → ∞.
Every 𝑟-partite graph is automatically 𝐾𝑟+1 -free. Let us first consider an easy special case
of the problem.
Proof. Suppose we have an 𝑛-vertex 𝑟-partite graph with the maximum possible number of
edges. It should be a complete 𝑟-partite graph. If there were two vertex parts 𝐴 and 𝐵 with
| 𝐴| + 2 ≤ |𝐵|, then moving a vertex from 𝐵 (the larger part) to 𝐴 (the smaller part) would
increase the number of edges by (| 𝐴| + 1) (|𝐵| − 1) − | 𝐴| |𝐵| = |𝐵| − | 𝐴| − 1 > 0. Thus all
the vertex parts must have sizes within one of each other. The Turán graph 𝑇𝑛,𝑟 is the unique
such graph. □
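A minimal Python sketch (not from the book) that builds the Turán graph 𝑇𝑛,𝑟 by equitable partition and compares its edge count with the bound (1 − 1/𝑟)𝑛²/2 discussed above:

```python
def turan_graph_edges(n, r):
    """Edge count of T_{n,r}: n vertices split into r parts of nearly equal size."""
    part_sizes = [n // r + (1 if i < n % r else 0) for i in range(r)]
    # complete multipartite: all pairs except those inside a single part
    return n * (n - 1) // 2 - sum(s * (s - 1) // 2 for s in part_sizes)

for n, r in [(10, 3), (11, 3), (20, 4), (21, 4)]:
    e = turan_graph_edges(n, r)
    print(n, r, e, (1 - 1 / r) * n * n / 2)  # e is close to (1 - 1/r) n^2 / 2
```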
We will see three proofs of Turán’s theorem. The first proof extends our second proof of
Mantel’s theorem.
First proof of Turán’s theorem. We prove by induction on 𝑟. The case 𝑟 = 1 is trivial as a
𝐾2 -free graph is empty. Now assume 𝑟 > 1 and that ex(𝑛, 𝐾𝑟 ) = 𝑒(𝑇𝑛,𝑟 −1 ) for every 𝑛.
Let 𝐺 = (𝑉, 𝐸) be a 𝐾𝑟+1 -free graph. Let 𝑣 be a vertex of maximum degree in 𝐺. Since 𝐺
is 𝐾𝑟+1 -free, the neighborhood 𝐴 = 𝑁 (𝑣) of 𝑣 is 𝐾𝑟 -free. So by the induction hypothesis,
𝑒( 𝐴) ≤ ex(| 𝐴| , 𝐾𝑟 ) = 𝑒(𝑇| 𝐴|,𝑟 −1 ).
[Figure: 𝐴 = 𝑁 (𝑣), which is 𝐾𝑟 -free, and 𝐵 = 𝑉 ∖ 𝐴.]
Thus
𝑒(𝐺) = 𝑒( 𝐴) + 𝑒( 𝐴, 𝐵) + 𝑒(𝐵) ≤ 𝑒(𝑇_{| 𝐴|,𝑟−1}) + | 𝐴| |𝐵| ≤ 𝑒(𝑇𝑛,𝑟 ),
where 𝑒( 𝐴, 𝐵) + 𝑒(𝐵) ≤ \sum_{𝑦 ∈ 𝐵} deg(𝑦) ≤ | 𝐴| |𝐵| because every vertex has degree at most deg 𝑣 = | 𝐴|, and the final step follows from the observation that 𝑒(𝑇_{| 𝐴|,𝑟−1}) + | 𝐴| |𝐵| is the number of edges in an 𝑛-vertex 𝑟-partite graph (with a part of size |𝐵| and the remaining vertices equitably partitioned into 𝑟 − 1 parts) and Lemma 1.2.7.
To have equality in every step above, 𝐵 must be an independent set (or else \sum_{𝑦 ∈ 𝐵} deg(𝑦) < | 𝐴| |𝐵|) and 𝐴 must induce 𝑇_{| 𝐴|,𝑟−1}, so that 𝐺 is 𝑟-partite. We knew from Lemma 1.2.7 that the Turán graph 𝑇𝑛,𝑟 uniquely maximizes the number of edges among 𝑟-partite graphs. □
The second proof starts out similarly to our first proof of Mantel’s theorem. Recall that in
Mantel’s theorem, the initial observation was that in a triangle-free graph, given an edge, its
two endpoints must have no common neighbors (or else they form a triangle). Generalizing,
in a 𝐾4 -free graph, given a triangle, its three vertices have no common neighbor. The rest of
the proof proceeds somewhat differently from earlier. Instead of summing over all edges as
we did before, we remove the triangle and apply induction to the rest of the graph.
Second proof of Turán’s theorem. We fix 𝑟 and proceed by induction on 𝑛. The statement
is trivial for 𝑛 ≤ 𝑟, as the Turán graph is the complete graph 𝐾𝑛 = 𝑇𝑛,𝑟 and thus maximizes
the number of edges.
Now, assume that 𝑛 > 𝑟 and that Turán’s theorem holds for all graphs on fewer than 𝑛
vertices. Let 𝐺 = (𝑉, 𝐸) be an 𝑛-vertex 𝐾𝑟+1 -free graph with the maximum possible number
of edges. By the maximality assumption, 𝐺 contains 𝐾𝑟 as a subgraph, since otherwise we
could add an edge to 𝐺 and it would still be 𝐾𝑟+1 -free. Let 𝐴 be the vertex set of an 𝑟-clique
in 𝐺, and let 𝐵 := 𝑉 \ 𝐴.
[Figure: the 𝑟-clique 𝐴 (drawn for 𝑟 = 3) and the remaining vertices 𝐵.]
We have
𝑒(𝐺) = 𝑒( 𝐴) + 𝑒( 𝐴, 𝐵) + 𝑒(𝐵) ≤ \binom{𝑟}{2} + (𝑟 − 1)(𝑛 − 𝑟) + 𝑒(𝑇_{𝑛−𝑟,𝑟}) = 𝑒(𝑇𝑛,𝑟 ),
where the inequality uses the induction hypothesis on 𝐺 [𝐵], which is 𝐾𝑟+1 -free, and the final
equality can be seen by removing a 𝐾𝑟 from 𝑇𝑛,𝑟 .
Finally, let us check when equality occurs. To have equality in every step above, the subgraph induced on 𝐵 must be 𝑇_{𝑛−𝑟,𝑟} by induction. To have 𝑒( 𝐴) = \binom{𝑟}{2}, 𝐴 must induce a clique. To have 𝑒( 𝐴, 𝐵) = (𝑟 − 1)(𝑛 − 𝑟), every vertex of 𝐵 must be adjacent to all but one vertex in 𝐴. Also, two vertices 𝑥, 𝑦 lying in distinct parts of 𝐺 [𝐵] ≅ 𝑇_{𝑛−𝑟,𝑟} cannot “miss” the same vertex 𝑣 of 𝐴, or else 𝐴 ∪ {𝑥, 𝑦} \ {𝑣} would be a 𝐾𝑟+1 -clique. This then forces 𝐺 to be 𝑇𝑛,𝑟 . □
The third proof uses a method known as Zykov symmetrization. The idea here is that if a 𝐾𝑟+1 -free graph is not a Turán graph, then we should be able to make some local modifications (namely replacing a vertex by a clone of another vertex) to get another 𝐾𝑟+1 -free graph with strictly more edges.
Third proof of Turán’s theorem. As before, let 𝐺 be an 𝑛-vertex, 𝐾𝑟+1 -free graph with the
maximum possible number of edges.
We claim that if 𝑥 and 𝑦 are non-adjacent vertices, then deg 𝑥 = deg 𝑦. Indeed, suppose
deg 𝑥 > deg 𝑦. We can modify 𝐺 by removing 𝑦 and adding in a clone of 𝑥 (a new vertex 𝑥 ′
with the same neighborhood as 𝑥 but not adjacent to 𝑥), as illustrated below.
[Figure: 𝑦 is removed and replaced by a clone 𝑥′ of 𝑥.]
The resulting graph would still be 𝐾𝑟+1 -free (since a clique cannot contain both 𝑥 and its
clone) and has strictly more edges than 𝐺, thereby contradicting the assumption that 𝐺 has
the maximum possible number of edges.
Suppose 𝑥 is non-adjacent to both 𝑦 and 𝑧 in 𝐺. We claim that 𝑦 and 𝑧 must be non-adjacent.
18 Forbidding a Subgraph
We just saw that deg 𝑥 = deg 𝑦 = deg 𝑧. If 𝑦𝑧 is an edge, then by deleting 𝑦 and 𝑧 from 𝐺 and
adding two clones of 𝑥, we obtain a 𝐾𝑟+1 -free graph with one more edge than 𝐺. This would
contradict the maximality of 𝐺.
[Figure: 𝑦 and 𝑧 are removed and replaced by two clones 𝑥′, 𝑥′′ of 𝑥.]
Therefore non-adjacency is an equivalence relation on the vertices, so 𝐺 is a complete multipartite graph; being 𝐾𝑟+1 -free, it has at most 𝑟 parts, and by Lemma 1.2.7 its number of edges is at most 𝑒(𝑇𝑛,𝑟 ), with equality only for the Turán graph 𝑇𝑛,𝑟 . □
Exercise 1.2.8. Let 𝐺 be a 𝐾𝑟+1 -free graph. Prove that there exists an 𝑟-partite graph 𝐻
on the same vertex set as 𝐺 such that deg𝐻 (𝑥) ≥ deg𝐺 (𝑥) for every vertex 𝑥 (here
deg𝐻 (𝑥) is the degree of 𝑥 in 𝐻, and likewise with deg𝐺 (𝑥) for 𝐺). Give another proof of
Turán’s theorem from this fact.
The following exercise is an extension of Exercise 1.1.6.
Exercise 1.2.9∗ (Stability). Let 𝐺 be an 𝑛-vertex 𝐾𝑟+1 -free graph with at least 𝑒(𝑇𝑛,𝑟 ) − 𝑘
edges, where 𝑇𝑛,𝑟 is the Turán graph. Prove that 𝐺 can be made 𝑟-partite by removing at
most 𝑘 edges.
The next exercise is a neat geometric application of Turán’s theorem.
Exercise 1.2.10. Let 𝑆 be a set of 𝑛 points in the plane, with the property that no two points are at distance greater than 1. Show that 𝑆 has at most ⌊𝑛²/3⌋ pairs of points at distance greater than 1/√2. Also, show that the bound ⌊𝑛²/3⌋ is tight (i.e., cannot be improved).
Turán density
In this chapter, we will define the edge density of a graph 𝐺 to be
𝑒(𝐺) \Big/ \binom{𝑣(𝐺)}{2}.
So the edge density of a clique is 1. Later in the book, we will consider a different normalization 2𝑒(𝐺)/𝑣(𝐺)² for edge density, which is more convenient for other purposes. When 𝑣(𝐺) is large, there is no significant difference between the two choices.
Next, we use an averaging/sampling argument to show that ex(𝑛, 𝐻)/\binom{𝑛}{2} is non-increasing in 𝑛.
Proposition 1.3.1 (Monotonicity of Turán numbers)
For every graph 𝐻 and positive integer 𝑛,
\frac{ex(𝑛 + 1, 𝐻)}{\binom{𝑛+1}{2}} ≤ \frac{ex(𝑛, 𝐻)}{\binom{𝑛}{2}}.
Proof. Let 𝐺 be an 𝐻-free graph on 𝑛 + 1 vertices. For each 𝑛-vertex subset 𝑆 of 𝑉 (𝐺), since 𝐺 [𝑆] is also 𝐻-free, we have
\frac{𝑒(𝐺 [𝑆])}{\binom{𝑛}{2}} ≤ \frac{ex(𝑛, 𝐻)}{\binom{𝑛}{2}}.
Varying 𝑆 uniformly over all 𝑛-vertex subsets of 𝑉 (𝐺), the left-hand side averages to the edge density of 𝐺 by linearity of expectations (check this). It follows that
\frac{𝑒(𝐺)}{\binom{𝑛+1}{2}} ≤ \frac{ex(𝑛, 𝐻)}{\binom{𝑛}{2}}. □
The exact value of 𝜋(𝐻) is known in very few cases. It is a major open problem to determine 𝜋(𝐻) when 𝐻 is the complete 3-uniform hypergraph on 4 vertices (also known as a tetrahedron), and more generally when 𝐻 is a complete hypergraph.
Supersaturation
We know from Mantel’s theorem that any 𝑛-vertex graph 𝐺 with > 𝑛²/4 edges must contain a triangle. What if 𝐺 has a lot more edges? It turns out that 𝐺 must have a lot of triangles. In particular, an 𝑛-vertex graph with > (1/4 + 𝜀)𝑛² edges must have at least 𝛿𝑛³ triangles for some constant 𝛿 > 0 depending on 𝜀 > 0. This is indeed a lot of triangles, since there can only be at most 𝑂(𝑛³) triangles no matter what. (Exercise 1.1.4 asks you to give a more precise quantitative lower bound on the number of triangles. The optimal dependence of 𝛿 on 𝜀 is a difficult problem that we will discuss in Chapter 5.)
It turns out there is a general phenomenon in combinatorics where once some density
crosses an existence threshold (e.g., the Turán density is the threshold for 𝐻-freeness), it
will be possible to find not just one copy of the desired object, but in fact lots and lots of
copies. This fundamental principle, called supersaturation, is useful for many applications,
including in our upcoming determination of 𝜋(𝐻) for general 𝐻.
Equivalently: every 𝑛-vertex graph with 𝑜(𝑛^{𝑣(𝐻)}) copies of 𝐻 has edge density ≤ 𝜋(𝐻) + 𝑜(1) (here 𝐻 is fixed). The sampling argument in the proof below is useful in many applications.
Proof. By the definition of the Turán density, there exists some 𝑛0 (depending on 𝐻 and 𝜀) such that every 𝑛0-vertex graph with at least (𝜋(𝐻) + 𝜀/2)\binom{𝑛0}{2} edges contains 𝐻 as a subgraph.
Let 𝑛 ≥ 𝑛0 and let 𝐺 be an 𝑛-vertex graph with at least (𝜋(𝐻) + 𝜀)\binom{𝑛}{2} edges. Let 𝑆 be an 𝑛0-element subset of 𝑉 (𝐺), chosen uniformly at random. Let 𝑋 denote the edge density of 𝐺 [𝑆]. By averaging, E𝑋 equals the edge density of 𝐺, and so E𝑋 ≥ 𝜋(𝐻) + 𝜀. Then 𝑋 ≥ 𝜋(𝐻) + 𝜀/2 with probability ≥ 𝜀/2 (or else E𝑋 could not be as large as 𝜋(𝐻) + 𝜀). So, from the previous paragraph, we know that with probability ≥ 𝜀/2, 𝐺 [𝑆] contains a copy of 𝐻. This gives us ≥ (𝜀/2)\binom{𝑛}{𝑛0} copies of 𝐻, but each copy of 𝐻 may be counted up to \binom{𝑛−𝑣(𝐻)}{𝑛0−𝑣(𝐻)} times. Thus the number of copies of 𝐻 in 𝐺 is
≥ (𝜀/2)\binom{𝑛}{𝑛0} \Big/ \binom{𝑛−𝑣(𝐻)}{𝑛0−𝑣(𝐻)} = Ω_{𝐻,𝜀}(𝑛^{𝑣(𝐻)}). □
Exercise 1.3.6 (Density Ramsey). Prove that for every 𝑠 and 𝑟, there is some constant
𝑐 > 0 so that for every sufficiently large 𝑛, if the edges of 𝐾𝑛 are colored using 𝑟 colors,
then at least 𝑐 fraction of all copies of 𝐾𝑠 are monochromatic.
Zarankiewicz (1951) originally asked a related problem: determine the maximum number
of 1’s in an 𝑚 × 𝑛 matrix without an 𝑠 × 𝑡 submatrix with all entries 1.
The main theorem of this section is the fundamental result due to Kővári, Sós, & Turán
(1954). We will refer to it as the KST theorem, which stands both for its discoverers, as well
as for the forbidden subgraph 𝐾𝑠,𝑡 .
Proof. Let 𝐺 be a 𝐾𝑠,𝑡 -free graph with 𝑛 vertices and 𝑚 edges. Let
𝑋 = number of copies of 𝐾𝑠,1 in 𝐺.
(When 𝑠 = 1, we set 𝑋 = 2𝑒(𝐺).) The strategy is to count 𝑋 in two ways. First we count 𝐾𝑠,1
by first embedding the “left” 𝑠 vertices of 𝐾𝑠,1 . Then we count 𝐾𝑠,1 by first embedding the
“right” single vertex of 𝐾𝑠,1 .
Upper bound on 𝑋. Since 𝐺 is 𝐾𝑠,𝑡 -free, every 𝑠-vertex subset of 𝐺 has ≤ 𝑡 − 1 common neighbors. Therefore,
𝑋 ≤ \binom{𝑛}{𝑠}(𝑡 − 1).
Lower bound on 𝑋. For each vertex 𝑣 of 𝐺, there are exactly \binom{deg 𝑣}{𝑠} ways to pick 𝑠 of its neighbors, and each such choice gives a copy of 𝐾𝑠,1 , so 𝑋 = \sum_{𝑣} \binom{deg 𝑣}{𝑠}. To obtain a lower bound on this quantity in terms of the number of edges 𝑚 of 𝐺, we use a standard trick of viewing \binom{𝑥}{𝑠} as a convex function on the reals, namely, letting
𝑓𝑠 (𝑥) = 𝑥(𝑥 − 1) · · · (𝑥 − 𝑠 + 1)/𝑠! if 𝑥 ≥ 𝑠 − 1, and 𝑓𝑠 (𝑥) = 0 if 𝑥 < 𝑠 − 1.
Then 𝑓𝑠 (𝑥) = \binom{𝑥}{𝑠} for all nonnegative integers 𝑥. Furthermore 𝑓𝑠 is a convex function. Since
the average degree of 𝐺 is 2𝑚/𝑛, it follows by convexity that
𝑋 = \sum_{𝑣 ∈ 𝑉(𝐺)} 𝑓𝑠 (deg 𝑣) ≥ 𝑛 𝑓𝑠\!\left( \frac{2𝑚}{𝑛} \right).
(It would be a sloppy mistake to lower bound 𝑋 by 𝑛\binom{2𝑚/𝑛}{𝑠}.)
Combining the upper bound and the lower bound. We find that
𝑛 𝑓𝑠\!\left( \frac{2𝑚}{𝑛} \right) ≤ 𝑋 ≤ \binom{𝑛}{𝑠}(𝑡 − 1).
Since 𝑓𝑠 (𝑥) = (1 + 𝑜(1))𝑥^𝑠/𝑠! for 𝑥 → ∞ and fixed 𝑠, we find that, as 𝑛 → ∞,
\frac{𝑛}{𝑠!} \left( \frac{2𝑚}{𝑛} \right)^{𝑠} ≤ (1 + 𝑜(1)) \frac{𝑛^𝑠}{𝑠!}(𝑡 − 1).
Therefore,
𝑚 ≤ \left( \frac{(𝑡 − 1)^{1/𝑠}}{2} + 𝑜(1) \right) 𝑛^{2−1/𝑠}. □
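The double count at the heart of this proof is easy to verify directly on small graphs. Here is a minimal Python sketch (not from the book; the random example graph and the choice 𝑠 = 2 are just for illustration):

```python
from itertools import combinations
from math import comb
import random

# A small random graph on n vertices (illustration only).
n, s = 8, 2
random.seed(0)
adj = {v: set() for v in range(n)}
for u, v in combinations(range(n), 2):
    if random.random() < 0.5:
        adj[u].add(v); adj[v].add(u)

# Count copies of K_{s,1} by choosing the "right" vertex first ...
count_right_first = sum(comb(len(adj[v]), s) for v in range(n))

# ... and by choosing the "left" s-set first, via its common neighborhood.
count_left_first = sum(
    len(set.intersection(*(adj[v] for v in S)))
    for S in combinations(range(n), s)
)

print(count_right_first, count_left_first)  # equal: the two counts of X agree
```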
The final bound in the proof gives us a somewhat more precise estimate than stated in
Theorem 1.4.2. Let us record it here for future reference.
It has been long conjectured that the KST theorem is tight up to a constant factor.
In the final sections of this chapter, we will produce some constructions showing that
Conjecture 1.4.4 is true for 𝐾2,𝑡 and 𝐾3,𝑡 . We also know that the conjecture is true if 𝑡 is much
larger than 𝑠. The first open case of the conjecture is 𝐾4,4 .
Corollary 1.4.5
For every bipartite graph 𝐻, there exists some constant 𝑐 > 0 so that ex(𝑛, 𝐻) = 𝑂𝐻 (𝑛^{2−𝑐}).
Proof. Suppose the two vertex parts of 𝐻 have sizes 𝑠 and 𝑡, with 𝑠 ≤ 𝑡. Then 𝐻 ⊆ 𝐾𝑠,𝑡 , so every 𝑛-vertex 𝐻-free graph is also 𝐾𝑠,𝑡 -free, and thus has 𝑂𝑠,𝑡 (𝑛^{2−1/𝑠}) edges. □
In particular, the Turán density 𝜋(𝐻) of every bipartite graph 𝐻 is zero.
The KST theorem gives a constant 𝑐 in the above corollary that depends on the number of
vertices on the smaller part of 𝐻. In Section 1.7, we will use the dependent random choice
technique to give a proof of the corollary showing that 𝑐 only has to depend on the maximum
degree of 𝐻.
In other words, given 𝑛 distinct points in the plane, at most how many pairs of these points
can be exactly distance 1 apart? We can draw a graph with these 𝑛 points as vertices, with
edges joining points exactly unit distance apart.
To get a feeling for the problem, let us play with some constructions. For small values of
𝑛, it is not hard to check by hand that the following configurations are optimal.
[Figure: optimal unit-distance configurations for 𝑛 = 3, 4, 5, 6, 7.]
What about for larger values of 𝑛? If we line up the 𝑛 points equally spaced on a line, we
get 𝑛 − 1 unit distances.
We can be a bit more efficient by chaining up triangles. The following construction gives us
2𝑛 − 3 unit distances.
The construction for 𝑛 = 6 looks like it was obtained by copying and translating a unit
triangle. We can generalize this idea to obtain a recursive construction. Let 𝑓 (𝑛) denote the
maximum number of unit distances formed by 𝑛 points in the plane. Given a configuration
𝑃 with ⌊𝑛/2⌋ points that has 𝑓 ( ⌊𝑛/2⌋) unit distances, we can copy 𝑃 and translate it by a
generic unit vector to get 𝑃′ . The configuration 𝑃 ∪ 𝑃′ has at least 2 𝑓 ( ⌊𝑛/2⌋) + ⌊𝑛/2⌋ unit
distances. We can solve the recursion to get 𝑓 (𝑛) ≳ 𝑛 log 𝑛. Now we take a different approach
to obtain an even better construction.
[Figure: a configuration 𝑃 and its translate 𝑃′ by a generic unit vector.]
Take a square grid with ⌊√𝑛⌋ × ⌊√𝑛⌋ vertices. Instead of choosing the distance between adjacent points as the unit distance, we can scale the configuration so that √𝑟 becomes the “unit” distance for some integer 𝑟. As an illustration, here is an example of a 5 × 5 grid with 𝑟 = 10. [Figure: a 5 × 5 grid with pairs of points at distance √10 joined.]
It turns out that by choosing the optimal 𝑟 as a function of 𝑛, we can get at least
𝑛^{1+𝑐/\log\log 𝑛}
unit distances, where 𝑐 > 0 is some absolute constant. The proof uses analytic number theory,
which we omit as it would take us too far afield. The basic idea is to choose 𝑟 to be a product
of many distinct primes that are congruent to 1 modulo 4, so that 𝑟 can be represented as a
sum of two squares in many different ways, and then estimate the number of such ways.
It is conjectured that the last construction above is close to optimal.
The KST theorem can be used to prove the following upper bound on the number of unit
distances.
Proof. Every unit distance graph is 𝐾2,3 -free. Indeed, for every pair of distinct points, there
are at most two other points that are at unit distance from both points.
Let 𝑔(𝑛) denote the answer. The asymptotically best construction for the minimum number of distinct distances is also a square grid, same as earlier. It can be shown that a square grid with ⌊√𝑛⌋ × ⌊√𝑛⌋ points has on the order of 𝑛/\sqrt{\log 𝑛} distinct distances. This is conjectured to be optimal (i.e., 𝑔(𝑛) ≲ 𝑛/\sqrt{\log 𝑛}).
Let 𝑓 (𝑛) denote the maximum number of unit distances among 𝑛 points in the plane, as before. We have 𝑓 (𝑛)𝑔(𝑛) ≥ \binom{𝑛}{2}, since each distance occurs at most 𝑓 (𝑛) times. So an upper bound on 𝑓 (𝑛) gives a lower bound on 𝑔(𝑛) (but not conversely).
A breakthrough on the distinct distances problem was obtained by Guth & Katz (2015). In other words, 𝑔(𝑛) ≳ 𝑛/log 𝑛, thereby matching the upper bound example up to a factor of 𝑂(\sqrt{\log 𝑛}). The Guth–Katz proof is quite sophisticated. It uses tools ranging from the polynomial method to algebraic geometry.
Exercises
Exercise 1.4.11. Show that a 𝐶4 -free bipartite graph between two vertex parts of sizes 𝑎 and 𝑏 has at most 𝑎𝑏^{1/2} + 𝑏 edges.
Exercise 1.4.12 (Density KST). Prove that for every pair of positive integers 𝑠 ≤ 𝑡, there are constants 𝐶, 𝑐 > 0 such that every 𝑛-vertex graph with 𝑝\binom{𝑛}{2} edges contains at least 𝑐𝑝^{𝑠𝑡}𝑛^{𝑠+𝑡} copies of 𝐾𝑠,𝑡 , provided that 𝑝 ≥ 𝐶𝑛^{−1/𝑠}.
The next exercise asks you to think about the quantitative dependencies in the proof of the
KST theorem.
Exercise 1.4.13. Show that for every 𝜀 > 0, there exists 𝛿 > 0 such that every graph with 𝑛 vertices and at least 𝜀𝑛² edges contains a copy of 𝐾𝑠,𝑡 where 𝑠 ≥ 𝛿 log 𝑛 and 𝑡 ≥ 𝑛^{0.99}.
The next exercise illustrates a bad definition of density of a subset of Z² (it always ends up being either 0 or 1).
Exercise 1.4.14 (How not to define density). Let 𝑆 ⊆ Z². Define
𝑑𝑘 (𝑆) = \max_{𝐴,𝐵 ⊆ Z,\ |𝐴|=|𝐵|=𝑘} \frac{|𝑆 ∩ ( 𝐴 × 𝐵)|}{| 𝐴| |𝐵|}.
Remark 1.5.2 (History). Erdős & Stone (1946) proved this result when 𝐻 is a complete
multipartite graph. Erdős & Simonovits (1966) observed that the general case follows as a
quick corollary. The proof given here is due to Erdős (1971).
Example 1.5.3. When 𝐻 = 𝐾𝑟+1 , 𝜒(𝐻) = 𝑟 + 1, and so Theorem 1.5.1 agrees with Turán’s
theorem.
Example 1.5.4. When 𝐻 is the Petersen graph, below, which has chromatic number 3,
Theorem 1.5.1 tells us that ex(𝑛, 𝐻) = (1/4 + 𝑜(1))𝑛². The Turán density of the Petersen
graph is the same as that of a triangle, which may be somewhat surprising since the Petersen
graph seems more complicated than the triangle.
[Figure: the Petersen graph with a proper 3-coloring of its vertices.]
In other words, using the notation 𝐾𝑟 [𝑠] for the 𝒔-blow-up of 𝐾𝑟 , obtained by replacing each vertex of 𝐾𝑟 by 𝑠 duplicates of itself (so that 𝐾𝑟 [𝑠] = 𝐻 in the above theorem statement), the Erdős–Stone theorem says that
𝜋(𝐾𝑟 [𝑠]) = 𝜋(𝐾𝑟 ) = 1 − \frac{1}{𝑟 − 1}.
In Section 1.3, we saw supersaturation (Theorem 1.3.4): when the edge density is significantly above the Turán density threshold 𝜋(𝐻), one finds not just a single copy of 𝐻 but actually many copies. The Erdős–Stone theorem can be viewed in this light: above edge density 𝜋(𝐻), one finds a large blow-up of 𝐻.
The proof uses the following hypergraph extension of the KST theorem, which we will
prove later in the section.
Recall the hypergraph Turán problem (Remark 1.3.3). Given an 𝑟-uniform hypergraph 𝐻
(also known as an 𝑟-graph), we write ex(𝑛, 𝐻) to be the maximum number of edges in an
𝐻-free 𝑟-graph.
The analogue of a complete bipartite graph for an 𝑟-graph is a complete 𝑟-partite 𝑟-graph 𝑲^{(𝒓)}_{𝒔1,...,𝒔𝒓}. Its vertex set consists of disjoint vertex parts 𝑉1 , . . . , 𝑉𝑟 with |𝑉𝑖 | = 𝑠𝑖 for each 𝑖. Every 𝑟-tuple in 𝑉1 × · · · × 𝑉𝑟 is an edge.
Proof of the Erdős–Stone theorem (Theorem 1.5.5). We already saw the lower bound to
ex(𝑛, 𝐻) using a Turán graph. It remains to prove an upper bound.
Let 𝐺 be an 𝐻-free graph (where 𝐻 = 𝐾𝑠,...,𝑠 is the complete 𝑟-partite graph in the
theorem). Let 𝐺^{(𝑟)} be the 𝑟-graph with the same vertex set as 𝐺 and whose edges are the 𝑟-cliques in 𝐺. Note that 𝐺^{(𝑟)} is 𝐾^{(𝑟)}_{𝑠,...,𝑠}-free, or else a copy of 𝐾^{(𝑟)}_{𝑠,...,𝑠} in 𝐺^{(𝑟)} would be supported by a copy of 𝐻 in 𝐺. Thus, by the hypergraph KST theorem (Theorem 1.5.6), 𝐺^{(𝑟)} has 𝑜(𝑛^𝑟) edges. So 𝐺 has 𝑜(𝑛^𝑟) copies of 𝐾𝑟 , and thus by the supersaturation theorem
quoted above, the edge density of 𝐺 is at most 𝜋(𝐾𝑟 ) + 𝑜(1), which equals 1 − 1/(𝑟 − 1) + 𝑜(1)
by Turán’s theorem. □
In Section 2.6, we will give another proof of the Erdős–Stone–Simonovits theorem using
the graph regularity method.
Hypergraph KST
To help keep notation simple, we first consider what happens for 3-uniform hypergraphs.
Recall that the KST theorem (Theorem 1.4.2) was proved by counting the number of
copies of 𝐾𝑠,1 in the graph in two different ways. For 3-graphs, we instead count the number
of copies of 𝐾𝑠,1,1
(3)
in two different ways, one of which uses the KST theorem for 𝐾𝑠,𝑠 -free
graphs.
Proof. Let 𝐺 be a 𝐾^{(3)}_{𝑠,𝑠,𝑠}-free 3-graph with 𝑛 vertices and 𝑚 edges. Let 𝑋 denote the number of copies of 𝐾^{(3)}_{𝑠,1,1} in 𝐺 (when 𝑠 = 1, we count each copy three times).
Upper bound on 𝑋. Given a set 𝑆 of 𝑠 vertices, consider the set 𝑇 of all unordered pairs of distinct vertices that would form a 𝐾^{(3)}_{𝑠,1,1} with 𝑆 (i.e., every triple formed by combining a pair in 𝑇 and a vertex of 𝑆 is an edge of 𝐺). Note that 𝑇 is the edge-set of a graph on the same 𝑛 vertices. If 𝑇 contains a 𝐾𝑠,𝑠 , then together with 𝑆 we would have a 𝐾^{(3)}_{𝑠,𝑠,𝑠}. Thus 𝑇 is 𝐾𝑠,𝑠 -free, and hence by Theorem 1.4.2, |𝑇 | = 𝑂𝑠 (𝑛^{2−1/𝑠}). Hence
𝑋 ≲𝑠 \binom{𝑛}{𝑠} 𝑛^{2−1/𝑠} ≲𝑠 𝑛^{𝑠+2−1/𝑠}.
Lower bound on 𝑋. We write deg(𝑢, 𝑣) for the number of edges in 𝐺 containing both 𝑢 and 𝑣. Then, summing over all unordered pairs of distinct vertices 𝑢, 𝑣 in 𝐺, we have
𝑋 = \sum_{𝑢,𝑣} \binom{deg(𝑢, 𝑣)}{𝑠}.
And hence, comparing with the upper bound on 𝑋 via convexity as in the proof of Theorem 1.4.2 (note that \sum_{𝑢,𝑣} deg(𝑢, 𝑣) = 3𝑚),
𝑚 = 𝑂𝑠 (𝑛^{3−1/𝑠²}). □
We can iterate further, using the same technique, to prove an analogous result for every uniformity, thereby giving us the statement (Theorem 1.5.6) used in our proof of the
Erdős–Stone–Simonovits theorem earlier. Feel free to skip reading the next proof if you feel
comfortable with generalizing the above proof to 𝑟-graphs.
Proof. We proceed by induction on 𝑟; the base case 𝑟 = 2 is the KST theorem (Theorem 1.4.2). Let 𝐺 be a 𝐾^{(𝑟)}_{𝑠,...,𝑠}-free 𝑟-graph with 𝑛 vertices and 𝑚 edges, and let 𝑋 denote the number of copies of 𝐾^{(𝑟)}_{𝑠,1,...,1} in 𝐺; here 𝐾^{(𝑟)}_{𝑠,...,𝑠} is the complete 𝑟-partite 𝑟-graph with 𝑠 vertices in each of the 𝑟 parts.
Upper bound on 𝑋. Given a set 𝑆 of 𝑠 vertices, consider the set 𝑇 of all unordered (𝑟 − 1)-tuples of vertices that would form a 𝐾^{(𝑟)}_{𝑠,1,...,1} with 𝑆 (with 𝑆 in one part, and the 𝑟 − 1 new vertices each in its own part). Note that 𝑇 is the edge-set of an (𝑟 − 1)-graph on the same 𝑛 vertices. If 𝑇 contains a 𝐾^{(𝑟−1)}_{𝑠,...,𝑠}, then together with 𝑆 we would have a 𝐾^{(𝑟)}_{𝑠,...,𝑠}. Thus 𝑇 is 𝐾^{(𝑟−1)}_{𝑠,...,𝑠}-free, and by the induction hypothesis, |𝑇 | = 𝑂𝑟,𝑠 (𝑛^{𝑟−1−𝑠^{−𝑟+2}}). Hence
𝑋 ≲𝑟,𝑠 \binom{𝑛}{𝑠} 𝑛^{𝑟−1−𝑠^{−𝑟+2}} ≲𝑟,𝑠 𝑛^{𝑟+𝑠−1−𝑠^{−𝑟+2}}.
Lower bound on 𝑋. Given a set 𝑈 of vertices, we write deg 𝑈 for the number of edges containing all vertices in 𝑈. Then
𝑋 = \sum_{𝑈 ∈ \binom{𝑉(𝐺)}{𝑟−1}} \binom{deg 𝑈}{𝑠}.
Let 𝑓𝑠 (𝑥) be defined as in the previous proof. Since the average of deg 𝑈 over all (𝑟 − 1)-element subsets 𝑈 is 𝑟𝑚/\binom{𝑛}{𝑟−1}, we have
𝑋 = \sum_{𝑈 ∈ \binom{𝑉(𝐺)}{𝑟−1}} 𝑓𝑠 (deg 𝑈) ≥ \binom{𝑛}{𝑟−1} 𝑓𝑠\!\left( \frac{𝑟𝑚}{\binom{𝑛}{𝑟−1}} \right).
And hence, comparing with the upper bound on 𝑋 as in the proof of Theorem 1.4.2,
𝑚 = 𝑂𝑟,𝑠 (𝑛^{𝑟−𝑠^{−𝑟+1}}). □
Exercise 1.5.10 (Forbidding a multipartite complete hypergraph with unbalanced parts). Prove that for every sequence of positive integers 𝑠1 , . . . , 𝑠𝑟 , there exists 𝐶 so that
ex(𝑛, 𝐾^{(𝑟)}_{𝑠1,...,𝑠𝑟}) ≤ 𝐶𝑛^{𝑟−1/(𝑠1 ··· 𝑠_{𝑟−1})}.
Exercise 1.5.11 (Erdős–Stone for hypergraphs). Let 𝐻 be an 𝑟-graph. Show that 𝜋(𝐻 [𝑠]) =
𝜋(𝐻), where 𝐻 [𝑠], the 𝑠-blow-up of 𝐻, is obtained by replacing every vertex of 𝐻 by 𝑠
duplicates of itself.
Odd cycles
First let us consider forbidding odd cycles. Let 𝑘 be a positive integer. Then 𝐶2𝑘+1 has
chromatic number 3, and so the Erdős–Stone–Simonovits theorem (Theorem 1.5.1) tells us
that
ex(𝑛, 𝐶2𝑘+1 ) = (1 + 𝑜(1)) \frac{𝑛²}{4}.
In fact, an even stronger statement is true. If 𝑛 is large enough (as a function of 𝑘), then the
complete bipartite graph 𝐾 ⌊𝑛/2⌋,⌈𝑛/2⌉ is always the extremal graph, just like in the triangle
case.
We will not prove this theorem. See Füredi & Gunderson (2015) for a more recent proof.
More generally, Simonovits (1974) developed a stability method for exactly determining
the Turán number of non-bipartite color-critical graphs.
Remark 1.6.4 (Tightness). We will see in Section 1.10 a matching lower bound construction
(up to constant factors) for 𝑘 = 2, 3, 5. For all other values of 𝑘, it is open whether a matching
lower bound construction exists.
Instead of proving the above theorem, we will prove a weaker result, stated below. This
weaker result has a short and neat proof, which hopefully gives some intuition as to why the
above theorem should be true.
Proof. Color every vertex with red or blue independently and uniformly at random. Then the
expected number of non-monochromatic edges is 𝑒(𝐺)/2. Hence there exists a coloring that
has at least 𝑒(𝐺)/2 non-monochromatic edges, and these edges form the desired bipartite
subgraph. □
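A minimal sketch of this random-coloring argument in Python (not from the book; it simply retries random colorings until one achieves the guaranteed bound, which happens with probability bounded away from zero):

```python
import random

def large_bipartite_subgraph(vertices, edges):
    """Return a 2-coloring whose non-monochromatic edges number >= e(G)/2."""
    while True:
        color = {v: random.randrange(2) for v in vertices}
        crossing = [e for e in edges if color[e[0]] != color[e[1]]]
        if 2 * len(crossing) >= len(edges):  # expected value is exactly e(G)/2
            return color, crossing

# Example: the 5-cycle has 5 edges; we obtain a bipartite subgraph with >= 3 edges.
vertices = range(5)
edges = [(i, (i + 1) % 5) for i in range(5)]
color, crossing = large_bipartite_subgraph(vertices, edges)
print(len(crossing))
```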
Lemma 1.6.7 (Large average degree implies subgraph with large minimum degree)
Let 𝑡 ∈ R. Every graph with average degree 2𝑡 has a subgraph with minimum degree
greater than 𝑡.
Proof. Let 𝐺 be a graph with average degree 2𝑡. Removing a vertex of degree at most 𝑡
cannot decrease the average degree, since the total degree goes down by at most 2𝑡 and so
the post-deletion graph has average degree at least (2𝑒(𝐺) − 2𝑡)/(𝑣(𝐺) − 1), which is at
least 2𝑒(𝐺)/𝑣(𝐺) since 2𝑒(𝐺)/𝑣(𝐺) ≥ 2𝑡. Let us repeatedly delete vertices of degree at
most 𝑡 in the remaining graph, until every vertex has degree more than 𝑡. This algorithm
must terminate with a non-empty graph since we cannot ever drop below 2𝑡 vertices in this
process (as such a graph would have average degree less than 2𝑡). □
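The greedy deletion in this proof is easy to implement. Here is a minimal Python sketch (not from the book; the adjacency-set representation and the small example graph are my own):

```python
def min_degree_subgraph(adj, t):
    """Repeatedly delete vertices of degree <= t; adj maps vertex -> set of neighbors.
    If the input has average degree at least 2t, the result is a non-empty subgraph
    with minimum degree greater than t, as in the lemma."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if len(adj[v]) <= t:
                for u in adj[v]:
                    adj[u].discard(v)
                del adj[v]
                changed = True
    return adj

# Example: a triangle with a pendant vertex has average degree 2 = 2t with t = 1.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(min_degree_subgraph(adj, 1)))  # [0, 1, 2]: the triangle remains
```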
Proof of Theorem 1.6.5. The idea is to use a breadth-first search. Suppose 𝐺 contains no
even cycles of length at most 2𝑘. Applying Lemma 1.6.6 followed by Lemma 1.6.7, we find
a bipartite subgraph 𝐺 ′ of 𝐺 with minimum degree > 𝑡 := 𝑒(𝐺)/(2𝑣(𝐺)). Let 𝑢 be an
arbitrary vertex of 𝐺 ′ . For each 𝑖 = 0, 1, . . . , 𝑘, let 𝐴𝑖 denote the set of vertices at distance
exactly 𝑖 from 𝑢.
[Figure: breadth-first search layers 𝐴0 = {𝑢}, 𝐴1 , 𝐴2 , . . . in 𝐺′.]
gives a significant improvement when the maximum degree of 𝐻 is small. The proof introduces an important probabilistic technique known as dependent random choice.
Theorem 1.7.1 (Bounded degree bipartite graph: Turán number upper bound)
Let 𝐻 be a bipartite graph with vertex bipartition 𝐴 ∪ 𝐵 such that every vertex in 𝐴 has
degree at most 𝑟. Then there exists a constant 𝐶 = 𝐶𝐻 such that for all 𝑛,
ex(𝑛, 𝐻) ≤ 𝐶𝑛2−1/𝑟 .
Remark 1.7.2 (History). The result was first proved by Füredi (1991). The proof given here
is due to Alon, Krivelevich, & Sudakov (2003a). For more applications of the dependent
random choice technique see the survey by Fox & Sudakov (2011).
Remark 1.7.3 (Tightness). The exponent 2 − 1/𝑟 is best possible as a function of 𝑟. Indeed,
we will see in the following section that for every 𝑟 there exists some 𝑠 so that ex(𝑛, 𝐾𝑟 ,𝑠 ) ≥
𝑐𝑛2−1/𝑟 for some 𝑐 = 𝑐(𝑟, 𝑠) > 0.
On the other hand, for specific graphs 𝐺, Theorem 1.7.1 may not be tight. For example,
ex(𝑛, 𝐶6 ) = Θ(𝑛4/3 ), whereas Theorem 1.7.1 only tells us that ex(𝑛, 𝐶6 ) = 𝑂 (𝑛3/2 ).
Given a graph 𝐺 with many edges, we wish to find a large subset 𝑈 of vertices such
that every 𝑟-vertex subset of 𝑈 has many common neighbors in 𝐺 (even the case 𝑟 = 2 is
interesting). Once such a 𝑈 is found, we can then embed the 𝐵-vertices of 𝐻 into 𝑈. It will
then be easy to embed the vertices of 𝐴. The tricky part is to find such a 𝑈.
Remark 1.7.4 (Intuition). We want to host a party so that each pair of party-goers has many
common friends (here 𝐺 is the friendship graph). Whom should we invite? Inviting people
uniformly at random is not a good idea (why?). Perhaps we can pick some random individual
(Alice) to host a party inviting all her friends. Alice’s friends are expected to share some
common friends—at least they all know Alice.
We can take a step further, and pick a few people at random (Alice, Bob, Carol, David)
and have them host a party and invite all their common friends. This will likely be an even
more sociable crowd. At least all the party goers will know all the hosts, and likely even
more. As long as the social network is not too sparse, there should be lots of invitees.
Some invitees (e.g., Zack) might feel a bit out of place at the party—maybe they don’t have
many common friends with other party-goers (they all know the hosts but maybe Zack doesn’t
know many others). To prevent such awkwardness, the hosts will cancel Zack’s invitation.
There shouldn’t be too many people like Zack. The party must go on.
Here is the technical statement that we will prove. While there are many parameters, the
specific details are less important compared to the proof technique. This is quite a tricky
proof.
Remark 1.7.6 (Parameters). In the theorem statement, 𝑡 is an auxiliary parameter that does not appear in the conclusion. While one can optimize for 𝑡, it is instructive and convenient to leave it as is. The theorem is generally applied to graphs with at least 𝑛^{2−𝑐} edges, for some small 𝑐 > 0, and we can play with the parameters to get |𝑈| and 𝑚 both as large as desired.
Proof. We say that an 𝑟-element subset of 𝑉 (𝐺) is “bad” if it has at most 𝑚 common
neighbors in 𝐺.
Let 𝑢 1 , . . . , 𝑢 𝑡 be vertices chosen uniformly and independently at random from 𝑉 (𝐺)
(these vertices are chosen “with replacement”, i.e., they can repeat). Let 𝐴 be their common
neighborhood. (Keep in mind that 𝑢 1 , . . . , 𝑢 𝑡 , 𝐴 are random. It may be a bit confusing in this
proof what is random and what is not.)
Each fixed vertex 𝑣 ∈ 𝑉 (𝐺) has probability (deg(𝑣)/𝑛)^𝑡 of being adjacent to all of 𝑢1 , . . . , 𝑢𝑡 , and so by linearity of expectations and convexity,
\[
\mathbb{E}|A| = \sum_{v \in V(G)} \mathbb{P}(v \in A) = \sum_{v \in V(G)} \left(\frac{\deg(v)}{n}\right)^{t} \ge n \left(\frac{1}{n} \sum_{v \in V(G)} \frac{\deg(v)}{n}\right)^{t} \ge n\alpha^{t}.
\]
Let 𝑈 be obtained from 𝐴 by deleting an element from each bad 𝑟-vertex subset. So 𝑈 has
no bad 𝑟-vertex subsets. Also
\[
\mathbb{E}|U| \ge \mathbb{E}|A| - \mathbb{E}[\text{the number of bad } r\text{-vertex subsets of } A] \ge n\alpha^{t} - \binom{n}{r}\left(\frac{m}{n}\right)^{t}.
\]
Thus there exists some 𝑈 with at least this size, with the property that all its 𝑟-vertex subsets
have more than 𝑚 common neighbors. □
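The proof translates directly into a small randomized procedure. The sketch below (illustrative; the graph and the parameters 𝑡, 𝑟, 𝑚 in the usage example are made up) carries out one round of dependent random choice: pick 𝑡 random hosts, take their common neighborhood 𝐴, and delete one vertex from every bad 𝑟-subset of 𝐴.

```python
import random
from itertools import combinations

def dependent_random_choice(adj, t, r, m, seed=0):
    """One round of dependent random choice on a graph {vertex: set of neighbors}.

    Choose t vertices uniformly at random (with repetition), let A be their
    common neighborhood, and remove one vertex from every r-subset of A having
    at most m common neighbors.  Every r-subset of the returned set U therefore
    has more than m common neighbors.
    """
    rng = random.Random(seed)
    V = list(adj)
    hosts = [rng.choice(V) for _ in range(t)]
    A = set(V)
    for u in hosts:
        A &= adj[u]
    U = set(A)
    for S in combinations(sorted(A), r):
        if not set(S) <= U:
            continue  # already destroyed by an earlier deletion
        common = set(V)
        for v in S:
            common &= adj[v]
        if len(common) <= m:
            U.discard(S[0])  # delete one vertex of this bad r-subset
    return U

# Toy usage on a dense random graph (hypothetical parameters t = 2, r = 2, m = 1).
rng, n = random.Random(1), 30
adj = {v: set() for v in range(n)}
for u, v in combinations(range(n), 2):
    if rng.random() < 0.5:
        adj[u].add(v)
        adj[v].add(u)
U = dependent_random_choice(adj, t=2, r=2, m=1)
print("|U| =", len(U), "and every pair in U has at least 2 common neighbors")
```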
Now we are ready to show Theorem 1.7.1, which recall says that for a bipartite graph
𝐻 with vertex bipartition 𝐴 ∪ 𝐵 such that every vertex in 𝐴 has degree at most 𝑟, one has
ex(𝑛, 𝐻) = 𝑂 𝐻 (𝑛2−1/𝑟 ).
Proof of Theorem 1.7.1. Let 𝐺 be a graph with 𝑛 vertices and at least 𝐶𝑛^{2−1/𝑟} edges. By choosing 𝐶 large enough (depending only on |𝐴| + |𝐵|), we have
\[
n\left(2Cn^{-1/r}\right)^{r} - \binom{n}{r}\left(\frac{|A| + |B|}{n}\right)^{r} \ge |B|.
\]
We want to show that 𝐺 contains 𝐻 as a subgraph. By dependent random choice (Theo-
rem 1.7.5 applied with 𝑡 = 𝑟), we can embed the 𝐵-vertices of 𝐻 into 𝐺 so that every 𝑟-vertex
subset of 𝐵 (now viewed as a subset of 𝑉 (𝐺)) has > | 𝐴| + |𝐵| common neighbors.
Next, we embed the vertices of 𝐴 one at a time. Suppose we need to embed 𝑣 ∈ 𝐴 (some
previous vertices of 𝐴 may have already been embedded at this point). Note that 𝑣 has at
≤ 𝑟 neighbors in 𝐵, and these ≤ 𝑟 vertices in 𝐵 have > | 𝐴| + |𝐵| common neighbors in 𝐺.
While some of these common neighbors may have already been used up in earlier steps to
embed vertices of 𝐻, there are enough of them that they cannot all be used up, and thus we
can embed 𝑣 to some remaining common neighbor. This process ends with an embedding of
𝐻 into 𝐺. □
Exercise 1.7.7. Let 𝐻 be a bipartite graph with vertex bipartition 𝐴 ∪ 𝐵, such that 𝑟
vertices in 𝐴 are complete to 𝐵, and all remaining vertices in 𝐴 have degree at most 𝑟.
Prove that there is some constant 𝐶 = 𝐶𝐻 such that ex(𝑛, 𝐻) ≤ 𝐶𝑛2−1/𝑟 for all 𝑛.
Exercise 1.7.8. Let 𝜀 > 0. Show that, for sufficiently large 𝑛, every 𝐾4 -free graph with 𝑛
vertices and at least 𝜀𝑛2 edges contains an independent set of size at least 𝑛1− 𝜀 .
(b) We say that a graph 𝐻 is 𝒓-degenerate if its vertices can be ordered so that every
vertex has at most 𝑟 neighbors that appear before it in the ordering. Show that
for every 𝑟-degenerate bipartite graph 𝐻 there is some constant 𝐶 > 0 so that
ex(𝑛, 𝐻) ≤ 𝐶𝑛2−𝑐/𝑟 , where 𝑐 is the same absolute constant from part (a) (𝑐 should
not depend on 𝐻 or 𝑟).
Randomized constructions
The idea is to take a random graph at a density that gives a small number of copies of 𝐻,
and then destroy these copies of 𝐻 by removing some edges from the random graph. The
resulting graph is then 𝐻-free. This method is easy to implement and applies quite generally
to all 𝐻. For example, it will be shown that
\[
\operatorname{ex}(n, H) = \Omega_H\Big(n^{2 - \frac{v(H)-2}{e(H)-1}}\Big).
\]
However, bounds arising from this method are usually not tight.
Algebraic constructions
The idea is to use algebraic geometry over a finite field to construct a graph. Its vertices
correspond to geometric objects such as points or lines. Its edges correspond to incidences
or other algebraic relations. These constructions sometimes give tight bounds. They work
for a small number of graphs 𝐻, and usually require a different ad hoc idea for each 𝐻. They
work rarely, but when they do, they can appear quite mysterious, or even magical. Many
important tight lower bounds on bipartite extremal numbers arise this way. In particular it
will be shown that
ex(𝑛, 𝐾𝑠,𝑡 ) = Ω𝑠,𝑡 𝑛2−1/𝑠 whenever 𝑡 ≥ (𝑠 − 1)! + 1,
thereby matching the KST theorem (Theorem 1.4.2) for such 𝑠, 𝑡. Also, it will be shown that
ex(𝑛, 𝐶2𝑘 ) = Ω𝑘 𝑛1+1/𝑘 whenever 𝑘 ∈ {2, 3, 5},
thereby matching Theorem 1.6.3 for these values of 𝑘.
Proof. Let 𝐺 be an instance of the Erdős–Rényi random graph 𝐺(𝑛, 𝑝), with
\[
p = \frac{1}{4}\, n^{-\frac{v(H)-2}{e(H)-1}}
\]
(chosen with hindsight). We have $\mathbb{E}\, e(G) = p\binom{n}{2}$. Let 𝑋 denote the number of copies of 𝐻 in 𝐺. Then, our choice of 𝑝 ensures that
\[
\mathbb{E} X \le p^{e(H)} n^{v(H)} \le \frac{p}{2}\binom{n}{2} = \frac{1}{2}\,\mathbb{E}\, e(G).
\]
Thus
\[
\mathbb{E}[e(G) - X] \ge \frac{p}{2}\binom{n}{2} \gtrsim n^{2 - \frac{v(H)-2}{e(H)-1}}.
\]
Take a graph 𝐺 such that 𝑒(𝐺) − 𝑋 is at least its expectation. Remove one edge from each copy of 𝐻 in 𝐺, and we get an 𝐻-free graph with at least $e(G) - X \gtrsim n^{2 - \frac{v(H)-2}{e(H)-1}}$ edges. □
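To make the alteration argument concrete, here is a small Monte Carlo sketch (illustrative only; it brute-forces copies of 𝐻, so it is restricted to tiny 𝑛, and it takes 𝐻 = 𝐾3 for simplicity): sample 𝐺(𝑛, 𝑝) with the 𝑝 from the proof and delete one edge from every copy of 𝐻.

```python
import random
from itertools import combinations

def triangle_free_by_alteration(n, p, seed=0):
    """Sample G(n, p), then delete one edge from every triangle (H = K_3).

    The surviving graph is triangle-free, and at most one edge is removed per
    copy of H, so it keeps at least e(G) - X edges as in the proof above.
    """
    rng = random.Random(seed)
    edges = {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}
    for a, b, c in combinations(range(n), 3):
        tri = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in edges for e in tri):
            edges.discard(tri[0])  # destroy this copy of H
    return edges

# For H = K_3 we have v(H) = 3 and e(H) = 3, so p = (1/4) n^{-(3-2)/(3-1)} = n^{-1/2}/4,
# and the guaranteed number of surviving edges is on the order of n^{3/2}.
n = 60
p = 0.25 * n ** -0.5
print(len(triangle_free_by_alteration(n, p)), "edges in a triangle-free graph on", n, "vertices")
```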
For some graphs 𝐻, we can bootstrap Theorem 1.9.1 to give an even better lower bound.
For example, if 𝐻 is a certain graph containing 𝐾4,4 (pictured in the original; figure omitted here),
then 𝑣(𝐻) = 10 and 𝑒(𝐻) = 20, so applying Theorem 1.9.1 directly gives
ex(𝑛, 𝐻) ≳ 𝑛2−8/19 .
On the other hand, any 𝐾4,4 -free graph is automatically 𝐻-free. Applying Theorem 1.9.1 to
𝐾4,4 (8-vertex 16-edge) actually gives a better lower bound (2 − 6/15 > 2 − 8/19):
ex(𝑛, 𝐻) ≥ ex(𝑛, 𝐾4,4 ) ≳ 𝑛2−6/15 .
In general, given 𝐻, we should apply Theorem 1.9.1 to the subgraph of 𝐻 with the
maximum (𝑒(𝐻) − 1)/(𝑣(𝐻) − 2) ratio. This gives the following corollary, which sometimes
gives a better lower bound than directly applying Theorem 1.9.1.
When 𝑡 is large compared to 𝑠, the exponents in the two bounds above are close to each other
(but never equal). When 𝑡 = 𝑠, the above bounds specialize to
\[
n^{2 - \frac{2}{s+1}} \lesssim \operatorname{ex}(n, K_{s,s}) \lesssim n^{2 - \frac{1}{s}}.
\]
In particular, for 𝑠 = 2,
\[
n^{4/3} \lesssim \operatorname{ex}(n, K_{2,2}) \lesssim n^{3/2}.
\]
It turns out that the upper bound is tight. We will show this in the next section using an
algebraic construction.
Exercise 1.9.5. Show that if 𝐻 is a bipartite graph containing a cycle of length 2𝑘, then
ex(𝑛, 𝐻) ≳_𝐻 𝑛^{1+1/(2𝑘−1)}.
Exercise 1.9.6. Find a graph 𝐻 with 𝜒(𝐻) = 3 and ex(𝑛, 𝐻) > 𝑛²/4 + 𝑛^{1.99} for all sufficiently large 𝑛.
𝐾2,2 -free
We begin by constructing 𝐾2,2 -free graphs with the number of edges matching the KST theo-
rem. The construction is due to Erdős, Rényi, & Sós (1966) and Brown (1966) independently.
Before giving the proof of Theorem 1.10.1, let us first sketch the geometric intuition.
Given a set of points P and a set of lines L, the point-line incidence graph is the bipartite
graph with two vertex parts P and L, where 𝑝 ∈ P and ℓ ∈ L are adjacent if 𝑝 ∈ ℓ.
A point-line incidence graph is 𝐶4 -free. Indeed, a 𝐶4 would correspond to two lines both
passing through two distinct points, which is impossible.
We want to construct a set of points and a set of lines so that there are many incidences. To do this, we take all points and all lines in the finite field plane F_𝑝². There are 𝑝² points and 𝑝² + 𝑝 lines. Since every line contains 𝑝 points, the graph has around 𝑝³ edges on around 2𝑝² vertices, which is on the order of 𝑛^{3/2} edges for 𝑛 ≍ 𝑝² vertices.
Remark 1.10.4 (Large gaps between primes). The above result already follows from the
prime number theorem, which says that the number of primes up to 𝑁 is (1 + 𝑜(1))𝑁/log 𝑁.
The best quantitative result, due to Baker, Harman, & Pintz (2001), says that there exists a
prime in [𝑁 − 𝑁^{0.525}, 𝑁] for all sufficiently large 𝑁. Cramér’s conjecture, which is wide open and based on a random model of the primes, speculates that the 𝑜(𝑁) in Theorem 1.10.3 may be replaced by 𝑂((log 𝑁)²). An easier claim is Bertrand’s postulate, which says that there is a
prime between 𝑁 and 2𝑁 for every 𝑁, and this already suffices for proving ex(𝑛, 𝐾2,2 ) ≳ 𝑛3/2 .
To get a better constant in the above construction, we optimize somewhat by using the
same vertices to represent both points and lines. This pairing of points and lines is known as
polarity in projective geometry, and this construction is known as the polarity graph (usually this refers to the projective plane version of the construction).
Proof of Theorem 1.10.1. Let 𝑝 denote the largest prime such that 𝑝² − 1 ≤ 𝑛. Then 𝑝 = (1 − 𝑜(1))√𝑛 by Theorem 1.10.3. Let 𝐺 be a graph with vertex set 𝑉 (𝐺) = F_𝑝² \ {(0, 0)} and
an edge between (𝑥, 𝑦) and (𝑎, 𝑏) if and only if 𝑎𝑥 + 𝑏𝑦 = 1 in F 𝑝 .
For any two distinct vertices (𝑎, 𝑏) and (𝑎 ′ , 𝑏 ′ ) in 𝑉 (𝐺), they have at most one common
neighbor since there is at most one solution to the system 𝑎𝑥 + 𝑏𝑦 = 1 and 𝑎 ′ 𝑥 + 𝑏 ′ 𝑦 = 1.
Therefore, 𝐺 is 𝐾2,2 -free. (This is where we use the fact that two lines intersect in at most
one point.)
For every (𝑎, 𝑏) ∈ 𝑉 (𝐺), there are exactly 𝑝 vertices (𝑥, 𝑦) satisfying 𝑎𝑥 + 𝑏𝑦 = 1.
However, one of those vertices could be (𝑎, 𝑏) itself. So every vertex in 𝐺 has degree 𝑝 or
𝑝 − 1. Hence 𝐺 has at least ( 𝑝 2 − 1) ( 𝑝 − 1)/2 = (1/2 − 𝑜(1))𝑛3/2 edges. □
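The construction is concrete enough to generate and test directly for a small prime. The sketch below (illustrative, not the book's code) builds the graph over F_𝑝 and verifies 𝐾2,2-freeness by checking that every pair of vertices has at most one common neighbor.

```python
from itertools import combinations

def erdos_renyi_sos_graph(p):
    """Vertices: F_p^2 minus the origin; (x, y) ~ (a, b) iff a*x + b*y = 1 (mod p)."""
    V = [(x, y) for x in range(p) for y in range(p) if (x, y) != (0, 0)]
    adj = {v: set() for v in V}
    for (x, y), (a, b) in combinations(V, 2):
        if (a * x + b * y) % p == 1:
            adj[(x, y)].add((a, b))
            adj[(a, b)].add((x, y))
    return adj

p = 7
adj = erdos_renyi_sos_graph(p)
# Any two distinct vertices (a, b), (a', b') have at most one common neighbor (x, y),
# since the linear system a*x + b*y = 1, a'*x + b'*y = 1 has at most one solution.
assert all(len(adj[u] & adj[v]) <= 1 for u, v in combinations(adj, 2))
edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(p * p - 1, "vertices,", edges, "edges; lower bound (p^2-1)(p-1)/2 =",
      (p * p - 1) * (p - 1) // 2)
```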
𝐾3,3 -free
Next, we construct 𝐾3,3 -free graphs with the number of edges matching the KST theorem.
This construction is due to Brown (1966).
Consider the incidence between points in 3-dimensional space and unit spheres. This graph is 𝐾3,3-free since no three unit spheres can share three distinct common points. Again,
one needs to do this over a finite field to attain the desired bounds, but it is easier to visualize
the setup in Euclidean space, where it is clearly true.
Proof sketch. Let 𝑝 be the largest prime less than 𝑛^{1/3}. Fix a nonzero element 𝑑 ∈ F_𝑝, which we take to be a quadratic residue if 𝑝 ≡ 3 (mod 4) and a quadratic non-residue if 𝑝 ≢ 3 (mod 4). Construct a graph 𝐺 with vertex set 𝑉 (𝐺) = F_𝑝³, and an edge between (𝑥, 𝑦, 𝑧) and
(𝑎, 𝑏, 𝑐) ∈ 𝑉 (𝐺) if and only if
(𝑎 − 𝑥) 2 + (𝑏 − 𝑦) 2 + (𝑐 − 𝑧) 2 = 𝑑.
It turns out that each vertex has (1 − 𝑜(1)) 𝑝 2 neighbors (the intuition here is that, for a
fixed (𝑎, 𝑏, 𝑐), if we choose 𝑥, 𝑦, 𝑧 ∈ F 𝑝 independently and uniformly at random, then the
resulting sum (𝑎 − 𝑥)² + (𝑏 − 𝑦)² + (𝑐 − 𝑧)² is roughly uniformly distributed, and hence equals 𝑑 with probability close to 1/𝑝). It remains to show that the graph is 𝐾3,3-free.
To see this, think about how one might prove this claim in R3 via algebraic manipulations.
We compute the radical planes between pairs of spheres as well as the intersections of these
radical planes (i.e., the radical axis). The claim boils down to the fact that no sphere has three
collinear points, which is true due to the quadratic (non)residue hypothesis on 𝑑. The details
are omitted.
Thus 𝐺 is a 𝐾3,3 -free graph on 𝑝 3 ≤ 𝑛 vertices and with at least (1/2 − 𝑜(1)) 𝑝 5 =
(1/2 − 𝑜(1))𝑛5/3 edges. □
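For experimentation, the sphere construction can also be generated for a very small prime; the sketch below (purely exploratory—the prime is far too small for the asymptotics above, and the code merely reports whether each choice of 𝑑 happens to give a 𝐾3,3-free graph) tests 𝐾3,3-freeness by brute force.

```python
from itertools import combinations

def sphere_graph(p, d):
    """Vertices F_p^3; (x,y,z) ~ (a,b,c) iff (a-x)^2 + (b-y)^2 + (c-z)^2 = d (mod p)."""
    V = [(x, y, z) for x in range(p) for y in range(p) for z in range(p)]
    adj = {v: set() for v in V}
    for u, v in combinations(V, 2):
        if sum((s - t) ** 2 for s, t in zip(u, v)) % p == d:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def has_k33(adj):
    """Is there a triple of vertices with >= 3 common neighbors outside the
    triple (equivalently, a K_{3,3} subgraph)?"""
    for S in combinations(adj, 3):
        common = (adj[S[0]] & adj[S[1]] & adj[S[2]]) - set(S)
        if len(common) >= 3:
            return True
    return False

p = 3  # tiny prime, just to keep the brute-force check instantaneous
for d in range(1, p):
    adj = sphere_graph(p, d)
    print("d =", d, " degree of the origin =", len(adj[(0, 0, 0)]),
          " K_{3,3}-free:", not has_k33(adj))
```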
It is unknown if the above ideas can be extended to construct 𝐾4,4 -free graphs with Ω(𝑛7/4 )
edges. It is a major open problem to determine the asymptotics of ex(𝑛, 𝐾4,4 ).
𝐾𝑠,𝑡 -free
Now we present a substantial generalization of the above constructions, due to Kollár, Rónyai,
& Szabó (1996) and Alon, Rónyai, & Szabó (1999). It gives a matching lower bound (up to
a constant factor) to the KST theorem for 𝐾𝑠,𝑡 whenever 𝑡 is sufficiently large compared to 𝑠.
Proposition 1.10.10
NormGraph 𝑝,𝑠 is 𝐾 𝑠,𝑠!+1 -free for all 𝑠 ≥ 2.
We wish to upper bound the number of common neighbors of a set of 𝑠 vertices. This amounts to showing that a certain system of algebraic equations cannot have too many
solutions. We quote without proof the following key algebraic result from Kollár, Rónyai, &
Szabó (1996), which can be proved using algebraic geometry.
Theorem 1.10.11
Let F be any field and 𝑎 𝑖 𝑗 , 𝑏 𝑖 ∈ F such that 𝑎 𝑖 𝑗 ≠ 𝑎 𝑖′ 𝑗 for all 𝑖 ≠ 𝑖 ′ . Then the system of
equations
\begin{align*}
(x_1 - a_{11})(x_2 - a_{12}) \cdots (x_s - a_{1s}) &= b_1 \\
(x_1 - a_{21})(x_2 - a_{22}) \cdots (x_s - a_{2s}) &= b_2 \\
&\ \,\vdots \\
(x_1 - a_{s1})(x_2 - a_{s2}) \cdots (x_s - a_{ss}) &= b_s
\end{align*}
has at most 𝑠! solutions (𝑥 1 , . . . , 𝑥 𝑠 ) ∈ F𝑠 .
Remark 1.10.12 (Special case 𝒃 = 0). Consider the special case when all the 𝑏𝑖 are 0. In
this case, since the 𝑎 𝑖 𝑗 are distinct for each fixed 𝑗, every solution to the system corresponds
to a permutation 𝜋 : [𝑠] → [𝑠], setting 𝑥𝑖 = 𝑎 𝑖 𝜋 (𝑖) . So there are exactly 𝑠! solutions in
this special case. The difficult part of the theorem says that the number of solutions cannot
increase if we move 𝑏 away from the origin.
Proof of Proposition 1.10.10. Consider distinct 𝑦 1 , 𝑦 2 , . . . , 𝑦 𝑠 ∈ F 𝑝 𝑠 . We wish to bound the
number of common neighbors 𝑥. Recall that in a field with characteristic 𝑝, we have the
identity (𝑥 + 𝑦) 𝑝 = 𝑥 𝑝 + 𝑦 𝑝 for all 𝑥, 𝑦. So
\[
1 = N(x + y_i) = (x + y_i)(x + y_i)^{p} \cdots (x + y_i)^{p^{s-1}} = (x + y_i)\,(x^{p} + y_i^{p}) \cdots (x^{p^{s-1}} + y_i^{p^{s-1}})
\]
for all 1 ≤ 𝑖 ≤ 𝑠. By Theorem 1.10.11, these 𝑠 equations (as 𝑖 ranges over [𝑠]) have at most
𝑠! solutions in 𝑥. Note the hypothesis of Theorem 1.10.11 is satisfied since 𝑦 𝑖𝑝 = 𝑦 𝑝𝑗 if and
only if 𝑦 𝑖 = 𝑦 𝑗 in F 𝑝 𝑠 . □
Now we modify the norm graph construction to forbid 𝐾𝑠, (𝑠−1)!+1 , thereby yielding Theo-
rem 1.10.7.
Construction 1.10.13 (Projective norm graph)
Let ProjNormGraph 𝒑,𝒔 be the graph with vertex set F 𝑝 𝑠−1 × F×𝑝 , where two vertices
(𝑋, 𝑥), (𝑌 , 𝑦) ∈ F 𝑝 𝑠−1 × F×𝑝 are adjacent if and only if
𝑁 (𝑋 + 𝑌 ) = 𝑥𝑦.
In ProjNormGraph_{𝑝,𝑠}, every vertex (𝑋, 𝑥) has degree 𝑝^{𝑠−1} − 1 since its neighbors are (𝑌, 𝑁(𝑋 + 𝑌)/𝑥) for all 𝑌 ≠ −𝑋. There are (𝑝^{𝑠−1} − 1)𝑝^{𝑠−1}(𝑝 − 1)/2 edges. As earlier, it
remains to show that this graph is 𝐾𝑠, (𝑠−1)!+1 -free. Once we know this, by taking 𝑝 to be the
largest prime satisfying 𝑝 𝑠−1 ( 𝑝 − 1) ≤ 𝑛, we obtain the desired lower bound
\[
\operatorname{ex}\big(n, K_{s,(s-1)!+1}\big) \ge \frac{1}{2}\,(p^{s-1} - 1)\, p^{s-1}(p - 1) \ge \left(\frac{1}{2} - o(1)\right) n^{2 - 1/s}.
\]
Proposition 1.10.14
ProjNormGraph 𝑝,𝑠 is 𝐾 𝑠, (𝑠−1)!+1 -free.
Proof. Fix distinct (𝑌1 , 𝑦 1 ), . . . , (𝑌𝑠 , 𝑦 𝑠 ) ∈ F 𝑝 𝑠−1 × F×𝑝 . We wish to show that there are at most
(𝑠 − 1)! solutions (𝑋, 𝑥) ∈ F 𝑝 𝑠−1 × F×𝑝 to the system of equations
𝑁 (𝑋 + 𝑌𝑖 ) = 𝑥𝑦 𝑖 , 𝑖 = 1, . . . , 𝑠.
Assume this system has at least one solution. Then if 𝑌𝑖 = 𝑌 𝑗 with 𝑖 ≠ 𝑗 we must have
that 𝑦 𝑖 = 𝑦 𝑗 . Therefore all the 𝑌𝑖 are distinct. For each 𝑖 < 𝑠, dividing 𝑁 (𝑋 + 𝑌𝑖 ) = 𝑥𝑦 𝑖 by
𝑁 (𝑋 + 𝑌𝑠 ) = 𝑥𝑦 𝑠 gives
\[
N\!\left(\frac{X + Y_i}{X + Y_s}\right) = \frac{y_i}{y_s}, \qquad i = 1, \dots, s - 1.
\]
Dividing both sides by 𝑁 (𝑌𝑖 − 𝑌𝑠 ) gives
\[
N\!\left(\frac{1}{X + Y_s} + \frac{1}{Y_i - Y_s}\right) = \frac{y_i}{N(Y_i - Y_s)\, y_s}, \qquad i = 1, \dots, s - 1.
\]
Now apply Theorem 1.10.11 (same as in the proof of Proposition 1.10.10). We deduce
that there are at most (𝑠 − 1)! choices for 𝑋, and each such 𝑋 automatically determines
𝑥 = 𝑁 (𝑋 + 𝑌1 )/𝑦 1 . Thus there are at most (𝑠 − 1)! solutions (𝑋, 𝑥). □
𝐶4 , 𝐶6 , 𝐶10 -free
Finally, let us turn to constructions of 𝐶2𝑘-free graphs. We had mentioned in Section 1.6 that ex(𝑛, 𝐶2𝑘) = 𝑂_𝑘(𝑛^{1+1/𝑘}). We saw a matching lower bound construction for 4-cycles. Now we give matching constructions for 6-cycles and 10-cycles. (It remains an open problem for
other cycle lengths.)
Theorem 1.10.15 (Tight lower bound for avoiding 𝐶2𝑘 for 𝑘 ∈ {2, 3, 5})
Let 𝑘 ∈ {2, 3, 5}. Then there is a constant 𝑐 > 0 such that for every 𝑛,
ex(𝑛, 𝐶2𝑘 ) ≥ 𝑐𝑛1+1/𝑘 .
Remark 1.10.16 (History). The existence of such 𝐶2𝑘 -free graphs for 𝑘 ∈ {3, 5} is due to
Benson (1966) and Singleton (1966). The construction given here is due to Wenger (1991),
with a simplified description due to Conlon (2021).
The following construction generalizes the point-line incidence graph construction earlier
for the 𝐶4 -free graph in Theorem 1.10.1. Here we consider a special set of lines in F𝑞𝑘 , whereas
previously for 𝐶4 we took all lines in F2𝑞 .
We have |L| = 𝑞 𝑘 , since to specify a line in L we can provide a point with first coordinate
equal to zero, along with a choice of 𝑡 ∈ F𝑞 giving the direction of the line. So the graph 𝐺 𝑞,𝑘
has 𝑛 = 2𝑞 𝑘 vertices. Since each line contains exactly 𝑞 points, there are exactly 𝑞 𝑘+1 ≍ 𝑛1+1/𝑘
edges in the graph. It remains to show that this graph is 𝐶2𝑘 -free whenever 𝑘 ∈ {2, 3, 5}.
Then Theorem 1.10.15 would follow after the usual trick of taking 𝑞 to be the largest prime
with 2𝑞 𝑘 < 𝑛.
[Figure: a hypothetical 2𝑘-cycle alternating between points 𝑝1 , . . . , 𝑝𝑘 and lines ℓ1 , . . . , ℓ𝑘 .]
Then
\[
p_{i+1} - p_i = a_i\,(1, t_i, \dots, t_i^{k-1})
\]
for some 𝑎𝑖 ∈ F_𝑞 \ {0}. Thus (recall that 𝑝_{𝑘+1} = 𝑝_1)
\[
\sum_{i=1}^{k} a_i\,(1, t_i, \dots, t_i^{k-1}) = \sum_{i=1}^{k} (p_{i+1} - p_i) = 0. \tag{1.3}
\]
The vectors $(1, t_i, \dots, t_i^{k-1})$, 𝑖 = 1, . . . , 𝑘, after deleting duplicates, are linearly independent. One way to see this is via the Vandermonde determinant
\[
\det\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{k-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{k-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_k & x_k^2 & \cdots & x_k^{k-1}
\end{pmatrix}
= \prod_{i < j} (x_j - x_i).
\]
For (1.3) to hold, each vector (1, 𝑡𝑖 , . . . , 𝑡 𝑖𝑘−1 ) must appear at least twice in the sum, with
their coefficients 𝑎 𝑖 adding up to zero.
Since the lines ℓ1 , . . . , ℓ𝑘 are distinct, for each 𝑖 = 1, . . . , 𝑘 (indices taken mod 𝑘), the
lines ℓ𝑖 and ℓ𝑖+1 cannot be parallel. So 𝑡 𝑖 ≠ 𝑡 𝑖+1 . When 𝑘 ∈ {2, 3, 5} it is impossible to select
𝑡 1 , . . . , 𝑡 𝑘 with no equal consecutive terms (including wrap-around) and so that each value is
repeated at least twice. Therefore the 2𝑘-cycle cannot exist. (Why does the argument fail for
𝐶8 -freeness?) □
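The graph 𝐺_{𝑞,𝑘} is also easy to generate explicitly. The sketch below (illustrative; it represents each line by its set of 𝑞 points, following the parameterization described above: a base point with first coordinate zero plus the direction (1, 𝑡, . . . , 𝑡^{𝑘−1})) builds the incidence graph for 𝑘 = 2 and confirms it is 𝐶4-free.

```python
from itertools import combinations, product

def wenger_lines(q, k):
    """Lines of G_{q,k}: for each t in F_q, the direction (1, t, ..., t^(k-1)),
    translated through every base point with first coordinate 0.  Each line is
    represented as a frozenset of its q points in F_q^k."""
    lines = []
    for base in product(range(q), repeat=k - 1):
        b = (0,) + base
        for t in range(q):
            direction = tuple(pow(t, i, q) for i in range(k))
            lines.append(frozenset(
                tuple((b[i] + a * direction[i]) % q for i in range(k))
                for a in range(q)))
    return lines

q, k = 5, 2
lines = wenger_lines(q, k)
assert len(set(lines)) == q ** k and all(len(l) == q for l in lines)
# No C_4: two lines meet in at most one point, and two points lie on at most
# one line of the family, so the bipartite incidence graph has no 4-cycle.
assert all(len(l1 & l2) <= 1 for l1, l2 in combinations(lines, 2))
on_lines = {}
for l in lines:
    for pt in l:
        on_lines.setdefault(pt, set()).add(l)
assert all(len(on_lines[u] & on_lines[v]) <= 1 for u, v in combinations(on_lines, 2))
print(len(on_lines) + len(lines), "vertices and", sum(map(len, lines)), "edges")
```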
The algebraic constructions in the previous section can be abstractly described as follows.
Take a graph whose vertices are points in some algebraic set (e.g., some finite field geometry),
with two vertices 𝑥 and 𝑦 being adjacent if some algebraic relationship such as 𝑓 (𝑥, 𝑦) = 0
is satisfied. Previously, this 𝑓 was carefully chosen by hand. The new idea is to take 𝑓 to be a random polynomial.
We illustrate this technique by giving another proof of the tightness of the KST bound on
extremal numbers for 𝐾𝑠,𝑡 when 𝑡 is large compared to 𝑠.
The construction we present here has a worse dependence of 𝑡 on 𝑠 than in Theorem 1.10.7.
The main purpose of this section is to illustrate the technique of randomized algebraic
constructions. Bukh (2021) later gave a significant extension of this technique which shows that ex(𝑛, 𝐾𝑠,𝑡) = Ω_𝑠(𝑛^{2−1/𝑠}) for some 𝑡 close to 9^𝑠, improving on Theorem 1.10.7, which required 𝑡 > (𝑠 − 1)!.
Proof idea. Take a random polynomial 𝑓 (𝑋1 , . . . , 𝑋𝑠 , 𝑌1 , . . . , 𝑌𝑠 ) symmetric in the 𝑋 and 𝑌
variables (i.e., 𝑓 (𝑋, 𝑌 ) = 𝑓 (𝑌 , 𝑋)), but otherwise uniformly chosen among all polynomials
with degree up to 𝑑 with coefficients in F𝑞 . Consider a graph with vertex set F𝑞𝑠 and where
𝑋 and 𝑌 are adjacent if 𝑓 (𝑋, 𝑌 ) = 0.
Given an 𝑠-vertex set 𝑈, let 𝑍𝑈 denote the set of common neighbors of 𝑈. It is an algebraic
set: the common zeros of the polynomials 𝑓 (𝑋, 𝑦), 𝑦 ∈ 𝑈. Due to the Lang–Weil bound
from algebraic geometry, 𝑍𝑈 is either bounded in size, |𝑍𝑈 | ≤ 𝐶 (the zero dimensional case),
or it must be quite large, say, |𝑍𝑈 | > 𝑞/2 (the positive dimensional case). This is unlike an
Erdős–Rényi random graph.
One can then deduce, using Markov’s inequality, that
\[
\mathbb{P}(|Z_U| > C) = \mathbb{P}\!\left(|Z_U| > \frac{q}{2}\right) \le \frac{\mathbb{E}[|Z_U|^k]}{(q/2)^k} = \frac{O_k(1)}{(q/2)^k},
\]
which is quite small (much smaller compared to an Erdős–Rényi random graph). So typically very few sets 𝑈 have |𝑍𝑈| > 𝐶. By deleting these bad 𝑈’s from the vertex set of the graph, we obtain a 𝐾_{𝑠,𝐶+1}-free graph with around 𝑞^𝑠 vertices and on the order of 𝑞^{2𝑠−1} edges. ■
Now we begin the actual proof. Let 𝑞 be the largest prime power satisfying 𝑞 𝑠 ≤ 𝑛. Due
to prime gaps (Theorem 1.10.3), we have 𝑞 = (1 − 𝑜(1))𝑛1/𝑠 . So it suffices to construct a
𝐾𝑠,𝑡 -free graph on 𝑞 𝑠 vertices with (1/2 − 𝑜(1))𝑞 2𝑠−1 edges.
Let 𝑑 = 𝑠² + 𝑠 (the reason for this choice will come up later). Let
𝑓 ∈ F𝑞 [𝑋1 , 𝑋2 , . . . , 𝑋𝑠 , 𝑌1 , 𝑌2 , . . . , 𝑌𝑠 ] ≤𝑑
be a polynomial chosen uniformly at random among all polynomials with degree at most
𝑑 in each of 𝑋 = (𝑋1 , 𝑋2 , . . . , 𝑋𝑠 ) and 𝑌 = (𝑌1 , 𝑌2 , . . . , 𝑌𝑠 ) and furthermore satisfying 𝑓 (𝑋, 𝑌 ) = 𝑓 (𝑌 , 𝑋); that is, the coefficients $a_{i_1,\dots,i_s,j_1,\dots,j_s} \in \mathbb{F}_q$ of the monomials $X_1^{i_1}\cdots X_s^{i_s}\, Y_1^{j_1}\cdots Y_s^{j_s}$ are chosen subject to $a_{i_1,\dots,i_s,j_1,\dots,j_s} = a_{j_1,\dots,j_s,i_1,\dots,i_s}$
but otherwise independently and uniformly at random.
Let 𝐺 be the graph with vertex set F𝑞𝑠 , with distinct 𝑥, 𝑦 ∈ F𝑞𝑠 adjacent if and only if
𝑓 (𝑥, 𝑦) = 0.
Then 𝐺 is a random graph. The next two lemmas show that 𝐺 behaves in some ways like
a random graph with edges independently appearing with probability 1/𝑞. Indeed, the next
lemma shows that every pair of vertices form an edge with probability 1/𝑞.
Proof. Note that resampling the constant term of 𝑓 does not change its distribution. Thus,
𝑓 (𝑢, 𝑣) is uniformly distributed in F𝑞 for a fixed (𝑢, 𝑣). Hence 𝑓 (𝑢, 𝑣) takes each value with
probability 1/𝑞. □
More generally, we show below that the expected occurrence of small subgraphs mirrors
that of the usual random graph with independent edges. We write $\binom{U}{2}$ for the set of unordered pairs of elements of 𝑈.
Proof. We first perform multivariate Lagrange interpolation to show that ( 𝑓 (𝑢, 𝑣)) {𝑢,𝑣 } can
take all possible values. For each pair 𝑢, 𝑣 ∈ 𝑊 with 𝑢 ≠ 𝑣, we can find some polynomial
ℓ𝑢,𝑣 ∈ F[𝑋1 , . . . , 𝑋𝑠 ] of degree at most 1 such that ℓ𝑢,𝑣 (𝑢) = 1 and ℓ𝑢,𝑣 (𝑣) = 0. For each
𝑢 ∈ 𝑊, let
\[
q_u(X) = \prod_{v \in W \setminus \{u\}} \ell_{u,v}(X) \in \mathbb{F}_q[X_1, \dots, X_s],
\]
which has degree ≤ |𝑊 | − 1 ≤ 𝑑. It satisfies 𝑞 𝑢 (𝑢) = 1, and 𝑞 𝑢 (𝑣) = 0 for all 𝑣 ∈ 𝑊 \ {𝑢}.
Let
\[
p(X, Y) = \sum_{\{u,v\} \in \binom{W}{2}} c_{u,v}\,\big(q_u(X)\, q_v(Y) + q_v(X)\, q_u(Y)\big)
\]
with 𝑐 𝑢,𝑣 ∈ F𝑞 . Note that 𝑝(𝑋, 𝑌 ) = 𝑝(𝑌 , 𝑋). Also, 𝑝(𝑢, 𝑣) = 𝑐 𝑢,𝑣 for all distinct 𝑢, 𝑣 ∈ 𝑊.
Now let each 𝑐 𝑢,𝑣 ∈ F𝑞 above be chosen independently and uniformly at random. So
𝑝(𝑋, 𝑌 ) is a random polynomial. Note that 𝑓 (𝑋, 𝑌 ) and 𝑝(𝑋, 𝑌 ) are independent random
polynomials both with degree at most 𝑑 in each of 𝑋 and 𝑌 . Since 𝑓 is chosen uniformly
at random, it has the same distribution as 𝑓 + 𝑝. Since $(p(u,v))_{\{u,v\}} = (c_{u,v})_{\{u,v\}} \in \mathbb{F}_q^{\binom{|W|}{2}}$ is uniformly distributed, the same must be true for $(f(u,v))_{\{u,v\}}$ as well. □
Now fix 𝑈 ⊆ F𝑞𝑠 with |𝑈| = 𝑠. We want to show that it is rare for 𝑈 to have many common
neighbors. We will use the method of moments. Let
𝑍𝑈 = the set of common neighbors of 𝑈
= {𝑥 ∈ F𝑞𝑠 \ 𝑈 : 𝑓 (𝑥, 𝑢) = 0 for all 𝑢 ∈ 𝑈}.
Then using Lemma 1.11.3, for any 𝑘 ≤ 𝑠² + 1,
\begin{align*}
\mathbb{E}[|Z_U|^k] &= \mathbb{E}\Big[\Big(\sum_{v \in \mathbb{F}_q^s \setminus U} 1\{v \in Z_U\}\Big)^{k}\Big] \\
&= \sum_{v^{(1)}, \dots, v^{(k)} \in \mathbb{F}_q^s \setminus U} \mathbb{E}\big[1\{v^{(1)}, \dots, v^{(k)} \in Z_U\}\big] \\
&= \sum_{v^{(1)}, \dots, v^{(k)} \in \mathbb{F}_q^s \setminus U} q^{-|U| \cdot \#\{v^{(1)}, \dots, v^{(k)}\}},
\end{align*}
with the final step due to Lemma 1.11.3 applied with 𝑊 = 𝑈 ∪ {𝑣^{(1)}, . . . , 𝑣^{(𝑘)}}, which has cardinality ≤ |𝑈| + 𝑘 ≤ 𝑠 + 𝑠² + 1 = 𝑑 + 1. Note that #{𝑣^{(1)}, . . . , 𝑣^{(𝑘)}} counts distinct elements in the set. Thus, continuing the above calculation,
\[
= \sum_{r \le k} \binom{q^s - |U|}{r}\, q^{-rs}\, \#\{\text{surjections } [k] \to [r]\} = O_k(1).
\]
Applying the above with 𝑘 = 𝑠² + 1 and using Markov’s inequality, we get
\[
\mathbb{P}(|Z_U| \ge \lambda) = \mathbb{P}\big(|Z_U|^{s^2+1} \ge \lambda^{s^2+1}\big) \le \frac{\mathbb{E}|Z_U|^{s^2+1}}{\lambda^{s^2+1}} \le \frac{O_s(1)}{\lambda^{s^2+1}}. \tag{1.4}
\]
Remark 1.11.4. All the probabilistic arguments up to this point would be identical had
we used a random graph with independent edges appearing with probability 𝑝. In both
settings, the |𝑍𝑈 | above is a random variable with constant order expectation. However,
their distributions are extremely different, as we will soon see. For a random graph with
independent edges, |𝑍𝑈 | behaves like a Poisson random variable, and consequently, for any
constant 𝑡, P(|𝑍𝑈 | ≥ 𝑡) is bounded from below by a constant. Consequently, many 𝑠-element
sets of vertices are expected to have at least 𝑡 common neighbors, and so this method will not
work. However, this is not the case with the random algebraic construction. It is impossible
for |𝑍𝑈 | to take on certain ranges of values—if |𝑍𝑈 | is somewhat large, then it must be very
large.
Note that 𝑍𝑈 is defined by 𝑠 polynomial equations. The next result tells us that the number
of points on such an algebraic variety must be either bounded or at least around 𝑞.
The lemma can be deduced from the following important result from algebraic geometry
due to Lang & Weil (1954), which says that the number of points of an 𝑟-dimensional
algebraic variety in F𝑞𝑠 is roughly 𝑞 𝑟 , as long as certain irreducibility hypotheses are satisfied.
We include here the statement of the Lang–Weil bound. Here $\overline{\mathbb{F}_q}$ denotes the algebraic closure of F_𝑞.
The two cases in Lemma 1.11.5 then correspond to the zero dimensional case and the
positive dimensional case, though some care is needed to deal with what happens if the
variety is reducible in the field closure. We refer the reader to Bukh (2015) for details on how
to deduce Lemma 1.11.5 from the Lang–Weil bound.
Now, continuing our proof of Theorem 1.11.1. Recall 𝑍𝑈 = {𝑥 ∈ F𝑞𝑠 \ 𝑈 : 𝑓 (𝑥, 𝑢) =
0 for all 𝑢 ∈ 𝑈}. Apply Lemma 1.11.5 to the polynomials 𝑓 (𝑋, 𝑢), 𝑢 ∈ 𝑈. Then for large
enough 𝑞 there exists a constant 𝐶 from Lemma 1.11.5 such that either |𝑍𝑈| ≤ 𝐶 (bounded) or |𝑍𝑈| > 𝑞/2 (very large). Thus, by (1.4),
\[
\mathbb{P}(|Z_U| > C) = \mathbb{P}\!\left(|Z_U| > \frac{q}{2}\right) \le \frac{O_s(1)}{(q/2)^{s^2+1}}.
\]
So the expected number of 𝑠-element subsets 𝑈 with |𝑍𝑈| > 𝐶 is
\[
\le \binom{q^s}{s} \cdot \frac{O_s(1)}{(q/2)^{s^2+1}} = O_s(1/q).
\]
Remove from 𝐺 a vertex from every 𝑠-element 𝑈 with |𝑍𝑈 | > 𝐶. Then the resulting graph is
𝐾𝑠, ⌈𝐶 ⌉+1 -free. Since we remove at most 𝑞 𝑠 edges for each deleted vertex, the expected number
of remaining edges is at least
\[
\frac{1}{q}\binom{q^s}{2} - O_s(q^{s-1}) = \left(\frac{1}{2} - o(1)\right) q^{2s-1}.
\]
Finally, given 𝑛, we can take the largest prime 𝑞 satisfying 𝑞 𝑠 ≤ 𝑛 to finish the proof of
Theorem 1.11.1.
Further Reading
Graph theory is a huge subject. There are many important topics that are quite far from
the main theme of this book. For a standard introduction to the subject (especially on more
classical aspects), several excellent graph theory textbooks are available: Bollobás (1998),
Bondy & Murty (2008), Diestel (2017), West (1996). The three-volume Combinatorial
Optimization by Schrijver (2003) is also an excellent reference for graph theory, with a focus
on combinatorial algorithms.
The following surveys discuss in more depth various topics encountered in this chapter:
• The History of Degenerate (Bipartite) Extremal Graph problems by Füredi & Si-
monovits (2013);
• Hypergraph Turán Problems by Keevash (2011);
• Dependent Random Choice by Fox & Sudakov (2011).
Chapter Summary
• Turán number ex(𝑛, 𝐻) = the maximum number of edges in an 𝑛-vertex 𝐻-free graph.
• Turán’s theorem. Among all 𝑛-vertex 𝐾𝑟+1 -free graphs, the Turán graph 𝑇𝑛,𝑟 (a complete
𝑟-partite graph with nearly equal sized parts) uniquely maximizes the number of edges.
• Erdős–Stone–Simonovits Theorem. For any fixed graph 𝐻,
\[
\operatorname{ex}(n, H) = \left(1 - \frac{1}{\chi(H) - 1} + o(1)\right)\binom{n}{2}.
\]
• Supersaturation (from one copy to many copies): an 𝑛-vertex graph with ≥ ex(𝑛, 𝐻)+𝜀𝑛2
edges has ≥ 𝛿𝑛 𝑣 (𝐻 ) copies of 𝐻, for some constant 𝛿 > 0 only depending on 𝜀 > 0, and
provided that 𝑛 is sufficiently large.
• Kővári–Sós–Turán theorem. For fixed 𝑠 ≤ 𝑡,
ex(𝑛, 𝐾 𝑠,𝑡 ) = 𝑂 𝑠,𝑡 (𝑛2−1/𝑠 ).
– Tight for 𝐾2,2 , 𝐾3,3 , and more generally, for 𝐾 𝑠,𝑡 with 𝑡 much larger than 𝑠 (algebraic
constructions).
– Conjectured to be tight in general.
• Even cycles. For any integer 𝑘 ≥ 2, ex(𝑛, 𝐶2𝑘) = 𝑂_𝑘(𝑛^{1+1/𝑘}) (we only proved a weaker statement in this book).
Chapter Highlights
• Szemerédi’s graph regularity lemma: partitioning an arbitrary graph into a bounded num-
ber of parts with random-like edges between parts
• Graph regularity method: recipe and applications
• Graph removal lemma
• Roth’s theorem: a graph theoretic proof using the triangle removal lemma
• Strong regularity and induced graph removal lemma
• Graph property testing
• Hypergraph removal lemma and Szemerédi’s theorem
of parameters in the statements and proofs below, and rather, focus on the main ideas and
techniques.
Many students experience a steep learning curve when studying the regularity method.
The technical details can obscure the underlying intuition. Also, the style of arguments may
be quite different from the type of combinatorial proofs they encountered earlier in their
studies (e.g., the type of proofs from earlier in this book). Section 2.7 contains important
exercises on applying the graph regularity method, which are essential for understanding the
material.
We allow 𝑋 and 𝑌 to overlap in the definition above. For intuition, it is mostly fine to
picture the bipartite setting, where 𝑋 and 𝑌 are automatically disjoint.
What should it mean for a graph to be “random-like”? We will explore the concept of
pseudorandom graphs in depth in Chapter 3. Given vertex sets 𝑋 and 𝑌 , we would like
the edge density between them to not change much even if we restrict 𝑋 and 𝑌 to smaller
subsets. Intuitively, this says that the edges are somewhat evenly distributed.
We need the hypotheses | 𝐴| ≥ 𝜀 |𝑈| and |𝐵| ≥ 𝜀 |𝑊 | since the definition would be too
restrictive otherwise. For example, by taking 𝐴 = {𝑥} and 𝐵 = {𝑦}, 𝑑 ( 𝐴, 𝐵) could end up
being both 0 (if 𝑥𝑦 ∉ 𝐸) and 1 (if 𝑥𝑦 ∈ 𝐸).
Remark 2.1.3 (Different roles of 𝜀 ). The 𝜀 in | 𝐴| ≥ 𝜀 |𝑈| and |𝐵| ≥ 𝜀 |𝑊 | plays a different
role from the 𝜀 in |𝑑 ( 𝐴, 𝐵) − 𝑑 (𝑈, 𝑊)| ≤ 𝜀. However, it is usually not important to distinguish
these 𝜀’s. So we use only one 𝜀 for convenience of notation.
The “random-like” intuition is justified as random graphs indeed satisfy the above property.
(This can be proved by the Chernoff bound; more on this in the next chapter.)
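Because the definition quantifies over finitely many subsets, 𝜀-regularity of a pair can be checked by brute force on very small examples. The sketch below (illustrative only; the search is exponential in |𝑋| + |𝑌|, and the example graphs are made up) computes edge densities and looks for a witnessing pair of subsets.

```python
from itertools import chain, combinations

def density(edges, A, B):
    """d(A, B) = e(A, B) / (|A||B|), where e(A, B) counts pairs in A x B that are
    edges (A and B are taken to be disjoint here, as in the bipartite picture)."""
    if not A or not B:
        return 0.0
    return sum((a, b) in edges or (b, a) in edges for a in A for b in B) / (len(A) * len(B))

def subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

def is_eps_regular(edges, X, Y, eps):
    """Check eps-regularity of (X, Y): every A in X with |A| >= eps|X| and B in Y
    with |B| >= eps|Y| must satisfy |d(A, B) - d(X, Y)| <= eps."""
    d = density(edges, X, Y)
    for A in subsets(X):
        if len(A) < eps * len(X):
            continue
        for B in subsets(Y):
            if len(B) < eps * len(Y):
                continue
            if abs(density(edges, A, B) - d) > eps:
                return False, (A, B)  # a witnessing irregular pair of subsets
    return True, None

# A complete bipartite pair is eps-regular for every eps, while a "half graph"
# pattern between X and Y is far from regular.
X, Y = [0, 1, 2, 3], [4, 5, 6, 7]
complete = {(x, y) for x in X for y in Y}
half = {(X[i], Y[j]) for i in range(4) for j in range(4) if i <= j}
print(is_eps_regular(complete, X, Y, 0.25)[0])  # True
print(is_eps_regular(half, X, Y, 0.25)[0])      # False
```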
The following exercises can help you check your understanding of 𝜀-regularity.
Exercise 2.1.4 (Basic inheritance of regularity). Let 𝐺 be a graph and 𝑋, 𝑌 ⊆ 𝑉 (𝐺). If
(𝑋, 𝑌 ) is an 𝜀𝜂-regular pair, then (𝑋 ′ , 𝑌 ′ ) is 𝜀-regular for all 𝑋 ′ ⊆ 𝑋 with |𝑋 ′ | ≥ 𝜂 |𝑋 | and
𝑌 ′ ⊆ 𝑌 with |𝑌 ′ | ≥ 𝜂 |𝑌 |.
Exercise 2.1.5 (An alternate definition of regular pairs). Let 𝐺 be a graph and 𝑋, 𝑌 ⊆
𝑉 (𝐺). Say that (𝑋, 𝑌 ) is 𝜺-homogeneous if for all 𝐴 ⊆ 𝑋 and 𝐵 ⊆ 𝑌 , one has
|𝑒( 𝐴, 𝐵) − | 𝐴| |𝐵| 𝑑 (𝑋, 𝑌 )| ≤ 𝜀 |𝑋 | |𝑌 | .
Show that if (𝑋, 𝑌 ) is 𝜀-regular, then it is 𝜀-homogeneous. Also, show that if (𝑋, 𝑌 ) is
𝜀 3 -homogeneous, then it is 𝜀-regular.
Exercise 2.1.6 (Robustness of regularity). Prove that for every 𝜀 ′ > 𝜀 > 0, there exists
𝛿 > 0 so that given an 𝜀-regular pair (𝑋, 𝑌 ) in some graph, if we modify the graph by
adding/deleting ≤ 𝛿 |𝑋 | vertices to/from 𝑋, adding/deleting ≤ 𝛿 |𝑌 | vertices to/from 𝑌 , and
adding/deleting ≤ 𝛿|𝑋||𝑌| edges, then the resulting new pair (𝑋, 𝑌) is still 𝜀′-regular.
Next, let us define what it means for a vertex partition to be 𝜀-regular.
In other words, all but at most an 𝜀-fraction of pairs of vertices of 𝐺 lie between 𝜀-regular parts.
Remark 2.1.8. When |𝑉1| = · · · = |𝑉𝑘|, the inequality says that at most 𝜀𝑘² of the pairs (𝑉𝑖, 𝑉𝑗) are not 𝜀-regular.
Also, note that the summation includes 𝑖 = 𝑗. If none of the 𝑉𝑖 ’s are too large, say |𝑉𝑖| ≤ 𝜀𝑛 for each 𝑖, then the terms with 𝑖 = 𝑗 contribute $\le \sum_i |V_i|^2 \le \varepsilon n \sum_i |V_i| = \varepsilon n^2$, which is negligible.
We are now ready to state Szemerédi’s graph regularity lemma.
Since the edge density is always between 0 and 1, we have 0 ≤ 𝑞(P) ≤ 1 for all
partitions P. The following lemmas show that the energy cannot decrease upon refinement,
and furthermore, it must increase substantially at each step of the algorithm above.
We have
\[
\mathbb{E}[Z^2] = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{|U_i||W_j|}{|U||W|}\, d(U_i, W_j)^2 = \frac{n^2}{|U||W|}\, q(\mathcal{P}_U, \mathcal{P}_W).
\]
Proof. The conclusion follows by applying Lemma 2.1.11 to each pair of parts of P. In more detail, let P = {𝑉1, . . . , 𝑉𝑚}, and suppose that P′ refines each 𝑉𝑖 into a partition P′_{𝑉𝑖} = {𝑉′_{𝑖1}, . . . , 𝑉′_{𝑖𝑘_𝑖}} of 𝑉𝑖, so that P′ = P′_{𝑉1} ∪ · · · ∪ P′_{𝑉𝑚}. We then have
\[
q(\mathcal{P}) = \sum_{i,j} q(V_i, V_j) \le \sum_{i,j} q(\mathcal{P}'_{V_i}, \mathcal{P}'_{V_j}) = q(\mathcal{P}'). \qquad \square
\]
Proof. Let
\[
R = \{(i, j) \in [k]^2 : (V_i, V_j) \text{ is } \varepsilon\text{-regular}\} \qquad \text{and} \qquad \overline{R} = [k]^2 \setminus R.
\]
For each pair (𝑉𝑖 , 𝑉 𝑗 ) that is not 𝜀-regular, find a pair 𝐴𝑖, 𝑗 ⊆ 𝑉𝑖 and 𝐵𝑖, 𝑗 ⊆ 𝑉 𝑗 that witnesses
the irregularity. Do this simultaneously for all (𝑖, 𝑗) ∈ 𝑅. Note for 𝑖 ≠ 𝑗, we can take
𝐴𝑖, 𝑗 = 𝐵 𝑗,𝑖 due to symmetry. When 𝑖 = 𝑗, we should allow for the possibility of 𝐴𝑖,𝑖 and 𝐵𝑖,𝑖
to be distinct.
Figure 2.1 In the proof of Lemma 2.1.14, we refine the partition by taking a
common refinement using witnesses of irregular pairs.
Let Q be a common refinement of P by all the 𝐴𝑖, 𝑗 and 𝐵𝑖, 𝑗 (i.e., the parts of Q are
maximal subsets that are not “cut up” into small pieces by any element of P or by the 𝐴𝑖, 𝑗
and 𝐵𝑖, 𝑗 ; intuitively, imagine regions of a Venn diagram). See Figure 2.1 for an illustration.
There are ≤ 𝑘 + 1 such distinct non-empty sets inside each 𝑉𝑖 . So Q refines each 𝑉𝑖 into at
most 2^{𝑘+1} parts. Let Q𝑖 be the partition of 𝑉𝑖 given by Q. Then, using the monotonicity of
energy under refinements (Lemma 2.1.11) for the pairs in 𝑅 and the energy boost from the witnessing sets (Lemma 2.1.13) for the pairs in $\overline{R}$,
\begin{align*}
q(\mathcal{Q}) &= \sum_{(i,j) \in [k]^2} q(\mathcal{Q}_i, \mathcal{Q}_j)
= \sum_{(i,j) \in R} q(\mathcal{Q}_i, \mathcal{Q}_j) + \sum_{(i,j) \in \overline{R}} q(\mathcal{Q}_i, \mathcal{Q}_j) \\
&\ge \sum_{(i,j) \in [k]^2} q(V_i, V_j) + \varepsilon^4 \sum_{(i,j) \in \overline{R}} \frac{|V_i||V_j|}{n^2}.
\end{align*}
The first sum equals 𝑞(P), and the second sum is > 𝜀⁵ since P is not 𝜀-regular. This gives the desired inequality. □
Remark 2.1.15 (Refinements should be done simultaneously). Here is a subtle point in
the above proof. The refinement Q must be obtained in a single step by refining P using all
the witnessing sets 𝐴𝑖, 𝑗 simultaneously. If instead we pick out a pair 𝐴𝑖, 𝑗 ⊆ 𝑉𝑖 and 𝐴 𝑗,𝑖 ⊆ 𝑉 𝑗 ,
refine the partition using just this pair, and then iterate using another irregular pair (𝑉𝑖′ , 𝑉 𝑗 ′ ),
the energy boost step would not work. This is because 𝜀-regularity (or lack thereof) is not
well-preserved under taking refinements.
Proof of the graph regularity lemma (Theorem 2.1.9). Start with a trivial partition of the
vertex set of the graph. Repeatedly apply Lemma 2.1.14 whenever the current partition is
not 𝜀-regular. By Lemma 2.1.14, the energy of the partition increases by more than 𝜀 5 at
each iteration. Since the energy of the partition is ≤ 1, we must stop after < 𝜀 −5 iterations,
terminating in an 𝜀-regular partition.
If a partition has 𝑘 parts, then Lemma 2.1.14 produces a refinement with ≤ 𝑘2^{𝑘+1} parts. We start with a trivial partition with one part, and then refine < 𝜀^{−5} times. Observe the crude bound 𝑘2^{𝑘+1} ≤ 2^{2^{𝑘}}. So the total number of parts at the end is ≤ tower(⌈2𝜀^{−5}⌉), where
\[
\operatorname{tower}(k) := \underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}_{\text{height } k}. \qquad \square
\]
Remark 2.1.16 (The proof does not guarantee that the partition becomes “more regular”
after each step.). Let us stress what the proof is not saying. It is not saying that the partition
gets more and more regular under each refinement. Also, it is not saying that the partition gets more regular as the energy gets higher. Rather, the energy simply bounds the number of iterations.
The bound on the number of parts guaranteed by the proof is a constant for each fixed
𝜀 > 0, but it grows extremely quickly as 𝜀 gets smaller. Is the poor quantitative dependence
somehow due to a suboptimal proof strategy? Surprisingly, the tower-type bound is necessary,
as shown by Gowers (1997).
We do not include the proof here; see Moshkovitz & Shapira (2016) for a short proof.
The general idea is to construct a graph that roughly reverse engineers the proof of the
regularity lemma, so there is essentially a unique 𝜀-regular partition, which must have many
parts.
Remark 2.1.18 (Irregular pairs are necessary in the regularity lemma). Recall that in Definition 2.1.7 of an 𝜀-regular partition, we are allowed to have some irregular pairs. Are irregular pairs necessary? It turns out that we must permit them. Exercise 2.1.24 gives a canonical example (a “half graph”) where every regularity partition has irregular pairs.
The regularity lemma is quite flexible. For example, we can start with an arbitrary partition
of 𝑉 (𝐺) instead of the trivial partition in the proof, in order to obtain a partition that is a
refinement of a given partition. The exact same proof with this modification yields the
following.
Here is another strengthening of the regularity lemma. We impose the additional require-
ment that vertex parts should be as equal in size as possible. We say that a partition is
equitable if all part sizes are within one of each other; that is, ||𝑉𝑖| − |𝑉𝑗|| ≤ 1. In other
words, a partition of a set of size 𝑛 into 𝑘 parts is equitable if every part has size ⌊𝑛/𝑘⌋ or
⌈𝑛/𝑘⌉.
Remark 2.1.21. The lower bound 𝑚0 requirement on the number of parts is somewhat superficial. The reason for including it here is that it is often convenient to discard all the edges that lie within individual parts of the partition, and since there are at most 𝑛²/𝑘 such edges, they contribute negligibly if the number of parts 𝑘 is not too small, which is true if we require 𝑚0 ≥ 1/𝜀 in the equitable regularity lemma statement.
There are several ways to guarantee equitability. One method is sketched below. We
equitize the partition at every step of the refinement iteration, so that at each step in the proof,
we both obtain an energy increment and also end up with an equitable partition.
Proof sketch of the equitable regularity lemma (Theorem 2.1.20). Here is a modified al-
gorithm:
(1) Start with an arbitrary equitable partition of the graph into 𝑚 0 parts.
(2) While the current equitable partition P is not 𝜀-regular:
(a) (Refinement/energy boost) Refine the partition using pairs that witness irreg-
ularity (as in the earlier proof). The new partition P ′ divides each part of P
into ≤ 2^{|P|} parts.
(b) (Equitization) Modify P ′ into an equitable partition by arbitrarily chopping
each part of P ′ into parts of size |𝑉 (𝐺)| /𝑚 (for some appropriately chosen
𝑚 = 𝑚(|P ′ | , 𝜀)) plus some leftover pieces, which are then combined together
and then divided into parts of size |𝑉 (𝐺)| /𝑚.
The refinement step (2)(a) increases energy by ≥ 𝜀⁵ as before. The energy might go down in the equitization step (2)(b), but it should not decrease by much, provided that the 𝑚 chosen in that step is large enough (say, 𝑚 = 100|P′|𝜀^{−5}). So overall, we still have an energy increment of ≥ 𝜀⁵/2 at each step, and hence the process still terminates after 𝑂(𝜀^{−5}) steps.
The total number of parts at the end is ≤ 𝑚 0 tower(𝑂 (𝜀 −5 )). □
Exercise 2.1.22. Complete the details in the above proof sketch.
Exercise 2.1.23 (Making each part 𝜀 -regular to nearly all other parts). Prove that for
all 𝜀 > 0 and 𝑚 0 , there exists a constant 𝑀 so that every graph has an equitable vertex
partition into 𝑘 parts, with 𝑚 0 ≤ 𝑘 ≤ 𝑀, such that each part is 𝜀-regular with all but at
most 𝜀𝑘 other parts.
The important example in the next exercise shows why we must allow irregular pairs in
the graph regularity lemma.
Exercise 2.1.24 (Unavoidability of irregular pairs). Let the half-graph 𝐻𝑛 be the bipartite
graph on 2𝑛 vertices {𝑎 1 , . . . , 𝑎 𝑛 , 𝑏 1 , . . . , 𝑏 𝑛 } with edges {𝑎 𝑖 𝑏 𝑗 : 𝑖 ≤ 𝑗 }.
(a) For every 𝜀 > 0, explicitly construct an 𝜀-regular partition of 𝐻𝑛 into 𝑂 (1/𝜀) parts.
(b) Show that there is some 𝑐 > 0 such that for every 𝜀 ∈ (0, 𝑐), every positive integer
𝑘 and sufficiently large multiple 𝑛 of 𝑘, every partition of the vertices of 𝐻𝑛 into 𝑘
equal-sized parts contains at least 𝑐𝑘 pairs of parts which are not 𝜀-regular.
The next exercise should remind you of the iteration technique from the proof of the graph
regularity lemma.
Exercise 2.1.25 (Existence of a regular pair of subsets). Show that there is some absolute constant 𝐶 > 0 such that for every 0 < 𝜀 < 1/2, every graph on 𝑛 vertices contains an 𝜀-regular pair of vertex subsets each with size at least 𝛿𝑛, where 𝛿 = 2^{−𝜀^{−𝐶}}.
This exercise asks for two different proofs of the following theorem.
Given a graph 𝐺, we say that 𝑋 ⊆ 𝑉 (𝐺) is 𝜺-regular if the pair (𝑋, 𝑋) is 𝜀-regular; that
is, for all 𝐴, 𝐵 ⊆ 𝑋 with | 𝐴| , |𝐵| ≥ 𝜀 |𝑋 |, one has |𝑑 ( 𝐴, 𝐵) − 𝑑 (𝑋, 𝑋)| ≤ 𝜀.
𝑋, 𝑌 , 𝑍 with the same edge densities between parts. By comparing 𝐺 to its random model
approximation, we expect the number of triples (𝑥, 𝑦, 𝑧) ∈ 𝑋 × 𝑌 × 𝑍 forming a triangle in
𝐺 to be roughly
𝑑 (𝑋, 𝑌 )𝑑 (𝑋, 𝑍)𝑑 (𝑌 , 𝑍)|𝑋 ||𝑌 ||𝑍 |.
The triangle counting lemma makes this intuition precise.
Remark 2.2.2. The vertex sets 𝑋, 𝑌 , 𝑍 do not have to be disjoint, but one does not lose
any generality by assuming that they are disjoint in this statement. Indeed, starting with
𝑋, 𝑌 , 𝑍 ⊆ 𝑉 (𝐺), one can always create an auxiliary tripartite graph 𝐺 ′ with vertex parts
being disjoint replicas of 𝑋, 𝑌 , 𝑍 and the edge relations in 𝑋 × 𝑌 being the same for 𝐺 and
𝐺′, and likewise for 𝑋 × 𝑍 and 𝑌 × 𝑍. Under this auxiliary construction, a triple in 𝑋 × 𝑌 × 𝑍 forms a triangle in 𝐺 if and only if it forms a triangle in 𝐺′.
Now we show that in an 𝜀-regular pair (𝑋, 𝑌 ), almost all vertices of 𝑋 have roughly the
same number of neighbors in 𝑌 (the next lemma only states a lower bound on degree, but the
same argument also gives an analogous upper bound).
Proof. Let 𝐴 be the subset of vertices in 𝑋 with < (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝑌 | neighbors in 𝑌 . Then
𝑑 ( 𝐴, 𝑌 ) < 𝑑 (𝑋, 𝑌 ) − 𝜀, and thus | 𝐴| < 𝜀 |𝑋 | by Definition 2.1.2 as (𝑋, 𝑌 ) is an 𝜀-regular
pair. The other claim is similar. □
Proof of Theorem 2.2.1. By Lemma 2.2.3, we can find 𝑋 ′ ⊆ 𝑋 with |𝑋 ′ | ≥ (1 − 2𝜀) |𝑋 |
such that every vertex 𝑥 ∈ 𝑋 ′ has ≥ (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝑌 | neighbors in 𝑌 and ≥ (𝑑 (𝑋, 𝑍) − 𝜀)|𝑍 |
neighbors in 𝑍. Write 𝑁𝑌 (𝑥) = 𝑁 (𝑥) ∩ 𝑌 and 𝑁 𝑍 (𝑥) = 𝑁 (𝑥) ∩ 𝑍.
For each such 𝑥 ∈ 𝑋 ′ , we have |𝑁𝑌 (𝑥)| ≥ (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝑌 | ≥ 𝜀|𝑌 |. Likewise, |𝑁 𝑍 (𝑥)| ≥
𝜀|𝑍 |. Since (𝑌 , 𝑍) is 𝜀-regular, the edge density between 𝑁𝑌 (𝑥) and 𝑁 𝑍 (𝑥) is ≥ 𝑑 (𝑌 , 𝑍) − 𝜀.
So for each 𝑥 ∈ 𝑋 ′ , the number of edges between 𝑁𝑌 (𝑥) and 𝑁 𝑍 (𝑥) is
≥ (𝑑 (𝑌 , 𝑍) − 𝜀)|𝑁𝑌 (𝑥)||𝑁 𝑍 (𝑥)| ≥ (𝑑 (𝑋, 𝑌 ) − 𝜀) (𝑑 (𝑋, 𝑍) − 𝜀) (𝑑 (𝑌 , 𝑍) − 𝜀)|𝑌 ||𝑍 |.
Multiplying by |𝑋 ′ | ≥ (1 − 2𝜀) |𝑋 |, we obtain the desired lower bound on the number of
triangles. □
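A quick numerical sanity check of the counting lemma (illustrative only; a random bipartite graph is 𝜀-regular only with high probability, which the sketch does not verify, and the densities and part sizes are made up): compare the number of triangles across three random bipartite graphs with the product of the pairwise densities.

```python
import random
from itertools import product

def random_tripartite(nx, ny, nz, dxy, dxz, dyz, seed=0):
    """Independent random bipartite graphs between X, Y, Z with the given densities."""
    rng = random.Random(seed)
    X, Y, Z = range(nx), range(ny), range(nz)
    exy = {(x, y) for x, y in product(X, Y) if rng.random() < dxy}
    exz = {(x, z) for x, z in product(X, Z) if rng.random() < dxz}
    eyz = {(y, z) for y, z in product(Y, Z) if rng.random() < dyz}
    return X, Y, Z, exy, exz, eyz

X, Y, Z, exy, exz, eyz = random_tripartite(40, 40, 40, 0.5, 0.6, 0.7)
triangles = sum((x, y) in exy and (x, z) in exz and (y, z) in eyz
                for x, y, z in product(X, Y, Z))
predicted = 0.5 * 0.6 * 0.7 * 40 ** 3
print(triangles, "triangles; d(X,Y) d(X,Z) d(Y,Z) |X||Y||Z| =", predicted)
```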
Remark 2.2.4. We only need the lower bound on the triangle count for our applications in
this chapter, but the same proof can also be modified to give an upper bound, which we leave
as an exercise.
subcubic number of triangles (i.e., asymptotically less than the maximum possible number)
and “few edges” means a subquadratic number of edges.
Because edges between the pairs described in (a) and (b) were removed, 𝑉𝑖 , 𝑉 𝑗 , 𝑉𝑘 satisfy the
hypotheses of the triangle counting lemma (Theorem 2.2.1),
\[
\#\{\text{triangles in } V_i \times V_j \times V_k\} \ge \left(1 - \frac{\varepsilon}{2}\right)\left(\frac{\varepsilon}{4}\right)^{3} |V_i|\,|V_j|\,|V_k| \ge \left(1 - \frac{\varepsilon}{2}\right)\left(\frac{\varepsilon}{4}\right)^{3} \left(\frac{\varepsilon n}{4m}\right)^{3},
\]
where the final step uses (c) above. Then as long as
\[
\delta < \frac{1}{6}\left(1 - \frac{\varepsilon}{2}\right)\left(\frac{\varepsilon}{4}\right)^{3}\left(\frac{\varepsilon}{4m}\right)^{3},
\]
we would contradict the hypothesis that the original graph has < 𝛿𝑛3 triangles (the extra
factor of 6 above is there to account for the possibility that 𝑉𝑖 = 𝑉 𝑗 = 𝑉𝑘 ). Since 𝑚 is bounded
for each fixed 𝜀, we see that 𝛿 can be chosen to depend only on 𝜀. □
The next corollary of the triangle removal lemma will soon be used to prove Roth’s
theorem. Here “diamond” refers to the following graph, consisting of two triangles sharing
an edge.
Proof. Let 𝐺 have 𝑚 edges. Because each edge lies in exactly one triangle, the number of
triangles in 𝐺 is 𝑚/3 = 𝑂 (𝑛2 ) = 𝑜(𝑛3 ). By the triangle removal lemma (see the statement
after Theorem 2.3.1), we can remove 𝑜(𝑛2 ) edges to make 𝐺 triangle-free. However, deleting
an edge removes at most one triangle from the graph by assumption, so 𝑚/3 edges need to
be removed to make 𝐺 triangle-free. Thus 𝑚 = 𝑜(𝑛2 ). □
Remark 2.3.4 (Quantitative dependencies in the triangle removal lemma). Since the above
proof of the triangle removal lemma applies the graph regularity lemma, the resulting
bounds from the proof are quite poor: it shows that one can pick 𝛿 = 1/tower(𝜀 −𝑂 (1) ).
Using a different but related method, Fox (2011) proved the triangle removal lemma with
a slightly better dependence 𝛿 = 1/tower(𝑂 (log(1/𝜀))). In the other direction, we know
that the triangle removal lemma does not hold with $\delta = \varepsilon^{c \log(1/\varepsilon)}$ for a sufficiently small constant 𝑐 > 0. The construction comes from the Behrend construction of large 3-AP-free sets that we will soon see in Section 2.5. Our knowledge of the quantitative dependence in Corollary 2.3.3 comes from the same source; specifically, we know that the 𝑜(𝑛²) can be sharpened to $n^2/e^{\Omega(\log^* n)}$ (where $\log^*$, the iterated logarithm function, is the number of iterations of log that one needs to take to bring a number to at most 1), but the statement is false if the 𝑜(𝑛²) is replaced by $n^2 e^{-C\sqrt{\log n}}$ for some sufficiently large constant 𝐶. It is a major open problem to close the gap between the upper and lower bounds in these problems.
The triangle removal lemma was historically first considered in the following equivalent
formulation.
Exercise 2.3.6. Deduce the (6, 3)-theorem from Corollary 2.3.3, and vice-versa.
The following conjectural extension of the (6, 3)-theorem is a major open problem in extremal combinatorics. The conjecture is attributed to Brown, Erdős, & Sós (1973).
• (𝑥, 𝑦) ∈ 𝑋 × 𝑌 whenever 𝑦 − 𝑥 ∈ 𝐴;
• (𝑦, 𝑧) ∈ 𝑌 × 𝑍 whenever 𝑧 − 𝑦 ∈ 𝐴;
• (𝑥, 𝑧) ∈ 𝑋 × 𝑍 whenever (𝑧 − 𝑥)/2 ∈ 𝐴.
(Note that one could relax the assumption 𝑑 > 0 to 𝑑 ≠ 0, allowing “negative” corners. As
shown in the first step in the proof below, the assumption 𝑑 > 0 is inconsequential.)
Remark 2.4.3 (History). The theorem is due to Ajtai & Szemerédi (1974), who originally
proved it by invoking the full power of Szemerédi’s theorem. Here we present a much simpler
proof using the triangle removal lemma due to Solymosi (2003).
Proof. First we show how to relax the assumption in the definition of a corner from 𝑑 > 0
to 𝑑 ≠ 0.
Let 𝐴 ⊆ [𝑁]² be a corner-free set. For each 𝑧 ∈ Z², let 𝐴𝑧 = 𝐴 ∩ (𝑧 − 𝐴). Then |𝐴𝑧| is the number of ways that one can write 𝑧 = 𝑎 + 𝑏 for some (𝑎, 𝑏) ∈ 𝐴 × 𝐴. So $\sum_{z \in [2N]^2} |A_z| = |A|^2$, so there is some 𝑧 ∈ [2𝑁]² with |𝐴𝑧| ≥ |𝐴|²/(2𝑁)². To show that |𝐴| = 𝑜(𝑁²), it suffices
to show that | 𝐴 𝑧 | = 𝑜(𝑁 2 ). Moreover, since 𝐴 𝑧 = 𝑧 − 𝐴 𝑧 , it being corner-free implies that it
does not contain three points {(𝑥, 𝑦), (𝑥 + 𝑑, 𝑦), (𝑥, 𝑦 + 𝑑)} with 𝑑 ≠ 0.
Write 𝐴 = 𝐴 𝑧 from now on. Build a tripartite graph 𝐺 with parts 𝑋 = {𝑥1 , . . . , 𝑥 𝑁 },
𝑌 = {𝑦 1 , . . . , 𝑦 𝑁 } and 𝑍 = {𝑧1 , . . . , 𝑧 2𝑁 }, where each vertex 𝑥 𝑖 corresponds to a vertical line
{𝑥 = 𝑖} ⊆ Z2 , each vertex 𝑦 𝑗 corresponds to a horizontal line {𝑦 = 𝑗 }, and each vertex 𝑧 𝑘
corresponds to a slanted line {𝑦 = −𝑥 + 𝑘 } with slope −1. Join two distinct vertices of 𝐺
with an edge if and only if the corresponding lines intersect at a point belonging to 𝐴. Then,
each triangle in the graph 𝐺 corresponds to a set of three lines of slopes 0, ∞, −1 pairwise
intersecting at a point of 𝐴.
Since 𝐴 is corner-free in the sense stated at the end of the previous paragraph, 𝑥 𝑖 , 𝑦 𝑗 , 𝑧 𝑘 form
a triangle in 𝐺 if and only if the three corresponding lines pass through the same point of 𝐴
(i.e., forming a trivial corner with 𝑑 = 0). Since there is exactly one line of each direction
passing through every point of 𝐴, it follows that each edge of 𝐺 belongs to exactly one
triangle. Thus, by Corollary 2.3.3, 3 | 𝐴| = 𝑒(𝐺) = 𝑜(𝑁 2 ). □
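The reduction in the proof is entirely explicit. The sketch below (illustrative; the set 𝐴 is a small made-up example) builds the tripartite graph of vertical, horizontal, and slanted lines from a set 𝐴 and confirms that every edge lies in exactly one triangle when 𝐴 is corner-free in the symmetric sense used above.

```python
from itertools import combinations

def corner_graph(A):
    """Tripartite graph from a set A of lattice points: vertices are the lines
    x = i, y = j, and x + y = k; two lines are adjacent iff they meet at a point
    of A.  Edges are returned as frozensets of two vertices."""
    edges = set()
    for (i, j) in A:
        x, y, z = ("x", i), ("y", j), ("z", i + j)
        edges |= {frozenset((x, y)), frozenset((x, z)), frozenset((y, z))}
    return edges

def triangles_per_edge(edges):
    verts = {v for e in edges for v in e}
    counts = {}
    for e in edges:
        u, v = tuple(e)
        counts[e] = sum(frozenset((u, w)) in edges and frozenset((v, w)) in edges
                        for w in verts - e)
    return counts

# A corner-free set (no (x, y), (x + d, y), (x, y + d) with d != 0): no two of
# its points share a row or a column, so no such configuration can occur.
A = {(1, 1), (2, 3), (3, 2), (4, 4)}
counts = triangles_per_edge(corner_graph(A))
print(len(counts), "edges;", "every edge in exactly one triangle:",
      all(c == 1 for c in counts.values()))
```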
The upper bound on corner-free sets actually implies Roth’s theorem, as shown below. So
we now have a second proof of Roth’s theorem (though, this second proof is secretly the
same as the first proof).
The rough idea is to first find a high dimensional sphere with many lattice points via the
pigeonhole principle. The sphere contains no 3-AP due to convexity. We then project these
lattice points onto Z in a way that creates no additional 3-APs. This is done by treating the
coordinates as the base-𝑞 expansion of an integer with some large 𝑞.
Proof. Let 𝑚 and 𝑑 be two positive integers depending on 𝑁 to be specified later. Consider the lattice points of 𝑋 = {0, 1, . . . , 𝑚 − 1}^𝑑 that lie on a sphere of radius √𝐿:
\[
X_L := \big\{(x_1, \dots, x_d) \in X : x_1^2 + \cdots + x_d^2 = L\big\}.
\]
Then $X = \bigcup_{L=1}^{dm^2} X_L$. So by the pigeonhole principle, there exists an 𝐿 ∈ [𝑑𝑚²] such that $|X_L| \ge m^d/(dm^2)$. Define the base-2𝑚 digital expansion
\[
\phi(x_1, \dots, x_d) := \sum_{i=1}^{d} x_i (2m)^{i-1}.
\]
Proof. In the proof of Theorem 2.4.1, starting from a 3-AP-free set 𝐴 ⊆ [𝑁], we constructed a graph with 6𝑁 + 3 vertices and (6𝑁 + 3)|𝐴| edges such that every edge lies in a unique triangle. Choosing 𝑁 = ⌊(𝑛 − 3)/6⌋ and letting 𝐴 be the Behrend construction of Theorem 2.5.1 with $|A| \ge N e^{-C\sqrt{\log N}}$, we obtain the desired graph. □
Remark 2.5.3 (More lower bounds from Behrend’s construction). The same graph construction also shows, after examining the proof of Corollary 2.3.3, that in the triangle removal lemma, Theorem 2.3.1, one cannot take $\delta = e^{-c(\log(1/\varepsilon))^2}$ if the constant 𝑐 > 0 is too small.
In Proposition 2.4.4 we deduced an upper bound $r_3(N)\cdot N \le r_{\llcorner}(2N)$ on corner-free sets using 3-AP-free sets. The Behrend construction then also gives a corner-free subset of [𝑁]² of size $\ge N^2 e^{-C\sqrt{\log N}}$.
Exercise 2.5.4 (Modifying Behrend’s construction). Prove that there is some constant 𝐶 > 0 so that for all 𝑁, there exists 𝐴 ⊆ [𝑁] with $|A| \ge N \exp(-C\sqrt{\log N})$ so that there do not exist 𝑤, 𝑦, 𝑥, 𝑧 ∈ 𝐴 not all equal and satisfying 𝑥 + 𝑦 + 𝑧 = 3𝑤.
Proof. We repeatedly apply the following statement, which is a simple consequence of the
definition of 𝜀-regularity (and a small extension of Lemma 2.2.3):
Given an 𝜀-regular pair (𝑋, 𝑌 ), and 𝐵 ⊆ 𝑌 with |𝐵| ≥ 𝜀 |𝑌 |, the number of vertices in 𝑋
with < (𝑑 (𝑋, 𝑌 ) − 𝜀) |𝐵| neighbors in 𝐵 is < 𝜀 |𝑋 |.
The number of vertices 𝑥1 ∈ 𝑋1 with ≥ (𝑑1𝑖 − 𝜀)|𝑋𝑖| neighbors in 𝑋𝑖 for each 𝑖 = 2, 3, 4 is ≥ (1 − 3𝜀)|𝑋1|. Fix a choice of such an 𝑥1 ∈ 𝑋1. For each 𝑖 = 2, 3, 4, let 𝑌𝑖 be the neighbors of 𝑥1 in 𝑋𝑖, so that |𝑌𝑖| ≥ (𝑑1𝑖 − 𝜀)|𝑋𝑖|.
The number of vertices in 𝑌2 with ≥ (𝑑2𝑖 − 𝜀) |𝑌𝑖 | common neighbors in 𝑌𝑖 for each 𝑖 = 3, 4
is ≥ |𝑌2 | − 2𝜀 |𝑋2 | ≥ (𝑑12 − 3𝜀) |𝑋2 |. Fix a choice of such an 𝑥2 ∈ 𝑌2 . For each 𝑖 = 3, 4, let 𝑍𝑖
be the neighbors of 𝑥 2 in 𝑌𝑖 .
For each 𝑖 = 3, 4, |𝑍𝑖 | ≥ (𝑑1𝑖 − 𝜀) (𝑑2𝑖 − 𝜀) |𝑋𝑖 | ≥ 𝜀 |𝑋𝑖 |, and so
𝑒(𝑍3 , 𝑍4 ) ≥ (𝑑34 − 𝜀) |𝑍3 | |𝑍4 |
≥ (𝑑34 − 𝜀) · (𝑑13 − 𝜀) (𝑑23 − 𝜀) |𝑋3 | · (𝑑14 − 𝜀) (𝑑24 − 𝜀) |𝑋4 | .
Any edge between 𝑍3 and 𝑍4 forms a 𝐾4 together with 𝑥 1 and 𝑥 2 . Multiplying the above
quantity with the earlier lower bounds on the number of choices of 𝑥1 and 𝑥2 gives the
result. □
The same strategy works more generally for counting any graph. To find copies of 𝐻, we
embed vertices of 𝐻 one at a time.
Theorem 2.6.2 (Graph counting lemma)
For every graph 𝐻 and real 𝛿 > 0, there exists an 𝜀 > 0 such that the following is true.
Let 𝐺 be a graph, and 𝑋𝑖 ⊆ 𝑉 (𝐺) for each 𝑖 ∈ 𝑉 (𝐻) such that for each 𝑖 𝑗 ∈ 𝐸 (𝐻),
(𝑋𝑖 , 𝑋 𝑗 ) is an 𝜀-regular pair with edge density 𝑑𝑖 𝑗 := 𝑑 (𝑋𝑖 , 𝑋 𝑗 ) ≥ 𝛿. Then the number of
graph homomorphisms 𝐻 → 𝐺 where each 𝑖 ∈ 𝑉 (𝐻) is mapped to 𝑋𝑖 is
\[
\ge (1 - \delta) \prod_{ij \in E(H)} (d_{ij} - \delta) \prod_{i \in V(H)} |X_i|.
\]
Remark 2.6.3. (a) For a fixed 𝐻, as |𝑋𝑖 | → ∞ for each 𝑖, all but a negligible fraction of
such homomorphisms from 𝐻 are injective (i.e., yielding a copy of 𝐻 as a subgraph).
(b) It is useful (and in fact equivalent) to think about the setting where 𝐺 is a multipartite
graph with parts 𝑋𝑖 , as illustrated below.
In the multipartite setting, we see that the graph counting lemma can be adapted to variants
such as counting induced copies of 𝐻. Indeed, an induced copy of 𝐻 is the same as a 𝑣(𝐻)-
clique in an auxiliary graph 𝐺 ′ obtained by replacing the bipartite graph in 𝐺 between 𝑋𝑖
and 𝑋 𝑗 by its complementary bipartite graph between 𝑋𝑖 and 𝑋 𝑗 for each 𝑖 𝑗 ∉ 𝐸 (𝐻).
(c) We will see a different proof in Section 4.5 using the language of graphons. There,
instead of embedding 𝐻 one vertex at a time, we compare the density of 𝐻 and 𝐻 \ {𝑒}.
We establish the following stronger statement, which has the additional advantage that one
can choose the regularity parameter 𝜀 to depend on the maximum degree of 𝐻 rather than
𝐻 itself. You may wish to skip reading the proof, as it is notationally rather heavy. The main
ideas were already illustrated in the 𝐾4 counting lemma.
Furthermore, if |𝑋𝑖 | ≥ 𝑣(𝐻)/𝜀 for each 𝑖, then there exists such a homomorphism 𝐻 → 𝐺
that is injective (i.e., an embedding of 𝐻 as a subgraph).
Proof. Let us order and label the vertices of 𝐻 by 1, . . . , 𝑣(𝐻) arbitrarily. We will select
vertices 𝑥1 ∈ 𝑋1 , 𝑥2 ∈ 𝑋2 , . . . in order. The idea is to always make sure that they have enough
neighbors in 𝐺 so that there are many ways to continue the embedding of 𝐻. We say that a
partial embedding 𝑥 1 , . . . , 𝑥 𝑠−1 (here partial embedding means that 𝑥𝑖 𝑥 𝑗 ∈ 𝐸 (𝐺) whenever
𝑖𝑗 ∈ 𝐸(𝐻) for all the 𝑥𝑖 ’s chosen so far) is abundant if for each 𝑗 ≥ 𝑠, the number of valid extensions 𝑥𝑗 ∈ 𝑋𝑗 (meaning that 𝑥𝑖𝑥𝑗 ∈ 𝐸(𝐺) whenever 𝑖 < 𝑠 and 𝑖𝑗 ∈ 𝐸(𝐻)) is at least $|X_j| \prod_{i < s \,:\, ij \in E(H)} (d_{ij} - \varepsilon)$.
For each 𝑠 = 1, 2, . . . , 𝑣(𝐻) in order, suppose we have already fixed an abundant partial
embedding 𝑥1 , . . . , 𝑥 𝑠−1 . For each 𝑗 ≥ 𝑠, let
𝑌 𝑗 = {𝑥 𝑗 ∈ 𝑋 𝑗 : 𝑥 𝑖 𝑥 𝑗 ∈ 𝐸 (𝐺) whenever 𝑖 < 𝑠 and 𝑖 𝑗 ∈ 𝐸 (𝐻)}
be the set of valid extensions of the 𝑗-th vertex in 𝑋 𝑗 given the partial embeddings of
𝑥1 , . . . , 𝑥 𝑠−1 , so that the abundance hypothesis gives
\[
|Y_j| \ge |X_j| \prod_{\substack{i < s \\ ij \in E(H)}} (d_{ij} - \varepsilon) \ge \big(\varepsilon^{1/\Delta}\big)^{|\{i < s \,:\, ij \in E(H)\}|}\, |X_j| \ge \varepsilon |X_j|.
\]
Thus, as in the proof of Proposition 2.6.1 for 𝐾4 , the number of choices 𝑥 𝑠 ∈ 𝑋𝑠 that would
extend 𝑥1 , . . . , 𝑥 𝑠−1 to an abundant partial embedding is
\[
\ge |Y_s| - |\{i > s : si \in E(H)\}|\, \varepsilon |X_s|
\ge |X_s| \prod_{\substack{i < s \\ is \in E(H)}} (d_{is} - \varepsilon) - |\{i > s : si \in E(H)\}|\, \varepsilon |X_s|. \tag{$\dagger$}
\]
Otherwise we can absorb the second term into the product and obtain
\[
(\dagger) \ge |X_s| \prod_{\substack{i < s \\ is \in E(H)}} (d_{is} - \varepsilon) - (\Delta - 1)\varepsilon |X_s| \ge |X_s| \prod_{\substack{i < s \\ is \in E(H)}} (d_{is} - \Delta \varepsilon^{1/\Delta}).
\]
Fix such a choice of 𝑥 𝑠 . And now we move onto embedding the next vertex 𝑥 𝑠+1 .
Multiplying together these lower bounds for the number of choices of each 𝑥 𝑠 over all
𝑠 = 1, . . . , 𝑣(𝐻), we obtain the lower bound on the number of homomorphisms 𝐻 → 𝐺.
Finally, note that in both cases (†) ≥ 𝜀 |𝑋𝑠 |, and so if |𝑋𝑠 | ≥ 𝑣(𝐻)/𝜀, then (†) ≥ 𝑣(𝐻) and
so we can choose each 𝑥 𝑠 to be distinct from the previously embedded vertices 𝑥1 , . . . , 𝑥 𝑠−1 ,
thereby yielding an injective homomorphism. □
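To see the counting lemma in action numerically, here is a small sketch (our own illustration, not from the text): we build a random tripartite graph with prescribed densities between the parts (random bipartite graphs are ε-regular with high probability) and compare the number of homomorphisms of H = K_3, with vertex i mapped into X_i, against the product bound. All names and parameter values below are ours.

```python
import random

random.seed(0)

# Part sizes |X_1|, |X_2|, |X_3| and target densities d_{ij} between the parts.
sizes = {1: 30, 2: 30, 3: 30}
dens = {(1, 2): 0.5, (1, 3): 0.4, (2, 3): 0.6}

# Independent random bipartite graphs between the parts.
edges = {
    (i, j): {(x, y) for x in range(sizes[i]) for y in range(sizes[j])
             if random.random() < d}
    for (i, j), d in dens.items()
}

# Homomorphisms K_3 -> G with vertex i mapped into X_i: triples whose three
# cross pairs are all edges.
hom_count = sum(
    1
    for x1 in range(sizes[1]) for x2 in range(sizes[2]) for x3 in range(sizes[3])
    if (x1, x2) in edges[(1, 2)]
    and (x1, x3) in edges[(1, 3)]
    and (x2, x3) in edges[(2, 3)]
)

# The counting lemma predicts roughly prod_{ij} d_{ij} * prod_i |X_i| homomorphisms.
prediction = 1.0
for d in dens.values():
    prediction *= d
for s in sizes.values():
    prediction *= s

print(hom_count, prediction)  # the two counts should be close
```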
The next exercise asks you to show that, if H is bipartite, then one can prove the H-removal lemma without using regularity, thereby obtaining a much better bound.
Exercise 2.6.6 (Removal lemma for bipartite graphs with polynomial bounds). Prove
that for every bipartite graph 𝐻, there is a constant 𝐶 such that for every 𝜀 > 0, every
n-vertex graph with fewer than ε^C n^{v(H)} copies of H can be made H-free by removing at most εn² edges.
Erdős–Stone–Simonovits theorem
As another application, let us give a different proof of the Erdős–Stone–Simonovits theorem
from Section 1.5, restated below, which gives the asymptotics (up to a +𝑜(𝑛2 ) error term)
for ex(𝑛, 𝐻), the maximum number of edges in an 𝑛-vertex 𝐻-free graph. We saw a proof in
Section 1.5 using supersaturation and the hypergraph KST theorem. The proof below follows
the partition-clean-count strategy in Remark 2.3.2 combined with an application of Turán’s
theorem. A common feature of many regularity applications is that they “boost” an exact
extremal graph theoretic result (e.g., Turán’s theorem) to an asymptotic result involving more
complex derived structures (e.g., from the existence of a copy of 𝐾𝑟 to embedding a complete
𝑟-partite graph).
[Figure: the graph H embedded into G′ across the parts V_{i_1}, V_{i_2}, V_{i_3}.]
By Turán’s theorem (Corollary 1.2.6), 𝐺 ′ contains a copy of 𝐾 𝜒 (𝐻 ) . Suppose that the 𝜒(𝐻)
vertices of this 𝐾 𝜒 (𝐻 ) land in 𝑉𝑖1 , · · · , 𝑉𝑖𝜒 (𝐻) (allowing repeated indices). Since each pair of
these sets is 𝜂-regular, has edge density ≥ 𝜀/8, and each has size ≥ 𝜀𝑛/(8𝑚), applying the
graph counting lemma, Theorem 2.6.2, we see that as long as 𝜂 is sufficiently small in terms
of 𝜀 and 𝐻, and 𝑛 is sufficiently large, there exists an injective embedding of 𝐻 into 𝐺 ′
where the vertices of 𝐻 in the 𝑟-th color class are mapped into 𝑉𝑖𝑟 . So 𝐺 contains 𝐻 as a
subgraph. □
Exercise 2.7.2 (Ramsey’s theorem in a nearly complete graph). Show that for every 𝐻
there exists some 𝛿 > 0 such that for all sufficiently large 𝑛, if 𝐺 is an 𝑛-vertex graph with
average degree at least (1 − 𝛿)𝑛 and the edges of 𝐺 are colored using 2 colors, then there
is a monochromatic copy of 𝐻.
Exercise 2.7.3 (Nearly homogeneous subset). Show that for every 𝐻 and 𝜀 > 0 there
exists 𝛿 > 0 such that every graph on 𝑛 vertices without an induced copy of 𝐻 contains an
induced subgraph on at least 𝛿𝑛 vertices whose edge density is at most 𝜀 or at least 1 − 𝜀.
Exercise 2.7.4 (Ramsey numbers of bounded degree graphs). Show that for every Δ
there exists a constant 𝐶Δ so that if 𝐻 is a graph with maximum degree at most Δ, then every
2-edge-coloring of a complete graph on at least 𝐶Δ 𝑣(𝐻) vertices contains a monochromatic
copy of 𝐻.
Exercise 2.7.6∗ (Induced Ramsey). Show that for every graph 𝐻 there is some graph 𝐺
such that if the edges of 𝐺 are colored with two colors, then some induced subgraph of 𝐺
is a monochromatic copy of 𝐻.
Exercise 2.7.7∗ (Finding a degree-regular subgraph). Show that for every 𝛼 > 0, there
exists 𝛽 > 0 such that every graph on 𝑛 vertices with at least 𝛼𝑛2 edges contains a 𝑑-regular
subgraph for some 𝑑 ≥ 𝛽𝑛 (here 𝑑-regular refers to every vertex having degree 𝑑).
Remark 2.8.2. Given two graphs on the same vertex set, the minimum number of edges
that one needs to add/delete to obtain the second graph from the first graph is called the edit
distance between the two graphs. The induced graph removal lemma can be rephrased as
saying that every graph with few induced copies of 𝐻 is close in edit distance to an induced
𝐻-free graph.
Unlike the previous graph removal lemma, for the induced version, it is important that
we allow both adding and deleting edges. The statement would be false if we only allow
edge deletion but not addition. For example, suppose 𝐺 = 𝐾𝑛 \ 𝐾3 (i.e., a complete graph on
𝑛 vertices with three edges of a single triangle removed). If 𝐻 is an empty graph on three
vertices, then 𝐺 has exactly one induced copy of 𝐻, but 𝐺 cannot be made induced 𝐻-free
by only deleting edges.
To see why the earlier proof of the graph removal lemma (Theorem 2.6.5) does not apply
in a straightforward way to prove the induced graph removal lemma, let us attempt to follow
the earlier strategy and see where things go wrong.
First we apply the graph regularity lemma. Then we need to clean up the graph. In the
induced graph removal lemma, edges and non-edges play symmetric roles. We can handle
low density pairs (edge density less than 𝜀) by removing edges between such pairs. Naturally,
for the induced graph removal lemma, we also need to handle high density pairs (density
more than 1 − 𝜀), and we can add all the edges between such pairs. However, it is not clear
what to do with irregular pairs. Earlier, we just removed all edges between irregular pairs. The
problem is that this may create many induced copies of 𝐻 that were not present previously
(see illustration below). Likewise, we cannot simply add all edges between irregular pairs.
[Figure: deleting all edges across an irregular pair.]
Perhaps we can always find a regularity partition without irregular pairs? Unfortunately, this
is false, as shown in Exercise 2.1.24. One must allow for the possibility of irregular pairs.
Remark 2.8.4. One should think of the sequence 𝜀 1 , 𝜀 2 , . . . as rapidly decreasing. This
strong regularity lemma outputs a refining pair of partitions P and Q such that P is regular,
Q is extremely regular, and P and Q are close to each other (as captured by 𝑞(P) ≤ 𝑞(Q) ≤
𝑞(P) + 𝜀0 ; see Lemma 2.8.7 below). A key point here is that we demand Q to be extremely
regular relative to the number of parts of P. The more parts P has, the more regular Q should
be.
Proof. We repeatedly apply the following version of Szemerédi’s regularity lemma:
Theorem 2.1.19 (restated): For all 𝜀 > 0, there exists an integer 𝑀0 = 𝑀0 (𝜀) so that for
all partitions P of 𝑉 (𝐺), there exists a refinement P ′ of P with each part in P refined into
≤ 𝑀0 parts so that P ′ is 𝜀-regular.
By iteratively applying the above regularity partition, we obtain a sequence of partitions
P0 , P1 , . . . of 𝑉 (𝐺) starting with P0 = {𝑉 (𝐺)} being the trivial partition. Each P𝑖+1 is
ε_{|P_i|}-regular and refines P_i. The regularity lemma guarantees that we can have |P_{i+1}| ≤ |P_i| · M_0(ε_{|P_i|}).
Since 0 ≤ q(·) ≤ 1, there exists i ≤ 1/ε_0 so that q(P_{i+1}) ≤ q(P_i) + ε_0. Then setting P = P_i
and Q = P𝑖+1 satisfies the desired requirements. Indeed, the number of parts of Q is bounded
by a function of the sequence (𝜀0 , 𝜀1 , . . . ) since there are a bounded number of iterations
and each iteration produced a refining partition with a bounded number of parts. □
Remark 2.8.5 (Bounds in the strong regularity lemma). The bound on M produced by the proof depends on the sequence (ε_0, ε_1, . . . ). In the application below, we use ε_i = ε_0/poly(i). Then the size of M is comparable to applying M_0 to ε_0 in succession 1/ε_0 times. Note that M_0 is a tower function, and this makes M a tower function iterated roughly 1/ε_0 times. This iterated tower function is called the wowzer function: wowzer(k) := tower(tower(· · · (tower(k)) · · · )) (with k applications of tower). The wowzer function is one step up from the tower function in the Ackermann hierarchy. It grows extremely quickly.
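For a concrete feel of these growth rates, here is a short sketch (our own, using the simplified convention tower(1) = wowzer(1) = 2, which differs slightly from the definition above):

```python
def tower(k: int) -> int:
    """tower(k) = 2^2^...^2 with k twos."""
    return 2 if k == 1 else 2 ** tower(k - 1)

def wowzer(k: int) -> int:
    """Iterate tower: wowzer(1) = 2 and wowzer(k) = tower(wowzer(k - 1))."""
    return 2 if k == 1 else tower(wowzer(k - 1))

print([tower(k) for k in range(1, 5)])  # [2, 4, 16, 65536]
print(wowzer(3))                        # = tower(tower(2)) = tower(4) = 65536
# wowzer(4) = tower(65536) already has unimaginably many digits.
```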
Remark 2.8.6 (Equitability). We can further ensure that the parts have nearly equal size.
This can be done by adapting the ideas sketched in the proof sketch of Theorem 2.1.20.
The following lemma explains the significance of the inequality 𝑞(Q) ≤ 𝑞(P) + 𝜀 from
earlier.
Proof. Let x, y ∈ V(G) be chosen uniformly at random. As in the proof of Lemma 2.1.11, we have q(P) = E[Z_P²], where Z_P = d(V_x, V_y). Likewise, q(Q) = E[Z_Q²], where Z_Q = d(W_x, W_y).
We have
$$q(Q) - q(P) = \mathbb{E}[Z_Q^2] - \mathbb{E}[Z_P^2] = \mathbb{E}[(Z_Q - Z_P)^2],$$
where the final step above is a “Pythagorean identity.”
[Figure: the step functions Z_P, Z_Q, and Z_Q − Z_P over the partition V_1, V_2, V_3 and its refinement W_1, W_2, W_3.]
Remark 2.8.10. It is significant that all (rather than nearly all) pairs (𝑊𝑖 , 𝑊 𝑗 ) are regular.
We will need this fact in our applications below.
Proof sketch. Here we show how to prove a slightly weaker result where 𝑖 ≤ 𝑗 in (b) is
replaced by 𝑖 < 𝑗. In other words, this proof does not promise that each 𝑊𝑖 is 𝜀 𝑘 -regular. To
obtain the stronger conclusion as stated (requiring each 𝑊𝑖 to be regular with itself), we can
adapt the ideas in Exercise 2.1.27. We omit the details.
By decreasing the 𝜀𝑖 ’s if needed (we can do this since a smaller sequence of 𝜀𝑖 ’s yields a
stronger conclusion), we may assume that 𝜀𝑖 ≤ 1/(10𝑖 2 ) and 𝜀 𝑖 ≤ 𝜀 0 /4 for every 𝑖 ≥ 1.
Let us apply the strong regularity lemma, Theorem 2.8.3, with equitable partitions (see
above Remark 2.8.6). That is, we have (we make the simplifying assumption that all partitions
are exactly equitable, to avoid unimportant technicalities):
• an equitable 𝜀0 -regular partition P = {𝑉1 , . . . , 𝑉𝑘 } of 𝑉 (𝐺) and
• an equitable 𝜀 𝑘 -regular partition Q refining P
satisfying
• 𝑞(Q) ≤ 𝑞(P) + 𝜀03 /8, and
• |Q| ≤ 𝑀 = 𝑀 (𝜀0 , 𝜀1 , . . . ).
Inside each part 𝑉𝑖 , let us choose a part 𝑊𝑖 of Q uniformly at random. Since |Q| ≤ 𝑀,
the equitability assumption implies that each part of Q has size ≥ 𝛿𝑛 for some constant
𝛿 = 𝛿(𝜀0 , 𝜀1 , . . . ). So (a) is satisfied.
Since Q is 𝜀 𝑘 -regular, all but an 𝜀 𝑘 -fraction of pairs of parts of Q are 𝜀 𝑘 -regular. Summing
over all 𝑖 < 𝑗, using linearity of expectations, the expected number of pairs (𝑊𝑖 , 𝑊 𝑗 ) that
are not 𝜀 𝑘 -regular is ≤ 𝜀 𝑘 𝑘 2 ≤ 1/10. It follows that with probability ≥ 9/10, (𝑊𝑖 , 𝑊 𝑗 ) is
𝜀 𝑘 -regular for all 𝑖 < 𝑗, so (b) is satisfied (this argument ignores 𝑖 = 𝑗 as mentioned at the
beginning of the proof).
Let X denote the number of pairs (i, j) ∈ [k]² with |d(V_i, V_j) − d(W_i, W_j)| > ε_0. Since q(Q) ≤ q(P) + (ε_0/2)³, by Lemma 2.8.7 and linearity of expectations, E X ≤ (ε_0/2)k². So by Markov's inequality, X ≤ ε_0 k² with probability ≥ 1/2, so that (c) is satisfied.
It follows that (a), (b), and (c) are all satisfied simultaneously with probability ≥ 1 − 1/10 − 1/2 > 0. Therefore,
there exist valid choices of 𝑊𝑖 ’s. □
(a) (𝑊𝑖 , 𝑊 𝑗 ) is 𝜀 ′ -regular for every 𝑖 ≤ 𝑗, with some sufficiently small constant 𝜀 ′ > 0
depending on 𝜀 and 𝐻,
(b) |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8 for all but < εk²/8 pairs (i, j) ∈ [k]², and
(c) |𝑊𝑖 | ≥ 𝛿0 𝑛, for some constant 𝛿0 depending only on 𝜀 and 𝐻.
Now we clean the graph. For each pair 𝑖 ≤ 𝑗 (including 𝑖 = 𝑗),
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≤ 𝜀/8, then remove all edges between (𝑉𝑖 , 𝑉 𝑗 ), and
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≥ 1 − 𝜀/8, then add all edges between (𝑉𝑖 , 𝑉 𝑗 ).
Note that we are not simply adding/removing edges within each pair (W_i, W_j), but rather across all of (V_i, V_j). To bound the number of edges added/deleted, recall (b) from the previous paragraph. If d(W_i, W_j) ≤ ε/8 and |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8, then d(V_i, V_j) ≤ ε/4, and the number of edges in all such (V_i, V_j) is at most εn²/4. Likewise for d(W_i, W_j) ≥ 1 − ε/8. For the remaining < εk²/8 pairs (i, j) not satisfying |d(V_i, V_j) − d(W_i, W_j)| ≤ ε/8, the total number of edges among all such pairs is at most εn²/8. All together, we added/deleted < εn² edges
from 𝐺. Call the resulting graph 𝐺 ′ . There are no irregular pairs (𝑊𝑖 , 𝑊 𝑗 ) for us to worry
about.
It remains to show that 𝐺 ′ is induced 𝐻-free. Suppose otherwise. Let us count induced
copies of 𝐻 in 𝐺 as in the proof of the graph removal lemma, Theorem 2.6.5. We have
some induced copy of 𝐻 in 𝐺 ′ , with each vertex 𝑣 ∈ 𝑉 (𝐻) embedded in 𝑉𝜙 (𝑣) for some
𝜙 : 𝑉 (𝐻) → [𝑘].
Consider a pair of distinct vertices u, v of H. If uv ∈ E(H), there must be an edge in G′ between V_{φ(u)} and V_{φ(v)} (here φ(u) and φ(v) are not necessarily different). So we must not have deleted all the edges in G between V_{φ(u)} and V_{φ(v)} in the cleaning step. By the cleaning algorithm above, this means that d_G(W_{φ(u)}, W_{φ(v)}) > ε/8.
Likewise, if uv ∉ E(H) for a pair of distinct u, v ∈ V(H), we have d_G(W_{φ(u)}, W_{φ(v)}) < 1 − ε/8.
Since (W_i, W_j) is ε′-regular in G for every i ≤ j, provided that ε′ is small enough (in terms of ε and H), the graph counting lemma (Theorem 2.6.2, with the induced variation as in Remark 2.6.3(b)) applied to G gives
$$\#\{\text{induced copies of } H \text{ in } G\} \ge (1 - \varepsilon) \Big( \frac{\varepsilon}{10} \Big)^{\binom{v(H)}{2}} (\delta_0 n)^{v(H)} =: \delta n^{v(H)}$$
(recall |W_i| ≥ δ_0 n). Setting δ as above, this contradicts the hypothesis that G has < δn^{v(H)} induced copies of H. Thus G′ must be induced H-free. □
Remark 2.8.12. The presence of ℎ0 may seem a bit strange at first. In the next section, we
will see a reformulation of this theorem in the language of property testing, where ℎ0 comes
up naturally.
Proof. The proof is mostly the same as the proof of the induced graph removal lemma that
we just saw. The main tricky issue here is how to choose the regularity parameter 𝜀 ′ for every
pair (𝑊𝑖 , 𝑊 𝑗 ) in condition (a) of the earlier proof. Previously, we did not use the full strength
of Theorem 2.8.9, which allowed 𝜀 ′ to depend on 𝑘, but now we are going to use it. Recall
that we had to make sure that this 𝜀 ′ was chosen to be small enough for the 𝐻-counting
lemma to work. Now that there are possibly infinitely many graphs in H , we cannot naively
choose 𝜀 ′ to be sufficiently small. The main point of the proof is to reduce the problem to a
finite subset of H for each 𝑘.
Define a template T to be an edge-coloring of the looped k-clique (i.e., a complete graph on k vertices together with a loop at every vertex) where each edge is colored by one of
{white, black, gray}. We say that a graph 𝐻 is compatible with a template 𝑇 if there exists a
map 𝜙 : 𝑉 (𝐻) → 𝑉 (𝑇) such that for every distinct pair 𝑢, 𝑣 of vertices of 𝐻:
• if 𝑢𝑣 ∈ 𝐸 (𝐻), then 𝜙(𝑢)𝜙(𝑣) is colored black or gray in 𝑇; and
• if 𝑢𝑣 ∉ 𝐸 (𝐻), then 𝜙(𝑢)𝜙(𝑣) is colored white or gray in 𝑇.
That is, a black edge in a template means an edge of 𝐻, a white edge means a non-edge of
𝐻, and a gray edge is a wildcard. An example is shown below.
[Figure: an example of a graph H, a map φ, and a template T whose edges are colored black, gray, and white.]
As another example, every graph is compatible with every completely gray template.
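To make the compatibility condition concrete, here is a brute-force sketch (our own code; the function and variable names are ours): it searches over all maps φ : V(H) → [k] and checks the black/white/gray constraints.

```python
import itertools

def compatible(H_vertices, H_edges, k, color):
    """Return True if H is compatible with the template given by `color`,
    where color[(i, j)] in {'black', 'white', 'gray'} for 1 <= i <= j <= k
    (loops included). H_edges is a set of frozensets {u, v}."""
    def col(i, j):
        return color[(min(i, j), max(i, j))]

    for phi in itertools.product(range(1, k + 1), repeat=len(H_vertices)):
        assignment = dict(zip(H_vertices, phi))
        ok = True
        for u, v in itertools.combinations(H_vertices, 2):
            c = col(assignment[u], assignment[v])
            if frozenset((u, v)) in H_edges:
                if c == 'white':      # an edge of H may not land on a white template edge
                    ok = False
                    break
            else:
                if c == 'black':      # a non-edge of H may not land on a black template edge
                    ok = False
                    break
        if ok:
            return True
    return False

# Example: H is the path a - b - c; template on k = 2 vertices.
H_vertices = ['a', 'b', 'c']
H_edges = {frozenset(('a', 'b')), frozenset(('b', 'c'))}
template = {(1, 1): 'white', (1, 2): 'black', (2, 2): 'gray'}
print(compatible(H_vertices, H_edges, 2, template))   # True (map a, c to 1 and b to 2)
```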
For every template 𝑇, pick some representative 𝐻𝑇 ∈ H compatible with 𝑇, as long as
such a representative exists (and ignore 𝑇 otherwise). A graph in H is allowed to be the
representative of more than one template. Let H_k be the set of all H ∈ H that arise as the
representative of some 𝑘-vertex template. Note that H𝑘 is finite since there are finitely many
𝑘-vertex templates. We can pick each 𝜀 𝑘 > 0 to be small enough so that the conclusion of
the counting step later can be guaranteed for all elements of H𝑘 .
Now we proceed nearly identically as in the proof of the induced removal lemma, Theo-
rem 2.8.1, that we just saw. In applying Theorem 2.8.9 to obtain the partition 𝑉1 ∪ · · · ∪ 𝑉𝑘
and finding 𝑊𝑖 ⊆ 𝑉𝑖 , we ensure the following condition instead of the earlier (a):
(a) (𝑊𝑖 , 𝑊 𝑗 ) is 𝜀 𝑘 -regular for every 𝑖 ≤ 𝑗.
We set ℎ0 to be the maximum number of vertices of a graph in H𝑘 .
Now we do the cleaning step. Along the way, we create a 𝑘-vertex template 𝑇 with vertex
set [k] corresponding to the parts {V_1, . . . , V_k} of the partition. For each 1 ≤ i ≤ j ≤ k,
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≤ 𝜀/4, then remove all edges between (𝑉𝑖 , 𝑉 𝑗 ) from 𝐺, and color the edge
𝑖 𝑗 in template 𝑇 white;
• if 𝑑 (𝑊𝑖 , 𝑊 𝑗 ) ≥ 1 − 𝜀/4, then add all edges between (𝑉𝑖 , 𝑉 𝑗 ), and color the edge 𝑖 𝑗 in
template 𝑇 black;
• otherwise, color the edge ij in the template T gray.
Finally, suppose some induced 𝐻 ∈ H remains in 𝐺 ′ . Due to our cleaning procedure, 𝐻
must be compatible with the template 𝑇. Then the representative 𝐻𝑇 ∈ H𝑘 of 𝑇 is a graph on
at most ℎ0 vertices, and furthermore, the counting lemma guarantees that, provided 𝜀 𝑘 > 0
is small enough (subject to a finite number of pre-chosen constraints, one for each element
of H𝑘 ), the number of copies of 𝐻𝑇 in 𝐺 is ≥ 𝛿𝑛𝑣 (𝐻𝑇 ) for some constant 𝛿 > 0 that only
depends on 𝜀 and H . This contradicts the hypothesis, and thus 𝐺 ′ is induced H -free. □
All the techniques above work nearly verbatim for a generalization to colored graphs.
The induced graph removal lemma corresponds to the special case 𝑟 = 2, with the two
colors representing edges and non-edges respectively.
these 𝐾 vertices, then output that 𝐺 is triangle-free; else output that 𝐺 is 𝜀-far from
triangle-free.
Probabilistic guarantees.
(a) If the input graph 𝐺 is triangle-free, then the algorithm always correctly outputs
that 𝐺 is triangle-free;
(b) If the input graph 𝐺 is 𝜀-far from triangle-free, then with probability ≥ 0.99 the
algorithm outputs that 𝐺 is 𝜀-far from triangle-free;
(c) We do not make any guarantees when the input graph is neither triangle-free nor
𝜀-far from triangle-free.
Remark 2.9.2. This is an example of a one-sided tester, meaning that it always (non-
probabilistically) outputs a correct answer when 𝐺 satisfies property P and only has a
probabilistic guarantee when 𝐺 does not satisfy property P. (In contrast, a two-sided tester
would have probabilistic guarantees for both situations.)
For a one-sided tester, there is nothing special about the number 0.99 above in (b). It can
be any positive constant 𝛿 > 0. If we run the algorithm 𝑚 times, then the probability of
success improves from ≥ 𝛿 to ≥ 1 − (1 − 𝛿) 𝑚 , which can be made arbitrarily close to 1 if we
choose 𝑚 large enough.
The probabilistic guarantee turns out to be essentially a rephrasing of the triangle removal
lemma.
Proof. If the graph G is triangle-free, the algorithm clearly always outputs correctly. On the other hand, if G is ε-far from triangle-free, then by the triangle removal lemma (Theorem 2.3.1), G has ≥ $\delta\binom{n}{3}$ triangles for some constant δ = δ(ε) > 0. If we sample three vertices from G uniformly at random, then they form a triangle with probability ≥ δ. And if we run K/3 independent trials, then the probability that we see a triangle is ≥ 1 − (1 − δ)^{K/3}, which is ≥ 0.99 as long as K is a sufficiently large constant (depending on δ, which in turn depends on ε).
In the algorithm as stated in the theorem, K vertices are sampled without replacement. Above we had K/3 independent trials of picking a triple of vertices at random. But this difference hardly matters. We can couple the two processes by adding additional random vertices to the latter process until we see K distinct vertices. □
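Here is a sketch of the tester in code (our own illustration; in practice the sample size K would be the constant supplied by the triangle removal lemma, and the value used below is arbitrary):

```python
import itertools
import random

def looks_triangle_free(adj, K, rng=random):
    """One-sided tester sketch: sample K vertices and report 'triangle-free'
    iff the induced subgraph on the sample contains no triangle.
    `adj` maps each vertex to the set of its neighbors."""
    vertices = list(adj)
    sample = rng.sample(vertices, min(K, len(vertices)))
    for x, y, z in itertools.combinations(sample, 3):
        if y in adj[x] and z in adj[x] and z in adj[y]:
            return False          # found a triangle: certainly not triangle-free
    return True                   # no triangle among the sampled vertices

# Toy usage: a complete graph (far from triangle-free) versus a star (triangle-free).
n = 200
complete = {v: set(range(n)) - {v} for v in range(n)}
star = {0: set(range(1, n)), **{v: {0} for v in range(1, n)}}
print(looks_triangle_free(complete, K=30))  # False: every sampled triple is a triangle
print(looks_triangle_free(star, K=30))      # always True
```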
Just as how the guarantee of the above algorithm is essentially a rephrasing of the triangle
removal lemma, other graph removal lemmas can be rephrased as graph property testing
theorems. For the infinite induced graph removal lemma, Theorem 2.8.11, we can rephrase
the result in terms of graph property testing for hereditary properties.
A graph property P is hereditary if it is closed under vertex-deletion: if 𝐺 ∈ P, then
every induced subgraph of 𝐺 is in P. Here are some examples of hereditary graph properties:
𝐻-free, induced 𝐻-free, planar, 3-colorable, perfect. Every hereditary property P can be
characterized as the set of induced H-free graphs for some (possibly infinite) family of graphs
H ; we can take H = {𝐻 : 𝐻 ∉ P}.
Recall Szemerédi’s theorem says that for every fixed 𝑘 ≥ 3, every 𝑘-AP-free subset of
[𝑁] has size 𝑜(𝑁). We will prove it as a corollary of the hypergraph removal lemma for
Corollary 2.10.2
If 𝐺 is a 3-graph such that every edge is contained in a unique tetrahedron (i.e., a clique
on four vertices), then 𝐺 has 𝑜(𝑛3 ) edges.
Can this result be used to prove the hypergraph removal lemma? Unfortunately, no.
Recall that our graph regularity recipe (Remark 2.3.2) involves three steps: partition, clean,
and count. It turns out that no counting lemma is possible for the above notion of 3-graph
regularity.
The notion of 𝜀-regularity is supposed to model pseudorandomness. So why don’t we
try truly random hypergraphs and see what happens? Let us consider two different random
3-graph constructions:
(a) First pick constants 𝑝, 𝑞 ∈ [0, 1] . Build a random graph 𝐺 (2) = G(𝑛, 𝑝), an ordinary
Erdős–Rényi graph. Then construct 𝐺 (3) by including each triangle of 𝐺 (2) as an
edge of 𝐺 (3) with probability 𝑞. Call this 3-graph 𝑋.
(b) For each possible edge (i.e. triple of vertices), include the edge with probability 𝑝 3 𝑞,
independent of all other edges. Call this 3-graph 𝑌 .
The edge density in both X and Y is close to p³q, even when restricted to linearly sized
triples of vertex subsets. So both graphs satisfy our above notion of 𝜀-regularity with high
probability. However, we can compute the tetrahedron densities in both of these graphs and
see that they do not match.
The tetrahedron density in X is around q⁴ times the K_4 density in the underlying random graph G^{(2)}. The K_4 density in G^{(2)} is around p⁶. So the tetrahedron density in X is around p⁶q⁴.
On the other hand, the tetrahedron density in Y is around (p³q)⁴, different from the p⁶q⁴
earlier. So we should not expect a counting lemma with this notion of 𝜀-regularity. (Unless
the 3-graph we are counting is linear, as in the exercise below.)
Exercise 2.11.3. Under the notion of 3-graph regularity in Definition 2.11.1, formulate
and prove an 𝐻-counting lemma for every linear 3-graph 𝐻. Here a hypergraph is said to
be linear if every pair of its edges intersects in at most one vertex.
As hinted by the first random hypergraph above, a more useful notion of hypergraph
regularity should involve both vertex subsets as well as subsets of vertex-pairs (i.e., an
underlying 2-graph).
Given a 3-graph 𝐺, a regularity decomposition will consist of
(1) a partition of the vertex pairs $\binom{V}{2}$ into 2-graphs $G^{(2)}_1 \cup \cdots \cup G^{(2)}_l$ so that G sits in a random-like way
on top of most triples of these 2-graphs (we won’t try to make it precise), and
(2) a partition of 𝑉 that gives an extremely regular partition for all 2-graphs 𝐺 1(2) , . . . , 𝐺 𝑙(2)
(this should be somewhat reminiscent of the strong graph regularity lemma from
Section 2.8).
For such a decomposition to be applicable, it should come with a corresponding counting
lemma.
There are several ways to make the above notions precise. Certain formulations make the regularity lemma easier to prove but the counting lemma harder, and vice versa.
The interested readers should consult Rödl et al. (2005), Gowers (2007) (see Gowers (2006)
for an exposition of the case of 3-uniform hypergraphs), and Tao (2006) for three different
approaches to the hypergraph regularity lemma.
Remark 2.11.4 (Quantitative bounds). Whereas the proof of the graph regularity lemma
gives tower-type bounds tower(𝜀 −𝑂 (1) ), the proof of the 3-graph regularity lemma has
wowzer-type bounds. The 4-graph regularity lemma moves us one more step up in the Ack-
ermann hierarchy (i.e., iterating wowzer), and so on. Just as with the tower-type lower bound
(Theorem 2.1.17) for the graph regularity lemma, Ackermann type bounds are necessary for
hypergraph regularity as well (Moshkovitz & Shapira 2019).
Further Reading
For surveys on the graph regularity method and applications, see Komlós & Simonovits
(1996) and Komlós, Shokoufandeh, Simonovits, & Szemerédi (2002).
The survey Graph Removal Lemmas by Conlon & Fox (2013) discusses many variants,
extensions, and proof techniques of graph removal lemmas.
For a well-motivated introduction to the hypergraph regularity lemma, see the article
Quasirandomness, Counting and Regularity for 3-Uniform Hypergraphs by Gowers (2006).
Chapter Summary
• Szemerédi’s graph regularity lemma. For every 𝜀 > 0, there exists a constant 𝑀 such
that every graph has an 𝜀-regular partition into at most 𝑀 parts.
– Proof method: energy increment.
• Regularity method recipe: partition, clean, count.
• Graph counting lemma. The number of copies of 𝐻 among 𝜀-regular parts is similar to
random.
• Graph removal lemma. Fix 𝐻. Every 𝑛-vertex graph with 𝑜(𝑛 𝑣 (𝐻 ) ) copies of 𝐻 can be
made 𝐻-free by removing 𝑜(𝑛2 ) edges.
• Roth’s theorem can be proved by applying the triangle removal lemma to a graph whose
triangles correspond to 3-APs.
• Szemerédi’s theorem follows from the hypergraph removal lemma, whose proof uses
the hypergraph regularity method (not covered in this book).
• Induced removal lemma. Fix H. Every n-vertex graph with o(n^{v(H)}) induced copies of H can be made induced H-free by adding/removing o(n²) edges.
– Proof uses a strong regularity lemma, which involves iterating the earlier graph
regularity lemma.
• Every hereditary graph property is testable.
– One can distinguish graphs that have property P from those that are 𝜀-far from property
P (far in the sense of edit distance ≥ εn²) by sampling a subgraph induced by a constant
number of random vertices.
– The probabilistic guarantee is essentially equivalent to removal lemmas.
3
Pseudorandom Graphs
Chapter Highlights
• Equivalent notions of graph quasirandomness
• Role of eigenvalues in pseudorandomness
• Expander mixing lemma
• Eigenvalues of abelian Cayley graphs and the Fourier transform
• Quasirandom groups and representation theory
• Quasirandom Cayley graphs and Grothendieck’s inequality
• Alon–Boppana bound on the second eigenvalue of a 𝑑-regular graph
In the previous chapter on the graph regularity method, we saw that every graph can
be partitioned into a bounded number of vertex parts so that the graph looks “random-
like” between most pairs of parts. In this chapter, we dive further into how a graph can be
random-like.
Pseudorandomness is a concept prevalent in combinatorics, theoretical computer science,
and in many other areas. It specifies how a non-random object can behave like a truly random
object.
Example 3.0.1 (Pseudorandom generators). Suppose you want to generate a random num-
ber on a computer. In most systems and programming languages, you can do this easily with
a single command (e.g., rand()). The output is not actually truly random. Instead, the output
came from a pseudorandom generator, which is some function/algorithm that takes a seed as
input, and passes it through some sophisticated function, so that there is no practical way to
distinguish the output from a truly random object. In other words, the output is not actually
truly random, but for all practical purposes the output cannot be distinguished from a truly
random output.
Example 3.0.2 (Primes). In number theory, the prime numbers behave like a random se-
quence in many ways. The celebrated Riemann hypothesis and its generalizations give quanti-
tative predictions about how closely the primes behave in a certain specific way like a random
sequence. There is also something called Cramér’s random model for the primes that allows
one to make predictions about the asymptotic density of certain patterns in the primes (e.g.,
how many twin primes up to 𝑁 are there?). Empirical data support these predictions, and
they have been proved in certain cases. Nevertheless, there are still notorious open problems
such as the twin prime and Goldbach conjectures. Despite their pseudorandom behavior, the
primes are not random!
Example 3.0.3 (Normal numbers). It is very much believed that the digits of 𝜋 behave in
a random-like way, where every digit or block of digits appear with frequency similar to
that of a truly random number. Such numbers are called normal. It is widely believed that numbers such as √2, π, and e are normal, but proofs remain elusive. Again, the digits of π
are deterministic, not random, but they are believed to behave pseudorandomly. On the other
hand, nearly all real numbers are normal, with the exceptions occupying only a measure zero
subset of the reals.
Coming back to graph theory. The Erdős–Rényi random graph 𝑮 (𝒏, 𝒑) is a random
𝑛-vertex graph where each edge appears with probability 𝑝 independently. Now, given some
specific graph (perhaps an instance of the random graph, or perhaps generated via some
other means), we can ask whether this graph, for the purpose of some intended application,
behaves similarly to a typical random graph. What are some useful ways to measure
the pseudorandomness of a graph? This is the main theme that we explore in this chapter.
Remark 3.1.3 (Single graph vs. a sequence of graphs). Strictly speaking, it does not make
sense to say whether a single graph is quasirandom, but we will abuse the definition as such
when it is clear that the graph we are referring to is part of a sequence.
Remark 3.1.4 (C4 condition). The C4 condition is surprising. It says that the 4-cycle density,
a single statistic, is equivalent to all the other quasirandomness conditions.
We will soon see below in Proposition 3.1.14 that the C4 condition can be replaced by the equivalent
condition that the number of labeled 4-cycles is ( 𝑝 4 + 𝑜(1))𝑛4 (rather than at most this
quantity).
Remark 3.1.5 (Checking quasirandomness). The discrepancy conditions are hard to verify
since they involve checking exponentially many sets. The other conditions can all be checked
in time polynomial in the size of the graph. So the equivalence gives us an algorithmically
efficient way to certify the discrepancy condition.
Remark 3.1.6 (Quantitative equivalences). Rather than stating these properties for a se-
quence of graphs using a decaying error term 𝑜(1), we can state a quantitative quasirandom-
ness hypothesis for a specific graph using an error tolerance parameter 𝜀. For example, we
can restate the discrepancy condition as follows.
DISC(𝜀): For all 𝑋, 𝑌 ⊆ 𝑉 (𝐺), |𝑒(𝑋, 𝑌 ) − 𝑝 |𝑋 | |𝑌 || < 𝜀𝑛2 .
Similar statements can be made for other quasirandom graph notions. The proof below
shows that these notions are equivalent up to a polynomial change in 𝜀; that is, for each pair
of properties, Prop1(𝜀) implies Prop2(𝐶𝜀 𝑐 ) for some constants 𝐶, 𝑐 > 0.
The following statement says that the 4-cycle density is always roughly at least as much as
random. Later in Chapter 5, we will see Sidorenko’s conjecture, which says that all bipartite
graphs have this property.
As a consequence, the C4 condition is equivalent to saying that the number of labeled
4-cycles is ( 𝑝 4 + 𝑜(1))𝑛4 (rather than at most).
Remark 3.1.15. Since all but 𝑂 (𝑛3 ) such closed walks use four distinct vertices, the above
statement implies that the number of labeled 4-cycles is at least ( 𝑝 4 − 𝑜(1))𝑛4 .
Proof. The number of closed walks of length 4 is
$$|\{(w, x, y, z) \text{ closed walk}\}| = \sum_{w, y} |\{x : w \sim x \sim y\}|^2$$
$$\ge \frac{1}{n^2} \Big( \sum_{w, y} |\{x : w \sim x \sim y\}| \Big)^2 = \frac{1}{n^2} \Big( \sum_{x} |\{(w, y) : w \sim x \sim y\}| \Big)^2 = \frac{1}{n^2} \Big( \sum_{x} (\deg x)^2 \Big)^2$$
$$\ge \frac{1}{n^4} \Big( \sum_{x} \deg x \Big)^4 = (2e(G))^4 / n^4 \ge p^4 n^4.$$
Here both inequality steps are due to Cauchy–Schwarz. Each line of the calculation can be accompanied by a pictorial depiction of what is being counted by the inner sum. These diagrams are a useful way to keep track of graph inequalities, especially when dealing with much larger graphs, where the algebraic expressions get unwieldy. Note that each application of the Cauchy–Schwarz inequality corresponds to “folding” the graph along a line of reflection. □
We shall prove the equivalences of Theorem 3.1.1 in the following way:
[Diagram: the chain of implications proved below among the properties DISC′, DISC, COUNT, C4, CODEG, and EIG.]
Proof that DISC implies DISC′. Take Y = X in DISC. (Note that e(X, X) = 2e(X) and $\binom{|X|}{2} = |X|^2/2 - O(n)$.) □
Proof that DISC′ implies DISC. We have the following “polarization identity”, together with a proof by picture (recall 2e(X) = e(X, X)):
$$e(X, Y) = e(X \cup Y) + e(X \cap Y) - e(X \setminus Y) - e(Y \setminus X).$$
[Figure: a proof by picture of the polarization identity, drawn in terms of the regions X ∩ Y, X \ Y, and Y \ X.]
We also have (below, the O(n³) error term is due to walks of length 4 that use repeated vertices)
$$\sum_{u, v} \operatorname{codeg}(u, v)^2 = \#\{\text{labeled } C_4\} + O(n^3) \le p^4 n^4 + o(n^4).$$
Thus, by the Cauchy–Schwarz inequality,
$$\frac{1}{n^2} \Big( \sum_{u, v} \big| \operatorname{codeg}(u, v) - p^2 n \big| \Big)^2 \le \sum_{u, v} \big( \operatorname{codeg}(u, v) - p^2 n \big)^2$$
$$= \sum_{u, v} \operatorname{codeg}(u, v)^2 - 2 p^2 n \sum_{u, v} \operatorname{codeg}(u, v) + p^4 n^4$$
$$\le p^4 n^4 - 2 p^2 n \cdot p^2 n^3 + p^4 n^4 + o(n^4) = o(n^4). \qquad \square$$
Remark 3.1.16. These calculations share the spirit of the second moment method in proba-
bilistic combinatorics. The condition C4 says that the variance of the codegree of two random
vertices is small.
Exercise 3.1.17. Show that if we modify the CODEG condition to
$$\sum_{u, v \in V(G)} \big( \operatorname{codeg}(u, v) - p^2 n \big) = o(n^3),$$
Proof that CODEG implies DISC. First, CODEG together with the assumption on the edge density implies that the degrees are nearly regular:
$$\sum_{x \in V(G)} (\deg x - pn)^2 = \sum_{u, v \in V(G)} \operatorname{codeg}(u, v) - 2pn \sum_{x \in V(G)} \deg x + p^2 n^3 = p^2 n^3 - 2 p^2 n^3 + p^2 n^3 + o(n^3) = o(n^3). \tag{3.1}$$
Now we bound the expression in DISC. We have
$$\frac{1}{n} \big( e(X, Y) - p|X||Y| \big)^2 = \frac{1}{n} \Big( \sum_{x \in X} \big( \deg(x, Y) - p|Y| \big) \Big)^2 \le \sum_{x \in X} \big( \deg(x, Y) - p|Y| \big)^2.$$
The above Cauchy–Schwarz step turned all the summands nonnegative, which allows us to expand the domain of summation from X to all of V = V(G) in the next step. Continuing,
$$\le \sum_{x \in V} \big( \deg(x, Y) - p|Y| \big)^2 = \sum_{x \in V} \deg(x, Y)^2 - 2p|Y| \sum_{x \in V} \deg(x, Y) + p^2 n |Y|^2$$
$$= \sum_{y, y' \in Y} \operatorname{codeg}(y, y') - 2p|Y| \sum_{y \in Y} \deg y + p^2 n |Y|^2$$
$$= |Y|^2 p^2 n - 2p|Y| \cdot |Y| p n + p^2 n |Y|^2 + o(n^3) \qquad \text{[by CODEG and (3.1)]}$$
$$= o(n^3). \qquad \square$$
Finally, let us consider the graph spectrum, which are eigenvalues of the graph adja-
cency matrix, accounting for eigenvalue multiplicities. Eigenvalues are core to the study of
pseudorandomness and they will play a central role in the rest of this chapter.
In this book, when we talk about the eigenvalues of a graph, we always mean the
eigenvalues of the adjacency matrix of the graph. In other contexts, it may be useful to
consider other related matrices, such as the Laplacian matrix, or a normalized adjacency
matrix.
We will generally only consider real symmetric matrices, whose eigenvalues are always
all real (Hermitian matrices also have this property). Our usual convention is to list all the
eigenvalues in order (including multiplicities): 𝜆1 ≥ 𝜆2 ≥ · · · ≥ 𝜆 𝑛 . We refer to 𝜆1 as the
top eigenvalue (or largest eigenvalue), and 𝜆𝑖 as the 𝒊-th eigenvalue (or the 𝒊-th largest
eigenvalue). The second eigenvalue plays an important role. We write 𝜆𝑖 ( 𝐴) for the 𝑖-th
eigenvalue of the matrix 𝐴 and 𝜆𝑖 (𝐺) = 𝜆𝑖 ( 𝐴𝐺 ) where 𝐴𝐺 is the adjacency matrix of 𝐺.
Remark 3.1.18 (Linear algebra review). For every 𝑛 × 𝑛 real symmetric matrix 𝐴 with
eigenvalues 𝜆1 ≥ · · · ≥ 𝜆 𝑛 , we can choose an eigenvector 𝑣 𝑖 ∈ R𝑛 for each eigenvalue 𝜆𝑖
(so that 𝐴𝑣 𝑖 = 𝜆𝑖 𝑣 𝑖 ) and such that {𝑣 1 , . . . , 𝑣 𝑛 } is an orthogonal basis of R𝑛 (this is false for
general non-symmetric matrices).
The Courant–Fischer min-max theorem is an important characterization of eigenvalues
in terms of a variational problem. Here we only state some consequences most useful for us.
We have
$$\lambda_1 = \max_{v \in \mathbb{R}^n \setminus \{0\}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
Once we have fixed a choice of an eigenvector v_1 for the top eigenvalue λ_1, we have
$$\lambda_2 = \max_{\substack{v \perp v_1 \\ v \in \mathbb{R}^n \setminus \{0\}}} \frac{\langle v, Av \rangle}{\langle v, v \rangle}.$$
Proof. Let $\mathbf{1} \in \mathbb{R}^n$ be the all-1 vector. By the Courant–Fischer min-max theorem, the adjacency matrix A of the graph G has top eigenvalue
$$\lambda_1 = \sup_{\substack{x \in \mathbb{R}^n \\ x \ne 0}} \frac{\langle x, Ax \rangle}{\langle x, x \rangle} \ge \frac{\langle \mathbf{1}, A\mathbf{1} \rangle}{\langle \mathbf{1}, \mathbf{1} \rangle} = \frac{2e(G)}{v(G)} = \operatorname{avgdeg}(G). \qquad \square$$
Proof that C4 implies EIG. Again writing A for the adjacency matrix,
$$\sum_{i=1}^n \lambda_i^4 = \operatorname{tr} A^4 = \#\{\text{closed walks of length } 4\} \le p^4 n^4 + o(n^4).$$
On the other hand, by Lemma 3.1.20 above, we have 𝜆1 ≥ 𝑝𝑛 + 𝑜(𝑛). So we must have
𝜆1 = 𝑝𝑛 + 𝑜(𝑛) and max𝑖 ≥2 |𝜆𝑖 | = 𝑜(𝑛). □
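As a numerical sanity check (our own sketch), the spectrum of a sample of G(n, p) indeed has λ₁ close to pn, all other eigenvalues of order √n, and tr A⁴ close to p⁴n⁴:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 0.4

A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1)
A = A + A.T

eigenvalues = np.sort(np.linalg.eigvalsh(A))[::-1]
print(eigenvalues[0], p * n)                        # top eigenvalue is close to pn
print(np.max(np.abs(eigenvalues[1:])), np.sqrt(n))  # the rest are on the order of sqrt(n)
print(np.sum(eigenvalues ** 4), p ** 4 * n ** 4)    # tr A^4 is close to p^4 n^4
```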
This completes all the implications in the proof of Theorem 3.1.1.
Additional remarks
Remark 3.1.21 (Forcing graphs). The C4 hypothesis says that having 4-cycle density
asymptotically the same as random implies quasirandomness. Which other graphs besides
𝐶4 have this property?
Chung, Graham, & Wilson (1989) called a graph 𝐹 forcing if every graph with edge
density 𝑝 + 𝑜(1) and 𝐹-density 𝑝 𝑒 (𝐹 ) + 𝑜(1) (i.e., asymptotically the same as random) is
automatically quasirandom. Theorem 3.1.1 implies that 𝐶4 is forcing. Here is a conjectural
characterization of forcing graphs (Skokan & Thoma 2004; Conlon, Fox, & Sudakov 2010).
We will revisit this conjecture in Chapter 5 where we will reformulate it using the language
of graphons.
More generally, one says that a family of graphs F is forcing if having 𝐹-density being
𝑝 𝑒 (𝐹 ) + 𝑜(1) for each 𝐹 ∈ F implies quasirandomness. So {𝐾2 , 𝐶4 } is forcing. It seems to
be a difficult problem to classify forcing families.
Even though many other graphs can potentially play the role of the 4-cycle, the 4-cycle
nevertheless occupies an important role in the study of quasirandomness. The 4-cycle comes
up naturally in the proofs, as we will see below. It also is closely tied to other impor-
tant pseudorandomness measurements such as the Gowers 𝑈 2 uniformity norm in additive
combinatorics.
Let us formulate a bipartite analogue of Theorem 3.1.1 since we will need it later. It is
easy to adapt the above proofs to the bipartite version—we encourage the readers to think
about the differences between the two settings.
Remark 3.1.23 (Eigenvalues of bipartite graphs). Given a bipartite graph G with vertex bipartition V ∪ W, we can write its adjacency matrix as
$$A = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \tag{3.2}$$
where B is a |V| × |W| matrix with rows indexed by V and columns indexed by W. The eigenvalues λ_1 ≥ · · · ≥ λ_n of A always satisfy
$$\lambda_i = -\lambda_{n+1-i} \quad \text{for every } 1 \le i \le n.$$
In other words, the eigenvalues are symmetric around zero. One way to see this is that if x = (v, w) is an eigenvector of A with eigenvalue λ, where v ∈ R^V is the restriction of x to the first |V| coordinates, and w is the restriction of x to the last |W| coordinates, then
$$\begin{pmatrix} \lambda v \\ \lambda w \end{pmatrix} = \lambda x = Ax = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} Bw \\ B^{\intercal} v \end{pmatrix},$$
so that
$$Bw = \lambda v \quad \text{and} \quad B^{\intercal} v = \lambda w.$$
Then the vector x′ = (v, −w) satisfies
$$Ax' = \begin{pmatrix} 0 & B \\ B^{\intercal} & 0 \end{pmatrix} \begin{pmatrix} v \\ -w \end{pmatrix} = \begin{pmatrix} -Bw \\ B^{\intercal} v \end{pmatrix} = \begin{pmatrix} -\lambda v \\ \lambda w \end{pmatrix} = -\lambda x'.$$
So we can pair each eigenvalue of A with its negation.
Exercise 3.1.24. Using the notation from (3.2), show that the positive eigenvalues of the
adjacency matrix 𝐴 coincide with the positive singular values of 𝐵 (the singular values of
𝐵 are also the positive square roots of the eigenvalues of 𝐵 ⊺ 𝐵).
[Figure: a graph G and the bipartite graph G × K_2.]
Exercise 3.1.27. Show that a graph 𝐺 satisfies each property in Theorem 3.1.1 if and
only if 𝐺 × 𝐾2 satisfies the corresponding bipartite property in Theorem 3.1.25.
Like earlier, random bipartite graphs are bipartite quasirandom. The proof (omitted) is
essentially the same as Proposition 3.1.8 and Corollary 3.1.9.
Remark 3.1.29 (Sparse graphs). We stated quasirandom properties so far only for graphs
of constant order density (i.e., 𝑝 is a constant). Let us think about what happens if we allow
𝑝 = 𝑝 𝑛 to depend on 𝑛 and decaying to zero as 𝑛 → ∞. Such graphs are sometimes called
sparse (although some other authors reserve the word “sparse” for bounded degree graphs).
Theorems 3.1.1 and 3.1.25 as stated do hold for a constant 𝑝 = 0, but the results are not as
informative as we would like. For example, the error tolerance on the DISC is 𝑜(𝑛2 ), which
does not tell us much since the graph already has much fewer edges due to its sparseness
anyway.
To remedy the situation, the natural thing to do is to adjust the error tolerance relative to
the edge density 𝑝 = 𝑝 𝑛 → 0. Here are some representative examples (all of these properties
should also depend on 𝑝):
SparseDISC |𝑒(𝑋, 𝑌 ) − 𝑝 |𝑋 | |𝑌 || = 𝑜( 𝑝𝑛2 ) for all 𝑋, 𝑌 ⊆ 𝑉 (𝐺).
SparseCOUNT 𝐻 The number of labeled copies of 𝐻 is (1 + 𝑜(1)) 𝑝 𝑒 (𝐻 ) 𝑛𝑣 (𝐻 ) .
SparseC4 The number of labeled 4-cycles is at most (1 + 𝑜(1)) 𝑝 4 𝑛4 .
Exercise 3.1.31∗ (Quasirandomness through fixed sized subsets). Fix 𝑝 ∈ [0, 1]. Let
(𝐺 𝑛 ) be a sequence of graphs with 𝑣(𝐺 𝑛 ) = 𝑛 (here 𝑛 → ∞ along a subsequence of
integers).
(a) Fix a single α ∈ (0, 1). Suppose
$$e(S) = \frac{p \alpha^2 n^2}{2} + o(n^2) \quad \text{for all } S \subseteq V(G) \text{ with } |S| = \lfloor \alpha n \rfloor.$$
Prove that G is quasirandom.
(b) Fix a single 𝛼 ∈ (0, 1/2). Suppose
𝑒(𝑆, 𝑉 (𝐺) \ 𝑆) = 𝑝𝛼(1 − 𝛼)𝑛2 + 𝑜(𝑛2 ) for all 𝑆 ⊆ 𝑉 (𝐺) with |𝑆| = ⌊𝛼𝑛⌋ .
Prove that 𝐺 is quasirandom. Furthermore, show that the conclusion is false for
𝛼 = 1/2.
Exercise 3.1.32 (Quasirandomness and regularity partitions). Fix 𝑝 ∈ [0, 1]. Let (𝐺 𝑛 )
be a sequence of graphs with 𝑣(𝐺 𝑛 ) → ∞. Suppose that for every 𝜀 > 0, there exists
𝑀 = 𝑀 (𝜀) so that each 𝐺 𝑛 has an 𝜀-regular partition where all but 𝜀-fraction of vertex
pairs lie between pairs of parts with edge density 𝑝 + 𝑜(1) (as 𝑛 → ∞). Prove that 𝐺 𝑛 is
quasirandom.
Exercise 3.1.33∗ (Triangle counts on induced subgraphs). Fix 𝑝 ∈ (0, 1]. Let (𝐺 𝑛 ) be
a sequence of graphs with 𝑣(𝐺 𝑛 ) = 𝑛. Let 𝐺 = 𝐺 𝑛 . Suppose that for every 𝑆 ⊆ 𝑉 (𝐺),
the number of triangles in the induced subgraph G[S] is $p^3 \binom{|S|}{3} + o(n^3)$. Prove that G is
quasirandom.
Exercise 3.1.34∗ (Perfect matchings). Prove that there are constant 𝛽, 𝜀 > 0 such that
for every positive even integer 𝑛 and real 𝑝 ≥ 𝑛 −𝛽 , if 𝐺 is an 𝑛-vertex graph where every
vertex has degree (1 ± 𝜀) 𝑝𝑛 (meaning within 𝜀 𝑝𝑛 of 𝑝𝑛) and every pair of vertices has
codegree (1 ± 𝜀) 𝑝 2 𝑛, then 𝐺 has a perfect matching.
Remark 3.2.2 (Notation). Rather than saying “an (𝑛, 7, 6)-graph” we prefer to say “an
(𝑛, 𝑑, 𝜆)-graph with 𝑑 = 7 and 𝜆 = 6” for clarity as the name “(𝑛, 𝑑, 𝜆)” is quite standard
and recognizable.
Remark 3.2.3 (Linear algebra review). The operator norm of a matrix $A \in \mathbb{R}^{m \times n}$ is defined by
$$\|A\| = \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{|Ax|}{|x|} = \sup_{\substack{x \in \mathbb{R}^n \setminus \{0\} \\ y \in \mathbb{R}^m \setminus \{0\}}} \frac{\langle y, Ax \rangle}{|x| \, |y|}.$$
Here $|x| = \sqrt{\langle x, x \rangle}$ denotes the length of the vector x. The operator norm of A is the maximum ratio by which A can amplify the length of a vector. If A is a real symmetric matrix, then
$$\|A\| = \max_i |\lambda_i(A)|.$$
For general matrices, the operator norm of A equals the largest singular value of A.
Here is the main result of this section.
On the left-hand side, (𝑑/𝑛) |𝑋 | |𝑌 | is the number of edges that one should expect between
𝑋 and 𝑌 purely based on the edge density 𝑑/𝑛 of the graph and the sizes of 𝑋 and 𝑌 . Note
that unlike the discrepancy condition (DISC) from quasirandom graphs (Theorem 3.1.1),
the error bound on the right-hand side depends on the sizes of 𝑋 and 𝑌. We can apply the
expander mixing lemma to small subsets 𝑋 and 𝑌 and still obtain useful estimates on 𝑒(𝑋, 𝑌 ),
unlike the dense quasirandom graph conditions.
Proof. Let J be the n × n all-1 matrix. Since the all-1 vector $\mathbf{1} \in \mathbb{R}^n$ is an eigenvector of A_G with eigenvalue d, we see that $\mathbf{1}$ is an eigenvector of $A_G - \frac{d}{n} J$ with eigenvalue 0. Any other eigenvector v of A_G, with v ⊥ 1, satisfies Jv = 0, and thus v is also an eigenvector of $A_G - \frac{d}{n} J$ with the same eigenvalue as in A_G. Therefore, the eigenvalues of $A_G - \frac{d}{n} J$ are obtained by taking the eigenvalues of A_G and then replacing the top eigenvalue d by zero. All the other eigenvalues of $A_G - \frac{d}{n} J$ are therefore at most λ in absolute value, so $\|A_G - \frac{d}{n} J\| \le \lambda$. Therefore,
$$\Big| e(X, Y) - \frac{d}{n} |X| |Y| \Big| = \Big| \Big\langle \mathbf{1}_X, \Big( A_G - \frac{d}{n} J \Big) \mathbf{1}_Y \Big\rangle \Big| \le \Big\| A_G - \frac{d}{n} J \Big\| \, |\mathbf{1}_X| \, |\mathbf{1}_Y| \le \lambda \sqrt{|X| |Y|}. \qquad \square$$
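A quick numerical check of the lemma (our own sketch, using the Petersen graph, a 3-regular graph on 10 vertices with λ = 2):

```python
import itertools
import numpy as np

# Petersen graph as the Kneser graph K(5, 2): vertices are the 2-element subsets
# of {0,...,4}, adjacent exactly when disjoint. Its spectrum is {3, 1 (x5), -2 (x4)},
# so it is an (n, d, lambda)-graph with n = 10, d = 3, lambda = 2.
V = list(itertools.combinations(range(5), 2))
A = np.array([[1.0 if not set(u) & set(v) else 0.0 for v in V] for u in V])

n, d = len(V), int(A[0].sum())
lam = sorted(abs(np.linalg.eigvalsh(A)))[-2]

X = [0, 1, 2, 3]            # two arbitrary vertex subsets (indices into V)
Y = [2, 3, 4, 5, 6, 7]
e_XY = sum(A[x, y] for x in X for y in Y)

print(abs(e_XY - d / n * len(X) * len(Y)), lam * np.sqrt(len(X) * len(Y)))
# The first number (the discrepancy) is at most the second (the mixing bound).
```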
Exercise 3.2.5. Prove the following strengthening of the expander mixing lemma.
We also have a bipartite analogue (the nomenclature used here is less standard). Recall
from Remark 3.1.23 that the eigenvalues of a bipartite graph are symmetric around zero.
In other words, a graph with edge-expansion ratio at least ℎ has the property that for every
nonempty subset of vertices 𝑆 with |𝑆| ≤ |𝑉 | /2, there are at least ℎ |𝑆| edges leaving 𝑆.
Cheeger’s inequality, stated below, tells us that among 𝑑-regular graphs for a fixed 𝑑,
having spectral gap bounded away from zero is equivalent to having edge-expansion ratio
bounded away from zero. Cheeger (1970) originally developed this inequality for Riemannian
manifolds. The graph theoretic analogue was proved by Dodziuk (1984), and independently
by Alon & Milman (1985) and Alon (1986).
The two bounds of Cheeger’s inequality are tight up to constant factors. For the lower
bound, taking G to be the skeleton of the d-dimensional cube with vertex set {0, 1}^d gives h = 1 (achieved by a (d − 1)-dimensional subcube) and κ = 2. For the upper bound, taking G to be an n-cycle gives h = 2/(n/2) = Θ(1/n) while d = 2 and κ = 2 − 2 cos(2π/n) = Θ(1/n²).
We call a family of 𝑑-regular graphs expanders if there is some constant 𝜅 0 > 0 so that
each graph in the family has spectral gap ≥ 𝜅0 ; by Cheeger’s inequality, this is equivalent to
the existence of some ℎ0 > 0 so that each graph in the family has edge expansion ratio ≥ ℎ0 .
Expander graphs are important objects in mathematics and computer science. For example,
expander graphs have rapid mixing properties, which are useful for designing efficient Monte
Carlo algorithms for sampling and estimation.
The following direction of Cheeger’s inequality is easier to prove. It is similar to the
expander mixing lemma.
Exercise 3.2.14 (Spectral gap implies expansion). Prove the 𝜅/2 ≤ ℎ part of Cheeger’s
inequality.
The other direction, $h \le \sqrt{2 d \kappa}$, is more difficult and interesting. The proof is outlined in
the following exercise.
Exercise 3.2.15 (Expansion implies spectral gap). Let 𝐺 = (𝑉, 𝐸) be a connected 𝑑-
regular graph with spectral gap 𝜅. Let 𝑥 = (𝑥 𝑣 ) 𝑣 ∈𝑉 ∈ R𝑉 be an eigenvector associated to
the second largest eigenvalue 𝜆2 = 𝑑 − 𝜅 of the adjacency matrix of 𝐺. Assume that 𝑥 𝑣 > 0
on at most half of the vertex set (or else we replace 𝑥 by −𝑥). Let 𝑦 = (𝑦 𝑣 ) 𝑣 ∈𝑉 ∈ R𝑉 be
obtained from 𝑥 by replacing all its negative coordinates by zero.
(a) Prove that
$$d - \frac{\langle y, Ay \rangle}{\langle y, y \rangle} \le \kappa.$$
Hint: Recall that $\lambda_2 x_v = \sum_{u \sim v} x_u$.
(b) Let
$$\Theta = \sum_{uv \in E} \big| y_u^2 - y_v^2 \big|.$$
Prove that
$$\Theta^2 \le 2d \big( d \langle y, y \rangle - \langle y, Ay \rangle \big) \langle y, y \rangle.$$
Hint: $y_u^2 - y_v^2 = (y_u - y_v)(y_u + y_v)$. Apply Cauchy–Schwarz.
Exercises
Exercise 3.2.16 (Independence numbers). Prove that every independent set in an (n, d, λ)-graph has size at most nλ/(d + λ).
Exercise 3.2.17 (Diameter). Prove that the diameter of an (𝑛, 𝑑, 𝜆)–graph is at most
⌈log 𝑛/log(𝑑/𝜆)⌉. (The diameter of a graph is the maximum distance between a pair of
vertices.)
Exercise 3.2.18 (Counting cliques). For each part below, prove that for every 𝜀 > 0, there
exists 𝛿 > 0 such that the conclusion holds for every (𝑛, 𝑑, 𝜆)-graph 𝐺 with 𝑑 = 𝑝𝑛.
(a) If 𝜆 ≤ 𝛿 𝑝 2 𝑛, then the number of triangles of 𝐺 is within a 1 ± 𝜀 factor of 𝑝 3 𝑛3 .
(b*) If 𝜆 ≤ 𝛿 𝑝 3 𝑛, then the number of 𝐾4 ’s in 𝐺 is within a 1 ± 𝜀 factor of 𝑝 6 𝑛4 .
In this section, we only consider abelian groups, specifically Z/𝑝Z for concreteness (though
everything here generalizes easily to all finite abelian groups). For abelian groups, we write
the group operation additively as 𝑔 + 𝑠. So edges join elements whose difference lies in 𝑆.
Remark 3.3.2. In later sections when we consider a non-abelian group Γ, one needs to
make a choice whether to define edges by left- or right-multiplication (i.e., 𝑔𝑠 or 𝑠𝑔; we
chose 𝑔𝑠 here). It does not matter which choice one makes (as long as one is consistent) since
the resulting Cayley graphs are isomorphic (why?). However, some careful bookkeeping is
sometimes required to make sure that later computations are consistent with the initial choice.
Example 3.3.3. Cay(Z/𝑛Z, {−1, 1}) is a cycle of length 𝑛. The graph for 𝑛 = 8 is shown
below.
Here is an explicitly constructed family of quasirandom graphs with edge density 1/2 +
𝑜(1).
Example 3.3.6. The Paley graphs for 𝑝 = 5 and 𝑝 = 13 are shown below.
[Figure: the Paley graphs Cay(Z/5Z, {±1}) and Cay(Z/13Z, {±1, ±3, ±4}).]
Remark 3.3.7 (Quadratic residues). Here we recall some facts from elementary number theory. For every odd prime p, the set $S = \{a^2 : a \in \mathbb{F}_p^\times\}$ of quadratic residues is a multiplicative subgroup of $\mathbb{F}_p^\times$ with index two. In particular, |S| = (p − 1)/2. We have −1 ∈ S if and only if p ≡ 1 (mod 4) (which is required to define a Cayley graph, as the generating set needs to be symmetric in the sense that S = −S).
We will show that Paley graphs are quasirandom by verifying the EIG condition, which
says that all eigenvalues, except the top one, are small. Here is a general formula for computing
the eigenvalues of any Cayley graph on Z/𝑝Z.
Remark 3.3.9 (Eigenvalues and the Fourier transform). The coordinates of the eigenvec-
For each g ∈ Z/nZ,
$$(A v_j)_g = \sum_{s \in S} (v_j)_{g+s} = \sum_{s \in S} \frac{\omega^{j(g+s)}}{\sqrt{n}} = \Big( \sum_{s \in S} \omega^{js} \Big) \frac{\omega^{jg}}{\sqrt{n}} = \lambda_j (v_j)_g.$$
So $A v_j = \lambda_j v_j$.
Next we check that {v_0, . . . , v_{n−1}} is an orthonormal basis. We have the inner product
$$\langle v_j, v_k \rangle = \frac{1}{n} \Big( 1 \cdot 1 + \overline{\omega^{j}} \omega^{k} + \overline{\omega^{2j}} \omega^{2k} + \cdots + \overline{\omega^{(n-1)j}} \omega^{(n-1)k} \Big) = \frac{1}{n} \Big( 1 + \omega^{k-j} + \omega^{2(k-j)} + \cdots + \omega^{(n-1)(k-j)} \Big) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases}$$
For the j ≠ k case, we use that $\sum_{i=0}^{m-1} \zeta^i = 0$ for any m-th root of unity ζ ≠ 1. So {v_0, . . . , v_{n−1}} is an orthonormal basis. □
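The eigenvalue formula is easy to verify numerically (our own sketch): build the adjacency matrix of Cay(Z/nZ, S) for a symmetric set S and compare its spectrum with the character sums λ_j = Σ_{s∈S} ω^{js}.

```python
import numpy as np

n = 12
S = {1, 3, n - 3, n - 1}          # a symmetric generating set: S = -S, 0 not in S

# Adjacency matrix of Cay(Z/nZ, S): g is adjacent to g + s for each s in S.
A = np.array([[1.0 if (j - i) % n in S else 0.0 for j in range(n)] for i in range(n)])

omega = np.exp(2j * np.pi / n)
formula = sorted(np.real(sum(omega ** (j * s) for s in S)) for j in range(n))
spectrum = sorted(np.linalg.eigvalsh(A))

print(np.allclose(formula, spectrum))   # True: the spectrum is {sum_{s in S} omega^{js}}
```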
Remark 3.3.10 (Real vs complex eigenbases). The adjacency matrix of a graph is a real
symmetric matrix, so all its eigenvalues are real, and it always has a real orthogonal eigenbasis.
The eigenbasis given in Theorem 3.3.8 is complex, but it can always be made real. Looking at the formulas in Theorem 3.3.8, we have λ_j = λ_{n−j}, and v_j is the complex conjugate of v_{n−j}. So we can form a real orthogonal eigenbasis by replacing, for each j ∉ {0, n/2}, the pair (v_j, v_{n−j}) by $\big( (v_j + v_{n-j})/\sqrt{2},\ i(v_j - v_{n-j})/\sqrt{2} \big)$. Equivalently, we can separate the real and imaginary parts of each v_j, which are both eigenvectors with eigenvalue λ_j. All the real eigenvalues and eigenvectors can be expressed in terms of sines and cosines.
Remark 3.3.11 (Every abelian Cayley graph has an eigenbasis independent of the generators). The above theorem and its proof generalize to all finite abelian groups, not just Z/nZ. For every finite abelian group Γ, we have a set $\widehat{\Gamma}$ of characters, where each character is a homomorphism χ : Γ → C^×. Then $\widehat{\Gamma}$ turns out to be a group isomorphic to Γ (one can check this by first writing Γ as a direct product of cyclic groups). For each $\chi \in \widehat{\Gamma}$, define the vector $v_\chi \in \mathbb{C}^\Gamma$ by setting the coordinate at g ∈ Γ to be $\chi(g)/\sqrt{|\Gamma|}$. Then $\{v_\chi : \chi \in \widehat{\Gamma}\}$ is an orthonormal eigenbasis for the adjacency matrix of every Cayley graph on Γ. The eigenvalue corresponding to $v_\chi$ is $\lambda_\chi(S) = \sum_{s \in S} \chi(s)$. Up to normalization, $\lambda_\chi(S)$ is the Fourier transform of the indicator function of S on the abelian group Γ (Theorem 3.3.8 is a special case of this construction). In particular, this eigenbasis $\{v_\chi : \chi \in \widehat{\Gamma}\}$ depends only on the finite abelian group and not on the generating set S. In other words, we have a simultaneous diagonalization for all adjacency matrices of Cayley graphs on a fixed finite abelian group.
If Γ is a non-abelian group, then there does not exist a simultaneous eigenbasis for all Cayley graphs on Γ. There is a corresponding theory of non-abelian Fourier analysis, which uses group representation theory. We will discuss more about non-abelian Cayley graphs in Section 3.4.
Now we apply the above formula to compute eigenvalues of Paley graphs. In particular,
the following tells us that Paley graphs satisfy the quasirandomness condition EIG from
Theorem 3.1.1.
Proof. Applying Theorem 3.3.8, we see that the eigenvalues are given by, for j = 0, 1, . . . , p − 1,
$$\lambda_j = \sum_{s \in S} \omega^{js} = \frac{1}{2} \Big( {-1} + \sum_{x \in \mathbb{F}_p} \omega^{j x^2} \Big),$$
since each quadratic residue 𝑠 appears as 𝑥 2 for exactly two non-zero 𝑥. Clearly 𝜆0 = ( 𝑝−1)/2.
For j ≠ 0, the next result shows that the inner sum on the right-hand side is ±√p (note that the above sum is real when p ≡ 1 (mod 4) since S = −S and so the sum equals its own complex conjugate; alternatively, the sum must be real since all eigenvalues of a symmetric matrix are real). □
Remark 3.3.13. Since the trace of the adjacency matrix is zero, and equals the sum of the eigenvalues, we see that the non-top eigenvalues are equally split between $(\sqrt{p} - 1)/2$ and $(-\sqrt{p} - 1)/2$.
Proof. We have
$$\Big| \sum_{x \in \mathbb{F}_p} \omega^{j x^2} \Big|^2 = \sum_{x, y \in \mathbb{Z}/p\mathbb{Z}} \omega^{j((x+y)^2 - x^2)} = \sum_{x, y \in \mathbb{Z}/p\mathbb{Z}} \omega^{j(2xy + y^2)}.$$
For each fixed y, the inner sum over x vanishes unless 2jy ≡ 0 (mod p), that is, unless y = 0, in which case it equals p. Hence the double sum equals p.
Recall the Legendre symbol:
$$\left( \frac{a}{p} \right) = \begin{cases} 0 & \text{if } a \equiv 0 \pmod{p}, \\ 1 & \text{if } a \text{ is a nonzero quadratic residue mod } p, \\ -1 & \text{if } a \text{ is a quadratic nonresidue mod } p. \end{cases}$$
Exercise 3.3.17. Prove that in a Paley graph of order p, every clique has size at most √p.
Exercise 3.3.18 (No spectral gap if too few generators). Prove that for every 𝜀 > 0 there
is some 𝑐 > 0 such that for every 𝑆 ⊆ Z/𝑛Z with 0 ∉ 𝑆 = −𝑆 and |𝑆| ≤ 𝑐 log 𝑛, the second
largest eigenvalue of the adjacency matrix of Cay(Z/𝑛Z, 𝑆) is at least (1 − 𝜀) |𝑆|.
Exercise 3.3.19∗. Let 𝑝 be a prime and let 𝑆 be a multiplicative subgroup of F×𝑝 . Suppose
−1 ∈ S. Prove that all eigenvalues of the adjacency matrix of Cay(Z/pZ, S), other than the top one, are at most √p in absolute value.
Remark 3.4.2 (Representations of finite groups). We need some basic concepts from group
representation theory in this section—mostly just some definitions. Feel free to skip this
remark if you have already seen group representations before.
Given a finite group Γ, it is often useful to study its actions as linear transformations on
some vector space. For example, if Γ is a cyclic or dihedral group, it is natural to think of
elements of Γ as rotations and reflections of a plane, which are linear transformations on R².
The theory turns out to be much nicer over C than R since C is algebraically closed. We are
interested in ways that Γ can be represented as a group of linear transformations acting on
some C𝑑 .
A representation of a finite group Γ is a group homomorphism 𝜌 : Γ → GL(𝑉), where
𝑉 is a complex vector space (everything will take place over C) and GL(𝑉) is the group of
invertible linear transformations of 𝑉. We sometimes omit 𝜌 from the notation and just say
that 𝑉 is a representation of Γ, and also that Γ acts on 𝑉 (via 𝜌). For each 𝑔 ∈ Γ and 𝑣 ∈ 𝑉,
we write 𝑔𝑣 = 𝜌(𝑔)𝑣 for the image of the 𝑔-action on 𝑣. We write dim 𝜌 = dim 𝑉 for the
dimension of the representation.
The fact that 𝜌 : Γ → GL(𝑉) is a group homomorphism means that the action of Γ on 𝑉 is
compatible with group operations in Γ in the following sense: if 𝑔, ℎ ∈ Γ, then the expression
𝑔ℎ𝑥 does not depend on whether we first apply ℎ to 𝑥 and then 𝑔 to ℎ𝑥, or if we first multiply
𝑔 and ℎ in Γ and then apply their product 𝑔ℎ to 𝑥.
For example, suppose Γ is a subgroup of permutations of [𝑛], with each element 𝑔 ∈ Γ
viewed as a permutation 𝑔 : [𝑛] → [𝑛]. We can define a representation of Γ on C𝑛 by letting
Γ permute the coordinates: for any 𝑥 = (𝑥 1 , . . . , 𝑥 𝑛 ) ∈ C𝑛 , set 𝑔𝑥 = (𝑥 𝑔 (1) , . . . , 𝑥 𝑔 (𝑛) ). As
an element of GL(𝑛, C), 𝜌(𝑔) is the 𝑛 × 𝑛 permutation matrix of the permutation 𝑔, and
𝑔𝑥 = 𝜌(𝑔)𝑥 for each 𝑥 ∈ C𝑛 .
We say that the representation 𝑉 of Γ is trivial if 𝑔𝑣 = 𝑣 for all 𝑔 ∈ Γ and 𝑣 ∈ 𝑉, and
non-trivial otherwise.
We say that a subspace 𝑊 of 𝑉 is 𝚪-invariant if 𝑔𝑤 ∈ 𝑊 for all 𝑤 ∈ 𝑊. In other words,
the image of 𝑊 under Γ is contained in 𝑊 (and actually must equal 𝑊 due to the invertibility
of group elements). Then 𝑊 is a representation of Γ, and we call it a subrepresentation of 𝑉.
For an introduction to group representation theory, see any standard textbook, such as the classic Linear Representations of Finite Groups by Serre (1977). Also, the lecture notes Representation Theory of Finite Groups, and Applications by Wigderson (2012) are a friendly introduction with applications to combinatorics and theoretical computer science.
Recall from Definition 3.2.1 that an (𝒏, 𝒅, 𝝀)-graph is an 𝑛-vertex 𝑑-regular graph all of
whose eigenvalues, except the top one, are at most 𝜆 in absolute value.
The main theorem of this section, below, says that a group with no small non-trivial
representations always produces quasirandom Cayley graphs (Gowers 2008).
Therefore
$$|\mu| \le \sqrt{\frac{d(n - d)}{K}} < \sqrt{\frac{dn}{K}}. \qquad \square$$
The above proof can be modified to prove a bipartite version, which will be useful for
certain applications.
Given a finite group Γ and a subset 𝑆 ⊆ Γ (not necessarily symmetric), we define the
bipartite Cayley graph BiCay(𝚪, 𝑺) as the bipartite graph with vertex set Γ on both parts,
with an edge joining 𝑔 on the left with 𝑔𝑠 on the right for every 𝑔 ∈ Γ and 𝑠 ∈ 𝑆.
In other words, the second largest eigenvalue of the adjacency matrix of this bipartite Cayley graph is less than $\sqrt{nd/K}$.
Exercise 3.4.8. Prove Theorem 3.4.7.
As an application of the expander mixing lemma, we show that in a quasirandom group,
the number of solutions to 𝑥𝑦 = 𝑧 with 𝑥, 𝑦, 𝑧 lying in three given sets 𝑋, 𝑌 , 𝑍 ⊆ Γ is close to
what one should predict from density alone. Note that the right-hand side expression below
is relatively small if 𝐾 2 is large compared to |𝑋 | |𝑌 | |𝑍 | /|Γ| 3 (e.g., if 𝑋, 𝑌 , 𝑍 each occupy at
least a constant proportion of the group, and 𝐾 tends to infinity).
[Figure: solutions to xy = z as edges of BiCay(Γ, Y) between X ⊆ Γ on the left and Z ⊆ Γ on the right; x is joined to z exactly when y = x⁻¹z ∈ Y.]
By Theorem 3.4.7, BiCay(Γ, Y) is a bipartite-(n, d, λ)-graph with n = |Γ|, d = |Y|, and some $\lambda < \sqrt{|\Gamma| |Y| / K}$. The above inequality then follows from applying the bipartite expander mixing lemma, Theorem 3.2.9, to BiCay(Γ, Y). □
Proof. If there is no solution to 𝑥𝑦 = 𝑧, then the left-hand side of the inequality in Theo-
rem 3.4.9 is |𝑋 | |𝑌 | |𝑍 | /|Γ|. Rearranging gives the result. □
The above result already shows that all product-free subsets of a quasirandom group must
be small. This sharply contrasts the abelian setting. For example, in Z/𝑛Z (written additively),
there is a sum-free subset of size around 𝑛/3 consisting of all group elements strictly between
𝑛/3 and 2𝑛/3.
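As a quick illustration, the following short Python check (an illustrative sketch; the values of 𝑛 are arbitrary choices) verifies that the middle-third construction is sum-free in Z/𝑛Z and has size about 𝑛/3.

```python
# Check that {x : n/3 < x < 2n/3} is sum-free in Z/nZ, i.e., it contains
# no solution to a + b = c (mod n) with a, b, c all in the set.
def middle_third(n):
    return {x for x in range(n) if n < 3 * x < 2 * n}

def is_sum_free(A, n):
    return all((a + b) % n not in A for a in A for b in A)

for n in [10, 30, 31, 100, 101]:   # arbitrary test values
    A = middle_third(n)
    print(n, len(A), round(len(A) / n, 3), is_sum_free(A, n))
```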
Exercise 3.4.11 (Growth and expansion in quasirandom groups). Let Γ be a finite group
with no non-trivial representations of dimension less than 𝐾. Let 𝑋, 𝑌 , 𝑍 ⊆ Γ. Suppose
|𝑋 | |𝑌 | |𝑍 | ≥ |Γ| 3 /𝐾. Then 𝑋𝑌 𝑍 = Γ (i.e., every element of Γ can be expressed as 𝑥𝑦𝑧 for
some (𝑥, 𝑦, 𝑧) ∈ 𝑋 × 𝑌 × 𝑍).
Recall that the special linear group SL(2, 𝑝) is the group of 2 × 2 matrices (under multi-
plication) with determinant 1:
SL(2, 𝑝) = { ( 𝑎 𝑏 ; 𝑐 𝑑 ) : 𝑎, 𝑏, 𝑐, 𝑑 ∈ F_𝑝 , 𝑎𝑑 − 𝑏𝑐 = 1 },
where ( 𝑎 𝑏 ; 𝑐 𝑑 ) denotes the 2 × 2 matrix with rows (𝑎, 𝑏) and (𝑐, 𝑑).
The projective special linear group PSL(2, 𝑝) is a quotient of SL(2, 𝑝) by all scalars; that is,
PSL(2, 𝑝) = SL(2, 𝑝)/{±𝐼} .
The following result is due to Frobenius.
Proof. The claim is trivial for 𝑝 = 2, so we can assume that 𝑝 is odd. It suffices to prove the
claim for SL(2, 𝑝). Indeed, any non-trivial representation of PSL(2, 𝑝) can be made into a
representation of SL(2, 𝑝) by first passing through the quotient SL(2, 𝑝) → SL(2, 𝑝)/{±𝐼} =
PSL(2, 𝑝).
Now suppose 𝜌 is a non-trivial representation of SL(2, 𝑝). The group SL(2, 𝑝) is generated
by the elements (Exercise: check!)
𝑔 = ( 1 1 ; 0 1 )   and   ℎ = ( 1 0 ; −1 1 ).
These two elements are conjugate in SL(2, 𝑝) via 𝑧 = ( 1 1 ; −1 0 ), as 𝑔𝑧 = 𝑧ℎ. If 𝜌(𝑔) = 𝐼,
then 𝜌(ℎ) = 𝐼 by conjugation, and 𝜌 would be trivial since 𝑔 and ℎ generate the group. So,
𝜌(𝑔) ≠ 𝐼. Since 𝑔 𝑝 = 𝐼, we have 𝜌(𝑔) 𝑝 = 𝐼. So 𝜌(𝑔) is diagonalizable (here we use that a
matrix is diagonalizable if and only if its minimal polynomial has distinct roots, and that the
minimal polynomial of 𝜌(𝑔) divides 𝑋 𝑝 − 1). Since 𝜌(𝑔) ≠ 𝐼, 𝜌(𝑔) has an eigenvalue 𝜆 ≠ 1.
Since 𝜌(𝑔) 𝑝 = 𝐼, 𝜆 is a primitive 𝑝-th root of unity.
For every 𝑎 ∈ F_𝑝^×, 𝑔 is conjugate to
( 𝑎 0 ; 0 𝑎^{−1} ) ( 1 1 ; 0 1 ) ( 𝑎^{−1} 0 ; 0 𝑎 ) = ( 1 𝑎² ; 0 1 ) = 𝑔^{𝑎²}.
Thus 𝜌(𝑔) is conjugate to 𝜌(𝑔)^{𝑎²}. Hence these two matrices have the same set of eigenvalues.
So 𝜆^{𝑎²} is an eigenvalue of 𝜌(𝑔) for every 𝑎 ∈ F_𝑝^×, and by ranging over all 𝑎 ∈ F_𝑝^×, this gives
( 𝑝 − 1)/2 distinct eigenvalues of 𝜌(𝑔) (recall that 𝜆 is a primitive 𝑝-th root of unity, and 𝑎²
ranges over the ( 𝑝 − 1)/2 quadratic residues of F_𝑝^×). It follows that dim 𝜌 ≥ ( 𝑝 − 1)/2. □
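The matrix identities used in this proof can also be verified numerically; the following Python sketch (with an arbitrarily chosen small prime) checks that 𝑔𝑧 = 𝑧ℎ, that 𝑔 has order 𝑝, and that conjugating 𝑔 by diag(𝑎, 𝑎^{−1}) gives 𝑔^{𝑎²}.

```python
# Verify the matrix identities from the proof over F_p (p is an arbitrary small prime).
p = 13

def mat_mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

def mat_pow(A, e):
    R = [[1, 0], [0, 1]]
    for _ in range(e):
        R = mat_mult(R, A)
    return R

g = [[1, 1], [0, 1]]
h = [[1, 0], [p - 1, 1]]          # -1 is represented as p - 1
z = [[1, 1], [p - 1, 0]]

assert mat_mult(g, z) == mat_mult(z, h)           # g and h are conjugate via z
assert mat_pow(g, p) == [[1, 0], [0, 1]]          # g has order p

for a in range(1, p):
    a_inv = pow(a, p - 2, p)                      # inverse of a in F_p
    d = [[a, 0], [0, a_inv]]
    d_inv = [[a_inv, 0], [0, a]]
    assert mat_mult(mat_mult(d, g), d_inv) == mat_pow(g, (a * a) % p)

print("all identities verified for p =", p)
```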
Applying Corollary 3.4.10 with Theorem 3.4.13 yields the following corollary (Gowers
2008). Note that the order of PSL(2, 𝑝) is ( 𝑝 3 − 𝑝)/2.
Before Gowers’ work, it was not known whether every order 𝑛 group has a product-free
subset of size ≥ 𝑐𝑛 for some absolute constant 𝑐 > 0 (this was Question 3.4.1, asked by
Babai and Sós). Gowers’ result shows that the answer is no.
In the other direction, Kedlaya (1997; 1998) showed that every finite group of order 𝑛
has a product-free subset of size ≳ 𝑛11/14 . In fact, he showed that if the group has a proper
subgroup 𝐻 of index 𝑚, then there is a product-free subset that is a union of ≳ 𝑚 1/2 cosets
of 𝐻.
To see that REP implies QUOTIENT, note that any non-trivial representation of Γ/𝐻 is
automatically a representation of Γ after passing through the quotient. Furthermore, every
non-trivial abelian group has a non-trivial 1-dimensional representation, and every group
of order 𝑚 > 1 has a non-trivial representation of dimension < √𝑚. For the proof of the
converse, see Gowers (2008, Theorem 4.8). (This implication has an exponential dependence
of parameters.)
Remark 3.4.17 (Non-abelian Fourier analysis). (This is an advanced remark and can be
skipped over.) Section 3.3 discussed the Fourier transform on finite abelian groups. The topic
of this section can be alternatively viewed through the lens of the non-abelian Fourier
transform. We refer to Wigderson (2012) for a tutorial on the non-abelian Fourier transform
from a combinatorial perspective.
Let us give here the recipe for computing the eigenvalues and an orthonormal basis of
eigenvectors of Cay(Γ, 𝑆).
For each irreducible representation 𝜌 of Γ (always working over C), let
𝑀_𝜌 := ∑_{𝑠∈𝑆} 𝜌(𝑠),
viewed as a (dim 𝜌) × (dim 𝜌) matrix over C. Then 𝑀_𝜌 has dim 𝜌 eigenvalues 𝜆_{𝜌,1}, . . . , 𝜆_{𝜌,dim 𝜌}.
The eigenvalues of the adjacency matrix of Cay(Γ, 𝑆) are then obtained by listing each 𝜆_{𝜌,𝑖}
with multiplicity dim 𝜌, ranging over all irreducible representations 𝜌 and all 1 ≤ 𝑖 ≤ dim 𝜌.
To emphasize, the eigenvalues always come in bundles with multiplicities determined by
the dimensions of the irreducible representations of Γ (although it is possible for there to be
additional coalescence of eigenvalues).
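This recipe can be checked numerically on a small example. The sketch below (an illustration; the connection set 𝑆 is an arbitrary symmetric choice) uses Γ = 𝑆_3, whose irreducible representations are the trivial representation, the sign representation, and a 2-dimensional representation obtained by restricting the permutation action on C³ to the plane orthogonal to the all-1 vector.

```python
import itertools
import numpy as np

perms = list(itertools.permutations(range(3)))        # elements of S_3
index = {g: i for i, g in enumerate(perms)}

def compose(g, h):                                    # (g h)(i) = g(h(i))
    return tuple(g[h[i]] for i in range(3))

def sign(g):                                          # parity of the permutation
    inv = sum(1 for i in range(3) for j in range(i + 1, 3) if g[i] > g[j])
    return 1 if inv % 2 == 0 else -1

# An arbitrary symmetric connection set: two transpositions and the two 3-cycles.
S = [(1, 0, 2), (0, 2, 1), (1, 2, 0), (2, 0, 1)]

A = np.zeros((6, 6))                                  # adjacency matrix of Cay(S_3, S)
for g in perms:
    for s in S:
        A[index[g], index[compose(g, s)]] += 1

# 2-dimensional irrep: permutation matrices restricted to the plane orthogonal to (1,1,1).
Q = np.linalg.qr(np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]]))[0]
def rho(g):
    P = np.zeros((3, 3))
    for i in range(3):
        P[g[i], i] = 1.0                              # permutation matrix of g
    return Q.T @ P @ Q

M = sum(rho(s) for s in S)                            # M_rho for the 2-dim irrep
bundle = [len(S), sum(sign(s) for s in S)] + list(np.linalg.eigvals(M).real) * 2
print(sorted(np.round(np.linalg.eigvalsh(A), 6)))     # spectrum of the Cayley graph
print(sorted(np.round(np.real(bundle), 6)))           # same multiset, assembled from the irreps
```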
One can additionally recover a system of eigenvectors of Cay(Γ, 𝑆). For each eigenvector
𝑣 with eigenvalue 𝜆 of 𝑀_𝜌, and every 𝑤 ∈ C^{dim 𝜌}, set 𝑥^{𝜌,𝑣,𝑤} ∈ C^Γ with coordinates
𝑥^{𝜌,𝑣,𝑤}_𝑔 = ⟨𝜌(𝑔)𝑣, 𝑤⟩
for all 𝑔 ∈ Γ. Then 𝑥^{𝜌,𝑣,𝑤} is an eigenvector of Cay(Γ, 𝑆) with eigenvalue 𝜆. Now let 𝜌 range over
all irreducible representations of Γ, let 𝑣 range over an orthonormal basis of eigenvectors
of 𝑀_𝜌 (with 𝜆 the corresponding eigenvalue), and let 𝑤 range over an orthonormal basis of
C^{dim 𝜌}; then 𝑥^{𝜌,𝑣,𝑤} ranges over an orthogonal system of eigenvectors of
Cay(Γ, 𝑆). The eigenvalue associated to 𝑥^{𝜌,𝑣,𝑤} is 𝜆.
A basic theorem in representation theory tells us that the regular representation decom-
poses into a direct sum of dim 𝜌 copies of 𝜌 ranging over every irreducible representation
𝜌 of Γ. This decomposition then corresponds to a block diagonalization (simultaneously for
all 𝑆) of the adjacency matrix of Cay(Γ, 𝑆) into blocks 𝑀𝜌 , repeated dim 𝜌 times, for each
𝜌. The above statement comes from interpreting this block diagonalization.
The matrix 𝑀𝜌 , appropriately normalized, is the non-abelian Fourier transform of the
indicator vector of 𝑆 at 𝜌. Many basic and important formulas for Fourier analysis over
abelian groups, e.g., inversion and Parseval (which we will see in Chapter 6), have non-abelian
analogs.
In Section 3.1, we saw that when 𝑑 grows linearly in 𝑛, these two conditions are equivalent.
Proof. In an (𝑛, 𝑑, 𝜆)-graph with 𝜆 ≤ 𝜀𝑑, the expander mixing lemma (Theorem 3.2.4) gives,
for all vertex subsets 𝑋 and 𝑌,
|𝑒(𝑋, 𝑌) − (𝑑/𝑛) |𝑋| |𝑌|| ≤ 𝜆 √(|𝑋| |𝑌|) ≤ 𝜀𝑑 √(|𝑋| |𝑌|) ≤ 𝜀𝑑𝑛.
So the graph satisfies SparseDISC(𝜀). □
The converse fails badly. Consider the disjoint union of a large random 𝑑-regular graph
and a 𝐾 𝑑+1 (here 𝑑 = 𝑜(𝑛)).
This graph satisfies SparseDISC(𝑜(1)) since it is satisfied by the large component, and the
small component 𝐾 𝑑+1 contributes negligibly to discrepancy due to its size. On the other
hand, each connected component contributes an eigenvalue of 𝑑 (by taking the all-1 vector
supported on each component), and so SparseEIG(𝜀) fails for any 𝜀 < 1.
The main result of this section is that despite the above example, if we restrict ourselves
to Cayley graphs (abelian or non-abelian), SparseDISC(𝜀) and SparseEIG(𝜀) are always
equivalent up to a linear change in 𝜀. This result is due to Conlon & Zhao (2017).
As in Section 3.4, we prove the above result more generally for vertex-transitive graphs
(see Definition 3.4.5).
Grothendieck’s inequality
The proof of the above theorem leads us to the following important inequality from functional
analysis due to Grothendieck (1953).
Given a matrix 𝐴 = (𝑎_{𝑖,𝑗}) ∈ R^{𝑚×𝑛}, we can consider its ℓ_∞ → ℓ_1 norm
sup_{∥𝑦∥_∞ ≤ 1} ∥𝐴𝑦∥_{ℓ_1},
which can also be written as (exercise: check! Also see Lemma 4.5.3 for a related fact about
the cut norm of graphons)
sup_{𝑥 ∈ {−1,1}^𝑚, 𝑦 ∈ {−1,1}^𝑛} ⟨𝑥, 𝐴𝑦⟩ = sup_{𝑥_1,...,𝑥_𝑚, 𝑦_1,...,𝑦_𝑛 ∈ {−1,1}} ∑_{𝑖=1}^{𝑚} ∑_{𝑗=1}^{𝑛} 𝑎_{𝑖,𝑗} 𝑥_𝑖 𝑦_𝑗 .    (3.3)
Now relax (3.3) by allowing the signs 𝑥_𝑖 and 𝑦_𝑗 to be replaced by vectors:
sup ∑_{𝑖=1}^{𝑚} ∑_{𝑗=1}^{𝑛} 𝑎_{𝑖,𝑗} ⟨𝑥_𝑖, 𝑦_𝑗⟩ ,    (3.4)
where the supremum is taken over vectors 𝑥_1, . . . , 𝑥_𝑚, 𝑦_1, . . . , 𝑦_𝑛 in the unit ball of some
real Hilbert space, whose norm is denoted by ∥ ∥. Without loss of generality, we can
assume that these vectors lie in R^{𝑚+𝑛} with the usual Euclidean norm (here 𝑚 + 𝑛 dimensions
are enough since 𝑥_1, . . . , 𝑥_𝑚, 𝑦_1, . . . , 𝑦_𝑛 span a real subspace of dimension at most 𝑚 + 𝑛).
We always have
(3.3) ≤ (3.4)
by restricting the vectors in (3.4) to R. There are efficient algorithms (both in theory and in
practice) using semidefinite programming to solve (3.4), whereas no efficient algorithm is
believed to exist for computing (3.3) (Alon & Naor 2006).
Grothendieck’s inequality says that this semidefinite relaxation never loses more than a
constant factor.
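The identification of the ℓ_∞ → ℓ_1 norm with (3.3) (the "exercise: check!" above) can be confirmed by brute force on small matrices; here is a minimal sketch (the matrix size and random seed are arbitrary choices).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 5
A = rng.standard_normal((m, n))

# sup over ||y||_inf <= 1 of ||A y||_1: the objective is convex in y,
# so the supremum over the cube is attained at a sign vector y.
norm_via_y = max(np.abs(A @ np.array(y)).sum()
                 for y in itertools.product([-1, 1], repeat=n))

# The bilinear form (3.3), brute-forced over all sign vectors x and y.
bilinear = max(np.array(x) @ A @ np.array(y)
               for x in itertools.product([-1, 1], repeat=m)
               for y in itertools.product([-1, 1], repeat=n))

print(norm_via_y, bilinear)    # the two quantities agree
```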
Remark 3.5.6. The optimal constant 𝐾 is known as the real Grothendieck’s constant. Its
exact value is unknown. It is known to lie within [1.676, 1.783]. There is also a complex ver-
sion of Grothendieck's inequality, where the left-hand side uses a complex Hilbert space (with
an absolute value placed around the final sum). The corresponding complex Grothendieck's
constant is known to lie within [1.338, 1.405].
We will not prove Grothendieck’s inequality here. See Alon & Naor (2006) for three proofs
of the inequality, along with algorithmic discussions.
The final step follows from Grothendieck’s inequality (applied with 𝐾 ≤ 2) along with (3.5).
This completes the proof of SparseEIG(8𝜀). □
We will see two different proofs. The first proof (Nilli 1991) constructs an eigenvector
explicitly. The second proof (only for Corollary 3.6.3) uses the trace method to bound
moments of the eigenvalues via counting closed walks.
[Figure: the test vector 𝑥 in the proof of Lemma 3.6.4. Vertices are grouped into 𝑉_0, 𝑉_1, 𝑉_2, 𝑉_3, . . . according to their distance from the edge 𝑠𝑡, and 𝑥_𝑣 takes the values 1, (𝑑 − 1)^{−1/2}, (𝑑 − 1)^{−1}, (𝑑 − 1)^{−3/2}, . . . on these sets.]
Proof. Let 𝐿 = 𝑑𝐼 − 𝐴 (this is called the Laplacian matrix of 𝐺). The claim can be rephrased
as an upper bound on ⟨𝑥, 𝐿𝑥⟩ /⟨𝑥, 𝑥⟩. Here is an important and convenient formula (it can be
easily proved by expanding):
⟨𝑥, 𝐿𝑥⟩ = ∑_{𝑢𝑣 ∈ 𝐸} (𝑥_𝑢 − 𝑥_𝑣)².
Since 𝑥 𝑣 is constant for all 𝑣 in the same 𝑉𝑖 , we only need to consider edges spanning
consecutive 𝑉𝑖 ’s. Using the formula for 𝑥, we obtain
⟨𝑥, 𝐿𝑥⟩ = ∑_{𝑖=0}^{𝑟−1} 𝑒(𝑉_𝑖, 𝑉_{𝑖+1}) ( 1/(𝑑 − 1)^{𝑖/2} − 1/(𝑑 − 1)^{(𝑖+1)/2} )² + 𝑒(𝑉_𝑟, 𝑉_{𝑟+1})/(𝑑 − 1)^𝑟 .
For each 𝑖 ≥ 0, each vertex in 𝑉𝑖 has at most 𝑑−1 neighbors in 𝑉𝑖+1 , so 𝑒(𝑉𝑖 , 𝑉𝑖+1 ) ≤ (𝑑−1) |𝑉𝑖 |.
Thus continuing from above,
≤ ∑_{𝑖=0}^{𝑟−1} |𝑉_𝑖| (𝑑 − 1) ( 1/(𝑑 − 1)^{𝑖/2} − 1/(𝑑 − 1)^{(𝑖+1)/2} )² + |𝑉_𝑟| (𝑑 − 1)/(𝑑 − 1)^𝑟
= (√(𝑑 − 1) − 1)² ∑_{𝑖=0}^{𝑟−1} |𝑉_𝑖|/(𝑑 − 1)^𝑖 + |𝑉_𝑟| (𝑑 − 1)/(𝑑 − 1)^𝑟
= (𝑑 − 2√(𝑑 − 1)) ∑_{𝑖=0}^{𝑟} |𝑉_𝑖|/(𝑑 − 1)^𝑖 + (2√(𝑑 − 1) − 1) |𝑉_𝑟|/(𝑑 − 1)^𝑟 .
We have |𝑉𝑖+1 | ≤ (𝑑 − 1) |𝑉𝑖 | for every 𝑖 ≥ 0, so that |𝑉𝑟 | (𝑑 − 1) −𝑟 ≤ |𝑉𝑖 | (𝑑 − 1) −𝑖 for each
𝑖 ≤ 𝑟. So continuing,
≤ ( 𝑑 − 2√(𝑑 − 1) + (2√(𝑑 − 1) − 1)/(𝑟 + 1) ) ∑_{𝑖=0}^{𝑟} |𝑉_𝑖|/(𝑑 − 1)^𝑖
= ( 𝑑 − 2√(𝑑 − 1) + (2√(𝑑 − 1) − 1)/(𝑟 + 1) ) ⟨𝑥, 𝑥⟩ ,
since ⟨𝑥, 𝑥⟩ = ∑_{𝑖=0}^{𝑟} |𝑉_𝑖|/(𝑑 − 1)^𝑖.
It follows that
⟨𝑥, 𝐴𝑥⟩/⟨𝑥, 𝑥⟩ = 𝑑 − ⟨𝑥, 𝐿𝑥⟩/⟨𝑥, 𝑥⟩ ≥ 2√(𝑑 − 1) − (2√(𝑑 − 1) − 1)/(𝑟 + 1)
≥ ( 1 − 1/(𝑟 + 1) ) 2√(𝑑 − 1). □
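The inequality just proved can be tested numerically. The sketch below (using the networkx library; the parameters 𝑑, 𝑛, 𝑟 are arbitrary choices) builds the test vector on a random 𝑑-regular graph and checks the bound on the Rayleigh quotient.

```python
import networkx as nx
import numpy as np

d, n, r = 4, 2000, 5                                   # arbitrary parameters
G = nx.random_regular_graph(d, n, seed=1)
A = nx.to_numpy_array(G, nodelist=range(n))

s, t = next(iter(G.edges()))                           # a fixed edge st
ds = nx.single_source_shortest_path_length(G, s)
dt = nx.single_source_shortest_path_length(G, t)

x = np.zeros(n)
for v in range(n):
    i = min(ds.get(v, n), dt.get(v, n))                # distance of v from the edge st
    if i <= r:
        x[v] = (d - 1) ** (-i / 2)                     # x_v = (d-1)^{-i/2} on V_i

rayleigh = x @ A @ x / (x @ x)
bound = (1 - 1 / (r + 1)) * 2 * np.sqrt(d - 1)
print(rayleigh, ">=", bound, rayleigh >= bound)
```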
Proof of the Alon–Boppana bound (Theorem 3.6.2). Let 𝑉 = 𝑉 (𝐺). Let 1 be the all-1’s
vector, which is an eigenvector with eigenvalue 𝑑. To prove the theorem, it suffices to exhibit
a nonzero vector 𝑧 ⊥ 1 such that
⟨𝑧, 𝐴𝑧⟩/⟨𝑧, 𝑧⟩ ≥ 2√(𝑑 − 1) − 𝑜(1).
Let 𝑟 be an arbitrary positive integer. When 𝑛 is sufficiently large, there exist two edges 𝑠𝑡
and 𝑠′𝑡′ in the graph at distance at least 2𝑟 + 2 apart (indeed, the number of vertices within
distance 𝑘 of an edge is at most 2(1 + (𝑑 − 1) + (𝑑 − 1)² + · · · + (𝑑 − 1)^𝑘), which does not grow with 𝑛). Let 𝑥 ∈ R^𝑉
be the vector constructed as in Lemma 3.6.4 for 𝑠𝑡, and let 𝑦 ∈ R𝑉 be the corresponding
vector constructed for 𝑠′ 𝑡 ′ . Recall that 𝑥 is supported on vertices within distance 𝑟 from 𝑠𝑡,
and likewise with 𝑦 and 𝑠′ 𝑡 ′ . Since 𝑠𝑡 and 𝑠′ 𝑡 ′ are at distance at least 2𝑟 + 2 apart, the support
of 𝑥 is at distance at least 2 from the support of 𝑦. Thus
⟨𝑥, 𝑦⟩ = 0 and ⟨𝑥, 𝐴𝑦⟩ = 0.
Choose a constant 𝑐 ∈ R such that 𝑧 = 𝑥 − 𝑐𝑦 has sum of its entries equal to zero (this is
possible since ⟨𝑦, 1⟩ > 0). Then
⟨𝑧, 𝑧⟩ = ⟨𝑥, 𝑥⟩ + 𝑐2 ⟨𝑦, 𝑦⟩
and so by Lemma 3.6.4
⟨𝑧, 𝐴𝑧⟩ = ⟨𝑥, 𝐴𝑥⟩ + 𝑐² ⟨𝑦, 𝐴𝑦⟩
≥ ( 1 − 1/(𝑟 + 1) ) 2√(𝑑 − 1) ( ⟨𝑥, 𝑥⟩ + 𝑐² ⟨𝑦, 𝑦⟩ )
= ( 1 − 1/(𝑟 + 1) ) 2√(𝑑 − 1) ⟨𝑧, 𝑧⟩ .
Taking 𝑟 → ∞ as 𝑛 → ∞ gives the theorem. □
Remark 3.6.5. The above proof cleverly considers distance from an edge rather than from a
single vertex. This is important for a rather subtle reason. Why does the proof fail if we had
instead considered distance from a vertex?
Now let us give another proof—actually we will only prove the slightly weaker statement
of Corollary 3.6.3, which is equivalent to
max {|𝜆_2| , |𝜆_𝑛|} ≥ 2√(𝑑 − 1) − 𝑜(1).    (3.6)
As a warmup, let us first prove (3.6) with √𝑑 − 𝑜(1) on the right-hand side. We have
𝑑𝑛 = 2𝑒(𝐺) = tr 𝐴² = ∑_{𝑖=1}^{𝑛} 𝜆_𝑖² ≤ 𝑑² + (𝑛 − 1) max {|𝜆_2| , |𝜆_𝑛|}².
So
max {|𝜆_2| , |𝜆_𝑛|} ≥ √( 𝑑(𝑛 − 𝑑)/(𝑛 − 1) ) = √𝑑 − 𝑜(1)
as 𝑛 → ∞ for fixed 𝑑.
To prove (3.6), we consider higher moments tr 𝐴 𝑘 . This is a useful technique, sometimes
called the trace method or the moment method.
The quantity tr 𝐴^{2𝑘} counts the number of closed walks of length 2𝑘 on 𝐺. Let T_𝑑 denote the infinite 𝑑-regular
tree. Observe that
# closed length-2𝑘 walks in 𝐺 starting from a fixed vertex
≥ # closed length-2𝑘 walks in T𝑑 starting from a fixed vertex.
Indeed, at each vertex, for both 𝐺 and T𝑑 , we can label its 𝑑 incident edges arbitrarily from
1 to 𝑑 (the labels assigned from the two endpoints of the same edge do not have to match).
Then every closed length-2𝑘 walk in T𝑑 corresponds to a distinct closed length-2𝑘 walk in
𝐺 by tracing the same outgoing edges at each step (why?). Note that not all closed walks in
𝐺 arise this way (e.g., walks that go around cycles in 𝐺).
The number of closed walks of length 2𝑘 in the infinite 𝑑-regular tree T_𝑑 starting at a fixed
root is at least (𝑑 − 1)^𝑘 𝐶_𝑘, where 𝐶_𝑘 = \binom{2𝑘}{𝑘}/(𝑘 + 1) is the 𝑘-th Catalan number. To see this, note
that each step in the walk is either “away from the root” or “towards the root.” We record a
sequence by denoting steps of the former type by + and of the latter type by −.
[Figure: a closed walk in T_𝑑 encoded by the sequence + + + − + − − + + + − − − −, where + marks a step away from the root and − a step towards the root.]
Then the number of valid arrangements of 𝑘 +'s and 𝑘 −'s is exactly the Catalan number 𝐶_𝑘,
as the only constraint is that there can never be more −'s than +'s up to
any point in the sequence. Finally, there are at least 𝑑 − 1 choices for where to step in the
walk at any + (there are 𝑑 choices at the root), and exactly one choice for each −.
Thus, the number of closed walks of length 2𝑘 in 𝐺 is at least
tr 𝐴^{2𝑘} ≥ 𝑛 (𝑑 − 1)^𝑘 𝐶_𝑘 ≥ \frac{𝑛}{𝑘 + 1} \binom{2𝑘}{𝑘} (𝑑 − 1)^𝑘 .
On the other hand, we have
tr 𝐴^{2𝑘} = ∑_{𝑖=1}^{𝑛} 𝜆_𝑖^{2𝑘} ≤ 𝑑^{2𝑘} + (𝑛 − 1) max {|𝜆_2| , |𝜆_𝑛|}^{2𝑘} .
Thus,
max {|𝜆_2| , |𝜆_𝑛|}^{2𝑘} ≥ \frac{1}{𝑘 + 1} \binom{2𝑘}{𝑘} (𝑑 − 1)^𝑘 − \frac{𝑑^{2𝑘}}{𝑛 − 1} .
The term \frac{1}{𝑘 + 1} \binom{2𝑘}{𝑘} is (2 − 𝑜(1))^{2𝑘} as 𝑘 → ∞. Letting 𝑘 → ∞ slowly (e.g., 𝑘 = 𝑜(log 𝑛)) as
𝑛 → ∞ gives us max {|𝜆_2| , |𝜆_𝑛|} ≥ 2√(𝑑 − 1) − 𝑜(1). □
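The tree-walk count used in the proof can be computed exactly by a small dynamic program over the distance from the root; the sketch below compares it with the lower bound (𝑑 − 1)^𝑘 𝐶_𝑘.

```python
from math import comb

def closed_walks_in_tree(d, k):
    # walks[j] = number of walks of the current length from the root of T_d
    # that end at distance j from the root.
    walks = [0] * (2 * k + 1)
    walks[0] = 1
    for _ in range(2 * k):
        new = [0] * (2 * k + 1)
        for j, w in enumerate(walks):
            if w == 0:
                continue
            if j == 0:
                new[1] += d * w                 # d ways to step away from the root
            else:
                new[j + 1] += (d - 1) * w       # d - 1 ways to step further away
                new[j - 1] += w                 # one way to step back towards the root
        walks = new
    return walks[0]

d = 3                                           # arbitrary degree
for k in range(1, 8):
    catalan = comb(2 * k, k) // (k + 1)
    print(k, closed_walks_in_tree(d, k), ">=", (d - 1) ** k * catalan)
```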
Remark 3.6.6. The infinite 𝑑-regular tree T_𝑑 is the universal cover of all 𝑑-regular graphs
(this fact is used in the first step of the argument). The spectral radius of T_𝑑 is 2√(𝑑 − 1),
which is the fundamental reason why this number arises in the Alon–Boppana bound.
Graphs with 𝜆_2 ≈ 2√(𝑑 − 1)
Let us return to Question 3.6.1: what is the smallest possible 𝜆2 for 𝑛-vertex 𝑑-regular graphs,
with 𝑑 fixed and 𝑛 large? Is the Alon–Boppana bound tight? (The answer is yes.)
Alon’s second eigenvalue conjecture says that random 𝑑-regular graphs match the Alon–
Boppana bound. This was proved by Friedman (2008). We will not present the proof, as it is
quite a difficult result.
In other words, the above theorem says that random 𝑑-regular graphs on 𝑛 vertices satisfy,
with probability 1 − 𝑜(1) (for fixed 𝑑 ≥ 3 and 𝑛 → ∞),
max {|𝜆_2| , |𝜆_𝑛|} ≤ 2√(𝑑 − 1) + 𝑜(1).
Can we get ≤ 2√(𝑑 − 1) exactly, without an error term? This leads us to one of the biggest
open problems of the field.
While it is not too hard to construct small Ramanujan graphs (e.g., 𝐾 𝑑+1 has eigenvalues
𝜆1 = 𝑑 and 𝜆2 = · · · = 𝜆 𝑛 = −1), it is a major open problem to construct infinitely many
𝑑-regular Ramanujan graphs for each 𝑑.
The term “Ramanujan graph” was coined by Lubotzky, Phillips, & Sarnak (1988), who
constructed infinite families of 𝑑-regular Ramanujan graphs when 𝑑 − 1 is an odd prime.
The same result was independently proved by Margulis (1988). The proof of the eigenvalue
bounds uses deep results from number theory, namely solutions to the Ramanujan conjecture
(hence the name). These constructions were later extended by Morgenstern (1994) whenever
𝑑 − 1 is a prime power. The current state of Conjecture 3.6.9 is given below, and it remains
open for all other 𝑑, with the smallest open case being 𝑑 = 7.
All known results are based on explicit constructions using Cayley graphs on PSL(2, 𝑞)
or related groups. We refer the reader to the book Davidoff, Sarnak, & Valette (2003) for a
gentle exposition of the construction.
Theorem 3.6.7 says that random 𝑑-regular graphs are “nearly-Ramanujan.” Empirical
evidence suggests that for each fixed 𝑑, a uniform random 𝑛-vertex 𝑑-regular graph is
Ramanujan with probability bounded away from 0 and 1, for large 𝑛.
If this were true, it would prove Conjecture 3.6.9 on the existence of Ramanujan graphs.
However, no rigorous results are known in this vein.
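A small-scale version of this experiment (not the empirical studies alluded to above; the parameters are arbitrary choices) takes only a few lines.

```python
import networkx as nx
import numpy as np

# Fraction of random d-regular graphs on n vertices that are Ramanujan,
# i.e., max{|lambda_2|, |lambda_n|} <= 2 sqrt(d - 1).
d, n, trials = 3, 500, 50
threshold = 2 * np.sqrt(d - 1)

count = 0
for seed in range(trials):
    G = nx.random_regular_graph(d, n, seed=seed)
    eigs = np.linalg.eigvalsh(nx.to_numpy_array(G))    # eigenvalues in ascending order
    if max(abs(eigs[0]), abs(eigs[-2])) <= threshold:
        count += 1
print(count / trials)
```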
One can formulate a bipartite analog.
Exercise 3.6.14 (Alon–Boppana bound with multiplicity). Prove that for every positive
integer 𝑑 and real 𝜀 > 0, there is some constant 𝑐 > 0 so that every 𝑛-vertex 𝑑-regular
graph has at least 𝑐𝑛 eigenvalues greater than 2√(𝑑 − 1) − 𝜀.
Exercise 3.6.15∗ (Net removal decreases top eigenvalue). Show that for every 𝑑 and 𝑟,
there is some 𝜀 > 0 such that if 𝐺 is a 𝑑-regular graph, and 𝑆 ⊆ 𝑉 (𝐺) is such that every
vertex of 𝐺 is within distance 𝑟 of 𝑆, then the top eigenvalue of the adjacency matrix of
𝐺 − 𝑆 (i.e., remove 𝑆 and its incident edges from 𝐺) is at most 𝑑 − 𝜀.
Further Reading
The survey Pseudo-random Graphs by Krivelevich & Sudakov (2006) discusses many com-
binatorial aspects of this topic.
Expander graphs are a large and intensely studied topic, partly due to many important
applications in computer science. Here are two important survey articles:
• Expander Graphs and Their Applications by Hoory, Linial, & Wigderson (2006);
• Expander Graphs in Pure and Applied Mathematics by Lubotzky (2012).
For spectral graph theory, see the book Spectral Graph Theory by Chung (1997), or the
book draft Spectral and Algebraic Graph Theory by Spielman.
The book Elementary Number Theory, Group Theory and Ramanujan Graphs by Davidoff,
Sarnak, & Valette (2003) gives a gentle introduction to the construction of Ramanujan graphs.
The breakthrough by Marcus, Spielman, & Srivastava (2015) constructing bipartite Ra-
manujan graphs via interlacing polynomials is an instant classic.
Chapter Summary
• We are interested in quantifying how a given graph can be similar to a random graph.
• The Chung–Graham–Wilson quasirandom graphs theorem says that several notions
are equivalent, notably:
– DISC: edge discrepancy (cf. the 𝜀-regular pair from Chapter 2),
– C4 : 4-cycle count close to random, and
– EIG: all eigenvalues (except the largest) small.
These equivalences only apply to graphs at constant order edge density. Some of the
implications break down for sparser graphs.
• An (𝒏, 𝒅, 𝝀)-graph is an 𝑛-vertex 𝑑-regular graph all of whose adjacency matrix eigenval-
ues are ≤ 𝜆 in absolute value except the top one (which must be 𝑑). The second eigenvalue
plays an important role in pseudorandomness.
• Expander mixing lemma. An (𝑛, 𝑑, 𝜆)-graph satisfies
|𝑒(𝑋, 𝑌) − (𝑑/𝑛) |𝑋| |𝑌|| ≤ 𝜆 √(|𝑋| |𝑌|) for all 𝑋, 𝑌 ⊆ 𝑉 (𝐺).
• The eigenvalues of an abelian Cayley graph Cay(Γ, 𝑆) can be computed via the Fourier
transform of the indicator function 1_𝑆. For example, using a Gauss sum, one can deduce that the Paley graph
(generated by quadratic residues in Z/𝑝Z) is quasirandom.
• A non-abelian group with no small non-trivial representations is called a quasirandom
group.
– Every Cayley graph on a quasirandom group is a quasirandom graph.
– There are no large product-free sets in a quasirandom group.
– Example of quasirandom group: PSL(2, 𝑝), which has order ( 𝑝 3 − 𝑝)/2, and all non-
trivial representations have dimension ≥ ( 𝑝 − 1)/2.
• Among vertex-transitive graphs (which includes all Cayley graphs), the sparse ana-
logues of the discrepancy property (SparseDISC) and small second eigenvalue property
(SparseEIG) are equivalent up to a linear change of the error tolerance parameter. This
equivalence is false for general graphs.
– The proof applies Grothendieck's inequality, which says that the semidefinite relaxation
of the ℓ_∞ → ℓ_1 norm (equivalent to the cut norm) gives a constant factor approximation.
• Alon–Boppana second eigenvalue bound. Every 𝑑-regular graph has second largest
adjacency matrix eigenvalue ≥ 2√(𝑑 − 1) − 𝑜(1), with 𝑑 fixed as the number of
vertices goes to infinity.
– Two spectral proof methods: (1) constructing a test vector and (2) the trace/moment method.
– The constant 2√(𝑑 − 1) is best possible, as a random 𝑑-regular graph is typically an
(𝑛, 𝑑, 𝜆)-graph with 𝜆 = 2√(𝑑 − 1) + 𝑜(1) (Friedman's theorem).
– A Ramanujan graph is an (𝑛, 𝑑, 𝜆)-graph with 𝜆 = 2√(𝑑 − 1). It is conjectured that for
every 𝑑 ≥ 3, there exist infinitely many 𝑑-regular Ramanujan graphs (this is known to
hold when 𝑑 − 1 is a prime power). A bipartite version of this conjecture is true.
4
Graph limits
Chapter Highlights
• An analytic language for studying dense graphs
• Convergence and limit for a sequence of graphs
• Compactness of the graphon space with respect to the cut metric
• Applications of compactness
• Equivalence of cut metric convergence and left-convergence
The theory of graph limits was developed by Lovász and his collaborators in a series
of works starting around 2003. The researchers were motivated by questions about very
large graphs from several different angles, including from combinatorics, statistical physics,
computer science, and applied math. Graph limits give an analytic framework for analyzing
large graphs. The theory offers both a convenient mathematical language as well as powerful
theorems.
Motivation
Suppose we live in a hypothetical world where we only had access to rational numbers and
had no language for irrational numbers. We are given the following optimization problem:
minimize 𝑥 3 − 𝑥 subject to 0 ≤ 𝑥 ≤ 1.
The minimum occurs at 𝑥 = 1/√3, but this answer does not make sense over the rationals.
With only access to rationals, we can state a progressively improving sequence of answers
that converge to the optimum. This is rather cumbersome. It is much easier to write down a
single real number expressing the answer.
Now consider an analogous question for graphs. Fix some real 𝑝 ∈ [0, 1]. We want to
minimize (# closed walks of length 4)/𝑛4
among 𝑛-vertex graphs with ≥ 𝑝𝑛2 /2 edges.
We know from Proposition 3.1.14 that every 𝑛-vertex graph with edge density ≥ 𝑝 has at least
𝑛4 𝑝 4 closed walks of length 4. On the other hand, every sequence of quasirandom graphs
with edge density 𝑝 + 𝑜(1) has 𝑝 4 𝑛4 + 𝑜(𝑛4 ) closed walks of length 4. It follows that the
minimum (or rather, infimum) is 𝑝 4 , and is attained not by any single graph, but rather by a
sequence of quasirandom graphs.
One of the purposes of graph limits is to provide an easy-to-use mathematical object that
captures the limit of such graph sequences. The central object in the theory of graph limits
is called a graphon (the word comes from combining graph and function), to be defined
shortly. Graphons can be viewed as an analytic generalization of graphs.
Here are some questions that we will consider:
(1) What does it mean for a sequence of graphs (or graphons) to converge?
(2) Are different notions of convergence equivalent?
(3) Does every convergent sequence of graphs (or graphons) have a limit?
Note that it is possible to talk about convergence without a limit. In a first real analysis
course, one learns about a Cauchy sequence in a metric space (X, 𝑑), which is some
sequence 𝑥1 , 𝑥2 , · · · ∈ X such that for every 𝜀 > 0, there is some 𝑁 so that 𝑑 (𝑥 𝑚 , 𝑥 𝑛 ) < 𝜀
for all 𝑚, 𝑛 ≥ 𝑁. For instance, one can have a Cauchy sequence without a limit in Q. A
metric space is complete if every Cauchy sequence has a limit. The completion of X is some
complete metric space X̃ such that X is isometrically embedded in X̃ as a dense subset. The
completion of X is in some sense the smallest complete space containing X. For example, R
is the completion of Q. Intuitively, the completion of a space fills in all of its gaps. A basic
result in analysis says that every space has a unique completion.
Here is a key result about graph limits that we will prove:
The space of graphons is compact, and is the completion of the set of graphs.
To make this statement precise, we also need to define a notion of similarity (i.e., distance)
between graphs, and also between graphons. We will see two different notions, one based
on the cut metric, and another based on subgraph densities. Another important result in the
theory of graph limits is that these two notions are equivalent. We will prove it at the end of
the chapter once we have developed some tools.
4.1 Graphons
Here is the central object in the theory of dense graph limits.
Remark 4.1.2. More generally, we can consider an arbitrary probability space Ω and study
symmetric measurable functions Ω × Ω → [0, 1]. In practice, we do not lose much by
restricting to [0, 1].
We will also sometimes consider symmetric measurable functions [0, 1] 2 → R (e.g.,
arising as the difference between two graphons). Such an object is sometimes called a kernel
in the literature.
Remark 4.1.3 (Measure theoretic technicalities). We try to sweep measure theoretic tech-
nicalities under the rug in order to focus on key ideas. If you have not seen measure theory
before, do not worry. Just view “measure” as lengths of intervals or areas of boxes (or count-
able unions thereof) in the most natural sense. We always ignore measure zero differences.
For example, we shall treat two graphons as the same if they only differ on a measure zero
subset of the domain.
More generally, we can encode nonnegative vertex and edge weights in a graphon.
Example 4.1.6 (Half-graph). Consider the bipartite graph on 2𝑛 vertices, with one vertex
part {𝑣 1 , . . . , 𝑣 𝑛 } and the other vertex part {𝑤 1 , . . . , 𝑤 𝑛 }, and edges 𝑣 𝑖 𝑤 𝑗 whenever 𝑖 ≤ 𝑗. Its
adjacency matrix and associated graphon are illustrated below.
[Figure: the half-graph on 12 vertices (𝑛 = 6) and its 12 × 12 adjacency matrix.]
The associated graphon is
𝑊 (𝑥, 𝑦) = 1 if 𝑥 + 𝑦 ≤ 1/2 or 𝑥 + 𝑦 ≥ 3/2, and 𝑊 (𝑥, 𝑦) = 0 otherwise.
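For concreteness, here is a minimal sketch of the associated step graphon of the half-graph, assuming the usual construction in which vertex 𝑖 of an 𝑁-vertex graph corresponds to the interval [(𝑖 − 1)/𝑁, 𝑖/𝑁).

```python
import numpy as np

def half_graph_adjacency(n):
    # Vertices 0..n-1 are v_1..v_n and n..2n-1 are w_1..w_n; v_i ~ w_j iff i <= j.
    A = np.zeros((2 * n, 2 * n), dtype=int)
    for i in range(n):
        for j in range(i, n):
            A[i, n + j] = A[n + j, i] = 1
    return A

def associated_graphon(A):
    # Step function equal to A[i, j] on the box [i/N, (i+1)/N) x [j/N, (j+1)/N).
    N = len(A)
    def W(x, y):
        return A[min(int(x * N), N - 1), min(int(y * N), N - 1)]
    return W

W = associated_graphon(half_graph_adjacency(6))
grid = (np.arange(24) + 0.5) / 24                 # sample points avoiding box boundaries
print(np.array([[W(x, y) for y in grid] for x in grid]))   # a pixelated half-graph picture
```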
In general, pointwise convergence turns out to be too restrictive. We will need a more
flexible notion of convergence, which we will discuss more in depth in the next section. Let
us first give some more examples to motivate subsequent definitions.
Example 4.1.7 (Quasirandom graphs). Let 𝐺 𝑛 be a sequence of quasirandom graphs with
edge density approaching 1/2, and 𝑣(𝐺 𝑛 ) → ∞. The constant graphon 𝑊 ≡ 1/2 seems like
a reasonable candidate for its limit, and later we will see that this is indeed the case.
[Figure: a sequence of quasirandom graphs with edge density approaching 1/2 converging to the constant 1/2 graphon.]
Example 4.1.8 (Stochastic block model). Consider an 𝑛-vertex graph with two types of
vertices: red and blue. Half of the vertices are red and half of the vertices are blue. Two red
vertices are adjacent with probability 𝑝 𝑟 , two blue vertices are adjacent with probability 𝑝 𝑏 ,
and finally, a red vertex and a blue vertex are adjacent with probability 𝑝 𝑟 𝑏 , all independently.
Then as 𝑛 → ∞, the graphs converge to the step graphon shown below.
[Figure: graphs drawn from the two-block stochastic block model converge to the 2 × 2 step graphon with values 𝑝_𝑟, 𝑝_{𝑟𝑏}, 𝑝_{𝑟𝑏}, 𝑝_𝑏 on its blocks.]
The above examples suggest that the limiting graphon looks like a blurry image of the
adjacency matrix. However, there is an important caveat as illustrated in the next example.
Example 4.1.9 (Checkerboard). Consider the 2𝑛×2𝑛 “checkerboard” graphon shown below
(for 𝑛 = 4).
[Figure: the checkerboard graphon for 𝑛 = 4, together with the corresponding bipartite graph on vertices 1, . . . , 8 with the two parts interleaved.]
Since the 0’s and 1’s in the adjacency matrix are evenly spaced, one might suspect that
this sequence converges to the constant 1/2 graphon. However, this is not so. The checker-
board graphon is associated to the complete bipartite graph 𝐾𝑛,𝑛 , with the two vertex parts
interleaved. By relabeling the vertices, we see that below is another representation of the
associated graphon of the same graph.
[Figure: the same graph with its vertices relabeled so that each part is contiguous, and the resulting associated graphon.]
After this relabeling, the associated graphon is the same for all 𝑛. So the graphon shown on the right, which is also 𝑊_{𝐾_2},
must be the limit of the sequence, and not the constant 1/2 graphon.
This example tells us that we must be careful about the possibility of rearranging vertices
when studying graph limits.
A graphon is an infinite dimensional object. We would like some ways to measure the
similarity between two graphons. We will explain two different approaches:
• cut distance, and
• homomorphism densities.
One of the main results in the theory of graph limits is that these two approaches are
equivalent—we will show this later in the chapter.
∥𝑾∥_∞ := sup{ 𝑡 : 𝑊^{−1}([𝑡, ∞)) has positive measure }.
(This is not simply the supremum of 𝑊; the definition should be invariant under measure
zero changes of 𝑊.)
Let 𝐺 and 𝐺 ′ be two graphs sharing a common vertex set. Let 𝑊𝐺 and 𝑊𝐺 ′ be their
associated graphons (using the same ordering of vertices when constructing the graphons).
Then 𝐺 and 𝐺 ′ are 𝜀-close in cut norm (see (4.1)) if and only if
∥𝑊𝐺 − 𝑊𝐺 ′ ∥ □ ≤ 𝜀.
(There is a subtlety in this claim that is worth thinking about: should we be worried about
sets 𝑆, 𝑇 ⊆ [0, 1] in Definition 4.2.1 of cut norm that contain fractions of some intervals that
represent vertices? See Lemma 4.5.3 for a reformulation of the cut norm that may shed some
light.)
We need a concept for an analog of a vertex set permutation for graphons. We write
𝝀(𝑨) := the Lebesgue measure of 𝐴.
Intuitively, this is the “length” or “area” of 𝐴. We will always be referring to Lebesgue
measurable sets (measure theoretic technicalities are not central to the discussions here, so
feel free to ignore them).
Example 4.2.3. For any constant 𝛼 ∈ R, the function 𝜙(𝑥) = 𝑥 + 𝛼 mod 1 is measure
preserving (this map rotates the circle R/Z by 𝛼).
A more interesting example is 𝜙(𝑥) = 2𝑥 mod 1, illustrated below.
[Figure: the graph of 𝜙(𝑥) = 2𝑥 mod 1 on [0, 1], together with a set 𝐴 and its preimage 𝜙^{−1}(𝐴).]
This map is also measure preserving. This might not seem to be the case at first, since 𝜙 seems
to shrink some intervals by half. However, the definition of measure preserving actually says
𝜆(𝜙^{−1}(𝐴)) = 𝜆(𝐴) and not 𝜆(𝜙(𝐴)) = 𝜆(𝐴). For any interval [𝑎, 𝑏] ⊆ [0, 1], we have
𝜙^{−1}([𝑎, 𝑏]) = [𝑎/2, 𝑏/2] ∪ [1/2 + 𝑎/2, 1/2 + 𝑏/2], which does have the same measure as
[𝑎, 𝑏]. This map is 2-to-1, and it is not invertible.
Given 𝑊 : [0, 1] 2 → R and an invertible measure preserving map 𝜙 : [0, 1] → [0, 1], we
write
𝑾 𝝓 (𝒙, 𝒚) := 𝑊 (𝜙(𝑥), 𝜙(𝑦)).
Intuitively, this operation relabels the vertex set.
The cut distance between two graphons 𝑊 and 𝑈 is defined by
𝜹_□(𝑾, 𝑼) := inf_𝜙 ∥𝑊 − 𝑈^𝜙∥_□ ,
where the infimum is taken over all invertible measure preserving maps 𝜙 : [0, 1] →
[0, 1]. Define the cut distance between two graphs 𝐺 and 𝐺′ by the cut distance of their
associated graphons:
𝜹□ (𝑮, 𝑮 ′ ) := 𝛿□ (𝑊𝐺 , 𝑊𝐺 ′ ).
Likewise, we can also define the cut distance between a graph and a graphon 𝑈:
𝜹□ (𝑮, 𝑼) := 𝛿□ (𝑊𝐺 , 𝑈).
Space of graphons
We can form a metric space by identifying graphons at cut distance zero (i.e., treating
two such graphons as the same point).
One of the main goals of this chapter is to prove this theorem and show its applications.
The compactness of graphon space is related to the graph regularity lemma. In fact,
we will use the regularity method to prove compactness. Both compactness and the graph
regularity lemma tell us that despite the infinite variability of graphs, every graph can be
𝜀-approximated by a graph from a finite set of templates.
We close this section with the following observation.
Proof. Let 𝜀 > 0. It suffices to show that for every graphon 𝑊 there exists a graph 𝐺 such
that 𝛿□ (𝐺, 𝑊) < 𝜀.
We approximate 𝑊 in several steps, illustrated below.
First, by rounding down the values of 𝑊 (𝑥, 𝑦), we construct a graphon 𝑊1 whose values
are all integer multiples of 𝜀/3, such that
∥𝑊 − 𝑊1 ∥ ∞ ≤ 𝜀/3.
Next, since every Lebesgue measurable subset of [0, 1] 2 can be arbitrarily well approx-
imated using a union of boxes, we can find a step graphon 𝑊2 approximating 𝑊1 in 𝐿 1
norm:
∥𝑊1 − 𝑊2 ∥ 1 ≤ 𝜀/3.
Finally, by replacing each block of 𝑊2 by a sufficiently large quasirandom (bipartite) graph
of edge density equal to the value of 𝑊2 , we find a graph 𝐺 so that
∥𝑊2 − 𝑊𝐺 ∥ □ ≤ 𝜀/3.
Then 𝛿□ (𝑊, 𝐺) < 𝜀. □
Remark 4.2.9. In the above proof, to obtain ∥𝑊1 − 𝑊2 ∥ 1 ≤ 𝜀/3, the number of steps of 𝑊2
cannot be uniformly bounded as a function of 𝜀 (i.e., it must depend on 𝑊 as well—think
about what happens for a random graph). Consequently the number of vertices of the final
graph 𝐺 produced by this proof is not bounded by a function of 𝜀.
Later on, we will see a different proof showing that for every 𝜀 > 0, there is some
𝑁 (𝜀) so that every graphon lies within cut distance 𝜀 of some graph with ≤ 𝑁 (𝜀) vertices
(Proposition 4.8.1).
Since every compact metric space is complete, we have the following corollary.
Exercise 4.2.11 (Zero-one valued graphons). Let 𝑊 be a {0, 1}-valued graphon. Sup-
pose graphons 𝑊𝑛 satisfy ∥𝑊𝑛 − 𝑊 ∥ □ → 0 as 𝑛 → ∞. Show that ∥𝑊𝑛 − 𝑊 ∥ 1 → 0 as
𝑛 → ∞.
This definition agrees with Definition 4.3.1 for the triangle density in graphs. Indeed, for
every graph 𝐺, the triangle density in 𝐺 equals the triangle density in the associated graphon
𝑊𝐺 ; that is, 𝑡 (𝐾3 , 𝑊𝐺 ) = 𝑡 (𝐾3 , 𝐺).
Note that for all graphs 𝐹 and 𝐺, letting 𝑊𝐺 be the graphon associated to 𝐺,
𝑡 (𝐹, 𝐺) = 𝑡 (𝐹, 𝑊𝐺 ). (4.2)
So the two definitions of 𝐹-density agree.
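For small graphs, the homomorphism density 𝑡(𝐹, 𝐺) = hom(𝐹, 𝐺)/𝑣(𝐺)^{𝑣(𝐹)} can be computed by brute force; a short sketch (the choice 𝐺 = 𝐶_5 is arbitrary):

```python
import itertools
import numpy as np

def hom_density(F_edges, F_vertices, A):
    # t(F, G): fraction of maps V(F) -> V(G) sending every edge of F to an edge of G.
    n = len(A)
    count = sum(all(A[phi[u], phi[v]] for u, v in F_edges)
                for phi in itertools.product(range(n), repeat=F_vertices))
    return count / n ** F_vertices

n = 5
A = np.zeros((n, n), dtype=int)                  # adjacency matrix of the 5-cycle
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

print(hom_density([(0, 1)], 2, A))                    # edge density of C_5: 2*5/25 = 0.4
print(hom_density([(0, 1), (1, 2), (0, 2)], 3, A))    # C_5 is triangle-free, so 0.0
```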
One usually has 𝑣(𝐺 𝑛 ) → ∞, but it is not strictly necessary for this definition. Note
that when 𝑣(𝐺 𝑛 ) → ∞, homomorphism densities and subgraph densities coincide (see
Remark 4.3.3).
It turns out that left-convergence is equivalent to convergence in cut metric. This founda-
tional result in graph limits is due to Borgs, Chayes, Lovász, Sós, & Vesztergombi (2008).
The implication that convergence in cut metric implies left-convergence is easier; it follows
from the counting lemma (Section 4.5). The converse is more difficult, and we will establish
it at the end of the chapter.
This allows us to talk about convergent sequences of graphs or graphons without spec-
ifying whether we are referring to left-convergence or convergence in cut metric. However,
since a major goal of this chapter is to prove the equivalence between these two notions, we
will be more specific about the notion of convergence.
From the compactness of the space of graphons and the equivalence of convergence
(actually only needing the easier implication), we will be able to quickly deduce the existence
of limit for a left-convergent sequence, which was first proved by Lovász & Szegedy (2006).
Note that the following statement does not require knowledge of the cut metric.
Remark 4.3.9. One can artificially define a metric that coincides with left-convergence. Let
(𝐹𝑛 ) 𝑛≥1 enumerate over all graphs. One can define a distance between graphons 𝑈 and 𝑊 by
∑_{𝑘 ≥ 1} 2^{−𝑘} |𝑡(𝐹_𝑘, 𝑊) − 𝑡(𝐹_𝑘, 𝑈)| .
We see that a sequence of graphons converges under this notion of distance if and only if
it is left-convergent. This shows that left-convergence defines a metric topology on the space
of graphons, but in practice the above distance is pretty useless.
Exercise 4.3.10 (Counting Eulerian orientations). Define 𝑊 : [0, 1] 2 → R by 𝑊 (𝑥, 𝑦) =
2 cos(2𝜋(𝑥 − 𝑦)). Let 𝐹 be a graph. Show that 𝑡 (𝐹, 𝑊) is the number of ways to orient all
edges of 𝐹 so that every vertex has the same number of incoming edges as outgoing edges.
with all the numbers lying in [0, 1], and subject to 𝑞_𝑟 + 𝑞_𝑏 = 1. We form an 𝑛-vertex random
graph as follows:
(1) Color each vertex red with probability 𝑞 𝑟 and blue with probability 𝑞 𝑏 , independently
at random. These vertex colors are “hidden states” and are not part of the data of
the output random graph (this step is slightly different from Example 4.1.8 in an
unimportant way);
(2) For every pair of vertices, independently place an edge between them with probability
• 𝑝 𝑟𝑟 if both vertices are red,
• 𝑝 𝑏𝑏 if both vertices are blue, and
• 𝑝 𝑟 𝑏 if one vertex is red and the other is blue.
One can easily generalize the above to a 𝒌-block model, where vertices have 𝑘 hidden
states, with 𝑞 1 , . . . , 𝑞 𝑘 (adding up to 1) being the vertex state probabilities, and a symmetric
𝑘 × 𝑘 matrix ( 𝑝 𝑖 𝑗 )1≤𝑖, 𝑗 ≤ 𝑘 of edge probabilities for pairs of vertices between various states.
𝑊-random graph
The 𝑊-random graph is a further generalization. The stochastic block model corresponds to
step graphons 𝑊.
[Figure: a graphon 𝑊 with sampled points 𝑥_1, . . . , 𝑥_5; the values 𝑊(𝑥_𝑖, 𝑥_𝑗) give the edge probabilities of the 𝑊-random graph.]
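A minimal sampler for G(𝑛, 𝑊), following the two-step description in the proof below (uniform points 𝑥_𝑖, then independent edges with probability 𝑊(𝑥_𝑖, 𝑥_𝑗)); the particular step graphon and its block values are arbitrary choices.

```python
import numpy as np

def sample_W_random_graph(n, W, rng):
    # x_1, ..., x_n uniform in [0,1]; edge ij present with probability W(x_i, x_j).
    x = rng.uniform(size=n)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.uniform() < W(x[i], x[j]):
                A[i, j] = A[j, i] = 1
    return A

def W_block(x, y, p_r=0.8, p_b=0.3, p_rb=0.1):   # two-block step graphon (arbitrary values)
    if x < 0.5 and y < 0.5:
        return p_r
    if x >= 0.5 and y >= 0.5:
        return p_b
    return p_rb

rng = np.random.default_rng(0)
A = sample_W_random_graph(1000, W_block, rng)
print(A.sum() / (1000 * 999))    # edge density, close to (p_r + p_b + 2 p_rb)/4 = 0.325
```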
Remark 4.4.3. The theorem does not require each 𝐺 𝑛 to be sampled independently. For
example, we can construct the sequence of random graphs, with 𝐺 𝑛 distributed as G(𝑛, 𝑊),
by revealing one vertex at a time without resampling the previous vertices and edges. In this
case, each 𝐺 𝑛 is a subgraph of the next graph 𝐺 𝑛+1 .
We will need the following standard result about concentration of Lipschitz functions. This
can be proved using Azuma’s inequality (e.g., see Chapter 7 of The Probabilistic Method by
Alon & Spencer).
Let us show that the 𝐹-density in a 𝑊-random graph rarely differs significantly from
𝑡 (𝐹, 𝑊).
Proof. Recall from Remark 4.3.3 that the injective homomorphism density 𝑡 inj (𝐹, 𝐺) is
defined to be the fraction of injective maps 𝑉 (𝐹) → 𝑉 (𝐺) that carry every edge of 𝐹 to an
edge of 𝐺. We will first prove that
P( |𝑡_inj(𝐹, G(𝑛, 𝑊)) − 𝑡(𝐹, 𝑊)| > 𝜀 ) ≤ 2 exp( −𝜀²𝑛 / (2𝑣(𝐹)²) ).    (4.5)
Let 𝑦 1 , . . . , 𝑦 𝑛 , and 𝑧𝑖 𝑗 for each 1 ≤ 𝑖 < 𝑗 ≤ 𝑛, be independent uniform random variables in
[0, 1]. Let 𝐺 be the graph on vertices {1, . . . , 𝑛} with an edge between 𝑖 and 𝑗 if and only if
𝑧𝑖 𝑗 ≤ 𝑊 (𝑦 𝑖 , 𝑦 𝑗 ), for every 𝑖 < 𝑗. Then 𝐺 has the same distribution as G(𝑛, 𝑊). Let us group
variables 𝑦 𝑖 , 𝑧𝑖 𝑗 into 𝑥1 , 𝑥2 , . . . , 𝑥 𝑛 where
𝑥1 = (𝑦 1 ), 𝑥2 = (𝑦 2 , 𝑧12 ), 𝑥3 = (𝑦 3 , 𝑧13 , 𝑧23 ), 𝑥4 = (𝑦 4 , 𝑧14 , 𝑧24 , 𝑧34 ), ....
This amounts to exposing the graph 𝐺 one vertex at a time. Define the function 𝑓 (𝑥1 , . . . , 𝑥 𝑛 ) =
𝑡 inj (𝐹, 𝐺). Note that E 𝑓 = E 𝑡 inj (𝐹, G(𝑛, 𝑊)) = 𝑡 (𝐹, 𝑊) by linearity of expectations (in this
step, it is important that we are using the injective variant of homomorphism densities). Note
that changing a single coordinate 𝑥_𝑖 changes the value of 𝑓 by at most 𝑣(𝐹)/𝑛, since
exactly a 𝑣(𝐹)/𝑛 fraction of injective maps 𝑉 (𝐹) → 𝑉 (𝐺) includes a fixed 𝑣 ∈ 𝑉 (𝐺) in the
image. Then (4.5) follows from the bounded differences inequality, Theorem 4.4.4.
To deduce the theorem from (4.5), recall from Remark 4.3.3 that
|𝑡(𝐹, 𝐺) − 𝑡_inj(𝐹, 𝐺)| ≤ 𝑣(𝐹)²/(2𝑣(𝐺)).
If 𝜀 < 𝑣(𝐹)²/𝑛, then the right-hand side of (4.4) is at least 2𝑒^{−𝜀/8} ≥ 1, and so the inequality
trivially holds. Otherwise, |𝑡(𝐹, G(𝑛, 𝑊)) − 𝑡(𝐹, 𝑊)| > 𝜀 implies |𝑡_inj(𝐹, G(𝑛, 𝑊)) − 𝑡(𝐹, 𝑊)| >
𝜀 − 𝑣(𝐹)²/(2𝑛) ≥ 𝜀/2, and then we can apply (4.5) to conclude. □
Theorem 4.4.2 then follows from the Borel–Cantelli lemma, stated below, applied to
Theorem 4.4.5 with a union bound over all rational 𝜀 > 0.
Qualitatively, the counting lemma tells us that for every graph 𝐹, the function 𝑡 (𝐹, ·) is
continuous in (W̃_0, 𝛿_□), the graphon space with respect to the cut metric. It implies the easier
direction of the equivalence in Theorem 4.3.7, namely that convergence in cut metric implies
left-convergence.
In the rest of this section, we prove Theorem 4.5.1. It suffices to prove that
|𝑡 (𝐹, 𝑊) − 𝑡 (𝐹, 𝑈)| ≤ |𝐸 (𝐹)| ∥𝑊 − 𝑈 ∥ □ . (4.6)
Indeed, for every invertible measure preserving map 𝜙 : [0, 1] → [0, 1], we have 𝑡 (𝐹, 𝑈) =
𝑡 (𝐹, 𝑈 𝜙 ). By considering the above inequality with 𝑈 replaced by 𝑈 𝜙 , and taking the infimum
over all 𝑈 𝜙 , we obtain Theorem 4.5.1.
The following reformulation of the cut norm is often quite useful.
Proof. We want to show (left-hand side below is how we defined the cut norm in Defini-
tion 4.2.1)
sup_{𝑆,𝑇 ⊆ [0,1] measurable} | ∫_{[0,1]²} 𝑊(𝑥, 𝑦) 1_𝑆(𝑥) 1_𝑇(𝑦) 𝑑𝑥 𝑑𝑦 | = sup_{𝑢,𝑣 : [0,1]→[0,1] measurable} | ∫_{[0,1]²} 𝑊(𝑥, 𝑦) 𝑢(𝑥) 𝑣(𝑦) 𝑑𝑥 𝑑𝑦 | .
The right-hand side is at least as large as the left-hand side since we can take 𝑢 = 1𝑆 and
𝑣 = 1𝑇 . On the other hand, the integral on the right-hand side is bilinear in 𝑢 and 𝑣, and so it is
always possible to change 𝑢 and 𝑣 to {0, 1}-valued functions without decreasing the value of
the integral (e.g., think about what is the best choice for 𝑣 with 𝑢 held fixed, and vice versa).
If 𝑢 and 𝑣 are restricted to {0, 1}-valued functions, then the two sides are identical. □
As a warm up, let us illustrate the proof of the triangle counting lemma, which has all the
ideas of the general proof but with simpler notation. As illustrated below, the main idea to
“replace” 𝑊 by 𝑈 on the triangle one at a time using the cut norm.
[Figure: replacing 𝑊 by 𝑈 on the three edges of the triangle, one edge at a time.]
For graphons 𝑊_{12}, 𝑊_{13}, 𝑊_{23}, write 𝑡(𝑊_{12}, 𝑊_{13}, 𝑊_{23}) = ∫_{[0,1]³} 𝑊_{12}(𝑥, 𝑦) 𝑊_{13}(𝑥, 𝑧) 𝑊_{23}(𝑦, 𝑧) 𝑑𝑥 𝑑𝑦 𝑑𝑧, so that
𝑡(𝐾_3, 𝑊) = 𝑡(𝑊, 𝑊, 𝑊) and 𝑡(𝐾_3, 𝑈) = 𝑡(𝑈, 𝑈, 𝑈).
Observe that 𝑡 (𝑊12 , 𝑊13 , 𝑊23 ) is trilinear in 𝑊12 , 𝑊13 , 𝑊23 . We have
𝑡(𝑊, 𝑊, 𝑊) − 𝑡(𝑈, 𝑊, 𝑊) = ∫_{[0,1]³} (𝑊 − 𝑈)(𝑥, 𝑦) 𝑊(𝑥, 𝑧) 𝑊(𝑦, 𝑧) 𝑑𝑥 𝑑𝑦 𝑑𝑧.
For any fixed 𝑧, note that 𝑥 ↦→ 𝑊 (𝑥, 𝑧) and 𝑦 ↦→ 𝑊 (𝑦, 𝑧) are both measurable functions
[0, 1] → [0, 1]. So applying Lemma 4.5.3 gives
| ∫_{[0,1]²} (𝑊 − 𝑈)(𝑥, 𝑦) 𝑊(𝑥, 𝑧) 𝑊(𝑦, 𝑧) 𝑑𝑥 𝑑𝑦 | ≤ ∥𝑊 − 𝑈∥_□
for every 𝑧. Now integrating over all 𝑧 and applying the triangle inequality, we obtain
|𝑡 (𝑊, 𝑊, 𝑊) − 𝑡 (𝑈, 𝑊, 𝑊)| ≤ ∥𝑊 − 𝑈 ∥ □ .
We have similar inequalities in the other two coordinates. We can write
𝑡 (𝑊, 𝑊, 𝑊) − 𝑡 (𝑈, 𝑈, 𝑈) = 𝑡 (𝑊, 𝑊, 𝑊 − 𝑈) + 𝑡 (𝑊, 𝑊 − 𝑈, 𝑈) + 𝑡 (𝑊 − 𝑈, 𝑈, 𝑈).
Each term on the right-hand side is at most ∥𝑊 − 𝑈∥_□ in absolute value, by the above applied
in the appropriate coordinate. So the result follows. □
The above proof generalizes in a straightforward way to a general graph counting lemma.
Proof. Given a collection of graphons 𝑊𝑒 indexed by the edges 𝑒 of 𝐹, define
𝑡_𝐹(𝑊_𝑒 : 𝑒 ∈ 𝐸(𝐹)) = ∫_{[0,1]^{𝑉(𝐹)}} ∏_{𝑖𝑗 ∈ 𝐸(𝐹)} 𝑊_{𝑖𝑗}(𝑥_𝑖, 𝑥_𝑗) ∏_{𝑖 ∈ 𝑉(𝐹)} 𝑑𝑥_𝑖 .
Remark 4.6.2 (Interpreting weak regularity). Given 𝐴, 𝐵 ⊆ 𝑉 (𝐺), suppose we only knew
how many vertices from 𝐴 and 𝐵 lie in each part of the partition (and not specifically which
vertices), and we are asked to predict the number of edges between 𝐴 and 𝐵. Then the sum
above is the number of edges between 𝐴 and 𝐵 that one would naturally expect based on the
edge densities between vertex parts. Being weak regular says that this prediction is roughly
correct.
Weak regularity is more “global” compared to the notion of an 𝜀-regular partition from
Chapter 2. Here 𝐴 and 𝐵 have size a constant order fraction of the entire vertex set, rather
than subsets of individual parts of the partition. The edge densities between certain pairs
𝐴 ∩ 𝑉𝑖 and 𝐵 ∩ 𝑉 𝑗 could differ significantly from that of 𝑉𝑖 and 𝑉 𝑗 . All we ask is that on
average these discrepancies mostly cancel out.
The following weak regularity lemma was proved by Frieze & Kannan (1999), initially
motivated by algorithmic applications that we will mention in Remark 4.6.11.
Remark 4.6.5. The stepping operator is the orthogonal projection in the Hilbert space
𝐿 2 ([0, 1] 2 ) onto the subspace of functions constant on each step 𝑆𝑖 × 𝑆 𝑗 . It can also be
viewed as the conditional expectation with respect to the 𝜎-algebra generated by 𝑆𝑖 × 𝑆 𝑗 .
Remark 4.6.8. Technically speaking, Theorem 4.6.3 does not follow from Theorem 4.6.7
since the partition of [0, 1] for 𝑊𝐺 could split intervals corresponding to individual vertices of
𝐺. However, the proofs of the two claims are exactly the same. Alternatively, one can allow a
more flexible definition of a graphon as a symmetric measurable function 𝑊 : Ω×Ω → [0, 1],
and then take Ω to be the discrete probability space 𝑉 (𝐺) endowed with the uniform measure.
Like the proof of the regularity lemma in Section 2.1, we use an energy increment strategy.
Recall from Definition 2.1.10 that the energy of a vertex partition is the mean-squared edge-
density between parts. Given a graphon 𝑊, we define the energy of a measurable partition
P = {𝑆1 , . . . , 𝑆 𝑘 } of [0, 1] by
∥𝑊_P∥_2² = ∫_{[0,1]²} 𝑊_P(𝑥, 𝑦)² 𝑑𝑥 𝑑𝑦 = ∑_{𝑖,𝑗=1}^{𝑘} 𝜆(𝑆_𝑖) 𝜆(𝑆_𝑗) (average of 𝑊 on 𝑆_𝑖 × 𝑆_𝑗)² .
Proof. Because ∥𝑊 − 𝑊 P ∥ □ > 𝜀, there exist measurable subsets 𝑆, 𝑇 ⊆ [0, 1] such that
|⟨𝑊 − 𝑊 P , 1𝑆×𝑇 ⟩| > 𝜀.
Let P ′ be the refinement of P by introducing 𝑆 and 𝑇, dividing each part of P into ≤ 4
sub-parts. We know that
⟨𝑊 P , 𝑊 P ⟩ = ⟨𝑊 P ′ , 𝑊 P ⟩
because 𝑊 P is constant on each step of P, and P ′ is a refinement of P. Thus,
⟨𝑊 P ′ − 𝑊 P , 𝑊 P ⟩ = 0.
By the Pythagorean Theorem (in the Hilbert space 𝐿 2 ( [0, 1] 2 )),
∥𝑊 P ′ ∥ 22 = ∥𝑊 P ∥ 22 + ∥𝑊 P ′ − 𝑊 P ∥ 22 . (4.7)
Note that ⟨𝑊 P ′ , 1𝑆×𝑇 ⟩ = ⟨𝑊, 1𝑆×𝑇 ⟩ since 𝑆 and 𝑇 are both unions of parts of the partition
P ′ . So, by the Cauchy–Schwarz inequality,
∥𝑊 P ′ − 𝑊 P ∥ 2 ≥ |⟨𝑊 P ′ − 𝑊 P , 1𝑆×𝑇 ⟩| = |⟨𝑊 − 𝑊 P , 1𝑆×𝑇 ⟩| > 𝜀.
So by (4.7), we have ∥𝑊 P ′ ∥ 22 > ∥𝑊 P ∥ 22 + 𝜀 2 , as claimed. □
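The stepping operator and the identities in this proof are easy to check numerically, with a symmetric matrix playing the role of a graphon (a sketch; the matrix and partitions are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 60
W = rng.uniform(size=(N, N))
W = (W + W.T) / 2                                  # a symmetric "graphon" on N points

def step(W, parts):
    # W_P: replace W by its average on each block S x T of the partition.
    WP = np.zeros_like(W)
    for S in parts:
        for T in parts:
            WP[np.ix_(S, T)] = W[np.ix_(S, T)].mean()
    return WP

P = [list(range(0, 20)), list(range(20, 45)), list(range(45, 60))]
P_ref = [list(range(0, 10)), list(range(10, 20)), list(range(20, 45)),
         list(range(45, 50)), list(range(50, 60))]   # a refinement of P

WP, WPr = step(W, P), step(W, P_ref)
energy = lambda U: (U ** 2).mean()                 # normalized squared L^2 norm

print(np.allclose((WPr * WP).mean(), (WP * WP).mean()))         # <W_P', W_P> = <W_P, W_P>
print(np.allclose(energy(WPr), energy(WP) + energy(WPr - WP)))  # Pythagorean identity
print(energy(WP) <= energy(WPr) <= energy(W))                   # energy is monotone
```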
We will prove the following slight generalization of Theorem 4.6.7, allowing an arbitrary
starting partition (this will be useful later).
This proposition specifically tells us that starting with any given partition, the regularity
argument still works.
Proof. Starting with 𝑖 = 0:
The rest of the exercise shows how to recover a regularity partition from the above
approximation.
(b) Show that the stepping operator is contractive with respect to the cut norm, in the sense
that if 𝑊 : [0, 1] 2 → R is a measurable symmetric function, then ∥𝑊 P ∥ □ ≤ ∥𝑊 ∥ □ .
(c) Let P be a partition of [0, 1] into measurable sets. Let 𝑈 be a graphon that is constant
on 𝑆 × 𝑇 for each 𝑆, 𝑇 ∈ P. Show that for every graphon 𝑊, one has
∥𝑊 − 𝑊 P ∥ □ ≤ 2 ∥𝑊 − 𝑈 ∥ □ .
(d) Use (a) and (c) to give a different proof of the weak regularity lemma (with slightly
worse bounds than the one given in class): show that for every 𝜀 > 0 and every
graphon 𝑊, there exists a partition P of [0, 1] into 2^{𝑂(1/𝜀²)} measurable sets such that
∥𝑊 − 𝑊_P∥_□ ≤ 𝜀.
Exercise 4.6.13∗ (Second neighborhood distance). Let 0 < 𝜀 < 1/2. Let 𝑊 be a
graphon. Define 𝜏𝑊 , 𝑥 : [0, 1] → [0, 1] by
𝜏_{𝑊,𝑥}(𝑧) = ∫_{[0,1]} 𝑊(𝑥, 𝑦) 𝑊(𝑦, 𝑧) 𝑑𝑦.
(This models the second neighborhood of 𝑥.) Prove that if a finite set 𝑆 ⊆ [0, 1] satisfies
∥𝜏_{𝑊,𝑠} − 𝜏_{𝑊,𝑡}∥_1 > 𝜀 for all distinct 𝑠, 𝑡 ∈ 𝑆,
then |𝑆| ≤ (1/𝜀)^{𝐶/𝜀²}, where 𝐶 is some absolute constant.
Remark 4.7.2. The above definition is sufficient for our purposes. In order to give a more
formal definition of a martingale, we need to introduce the notion of a filtration. See any
standard measure theory based introduction to probability (Williams (1991, Chapters 10–11)
has a particularly lucid discussion of martingales and their convergence theorem discussed
below). This martingale is indexed by integers, and hence called “discrete-time.” There are
also continuous-time martingales (e.g., Brownian motion), which we will not discuss here.
Example 4.7.3 (Partial sum of independent mean zero random variables). Let 𝑍1 , 𝑍2 , . . .
be a sequence of independent mean zero random variables (e.g., ±1 with equal probability).
Then 𝑋𝑛 = 𝑍1 + · · · + 𝑍 𝑛 , 𝑛 ≥ 0, is a martingale.
Example 4.7.4 (Betting strategy). Consider any betting strategy in a “fair” casino, where
the expected value of each bet is zero. Let 𝑋𝑛 be the balance after 𝑛 rounds of betting.
Then 𝑋𝑛 is a martingale regardless of the betting strategy. So every betting strategy has zero
expected gain after 𝑛 rounds. Also see the optional stopping theorem for a more general
statement (e.g., Williams (1991, Chapter 10)).
The original meaning of the word "martingale" refers to the following betting strategy on a
sequence of fair coin tosses. Each round the bettor is allowed to bet an arbitrary amount 𝑍:
if heads, the bettor gains 𝑍 dollars, and if tails the bettor loses 𝑍 dollars.
Start by betting 1 dollar. If one wins, stop. If one loses, then double one's bet for the next
coin toss. And then repeat (i.e., keep doubling one's bet until the first win, at which point one
stops).
A “fallacy” is that this strategy always results in a final net gain of $1, the supposed reason
being that with probability 1 one eventually sees a head. This initially appears to contradict
the earlier claim that all betting strategies have zero expected gain. Thankfully there is no
contradiction. In real life, one starts with a finite budget and could possibly go bankrupt with
this betting strategy, thereby leading to a forced stop. In the optional stopping theorem, there
are some boundedness hypotheses that are violated by the above strategy.
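A quick simulation of the doubling strategy with a finite budget (the budget and number of trials are arbitrary choices) illustrates the point: the average net gain is zero, because the frequent small wins are balanced by rare large losses.

```python
import random

def doubling_strategy(budget, rng):
    # Bet 1, double after each loss, and stop at the first win or when the
    # next bet can no longer be covered by the remaining balance.
    balance, bet = budget, 1
    while bet <= balance:
        if rng.random() < 0.5:        # heads: win the current bet and stop
            return balance + bet - budget
        balance -= bet                # tails: lose the bet and double it
        bet *= 2
    return balance - budget           # forced stop: a large net loss

rng = random.Random(0)
budget, trials = 1000, 200_000
gains = [doubling_strategy(budget, rng) for _ in range(trials)]
print(sum(gains) / trials)            # approximately zero
print(min(gains), max(gains))         # rare large losses versus frequent +1 gains
```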
The following construction of martingales is most relevant for our purposes.
Example 4.7.5 (Doob martingale). Let 𝑋 be some “hidden” random variable. Partial infor-
mation is revealed about 𝑋 gradually over time. For example, 𝑋 is some fixed function of
some random inputs. So the exact value of 𝑋 is unknown but its distribution can be derived
from the distribution of the inputs. Initially one does not know any of the inputs. Over time,
some of the inputs are revealed. Let
𝑋𝑛 = E[𝑋 | all information revealed up to time 𝑛].
Then 𝑋0 , 𝑋1 , . . . is a martingale (why?). Informally, 𝑋𝑛 is the best guess (in expectation) of 𝑋
based on all the information available up to time 𝑛. We have 𝑋0 = E𝑋 (when no information
is revealed). All information is revealed as 𝑛 → ∞, and the martingale 𝑋𝑛 converges to the
random variable 𝑋 with probability 1.
Here is a real-life example. Let 𝑋 ∈ {0, 1} be whether a candidate wins in a presidential
election. Let 𝑋𝑛 be the inferred probability that the candidate wins, given all the information
known at time 𝑡 𝑛 . Then 𝑋𝑛 converges to the “truth”, a {0, 1}-value, eventually becoming
deterministic when the election result is finalized.
Then 𝑋𝑛 is a martingale. At time 𝑡 𝑛 , knowing 𝑋𝑛 , if the expectation for 𝑋𝑛+1 (conditioned
on everything known at time 𝑡 𝑛 ) were different from 𝑋𝑛 , then one should have adjusted 𝑋𝑛
accordingly in the first place.
The precise notion of “information” in the above formula can be formalized using the
notion of filtration in probability theory.
In other words, if 𝑋0 , 𝑋1 , . . . is a martingale with 𝑋𝑛 ∈ [0, 1] for every 𝑛, then the sequence
is convergent with probability 1.
Remark 4.7.7. The proof actually shows that the boundedness condition can be replaced by
the weaker 𝐿 1 -boundedness condition sup𝑛 E |𝑋𝑛 | < ∞. Even more generally, a hypothesis
called “uniform integrability” is enough.
Some boundedness condition is necessary. For example, in Example 4.7.3, a running sum
of independent uniform ±1 steps is an unbounded martingale, and it does not converge.
Proof. If a sequence 𝑋_0, 𝑋_1, · · · ∈ [0, 1] does not converge, then there exists a pair of rational
numbers 0 < 𝑎 < 𝑏 < 1 such that 𝑋𝑛 “up-crosses” [𝑎, 𝑏] infinitely many times, meaning that
there is an infinite sequence 𝑠1 < 𝑡1 < 𝑠2 < 𝑡2 < · · · such that 𝑋𝑠𝑖 < 𝑎 < 𝑏 < 𝑋𝑡𝑖 for all 𝑖.
[Figure: a sequence up-crossing the interval [𝑎, 𝑏] at times 𝑠_1 < 𝑡_1 < 𝑠_2 < 𝑡_2 < 𝑠_3 < 𝑡_3.]
We will show that for each 𝑎 < 𝑏, the probability that a bounded martingale 𝑋0 , 𝑋1 , · · · ∈
[0, 1] up-crosses [𝑎, 𝑏] infinitely many times is zero. Then, by taking a union of all countably
many such pairs (𝑎, 𝑏) of rationals, we deduce that the martingale converges with probability
1.
Consider the following betting strategy. Imagine that 𝑋𝑛 is a stock price. At any time, if
𝑋𝑛 dips below 𝑎, we buy and hold one share until 𝑋𝑛 reaches above 𝑏, at which point we
sell this share. (Note that we always hold either zero or one share–we do not buy more until
we have sold the currently held share). Start with a budget of 𝑌0 = 1 (so we will never go
bankrupt). Let 𝑌𝑛 be the value of our portfolio (cash on hand plus the value of the share if
held) at time 𝑛. Then 𝑌𝑛 is a martingale (why?). So E𝑌𝑛 = 𝑌0 = 1. Also 𝑌𝑛 ≥ 0 for all 𝑛. If
one buys and sells at least 𝑘 times up to time 𝑛, then 𝑌𝑛 ≥ 𝑘 (𝑏 − 𝑎) (this is only the net profit
from buying and selling; the actual 𝑌𝑛 may be higher due to the initial cash balance and the
value of the current share held). So, by Markov’s inequality, for every 𝑛,
P(≥ 𝑘 up-crossings up to time 𝑛) ≤ P(𝑌_𝑛 ≥ 𝑘(𝑏 − 𝑎)) ≤ E𝑌_𝑛/(𝑘(𝑏 − 𝑎)) = 1/(𝑘(𝑏 − 𝑎)).
By the monotone convergence theorem,
P(≥ 𝑘 up-crossings) = lim_{𝑛→∞} P(≥ 𝑘 up-crossings up to time 𝑛) ≤ 1/(𝑘(𝑏 − 𝑎)).
Letting 𝑘 → ∞, the probability of having infinitely many up-crossings is zero. □
Quick applications
The compactness of (W̃_0, 𝛿_□) is a powerful statement. We will use it to prove the equivalence
of cut metric convergence and left-convergence in the next section. Right now, let us show
how to use compactness to deduce the existence of limits for a left-convergent sequence of
graphons.
Proof of Theorem 4.3.8 (existence of limit for a left-convergent sequence of graphons). Let
𝑊1 , 𝑊2 , . . . be a sequence of graphons such that the sequence of 𝐹-densities {𝑡 (𝐹, 𝑊𝑛 )} 𝑛
converges for every graph 𝐹. Since (W̃_0, 𝛿_□) is a compact metric space by Theorem 4.2.7,
it is also sequentially compact, and so there is a subsequence (𝑛_𝑖)_{𝑖=1}^{∞} and a graphon 𝑊 such
that 𝛿□ (𝑊𝑛𝑖 , 𝑊) → 0 as 𝑖 → ∞. Fix any graph 𝐹. By the counting lemma, Theorem 4.5.1, it
follows that 𝑡 (𝐹, 𝑊𝑛𝑖 ) → 𝑡 (𝐹, 𝑊). But by assumption, the sequence {𝑡 (𝐹, 𝑊𝑛 )} 𝑛 converges.
Therefore 𝑡 (𝐹, 𝑊𝑛 ) → 𝑡 (𝐹, 𝑊) as 𝑛 → ∞. Thus 𝑊𝑛 left-converges to 𝑊. □
Let us now examine a different aspect of compactness. Recall that by definition, a set is
compact if every open cover has a finite subcover.
Recall from Theorem 4.2.8 that the set of graphs is dense in the space of graphons with
respect to the cut metric. This was proved by showing that for every 𝜀 > 0 and graphon
𝑊, one can find a graph 𝐺 such that 𝛿□ (𝐺, 𝑊) < 𝜀. However, the size of 𝐺 produced by
this proof depends on both 𝜀 and 𝑊, since the proof proceeds by first taking a discrete 𝐿 1
approximation of 𝑊, which could involve an unbounded number of steps to approximate. In
contrast, we show below that the number of vertices of 𝐺 needs to depend only on 𝜀 and not
on 𝑊.
Proof. Let 𝜀 > 0. For a graph 𝐺, define the open 𝜀-ball (with respect to the cut metric)
around 𝐺:
𝐵_𝜀(𝐺) = {𝑊 ∈ W̃_0 : 𝛿_□(𝐺, 𝑊) < 𝜀}.
Since every graphon lies within cut distance 𝜀 from some graph (Theorem 4.2.8), the balls
𝐵_𝜀(𝐺) cover W̃_0 as 𝐺 ranges over all graphs. By compactness, this open cover has a finite
subcover, and let 𝑁 be the maximum number of vertices in graphs 𝐺 of this subcover. Then
every graphon lies within cut distance 𝜀 of a graph on at most 𝑁 vertices. □
The following exercise asks to make the above proof quantitative.
Exercise 4.8.2. Show that for every 𝜀 > 0, every graphon lies within cut distance at most
𝜀 from some graph on at most 𝐶^{1/𝜀²} vertices, where 𝐶 is some absolute constant.
Hint: Use the weak regularity lemma.
Remark 4.8.3 (Ineffective bounds from compactness). Arguments using compactness usu-
ally do not generate quantitative bounds, meaning, for example, the proof of Proposition 4.8.1
does not give any specific function 𝑛(𝜀), only that such a function always exists. In cases where
one does not have an explicit bound, we call the bound ineffective. Ineffective bounds also
often arise from arguments involving ergodic theory and non-standard analysis. Sometimes a
different argument can be found that generates a quantitative bound (e.g., Exercise 4.8.2), but
it is not always known how to do this. Here we illustrate a simple example of a compactness
application (unrelated to dense graph limits) that gives an ineffective bound, but it remains
an open problem to make the bound effective.
This example concerns bounded degree graphs. It is sometimes called a “regularity lemma”
for bounded degree graphs, but it is very different from the regularity lemmas we have
encountered so far.
A rooted graph (𝐺, 𝑣) consists of a graph 𝐺 with a vertex 𝑣 ∈ 𝑉(𝐺) designated as the
root. Given a graph 𝐺 and positive integer 𝑟, we can obtain a random rooted graph by first
picking a vertex 𝑣 of 𝐺 as the root uniformly at random, and then removing all vertices more
than distance 𝑟 from 𝑣. We define the 𝒓-neighborhood-profile of 𝐺 to be the probability
distribution on rooted graphs generated by this process.
Recall that the total variation distance between two probability distributions 𝜇 and 𝜆 is
defined by
𝑑_TV(𝜇, 𝜆) = sup_𝐸 |𝜇(𝐸) − 𝜆(𝐸)| ,
where 𝐸 ranges over all events. In the case of two discrete probability distributions 𝜇
and 𝜆, the above definition can be written as half the ℓ_1 distance between the two probability
distributions:
𝑑_TV(𝜇, 𝜆) = (1/2) ∑_𝑥 |𝜇(𝑥) − 𝜆(𝑥)| .
The following is an unpublished observation of Alon.
Proof. Let 𝒢 = 𝒢_{Δ,𝑟} be the set of all possible rooted graphs with maximum degree Δ and
radius at most 𝑟 around the root. Then |𝒢| < ∞. The 𝑟-neighborhood-profile 𝑝_𝐺 of any
graph 𝐺 with maximum degree at most Δ can be represented as a point 𝑝_𝐺 ∈ [0, 1]^𝒢 with coordinate sum 1, and let
𝐴 = {𝑝_𝐺 : graph 𝐺 with maximum degree at most Δ} ⊆ [0, 1]^𝒢 be the set of all points that can arise this way. Since [0, 1]^𝒢
is compact, the closure of 𝐴 is compact. Since the union of the open 𝜀-neighborhoods (with
respect to 𝑑𝑇𝑉 ) of 𝑝 𝐺 , ranging over all graphs 𝐺, covers the closure of 𝐴, by compactness
there is some finite subcover. This subcover is a finite collection X of graphs so that for every
graph 𝐺, 𝑝_𝐺 lies within total variation distance 𝜀 of some 𝑝_{𝐺′} with 𝐺′ ∈ X. We conclude by
letting 𝑁 be the maximum number of vertices of a graph from X. □
Despite the short proof using compactness, it remains an open problem to make the above
result quantitative.
Open problem 4.8.5 (Effective “regularity lemma” for bounded degree graphs)
Find some specific 𝑁 (𝜀, Δ, 𝑟) so that Theorem 4.8.4 holds.
Remark 4.9.2. The result is reminiscent of results from probability theory on the uniqueness
of moments, which roughly says that if two “sufficiently well-behaved” real random variables
𝑋 and 𝑌 share the same moments (i.e., 𝔼[𝑋^𝑘] = 𝔼[𝑌^𝑘] for all nonnegative integers 𝑘),
then 𝑋 and 𝑌 must be identically distributed. One needs some technical conditions for the
conclusion to hold. For example, Carleman's condition says that if the moments of 𝑋 satisfy
$$\sum_{k=1}^{\infty} \mathbb{E}[X^{2k}]^{-1/(2k)} = \infty,$$
then the distribution of 𝑋 is uniquely determined by its moments.
This sufficient condition holds as long as the 𝑘-th moment of 𝑋 does not grow too quickly
with 𝑘. It holds for many distributions in practice.
We need some preparation before proving the uniqueness of moments theorem.
Proof. Let 𝑓(𝑥₁, …, 𝑥_𝑘) denote the expression inside the absolute value, so that 𝔼𝑓 = 0. Also,
𝑓 changes by at most $2(k-1)/\binom{k}{2} = 4/k$ whenever we change exactly one coordinate.
By the bounded differences inequality, Theorem 4.4.4, we obtain
$$\mathbb{P}(|f| \ge \varepsilon) \le 2\exp\left(\frac{-2\varepsilon^2}{k(4/k)^2}\right) = 2e^{-k\varepsilon^2/8}. \qquad \square$$
Let us now consider a variation of the 𝑊-random graph model from Section 4.4. Let
𝑥 1 , . . . , 𝑥 𝑘 ∈ [0, 1] be chosen independently and uniformly at random. Let H(𝑘, 𝑊) be an
edge-weighted random graph on vertex set [𝑘] with edge 𝑖 𝑗 having weight 𝑊 (𝑥𝑖 , 𝑥 𝑗 ), for
each 1 ≤ 𝑖 < 𝑗 ≤ 𝑘. Note that this definition makes sense for any symmetric measurable
𝑊 : [0, 1] 2 → R. Furthermore, when 𝑊 is a graphon, the 𝑊-random graph G(𝑘, 𝑊) can be
obtained by independently sampling each edge of H(𝑘, 𝑊) with probability equal to its edge
weight. We shall study the joint distributions of G(𝑘, 𝑊) and H(𝑘, 𝑊) coupled through the
above two-step process.
[Figure: a graphon 𝑊 sampled at random points 𝑥₁, …, 𝑥₅, the resulting edge-weighted graph H(5, 𝑊), and the graph G(5, 𝑊) obtained by keeping each edge independently with probability equal to its weight.]
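To make the two-step coupling concrete, here is a minimal Python sketch (an illustration, not the book's construction): it samples 𝑥₁, …, 𝑥_𝑘 uniformly, records the edge weights 𝑊(𝑥ᵢ, 𝑥ⱼ) as H(𝑘, 𝑊), and then keeps each edge independently with probability equal to its weight to obtain G(𝑘, 𝑊). The function name sample_H_and_G and the example graphon are ad hoc.

```python
import random

def sample_H_and_G(W, k, rng=random):
    """Sample the coupled pair H(k, W) (edge weights) and G(k, W) (a graph)
    from a graphon W : [0,1]^2 -> [0,1]."""
    xs = [rng.random() for _ in range(k)]      # x_1, ..., x_k uniform in [0, 1]
    H, G = {}, set()
    for i in range(k):
        for j in range(i + 1, k):
            w = W(xs[i], xs[j])                # edge weight of ij in H(k, W)
            H[(i, j)] = w
            if rng.random() < w:               # keep edge ij in G(k, W)
                G.add((i, j))
    return H, G

# example with the graphon W(x, y) = (x + y) / 2
H, G = sample_H_and_G(lambda x, y: (x + y) / 2, k=5)
```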
Similar to Definition 4.2.4 of the cut distance 𝛿□, define the distance based on the 𝐿¹ norm:
$$\delta_1(W, U) := \inf_\phi \|W - U^\phi\|_1,$$
where the infimum is taken over all invertible measure preserving maps 𝜙 : [0, 1] → [0, 1].
Since ∥·∥□ ≤ ∥·∥₁, we have 𝛿□ ≤ 𝛿₁.
Proof. First we prove the result for step graphons 𝑊. In this case, with probability 1, the
fraction of vertices of H(𝑘, 𝑊) that fall in each step of 𝑊 converges to the length of that step
by the law of large numbers. After sorting the vertices of H(𝑘, 𝑊) accordingly, the graphon
associated to H(𝑘, 𝑊) is obtained from 𝑊 by changing the step sizes by 𝑜(1) as 𝑘 → ∞, and
then zeroing out the diagonal blocks, as illustrated below. Then H(𝑘, 𝑊) converges to 𝑊
pointwise almost everywhere as 𝑘 → ∞. In particular, 𝛿₁(H(𝑘, 𝑊), 𝑊) → 0.
[Figure: the step graphon 𝑊 and the graphon associated to H(𝑘, 𝑊).]
Now let 𝑊 be any graphon. For any other graphon 𝑊 ′ , by using the same random vertices
for H(𝑘, 𝑊) and H(𝑘, 𝑊 ′ ), the two random graphs are coupled so that with probability 1,
∥H(𝑘, 𝑊) − H(𝑘, 𝑊 ′ )∥ 1 = ∥H(𝑘, 𝑊 − 𝑊 ′ )∥ 1 = ∥𝑊 − 𝑊 ′ ∥ 1 + 𝑜(1) as 𝑘 → ∞
by Lemma 4.9.3 applied to 𝑈 (𝑥, 𝑦) = |𝑊 (𝑥, 𝑦) − 𝑊 ′ (𝑥, 𝑦)|.
For every 𝜀 > 0, we can find some step graphon 𝑊 ′ so that ∥𝑊 − 𝑊 ′ ∥ 1 ≤ 𝜀 (by approx-
imating the Lebesgue measure using boxes). We saw earlier that 𝛿1 (H(𝑘, 𝑊 ′ ), 𝑊 ′ ) → 0. It
follows that with probability 1,
𝛿1 (H(𝑘, 𝑊), 𝑊) ≤ ∥H(𝑘, 𝑊) − H(𝑘, 𝑊 ′ )∥ 1 + 𝛿1 (H(𝑘, 𝑊 ′ ), 𝑊 ′ ) + ∥𝑊 ′ − 𝑊 ∥ 1
= 2 ∥𝑊 ′ − 𝑊 ∥ 1 + 𝑜(1) ≤ 2𝜀 + 𝑜(1)
as 𝑘 → ∞. Since 𝜀 > 0 can be chosen to be arbitrarily small, we have 𝛿1 (H(𝑘, 𝑊), 𝑊) → 0
with probability 1. □
Proof of Theorem 4.9.1 (uniqueness of moments). By inclusion-exclusion, for any 𝑘-vertex labeled graph 𝐹,
$$\Pr[\mathsf{G}(k,W) = F \text{ as labeled graphs}] = \sum_{F'} (-1)^{e(F') - e(F)} \Pr[\mathsf{G}(k,W) \supseteq F' \text{ as labeled graphs}],$$
where the sum ranges over all graphs 𝐹′ with 𝑉(𝐹′) = 𝑉(𝐹) and 𝐸(𝐹′) ⊇ 𝐸(𝐹). Since
$$t(F', W) = \Pr[\mathsf{G}(k,W) \supseteq F' \text{ as labeled graphs}],$$
we see that the distribution of G(𝑘, 𝑊) is determined by the values of 𝑡(𝐹, 𝑊) over all 𝐹.
Since 𝑡(𝐹, 𝑊) = 𝑡(𝐹, 𝑈) for all 𝐹, the random graphs G(𝑘, 𝑊) and G(𝑘, 𝑈) are identically distributed.
Exercise 4.9.7. Prove the inverse counting lemma Corollary 4.9.6 using the compactness
of the graphon space (Theorem 4.2.7) and the uniqueness of moments (Theorem 4.9.1).
Hint: Consider a hypothetical sequence of counterexamples.
Remark 4.9.8. The inverse counting lemma was first proved by Borgs, Chayes, Lovász, Sós,
& Vesztergombi (2008) in the following quantitative form:
Exercise 4.9.10. Prove that there exists a function 𝑓 : (0, 1] → (0, 1] such that for all
graphons 𝑈 and 𝑊, there exists a graph 𝐹 with
$$\frac{|t(F, U) - t(F, W)|}{e(F)} \ge f(\delta_\square(U, W)).$$
Exercise 4.9.11∗ (Generalized maximum cut). For symmetric measurable functions 𝑊, 𝑈 : [0, 1]² → ℝ, define
$$C(W, U) := \sup_\phi \langle W, U^\phi \rangle = \sup_\phi \int W(x, y)\, U(\phi(x), \phi(y)) \, dx \, dy,$$
where 𝜙 ranges over all invertible measure preserving maps [0, 1] → [0, 1]. Extend the
definition of C(·, ·) to graphs via C(𝐺, ·) := C(𝑊_𝐺, ·) and so on.
(a) Is C(𝑈, 𝑊) continuous jointly in (𝑈, 𝑊) with respect to the cut norm? Is it contin-
uous in 𝑈 if 𝑊 is held fixed?
(b) Show that if 𝑊1 and 𝑊2 are graphons such that C(𝑊1 , 𝑈) = C(𝑊2 , 𝑈) for all
graphons 𝑈, then 𝛿□ (𝑊1 , 𝑊2 ) = 0.
(c) Let 𝐺 1 , 𝐺 2 , . . . be a sequence of graphs such that C(𝐺 𝑛 , 𝑈) converges as 𝑛 → ∞
for every graphon 𝑈. Show that 𝐺 1 , 𝐺 2 , . . . is convergent.
(d) Can the hypothesis in (c) be replaced by “C(𝐺 𝑛 , 𝐻) converges as 𝑛 → ∞ for every
graph 𝐻”?
Further Reading
The book Large Networks and Graph Limits by Lovász (2012) is the authoritative reference
on the subject. His survey article titled Very Large Graphs (2009) also gives an excellent
overview.
One particularly striking application of the theory of dense graph limits is to large de-
viations for random graphs by Chatterjee & Varadhan (2011). See the survey article An
Introduction to Large Deviations for Random Graphs by Chatterjee (2016) as well as his
book (Chatterjee 2017).
Chapter Summary
• The cut distance between graphons is defined by $\delta_\square(W, U) := \inf_\phi \|W - U^\phi\|_\square$, where the infimum is taken over all invertible measure preserving maps 𝜙 : [0, 1] → [0, 1].
• Given a sequence of graphons (or graphs) 𝑊₁, 𝑊₂, …, we say that it
– converges in cut metric if it is a Cauchy sequence with respect to the cut metric 𝛿□;
– left-converges if the homomorphism density 𝑡(𝐹, 𝑊ₙ) converges for every fixed graph 𝐹 as 𝑛 → ∞.
• The graphon space is compact under the cut metric.
– Proof uses the weak regularity lemma and the martingale convergence theorem.
– Compactness has powerful consequences.
• Convergence in cut metric and left-convergence are equivalent for a sequence of graphons.
– (⇒) follows from a counting lemma.
– (⇐) was proved here using compactness.
5 Graph Homomorphism Inequalities
Chapter Highlights
• A suite of techniques for proving inequalities between subgraph densities
• The maximum/minimum triangle density in a graph of given edge density.
• How to apply Cauchy–Schwarz and Hölder inequalities
• Lagrangian method (another proof of Turán’s theorem, and linear inequalities between
clique densities)
• Entropy method (and applications to Sidorenko’s conjecture)
In other words, the conjecture says that for a fixed bipartite graph 𝐹, the 𝐹-density in
a graph of a given edge density is asymptotically minimized by a random graph. We will
develop techniques in this chapter to prove several interesting special cases of Sidorenko’s
conjecture.
Sidorenko’s conjecture has the equivalent graphon formulation: for every bipartite graph 𝐹
and graphon 𝑊,
𝑡 (𝐹, 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 𝑒 (𝐹 ) .
Note that equality occurs when 𝑊 ≡ 𝑝, the constant graphon. One can think of Sidorenko's
conjecture as a separate problem for each 𝐹, asking to minimize 𝑡(𝐹, 𝑊) among graphons
𝑊 with ∫𝑊 ≥ 𝑝. Whether the constant graphon is the unique minimizer is the subject of an
even stronger conjecture known as the forcing conjecture.
By translating back and forth between graph limits and sequences of graphs, being forcing
is equivalent to the quasirandomness condition. Thus any forcing graph can play the role
of 𝐶4 in Theorem 3.1.1. This is what led Chung, Graham, and Wilson to consider forcing
graphs. In particular, 𝐶4 is forcing.
Exercise 5.0.10. Prove the “only if” direction of the forcing conjecture.
Exercise 5.0.12 (Forcing and stability). Show that a graph 𝐹 is forcing if and only if for
every 𝜀 > 0, there exists 𝛿 > 0 such that if a graph 𝐺 satisfies 𝑡 (𝐹, 𝐺) ≤ 𝑡 (𝐾2 , 𝐺) 𝑒 (𝐹 ) + 𝛿,
then 𝛿□ (𝐺, 𝑝) ≤ 𝜀.
The following exercise shows that to prove a graph is Sidorenko, we do not lose anything
by giving away a constant factor. The proof is a quick and neat application of the tensor
power trick.
Exercise 5.0.13 (Tensor power trick). Let 𝐹 be a bipartite graph. Suppose there is some
constant 𝑐 > 0 such that
𝑡 (𝐹, 𝐺) ≥ 𝑐 𝑡 (𝐾2 , 𝐺) 𝑒 (𝐹 ) for all graphs 𝐺.
Show that 𝐹 is Sidorenko.
For a given 𝑝 ∈ [0, 1], the set {𝑡 (𝐾3 , 𝑊) : 𝑡 (𝐾2 , 𝑊) = 𝑝} is a closed interval. Indeed,
if 𝑊0 achieves the minimum triangle density, and 𝑊1 achieves the maximum, then their
linear interpolation 𝑊𝑡 = (1 − 𝑡)𝑊0 + 𝑡𝑊1 , ranging over 0 ≤ 𝑡 ≤ 1, must have triangle
density continuously interpolating between those of 𝑊0 and 𝑊1 , and therefore achieves every
intermediate value.
[Figure: the edge-triangle region, with 𝑡(𝐾₂, 𝑊) on the horizontal axis and 𝑡(𝐾₃, 𝑊) on the vertical axis. The upper boundary is the curve 𝑦 = 𝑥^{3/2}; the lower boundary passes through the points (1 − 1/𝑘, (1 − 1/𝑘)(1 − 2/𝑘)), e.g., (2/3, 2/9), (3/4, 3/8), (4/5, 12/25).]
Figure 5.1 The top figure shows the edge-triangle region. This region is often
depicted as in the bottom figure, which better highlights the concave scallops on the
lower boundary but is a less accurate plot.
This inequality is asymptotically tight for 𝐺 being a clique on a subset of vertices. The
equivalent graphon inequality $t(K_3, W) \le t(K_2, W)^{3/2}$ attains equality for the clique graphon
$$W(x, y) = \begin{cases} 1 & \text{if } x, y \le a, \\ 0 & \text{otherwise.} \end{cases} \qquad (5.3)$$
Proof. Assume at least one 𝑎ᵢ is positive, or else both sides equal zero. Then
$$\frac{\mathrm{LHS}}{\mathrm{RHS}} = \sum_{i=1}^{n} \left(\frac{a_i}{a_1 + \cdots + a_n}\right)^t \le \sum_{i=1}^{n} \frac{a_i}{a_1 + \cdots + a_n} = 1. \qquad \square$$
Remark 5.1.4. We will see additional proofs of Theorem 5.1.2 not invoking eigenvalues
later in Exercise 5.2.14 and in Section 5.3. Theorem 5.1.2 is an inequality in “physical space”
(as opposed to going into the “frequency space” of the spectrum), and it is a good idea to
think about how to prove it while staying in the physical space.
More generally, the clique graphon (5.3) also maximizes 𝐾𝑟 -densities among all graphons
of given edge density.
Proof. There exist integers 𝑎, 𝑏 ≥ 0 such that 𝑘 = 3𝑎 + 2𝑏 (e.g., take 𝑎 = 1 if 𝑘 is odd and
𝑎 = 0 if 𝑘 is even). Then 𝑎𝐾₃ + 𝑏𝐾₂ (a disjoint union of 𝑎 triangles and 𝑏 disjoint edges) is
a subgraph of 𝐾_𝑘. So
$$t(K_k, W) \le t(aK_3 + bK_2, W) = t(K_3, W)^a\, t(K_2, W)^b \le t(K_2, W)^{3a/2 + b} = t(K_2, W)^{k/2}. \qquad \square$$
Remark 5.1.6 (Kruskal–Katona theorem). Thanks to a theorem of Kruskal (1963) and
Katona (1968), the exact answer to the following non-asymptotic question is completely
known:
What is the maximum number of copies of 𝐾_𝑘 in an 𝑛-vertex graph with 𝑚 edges?
When $m = \binom{a}{2}$ for some integer 𝑎, the optimal graph is a clique on 𝑎 vertices. More
generally, for any value of 𝑚, the optimal graph is obtained by adding edges in colexicographic
order:
12, 13, 23, 14, 24, 34, 15, 25, 35, 45, . . .
This is stronger than Theorem 5.1.5, which only gives an asymptotically tight answer as
𝑛 → ∞. The full Kruskal–Katona theorem also answers:
What is the maximum number of 𝑘-cliques in an 𝑟-graph with 𝑛 vertices and 𝑚 edges?
When $m = \binom{a}{r}$, the optimal 𝑟-graph is a clique on 𝑎 vertices. (An asymptotic version of
this statement can be proved using techniques in Section 5.3.) More generally, the optimal
𝑟-graph is obtained by adding the edges in colexicographic order. For example, for 3-graphs,
the edges should be added in the following order:
123, 124, 134, 234, 125, 135, 235, 145, 245, 345, . . .
Here 𝑎 1 . . . 𝑎 𝑟 < 𝑏 1 . . . 𝑏 𝑟 in colexicographic order if 𝑎 𝑖 < 𝑏 𝑖 at the last 𝑖 where 𝑎 𝑖 ≠ 𝑏 𝑖 (i.e.,
dictionary order when read from right to left). Here we sort the elements of each 𝑟-tuple in
increasing order.
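As a small illustration (not from the text), the following Python snippet lists 𝑟-element subsets in colexicographic order by sorting on the reversed tuples; the function name colex_order is ad hoc.

```python
from itertools import combinations

def colex_order(r, m):
    """r-element subsets of {1, ..., m} in colexicographic order:
    compare by the largest element first, then the next largest, and so on."""
    return sorted(combinations(range(1, m + 1), r), key=lambda s: tuple(reversed(s)))

print(colex_order(2, 5))
# [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5), (4, 5)]
print(colex_order(3, 5)[:4])
# [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
```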
The Kruskal–Katona theorem can be proved by a compression/shifting argument. The
idea is to repeatedly modify the graph so that we eventually end up at the optimal graph. At
each step, we “push” all the edges towards a clique along some “direction” in a way that does
not reduce the number of 𝑘-cliques in the graph.
the edge-triangle region, as illustrated in Figure 5.1 on page 165. (Recall that 𝐾 𝑘 is associated
to the same graphon as a complete 𝑘-partite graph with equal parts.)
Now suppose the given edge density 𝑝 lies strictly between 1 − 1/(𝑘 − 1) and 1 − 1/𝑘
for some integer 𝑘 ≥ 2. To obtain the graphon with edge density 𝑝 and minimum triangle
density, we start with 𝐾_𝑘 with all vertices having equal weight, and then shrink the
relative weight of exactly one of the 𝑘 vertices (while keeping the remaining 𝑘 − 1 vertices
at equal weight). For example, the graphon illustrated below is obtained by
starting with 𝐾4 and shrinking the weight on one vertex.
[Figure: the step graphon of a vertex-weighted 𝐾₄ with parts 𝐼₁, 𝐼₂, 𝐼₃, 𝐼₄: the value is 0 on each diagonal block 𝐼ᵢ × 𝐼ᵢ and 1 elsewhere, with 𝐼₄ shrunk relative to the other parts.]
During this process, the total edge density (accounting for vertex weights) decreases continuously
from 1 − 1/𝑘 to 1 − 1/(𝑘 − 1). At some point, the edge density equals 𝑝. This vertex-weighted
𝑘-clique 𝑊 turns out to minimize the triangle density among all graphons with edge density 𝑝.
The above claim is much more difficult to prove than the maximum triangle density result.
This theorem, stated below and due to Razborov (2008), was proved using an involved Cauchy–
Schwarz calculus that he called flag algebra. We will say a bit more about this method in
Section 5.2.
We will not prove this theorem in full here. See Lovász (2012, Section 16.3.2) for a
presentation of the proof of Theorem 5.1.7. Later in this Chapter, we give lower bounds that
match the edge-triangle region at the cliques. In particular, Theorem 5.4.4 will allow us to
determine the convex hull of the region.
The graphon described in Theorem 5.1.7 turns out to be not unique unless 𝑝 = 1 − 1/𝑘
for some positive integer 𝑘. Indeed, suppose 1 − 1/(𝑘 − 1) < 𝑝 < 1 − 1/𝑘. Let 𝐼1 , . . . , 𝐼 𝑘 be
the partition of [0, 1] into the intervals corresponding to the vertices of the vertex-weighted
𝑘-clique, with 𝐼1 , . . . , 𝐼 𝑘−1 all having equal length, and 𝐼 𝑘 strictly smaller length. Now replace
the graphon on 𝐼 𝑘−1 ∪ 𝐼 𝑘 by an arbitrary triangle-free graphon of the same edge density.
[Figure: the vertex-weighted 4-clique graphon with parts 𝐼₁, 𝐼₂, 𝐼₃, 𝐼₄, where the block on (𝐼₃ ∪ 𝐼₄)² has been replaced by an arbitrary triangle-free graphon of the same edge density.]
This operation does not change the edge-density or the triangle-density of the graphon
(check!). The non-uniqueness of the minimizer hints at the difficulty of the result.
This completes our discussion of the edge-triangle region (Figure 5.1 on page 165).
Theorem 5.1.7 was generalized from 𝐾3 to 𝐾4 (Nikiforov 2011), and then to all cliques 𝐾𝑟
(Reiher 2016). The construction for the minimizing graphon is the same as for the triangle
case.
5.2 Cauchy–Schwarz
We will apply the Cauchy–Schwarz inequality in the following form: given real-valued
functions 𝑓 and 𝑔 on the same space (always assuming the usual measurability assumptions
without further comments), we have
$$\left(\int_X fg\right)^2 \le \left(\int_X f^2\right)\left(\int_X g^2\right).$$
In practice, we will often apply the Cauchy–Schwarz inequality by changing the order of
integration, and separating an integral into an outer integral and an inner integral.
A typical application of the Cauchy–Schwarz inequality is demonstrated in the following
Note that in the final step, “expanding a square” has the effect of “duplicating a variable.”
It is useful to recognize expressions with duplicated variables that can be folded back into a
square.
Let us warm up by proving that 𝐾2,2 is Sidorenko. We actually already proved this statement
in Proposition 3.1.14 in the context of the Chung–Graham–Wilson theorem on quasirandom
graphs. We repeat the same calculations here to demonstrate the integral notation.
$$\ge \left(\int_{x,y} W(x,y)\right)^2 = t(K_2, W)^2. \qquad \square$$
Lemma 5.2.3
𝑡 (𝐾2,2 , 𝑊) ≥ 𝑡 (𝐾1,2 , 𝑊) 2 .
Although it was initially conjectured that all graphs are common, this turns out to be false.
In particular, 𝐾_𝑡 fails to be common for every 𝑡 ≥ 4 (Thomason 1989).
Proposition 5.2.7
Every Sidorenko graph is common.
Proof. Suppose 𝐹 is Sidorenko. Let 𝑝 = 𝑡(𝐾₂, 𝑊). Then $t(F, W) \ge p^{e(F)}$ and $t(F, 1-W) \ge t(K_2, 1-W)^{e(F)} = (1-p)^{e(F)}$. Adding up and using convexity,
$$t(F, W) + t(F, 1 - W) \ge p^{e(F)} + (1 - p)^{e(F)} \ge 2^{-e(F) + 1}. \qquad \square$$
The converse is false. The triangle is common but not Sidorenko (recall that every
Sidorenko graph is bipartite).
We also have the following lower bound on the minimum triangle density given edge
density (Goodman 1959).
Below is a plot of Goodman's bound against the true edge-triangle region from Figure 5.1
on page 165. The inequality is tight whenever 𝑊 is the graphon associated to 𝐾_𝑛, in which case 𝑡(𝐾₂, 𝑊) = 1 − 1/𝑛
and $t(K_3, W) = n(n-1)(n-2)/n^3 = (1 - 1/n)(1 - 2/n)$. In particular, Goodman's bound implies that
𝑡(𝐾₃, 𝑊) > 0 whenever 𝑡(𝐾₂, 𝑊) > 1/2, which we also saw from Mantel's theorem.
[Figure: the curve 𝑦 = 𝑥(2𝑥 − 1) plotted on the axes 𝑡(𝐾₂, 𝑊) (horizontal) and 𝑡(𝐾₃, 𝑊) (vertical).]
Figure 5.2 The Goodman lower bound on the triangle density from Theorem 5.2.8
plotted on top of the edge-triangle region (Figure 5.1 on page 165).
Thus, using 𝑢𝑣 ≥ 𝑢 + 𝑣 − 1 for 𝑢, 𝑣 ∈ [0, 1],
$$t(K_3, W) = \int_{x,y,z} W(x,y)W(x,z)W(y,z) \ge \int_{x,y,z} \big(W(x,y) + W(x,z) - 1\big)W(y,z) = 2\,t(K_{1,2}, W) - t(K_2, W) \ge 2\,t(K_2, W)^2 - t(K_2, W). \qquad \square$$
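As a quick sanity check (not part of the proof), the following Python sketch verifies Goodman's bound 𝑡(𝐾₃, 𝐺) ≥ 2𝑡(𝐾₂, 𝐺)² − 𝑡(𝐾₂, 𝐺) on a random graph, computing both densities from the adjacency matrix; the parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
A = np.triu((rng.random((n, n)) < 0.7).astype(float), 1)
A = A + A.T                                  # adjacency matrix of a random graph

t_edge = A.sum() / n**2                      # t(K2, G)
t_triangle = np.trace(A @ A @ A) / n**3      # t(K3, G): closed walks of length 3

print(t_triangle >= 2 * t_edge**2 - t_edge)  # True, as guaranteed by Goodman's bound
```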
Finally, let us demonstrate an application of the Cauchy–Schwarz inequality in the follow-
ing form, for nonnegative functions 𝑓 and 𝑔:
$$\left(\int fg\right)^2 \le \left(\int f^2 g\right)\left(\int g\right).$$
Recall that a graph 𝐹 is Sidorenko if 𝑡 (𝐹, 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 𝑒 (𝐹 ) for all graphons 𝑊 (Defini-
tion 5.0.4).
Theorem 5.2.9
The graph 𝐹 consisting of two paths of length 3 between two vertices 𝑤 and 𝑥, together with the edge 𝑤𝑥 (so that 𝑒(𝐹) = 7), is Sidorenko.
Proof. The idea is to "fold" the graph 𝐹 in half along the middle using the Cauchy–Schwarz
inequality. Using 𝑤 and 𝑥 to denote the two vertices in the middle, we have
$$t(F, W) = \int_{w,x} \left(\int_{y,z} W(w,y)W(y,z)W(z,x)\right)^2 W(w,x).$$
So, by the Cauchy–Schwarz inequality,
$$t(F, W)\, t(K_2, W) \ge \left(\int_{w,x,y,z} W(w,y)W(y,z)W(z,x)W(w,x)\right)^2 = t(C_4, W)^2 \ge t(K_2, W)^8,$$
with the last step due to Theorem 5.2.1. Therefore 𝑡 (𝐹, 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 7 and hence 𝐹 is
Sidorenko. □
Remark 5.2.10 (Flag algebra). The above examples were all simple enough to be found
by hand. As mentioned earlier, every application of the Cauchy–Schwarz inequality can be
rewritten in the form of a sum of squares. One could actually search for these sum-of-squares
proofs more systematically using a computer program. This idea, first introduced
by Razborov (2007), can be combined with other sophisticated methods to determine the
lower boundary of the edge-triangle region (Razborov 2008). Razborov coined the term flag
algebra to describe a formalization of such calculations. The technique is also sometimes
called graph algebra, Cauchy–Schwarz calculus, or sum-of-squares proofs.
Conceptually, the idea is that we are looking for all the ways to obtain nonnegative linear
combinations of squared expressions. In a typical application, one is asked to solve an
Here 𝑎, 𝑏, 𝑐 ∈ ℝ are constants (to be chosen). We can expand the above expression, and then, for instance, replace
$$\left(\int_{u,w} G_{x,y,z}(u, w)\right)^2 \quad \text{by} \quad \int_{u,w,u',w'} G_{x,y,z}(u, w)\, G_{x,y,z}(u', w').$$
Let us mention another nice result obtained using the flag algebra method.
What is the maximum possible number of induced copies of a given graph 𝐻 among all
𝑛-vertex graphs? (Pippenger & Golumbic 1975)
The optimal limiting density (as a fraction of $\binom{n}{v(H)}$, as 𝑛 → ∞) is called the inducibility
of the graph 𝐻. They conjectured that for every 𝑘 ≥ 5, the inducibility of a 𝑘-cycle is 𝑘!/(𝑘^𝑘 − 𝑘),
obtained by an iterated blow-up of a 𝑘-cycle (𝑘 = 5 illustrated below; in the limit there should
be infinitely many fractal-like iterations).
The conjecture for 5-cycles was proved by using flag algebra methods combined with addi-
tional “stability” methods (Balogh, Hu, Lidický, & Pfender 2016). The constant factor in the
following theorem is tight.
Although the flag algebra method has successfully solved several extremal problems, in
many interesting cases, the method does not give a tight bound. Nevertheless, for many open
extremal problems, such as the tetrahedron hypergraph Turán problem, the best known bound
comes from this approach.
Remark 5.2.13 (Incompleteness). Can every true linear inequality for graph homomor-
phism densities be proved via Cauchy–Schwarz/sum-of-squares?
Before giving the answer, we first discuss classical results about real polynomials. Suppose
𝑝(𝑥1 , . . . , 𝑥 𝑛 ) is a real polynomial such that 𝑝(𝑥 1 , . . . , 𝑥 𝑛 ) ≥ 0 for all 𝑥 1 , . . . , 𝑥 𝑛 ∈ R. Can
such a nonnegative polynomial always be written as a sum of squares? Hilbert (1888; 1893)
proved that the answer is yes for 𝑛 ≤ 2 and no in general for 𝑛 ≥ 3. The first explicit
counterexample was given by Motzkin (1967):
𝑝(𝑥, 𝑦) = 𝑥 4 𝑦 2 + 𝑥 2 𝑦 4 + 1 − 3𝑥 2 𝑦 2
is always nonnegative due to the AM–GM inequality, but it cannot be written as a sum of
squares of polynomials. Solving Hilbert's 17th problem, Artin (1927) proved that every polynomial
𝑝(𝑥₁, …, 𝑥ₙ) ≥ 0 can be written as a sum of squares of rational functions, meaning that
there is some nonzero polynomial 𝑞 such that 𝑝𝑞² can be written as a sum of squares of polynomials.
Exercise 5.2.16. Prove that 𝐾4− is common, where 𝐾4− is 𝐾4 with one edge removed.
Exercise 5.2.17. Prove that every path is Sidorenko, by extending the proof of Theo-
rem 5.3.4.
Exercise 5.2.18 (A lower bound on clique density). Show that for every positive integer
𝑟 ≥ 3, and graphon 𝑊, writing 𝑝 = 𝑡 (𝐾2 , 𝑊),
𝑡 (𝐾𝑟 , 𝑊) ≥ 𝑝(2𝑝 − 1) (3𝑝 − 2) · · · ((𝑟 − 1) 𝑝 − (𝑟 − 2)) .
Note that this inequality is tight when 𝑊 is the associated graphon of a clique.
Exercise 5.2.19 (Triangle vs. diamond). Prove there is a function 𝑓 : [0, 1] → [0, 1]
with 𝑓 (𝑥) ≥ 𝑥 2 and lim 𝑥→0 𝑓 (𝑥)/𝑥 2 = ∞ such that
𝑡 (𝐾4− , 𝑊) ≥ 𝑓 (𝑡 (𝐾3 , 𝑊))
for all graphons 𝑊. Here 𝐾4− is 𝐾4 with one edge removed.
Hint: Apply the triangle removal lemma
5.3 Hölder
Hölder’s inequality is a generalization of the Cauchy–Schwarz inequality. It says that given
𝑝 1 , . . . , 𝑝 𝑘 ≥ 1 with 1/𝑝 1 + · · · +1/𝑝 𝑘 = 1, and real-valued functions 𝑓1 , . . . , 𝑓 𝑘 on a common
space, we have
$$\int f_1 f_2 \cdots f_k \le \|f_1\|_{p_1} \cdots \|f_k\|_{p_k},$$
Lemma 5.3.2
𝑡 (𝐾𝑠,1 , 𝑊) ≥ 𝑡 (𝐾2 , 𝑊) 𝑠 .
Lemma 5.3.3
𝑡 (𝐾𝑠,𝑡 , 𝑊) ≥ 𝑡 (𝐾𝑠,1 , 𝑊) 𝑡 .
Theorem 5.3.4
The 3-edge path is Sidorenko.
Let us give two short proofs that both appeared as answers to a MathOverflow question
https://mathoverflow.net/q/189222. Later in Section 5.5 we will see another proof
using the entropy method.
The first proof is a special case of a more general technique by Sidorenko (1991).
[Figure: the 3-edge path 𝑤–𝑥–𝑦–𝑧.]
First proof that the 3-edge path is Sidorenko. Let 𝑃₄ be the 3-edge path and let 𝑊 be a graphon.
Let $g(x) = \int_y W(x,y)$, representing the "degree" of vertex 𝑥. We have
$$t(P_4, W) = \int_{w,x,y,z} W(x,w)W(x,y)W(z,y) = \int_{x,y,z} g(x)W(x,y)W(z,y) = \int_y \left(\int_x g(x)W(x,y)\right)\left(\int_x W(x,y)\right).$$
By the Cauchy–Schwarz inequality (applied for each fixed 𝑦 to the functions $\sqrt{g(x)W(x,y)}$ and $\sqrt{W(x,y)}$), followed by the Cauchy–Schwarz inequality in 𝑦,
$$t(P_4, W) \ge \int_y \left(\int_x \sqrt{g(x)}\, W(x,y)\right)^2 \ge \left(\int_{x,y} \sqrt{g(x)}\, W(x,y)\right)^2 = \left(\int_x g(x)^{3/2}\right)^2 \ge \left(\int_x g(x)\right)^3 = \left(\int_{x,y} W(x,y)\right)^3,$$
where the final inequality follows from the convexity of $t \mapsto t^{3/2}$. □
Note that
$$\int_{x,y} \frac{W(x,y)}{g(x)} = \int_x \frac{g(x)}{g(x)} = 1, \qquad \text{and similarly} \qquad \int_{x,y} \frac{W(x,y)}{g(y)} = 1.$$
So by Hölder's inequality,
$$t(P_4, W) = \int_{x,y} g(x)W(x,y)g(y) = \left(\int_{x,y} g(x)W(x,y)g(y)\right)\left(\int_{x,y} \frac{W(x,y)}{g(x)}\right)\left(\int_{x,y} \frac{W(x,y)}{g(y)}\right) \ge \left(\int_{x,y} W(x,y)\right)^3. \qquad \square$$
Note that a straightforward application of Hölder's inequality, when 𝑋, 𝑌, 𝑍 are probability
spaces (so that $\int_{x,y,z} f(x,y) = \int_{x,y} f(x,y)$), would yield
$$\int_{x,y,z} f(x,y)\, g(x,z)\, h(y,z) \le \|f\|_3 \|g\|_3 \|h\|_3.$$
Next, we apply the Cauchy–Schwarz inequality to the variable 𝑦 (this affects 𝑓 and ℎ while
leaving 𝑔 intact). Continuing the above inequality,
$$\le \int_z \left(\int_{x,y} f(x,y)^2\right)^{1/2} \left(\int_x g(x,z)^2\right)^{1/2} \left(\int_y h(y,z)^2\right)^{1/2}.$$
Finally, we apply the Cauchy–Schwarz inequality to the variable 𝑧 (this affects 𝑔 and ℎ while
leaving 𝑓 intact). Continuing the above inequality,
$$\le \left(\int_{x,y} f(x,y)^2\right)^{1/2} \left(\int_{x,z} g(x,z)^2\right)^{1/2} \left(\int_{y,z} h(y,z)^2\right)^{1/2}.$$
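The following minimal numerical sketch (not from the text) checks the resulting inequality ∫ 𝑓(𝑥,𝑦)𝑔(𝑥,𝑧)ℎ(𝑦,𝑧) ≤ ‖𝑓‖₂‖𝑔‖₂‖ℎ‖₂ on a finite probability space, with all integrals interpreted as averages; the variable names are ad hoc.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8                                  # each variable ranges over a uniform space of size m
f, g, h = (rng.random((m, m)) for _ in range(3))

lhs = np.einsum("xy,xz,yz->", f, g, h) / m**3          # average of f(x,y) g(x,z) h(y,z)
norm2 = lambda a: np.sqrt((a ** 2).mean())             # L^2 norm w.r.t. the averaging measure
print(lhs <= norm2(f) * norm2(g) * norm2(h) + 1e-12)   # True
```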
& Whitney (1949). See Exercise 5.3.9 below. It has important applications in combinatorics.
A powerful generalization known as Shearer’s entropy inequality will be discussed in
Section 5.5. Also see Exercise 5.5.19 for a strengthening of the projection inequalities.
Now let us state a more general form of Theorem 5.3.5, which can be proved using the
same techniques. The key point of the inequality in Theorem 5.3.5 is that each variable
(i.e., 𝑥, 𝑦, and 𝑧) is contained in exactly 2 of the factors (i.e., 𝑓 (𝑥, 𝑦), 𝑔(𝑥, 𝑧), and ℎ(𝑦, 𝑧)).
Everything works the same way as long as each variable is contained in exactly 𝑘 factors, as
long as we use 𝐿 𝑘 norms on the right-hand side.
For example,
$$\int_{u,v,w,x,y,z} f_1(u,v) f_2(v,w) f_3(w,z) f_4(x,y) f_5(y,z) f_6(z,u) f_7(u,x) f_8(v,x) f_9(w,y) \le \prod_{i=1}^{9} \|f_i\|_3.$$
Here the factors in the integral correspond to the edges of a 3-regular graph on the vertex set {𝑢, 𝑣, 𝑤, 𝑥, 𝑦, 𝑧}. In particular,
every variable lies in exactly 3 factors.
More generally, each function 𝑓ᵢ can take as input any number of variables, as long as
every variable appears in exactly 𝑘 of the functions. For example,
$$\int_{w,x,y,z} f(w,x,y)\, g(w,y,z)\, h(x,z) \le \|f\|_2 \|g\|_2 \|h\|_2.$$
Furthermore, if every 𝑋𝑖 is a probability space, then we can relax the hypothesis to “each
element of [𝑚] appears in at most 𝑘 different 𝐼𝑖 ’s.”
Exercise 5.3.8. Prove Theorem 5.3.7 by generalizing the proof of Theorem 5.3.5.
The next exercise generalizes the projection inequality from Remark 5.3.6. Also see
Exercise 5.5.19 for a strengthening.
Exercise 5.3.9 (Projection inequalities). Let 𝐼₁, …, 𝐼_ℓ ⊆ [𝑑] be such that each element of
[𝑑] appears in exactly 𝑘 of the 𝐼ᵢ's. Prove that for any compact body 𝐾 ⊆ ℝ^𝑑, with |·|
denoting volume in the appropriate dimension,
$$|K|^k \le |\pi_{I_1}(K)| \cdots |\pi_{I_\ell}(K)|.$$
The version of Theorem 5.3.7 with each 𝑋𝑖 being a probability space is useful for graphons.
In particular, since
$$\|W\|_k^k = \int W^k \le t(K_2, W),$$
The answer turns out to be 𝐺 = 𝐾 𝑑,𝑑 . We can also take 𝐺 to be a disjoint union of copies
of 𝐾 𝑑,𝑑 ’s, and this would not change 𝑖(𝐺) 1/𝑣 (𝐺) . This result, stated below, was shown by
Kahn (2001) for bipartite regular graphs 𝐺, and later extended by Zhao (2010) to all regular
graphs 𝐺.
The set of independent sets of 𝐺 is in bijection with the set of graph homomorphisms
from 𝐺 to the graph consisting of two adjacent vertices, one of which has a loop.
Indeed, a map between their vertex sets is a graph homomorphism if and only if the set of
vertices of 𝐺 mapped to the non-looped vertex is an independent set of 𝐺.
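Here is a minimal brute-force Python check of this bijection on a small example (an illustration, not from the text); the helper names hom_count and independent_set_count are ad hoc, and the target graph is encoded by its adjacency sets, with the looped vertex labeled 1.

```python
from itertools import product

# H: vertex 1 has a loop and is adjacent to vertex 0; vertex 0 has no loop.
H_adj = {0: {1}, 1: {0, 1}}

def hom_count(vertices, edges):
    """Number of homomorphisms from G = (vertices, edges) to H."""
    count = 0
    for values in product(H_adj, repeat=len(vertices)):
        phi = dict(zip(vertices, values))
        if all(phi[v] in H_adj[phi[u]] for u, v in edges):
            count += 1
    return count

def independent_set_count(vertices, edges):
    """Brute-force count of independent sets of G."""
    count = 0
    for bits in product([0, 1], repeat=len(vertices)):
        S = {v for v, b in zip(vertices, bits) if b}
        if all(not (u in S and v in S) for u, v in edges):
            count += 1
    return count

V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]  # G = C4
print(hom_count(V, E), independent_set_count(V, E))     # 7 7
```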
Let us first prove Theorem 5.3.14 for bipartite regular 𝐺. The following more general in-
equality was shown by Galvin & Tetali (2004). It implies the bipartite case of Theorem 5.3.14
by the above discussion.
Theorem 5.3.16
For any 𝑑-regular bipartite graph 𝐹 and any graphon 𝑊,
$$t(F, W) \le t(K_{d,d}, W)^{e(F)/d^2}.$$
Let us prove this theorem in the case 𝐹 = 𝐶6 to illustrate the technique more concretely.
The general proof is basically the same. Let
∫
𝑓 (𝑥 1 , 𝑥2 ) = 𝑊 (𝑥1 , 𝑦)𝑊 (𝑥 2 , 𝑦).
𝑦
This function should be thought of as the codegree of the vertices 𝑥₁ and 𝑥₂. Then, grouping the
factors in the integral according to their right endpoint, we have
[Figure: 𝐶₆ drawn with parts {𝑥₁, 𝑥₂, 𝑥₃} and {𝑦₁, 𝑦₂, 𝑦₃}.]
$$t(C_6, W) = \int_{x_1,x_2,x_3,y_1,y_2,y_3} W(x_1,y_1)W(x_2,y_1)W(x_1,y_2)W(x_3,y_2)W(x_2,y_3)W(x_3,y_3)$$
$$= \int_{x_1,x_2,x_3} \left(\int_{y_1} W(x_1,y_1)W(x_2,y_1)\right)\left(\int_{y_2} W(x_1,y_2)W(x_3,y_2)\right)\left(\int_{y_3} W(x_2,y_3)W(x_3,y_3)\right)$$
$$= \int_{x_1,x_2,x_3} f(x_1,x_2)\, f(x_1,x_3)\, f(x_2,x_3) \le \|f\|_2^3 = t(C_4, W)^{3/2} = t(K_{2,2}, W)^{3/2},$$
where the inequality is the generalized Hölder inequality (Theorem 5.3.5), and the last step uses
$$\|f\|_2^2 = \int_{x_1,x_2} f(x_1,x_2)^2 = \int_{x_1,x_2,y_1,y_2} W(x_1,y_1)W(x_2,y_1)W(x_1,y_2)W(x_2,y_2) = t(C_4, W).$$
[Figure: 𝐶₄ = 𝐾_{2,2} with parts {𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₂}.]
This proves Theorem 5.3.16 in the case 𝐹 = 𝐶6 . The theorem in general can be proved via
a similar calculation.
Exercise 5.3.17. Complete the proof of Theorem 5.3.16 by generalizing the above argu-
ment.
Remark 5.3.18. Kahn (2001) first proved the bipartite case of Theorem 5.3.14 using
Shearer’s entropy inequality, which we will see in Section 5.5. His technique was extended
by Galvin & Tetali (2004) to prove Theorem 5.3.15. The proof using generalized Hölder’s
inequality presented here was given by Lubetzky & Zhao (2017).
So far we proved Theorem 5.3.14 for bipartite regular graphs. To prove it for all regular
graphs, we apply the following inequality by Zhao (2010). Here 𝐺 × 𝐾2 (tensor product) is
the bipartite double cover of 𝐺. An example is illustrated below:
[Figure: a graph 𝐺 and its bipartite double cover 𝐺 × 𝐾₂.]
The vertex set of 𝐺 × 𝐾2 is 𝑉 (𝐺) × {0, 1}. Its vertices are labeled 𝑣 𝑖 with 𝑣 ∈ 𝑉 (𝐺) and
𝑖 ∈ {0, 1}. Its edges are 𝑢 0 𝑣 1 for all 𝑢𝑣 ∈ 𝐸 (𝐺). Note that 𝐺 × 𝐾2 is always a bipartite graph.
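A minimal Python sketch of this construction (not from the text; the function name is ad hoc):

```python
def bipartite_double_cover(vertices, edges):
    """Tensor product G x K2: vertices (v, i) for v in V(G), i in {0, 1},
    with an edge between (u, 0) and (v, 1) for every edge uv of G (both ways)."""
    new_vertices = [(v, i) for v in vertices for i in (0, 1)]
    new_edges = set()
    for u, v in edges:
        new_edges.add(((u, 0), (v, 1)))
        new_edges.add(((v, 0), (u, 1)))
    return new_vertices, sorted(new_edges)

# the double cover of the triangle K3 is a 6-cycle
V2, E2 = bipartite_double_cover([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
print(len(V2), len(E2))  # 6 6
```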
Assuming Theorem 5.3.19, we can now prove Theorem 5.3.14 by reducing the statement
to the bipartite case, which we proved earlier. Indeed, for every 𝑑-regular graph 𝐺,
$$i(G) \le i(G \times K_2)^{1/2} \le i(K_{d,d})^{n/(2d)},$$
where the last step follows from applying Theorem 5.3.14 to the bipartite graph 𝐺 × 𝐾2 .
Proof of Theorem 5.3.19. Let 2𝐺 denote a disjoint union of two copies of 𝐺. Label its
vertices by 𝑣 𝑖 with 𝑣 ∈ 𝑉 and 𝑖 ∈ {0, 1} so that its edges are 𝑢 𝑖 𝑣 𝑖 with 𝑢𝑣 ∈ 𝐸 (𝐺) and
𝑖 ∈ {0, 1}. We will give an injection 𝜙 : 𝐼 (2𝐺) → 𝐼 (𝐺 × 𝐾2 ). Recall that 𝐼 (𝐺) is the set of
independent sets of 𝐺. The injection would imply 𝑖(𝐺) 2 = 𝑖(2𝐺) ≤ 𝑖(𝐺 × 𝐾2 ) as desired.
Fix an arbitrary order on all subsets of 𝑉 (𝐺). Let 𝑆 be an independent set of 2𝐺. Let
𝐸 bad (𝑆) := {𝑢𝑣 ∈ 𝐸 (𝐺) : 𝑢 0 , 𝑣 1 ∈ 𝑆}.
Note that 𝐸 bad (𝑆) is a bipartite subgraph of 𝐺, since each edge of 𝐸 bad has exactly one
endpoint in {𝑣 ∈ 𝑉 (𝐺) : 𝑣 0 ∈ 𝑆} but not both (or else 𝑆 would not be independent). Let 𝐴
denote the first subset (in the previously fixed ordering) of 𝑉 (𝐺) such that all edges in 𝐸 bad (𝑆)
have one vertex in 𝐴 and the other outside 𝐴. Define 𝜙(𝑆) to be the subset of 𝑉 (𝐺) × {0, 1}
obtained by “swapping” the pairs in 𝐴. That is, for all 𝑣 ∈ 𝐴, 𝑣 𝑖 ∈ 𝜙(𝑆) if and only if 𝑣 1−𝑖 ∈ 𝑆
for each 𝑖 ∈ {0, 1}, and for all 𝑣 ∉ 𝐴, 𝑣 𝑖 ∈ 𝜙(𝑆) if and only if 𝑣 𝑖 ∈ 𝑆 for each 𝑖 ∈ {0, 1}. It is
not hard to verify that 𝜙(𝑆) is an independent set in 𝐺 × 𝐾2 . The swapping procedure fixes
the “bad” edges.
[Figure: an independent set of 2𝐺 and the corresponding independent set of 𝐺 × 𝐾₂ obtained by swapping along 𝐴.]
It remains to verify that 𝜙 is an injection. For every 𝑆 ∈ 𝐼 (2𝐺), once we know 𝑇 = 𝜙(𝑆),
we can recover 𝑆 by first setting
$$E'_{\mathrm{bad}}(T) = \{uv \in E(G) : u_i, v_i \in T \text{ for some } i \in \{0,1\}\},$$
so that $E_{\mathrm{bad}}(S) = E'_{\mathrm{bad}}(T)$, and then finding 𝐴 as earlier and swapping the pairs in 𝐴 back.
(Remark: it follows that 𝑇 ∈ 𝐼(𝐺 × 𝐾₂) lies in the image of 𝜙 if and only if $E'_{\mathrm{bad}}(T)$ is
bipartite.) □
Remark 5.3.20 (Reverse Sidorenko). Does Theorem 5.3.15 generalize to all regular graphs
𝐺 in the way that Theorem 5.3.14 does? Unfortunately, no. For example, when 𝐻 consists of two
isolated looped vertices (two vertices, each with a loop, and no edge between them), hom(𝐺, 𝐻) = 2^{𝑐(𝐺)}, with 𝑐(𝐺) being the number of connected components
of 𝐺. So hom(𝐺, 𝐻)^{1/𝑣(𝐺)} is maximized among 𝑑-regular graphs 𝐺 by 𝐺 = 𝐾_{𝑑+1}, which is
the connected 𝑑-regular graph with the fewest vertices.
Theorem 5.3.15 actually extends to every triangle-free regular graph 𝐺. Furthermore, for
every regular graph 𝐺 that contains a triangle, there is some graph 𝐻 for which the inequality in
Theorem 5.3.15 fails.
There are several interesting families of graphs 𝐻 where Theorem 5.3.15 is known to
extend to all regular graphs 𝐺. Notably, this is true for 𝐻 = 𝐾𝑞 , which is significant since
hom(𝐺, 𝐾𝑞 ) is the number of proper 𝑞-colorings of 𝐺.
There are also generalizations of the above to non-regular graphs. For example, for a graph
𝐺 without isolated vertices, letting 𝑑_𝑢 denote the degree of 𝑢 ∈ 𝑉(𝐺), we have
$$i(G) \le \prod_{uv \in E(G)} i(K_{d_u, d_v})^{1/(d_u d_v)}.$$
And similarly for the number of proper 𝑞-colorings. In fact, the results mentioned in this
remark about regular graphs are proved by induction on vertices of 𝐺, and thus require
considering the larger family of not necessarily regular graphs 𝐺.
The results discussed in this remark are due to Sah, Sawhney, Stoner, & Zhao (2019;
2020). The term reverse Sidorenko inequalities was introduced to describe inequalities such
as $t(F, W)^{1/e(F)} \le t(K_{d,d}, W)^{1/d^2}$, which mirror the inequality $t(F, W)^{1/e(F)} \ge t(K_2, W)$ in
Sidorenko’s conjecture. Also see the earlier survey by Zhao (2017) for discussions of related
results and open problems.
We already know through the quasirandom graph equivalences (Theorem 3.1.1) that 𝐶4 is
forcing. The following exercise generalizes this fact.
Exercise 5.3.21. Prove that 𝐾 𝑠,𝑡 is forcing whenever 𝑠, 𝑡 ≥ 2.
Exercise 5.3.22. Let 𝐹 be a bipartite graph with vertex bipartition 𝐴 ∪ 𝐵 such that every
vertex in 𝐵 has degree 𝑑. Let 𝑑_𝑢 denote the degree of 𝑢 in 𝐹. Prove that for every graphon 𝑊,
$$t(F, W) \le \prod_{uv \in E(F)} t(K_{d_u, d_v}, W)^{1/(d_u d_v)}.$$
Exercise 5.3.23 (Sidorenko for 3-edge path with vertex weights). Let 𝑊 : [0, 1]² → [0, ∞)
be a measurable function (not necessarily symmetric). Let 𝑝, 𝑞, 𝑟, 𝑠 : [0, 1] → [0, ∞) be
measurable functions. Prove that
$$\int_{w,x,y,z} p(w)q(x)r(y)s(z)\, W(x,w)W(x,y)W(z,y) \ge \left(\int_{x,y} \big(p(x)q(x)r(y)s(y)\big)^{1/3}\, W(x,y)\right)^3.$$
Exercise 5.3.24. For a graph 𝐺, let 𝑓_𝑞(𝐺) denote the number of maps 𝑓 : 𝑉(𝐺) → {0, 1, …, 𝑞}
such that 𝑓(𝑢) + 𝑓(𝑣) ≤ 𝑞 for every 𝑢𝑣 ∈ 𝐸(𝐺). Prove that for every 𝑛-vertex 𝑑-regular
graph 𝐺 (not necessarily bipartite),
$$f_q(G) \le f_q(K_{d,d})^{n/(2d)}.$$
5.4 Lagrangian
Another proof of Turán’s theorem
Here is another proof of Turán’s theorem due to Motzkin & Straus (1965). It can be viewed
as a continuous/analytic analogue of the Zykov symmetrization proof of Turán’s theorem
from Section 1.2 (the third proof there).
Proof. Let 𝐺 be a 𝐾_{𝑟+1}-free graph on vertex set [𝑛]. Consider the function
$$f(x_1, \ldots, x_n) = \sum_{ij \in E(G)} x_i x_j.$$
It is a useful tool for certain hypergraph Turán problems. The above proof of Turán’s theorem
shows that for every graph 𝐺, 𝜆(𝐺) = (1 − 1/𝜔(𝐺))/2, where 𝜔(𝐺) is the size of the largest
clique in 𝐺. A maximizing 𝑥 has coordinate 1/𝜔(𝐺) on vertices of the clique and zero
elsewhere.
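To illustrate the identity 𝜆(𝐺) = (1 − 1/𝜔(𝐺))/2 numerically, here is a rough Python sketch (not from the text): it estimates the Lagrangian by evaluating the quadratic form at random points of the simplex and at uniform weights on every vertex subset; the function name and parameters are ad hoc.

```python
import itertools
import random

def lagrangian_estimate(n, edges, samples=100000, rng=random):
    """Crude estimate of lambda(G) = max over the simplex of sum_{ij in E} x_i x_j."""
    def value(x):
        return sum(x[i] * x[j] for i, j in edges)

    best = 0.0
    for _ in range(samples):                       # random points of the simplex
        w = [rng.expovariate(1.0) for _ in range(n)]
        s = sum(w)
        best = max(best, value([wi / s for wi in w]))
    for k in range(1, n + 1):                      # uniform weights on each vertex subset
        for S in itertools.combinations(range(n), k):
            best = max(best, value([1.0 / k if i in S else 0.0 for i in range(n)]))
    return best

# 5-cycle plus one chord: omega(G) = 3, so lambda(G) should be (1 - 1/3)/2 = 1/3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
print(round(lagrangian_estimate(5, edges), 3))     # 0.333
```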
As an alternate but equivalent perspective, the above proof can be rephrased in terms of
maximizing the edge density among 𝐾_{𝑟+1}-free vertex-weighted graphs (with vertex weights
given by the vector 𝑥 above). The proof shifts weight between non-adjacent vertices without
decreasing the edge density, and this process preserves 𝐾_{𝑟+1}-freeness.
$$e_3(x_1, \ldots, x_n) = \sum_{1 \le i < j < k \le n} x_i x_j x_k, \qquad \ldots, \qquad e_n(x_1, \ldots, x_n) = x_1 \cdots x_n.$$
is true for every graph 𝐺 if and only if it is true with 𝐺 = 𝐾𝑛 for every positive integer 𝑛.
More explicitly, the above inequality holds for all graphs 𝐺 if and only if
$$\sum_{r=1}^{\ell} c_r \cdot \frac{n(n-1)\cdots(n-r+1)}{n^r} \ge 0 \qquad \text{for every } n \in \mathbb{N}.$$
Since this is a single variable polynomial in 𝑛, it is usually easy to check this inequality. We
will see some examples right after the proof.
Proof. The only non-trivial direction is the “if” implication. Suppose the displayed inequality
holds for all cliques 𝐺. Let 𝐺 be an arbitrary graph with vertex set [𝑛]. Let
$$f(x_1, \ldots, x_n) = \sum_{r=1}^{\ell} r!\, c_r \sum_{\substack{\{i_1, \ldots, i_r\} \\ \text{an } r\text{-clique in } G}} x_{i_1} \cdots x_{i_r}.$$
So
$$f\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = \sum_{r=1}^{\ell} c_r\, t(K_r, G).$$
By compactness, we can assume that the minimum is attained at some 𝑥. Among all
minimizing 𝑥, choose one with the smallest support (i.e., the number of nonzero coordinates).
As in the previous proof, if 𝑖 𝑗 ∉ 𝐸 (𝐺) for some pair of distinct 𝑥𝑖 , 𝑥 𝑗 > 0, then, replacing
(𝑥𝑖 , 𝑥 𝑗 ) by (𝑠, 𝑥𝑖 + 𝑥 𝑗 − 𝑠), 𝑓 changes linearly in 𝑠. Since 𝑓 is already minimized at 𝑥, it must
stay constant as 𝑠 changes. So we can replace (𝑥𝑖 , 𝑥 𝑗 ) by (𝑥𝑖 + 𝑥 𝑗 , 0), which keeps 𝑓 the same
while decreasing the number of nonzero coordinates of 𝑥. Thus the support of 𝑥 is a clique
in 𝐺. Suppose 𝑥 is supported on the first 𝑘 coordinates. Then 𝑓 is a linear combination of
elementary symmetric polynomials in 𝑥₁, …, 𝑥_𝑘. By Lemma 5.4.3, 𝑥₁ = ⋯ = 𝑥_𝑘 = 1/𝑘.
Then $f(x) = \sum_{r=1}^{\ell} c_r\, t(K_r, K_k) \ge 0$ by hypothesis. □
Remark 5.4.5. This proof technique can be adapted to show the stronger result that among
all graphs 𝐺 with a given number of vertices, the quantity $\sum_{r=1}^{\ell} c_r\, t(K_r, G)$ is minimized
when 𝐺 is a complete multipartite graph. Compare with the Zykov symmetrization proof of Turán's
theorem (Theorem 1.2.4).
The theorem only considers linear inequalities between clique densities. The statement
fails in general for inequalities with other graph densities (why?).
Theorem 5.4.4 can be equivalently stated in terms of the convex hull of the region of all
possible clique density tuples.
Exercise 5.4.9. For each graph 𝐹, let 𝑐 𝐹 ∈ R be such that 𝑐 𝐹 ≥ 0 whenever 𝐹 is not
a clique (no restrictions when 𝐹 is a clique). Assume that 𝑐 𝐹 ≠ 0 for finitely many 𝐹’s.
Prove that the inequality
$$\sum_F c_F\, t_{\mathrm{inj}}(F, G) \ge 0$$
is true for every graph 𝐺 if and only if it is true with 𝐺 = 𝐾𝑛 for every positive integer 𝑛.
Exercise 5.4.10 (Cliquey edges). Let 𝑛, 𝑟, 𝑡 be nonnegative integers. Show that every
𝑛-vertex graph with at least $(1 - \frac{1}{r})\frac{n^2}{2} + t$ edges contains at least 𝑟𝑡 edges that each belong to a
copy of 𝐾_{𝑟+1}.
Hint: Rephrase the statement as a linear inequality between the number of edges and the number of cliquey edges in every graph.
Exercise 5.4.11 (A hypergraph Turán density). Let 𝐹 be the 3-graph with 10 vertices and
6 edges illustrated below (lines denote edges). Prove that the hypergraph Turán density of
𝐹 is 2/9.
Exercise 5.4.12∗ (Maximizing 𝐾1,2 density). Prove that, for every 𝑝 ∈ [0, 1], among all
graphons 𝑊 with 𝑡 (𝐾2 , 𝑊) = 𝑝, the maximum possible value of 𝑡 (𝐾1,2 , 𝑊) is attained by
either a “clique” or a “hub” graphon, illustrated below.
[Figure: the clique graphon 𝑊(𝑥, 𝑦) = 1_{max{𝑥,𝑦} ≤ 𝑎} and the hub graphon 𝑊(𝑥, 𝑦) = 1_{min{𝑥,𝑦} ≤ 𝑎}.]
5.5 Entropy
In this section, we explain how to use entropy to prove certain graph homomorphism in-
equalities.
Entropy basics
Proof. The function 𝑓(𝑥) = −𝑥 log₂ 𝑥 is concave for 𝑥 ∈ [0, 1]. Writing 𝑝_𝑠 = ℙ(𝑋 = 𝑠), we have, by concavity,
$$H(X) = \sum_{s \in S} f(p_s) \le |S| \cdot f\!\left(\frac{1}{|S|}\sum_{s \in S} p_s\right) = |S| \cdot f\!\left(\frac{1}{|S|}\right) = \log_2 |S|. \qquad \square$$
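As a small numerical companion (not from the text), the following Python snippet computes entropy for finitely supported distributions and checks the uniform bound H(𝑋) ≤ log₂|𝑆| as well as subadditivity H(𝑋, 𝑌) ≤ H(𝑋) + H(𝑌); the distributions are toy examples.

```python
from math import log2

def H(dist):
    """Shannon entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# uniform bound: H(X) <= log2 |S|
px = {"a": 0.5, "b": 0.25, "c": 0.25}
print(H(px), log2(3))                        # 1.5  1.584...

# subadditivity: H(X, Y) <= H(X) + H(Y) for a correlated pair
pxy = {("a", 0): 0.4, ("a", 1): 0.1, ("b", 0): 0.1, ("b", 1): 0.4}
pX, pY = {"a": 0.5, "b": 0.5}, {0: 0.5, 1: 0.5}
print(H(pxy), H(pX) + H(pY))                 # 1.721...  2.0
```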
We write 𝑯(𝑿, 𝒀) for the entropy of the joint random variable (𝑋, 𝑌). This means that
$$H(X, Y) := -\sum_{(x,y)} \mathbb{P}(X = x, Y = y) \log_2 \mathbb{P}(X = x, Y = y).$$
In particular,
𝐻 (𝑋, 𝑌 ) = 𝐻 (𝑋) + 𝐻 (𝑌 ) if 𝑋 and 𝑌 are independent.
We can similarly define 𝐻 (𝑋, 𝑌 , 𝑍), and so on.
= 𝐻 (𝑋, 𝑌 ) − 𝐻 (𝑌 ). □
$$\ge f\left(\sum_{x,y} p(x,y) \cdot \frac{p(x)\,p(y)}{p(x,y)}\right) = f(1) = 0.$$
More generally, by iterating the above inequality for two random variables, we have
𝐻 (𝑋1 , . . . , 𝑋𝑛 ) ≤ 𝐻 (𝑋1 , . . . , 𝑋𝑛−1 ) + 𝐻 (𝑋𝑛 )
≤ 𝐻 (𝑋1 , . . . , 𝑋𝑛−2 ) + 𝐻 (𝑋𝑛−1 ) + 𝐻 (𝑋𝑛 )
≤ · · · ≤ 𝐻 (𝑋1 ) + · · · + 𝐻 (𝑋𝑛 ). □
Remark 5.5.7. The nonnegative quantity
𝐼 (𝑋; 𝑌 ) := 𝐻 (𝑋) + 𝐻 (𝑌 ) − 𝐻 (𝑋, 𝑌 )
is called mutual information. Intuitively, it measures the amount of common information
between 𝑋 and 𝑌 .
Theorem 5.5.10
The 3-edge path is Sidorenko.
Proof. Let 𝑃4 denote the 3-edge path and 𝐺 a graph. An element of Hom(𝑃4 , 𝐺) is a walk
of length three. We choose randomly a walk 𝑋𝑌 𝑍𝑊 in 𝐺 as follows:
• 𝑋𝑌 is a uniform random edge of 𝐺 (by this we mean first choosing an edge of 𝐺
uniformly at random, and then let 𝑋 be a uniformly chosen endpoint of this edge, and
then 𝑌 the other endpoint);
• 𝑍 is a uniform random neighbor of 𝑌 ;
• 𝑊 is a uniform random neighbor of 𝑍.
A key observation is that 𝑌 𝑍 is also distributed as a uniform random edge of 𝐺 (pause
and think about why). Indeed, conditioned on the choice of 𝑌 , the vertices 𝑋 and 𝑍 are both
independent and uniform neighbors of 𝑌 , so 𝑋𝑌 and 𝑌 𝑍 are identically distributed, and hence
𝑌 𝑍 is a uniform random edge of 𝐺.
Similarly, 𝑍𝑊 is distributed as uniform random edge.
Also, since 𝑋 and 𝑍 are conditionally independent given 𝑌
𝐻 (𝑍 |𝑋, 𝑌 ) = 𝐻 (𝑍 |𝑌 ) and 𝐻 (𝑊 |𝑋, 𝑌 , 𝑍) = 𝐻 (𝑊 |𝑍).
Furthermore,
𝐻 (𝑌 |𝑋) = 𝐻 (𝑍 |𝑌 ) = 𝐻 (𝑊 |𝑍)
This proves (5.6), and thus shows that 𝑃₄ is Sidorenko. Indeed, by the uniform bound,
$$\log_2 \hom(P_4, G) \ge H(X, Y, Z, W) \ge 3\log_2(2e(G)) - 2\log_2 v(G),$$
and hence
$$t(P_4, G) = \frac{\hom(P_4, G)}{v(G)^4} \ge \left(\frac{2e(G)}{v(G)^2}\right)^3 = t(K_2, G)^3. \qquad \square$$
Let us outline how to extend the above proof strategy from the 3-edge path to any tree 𝑇.
Define a 𝑻-branching random walk in a graph 𝐺 to be a random Φ ∈ Hom(𝑇, 𝐺) defined
by fixing an arbitrary root 𝑣 of 𝑇 (the choice of 𝑣 will not matter in the end). Then set Φ(𝑣)
to be a random vertex of 𝐺 with each vertex of 𝐺 chosen proportional to its degree. Then
extend Φ to a random homomorphism 𝑇 → 𝐺 one vertex at a time: if 𝑢 ∈ 𝑉 (𝑇) is already
mapped to Φ(𝑢) and its neighbor 𝑤 ∈ 𝑉 (𝑇) has not yet been mapped, then set Φ(𝑤) to
be a uniform random neighbor of Φ(𝑢), independent of all previous choices. The resulting
random Φ ∈ Hom(𝑇, 𝐺) has the following properties:
• for each edge of 𝑇, its image under Φ is a uniform random edge of 𝐺 and with the two
possible edge orientations equally likely; and
• for each vertex 𝑣 of 𝑇, conditioned on Φ(𝑣), the neighbors of 𝑣 in 𝑇 are mapped by Φ
to conditionally independent and uniform neighbors of Φ(𝑣) in 𝐺.
Furthermore, as in the proof of Theorem 5.5.10,
𝐻 (Φ) = 𝑒(𝑇) log2 (2𝑒(𝐺)) − (𝑒(𝑇) − 1)𝐻 (Φ(𝑣))
≥ 𝑒(𝑇) log2 (2𝑒(𝐺)) − (𝑒(𝑇) − 1) log2 𝑣(𝐺). (5.7)
(Exercise: fill in the details.) Together with the uniform bound 𝐻 (Φ) ≤ log2 hom(𝑇, 𝐺), we
proved the following.
Theorem 5.5.11
Every tree is Sidorenko.
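As a brute-force numerical check of the tree case (an illustration, not from the text), the following Python sketch computes homomorphism densities by enumeration and verifies 𝑡(𝑇, 𝐺) ≥ 𝑡(𝐾₂, 𝐺)^{𝑒(𝑇)} for the 3-edge path on a small random graph; the helper t_density is ad hoc.

```python
import random
from itertools import product

def t_density(F_edges, F_nvert, adj, n):
    """Homomorphism density t(F, G) = hom(F, G) / n^{v(F)} by brute force."""
    hom = sum(
        all(phi[v] in adj[phi[u]] for u, v in F_edges)
        for phi in product(range(n), repeat=F_nvert)
    )
    return hom / n**F_nvert

# a small random graph G
rng = random.Random(1)
n = 7
adj = {v: set() for v in range(n)}
for u in range(n):
    for v in range(u + 1, n):
        if rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)

t_path = t_density([(0, 1), (1, 2), (2, 3)], 4, adj, n)   # T = 3-edge path
t_edge = t_density([(0, 1)], 2, adj, n)                   # K2
print(t_path >= t_edge**3)                                # True
```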
We saw earlier that 𝐾𝑠,𝑡 is Sidorenko, which can be proved by two applications of Hölder’s
inequality (see Section 5.3). Here let us give another proof using entropy. This entropy proof
is subtler than the earlier Hölder’s inequality proof, but it will soon lead us more naturally to
the next generalization.
Theorem 5.5.12
Every complete bipartite graph is Sidorenko.
Let us demonstrate the proof for 𝐾2,2 for concreteness. The same proof extends to all 𝐾𝑠,𝑡 .
[Figure: 𝐾_{2,2} with parts {𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₂}.]
Proof that 𝐾2,2 is Sidorenko. As earlier, we construct a random element of Hom(𝐾2,2 , 𝐺).
Pick a random (𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 ) ∈ 𝑉 (𝐺) 4 with 𝑋𝑖𝑌 𝑗 ∈ 𝐸 (𝐺) for all 𝑖, 𝑗 as follows:
• 𝑋1𝑌1 is a uniform random edge;
• 𝑌2 is a uniform random neighbor of 𝑋1 ;
• 𝑋2 is a conditionally independent copy of 𝑋1 given (𝑌1 , 𝑌2 ).
The last point deserves some attention. It does not say that we choose a uniform random
common neighbor of 𝑌1 and 𝑌2 , as one might naively attempt. Instead, one can think of
the first two steps as defining the 𝐾1,2 -branching random walk for (𝑋1 , 𝑌1 , 𝑌2 ). Under this
distribution, we can first sample (𝑌1 , 𝑌2 ) according to its marginal, and then produce two
conditionally independent copies of 𝑋1 (with the second copy now called 𝑋2 ).
We have
$$H(X_1, X_2, Y_1, Y_2) = H(Y_1, Y_2) + H(X_1, X_2 \mid Y_1, Y_2) \qquad \text{[chain rule]}$$
$$= H(Y_1, Y_2) + 2H(X_1 \mid Y_1, Y_2) \qquad \text{[cond. indep.]}$$
$$= 2H(X_1, Y_1, Y_2) - H(Y_1, Y_2) \qquad \text{[chain rule]}$$
$$\ge 2\big(2\log_2(2e(G)) - \log_2 v(G)\big) - H(Y_1, Y_2) \qquad \text{[(5.7)]}$$
$$\ge 2\big(2\log_2(2e(G)) - \log_2 v(G)\big) - 2\log_2 v(G) \qquad \text{[uniform bound]}$$
$$= 4\log_2(2e(G)) - 4\log_2 v(G).$$
Together with the uniform bound 𝐻 (𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 ) ≤ log2 hom(𝐾2,2 , 𝐺), we deduce that 𝐾2,2
is Sidorenko. □
Exercise 5.5.13. Complete the proof of Theorem 5.5.12 for general 𝐾 𝑠,𝑡 .
The following result was first proved by Conlon, Fox, & Sudakov (2010) using the de-
pendent random choice technique. The entropy proof was found later by Li & Szegedy
(2011).
Theorem 5.5.14
Let 𝐹 be a bipartite graph that has a vertex adjacent to all vertices in the other part. Then
𝐹 is Sidorenko.
Let us illustrate the proof for the following graph 𝐹. The proof extends to the general case.
[Figure: the bipartite graph 𝐹 with parts {𝑥₀, 𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₂, 𝑦₃}, where 𝑥₀ is adjacent to 𝑦₁, 𝑦₂, 𝑦₃, 𝑥₁ is adjacent to 𝑦₁, 𝑦₂, and 𝑥₂ is adjacent to 𝑦₂, 𝑦₃.]
Proof that the above graph is Sidorenko. Pick (𝑋0 , 𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 , 𝑌3 ) ∈ 𝑉 (𝐺) 6 randomly as
follows:
• 𝑋0𝑌1 is a uniform random edge;
• 𝑌2 and 𝑌3 are independent uniform random neighbors of 𝑋0 ;
• 𝑋1 is a conditionally independent copy of 𝑋0 given (𝑌1 , 𝑌2 );
• 𝑋2 is a conditionally independent copy of 𝑋0 given (𝑌2 , 𝑌3 ).
We have the following properties:
• 𝑋0 , 𝑋1 , 𝑋2 are conditionally independent given (𝑌1 , 𝑌2 , 𝑌3 );
• 𝑋1 and (𝑋0 , 𝑌3 , 𝑋2 ) are conditionally independent given (𝑌1 , 𝑌2 );
• The distribution of (𝑋0 , 𝑌1 , 𝑌2 ) is identical to the distribution of (𝑋1 , 𝑌1 , 𝑌2 ).
So (the 1st and 4th steps by chain rule, and the 2nd and 3rd steps by conditional independence)
𝐻 (𝑋0 , 𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 , 𝑋1 , 𝑋2 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋1 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋2 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 |𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋1 |𝑌1 , 𝑌2 ) + 𝐻 (𝑋2 |𝑌2 , 𝑌3 ) + 𝐻 (𝑌1 , 𝑌2 , 𝑌3 )
= 𝐻 (𝑋0 , 𝑌1 , 𝑌2 , 𝑌3 ) + 𝐻 (𝑋1 , 𝑌1 , 𝑌2 ) + 𝐻 (𝑋2 , 𝑌2 , 𝑌3 ) − 𝐻 (𝑌1 , 𝑌2 ) − 𝐻 (𝑌2 , 𝑌3 ).
By (5.7),
𝐻 (𝑋0 , 𝑌1 , 𝑌2 , 𝑌3 ) ≥ 3 log2 (2𝑒(𝐺)) − 2 log2 𝑣(𝐺),
𝐻 (𝑋1 , 𝑌1 , 𝑌2 ) ≥ 2 log2 (2𝑒(𝐺)) − log2 𝑣(𝐺),
and 𝐻 (𝑋2 , 𝑌2 , 𝑌3 ) ≥ 2 log2 (2𝑒(𝐺)) − log2 𝑣(𝐺).
And by the uniform bound,
𝐻 (𝑌1 , 𝑌2 ) = 𝐻 (𝑌2 , 𝑌3 ) ≤ 2 log2 𝑣(𝐺).
Putting everything together, we have
log2 hom(𝐹, 𝐺) ≥ 𝐻 (𝑋0 , 𝑋1 , 𝑋2 , 𝑌1 , 𝑌2 , 𝑌3 ) ≥ 7 log2 (2𝑒(𝐺)) − 8 log2 𝑣(𝐺).
Thereby verifying (5.6), showing that 𝐹 is Sidorenko. □
(Where did we use the assumption that 𝐹 has a vertex complete to the other part?)
Exercise 5.5.15. Complete the proof of Theorem 5.5.14.
Shearer’s inequality
Another important tool in the entropy method is Shearer’s inequality, which is a powerful
generalization of subadditivity. Before stating it in full generality, let us first see a simple
instance of Shearer’s lemma.
Exercise 5.5.18. Prove Theorem 5.5.17 by generalizing the proof of Theorem 5.5.16.
Shearer’s entropy inequality is related to the generalized Hölder inequality from Sec-
tion 5.3. It is a significant generalization of the projection inequality discussed in Re-
mark 5.3.6. See Friedgut (2004) for more on these connections.
The next exercise asks you to prove a strengthening of the projection inequalities (Re-
mark 5.3.6 and Exercise 5.3.9) by mimicking the entropy proof of Shearer’s entropy inequal-
ity. The result is due to Bollobás & Thomason (1995), though their original proof does not
use the entropy method.
Exercise 5.5.19 (Box theorem). For each 𝐼 ⊆ [𝑑], write 𝜋_𝐼 : ℝ^𝑑 → ℝ^𝐼 to denote the
projection obtained by omitting the coordinates outside 𝐼. Show that for every compact body
𝐾 ⊆ ℝ^𝑑, there exists a box 𝐵 = [𝑎₁, 𝑏₁] × ⋯ × [𝑎_𝑑, 𝑏_𝑑] ⊆ ℝ^𝑑 such that |𝐵| = |𝐾| and
|𝜋_𝐼(𝐵)| ≤ |𝜋_𝐼(𝐾)| for every 𝐼 ⊆ [𝑑] (here |·| denotes volume).
Use this result to give another proof of the projection inequality from Exercise 5.3.9.
Hint: First prove it for 𝐾 being a union of grid boxes. Then extend it to general 𝐾 via compactness.
Let us use the entropy method to give another proof of Theorem 5.3.15, restated below.
The proof below is based on (with some further simplifications) the entropy proofs of
Galvin & Tetali (2004), which was in turn based on the proof by Kahn (2001) for independent
sets.
Proof. Let us first illustrate the proof for 𝐹 being the following graph
[Figure: the 2-regular bipartite graph 𝐹 with parts {𝑥₁, 𝑥₂, 𝑥₃} and {𝑦₁, 𝑦₂, 𝑦₃}, where 𝑦₁ ∼ 𝑥₁, 𝑥₂; 𝑦₂ ∼ 𝑥₁, 𝑥₃; 𝑦₃ ∼ 𝑥₂, 𝑥₃ (a 6-cycle).]
In the final step, we use that 𝑋3 and 𝑌1 are conditionally independent given 𝑋1 and 𝑋2 (why?),
along with two other analogous statements. A more general statement is that if 𝑆 ⊆ 𝑉 (𝐹), then
the restrictions to the different connected components of 𝐹 − 𝑆 are conditionally independent
given (𝑋𝑠 ) 𝑠∈𝑆 .
To complete the proof, it remains to show
𝐻 (𝑋1 , 𝑋2 ) + 2𝐻 (𝑌1 |𝑋1 , 𝑋2 ) ≤ log2 hom(𝐾2,2 , 𝐺),
𝐻 (𝑋1 , 𝑋3 ) + 2𝐻 (𝑌2 |𝑋1 , 𝑋3 ) ≤ log2 hom(𝐾2,2 , 𝐺),
and 𝐻 (𝑋2 , 𝑋3 ) + 2𝐻 (𝑌3 |𝑋2 , 𝑋3 ) ≤ log2 hom(𝐾2,2 , 𝐺).
They are analogous so let us just show the first inequality. Let 𝑌1′ be a conditionally indepen-
dent copy of 𝑌1 given (𝑋1 , 𝑋2 ). Then (𝑋1 , 𝑋2 , 𝑌1 , 𝑌1′ ) is the image of a homomorphism from
𝐾2,2 to 𝐺 (though not necessarily chosen uniformly).
[Figure: 𝐾_{2,2} with parts {𝑥₁, 𝑥₂} and {𝑦₁, 𝑦₁′}.]
Thus we have
𝐻 (𝑋1 , 𝑋2 ) + 2𝐻 (𝑌1 |𝑋1 , 𝑋2 ) = 𝐻 (𝑋1 , 𝑋2 ) + 𝐻 (𝑌1 , 𝑌1′ |𝑋1 , 𝑋2 )
= 𝐻 (𝑋1 , 𝑋2 , 𝑌1 , 𝑌1′ ) [chain rule]
≤ log2 hom(𝐾2,2 , 𝐺) [uniform bound]
Now we prove the general case. Let 𝐹 be a 𝑑-regular bipartite graph with vertex bipartition 𝐴 ∪ 𝐵, and let Φ ∈ Hom(𝐹, 𝐺) be chosen uniformly at random. For each 𝑣 ∈ 𝑉(𝐹), let 𝑋_𝑣 = Φ(𝑣). For each 𝑆 ⊆ 𝑉(𝐹),
write 𝑋_𝑆 := (𝑋_𝑣)_{𝑣∈𝑆}. We have
$$d \log_2 \hom(F, G) = d\, H(\Phi) = d\, H(X_A) + d\, H(X_B \mid X_A) \qquad \text{[chain rule]}$$
$$\le \sum_{b \in B} H(X_{N(b)}) + d \sum_{b \in B} H(X_b \mid X_A) \qquad \text{[Shearer]}$$
$$\le \sum_{b \in B} H(X_{N(b)}) + d \sum_{b \in B} H(X_b \mid X_{N(b)}) \qquad \text{[conditioning on less]}$$
$$= \sum_{b \in B} \Big( H(X_{N(b)}) + d\, H(X_b \mid X_{N(b)}) \Big).$$
For each 𝑏 ∈ 𝐵, let $X_b^{(1)}, \ldots, X_b^{(d)}$ be conditionally independent copies of 𝑋_𝑏 given 𝑋_{𝑁(𝑏)}. We have
$$H(X_{N(b)}) + d\, H(X_b \mid X_{N(b)}) = H(X_{N(b)}) + H(X_b^{(1)}, \ldots, X_b^{(d)} \mid X_{N(b)})$$
$$= H(X_b^{(1)}, \ldots, X_b^{(d)}, X_{N(b)}) \qquad \text{[chain rule]}$$
$$\le \log_2 \hom(K_{d,d}, G). \qquad \text{[uniform bound]}$$
Summing over 𝑏 ∈ 𝐵 and using |𝐵| = 𝑣(𝐹)/2 gives $\hom(F, G) \le \hom(K_{d,d}, G)^{v(F)/(2d)}$, as desired. □
Further Reading
The book Large Networks and Graph Limits by Lovász (2012) contains an excellent treatment
of graph homomorphism inequalities in Section 2.1 and Chapter 16.
The survey Flag Algebras: An Interim Report by Razborov (2013) contains a survey of
results obtained using the flag algebra method.
For combinatorial applications of the entropy method, see the surveys
• Entropy and Counting by Radhakrishnan (2003), and
• Three Tutorial Lectures on Entropy and Counting by Galvin (2014).
Chapter Summary
• Many problems in extremal graph theory can be phrased in terms of graph homomorphism
inequalities.
– Homomorphism density inequalities are undecidable in general.
– Many open problems remain, such as Sidorenko’s conjecture, which says that if 𝐹 is
bipartite, then 𝑡 (𝐹, 𝐺) ≥ 𝑡 (𝐾2 , 𝐺) 𝑒 (𝐹 ) for all graphs 𝐺.
• The set of all possible (edge, triangle) density pairs is known.
– For a given edge density, the maximum triangle density is attained by a clique.
– For a given edge density, the minimum triangle density is given by a certain multipartite
graph. (We did not prove this result in full and only established the convex hull in
Section 5.4.)
• Cauchy–Schwarz and Hölder inequalities are versatile tools.
– Simple applications of Cauchy–Schwarz inequalities can often be recognized by “re-
flection symmetries” in a graph that can be “folded in half.”
– Flag algebra leads to computerized searches of Cauchy–Schwarz proofs of subgraph
density inequalities.
– The generalized Hölder inequality tells us, as an example, that
$$\int_{x,y,z} f(x,y)\, g(x,z)\, h(y,z) \le \|f\|_2 \|g\|_2 \|h\|_2.$$
It can be proved by repeated applications of Hölder’s inequality, once for each variable.
The inequality is related to Shearer’s entropy inequality, an example of which says
that for joint random variables 𝑋, 𝑌 , 𝑍,
2𝐻 (𝑋, 𝑌 , 𝑍) ≤ 𝐻 (𝑋, 𝑌 ) + 𝐻 (𝑋, 𝑍) + 𝐻 (𝑌 , 𝑍).
• The Lagrangian method relaxes an optimization problem on graphs to one about vertex-weighted
graphs, and then argues by shifting weights between vertices. We used the method to prove
– Turán's theorem (again);
– that a linear inequality between clique densities in 𝐺 is true for every graph 𝐺 if and only if it holds whenever
𝐺 is a clique.
• The entropy method can be used to establish various cases of Sidorenko’s conjecture,
including for trees, as well as for a bipartite graph with one vertex complete to the other
side.
6 Forbidding 3-Term Arithmetic Progressions
Chapter Highlights
• Fourier analytic proof of Roth’s theorem
• Finite field model in additive combinatorics: F𝑛𝑝 as a model for the integers
• Basics of discrete Fourier analysis
• Density increment argument in the proof of Roth’s theorem
• The polynomial method proof of Roth’s theorem in F3𝑛
• Arithmetic analogue of the regularity lemma, and application to Roth’s theorem with
popular difference
In this chapter, we study Roth’s theorem, which says that every 3-AP-free subset of [𝑁]
has size 𝑜(𝑁).
Previously, in Section 2.4, we gave a proof of Roth’s theorem using the graph regularity
lemma. The main goal of this chapter is to give a Fourier analytic proof of Roth’s theorem.
This is also Roth’s original proof (1953).
We begin by proving Roth’s theorem in the finite field model. That is, we first prove an
analogue of Roth’s theorem in F3𝑛 . The finite field vector space serves as a fruitful playground
for many additive combinatorics problems. Techniques such as Fourier analysis are often
simpler to carry out in the finite field model. After we develop the techniques in the finite
field model, we then prove Roth’s theorem in the integers. It can be a good idea to first try out
ideas in the finite field model before bringing them to the integers, as there may be additional
technical difficulties in the integers.
Later in Section 6.5, we will see a completely different proof of Roth’s theorem in F3𝑛
using the polynomial method, which gives significantly better quantitative bounds. This
proof surprised many people at the time of its discovery. However, unlike Fourier analysis,
this polynomial method technique only applies to the finite field setting, and it is unknown
how to apply it to the integers.
There is an interesting parallel between the Fourier analytic method in this chapter and the
graph regularity method from Chapter 2. In Section 6.6, we develop an arithmetic regularity lemma
and use it in Section 6.7 to prove a strengthening of Roth’s theorem showing popular com-
mon differences.
$$\widehat{f}(r) := \mathbb{E}_{x \in \mathbb{F}_p^n} f(x)\, \omega^{-r \cdot x} = \frac{1}{p^n} \sum_{x \in \mathbb{F}_p^n} f(x)\, \omega^{-r \cdot x},$$
where 𝑟 · 𝑥 = 𝑟₁𝑥₁ + ⋯ + 𝑟ₙ𝑥ₙ.
In particular, $\widehat{f}(0) = \mathbb{E} f$ is the average of 𝑓. This value often plays a special role compared
to the other values $\widehat{f}(r)$.
To simplify notation, it is generally understood that the variables being averaged or summed
over are varying uniformly in the domain F𝑛𝑝 .
Let us now state several important properties of the Fourier transform. We will see that all
these properties are consequences of the orthogonality of the Fourier basis.
The next result allows us to write 𝑓 in terms of b 𝑓.
The next result tells us that the Fourier transform preserves inner products.
Remark 6.1.4 (History/naming). The names Parseval and Plancherel are often used interchangeably
in practice to refer to the unitarity of the Fourier transform (i.e., the above
theorem). Parseval derived the identity for the Fourier series of a periodic function on ℝ,
whereas Plancherel derived it for the Fourier transform on ℝ.
As is nowadays standard in additive combinatorics, we adopt the following convention
for the Fourier transform in finite abelian groups:
average in physical space (𝔼 𝑓), and sum in frequency (Fourier) space ($\sum \widehat{f}$).
For example, following this convention, we define an “averaging” inner product for functions
𝑓 , 𝑔 : F𝑛𝑝 → C by
Proof. We have
$$\widehat{f * g}(r) = \mathbb{E}_x (f * g)(x)\, \omega^{-r \cdot x} = \mathbb{E}_x \mathbb{E}_{y,z : y+z=x} f(y) g(z)\, \omega^{-r \cdot (y+z)} = \mathbb{E}_{y,z} f(y) g(z)\, \omega^{-r \cdot (y+z)} = \big(\mathbb{E}_y f(y)\, \omega^{-r \cdot y}\big)\big(\mathbb{E}_z g(z)\, \omega^{-r \cdot z}\big) = \widehat{f}(r)\, \widehat{g}(r). \qquad \square$$
By repeated applications of the convolution identity, we have
$$(f_1 * \cdots * f_k)^\wedge = \widehat{f_1}\, \widehat{f_2} \cdots \widehat{f_k}$$
(here we write $f^\wedge$ for $\widehat{f}$ for typographical reasons).
Now we introduce a quantity relevant to Roth’s theorem on 3-APs.
We will give two proofs of this proposition. The first proof is more mechanically straight-
forward. It is similar to the proof of the convolution identity earlier. The second proof directly
applies the convolution identity, and may be a bit more abstract/conceptual.
First proof. We expand the left-hand side using the formula for Fourier inversion:
$$\mathbb{E}_{x,y}\, f(x)\, g(x+y)\, h(x+2y) = \mathbb{E}_{x,y} \left(\sum_{r_1} \widehat{f}(r_1) \omega^{r_1 \cdot x}\right)\left(\sum_{r_2} \widehat{g}(r_2) \omega^{r_2 \cdot (x+y)}\right)\left(\sum_{r_3} \widehat{h}(r_3) \omega^{r_3 \cdot (x+2y)}\right)$$
$$= \sum_{r_1, r_2, r_3} \widehat{f}(r_1)\, \widehat{g}(r_2)\, \widehat{h}(r_3)\, \mathbb{E}_x \omega^{x \cdot (r_1 + r_2 + r_3)}\, \mathbb{E}_y \omega^{y \cdot (r_2 + 2r_3)}$$
$$= \sum_{r_1, r_2, r_3} \widehat{f}(r_1)\, \widehat{g}(r_2)\, \widehat{h}(r_3)\, 1_{r_1 + r_2 + r_3 = 0}\, 1_{r_2 + 2r_3 = 0}$$
$$= \sum_{r} \widehat{f}(r)\, \widehat{g}(-2r)\, \widehat{h}(r).$$
$$= \sum_r \widehat{f}(r)\, \widehat{g_1}(r)\, \widehat{h}(r) \qquad \text{[convolution identity]}$$
$$= \sum_r \widehat{f}(r)\, \widehat{g}(-2r)\, \widehat{h}(r). \qquad \square$$
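The following Python sketch (not from the text) implements the Fourier transform on 𝔽_pⁿ with ω = e^{2πi/p} (the convention presumed above) and numerically verifies the identity Λ₃(1_A) = Σ_r 1̂_A(r)³ in 𝔽₃² for an arbitrary set A; the helper names are ad hoc.

```python
import numpy as np
from itertools import product

p, n = 3, 2
omega = np.exp(2j * np.pi / p)
group = list(product(range(p), repeat=n))        # the group F_p^n

def fourier(f):
    """f-hat(r) = E_x f(x) * omega^{-r.x} for f given as a dict on F_p^n."""
    return {r: sum(f[x] * omega ** (-(sum(ri * xi for ri, xi in zip(r, x)) % p))
                   for x in group) / p**n
            for r in group}

A = {(0, 0), (0, 1), (1, 2), (2, 1)}             # an arbitrary subset of F_3^2
one_A = {x: 1.0 if x in A else 0.0 for x in group}
hat = fourier(one_A)

# Lambda_3(1_A): average of 1_A(x) 1_A(y) 1_A(z) over solutions of x + y + z = 0
direct = sum(one_A[x] * one_A[y] * one_A[tuple((-xi - yi) % p for xi, yi in zip(x, y))]
             for x in group for y in group) / p**(2 * n)
via_fourier = sum(hat[r] ** 3 for r in group)
print(abs(direct - via_fourier.real) < 1e-12)    # True
```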
Remark 6.1.10. In the following section, we will work in $\mathbb{F}_3^n$. Since −2 = 1 in 𝔽₃ (and
so 𝑔₁ = 𝑔 above), the proof looks even simpler. In particular, by Fourier inversion and the
convolution identity,
$$\Lambda_3(1_A) = 3^{-2n}\, \big|\{(x, y, z) \in A^3 : x + y + z = 0\}\big| = (1_A * 1_A * 1_A)(0) = \sum_r (1_A * 1_A * 1_A)^\wedge(r) = \sum_r \widehat{1_A}(r)^3. \qquad (6.4)$$
When 𝐴 = −𝐴, the eigenvalues of the adjacency matrix of the Cayley graph Cay(𝔽₃ⁿ, 𝐴) are
$3^n\, \widehat{1_A}(r)$, 𝑟 ∈ 𝔽₃ⁿ (recall from Section 3.3 that the eigenvalues of abelian Cayley graphs are
given by the Fourier transform). The quantity $3^{2n} \Lambda_3(1_A)$ is the number of closed walks of
length 3 in the Cayley graph Cay(𝔽₃ⁿ, 𝐴). So the above identity says that the number of
closed walks of length 3 in Cay(𝔽₃ⁿ, 𝐴) equals the third moment of the eigenvalues of the
adjacency matrix, which is a general fact for every graph. (When 𝐴 ≠ −𝐴, we can consider
the directed or bipartite version of this argument.)
The following exercise generalizes the above identity.
Exercise 6.1.11. Let 𝑎 1 , . . . , 𝑎 𝑘 be nonzero integers, none divisible by the prime 𝑝. Let
𝑓1 , . . . , 𝑓 𝑘 : F𝑛𝑝 → C. Show that
∑︁
E 𝑥1 ,..., 𝑥𝑘 ∈F𝑛𝑝 :𝑎1 𝑥1 +···+𝑎𝑘 𝑥𝑘 =0 𝑓1 (𝑥 1 ) · · · 𝑓 𝑘 (𝑥 𝑘 ) = b
𝑓1 (𝑎 1 𝑟) · · · b
𝑓 𝑘 (𝑎 𝑘 𝑟).
𝑟 ∈F𝑛𝑝
Remark 6.2.2 (General finite fields). We work in F3𝑛 mainly for convenience. The argument
presented in this section also shows that for every odd prime 𝑝, there is some constant 𝐶 𝑝 so
that every 3-AP-free subset of F𝑛𝑝 has size ≤ 𝐶 𝑝 𝑝 𝑛 /𝑛.
In F3𝑛 , there are several equivalent interpretations of 𝑥, 𝑦, 𝑧 ∈ F3𝑛 forming a 3-AP (allowing
the possibility for a trivial 3-AP with 𝑥 = 𝑦 = 𝑧):
• (𝑥, 𝑦, 𝑧) = (𝑥, 𝑥 + 𝑑, 𝑥 + 2𝑑) for some 𝑑;
• 𝑥 − 2𝑦 + 𝑧 = 0;
• 𝑥 + 𝑦 + 𝑧 = 0;
• 𝑥, 𝑦, 𝑧 are three distinct points of a line in F3𝑛 or are all equal;
• for each 𝑖, the 𝑖-th coordinates of 𝑥, 𝑦, 𝑧 are all distinct or all equal.
Remark 6.2.3 (SET card game). The card game SET comes with a deck of 81 cards (see
Figure 6.1 on the next page). Each card one of three possibilities in each of the following
four features:
• Number: 1, 2, 3;
• Symbol: diamond, squiggle, oval;
• Shading: solid, striped, open;
• Color: red, green, purple.
Each of the 34 = 81 combinations appears exactly once as a card.
In this game, a combination of three cards is called a “set” if each of the four features
6.2 Roth’s Theorem in the Finite Field Model 207
shows up as all identical or all distinct among the three cards. For the example, the three cards
shown below form a “set”: number (all distinct), symbol (all distinct), shading (all striped),
color (all red).
In a standard play of the game, the dealer lays down twelve cards on the table until some
player finds a “set”, in which case the player keeps the three cards of the “set” as their score,
then dealer replenishes the table by laying down more cards. If no set is found, then the dealer
continues to lay down more cards until a set is found.
The cards of the game correspond to points of F43 . A “set” is precisely a 3-AP. The cap set
problem in F43 asks for the number of cards without a “set.” The size of the maximum cap set
in F43 is 20 (Pellegrino 1970).
208 Forbidding 3-Term Arithmetic Progressions
Λ3 ( 𝑓 ) − (E 𝑓 ) 3 ≤ max | b
𝑓 (𝑟)| ∥ 𝑓 ∥ 22 .
𝑟≠0
Since E 𝑓 = b
𝑓 (0), we have
∑︁ ∑︁
Λ3 ( 𝑓 ) − (E 𝑓 ) 3 ≤ |b
𝑓 (𝑟)| 3 ≤ max | b
𝑓 (𝑟)| · |b
𝑓 (𝑟)| 2 = max | b
𝑓 (𝑟)| ∥ 𝑓 ∥ 22 .
𝑟≠0 𝑟≠0
𝑟≠0 𝑟
Proof. Since 𝐴 is 3-AP-free, Λ3 ( 𝐴) = | 𝐴| /32𝑛 = 𝛼/3𝑛 , as all 3-APs are trivial (i.e., with
common difference zero). By the counting lemma, Lemma 6.2.4,
𝛼3 − 𝑛 = 𝛼3 − Λ3 (1 𝐴) ≤ max |b1 𝐴 (𝑟)| ∥1 𝐴 ∥ 22 = max |b
𝛼
1 𝐴 (𝑟)|𝛼.
3 𝑟≠0 𝑟≠0
6.2 Roth’s Theorem in the Finite Field Model 209
Proof. We have
𝛼0 + 𝛼1 𝜔 + 𝛼2 𝜔2
1c𝐴 (𝑟) = E 𝑥 1 𝐴 (𝑥)𝜔 −𝑟 · 𝑥 =
3
where 𝛼0 , 𝛼1 , 𝛼2 are densities of 𝐴 on the cosets of 𝑟 ⊥ . We want to show that one of 𝛼0 , 𝛼1 , 𝛼2
is significantly larger than 𝛼. This is easy to check directly, but let us introduce a trick that
we will also use later in the integer setting.
We have 𝛼 = (𝛼0 + 𝛼1 + 𝛼2 )/3. By the triangle inequality,
3𝛿 ≤ 𝛼0 + 𝛼1 𝜔 + 𝛼2 𝜔2
= (𝛼0 − 𝛼) + (𝛼1 − 𝛼)𝜔 + (𝛼2 − 𝛼)𝜔2
≤ |𝛼0 − 𝛼| + |𝛼1 − 𝛼| + |𝛼2 − 𝛼|
∑︁
2
= |𝛼 𝑗 − 𝛼| + (𝛼 𝑗 − 𝛼) .
𝑗=0
We now view this hyperplane 𝐻 as F3𝑛−1 (we may need to select a new origin for 𝐻 if
0 ∉ 𝐻). The restriction of 𝐴 to 𝐻 (i.e., 𝐴 ∩ 𝐻) is now a 3-AP-free subset of 𝐻. The density
increased from 𝛼 to 𝛼 + 𝛼2 /4. Next we iterate this density increment.
Remark 6.2.9 (Translation invariance). It is important that the pattern we are forbidding
(3-AP) is translation-invariant. What is wrong with the argument if instead we forbid the
pattern 𝑥 + 𝑦 = 𝑧? Note that {𝑥 ∈ F3𝑛 : 𝑥1 = 2} avoids solutions to 𝑥 + 𝑦 = 𝑧, and this set has
density 1/3.
This is just shy of the bound 𝛼 = 𝑂 (1/𝑛) that we aim to prove. So let us re-do the density
increment analysis more carefully to analyze how quickly 𝛼𝑖 grows.
Each round, 𝛼𝑖 increases by at least 𝛼2 /4. So it takes ≤ ⌈4/𝛼⌉ initial rounds for 𝛼𝑖 to
double. Once 𝛼𝑖 ≥ 2𝛼, it then increases by at least 𝛼𝑖2 /4 each round afterwards, so it takes
≤ ⌈1/𝛼𝑖 ⌉ ≤ ⌈1/𝛼⌉ additional
rounds
for the density to double again. And so on: the 𝑘-th
doubling time is at most 42−𝑘 /𝛼 . Since the density is always at least 𝛼, the density can
double at most log2 (1/𝛼) times. So the total number of rounds is at most
∑︁ 42− 𝑗
1
=𝑂 .
𝑗 ≤log (1/𝛼)
𝛼 𝛼
2
Suppose the process terminates after 𝑚 steps with density 𝛼𝑚 . Then, examining the
hypothesis of Lemma 6.2.8, we find that the size of the final subspace |𝑉𝑚 | = 3𝑛−𝑚 is less
than 𝛼𝑚
−2
≤ 𝛼 −2 . So 𝑛 ≤ 𝑚 + 𝑂 (log(1/𝛼)) ≤ 𝑂 (1/𝛼). Thus 𝛼 = | 𝐴| /𝑁 = 𝑂 (1/𝑛). This
completes the proof of Roth’s theorem in F3𝑛 (Theorem 6.2.1).
Remark 6.2.10 (Quantitative bounds). Edel (2004) obtained a cap set of size ≥ 2.21𝑛
for sufficiently large 𝑛. This is obtained by constructing a cap set in F480 3 of size 𝑚 =
2327 (273 + 37 776 ) ≥ 2.21480 , which then implies, by a product construction, a cap set in F480𝑘
3
of size 𝑚 𝑘 for each positive integer 𝑘.
It was an open problem of great interest whether an upper bound of the form 𝑐 𝑛 , with
constant 𝑐 < 3, was possible on the size of cap sets in F3𝑛 . With significant effort, the
Fourier analytic strategy above was extended to prove an upper bound of the form 3𝑛 /𝑛1+𝑐
(Bateman & Katz 2012). So it came as quite a shock to the community when a very short
polynomial method proof was discovered, giving an upper bound 𝑂 (2.76𝑛 ) (Croot, Lev, &
Pach 2017; Ellenberg & Gijswijt 2017). We will discuss this proof in Section 6.5. However,
the polynomial method proof appears to be specific to the finite field model, and it is not
known how to extend the strategy to the integers.
The following exercise shows why the above strategy does not generalize to 4-APs at least
in a straightforward manner.
Exercise 6.2.11 (Fourier uniformity does not control 4-AP counts). Let
𝐴 = {𝑥 ∈ F5𝑛 : 𝑥 · 𝑥 = 0}.
Prove that:
(a) | 𝐴| = (5−1 + 𝑜(1))5𝑛 and | 1c𝐴 (𝑟)| = 𝑜(1) for all 𝑟 ≠ 0;
(b) |{(𝑥, 𝑦) ∈ F5𝑛 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦, 𝑥 + 3𝑦 ∈ 𝐴}| ≠ (5−4 + 𝑜(1))52𝑛 .
Hint: First write 1 𝐴 as an exponential sum. Compare with the Gauss sum from Theorem 3.3.14.
6.2 Roth’s Theorem in the Finite Field Model 211
Exercise 6.2.12 (Linearity testing). Show that for every prime 𝑝 there is some 𝐶 𝑝 > 0
such that if 𝑓 : F𝑛𝑝 → F 𝑝 satisfies
P 𝑥,𝑦 ∈F𝑛𝑝 ( 𝑓 (𝑥) + 𝑓 (𝑦) = 𝑓 (𝑥 + 𝑦)) = 1 − 𝜀
then there exists some 𝑎 ∈ F𝑛𝑝 such that
P 𝑥 ∈F𝑛𝑝 ( 𝑓 (𝑥) = 𝑎 · 𝑥) ≥ 1 − 𝐶 𝑝 𝜀.
In the above P expressions 𝑥 and 𝑦 are chosen i.i.d. uniform from F𝑛𝑝 .
The following exercises introduce Gowers uniformity norms. Gowers (2001) used them
to prove Szemerédi’s theorem by extending the Fourier analytic proof strategy of Roth’s
theorem to what is now called higher order Fourier analysis.
The 𝑈 2 norm in the following exercise plays a role similar to Fourier analysis.
Exercise 6.2.13 (Gowers 𝑈 2 uniformity norm). Let 𝑓 : F𝑛𝑝 → C, define
1/4
∥ 𝒇 ∥𝑼 2 := E 𝑥,𝑦,𝑦 ′ ∈F𝑛𝑝 𝑓 (𝑥) 𝑓 (𝑥 + 𝑦) 𝑓 (𝑥 + 𝑦 ′ ) 𝑓 (𝑥 + 𝑦 + 𝑦 ′ ) .
(a) Show that the expectation above is always a nonnegative real number, so that the
above expression is well defined. Also, show that ∥ 𝑓 ∥𝑈 2 ≥ |E 𝑓 |.
(b) (Gowers Cauchy–Schwarz) For 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 : F𝑛𝑝 → C, let
(The second inequality gives a so-called “inverse theorem” for the 𝑈 2 norm: if
∥ 𝑓 ∥𝑈 2 ≥ 𝛿 then | b
𝑓 (𝑟)| ≥ 𝛿2 for some 𝑟 ∈ F𝑛𝑝 . Informally, if 𝑓 is not 𝑈 2 -uniform,
then 𝑓 correlates with some exponential phase function of the form 𝑥 ↦→ 𝜔𝑟 · 𝑥 .)
The inadequacy of Fourier analysis towards understanding 4-APs is remedied by the 𝑈 3
norm, which is significantly more mysterious than the 𝑈 2 norm. Some easier properties of
the 𝑈 3 norm are given in the exercise below. Understanding properties of functions with large
𝑈 3 norm (known as the inverse problem) lies at the heart of quadratic Fourier analysis,
212 Forbidding 3-Term Arithmetic Progressions
which we do not discuss in this book (see Further Reading). The structure of set addition,
which is the topic of the next chapter, plays a central role in this theory.
Exercise 6.2.14 (Gowers 𝑈 3 uniformity norm). Let 𝑓 : F𝑛𝑝 → C. Define
∥ 𝒇 ∥𝑼 3 := E 𝑥,𝑦1 ,𝑦2 ,𝑦3 𝑓 (𝑥) 𝑓 (𝑥 + 𝑦 1 ) 𝑓 (𝑥 + 𝑦 2 ) 𝑓 (𝑥 + 𝑦 3 ) · · ·
1/8
· 𝑓 (𝑥 + 𝑦 1 + 𝑦 2 ) 𝑓 (𝑥 + 𝑦 1 + 𝑦 3 ) 𝑓 (𝑥 + 𝑦 2 + 𝑦 3 ) 𝑓 (𝑥 + 𝑦 1 + 𝑦 2 + 𝑦 3 ) .
Alternatively, for each 𝑦 ∈ F𝑛𝑝 , define the multiplicative finite difference Δ 𝑦 𝑓 : F𝑛𝑝 → C by
Δ 𝑦 𝑓 (𝑥) := 𝑓 (𝑥) 𝑓 (𝑥 + 𝑦), we can rewrite the above expression in terms of the 𝑈 2 uniformity
norm from Exercise 6.2.13 as
8 4
∥ 𝑓 ∥𝑈 3 = E 𝑦 ∈F𝑛 Δ𝑦 𝑓 .
𝑝 𝑈2
(a) (Monotonicity) Verify that the above two definitions for ∥ 𝑓 ∥𝑈 3 coincides and give
well defined nonnegative real numbers. Also, show that
∥ 𝑓 ∥𝑈 2 ≤ ∥ 𝑓 ∥𝑈 3 .
(b) (Separation of norms) Let 𝑝 be odd and 𝑓 : F𝑛𝑝 → C be defined by 𝑓 (𝑥) = 𝑒 2 𝜋𝑖 𝑥· 𝑥/ 𝑝 .
Prove that ∥ 𝑓 ∥𝑈 3 = 1 and ∥ 𝑓 ∥𝑈 2 = 𝑝 −𝑛/4 .
(c) (Triangle inequality) Prove that
∥ 𝑓 + 𝑔∥𝑈 3 ≤ ∥ 𝑓 ∥𝑈 3 + ∥𝑔∥𝑈 3 .
Conclude that ∥ ∥𝑈 3 is a norm.
(d) (𝑈 3 norm controls 4-APs) Let 𝑝 ≥ 5 be a prime, and 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 : F𝑛𝑝 → C all
taking values in the unit disk. We write
Λ( 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 ) := E 𝑥,𝑦 ∈F𝑛𝑝 𝑓1 (𝑥) 𝑓2 (𝑥 + 𝑦) 𝑓3 (𝑥 + 2𝑦) 𝑓4 (𝑥 + 3𝑦).
Prove that
|Λ( 𝑓1 , 𝑓2 , 𝑓3 , 𝑓4 )| ≤ min ∥ 𝑓𝑠 ∥𝑈 3 .
𝑠
where
𝒆(𝒕) := exp(2𝜋𝑖𝑡), 𝑡 ∈ R.
Note the normalization conventions: we sum in the physical space Z (there is no sensible
way to average in Z) and average in the frequency space R/Z.
and
𝚲3 ( 𝒇 ) := Λ( 𝑓 , 𝑓 , 𝑓 ).
Then for any finite set 𝐴 of integers,
Λ3 ( 𝐴) = |{(𝑥, 𝑦) : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}|
counts the number of 3-APs in 𝐴, where each non-trivial 3-AP is counted twice, forward and
backward, and each trivial 3-AP is counted once.
Exercise 6.3.9. Show that if a finite set 𝐴 of integers contains 𝛽 | 𝐴| 2 solutions (𝑎, 𝑏, 𝑐) ∈
𝐴3 to 𝑎 +2𝑏 = 3𝑐, then it contains at least 𝛽2 | 𝐴| 3 solutions (𝑎, 𝑏, 𝑐, 𝑑) ∈ 𝐴4 to 𝑎 + 𝑏 = 𝑐 + 𝑑.
The proof of Roth’s theorem in F3𝑛 proceeded by density increment when restricting to
subspaces. An important difference between F3𝑛 and Z is that Z has no subspaces (more on
this later). Instead, we will proceed in Z by restricting to subprogressions. In this section, by
a progression we mean an arithmetic progression.
We have the following analogue of Lemma 6.2.4. It says that if 𝑓 and 𝑔 are “Fourier-close,”,
then they have similar 3-AP counts. We write
! 1/2
∑︁
𝒇 ∥ ∞ := sup | b
∥b 𝑓 (𝜃)| and ∥ 𝒇 ∥ ℓ2 := | 𝑓 (𝑥)| 2
.
𝜃 𝑥 ∈Z
6.4 Roth’s Theorem in the Integers 215
Proof. We have
Λ3 ( 𝑓 ) − Λ3 (𝑔) = Λ( 𝑓 − 𝑔, 𝑓 , 𝑓 ) + Λ(𝑔, 𝑓 − 𝑔, 𝑓 ) + Λ(𝑔, 𝑔, 𝑓 − 𝑔).
Let us bound the first term on the right-hand side. We have
|Λ( 𝑓 − 𝑔, 𝑓 , 𝑓 )|
∫ 1
= (𝑓 − 𝑔)(𝜃) b
𝑓 (−2𝜃) b
𝑓 (𝜃) 𝑑𝜃 [Prop. 6.3.6]
∫
0
1
≤ ∥
𝑓 − 𝑔∥ ∞ b
𝑓 (−2𝜃) b
𝑓 (𝜃) 𝑑𝜃 [Triangle ineq.]
0
∫ 1 1/2 ∫ 1 1/2
≤ ∥ b b
2 2
𝑓 − 𝑔∥ ∞ 𝑓 (−2𝜃) 𝑑𝜃 𝑓 (𝜃) 𝑑𝜃 [Cauchy-Schwarz]
0 0
≤ ∥
𝑓 − 𝑔∥ ∞ ∥ 𝑓 ∥ ℓ22 . [Parseval]
Proof. Since 𝐴 is 3-AP-free, the quantity 1 𝐴 (𝑥)1 𝐴 (𝑥 + 𝑦)1 𝐴 (𝑥 + 2𝑦) is nonzero only for
trivial 3-APs (here trivial means 𝑦 = 0). Thus
Λ3 (1 𝐴) = | 𝐴| = 𝛼𝑁.
On the other hand, a 3-AP in [𝑁] can be counted by counting pairs of integers with the same
parity to form the first and third element of the 3-AP, yielding,
Λ3 (1 [ 𝑁 ] ) = ⌊𝑁/2⌋ 2 + ⌈𝑁/2⌉ 2 ≥ 𝑁 2 /2.
Now apply the counting lemma (Proposition 6.4.2) to 𝑓 = 1 𝐴 and 𝑔 = 𝛼1 [ 𝑁 ] . We have
∥1 𝐴 ∥ ℓ22 = | 𝐴| = 𝛼𝑁 and ∥𝛼1 [ 𝑁 ] ∥ ℓ22 = 𝛼2 𝑁. So
𝛼3 𝑁 2
− 𝛼𝑁 ≤ 𝛼3 Λ3 (1 [ 𝑁 ] ) − Λ3 (1 𝐴) ≤ 3𝛼𝑁 (1 𝐴 − 𝛼1 [ 𝑁 ] ) ∧ .
2 ∞
Proof. Let 𝑚 = ⌊1/𝛿⌋. By the pigeonhole principle, among the 𝑚 + 1 numbers 0, 𝜃, · · · , 𝑚𝜃,
we can find 0 ≤ 𝑖 < 𝑗 ≤ 𝑚 such that the fractional parts of 𝑖𝜃 and 𝑗 𝜃 differ by at most 𝛿. Set
𝑑 = |𝑖 − 𝑗 |. Then ∥𝑑𝜃 ∥ R/Z ≤ 𝛿, as desired. □
Given 𝜃, we now partition [𝑁] into subprogressions with roughly constant 𝑒(𝑥𝜃) inside
each progression. The constants appearing in rest of this argument are mostly unimportant.
√ √
Proof. By Lemma 6.4.4, there is a positive integer 𝑑 < 𝑁 such that ∥𝑑𝜃 ∥ R/Z ≤ 1/ 𝑁.
Partition [𝑁] greedily into progressions with common difference 𝑑 of lengths between 𝑁 1/3
and 2𝑁 1/3 . Then, for two elements 𝑥, 𝑦 within the same progression 𝑃𝑖 , we have
|𝑒(𝑥𝜃) − 𝑒(𝑦𝜃)| ≤ |𝑃𝑖 | |𝑒(𝑑𝜃) − 1| ≤ 2𝑁 1/3 · 2𝜋 · 𝑁 −1/2 ≤ 𝜂.
Here we use the inequality |𝑒(𝑑𝜃) − 1| ≤ 2𝜋 ∥𝑑𝜃 ∥ R/Z from the fact that the length of a chord
on a circle is at most the length of the corresponding arc. □
We can now apply this lemma to obtain a density increment.
Next, apply Lemma 6.4.5 with 𝜂 = 𝛼2 /20 (the hypothesis 𝑁 ≥ (4𝜋/𝜂) 6 is satisfied since
(16/𝛼) 12 ≥ (80𝜋/𝛼2 ) 6 = (4𝜋/𝜂) 6 ) to obtain a partition 𝑃1 , . . . , 𝑃 𝑘 of [𝑁] satisfying 𝑁 1/3 ≤
|𝑃𝑖 | ≤ 2𝑁 1/3 and
𝛼2
|𝑒(𝑥𝜃) − 𝑒(𝑦𝜃)| ≤ for all 𝑖 and 𝑥, 𝑦 ∈ 𝑃𝑖 .
20
218 Forbidding 3-Term Arithmetic Progressions
So on each 𝑃𝑖 ,
∑︁ ∑︁ 𝛼2
(1 𝐴 − 𝛼) (𝑥)𝑒(𝑥𝜃) ≤ (1 𝐴 − 𝛼) (𝑥) + |𝑃𝑖 |.
𝑥 ∈ 𝑃𝑖 𝑥 ∈ 𝑃𝑖
20
Thus
𝛼2 ∑︁
𝑁
𝑁≤ (1 𝐴 − 𝛼) (𝑥)𝑒(𝑥𝜃)
10 𝑥=1
∑︁
𝑘 ∑︁
≤ (1 𝐴 − 𝛼) (𝑥)𝑒(𝑥𝜃) .
!
𝑖=1 𝑥 ∈ 𝑃𝑖
∑︁
𝑘 ∑︁ 𝛼2
≤ (1 𝐴 − 𝛼) (𝑥) + |𝑃𝑖 |
𝑖=1 𝑥∈𝑃
20
𝑖
∑︁
𝑘 ∑︁ 𝛼2
= (1 𝐴 − 𝛼) (𝑥) + 𝑁
𝑖=1 𝑥 ∈ 𝑃𝑖
20
Thus
𝛼2 ∑︁
𝑘 ∑︁
𝑁≤ (1 𝐴 − 𝛼) (𝑥)
20 𝑖=1 𝑥 ∈ 𝑃 𝑖
and hence
𝛼2 ∑︁ ∑︁
𝑘 𝑘
|𝑃𝑖 | ≤ | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 | .
20 𝑖=1 𝑖=1
We want to show that there exists some 𝑃𝑖 such that 𝐴 has a density increment when restricted
to 𝑃𝑖 . The following trick is convenient. Note that
𝛼2 ∑︁ ∑︁
𝑘 𝑘
|𝑃𝑖 | ≤ | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |
20 𝑖=1 𝑖=1
∑︁
𝑘
= | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 | + (| 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |) ,
𝑖=1
as the newly added terms in the final step sum to zero. Thus there exists an 𝑖 such that
𝛼2
|𝑃𝑖 | ≤ | 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 | + (| 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |) .
20
Since |𝑡| + 𝑡 is 2𝑡 for 𝑡 > 0 and 0 for 𝑡 ≤ 0, we deduce
𝛼2
|𝑃𝑖 | ≤ 2(| 𝐴 ∩ 𝑃𝑖 | − 𝛼|𝑃𝑖 |),
20
which yields
𝛼2
| 𝐴 ∩ 𝑃𝑖 | ≥ 𝛼 + |𝑃𝑖 |. □
40
6.4 Roth’s Theorem in the Integers 219
Rearranging gives
𝑁 ≤ (16/𝛼) 12·3 ≤ (16/𝛼) 𝑒
𝑚 𝑂 (1/𝛼)
.
Therefore
| 𝐴| 1
=𝛼=𝑂 .
𝑁 log log 𝑁
This completes the proof of Roth’s theorem (Theorem 6.4.1). □
We saw that the proofs in F3𝑛 and Z have largely the same set of ideas, but the proof in Z
is somewhat more technically involved. The finite field model is often a good sandbox to try
out Fourier analytic ideas.
Remark 6.4.7 (Bohr sets). Let us compare the results in F3𝑛 and [𝑁]. Write 𝑁 = 3𝑛 for
the size of the ambient space in both cases, for comparison. We obtained an upper bound
of 𝑂 (𝑁/log 𝑁) for 3-AP-free sets in F3𝑛 and 𝑂 (𝑁/log log 𝑁) in [𝑁] ⊆ Z. Where does the
difference in quantitative bounds stem from?
In the density increment step for F3𝑛 , at each step, we pass down to a subset which had size
a constant factor (namely 1/3) of the original one. However, in [𝑁], each iteration gives us
a subprogression which has size equal to the cube root of the previous subprogression. The
extra log for Roth’s theorem in the integers comes from this rapid reduction in the sizes of
the subprogressions.
Can we do better? Perhaps by passing down to subsets of [𝑁] that look more like subspaces?
220 Forbidding 3-Term Arithmetic Progressions
Indeed, this is possible. Bourgain (1999) used Bohr sets to prove an improved bound of
𝑁/(log 𝑁) 1/2+𝑜(1) on Roth’s theorem. Given 𝜃 1 , . . . , 𝜃 𝑘 , and some 𝜀 > 0, a Bohr set has the
form
𝑥 ∈ [𝑁] : ∥𝑥𝜃 𝑗 ∥ R/Z ≤ 𝜀 for each 𝑗 = 1, . . . , 𝑘 .
To see why this is analogous to subspaces, note that we can define a subspace of F3𝑛 as a set
of the following form
𝑥 ∈ F3𝑛 : 𝑟 𝑗 · 𝑥 = 0 for each 𝑗 = 1, . . . , 𝑘 .
where 𝑟 1 , . . . , 𝑟 𝑘 ∈ F3𝑛 \ {0}. Bohr sets are used widely in additive combinatorics, and in
nearly all subsequent work on Roth’s theorem in the integers, including the proof of the
current best bound 𝑁/(log 𝑁) 1+𝑐 for some constant 𝑐 > 0 (Bloom & Sisask 2020).
We will see Bohr sets again in the proof of Freiman’s theorem in Chapter 7.
The next exercise is analogous to Exercise 6.2.11, which was in F5𝑛 .
Exercise 6.4.8∗ (Fourier uniformity does not control 4-AP counts). Fix 0 < 𝛼 < 1. Let
𝑁 be a prime. Let
𝐴 = 𝑥 ∈ [𝑁] : 𝑥 2 mod 𝑁 < 𝛼𝑁 .
Viewing 𝐴 ⊆ Z/𝑁Z, prove that, as 𝑁 → ∞ with fixed 𝛼,
(a) | 𝐴| = (𝛼 + 𝑜(1))𝑁 and max𝑟≠0 | 1c𝐴 (𝑟)| = 𝑜(1);
(b) |(𝑥, 𝑦) ∈ Z/𝑁Z : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦, 𝑥 + 3𝑦 ∈ 𝐴| ≠ (𝛼4 + 𝑜(1))𝑁 2 .
those of the form 𝐹 (𝑥, 𝑦, 𝑧) = 𝑓 (𝑥)𝑔(𝑦)ℎ(𝑧). This is a standard and important notion (which
comes with a lot of mystery), but it is not the one that we shall use.
Proof. Let 𝐹𝑎 be the restriction of 𝐹 to the “slice” {(𝑥, 𝑦, 𝑧) ∈ 𝐴 × 𝐴 × 𝐴 : 𝑥 = 𝑎}; that is,
(
𝐹 (𝑥, 𝑦, 𝑧) if 𝑥 = 𝑎,
𝐹𝑎 (𝑥, 𝑦, 𝑧) =
0 if𝑥 ≠ 𝑎.
𝐹𝑎
Then 𝐹𝑎 has slice rank ≤ 1 since 𝐹𝑎 (𝑥, 𝑦, 𝑧) = 𝛿 𝑎 (𝑥)𝐹 (𝑎, 𝑦, 𝑧), where 𝛿 𝑎 denotes the function
Í
taking value 1 at 𝑎 and 0 elsewhere. Thus 𝐹 = 𝑎∈ 𝐴 𝐹𝑎 has slice rank at most | 𝐴|. □
For the next lemma, we need the following fact from linear algebra.
Proof. Form a 𝑘 × 𝑛 matrix 𝑀 whose rows form a basis of this 𝑘-dimensional subspace
𝑊. Then 𝑀 has rank 𝑘. So it has some invertible 𝑘 × 𝑘 submatrix with columns 𝑆 ⊆ [𝑛]
with |𝑆| = 𝑘. Then for every 𝑧 ∈ F𝑆 , there is some linear combination of the rows whose
coordinates on 𝑆 are identical to those of 𝑧. In particular, there is some vector in the 𝑘-
dimensional subspace 𝑊 whose 𝑆-coordinates are all nonzero. □
A diagonal matrix with nonzero diagonal entries has full rank. We show that a similar
statement holds true for the slice rank.
Lemma 6.5.5 (Slice rank of a diagonal)
Suppose 𝐹 : 𝐴 × 𝐴 × 𝐴 → F satisfies 𝐹 (𝑥, 𝑦, 𝑧) ≠ 0 if and only if 𝑥 = 𝑦 = 𝑧. Then 𝐹 has
slice rank | 𝐴|.
Proof. From Lemma 6.5.3, we already know that the slice rank of 𝐹 is ≤ | 𝐴|. It remains to
prove that the slice rank of 𝐹 is is ≥ | 𝐴|.
Suppose 𝐹 (𝑥, 𝑦, 𝑧) can be written as a sum of functions of the form
𝑓 (𝑥)𝑔(𝑦, 𝑧), 𝑓 (𝑦)𝑔(𝑥, 𝑧), and 𝑓 (𝑧)𝑔(𝑥, 𝑦),
222 Forbidding 3-Term Arithmetic Progressions
with 𝑚 1 summands of the first type, 𝑚 2 of the second type, and 𝑚 3 of the third type. By
Lemma 6.5.4, there is some function ℎ : 𝐴 → F that is orthogonal to all the 𝑓 ’s from the
Í
third type of summands (i.e., 𝑥 ∈ 𝐴 𝑓 (𝑥)ℎ(𝑥) = 0), and such that |supp ℎ| ≥ | 𝐴| − 𝑚 3 . Let
∑︁
𝐺 (𝑥, 𝑦) = 𝐹 (𝑥, 𝑦, 𝑧)ℎ(𝑧).
𝑧∈ 𝐴
Only summands of the first two types remain. Each summand of the first type turns into a
rank 1 function (in the matrix sense of the rank)
∑︁
(𝑥, 𝑦) ↦→ 𝑓 (𝑥)𝑔(𝑦, 𝑧)ℎ(𝑧) = 𝑓 (𝑥)e
𝑔 (𝑦)
𝑧
for some new function e 𝑔 : 𝐴 → F. Similarly with functions of the second type. So 𝐺 (viewed
as an | 𝐴| × | 𝐴| matrix) has rank ≤ 𝑚 1 + 𝑚 2 . On the other hand,
(
ℎ(𝑥) if 𝑥 = 𝑦,
𝐺 (𝑥, 𝑦) =
0 if 𝑥 ≠ 𝑦.
This 𝐺 has rank |supp ℎ| ≥ | 𝐴| − 𝑚 3 . Combining, we get
| 𝐴| − 𝑚 3 ≤ rank 𝐺 ≤ 𝑚 1 + 𝑚 2 .
So 𝑚 1 + 𝑚 2 + 𝑚 3 ≥ | 𝐴|. This shows that the slice rank of 𝐹 is ≥ | 𝐴|. □
Now we prove an upper bound on the slice rank by invoking magical powers of polynomials.
If we expand the right-hand side, we obtain a polynomial in 3𝑛 variables with degree 2𝑛.
This is a sum of monomials, each of the form
𝑥1𝑖1 · · · 𝑥 𝑛𝑖𝑛 𝑦 1𝑗1 · · · 𝑦 𝑛𝑗𝑛 𝑧1𝑘1 · · · 𝑧 𝑛𝑘𝑛 ,
6.5 Polynomial Method 223
∑︁
𝑗1 +···+ 𝑗𝑛 ≤ 3
Each summand has slice rank at most 1. The number of summands in the first sum is precisely
the number of triples of nonnegative integers 𝑎, 𝑏, 𝑐 with 𝑎 + 𝑏 + 𝑐 = 𝑛 and 𝑏 + 2𝑐 ≤ 2𝑛/3
(𝑎, 𝑏, 𝑐 correspond to the numbers of 𝑖 ∗ ’s that are equal to 0, 1, 2 respectively) . The lemma
then follows. □
Here is a standard estimate. The proof is similar to that of the Chernoff bound.
Proof. Let 𝑥 ∈ [0, 1]. The sum equals to the coefficients of all the monomials 𝑥 𝑘 with
𝑘 ≤ 2𝑛/3 in the expansion of (1 + 𝑥 + 𝑥 2 ) 𝑛 . By deleting contributions 𝑥 𝑘 with 𝑘 > 2𝑛/3 and
using 𝑥 2𝑛/3 ≤ 𝑥 𝑘 whenever 𝑘 ≤ 2𝑛/3, we have
∑︁ 𝑛! (1 + 𝑥 + 𝑥 2 ) 𝑛
≤ .
𝑎,𝑏,𝑐≥0
𝑎!𝑏!𝑐! 𝑥 2𝑛/3
𝑎+𝑏+𝑐=𝑛
𝑏+2𝑐≤2𝑛/3
has slice rank | 𝐴|. On the other hand, by Lemmas 6.5.6 and 6.5.7, 𝐹 has slice rank ≤ 3(2.76) 𝑛 .
So | 𝐴| ≤ 3(2.76) 𝑛 . □
It is straightforward to extend the above proof from F3 to any other fixed F 𝑝 , resulting:
Finally, the proof technique in this section seems specific to the finite field model. It is an
intriguing open problem to apply the polynomial method for Roth’s theorem in the integers.
Due to the Behrend example (Section 2.5), we cannot expect power-saving bounds in the
integers.
Exercise 6.5.12 (Tricolor sum-free set). Let 𝑎 1 , . . . , 𝑎 𝑚 , 𝑏 1 , . . . , 𝑏 𝑚 , 𝑐 1 , . . . , 𝑐 𝑚 ∈ F2𝑛 .
Suppose that the equation 𝑎 𝑖 + 𝑏 𝑗 + 𝑐 𝑘 = 0 holds if and only if 𝑖 = 𝑗 = 𝑘. Show that there
is some constant 𝑐 > 0 such that 𝑚 ≤ (2 − 𝑐) 𝑛 for all sufficiently large 𝑛.
The following exercises explains how Fourier uniformity is analogous to the discrepancy-
type condition for 𝜀-regular pairs in the graph regularity lemma.
6.6 Arithmetic Regularity 225
Exercise 6.6.2 (Uniformity vs. discrepancy). Let 𝐴 ⊆ F𝑛𝑝 with | 𝐴| = 𝛼𝑝 𝑛 . We say that 𝐴
satisfies HyperplaneDISC(𝜂) if for every hyperplane 𝑊 of F𝑛𝑝 ,
|𝐴 ∩ 𝑊|
− 𝛼 ≤ 𝜂.
|𝑊 |
(a) Prove that if 𝐴 satisfies HyperplaneDISC(𝜀), then 𝐴 is 𝜀-uniform.
(b) Prove that if 𝐴 is 𝜀-uniform, then it satisfies HyperplaneDISC(( 𝑝 − 1)𝜀).
The proof is very similar to the proof of the graph regularity lemma in Chapter 2. Each
subspace 𝑊 induces a partition of the whole space F𝑛𝑝 into 𝑊-cosets, and we keep track the
energy (mean-squared density) of the partition. We show that if the conclusion of Theo-
rem 6.6.4 does not hold for the current 𝑊, then we can replace 𝑊 by a smaller subspace so
that the energy increases significantly. Since the energy is always bounded between 0 and 1,
there are at most a bounded number of iterations.
The next lemma is analogous to the energy boost lemma for irregular pairs in the proof of
graph regularity (Lemma 2.1.13).
Proof. By Lemma 6.6.7, for each coset 𝑊 ′ of 𝑊 on which 𝑓 is not 𝜀-uniform, we can find
some 𝑟 ∈ F𝑛𝑝 \ 𝑊 ⊥ so that replacing 𝑊 by its intersection with 𝑟 ⊥ increases its energy on 𝑊 ′
by more than 𝜀 2 . In other words,
| 𝐴 ∩ 𝑊 ′ |2
𝑞 𝐴∩𝑊 ′ (𝑊 ′ ∩ 𝑟 ⊥ ) > + 𝜀2 .
|𝑊 ′ | 2
6.6 Arithmetic Regularity 227
Let 𝑅 be a set of such 𝑟’s, one for each 𝑊-coset on which 𝑓 is not 𝜀-uniform (allowing some
𝑟’s to be chosen repeatedly).
Let 𝑈 = 𝑊 ∩ 𝑅 ⊥ . Then codim 𝑈 − codim 𝑊 ≤ |𝑅| ≤ |F 𝑝 /𝑊 | = 𝑝 codim 𝑊 .
Applying the monotonicity of energy (Lemma 6.6.6) on each 𝑊-coset and using the
observation in the first paragraph in this proof, we see the “local” energy of 𝑈 is more than
that of 𝑊 on by > 𝜀 2 on each of the > 𝜀-fraction of 𝑊-cosets on which 𝑓 is not 𝜀-uniform,
and is at least as great as that of 𝑊 on each of the remaining 𝑊-cosets. There the energy
increases by > 𝜀 2 when refining from 𝑊 to 𝑈. □
Proof of the arithmetic regularity lemma (Theorem 6.6.4). Starting with 𝑊0 = F𝑛𝑝 , we con-
struct a sequence of subspaces 𝑊0 ≥ 𝑊1 ≥ 𝑊2 ≥ · · · where each at step, unless 𝐴 is 𝜀-
uniform on all but ≤ 𝜀-fraction of 𝑊-cosets, then we apply Lemma 6.6.8 to find 𝑊𝑖+1 ≤ 𝑊𝑖 .
The energy increases by > 𝜀 3 at each iteration, so there are < 𝜀 −3 iterations. We have
codim 𝑊𝑖+1 ≤ codim 𝑊𝑖 + 𝑝 codim 𝑊𝑖 at each 𝑖, so the final 𝑊 = 𝑊𝑚 has codimension at most
some function of 𝑝 and 𝜀 (one can check that it is an exponential tower of 𝑝’s of height
𝑂 (𝜀 −3 )). This 𝑊 satisfies the desired properties. □
Remark 6.6.9 (Lower bound). Recall that Gowers (1997) showed that there exist graphs
whose 𝜀-regular partition requires at least tower(Ω(𝜀 −𝑐 )) parts (Theorem 2.1.17). There is a
similar tower-type lower bound for the arithmetic regularity lemma (Green 2005a; Hosseini,
Lovett, Moshkovitz, & Shapira 2016).
Remark 6.6.10 (Abelian groups). Green (2005a) also established an arithmetic regularity
lemma over arbitrary finite abelian groups. Instead of subspaces, one uses Bohr sets (see
Remark 6.4.7).
You may wish to skip ahead to Section 6.7 to see an application of the arithmetic regularity
lemma.
Remark 6.6.12. It is worth comparing Theorem 6.6.11 to the strong graph regularity lemma
(Theorem 2.8.3). It is important that the uniformity requirement on the pseudorandom piece
depends on the codim 𝑊.
In other more advanced applications, we would like 𝑓str to come from some structured
class of functions. For example, in higher order Fourier analysis, 𝑓str is a nilsequence.
Proof. Let 𝑘 0 = 0 and 𝑘 𝑖+1 = max{𝑘 𝑖 , ⌈𝜀 𝑘−2𝑖 ⌉} for each 𝑖 ≥ 0. Note that 𝑘 0 ≤ 𝑘 1 ≤ · · · .
Let us label the elements 𝑟 1 , 𝑟 2 , . . . , 𝑟 𝑝 𝑛 of F𝑛𝑝 so that
|b
𝑓 (𝑟 1 )| ≥ | b
𝑓 (𝑟 2 )| ≥ · · · .
By Parseval (Theorem 6.1.3), we have
∑︁
𝑝𝑛
|b
𝑓 (𝑟 𝑗 )| 2 = E 𝑓 2 ≤ 1.
𝑗=1
into
𝑓 = 𝑓str + 𝑓sml + 𝑓psr
according to the sizes of the Fourier coefficients. Roughly speaking, the large spectrum will
go into the structured piece 𝑓str , the very small spectrum will go into pseudorandom piece
𝑓psr , and the remaining middle terms will form the small piece 𝑓sml (which has small 𝐿 2 norm
by (6.8)).
Let 𝑊 = {𝑟 1 , . . . , 𝑟 𝑘𝑚 }⊥ and set
𝑓str = 𝑓𝑊 .
Then, by (6.6),
(
b if 𝑟 ∈ 𝑊 ⊥ ,
c
𝑓str (𝑟) =
𝑓 (𝑟)
0 if 𝑟 ∈ 𝑊 ⊥ .
Let us define 𝑓psr and 𝑓sml via their Fourier transform (and we can recover the functions via
the inverse Fourier transform). For each 𝑗 = 1, 2, . . . , 𝑝 𝑛 , set
(
b
𝑓 (𝑟 𝑗 ) if 𝑗 > 𝑘 𝑚+1 and 𝑟 𝑗 ∉ 𝑊 ⊥ ,
c
𝑓psr (𝑟 𝑗 ) =
0 otherwise.
6.6 Arithmetic Regularity 229
Exercise 6.6.13. Deduce Theorem 6.6.4 from Theorem 6.6.11 by using an appropriate
sequence 𝜀𝑖 and using the same 𝑊 guaranteed by Theorem 6.6.11.
Remark 6.6.14 (Spectral proof of the graph regularity lemma). The proof technique of
Theorem 6.6.11 can be adapted to give an alternate proof of the graph regularity lemma
(along with certain weak and strong variants). Instead of iteratively refining partitions and
tracking energy increments as we did in Chapter 2, we can first take a spectral decomposition
of the adjacency matrix 𝐴 of a graph:
∑︁
𝑛
𝐴= 𝜆𝑖 𝑣 𝑖 𝑣 𝑖⊺ ,
𝑖=1
for some appropriately chosen 𝑘 and 𝑘 similar to the proof of Theorem 6.6.11.
′
We have
∑︁
𝑛
𝜆2𝑖 = tr 𝐴2 ≤ 𝑛2 .
𝑖=1
√
So 𝜆𝑖 ≤ 𝑛/ 𝑖 for each 𝑖. We can guarantee that the spectral norm of 𝐴psr is small enough as
2
Í
a function of 𝑘 and 𝜀. Furthermore, we can guarantee that tr 𝐴sml = 𝑘<𝑖 ≤ 𝑘 ′ 𝜆2𝑖 ≤ 𝜀.
To turn 𝐴str into a vertex partition, we can use the approximate level sets of the top 𝑘
eigenvectors 𝑣 1 , . . . , 𝑣 𝑘 . Some bookkeeping calculations then shows that this is a regularity
partition. Intuitively, 𝐴psr provides us with regular pairs. Some of these regular pairs may not
stay regular after adding 𝐴sml , but since 𝐴sml has ≤ 𝜀 mass (in terms of 𝐿 2 norm), it destroys
at most a negligible fraction of regular pairs.
See Tao (2007a, Lemma 2.11) or Tao’s blog post The Spectral Proof of the Szemerédi
Regularity Lemma (2012) for more details of the proof.
230 Forbidding 3-Term Arithmetic Progressions
The following exercise is the arithmetic analogue of the existence of an 𝜀-regular vertex
subset in a graph (Theorem 2.1.26 and Exercise 2.1.27).
Exercise 6.6.15 (𝜀 -uniform subspace).
(a*) Prove that for every 0 < 𝜀 < 1/2 and 𝐴 ⊆ F2𝑛 , there exists a subspace 𝑊 ⊆ F2𝑛 (note
that 0 ∈ 𝑊) with codimension at most exp(𝐶/𝜀) such that 𝐴 is 𝜀-uniform on 𝑊.
Here 𝐶 is some absolute constant.
(b) Let 𝐴 = {𝑥 ∈ F3𝑛 : there exists 𝑖 such that 𝑥1 = · · · = 𝑥𝑖 = 0, 𝑥𝑖+1 = 1}. Prove that 𝐴
is not 𝑐-uniform on any positive dimensional subspace of F3𝑛 . Here 𝑐 > 0 is some
absolute constant.
In particular, Theorem 6.7.1 implies that every 3-AP-free subset of F3𝑛 has size 𝑜(3𝑛 ).
Exercise 6.7.2. Show that it is false that every 𝐴 ⊆ F3𝑛 with | 𝐴| = 𝛼3𝑛 , the number of
pairs (𝑥, 𝑦) ∈ F3𝑛 with 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴 is ≥ (𝛼3 − 𝑜(1))32𝑛 , where 𝑜(1) → 0 as 𝑛 → 0.
We will prove Theorem 6.7.1 via the next result, which concerns the number of 3-APs
with common difference coming from some subspace of bounded codimension, which is
picked via the arithmetic regularity lemma.
Proof. By the arithmetic regularity lemma (Theorem 6.6.4), there is some 𝑀 depending
only on 𝜀 and a subspace 𝑊 of F𝑛𝑝 of codimension ≤ 𝑀 so that 𝐴 is 𝜀-uniform on all but at
most 𝜀-fraction of 𝑊-cosets.
Let 𝑢 + 𝑊 be a 𝑊-coset on which 𝐴 is 𝜀-uniform. Denote the density of 𝐴 in 𝑢 + 𝑊 by
| 𝐴 ∩ (𝑢 + 𝑊)|
𝛼𝑢 = .
|𝑊 |
6.7 Popular Common Difference 231
Restricting ourselves inside 𝑢 + 𝑊 for a moment, by the 3-AP counting lemma Lemma 6.2.4,
the number of 3-APs of 𝐴 (including trivial ones) that are contained in 𝑢 + 𝑊 is
|{(𝑥, 𝑦) ∈ (𝑢 + 𝑊) × 𝑊 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}| ≥ (𝛼𝑢3 − 𝜀) |𝑊 | 2 .
Since 𝐴 is 𝜀-uniform on all but at most 𝜀-fraction of 𝑊-cosets, by varying 𝑢 + 𝑊 over all
such cosets, we find that the total number of 3-APs in 𝐴 with common difference in 𝑊 is
{(𝑥, 𝑦) ∈ F3𝑛 × 𝑊 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} ≥ (1 − 𝜀) (𝛼3 − 𝜀)3𝑛 |𝑊 | ≥ (𝛼3 − 2𝜀)3𝑛 |𝑊 | .
This proves the theorem (with 𝜀 replaced by 2𝜀). □
Exercise 6.7.4. Give another proof of Theorem 6.7.3 using Theorem 6.6.11 (arithmetic
regularity decomposition 𝑓 = 𝑓str + 𝑓psr + 𝑓sml ).
Proof of Theorem 6.7.1. First apply Theorem 6.7.3 with find a subspace 𝑊 of codimension
≤ 𝑀 = 𝑀 (𝜀). Choose 𝑛0 = 𝑀 + log3 (1/𝜀). So 𝑛 ≥ 𝑛0 guarantees |𝑊 | ≥ 1/𝜀.
We need to exclude 3-APs with common difference zero. We have
(𝛼3 − 𝜀)3𝑛 |𝑊 | ≤ {(𝑥, 𝑦) ∈ F3𝑛 × 𝑊 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}
= {(𝑥, 𝑦) ∈ F3𝑛 × (𝑊 \ {0}) : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} + | 𝐴| .
We have | 𝐴| ≤ 3𝑛 ≤ 𝜀3𝑛 |𝑊 |, so
(𝛼3 − 2𝜀)3𝑛 |𝑊 | ≤ {(𝑥, 𝑦) ∈ F3𝑛 × (𝑊 \ {0}) : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} .
By averaging, there exists 𝑦 ∈ 𝑊 \ {0} satisfying
{𝑥 ∈ F3𝑛 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴} ≥ (𝛼3 − 2𝜀)3𝑛 .
This proves the theorem (with 𝜀 replaced by 2𝜀). □
By adapting the above proof strategy with Bohr sets, Green (2005a) proved that a Roth’s
theorem with popular differences in finite abelian groups of odd order, as well as in the
integers.
Theorem 6.7.5 (Roth’s theorem with popular difference in finite abelian groups)
For all 𝜀 > 0, there exists 𝑁0 = 𝑁0 (𝜀) such that for all finite abelian groups Γ of odd
order |Γ| ≥ 𝑁0 , and every 𝐴 ⊆ Γ with | 𝐴| = 𝛼 |Γ|, there exists 𝑦 ∈ Γ \ {0} such that
|{𝑥 ∈ Γ : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦 ∈ 𝐴}| ≥ (𝛼3 − 𝜀) |Γ| .
See Tao’s blog post A Proof of Roth’s Theorem (2014) for a proof of Theorem 6.7.6 using
Bohr sets, following an arithmetic regularity decomposition in the spirit of Theorem 6.6.11.
232 Forbidding 3-Term Arithmetic Progressions
Remark 6.7.7 (Bounds). The above proof of Theorem 6.7.1 gives 𝑛0 = tower(𝜀 −𝑂 (1) ).
The bounds Theorems 6.7.5 and 6.7.6 are also tower-type. What is the smallest 𝑛0 (𝜀) for
which Theorem 6.7.1 holds? It turns out to be tower(Θ(log(1/𝜀))), as proved by Fox &
Pham (2019) over finite fields and Fox, Pham, & Zhao (2022) over the integers. Although it
had been known since Gowers (1997) that tower-type bounds are necessary for the regularity
lemmas themselves, Roth’s theorem with popular differences is the first regularity application
where a tower-type bound is shown to be indeed necessary.
Using quadratic Fourier analysis, Green & Tao (2010c) extended the popular difference
result over to 4-APs.
Theorem 6.7.8 (Popular difference for 4-APs)
For all 𝜀 > 0, there exists 𝑁0 = 𝑁0 (𝜀) such that for every 𝑁 ≥ 𝑁0 and 𝐴 ⊆ [𝑁] with
| 𝐴| = 𝛼𝑁, there exists 𝑦 ≠ 0 such that
|{𝑥 : 𝑥, 𝑥 + 𝑦, 𝑥 + 2𝑦, 𝑥 + 3𝑦 ∈ 𝐴}| ≥ (𝛼4 − 𝜀)𝑁.
It may be a surprising that such a statement is false for APs of length 5 or longer. This was
shown by Bergelson, Host, & Kra (2005) with an appendix by Ruzsa giving a construction
that is a clever modification of the Behrend construction (Section 2.5).
Further Reading
Green has several excellent surveys and lecture notes:
• Finite Field Models in Additive Combinatorics (2005c) — For many additive combina-
torics problems, it is a good idea to first study them in the finite field setting (also see
the follow up by Wolf (2015)).
• Montreal Lecture Notes on Quadratic Fourier Analysis (2007a)— An introduction to
quadratic Fourier analysis and its application to the popular common difference theorem
for 4-APs in F5𝑛 .
• Lecture notes from his Cambridge course Additive Combinatorics (2009b).
Tao’s FOCS 2007 tutorial Structure and Randomness in Combinatorics (2007a) explains
many facets of arithmetic regularity and applications.
For more on algebraic methods in combinatorics (mostly pre-dating methods in Sec-
tion 6.5), see the books:
• Thirty-three Miniatures by Matoušek (2010);
• Linear Algebra Methods in Combinatorics by Babai & Frankl;
6.7 Popular Common Difference 233
Chapter Summary
• Basic tools of discrete Fourier analysis:
– Fourier transform,
– Fourier inversion formula,
– Parsevel / Plancheral identity (unitarity of the Fourier transform),
– convolution identity (Fourier transform converts convolutions to multiplication).
• The finite field model (e.g., F3𝑛 ) offers a convenient playground for Fourier analysis
in additive combinatorics. Many techniques can then be adapted to the integer setting,
although often with additional technicalities.
• Roth’s theorem. Using Fourier analysis, we proved that every 3-AP-free subset has size
at most
– 𝑂 (3𝑛 /𝑛) in F𝑛 , and
– 𝑂 (𝑁/log log 𝑁) in [𝑁] ⊆ Z.
• The Fourier analytic proof of Roth’s theorem (both in F3𝑛 and in Z) proceeds via a density
increment argument:
(1) A 3-AP-free set has a large Fourier coefficient;
(2) A large Fourier coefficient implies density increment on some hyperplane (in F3𝑛 ) or
subprogression (in Z);
(3) Iterate the density increment.
• Using the polynomial method, we showed that every 3-AP-free subset of F3𝑛 has size
𝑂 (2.76𝑛 ).
• Arithmetic regularity lemma. Given 𝐴 ⊆ F𝑛𝑝 , we can find a bounded codimensional
subspace so that 𝐴 is Fourier-uniform on almost all cosets.
– An application: Roth’s theorem with popular difference. For every 𝐴 ⊆ F3𝑛 , there is
some “popular 3-AP common difference” with frequency at least nearly as much as if
𝐴 were random.
7
Chapter Highlights
• Freiman’s theorem: structure of sets with small doubling
• Inequalities between sizes of sumsets: Ruzsa triangle inequality and Plünnecke’s inequality
• Ruzsa covering lemma
• Freiman homomorphisms: preserving partial additive structure
• Ruzsa modeling lemma
• Structure in iterated sumsets: Bogolyubov’s lemma
• Geometry of numbers: Minkowski’s second theorem
• Polynomial Freiman–Ruzsa conjecture
• Additive energy and the Balog–Szemerédi–Gowers theorem
Let 𝐴 and 𝐵 be finite subsets of some ambient abelian group. We define their sumset to
be
𝑨 + 𝑩 := {𝑎 + 𝑏 : 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵} .
Note that we view 𝐴 + 𝐵 as a set, and do not keep track of the number of ways that each
element can be written as 𝑎 + 𝑏.
The main goal of this chapter is to understand the following question.
One of the main goals of this chapter is to prove Freiman’s theorem, which is a deep and
foundational result in additive combinatorics. Freiman’s theorem tells us whenever 𝐴 + 𝐴 is
at most a constant factor larger than 𝐴, then 𝐴 must be a large fraction of some generalized
arithmetic progression.
Most of this chapter will be devoted towards proving Freiman’s theorem. We will see ideas
and tools from Fourier analysis, geometry of numbers, and additive combinatorics.
In Section 7.13, we will introduce the additive energy of a set, which is another way to
measure the additive structure of a set. We will see the Balog–Szemerédi–Gowers theorem,
which relates additive energy and doubling. This section can be read independently from the
earlier parts of the chapter.
These results on the structure of set addition are not only interesting on their own, but also
play a key role in Gowers’ proof (2001) of Szemerédi’s theorem (although we do not cover it
in this book; see Further Reading at the end of the chapter). Gowers’ deep and foundational
work shows how these topics in additive combinatorics are all highly connected.
235
236 Structure of Set Addition
Proof. Let 𝑛 = | 𝐴|. For the lower bound | 𝐴 + 𝐴| ≥ 2𝑛 − 1, note that if the elements of 𝐴 are
𝑎 1 < 𝑎 2 < · · · < 𝑎 𝑛 , then
𝑎1 + 𝑎1 < 𝑎1 + 𝑎2 < · · · < 𝑎1 + 𝑎 𝑛 < 𝑎2 + 𝑎 𝑛 < · · · < 𝑎 𝑛 + 𝑎 𝑛
are 2𝑛 − 1 distinct elements of 𝐴 + 𝐴. So | 𝐴 + 𝐴| ≥ 2𝑛 − 1. Equality is attained when 𝐴 is an
arithmetic progression.
The upper bound | 𝐴 + 𝐴| ≤ 𝑛+1 2
follows from that there are 𝑛+12
unordered pairs of
elements of 𝐴. We have equality when there are no nontrivial solutions to 𝑎 + 𝑏 = 𝑐 + 𝑑 in
𝐴, such as when 𝐴 consists of powers of twos. □
Exercise 7.1.2 (Sumsets in abelian groups). Show that if 𝐴 is a finite subset of an abelian
group, then | 𝐴 + 𝐴| ≥ | 𝐴|, with equality if and only if 𝐴 is the coset of some subgroup.
What can we say about 𝐴 if 𝐴 + 𝐴 is not too much larger than 𝐴?
One of the main results of this chapter, Freiman’s theorem, addresses the following ques-
tion.
−→
Z2 Z
We often abuse notation and use the term GAP to refer to the image of 𝜙, viewed as a set:
𝑎 0 + 𝑎 1 · [𝐿 1 ] + · · · + 𝑎 𝑑 · [𝐿 𝑑 ] = {𝑎 0 + 𝑎 1 𝑥1 + · · · + 𝑎 𝑑 𝑥 𝑑 : 𝑥 1 ∈ [𝐿 1 ], . . . , 𝑥 𝑑 ∈ [𝐿 𝑑 ]} .
Example 7.1.8. A proper GAP of dimension 𝑑 has doubling constant ≤ 2𝑑 .
Example 7.1.9. Let 𝑃 be a proper GAP of dimension 𝑑. Let 𝐴 ⊆ 𝑃 with | 𝐴| ≥ |𝑃| /𝐾. Then
𝐴 has doubling constant ≤ 𝐾2𝑑 .
While it is often easy to check that certain sets have small doubling, the inverse problem
is much more difficult. We would like to characterize all sets with small doubling. The
following foundational result by Freiman (1973) shows that all sets with bounded doubling
must look like Example 7.1.9.
Freiman’s theorem is a deep result. We will spend most the chapter proving it.
238 Structure of Set Addition
Remark 7.1.11 (Quantitative bounds). We will present a proof giving 𝑑 (𝐾) = exp(𝐾 𝑂 (1) )
and 𝑓 (𝐾) = exp(𝑑 (𝐾)), due to Ruzsa (1994). Chang (2002) showed that Freiman’s theorem
holds with 𝑑 (𝐾) = 𝐾 𝑂 (1) and 𝑓 (𝐾) = exp(𝑑 (𝐾)) (see Exercise 7.11.2). Schoen (2011)
further improved the bounds to 𝑑 (𝐾) = 𝐾 1+𝑜(1) and 𝑓 (𝐾) = exp(𝐾 1+𝑜(1) ). Sanders (2012,
2013) showed that if we change GAPs to “convex progressions” (see Section 7.12), then an
analogous theorem holds with 𝑑 (𝐾) = 𝐾 (log(2𝐾)) 𝑂 (1) and 𝑓 (𝐾) = exp(𝑑 (𝐾)).
It is easy to see that one cannot do better than 𝑑 (𝐾) ≤ 𝐾 − 1 and 𝑓 (𝐾) = 𝑒 𝑂 (𝐾 ) , by
considering a set without additive structure.
Also see Section 7.12 on the polynomial Freiman–Ruzsa conjecture for a variant of
Freiman’s theorem with much better quantitative dependencies.
Remark 7.1.12 (Making the GAP proper). The conclusion of Freiman’s theorem can be
strengthened to force the GAP to be proper, at the cost of potentially increasing 𝑑 (𝐾) and
𝑓 (𝐾). For example, it is known that every GAP of dimension 𝑑 is contained in some proper
3
GAP of dimension ≤ 𝑑 with at most 𝑑 𝑂 (𝑑 ) factor increase in the volume; see Tao & Vu
(2006, Theorem 3.40).
Remark 7.1.13 (History). Freiman’s original proof (1973) was quite complicated. Ruzsa
(1994) later found a simpler proof, which guided much of the subsequent work. We follow
Ruzsa’s presentation here. Theorem 7.1.10 is sometimes called the Freiman–Ruzsa theorem.
Freiman’s theorem was brought into further prominence due to the role it played in the new
proof of Szemerédi’s theorem by Gowers (2001).
Remark 7.1.14 (Freiman’s theorem in abelian groups). Green & Ruzsa (2007) proved a
generalization of Freiman’s theorem in an arbitrary abelian group. A coset progression is a
set of the form 𝑃 + 𝐻 where 𝑃 is a GAP and 𝐻 is a subgroup of the ambient abelian group.
Define the dimension of this coset progression to be the dimension of 𝑃, and its volume to
be |𝐻| vol 𝑃. Green & Ruzsa (2007) proved the following theorem.
Theorem 7.1.15 (Freiman’s theorem for general abelian groups)
Let 𝐴 be a subset of an abelian group satisfying | 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|. Then 𝐴 is contained
in a coset progression of dimension at most 𝑑 (𝐾) and volume at most 𝑓 (𝑘) | 𝐴|, where
𝑑 (𝐾) and 𝑓 (𝐾) are constants depending only on 𝐾.
Then 𝜙 is injective since we can recover (𝑎, 𝑑) from 𝜙(𝑎, 𝑑) = (𝑥, 𝑦) via 𝑑 = 𝑦 − 𝑥 and then
𝑎 = 𝑥 + b (𝑑). □
Remark 7.2.2. By replacing 𝐵 with −𝐵 and/or 𝐶 with −𝐶, Theorem 7.2.1 implies some
additional sumset inequalities:
| 𝐴| |𝐵 + 𝐶 | ≤ | 𝐴 + 𝐵| | 𝐴 − 𝐶 | ;
| 𝐴| |𝐵 + 𝐶 | ≤ | 𝐴 − 𝐵| | 𝐴 + 𝐶 | ;
| 𝐴| |𝐵 − 𝐶 | ≤ | 𝐴 + 𝐵| | 𝐴 + 𝐶 | .
However, this trick cannot be used to prove the similarly looking inequality
| 𝐴| |𝐵 + 𝐶 | ≤ | 𝐴 + 𝐵| | 𝐴 + 𝐶 | .
This inequality is also true, and we will prove it in the following section.
Remark 7.2.3 (Why is it called a triangle inequality?). If we define
| 𝐴 − 𝐵|
𝜌( 𝐴, 𝐵) := log √︁
| 𝐴| |𝐵|
(called a Ruzsa distance), then Theorem 7.2.1 can be rewritten as
𝜌(𝐵, 𝐶) ≤ 𝜌( 𝐴, 𝐵) + 𝜌( 𝐴, 𝐶).
This is why Theorem 7.2.1 is called a “triangle inequality.” However, one should not take
the name too seriously. The function 𝜌 is not a metric because 𝜌( 𝐴, 𝐴) ≠ 0 in general.
Exercise 7.2.4 (Iterated sumsets). Let 𝐴 be a finite ssubset of an abelian group satisfying
|2𝐴 − 2𝐴| ≤ 𝐾 | 𝐴| .
Prove that
|𝑚 𝐴 − 𝑚 𝐴| ≤ 𝐾 𝑚−1 | 𝐴| for every integer 𝑚 ≥ 2.
In the above exercise, we had to start with the assumption that |2𝐴 − 2𝐴| ≤ 𝐾 | 𝐴|. In
the next section, we bound the sizes of iterated sumsets starting with the weaker hypothesis
| 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|.
Remark 7.3.2 (History). Plünnecke (1970) proved a version of the theorem originally using
graph theoretic methods. Ruzsa (1989) gave a simpler version of Plünnecke’s proof and also
extended it from sums to differences. Nevertheless, Ruzsa’s proof was still quite long and
complex. It sets up a “commutative layered graph”, and uses tools from graph theory including
Menger’s theorem. Theorem 7.3.1 is sometimes called the Plünnecke–Ruzsa inequality. See
Ruzsa (2009, Chapter 1) or Tao & Vu (2006, Chapter 6) for an account of this proof.
In a surprising breakthrough, Petridis (2012) found a very short proof of the result, which
we present here.
We will prove the following more general statement. Theorem 7.3.1 is the special case
𝐴 = 𝐵.
Theorem 7.3.3 (Plünnecke’s inequality)
Let 𝐴 and 𝐵 be finite subsets of an abelian group satisfying
| 𝐴 + 𝐵| ≤ 𝐾 | 𝐴| .
Then for all integers 𝑚, 𝑛 ≥ 0,
|𝑚𝐵 − 𝑛𝐵| ≤ 𝐾 𝑚+𝑛 | 𝐴| .
Remark 7.3.5 (Interpretation as expansion ratios). We can interpret Lemma 7.3.4 in terms
of vertex expansion ratios inside the bipartite graph between two copies of the ambient
abelian group, with edges (𝑥, 𝑥 + 𝑏) ranging over all 𝑥 ∈ Γ and 𝑏 ∈ 𝐵. Every vertex subset 𝑋
on the left has neighbors 𝑋 + 𝐵 on the right, and thus has vertex expansion ratio |𝑋 + 𝐵| /|𝑋 |.
+𝐵
𝐴
𝑋+𝐵
𝑋
𝑋+𝐶
𝑋+𝐶 +𝐵
7.3 Sumset Calculus II: Plünnecke’s Inequality 241
We will apply Lemma 7.3.4 by choosing 𝑋 among all nonempty subsets of 𝐴 with the
minimum expansion ratio, so that the hypothesis of Lemma 7.3.4 is automatically satisfied.
The conclusion of Lemma 7.3.4 then says that a union of translates of 𝑋 has expansion ratio
at most that of 𝑋.
Proof of Theorem 7.3.3 given Lemma 7.3.4. Choose 𝑋 among all nonempty subsets of 𝐴
with the minimum |𝑋 + 𝐵| /|𝑋 | so that the hypothesis of Lemma 7.3.4 is satisfied. Also we
have
|𝑋 + 𝐵| | 𝐴 + 𝐵|
≤ ≤ 𝐾.
|𝑋 | | 𝐴|
For every integer 𝑛 ≥ 0, applying Lemma 7.3.4 with 𝐶 = 𝑛𝐵, we have
|𝑋 + (𝑛 + 1)𝐵| |𝑋 + 𝐵|
≤ ≤ 𝐾.
|𝑋 + 𝑛𝐵| |𝑋 |
So induction on 𝑛 yields, for all 𝑛 ≥ 0,
|𝑋 + 𝑛𝐵| ≤ 𝐾 𝑛 |𝑋 | .
Finally, applying the Ruzsa triangle inequality (Theorem 7.2.1), for all 𝑚, 𝑛 ≥ 0.
|𝑋 + 𝑚𝐵| |𝑋 + 𝑛𝐵|
|𝑚𝐵 − 𝑛𝐵| ≤ ≤ 𝐾 𝑚+𝑛 |𝑋 | ≤ 𝐾 𝑚+𝑛 | 𝐴| . □
|𝑋 |
Proof of Lemma 7.3.4. We will proceed by induction on |𝐶 |. For the base case |𝐶 | = 1, note
that 𝑋 + 𝐶 is a translate of 𝑋, so |𝑋 + 𝐶 + 𝐵| = |𝑋 + 𝐵| and |𝑋 + 𝐶 | = |𝑋 |.
Now for the induction step, assume that for some 𝐶,
|𝑋 + 𝐶 + 𝐵| |𝑋 + 𝐵|
≤ .
|𝑋 + 𝐶 | |𝑋 |
Now consider 𝐶 ∪ {𝑐} for some 𝑐 ∉ 𝐶. We wish to show that
|𝑋 + (𝐶 ∪ {𝑐}) + 𝐵| |𝑋 + 𝐵|
≤ .
|𝑋 + (𝐶 ∪ {𝑐})| |𝑋 |
By comparing the change in the left-hand side fraction, it suffices to show that
|𝑋 + 𝐵|
|(𝑋 + 𝑐 + 𝐵) \ (𝑋 + 𝐶 + 𝐵)| ≤ |(𝑋 + 𝑐) \ (𝑋 + 𝐶)| . (7.1)
|𝑋 |
Let
𝑌 = {𝑥 ∈ 𝑋 : 𝑥 + 𝑐 + 𝐵 ⊆ 𝑋 + 𝐶 + 𝐵} ⊆ 𝑋.
Then
|(𝑋 + 𝑐 + 𝐵) \ (𝑋 + 𝐶 + 𝐵)| ≤ |𝑋 + 𝐵| − |𝑌 + 𝐵| .
Furthermore, if 𝑥 ∈ 𝑋 satisfies 𝑥 + 𝑐 ∈ 𝑋 + 𝐶, then 𝑥 + 𝑐 + 𝐵 ⊆ 𝑋 + 𝐶 + 𝐵 and hence 𝑥 ∈ 𝑌 . So
|(𝑋 + 𝑐) \ (𝑋 + 𝐶)| ≥ |𝑋 | − |𝑌 | .
Thus, to prove (7.1), it suffices to show
|𝑋 + 𝐵|
|𝑋 + 𝐵| − |𝑌 + 𝐵| ≤ (|𝑋 | − |𝑌 |) ,
|𝑋 |
242 Structure of Set Addition
Exercise 7.3.8∗ (Loomis–Whitney for sumsets). Show that for every finite subsets 𝐴, 𝐵, 𝐶
in an abelian group, one has
| 𝐴 + 𝐵 + 𝐶 | 2 ≤ | 𝐴 + 𝐵| | 𝐴 + 𝐶 | |𝐵 + 𝐶 | .
Remark 7.4.2 (Geometric intuition). Imagine that 𝐵 is a unit ball in R𝑛 , and cardinality
above is replaced by volume. Given some region 𝑋 (the shaded region below), consider a
maximal set T of disjoint union balls with centers in 𝑋 (maximal in the sense that one cannot
add an additional ball without intersecting some other ball).
7.5 Freiman’s Theorem in Groups with Bounded Exponent 243
Then replacing each ball in T by a ball of radius 2 with the same center, (i.e., replacing
𝐵 by 𝐵 − 𝐵) the resulting balls must cover the region 𝑋 (which amounts to the conclusion
𝑋 ⊆ 𝑇 + 𝐵 − 𝐵), for otherwise at any uncovered point of 𝑋 we could have added an additional
non-overlapping ball in the previous step.
Similar arguments are important in analysis (e.g., the Vitali covering lemma).
Proof. Let 𝑇 ⊆ 𝑋 be a maximal subset such that 𝑡 + 𝐵 as 𝑡 ranges over 𝑇 are disjoint. Then
|𝑇 | |𝐵| = |𝑇 + 𝐵| ≤ |𝑋 + 𝐵| ≤ 𝐾 |𝐵| .
So |𝑇 | ≤ 𝐾.
By the maximality of 𝑇, for all 𝑥 ∈ 𝑋 there exists some 𝑡 ∈ 𝑇 such that (𝑡 + 𝐵) ∩ (𝑥 + 𝐵) ≠ ∅.
In other words, there exist 𝑡 ∈ 𝑇 and 𝑏, 𝑏 ′ ∈ 𝐵 such that 𝑡 + 𝑏 = 𝑥 + 𝑏 ′ . Hence 𝑥 ∈ 𝑇 + 𝐵 − 𝐵
for every 𝑥 ∈ 𝑋. Thus 𝑋 ⊆ 𝑇 + 𝐵 − 𝐵. □
The following “more efficient” covering lemma can be used to prove a better bound in
Freiman’s theorem.
Exercise 7.4.3∗ (Chang’s covering lemma). Let 𝐴 and 𝐵 be finite sets in an abelian group
satisfying
| 𝐴 + 𝐴| ≤ 𝐾 | 𝐴| and | 𝐴 + 𝐵| ≤ 𝐾 ′ |𝐵| .
Show that there exists some set 𝑋 in the abelian group so that
𝐴 ⊆ Σ𝑋 + 𝐵 − 𝐵 and |𝑋 | = 𝑂 (𝐾 log(𝐾𝐾 ′ )),
where Σ𝑋 denotes the set of all elements that can be written as the sum of a subset of
elements of 𝑋 (including zero as the sum of the empty set).
Hint: Try first finding 2𝐾 disjoint translates 𝑎 + 𝐵.
For example, F2𝑛 has exponent 2. The cyclic group Z/𝑁Z has exponent 𝑁. The integers Z
has infinite exponent.
We use ⟨𝑨⟩ to refer to the subgroup of a group 𝐺 generated by some subset 𝐴 of 𝐺. Then
the exponent of a group 𝐺 is sup 𝑥 ∈𝐺 |⟨𝑥⟩|. When the group is a vector space (e.g., F2𝑛 ), ⟨𝐴⟩
is the smallest subspace containing 𝐴.
Remark 7.5.5. This theorem is a converse of the observation that if 𝐴 is a large fraction of
a subgroup, then 𝐴 has small doubling.
Proof. By Plünnecke’s inequality (Theorem 7.3.1), we have
| 𝐴 + (2𝐴 − 𝐴)| = |3𝐴 − 𝐴| ≤ 𝐾 4 | 𝐴| .
By the Ruzsa covering lemma (Theorem 7.4.1 applied with 𝑋 = 2𝐴 − 𝐴 and 𝐵 = 𝐴), there
exists some 𝑇 ⊆ 2𝐴 − 𝐴 with |𝑇 | ≤ | 𝐴 + (2𝐴 − 𝐴)| /| 𝐴| ≤ 𝐾 4 such that
2𝐴 − 𝐴 ⊆ 𝑇 + 𝐴 − 𝐴.
Adding 𝐴 to both sides, we have,
3𝐴 − 𝐴 ⊆ 𝑇 + 2𝐴 − 𝐴 ⊆ 2𝑇 + 𝐴 − 𝐴.
Iterating, for any positive integer 𝑛, we have
(𝑛 + 1) 𝐴 − 𝐴 ⊆ 𝑛𝑇 + 𝐴 − 𝐴 ⊆ ⟨𝑇⟩ + 𝐴 − 𝐴.
7.6 Freiman Homomorphisms 245
Since we are in an abelian group with bounded exponent, every element of ⟨𝐴⟩ lies in 𝑛𝐴
for some n. Thus
Ø
⟨𝐴⟩ ⊆ (𝑛𝐴 + 𝐴 − 𝐴) ⊆ ⟨𝑇⟩ + 𝐴 − 𝐴.
𝑛≥1
Exercise 7.5.8∗ (Ball volume growth in an abelian Cayley graph). Show that there is
some absolute constant 𝐶 so that if 𝑆 is a finite subset of an abelian group, and 𝑘 is a
positive integer, then
|2𝑘 𝑆| ≤ 𝐶 |𝑆 | |𝑘 𝑆| .
The two sets are very similar from the point of view of additive structure. For example, the
obvious bijection between 𝐴 and 𝐵 has the nice property that any solution to the equation
𝑤 + 𝑥 = 𝑦 + 𝑧 in one set is automatically a solution in the other. Sometimes, in additive
combinatorics, it is a good idea to treat these two sets as isomorphic. Let us define this
246 Structure of Set Addition
notion formally and study what it means for a map between sets to partially preserve additive
structure.
Intuitively, the idea is that there are no wrap around additive relations mod 𝑁 if 𝐴 has
small diameter.
Proof. The mod 𝑁 map Z → Z/𝑁 is a group homomorphism, and hence automatically a
Freiman 𝑠-homomorphism. Now, if 𝑎 1 , . . . , 𝑎 𝑠 , 𝑎 1′ , . . . , 𝑎 ′𝑠 ∈ 𝐴 are such that
(𝑎 1 + · · · + 𝑎 𝑠 ) − (𝑎 1′ + · · · + 𝑎 ′𝑠 ) ≡ 0 (mod 𝑁),
then the left hand side, viewed as an integer, has absolute value less than 𝑁 (since 𝑎 𝑖 − 𝑎 𝑖′ <
𝑁/𝑠 for each 𝑖). Thus the left hand side must be 0 in Z. So the inverse of the mod 𝑁 map is a
Freiman 𝑠-homomorphism over 𝐴, and thus mod 𝑁 is a Freiman 𝑠-isomorphism. □
Proof. Choose any prime 𝑞 > max(𝑠 𝐴 − 𝑠 𝐴). For every choice of 𝜆 ∈ [𝑞 − 1], we define 𝜙𝜆
as the composition of functions as follows
mod 𝑞 ·𝜆 (mod 𝑞) −1
𝜙 = 𝜙𝜆 : Z −−−−→ Z/𝑞Z −−−−→ Z/𝑞Z −−−−−−−−−→ {0, 1, . . . , 𝑞 − 1} .
The first map is the mod 𝑞 map. The second map sends 𝑥 to 𝜆𝑥. The last map inverts the mod
𝑞 map Z → Z/𝑞Z.
If 𝜆 ∈ [𝑞 − 1] is chosen uniformly at random, then each nonzero integer is mapped to a
uniformly random element of [𝑞 − 1] under 𝜙𝜆 , and so is divisible by 𝑁 with probability
≤ 1/𝑁. Since there are fewer than 𝑁 nonzero elements in 𝑠 𝐴 − 𝑠 𝐴, there exists a choice of 𝜆
so that
𝑁 ∤ 𝜙𝜆 (𝑥) for any nonzero 𝑥 ∈ 𝑠 𝐴 − 𝑠 𝐴. (7.2)
Let us fix this 𝜆 from now on and write 𝜙 = 𝜙𝜆 .
Among the three functions whose composition defines 𝜙, the first map (i.e., mod 𝑞) and the
second map (·𝜆 in Z/𝑞Z) are group homomorphisms, and hence Freiman 𝑠-homomorphisms.
The last map is not a Freiman 𝑠-homomorphism, but it becomes one when restricted to an
interval of at most 𝑞/𝑠 elements (see Proposition 7.6.6). By the pigeonhole principle, we can
find an interval 𝐼 with
diam 𝐼 < 𝑞/𝑠
such that
𝐴′ = {𝑎 ∈ 𝐴 : 𝜙(𝑎) ∈ 𝐼}
has ≥ | 𝐴| /𝑠 elements. So 𝜙 sends 𝐴′ Freiman 𝑠-homomorphically to its image.
We further compose 𝜙 with the mod 𝑁 map to obtain
𝜙 mod 𝑞
𝜓 : Z −−→ {0, 1, . . . , 𝑞 − 1} −−−−→ Z/𝑁Z.
We claim that 𝜓 maps 𝐴′ Freiman 𝑠-isomorphically to its image. Indeed, we saw that 𝜓 is a
Freiman 𝑠-homomorphism when restricted to 𝐴′ (since both 𝜙| 𝐴′ and the mod 𝑁 map are).
Now suppose 𝑎 1 , . . . , 𝑎 𝑠 , 𝑎 1′ , . . . , 𝑎 ′𝑠 ∈ 𝐴′ satisfy
𝜓(𝑎 1 ) + · · · + 𝜓(𝑎 𝑠 ) = 𝜓(𝑎 1′ ) + · · · + 𝜓(𝑎 ′𝑠 ),
which is the same as saying that 𝑁 divides
𝑦 := 𝜙(𝑎 1 ) + · · · + 𝜙(𝑎 𝑠 ) − 𝜙(𝑎 1′ ) − · · · − 𝜙(𝑎 ′𝑠 ) ∈ Z.
By swapping (𝑎 1 , . . . , 𝑎 𝑠 ) with (𝑎 1′ , . . . , 𝑎 ′𝑠 ) if needed, we may assume that 𝑦 ≥ 0. Since
𝜙( 𝐴′ ) ⊆ 𝐼, we have |𝜙(𝑎 𝑖 ) − 𝜙(𝑎 𝑖′ )| ≤ diam 𝐼 < 𝑞/𝑠 for each 𝑖, and thus
0 ≤ 𝑦 < 𝑞.
7.8 Iterated Sumsets: Bogolyubov’s Lemma 249
Let
𝑥 = 𝑎 1 + · · · + 𝑎 𝑠 − 𝑎 1′ − · · · − 𝑎 ′𝑠 ∈ 𝑠 𝐴 − 𝑠 𝐴.
Since 𝜙 mod 𝑞 is a group homomorphism,
𝜙(𝑥) ≡ 𝜙(𝑎 1 ) + · · · + 𝜙(𝑎 𝑠 ) − 𝜙(𝑎 1′ ) − · · · − 𝜙(𝑎 ′𝑠 ) = 𝑦 (mod 𝑞).
Since
𝜙(𝑥), 𝑦 ∈ [0, 𝑞) ∩ Z and 𝜙(𝑥) ≡ 𝑦 (mod 𝑞),
we have 𝜙(𝑥) = 𝑦. Since 𝑁 divides 𝑦 = 𝜙(𝑥), and by (7.2), 𝑁 ∤ 𝜙(𝑥) for any nonzero
𝑥 ∈ 𝑠𝐴 − 𝑠𝐴, we must have 𝑥 = 0. Thus
𝑎 1 + · · · + 𝑎 𝑠 = 𝑎 1′ + · · · + 𝑎 ′𝑠 .
Hence 𝐴′ is a set of size ≥ | 𝐴| /𝑠 that is Freiman 𝑠-isomorphic via 𝜓 to its image in Z/𝑁Z. □
Exercise 7.7.4 (Modeling arbitrary sets of integers). Let 𝐴 ⊆ Z with | 𝐴| = 𝑛.
(a) Let 𝑝 be a prime. Show that there is some integer 𝑡 relatively prime to 𝑝 such that
∥𝑎𝑡/𝑝∥ R/Z ≤ 𝑝 −1/𝑛 for all 𝑎 ∈ 𝐴.
(b) Show that 𝐴 is Freiman 2-isomorphic to a subset of [𝑁] for some 𝑁 = (4 + 𝑜(1)) 𝑛 .
(c) Show that (b) cannot be improved to 𝑁 = 2𝑛−2 .
(You may use the fact that the smallest prime larger than 𝑚 has size 𝑚 + 𝑜(𝑚).)
Exercise 7.7.5 (Sumset with 3-AP-free set). Let 𝐴 and 𝐵 be 𝑛-element subsets of the
integers. Suppose 𝐴 is 3-AP free. Prove that | 𝐴 + 𝐵| ≥ 𝑛(log log 𝑛) 1/100 provided that 𝑛 is
sufficiently large.
Hint: Ruzsa triangle inequality, Plünnecke’s inequality, Ruzsa model lemma, Roth’s theorem
Exercise 7.7.6 (3-AP-free subsets of arbitrary sets of integers). Prove that there is√some
log 𝑛
constant 𝐶 > 0 so that every set of 𝑛 integers has a 3-AP-free subset of size ≥ 𝑛𝑒 −𝐶 .
The answer to the above question is no, as evidenced by the following example. (Niveau
is French for level.)
250 Structure of Set Addition
Example 7.8.2 (Niveau set). Let 𝐴 be the set of all points in F2𝑛 with Hamming weight
√
(number of 1 entries) at most (𝑛−𝑐 𝑛)/2. Note by the central limit theorem | 𝐴| = (𝛼+𝑜(1))2𝑛
for for some constant 𝛼 = 𝛼(𝑐) ∈ (0, 1). The sumset 𝐴 + 𝐴 consists of points in the boolean
√
cube whose Hamming weight is at most 𝑛 − 𝑐 𝑛 and thus does not contain any subspace of
√
codimension < 𝑐 𝑛, by Lemma 6.5.4.
It turns out that the iterated sumset 2𝐴 − 2𝐴 (same as 4𝐴 in F2𝑛 ) always contains a bounded
codimensional subspace. The intuition is that taking sumsets “smooths” out the structure of
a set, analogous to how convolutions in real analysis make functions more smooth.
𝑓∗𝑓
𝑓∗𝑓∗𝑓
𝑓∗𝑓∗𝑓∗𝑓
Recall some basic properties of the Fourier transform. Given 𝐴 ⊆ F𝑛𝑝 with | 𝐴| = 𝛼𝑝 𝑛 , we
have
1c𝐴 (0) = 𝛼,
and by Parseval’s identity
∑︁
| 1c𝐴 (𝑟)| 2 = E 𝑥 ∈F𝑛𝑝 |1 𝐴 (𝑥)| 2 = 𝛼.
𝑟 ∈F𝑛𝑝
Proof. Let
𝑓 = 1 𝐴 ∗ 1 𝐴 ∗ 1− 𝐴 ∗ 1− 𝐴,
which is supported on 2𝐴 − 2𝐴. By the convolution identity (Theorem 6.1.7), noting that
1d c
− 𝐴 (𝑟) = 1 𝐴 (𝑟), we have, for every 𝑟 ∈ F 𝑝 ,
𝑛
b
𝑓 (𝑟) = 1c𝐴 (𝑟) 2 1d 2 c 4
− 𝐴 (𝑟) = | 1 𝐴 (𝑟)| .
It suffices to find a subspace where 𝑓 is positive since 𝑓 (𝑥) > 0 implies 𝑥 ∈ 2𝐴 − 2𝐴. We
will take the subspace defined by large Fourier coefficients. Let
n o
𝑅 = 𝑟 ∈ F𝑛𝑝 \{0} : | 1c𝐴 (𝑟)| > 𝛼3/2 .
We can bound the size of 𝑅 using Parseval’s identity:
∑︁ ∑︁
|𝑅| 𝛼3 ≤ | 1c𝐴 (𝑟)| 2 < | 1c𝐴 (𝑟)| 2 = E 𝑥 |1 𝐴 (𝑥)| 2 = 𝛼.
𝑟 ∈𝑅 𝑟 ∈F𝑛𝑝
So
|𝑅| < 1/𝛼2 .
If 𝑟 ∉ 𝑅 ∪ {0}, then | 1c𝐴 (𝑟)| ≤ 𝛼3/2 . So, applying Parseval’s identity again,
∑︁ ∑︁
| 1c𝐴 (𝑟)| 4 ≤ max | 1c𝐴 (𝑟)| 2 | 1c𝐴 (𝑟)| 2
𝑟∉𝑅∪{0}
∑︁
𝑟∉𝑅∪{0} 𝑟∉𝑅∪{0}
<𝛼 3
| 1c𝐴 (𝑟)| = 𝛼3 E 𝑥 |1 𝐴 (𝑥)| 2 = 𝛼4 .
2
𝑟 ∈F𝑛𝑝
Bogolyubov’s lemma holds over Z/𝑁Z after replacing subspaces by Bohr sets. Note that
the dimension of a Bohr set of Z/𝑁Z corresponds to the codimension of a subspace in F𝑛𝑝 .
With the right setup, the proof is essentially identical to that of Theorem 7.8.3.
Given 𝑓 : Z/𝑁Z → C, we define its Fourier transform to be the function b 𝑓 : Z/𝑁Z → C
given by
b
𝑓 (𝑟) = E 𝑥 ∈Z/𝑁 Z 𝑓 (𝑥)𝜔 −𝑟 𝑥
where 𝜔 = exp(2𝜋𝑖/𝑁). Fourier inversion, Parseval’s identity, and the convolution identity
all work the same way.
Proof. Let
𝑓 = 1 𝐴 ∗ 1 𝐴 ∗ 1− 𝐴 ∗ 1− 𝐴,
which is supported on 2𝐴 − 2𝐴. By the convolution identity, for every 𝑟 ∈ Z/𝑁Z,
b
𝑓 (𝑟) = 1c𝐴 (𝑟)b
2
12− 𝐴 (𝑟) = | 1c𝐴 (𝑟)| 4 .
By Fourier inversion, we have (noting that 𝑓 is real-valued)
∑︁ ∑︁
𝑓 (𝑥) = b
𝑓 (𝑟)𝜔𝑟 𝑥 = | 1c𝐴 (𝑟)| 4 𝜔𝑟 𝑥 .
𝑟 ∈Z/𝑁 Z 𝑟 ∈Z/𝑁 Z
Let n o
𝑅 = 𝑟 ∈ Z/𝑁Z\{0} : | 1c𝐴 (𝑟)| > 𝛼3/2 .
So
|𝑅| < 1/𝛼2 .
We have
∑︁ ∑︁
| 1c𝐴 (𝑟)| 4 ≤ 𝛼3 | 1c𝐴 (𝑟)| 2 < 𝛼4 .
𝑟∉𝑅∪{0} 𝑟∉𝑅∪{0}
For all 𝑥 ∈ Bohr(𝑅, 1/4), every 𝑟 ∈ 𝑅 satisfies ∥𝑟𝑥/𝑁 ∥ R/Z ≤ 1/4, and so cos(2𝜋𝑟𝑥/𝑁) ≥ 0.
Thus every 𝑥 ∈ Bohr(𝑅, 1/4) satisfies
∑︁
𝑓 (𝑥) = | 1c𝐴 (𝑟)| 4 𝜔𝑟 · 𝑥
∑︁ ∑︁
𝑟 ∈Z/𝑁 Z
affine subspace (we do not always have a large subspace since the origin is not necessarily
even in 3𝐴).
A related phenomenon arises in Goldbach conjecture. Let 𝑃 denote the set of primes. The
still open Goldbach conjecture states that 𝑃 + 𝑃 contains all sufficiently large even integers.
On the other hand, Vinogradov (1937) showed that 𝑃 + 𝑃 + 𝑃 contains all sufficiently large
odd integers (also known as the weak or ternary Goldbach problem).
Our next goal is to find a large GAP in the Bohr set produced by Bogolyubov’s lemma. To
do this, we need some results from the geometry of numbers.
Exercise 7.8.7 (Bogolyubov with 3-fold sums). Let 𝐴 ⊆ F𝑛𝑝 with | 𝐴| = 𝛼𝑝 𝑛 . Prove that
𝐴 + 𝐴 + 𝐴 contains a translate of a subspace of codimension 𝑂 (𝛼 −3 ).
Given a lattice, there are many choices of a basis for the lattice. The determinant of a
lattice does not depend on the choice of a basis, and equals the volume of every fundamental
parallelepiped. Translations of the fundamental parallelepiped by lattice vectors tiles (i.e.,
partitions) the space.
An example of a lattice is illustrated below. Two different fundamental parallelepipeds are
shaded.
254 Structure of Set Addition
𝑣2
0 𝑣1
b2
𝜆2 𝐾
𝜆1 𝐾
b1 𝐾
0
In the next section, we will apply the following fundamental result from the geometry of
numbers (Minkowski 1896).
Proof. We have vol( 21 𝐾) = 2−𝑑 vol(𝐾) > det(Λ). By Blichfeldt’s theorem there exist distinct
𝑥, 𝑦 ∈ 21 𝐾 such that 𝑥 − 𝑦 ∈ Λ. The point 𝑥 − 𝑦 is the midpoint of 2𝑥 and −2𝑦, both of which lie
in 𝐾 (using that 𝐾 is centrally symmetric) and hence 𝑥 − 𝑦 lies in 𝐾 (since 𝐾 is convex). □
Note that Minkowski’s first theorem is tight for 𝐾 = [−1, 1] 𝑑 and Z𝑑 .
Proof of Minkowski’s second theorem (Theorem 7.9.4). The idea is to grow 𝐾 until we hit
a point of Λ, and then continue growing, but only in the complementary direction. However
rigorously carrying out this procedure is very tricky (and easy to get wrong).
In the argument below, 𝐾 is open (i.e., does not include the boundary). Fix a directional
basis b1 , . . . , b𝑑 . For each 1 ≤ 𝑗 ≤ 𝑑, define map 𝜙 𝑗 : 𝐾 → 𝐾 by sending each point 𝑥 ∈ 𝐾
to the center of mass of the ( 𝑗 − 1)-dimensional slice of 𝐾 which contains 𝑥 and is parallel
to spanR {b1 , . . . , b 𝑗 −1 }. In particular, 𝜙1 (𝑥) = 𝑥 for all 𝑥 ∈ 𝐾.
256 Structure of Set Addition
Define a function ψ : K → R^d by
  ψ(x) = ∑_{j=1}^{d} ((λ_j − λ_{j−1})/2) φ_j(x),
with the convention λ_0 = 0. Writing x = ∑_i x_i b_i, the slice through x parallel to span_R{b1, . . . , b_{j−1}} consists of the points whose b_i-coordinates agree with those of x for all i ≥ j, so
  φ_j(x) = ∑_{i ≥ j} x_i b_i + ∑_{i < j} c_{j,i}(x_j, . . . , x_d) b_i
for some continuous functions c_{j,i}. By examining the coefficient of each b_i, we find
  ψ(x) = ∑_{i=1}^{d} ( λ_i x_i/2 + ψ_i(x_{i+1}, . . . , x_d) ) b_i
for some continuous functions ψ_i(x_{i+1}, . . . , x_d), so its Jacobian matrix ∂ψ(x)/∂x with respect to the basis (b1, . . . , b_d) is upper triangular with diagonal (λ1/2, . . . , λ_d/2). Therefore
  vol ψ(K) = (λ1 · · · λ_d / 2^d) vol K.    (7.3)
For any distinct points x = ∑_i x_i b_i, y = ∑_i y_i b_i in K, let k be the largest index such that x_k ≠ y_k. Then φ_i(x) agrees with φ_i(y) for all i > k. So
  ψ(x) − ψ(y) = ∑_{j=1}^{d} (λ_j − λ_{j−1}) (φ_j(x) − φ_j(y))/2
             = ∑_{j=1}^{k} (λ_j − λ_{j−1}) (φ_j(x) − φ_j(y))/2 ∈ ∑_{j=1}^{k} (λ_j − λ_{j−1}) K = λ_k K.
The ∈ step is due to K being centrally symmetric and convex. The coefficient of b_k in ψ(x) − ψ(y) is λ_k(x_k − y_k)/2 ≠ 0, so ψ(x) − ψ(y) ∉ span_R{b1, b2, . . . , b_{k−1}}. But we just saw that ψ(x) − ψ(y) ∈ λ_k K. Recall that K is open, and λ_k K ∩ Λ is contained in span_R{b1, b2, . . . , b_{k−1}}. Thus ψ(x) − ψ(y) ∉ Λ.
So ψ(K) contains no two points separated by a nonzero lattice vector. By Blichfeldt's theorem (Theorem 7.9.6), we deduce vol ψ(K) ≤ det Λ. Combined with (7.3), this gives
  λ1 · · · λ_d vol K ≤ 2^d vol ψ(K) ≤ 2^d det Λ. □
Proof. By Plünnecke's theorem, we have |8A − 8A| ≤ K^{16}|A|. Let N be a prime with K^{16}|A| ≤ N ≤ 2K^{16}|A| (it exists by Bertrand's postulate). By the Ruzsa modeling lemma, some A′ ⊆ A with |A′| ≥ |A|/8 is Freiman 8-isomorphic to a subset B of Z/NZ.
Applying Bogolyubov's lemma on B ⊆ Z/NZ, with
  α = |B|/N = |A′|/N ≥ |A|/(8N) ≥ 1/(16K^{16}),
we deduce that 2B − 2B contains a Bohr set of dimension < 256K^{32} and width 1/4. By Theorem 7.10.1, 2B − 2B contains a proper GAP with dimension d < 256K^{32} and volume ≥ (4d)^{−d} N.
Since 𝐵 is Freiman 8-isomorphic to 𝐴′ , 2𝐵 − 2𝐵 is Freiman 2-isomorphic to 2𝐴′ − 2𝐴′
(why?). Note GAPs are preserved by Freiman 2-isomorphisms (why?). Hence, the proper
GAP in 2𝐵 − 2𝐵 is mapped to a proper GAP 𝑄 ⊆ 2𝐴′ − 2𝐴′ with the same dimension (≤ 𝑑)
and volume (≥ (4d)^{−d} N). We have
  |A| ≤ 8|A′| ≤ 8N ≤ 8(4d)^d |Q|.
Since Q ⊆ 2A′ − 2A′ ⊆ 2A − 2A, we have Q + A ⊆ 3A − 2A. By Plünnecke's inequality,
  |Q + A| ≤ |3A − 2A| ≤ K^5 |A| ≤ 8K^5 (4d)^d |Q|.
By the Ruzsa covering lemma, there exists a subset X of A with |X| ≤ 8K^5(4d)^d such that A ⊆ X + Q − Q. It remains to contain X + Q − Q in a GAP.
By using two elements in each direction, X is contained in a GAP of dimension |X| − 1 and volume ≤ 2^{|X|−1}. Since Q is a proper GAP with dimension d < 256K^{32} and volume ≤ |2A − 2A| ≤ K^4|A|, Q − Q is a GAP with dimension d and volume ≤ 2^d K^4|A|. It follows that A ⊆ X + Q − Q is contained in a GAP with
  dimension ≤ |X| − 1 + d ≤ 8(4d)^d K^5 + d − 1 = e^{K^{O(1)}}
and volume ≤ 2^{|X|−1} · 2^d K^4 |A|. □
The best current result says that in Conjecture 7.12.1 one can cover 𝐴 by exp((log 𝐾) 𝑂 (1) )
cosets of 𝑉 (Sanders 2012). This is called a quasipolynomial bound.
This conjecture has several equivalent forms. Here we give some highlights. For more
details, including proofs of equivalence, see the online note accompanying Green (2005c)
titled Notes on the Polynomial Freiman–Ruzsa Conjecture.
For example, here is a formulation where we just need to use one subspace to cover a large
fraction of 𝐴.
Conjecture 7.12.2 (Polynomial Freiman–Ruzsa in F_2^n)
If A ⊆ F_2^n and |A + A| ≤ K|A|, then there exists an affine subspace V ⊆ F_2^n with |V| ≤ |A| such that |V ∩ A| ≥ K^{−O(1)}|A|.
Proof of equivalence of Conjecture 7.12.1 and Conjecture 7.12.2. Conjecture 7.12.1 im-
plies Conjecture 7.12.2 since by the pigeonhole principle, at least one of the cosets of 𝑉
covers ≥ 𝐾 −𝑂 (1) fraction of 𝐴.
Now assume Conjecture 7.12.2. Let 𝐴 ⊆ F2𝑛 with | 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|. Let 𝑉 be as in
Conjecture 7.12.2. By the Ruzsa covering lemma (Theorem 7.4.1) with 𝑋 = 𝐴 and 𝐵 = 𝑉 ∩ 𝐴
we find 𝑇 ⊆ 𝑋 with |𝑇 | ≤ |𝑋 + 𝐵| /|𝑋 | ≤ | 𝐴 + 𝐴| /| 𝐴| ≤ 𝐾 such that 𝐴 ⊆ 𝑇 + 𝐵 − 𝐵 ⊆ 𝑇 +𝑉.
The conclusion of Conjecture 7.12.1 holds. □
Here is another attractive equivalent formulation of the polynomial Freiman–Ruzsa con-
jecture in F2𝑛 .
The 𝑈 3 norm plays a central role in Gowers’ proof of Szemerédi’s theorem for 4-APs (the
𝑈 3 norm is also discussed in Exercise 6.2.14).
If f : F_2^n → {−1, 1} is given by f(x) = (−1)^{q(x)}, where q is a quadratic polynomial in n variables over F_2 (e.g., x1 + x1x2 + · · · ), then it is not hard to check that the expression inside the expectation above is identically 1 (it comes from taking three finite differences of q). So ‖f‖_{U³} = 1. For proving Szemerédi's theorem for 4-APs, one would like a "1% inverse result" showing that any f : F_2^n → [−1, 1] satisfying ‖f‖_{U³} ≥ δ must correlate with some quadratic polynomial phase function (−1)^{q(x)}. Such a result is known, but it remains open whether the bounds can be made polynomial; this is one of the equivalent formulations of the polynomial Freiman–Ruzsa conjecture.
Remark 7.12.5 (Quantitative equivalence). It is known that the bounds in each of the
above conjectures are equivalent to each other up to a polynomial change. This means that
if one statement is true with conclusion ≤ 𝑓 (𝐾) then all the other statements are true with
conclusion ≤ 𝐶 𝑓 (𝐾) 𝐶 (appropriately interpreted) with some absolute constant 𝐶.
More generally, one can formulate the polynomial Freiman–Ruzsa conjecture in an arbi-
trary abelian group.
For both Conjecture 7.12.7 and Conjecture 7.12.9, the best current result uses exp((log 𝐾) 𝑂 (1) )
translates and dimension bound (log 𝐾) 𝑂 (1) (Sanders 2012, 2013).
Remark 7.13.2. The additive energy of 𝐴 counts 4-cycles in the bipartite Cayley graph with
generating set 𝐴. It is called an “energy” since we can write it as an 𝐿 2 quantity
  E(A) = ∑_x r_A(x)²,
where
  r_A(x) := |{(a, b) ∈ A × A : a + b = x}|
is the number of ways to write 𝑥 as the sum of two elements of 𝐴.
We have the easy bounds
  2|A|² − |A| ≤ E(A) ≤ |A|³.
The lower bound is due to trivial solutions 𝑎 + 𝑏 = 𝑎 + 𝑏 and 𝑎 + 𝑏 = 𝑏 + 𝑎. The lower bound
is tight for sets without non-trivial solutions to 𝑎 + 𝑏 = 𝑐 + 𝑑. The upper bound is due to 𝑑
being determined by 𝑎, 𝑏, 𝑐 when 𝑎 + 𝑏 = 𝑐 + 𝑑. It is tight when 𝐴 is a subgroup.
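As a small illustration (not in the original text), the following Python sketch computes E(A) from the counts r_A(x) and checks the easy bounds above on two example sets; the specific sets (powers of 2, which form a Sidon set, and an arithmetic progression) are our own choices.

```python
from collections import Counter

def additive_energy(A):
    r = Counter(a + b for a in A for b in A)   # r_A(x) = #{(a,b) in A x A : a + b = x}
    return sum(c * c for c in r.values())      # E(A) = sum_x r_A(x)^2

sidon = [2 ** i for i in range(8)]             # powers of 2: a Sidon set, so the lower bound is tight
ap = list(range(1, 9))                         # arithmetic progression: much larger energy
for A in (sidon, ap):
    n, E = len(A), additive_energy(A)
    assert 2 * n * n - n <= E <= n ** 3
    print(A, E)
```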
Here is the main question we explore in this section.
Question 7.13.3
What is the relationship between small doubling and large additive energy? (Both encode
some notion of “lots of additive structure.”)
We will prove a version of the theorem allowing two different sets. Given two finite sets
𝐴 and 𝐵 in an abelian group, define their additive energy to be
𝑬 (𝑨, 𝑩) := |{(𝑎, 𝑏, 𝑎 ′ , 𝑏 ′ ) ∈ 𝐴 × 𝐵 × 𝐴 × 𝐵 : 𝑎 + 𝑏 = 𝑎 ′ + 𝑏 ′ }| .
Then 𝐸 ( 𝐴, 𝐴) = 𝐸 ( 𝐴).
Proof that Theorem 7.13.7 implies Theorem 7.13.6. Suppose 𝐸 ( 𝐴) ≥ | 𝐴| 3 /𝐾. Apply The-
orem 7.13.7 with 𝐵 = 𝐴 to obtain 𝐴′ , 𝐵′ ⊆ 𝐴 with | 𝐴′ | , |𝐵′ | ≥ 𝐾 −𝑂 (1) | 𝐴| and | 𝐴′ + 𝐵′ | ≤
𝐾 𝑂 (1) | 𝐴|. Then by Corollary 7.3.6, a variant of the Ruzsa triangle inequality, we have
  |A′ + A′| ≤ |A′ + B′|²/|B′| ≤ K^{O(1)} |A|. □
We will prove Theorem 7.13.7 by setting up a graph.
Proof that Theorem 7.13.9 implies Theorem 7.13.7. Denote the number of ways to write 𝑥
as 𝑎 + 𝑏 by
𝒓 𝑨,𝑩 (𝒙) := |{(𝑎, 𝑏) ∈ 𝐴 × 𝐵 : 𝑎 + 𝑏 = 𝑥}| .
Consider the “popular sums”
  S = {x ∈ A + B : r_{A,B}(x) ≥ n/(2K)}.
Build a bipartite graph 𝐺 with bipartition 𝐴 ∪ 𝐵 such that (𝑎, 𝑏) ∈ 𝐴 × 𝐵 is an edge if and
only if 𝑎 + 𝑏 ∈ 𝑆.
We claim that 𝐺 has many edges, by showing that “unpopular sums” account for at most
half of 𝐸 ( 𝐴, 𝐵). Note that
  n³/K ≤ E(A, B) = ∑_{x∈S} r_{A,B}(x)² + ∑_{x∉S} r_{A,B}(x)².   (7.4)
Because r_{A,B}(x) < n/(2K) when x ∉ S, we can bound the second term as
  ∑_{x∉S} r_{A,B}(x)² ≤ (n/(2K)) ∑_{x∉S} r_{A,B}(x) ≤ (n/(2K)) |A||B| ≤ n³/(2K),
and substituting back into (7.4) yields
  n³/K ≤ ∑_{x∈S} r_{A,B}(x)² + n³/(2K),
and so
  ∑_{x∈S} r_{A,B}(x)² ≥ n³/(2K).
The proof uses the dependent random choice technique from Section 1.7. Instead of
quoting theorems from that section, let us prove the result from scratch.
[Figure: dependent random choice in the bipartite graph between 𝐴 and 𝐵: a random vertex 𝑣 ∈ 𝐵, its neighborhood 𝑈 = 𝑁(𝑣) ⊆ 𝐴, and a pair 𝑥, 𝑦 ∈ 𝑈.]
Proof. Say that a pair (x, y) ∈ A² is “unfriendly” if it has < εδ²|B|/2 common neighbors. Choose v ∈ B uniformly at random and let U = N(v) ⊆ A be its neighborhood. We have
  E|U| = E|N(v)| = e(G)/|B| ≥ δ|A|.
For each fixed pair (x, y) ∈ A², we have
  P(x, y ∈ U) = P(x, y ∈ N(v)) = codeg(x, y)/|B|.
So if (x, y) is unfriendly, then P(x, y ∈ U) < εδ²/2. Let X be the number of unfriendly pairs (x, y) ∈ U². Then
  E X = ∑_{(x,y)∈A² unfriendly} P(x, y ∈ U) < (εδ²/2) |A|².
Hence, we have
  E[|U|² − X/ε] ≥ (E|U|)² − (E X)/ε > (δ²/2) |A|².
So for some v ∈ B, the set U = N(v) satisfies
  |U|² − X/ε ≥ (δ²/2) |A|².
Then this U ⊆ A satisfies |U|² ≥ δ²|A|²/2, and so |U| ≥ δ|A|/2. Moreover, we have X ≤ ε|U|², so at most an ε-fraction of the pairs (x, y) ∈ U² are unfriendly. □
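To make the averaging step concrete, here is a small Python simulation (not from the book) that carries out dependent random choice on a random bipartite graph: it scans over v ∈ B, takes U = N(v), and checks that the best choice satisfies the two conclusions of the proof. The parameters (60 + 60 vertices, edge probability 0.3, ε = 0.1) are arbitrary.

```python
import itertools
import random

random.seed(0)
nA, nB, p, eps = 60, 60, 0.3, 0.1
A, B = range(nA), range(nB)
adj = {a: {b for b in B if random.random() < p} for a in A}     # random bipartite graph

delta = sum(len(adj[a]) for a in A) / (nA * nB)                 # edge density e(G)/(|A||B|)
thresh = eps * delta ** 2 * nB / 2                              # "unfriendly" pairs have < thresh common nbrs

def codeg(x, y):
    return len(adj[x] & adj[y])

best = None
for v in B:
    U = [a for a in A if v in adj[a]]
    X = sum(1 for x, y in itertools.product(U, U) if codeg(x, y) < thresh)
    score = len(U) ** 2 - X / eps
    if best is None or score > best[0]:
        best = (score, U, X)

_, U, X = best
assert len(U) >= delta * nA / 2        # |U| >= delta*|A|/2
assert X <= eps * len(U) ** 2          # at most an eps-fraction of pairs in U^2 are unfriendly
print(len(U), X)
```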
[Figure: the bipartite graph between 𝐴 and 𝐵 in the proof, with the subsets 𝐴′ ⊆ 𝐴 and 𝐵′ ⊆ 𝐵 and a path of length 3 from 𝑎 ∈ 𝐴′ to 𝑏 ∈ 𝐵′ through intermediate vertices.]
Proof of Theorem 7.13.9 (Graph BSG). Since 𝑒(𝐺) ≥ 𝑛2 /𝐾, we have | 𝐴| , |𝐵| ≥ 𝑛/𝐾. By
the path of length 3 lemma (Lemma 7.13.11), we can find 𝐴′ ⊆ 𝐴 and 𝐵′ ⊆ 𝐵 each with
size ≥ 𝐾 −𝑂 (1) 𝑛 such that for every (𝑎, 𝑏) ∈ 𝐴′ × 𝐵′ , there are ≥ 𝐾 −𝑂 (1) 𝑛2 paths 𝑎𝑏 1 𝑎 1 𝑏 in
𝐺 with 𝑎 1 ∈ 𝐴 and 𝑏 1 ∈ 𝐵. Then, with
𝑥 = 𝑎 + 𝑏1, 𝑦 = 𝑎1 + 𝑏1, 𝑧 = 𝑎 1 + 𝑏,
we have
𝑎 + 𝑏 = 𝑥 − 𝑦 + 𝑧.
This shows that every element of A′ + B′ can be written as x − y + z with x, y, z ∈ A +_G B in at least K^{−O(1)} n² ways (for a given (a, b) ∈ A′ × B′, these choices of x, y, z are genuinely distinct; why?). Thus
  K^{−O(1)} n² |A′ + B′| ≤ |A +_G B|³ ≤ K³ n³.
Therefore | 𝐴′ + 𝐵′ | ≤ 𝐾 𝑂 (1) 𝑛. □
Further Reading
See Ruzsa’s lecture notes Sumsets and Structure (2009) for a comprehensive introduction to
many topics related to set addition, including but not limited to Freiman’s theorem.
Sanders’ article The Structure of Set Addition Revisited (2013) provides a modern expo-
sition of Freiman’s theorem and his proof of the quasipolynomial Freiman–Ruzsa theorem.
Lovett’s article An Exposition of Sanders’ Quasi-Polynomial Freiman–Ruzsa Theorem (2015)
gives a gentle exposition of Sanders’ proof in F2𝑛 .
The methods discussed in this chapter play a central role in Gowers' proof of Szemerédi's theorem. The proof for 4-APs is especially worth studying; it contains many beautiful ideas and shows how the topics in this chapter and the previous chapter are closely linked.
See the original paper by Gowers (1998a) on Szemerédi’s theorem for 4-APs as well as
excellent lecture notes by Gowers (1998b), Green (2009b), and Soundararajan (2007).
Chapter Summary
• Freiman’s theorem. Every A ⊆ Z with |A + A| ≤ K|A| is contained in a generalized arithmetic progression (GAP) of dimension ≤ d(K) and volume ≤ f(K)|A|.
– Informally: a set with small doubling is contained in a small GAP.
– Up to constants, this gives a complete characterization of integer sets with bounded
doubling.
• Ruzsa triangle inequality. |A||B − C| ≤ |A − B||A − C|.
• Plünnecke’s inequality. |A + A| ≤ K|A| implies |mA − nA| ≤ K^{m+n}|A|.
• Ruzsa covering lemma. Idea: take a maximal collection of disjoint translates; their expansions must cover the entire space.
• Freiman’s theorem in groups with bounded exponent. A set with bounded doubling is
contained in a small subgroup.
• Freiman 𝑠-homomorphisms are maps preserving 𝑠-fold sums.
• Ruzsa modeling lemma. A set of integers with small doubling can be partially modeled
as a large fraction of a cyclic group via a Freiman isomorphism.
• Bogolyubov’s lemma. If 𝐴 is large, then 2𝐴 − 2𝐴 contains a large subspace (finite field
model) or GAP (cyclic group).
• A large Bohr set contains a large GAP. Proof uses Minkowski’s second theorem from
the geometry of numbers.
• Polynomial Freiman–Ruzsa conjecture: a central conjecture in additive combinatorics.
The finite field model version has several equivalent and attractive statements, one of
which says: if 𝐴 ⊆ F2𝑛 , and | 𝐴 + 𝐴| ≤ 𝐾 | 𝐴|, then 𝐴 can be covered using 𝐾 𝑂 (1) translates
of some subspace with cardinality ≤ | 𝐴|.
• The additive energy 𝐸 ( 𝐴) of a set 𝐴 is the number of solutions to 𝑎 + 𝑏 = 𝑐 + 𝑑 in 𝐴.
• Balog–Szemerédi–Gowers theorem. If 𝐸 ( 𝐴) ≥ | 𝐴| 3 /𝐾, then 𝐴 has a subset 𝐴′ with
| 𝐴′ | ≥ 𝐾 −𝑂 (1) | 𝐴| and | 𝐴′ + 𝐴′ | ≤ 𝐾 𝑂 (1) | 𝐴′ |.
– Informally: a set with large additive energy contains a large subset with small doubling.
8
Sum-Product Problem
Chapter Highlights
• The sum-product problem: show either 𝐴 + 𝐴 or 𝐴 · 𝐴 must be large
• Erdős multiplication table problem
• Crossing number inequality: lower bound on the number of crossings in a graph drawing
• Szemerédi–Trotter theorem on point-line incidences
• Elekes’ sum-product bound using incidence geometry
• Solymosi’s sum-product bound via multiplicative energy
Arithmetic progressions have small additive doubling, while geometric progressions have
small multiplicative doubling. However, perhaps a set cannot simultaneously look both like
an arithmetic and a geometric progression.
Erdős & Szemerédi (1983) conjectured that at least one of 𝐴 + 𝐴 and 𝐴𝐴 is close to
quadratic size.
This is asking for the number of distinct entries that appear in the 𝑁 × 𝑁 multiplication
table.
   1   2   3   4   5   6   7   8   9  10  · · ·
   2   4   6   8  10  12  14  16  18  20  · · ·
   3   6   9  12  15  18  21  24  27  30  · · ·
   4   8  12  16  20  24  28  32  36  40  · · ·
   5  10  15  20  25  30  35  40  45  50  · · ·
   6  12  18  24  30  36  42  48  54  60  · · ·
   7  14  21  28  35  42  49  56  63  70  · · ·
   8  16  24  32  40  48  56  64  72  80  · · ·
   9  18  27  36  45  54  63  72  81  90  · · ·
  10  20  30  40  50  60  70  80  90 100  · · ·
   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋱
After much work, we now have a satisfactory answer. A precise estimate was given by
Ford (2008):
  |[N] · [N]| = Θ( N² / ((log N)^δ (log log N)^{3/2}) ),
where δ = 1 − (1 + log log 2)/log 2 ≈ 0.086. Here we give a short proof of some weaker estimates (Erdős 1955):
  (1 − o(1)) N²/(2 log N) ≤ |[N] · [N]| = o(N²).
This already shows that it is false that at least one of A + A and AA always has size ≥ c|A|². So we cannot remove the −o(1) term from the exponent in the sum-product conjecture.
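As a quick numerical illustration (not in the original text), the following Python sketch counts the distinct entries of the N × N multiplication table and compares them with N² and with the order of magnitude in Ford's theorem; the comparison with Ford's bound is only up to an unspecified constant factor.

```python
import math

delta = 1 - (1 + math.log(math.log(2))) / math.log(2)    # ~ 0.086
for N in (100, 500, 2000):
    products = {i * j for i in range(1, N + 1) for j in range(1, N + 1)}
    ford = N ** 2 / ((math.log(N)) ** delta * (math.log(math.log(N))) ** 1.5)
    print(N, len(products), round(len(products) / N ** 2, 3), round(len(products) / ford, 3))
```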
To prove Theorem 8.1.2, we apply the following fact from number theory due to Hardy &
Ramanujan (1917). A short probabilistic method proof was given by Turán (1934); also see
Alon & Spencer (2016, Section 4.2).
Proof of Theorem 8.1.2. First let us prove the upper bound. By the Hardy–Ramanujan theorem, for all but o(N²) of the pairs (i, j) ∈ [N]², the product ij has (2 + o(1)) log log N prime factors. On the other hand, by the Hardy–Ramanujan theorem again, all but o(N²) of the positive integers ≤ N² have (1 + o(1)) log log(N²) = (1 + o(1)) log log N prime factors, and so only o(N²) integers up to N² have enough prime factors to arise as one of these typical products. Hence |[N] · [N]| = o(N²). (Remark: this proof gives |[N] · [N]| = O(N²/log log N).)
Now let us prove the lower bound by giving a lower bound to the number of positive
integers ≤ 𝑁 2 of the form 𝑝𝑚, where 𝑝 is a prime in (𝑁 2/3 , 𝑁] and 𝑚 ≤ 𝑁. Every such 𝑛
has at most 2 such representations as 𝑝𝑚 since 𝑛 ≤ 𝑁 2 can have at most two prime factors
greater than 𝑁 2/3 . There are (1 + 𝑜(1))𝑁/log 𝑁 primes in (𝑁 2/3 , 𝑁] by the prime number
theorem. So the number of distinct such 𝑝𝑚 is ≥ (1/2 − 𝑜(1))𝑁 2 /log 𝑁. □
Remark 8.1.4. The lower bound (up to a constant factor) also follows from Solymosi’s
sum-product estimate that we will see later in Theorem 8.3.1.
Proof. For any connected planar graph 𝐺 = (𝑉, 𝐸) with at least one cycle, we have 3 |𝐹 | ≤
2 |𝐸 |, with |𝐹 | denoting the number of faces (including the outer face). The inequality follows
from double counting using that every face is adjacent to at least three edges and that every
edge is adjacent to at most two faces. By Euler’s formula, |𝑉 | − |𝐸 | + |𝐹 | = 2. Replacing |𝐹 |
using 3 |𝐹 | ≤ 2 |𝐸 |, we obtain |𝐸 | ≤ 3 |𝑉 | − 6. Therefore |𝐸 | ≤ 3 |𝑉 | holds for every planar
graph 𝐺 including ones that are not connected or do not have a cycle.
If an arbitrary graph G = (V, E) satisfies |E| > 3|V|, then any drawing of G can be made planar by deleting at most cr(G) edges, one for each crossing. It follows that |E| − cr(G) ≤ 3|V|. Therefore, the following inequality holds universally for all graphs G = (V, E):
  cr(G) ≥ |E| − 3|V|.   (8.1)
Now we apply a probabilistic method technique to “boost” the above inequality to denser
graphs. Let 𝐺 = (𝑉, 𝐸) be a graph with |𝐸 | ≥ 4 |𝑉 |. Let 𝑝 ∈ [0, 1] be some real number to
be determined and let 𝐺 ′ = (𝑉 ′ , 𝐸 ′ ) be a graph obtained by independently randomly keeping
each vertex of 𝐺 with probability 𝑝. By (8.1), we have cr(𝐺 ′ ) ≥ |𝐸 ′ | − 3 |𝑉 ′ | for every 𝐺 ′ .
Therefore the same inequality must hold if we take the expected values of both sides:
E cr(𝐺 ′ ) ≥ E |𝐸 ′ | − 3E |𝑉 ′ | .
We have E |𝐸 ′ | = 𝑝 2 |𝐸 | since an edge remains in 𝐺 ′ if and only if both of its endpoints
are kept. Similarly E |𝑉 ′ | = 𝑝 |𝑉 |. By keeping the same drawing, we get the inequality
𝑝 4 cr(𝐺) ≥ E cr(𝐺 ′ ). Therefore
  cr(G) ≥ p^{−2}|E| − 3p^{−3}|V|.
Finally set 𝑝 = 4 |𝑉 | /|𝐸 | ∈ [0, 1] (here we use |𝐸 | ≥ 4 |𝑉 |) to get cr(𝐺) ≳ |𝐸 | 3 /|𝑉 | 2 . □
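As a quick arithmetic check of the final substitution (not part of the original proof), plugging p = 4|V|/|E| into the previous inequality gives
  cr(G) ≥ |E|³/(16|V|²) − 3|E|³/(64|V|²) = |E|³/(64|V|²),
so under the hypothesis |E| ≥ 4|V| the implicit constant in cr(G) ≳ |E|³/|V|² may be taken to be 1/64.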
One trivial upper bound is |P||L|. We can get a better bound by using the fact that every two distinct points lie on at most one common line (equivalently, two distinct lines meet in at most one point).
Corollary 8.2.6
The number of point-line incidences between 𝑛 points and 𝑛 lines in R2 is 𝑂 (𝑛4/3 ).
We will see a short proof using the crossing number inequality due to Székely (1997).
Since the inequality is false over finite fields, any proof necessarily requires the topology of
the real plane (via the application of Euler’s theorem in the proof of the crossing number
inequality).
Example 8.2.7. The bounds in both Theorem 8.2.5 and Corollary 8.2.6 are best possible
up to a constant factor. Here is an example showing that Corollary 8.2.6 is tight. Let P = [k] × [2k²] and L = {y = mx + b : m ∈ [k], b ∈ [k²]}, so that |P| = 2k³ and |L| = k³ are both Θ(n). Then every line in L contains k points from P, so I(P, L) = k⁴ = Θ(n^{4/3}).
[Figure: the construction in Example 8.2.7 for k = 3; each of the three panels shows the lines y = mx + b with b ∈ [9] for one slope m, drawn through the grid of points starting at (1, 1).]
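The incidence count in Example 8.2.7 is easy to verify by brute force; here is a short Python sketch (not in the original text) doing so for the illustrative choice k = 6.

```python
k = 6
P = {(x, y) for x in range(1, k + 1) for y in range(1, 2 * k * k + 1)}      # |P| = 2k^3
L = [(m, b) for m in range(1, k + 1) for b in range(1, k * k + 1)]          # |L| = k^3

I = sum(1 for (m, b) in L for x in range(1, k + 1) if (x, m * x + b) in P)
assert I == k ** 4                       # every line y = m x + b contains exactly k points of P
n = len(P)
print(I, n, round(I / n ** (4 / 3), 3))  # I = Theta(n^{4/3})
```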
Proof of Theorem 8.2.5. We remove all lines in L containing at most one point of P. These lines contribute at most |L| incidences and thus do not affect the inequality we wish to prove.
Now assume that every line in L contains at least two points of P. Turn every point of P
into a vertex and each line in L into edges connecting consecutive points of P on the line.
This constructs a drawing of a graph 𝐺 = (𝑉, 𝐸) on the plane.
[Figure: turning the points P and lines L into a drawing of the graph 𝐺, with consecutive points on each line joined by edges.]
Assume that 𝐼 (P, L) ≥ 8 |P | holds (otherwise we are done as 𝐼 (P, L) ≲ |P |). Each line
in L with 𝑘 incidences has 𝑘 − 1 ≥ 𝑘/2 edges. So |𝐸 | ≥ 𝐼 (P, L)/2 ≥ 4 |𝑉 |. The crossing
number inequality (Theorem 8.2.3) gives
  cr(G) ≳ |E|³/|V|² ≳ I(P, L)³/|P|².
Moreover cr(G) ≤ |L|² since every pair of lines intersects in at most one point. Rearranging gives I(P, L) ≲ |P|^{2/3}|L|^{2/3}. (Remember the linear contributions |P| + |L| that need to be added back in due to the assumptions made earlier in the proof.) □
Now we are ready to prove the sum-product estimate in Theorem 8.2.1 for A ⊆ R:
  |A + A||AA| ≳ |A|^{5/2}.
Proof of Theorem 8.2.1. In R2 , consider a set of points
P = {(𝑥, 𝑦) : 𝑥 ∈ 𝐴 + 𝐴, 𝑦 ∈ 𝐴𝐴}
and a set of lines
L = {𝑦 = 𝑎(𝑥 − 𝑎 ′ ) : 𝑎, 𝑎 ′ ∈ 𝐴}.
For a line 𝑦 = 𝑎(𝑥 − 𝑎 ′ ) in L, (𝑎 ′ + 𝑏, 𝑎𝑏) ∈ P is on the line for all 𝑏 ∈ 𝐴, so each line in L
contains ≥ | 𝐴| incidences. By definition of P and L, we have
|P | = | 𝐴 + 𝐴| | 𝐴𝐴| and |L| = | 𝐴| 2 .
By the Szemerédi–Trotter theorem (Theorem 8.2.5),
  |A|³ = |A||L| ≤ I(P, L) ≲ |P|^{2/3}|L|^{2/3} + |P| + |L| ≲ |A + A|^{2/3}|AA|^{2/3}|A|^{4/3}.
The contributions from |P| + |L| are of lower order since |P| = |A + A||AA| ≤ |A|⁴ = |L|² and |L| = |A|² ≤ |A + A|²|AA|² = |P|². Rearranging the above inequality gives
  |A + A||AA| ≳ |A|^{5/2}. □
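As a numerical illustration (not from the original text), the following Python sketch compares |A + A| · |AA| with |A|^{5/2} for an arithmetic and a geometric progression of length 100; both comfortably exceed |A|^{5/2}, in line with the theorem.

```python
def sum_and_product_sets(A):
    return {a + b for a in A for b in A}, {a * b for a in A for b in A}

for A in (list(range(1, 101)),              # arithmetic progression: small A + A, large AA
          [2 ** i for i in range(100)]):    # geometric progression: large A + A, small AA
    S, P = sum_and_product_sets(A)
    print(len(A), len(S), len(P), len(S) * len(P), round(len(A) ** 2.5))
```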
In Section 1.4, we proved an 𝑂 (𝑛3/2 ) upper bound on the unit distance problem (Ques-
tion 1.4.6) using the extremal number of 𝐾2,3 . The next exercise gives an improved bound
(in fact the best known result to date).
Exercise 8.2.8 (Unit distance bound). Using the crossing number inequality, prove that among any n points in the plane, at most O(n^{4/3}) pairs of points are at distance exactly 1.
Write
𝑟 (𝑠) = |{(𝑎, 𝑏) ∈ 𝐴 × 𝐴 : 𝑠 = 𝑎/𝑏}| .
We have
  E_×(A) = ∑_{s ∈ A/A} r(s)².
[Figure: Solymosi's argument; the points of 𝐴 × 𝐴 are grouped by the lines ℓ1, . . . , ℓ_{𝑚+1} through the origin on which they lie, and the sumset 𝐿1 + 𝐿2 of the points on two consecutive lines lies strictly between those lines.]
The statement is false over non-prime fields, since we could take A to be a subfield. Informally, the above theorem says that a prime field does not have any approximate subrings.
Further Reading
Dvir’s survey Incidence Theorems and Their Applications (2012) discusses many interesting
related topics including incidence geometry and additive combinatorics together with their
applications to computer science.
Guth’s book The Polynomial Method in Combinatorics (2016) gives an in-depth discussion
of incidence geometry in R2 and R3 leading to a proof of the solution of the Erdős distinct
distances problem by Guth & Katz (2015).
Sheffer’s book Polynomial Methods and Incidence Theory (2022) provides an introduction
to incidence geometry and related topics.
Chapter Summary

9
Progressions in Sparse Pseudorandom Sets

Chapter Highlights
• The Green–Tao theorem: proof strategy
• A relative Szemerédi theorem and its proof: a central ingredient in the proof of the
Green–Tao theorem
• Transference principle: applying Szemerédi’s theorem as a black box to the sparse pseu-
dorandom setting
• A graph theoretic approach
• Dense model theorem: modeling a sparse set by a dense set
• Sparse triangle counting lemma
In this chapter we discuss a celebrated theorem by Green & Tao (2008) that settled a
folklore conjecture about primes.
The proof of this stunning result uses sophisticated ideas from both combinatorics and
number theory. As stated in the abstract of their paper:
[T]he main new ingredient of this paper . . . is a certain transference principle. This allows us to deduce from
Szemerédi’s theorem that any subset of a sufficiently pseudorandom set (or measure) of positive relative
density contains progressions of arbitrary length.
The main goal of this chapter is to explain what the above paragraph means. As Green
(2007b) writes (emphasis in original):
Our main advance, then, lies not in our understanding of the primes but rather in what we can say about
arithmetic progressions.
We will abstract away ingredients related to prime numbers (see Further Reading at the end
of the chapter) and instead focus on the central combinatorial result: a relative Szemerédi
theorem. We follow the graph theoretic approach by Conlon, Fox, & Zhao (2014, 2015),
which simplified both the hypotheses and the proof of the relative Szemerédi theorem.
In other words, every subset of primes with positive relative density contains arbitrarily
long arithmetic progressions.
Remark 9.1.4 (Residue biases in the primes and the 𝑊 -trick). There are certain local
biases that get in the way of pseudorandomness for primes. For example, all primes greater
than 2 are odd, all primes greater than 3 are not divisible by 3, and so on. In this way,
the primes look different from a subset of positive integers where each 𝑛 is included with
probability 1/log 𝑛 independently at random.
The 𝑾-trick corrects these residue class biases. Let w = w(N) be a function with w → ∞ slowly as N → ∞. Let W = ∏_{p ≤ w} p be the product of the primes up to w. The W-trick tells
us to only consider primes that are congruent to 1 mod 𝑊. The resulting set of “𝑊-tricked
primes” {𝑛 : 𝑛𝑊 + 1 is prime} does not have any bias modulo a small fixed prime. The
relative Szemerédi theorem should be applied to the 𝑊-tricked primes.
We shall not dwell on the analytic number theoretic arguments here. See Further Reading
at the end of the chapter for references. For example, Conlon, Fox, & Zhao (2014, Sections
8 and 9) gives an exposition of the construction of the “almost primes” and the proofs of its
properties.
The goal of the rest of the chapter is to state and prove the relative Szemerédi theorem.
We would like to formulate a result of the following form, where Z/𝑁Z is replaced by a
sparse pseudorandom host set 𝑆.
Relative Roth theorem (informal). If 𝑆 ⊆ Z/𝑁Z satisfies certain pseudorandomness
conditions, then every 3-AP-free subset of 𝑆 has size 𝑜(|𝑆|).
In what sense should 𝑆 behave pseudorandomly? It will be easiest to explain the pseudo-
random hypothesis using a graph.
Consider the following construction of a graph G_S that we saw earlier in the book (in particular in Sections 2.4 and 2.10).
[Figure: the tripartite graph G_S on vertex sets X, Y, Z, each a copy of Z/NZ, with x ∼ y iff 2x + y ∈ S, x ∼ z iff x − z ∈ S, and y ∼ z iff −y − 2z ∈ S.]
Here 𝐺 𝑆 is a tripartite graph with vertex sets 𝑋, 𝑌 , 𝑍, each being a copy of Z/𝑁Z. Its edges
are:
• (𝑥, 𝑦) ∈ 𝑋 × 𝑌 whenever 2𝑥 + 𝑦 ∈ 𝑆;
• (𝑥, 𝑧) ∈ 𝑋 × 𝑍 whenever 𝑥 − 𝑧 ∈ 𝑆;
• (𝑦, 𝑧) ∈ 𝑌 × 𝑍 whenever −𝑦 − 2𝑧 ∈ 𝑆.
This graph 𝐺 𝑆 is designed so that (𝑥, 𝑦, 𝑧) ∈ 𝑋 × 𝑌 × 𝑍 is a triangle if and only if
2𝑥 + 𝑦, 𝑥 − 𝑧, −𝑦 − 2𝑧 ∈ 𝑆.
Note that these three terms form a 3-AP with common difference −𝑥 − 𝑦 − 𝑧. So the triangles
in 𝐺 𝑆 precisely correspond to 3-APs in 𝑆 (it is an 𝑁-to-1 correspondence).
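Here is a small Python sketch (not in the original text) of this correspondence: it counts the triangles of G_S by brute force for a toy modulus and checks that their number is exactly N times the number of (ordered) 3-APs in S. The choices of N and S below are arbitrary.

```python
import itertools

N = 13                                  # a small odd modulus
S = {1, 3, 4, 7, 9, 11}                 # an arbitrary subset of Z/NZ

triangles = sum(1 for x, y, z in itertools.product(range(N), repeat=3)
                if (2 * x + y) % N in S and (x - z) % N in S and (-y - 2 * z) % N in S)

# ordered 3-APs in S: pairs (a, d) with a, a + d, a + 2d all in S (d = 0 allowed)
aps = sum(1 for a, d in itertools.product(range(N), repeat=2)
          if a in S and (a + d) % N in S and (a + 2 * d) % N in S)

assert triangles == N * aps             # the triangle-to-3-AP correspondence is N-to-1
print(triangles, aps)
```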
The following definition is a variation of homomorphism density from Section 4.3.
[Figure: a tripartite graph 𝐹 mapping into the tripartite graph 𝐺_𝑆 with parts 𝑋, 𝑌, 𝑍.]
Now we define the desired pseudorandomness hypothesis on S ⊆ Z/NZ, which says that the associated graph G_S has certain subgraph counts close to those of a random graph.
In other words, comparing the graph 𝐺 𝑆 to a random tripartite graph with the same edge
density 𝑝, these two graphs have approximately the same 𝐹-density whenever 𝐹 ⊆ 𝐾2,2,2 .
Alternatively, we can state the 3-linear forms condition explicitly without referring to
graphs. This is done by expanding the definition of 𝐺 𝑆 . Let 𝑥0 , 𝑥1 , 𝑦 0 , 𝑦 1 , 𝑧0 , 𝑧1 ∈ Z/𝑁Z be
chosen independently and uniformly at random. Then 𝑆 ⊆ Z/𝑁Z with |𝑆| = 𝑝𝑁 satisfies the
3-linear forms condition with tolerance 𝜀 if the probability that
  { −y_0 − 2z_0,  x_0 − z_0,  2x_0 + y_0,
    −y_1 − 2z_0,  x_1 − z_0,  2x_1 + y_0,
    −y_0 − 2z_1,  x_0 − z_1,  2x_0 + y_1,
    −y_1 − 2z_1,  x_1 − z_1,  2x_1 + y_1 } ⊆ S
lies in the interval (1 ± 𝜀) 𝑝 12 , and furthermore the same holds if we erase any subset of
the above 12 linear forms and also change the “12” in 𝑝 12 to the number of linear forms
remaining.
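For a genuinely random set S, the probability above is indeed about p^{12}. The following Monte Carlo sketch (not in the original text) estimates it for one random instance; only the full 12-form event is sampled here (the condition also requires every sub-collection of forms), and the agreement is only approximate because of sampling error.

```python
import random

random.seed(1)
N, p, trials = 10007, 0.5, 400000
S = {x for x in range(N) if random.random() < p}     # random set of density about p

hits = 0
for _ in range(trials):
    x0, x1, y0, y1, z0, z1 = (random.randrange(N) for _ in range(6))
    forms = [-y0 - 2 * z0, x0 - z0, 2 * x0 + y0,
             -y1 - 2 * z0, x1 - z0, 2 * x1 + y0,
             -y0 - 2 * z1, x0 - z1, 2 * x0 + y1,
             -y1 - 2 * z1, x1 - z1, 2 * x1 + y1]
    if all(f % N in S for f in forms):
        hits += 1

print(hits / trials, p ** 12)     # both are on the order of 2.4e-4
```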
Remark 9.2.4. This 𝐾2,2,2 condition is reminiscent of the 𝐶4 -count condition for the quasir-
andom graph in Theorem 3.1.1 by Chung, Graham, & Wilson (1989). Just as how 𝐶4 = 𝐾2,2
is a 2-blow-up of a single edge, 𝐾2,2,2 is a 2-blow-up of a triangle.
[Figure: a single edge, whose 2-blow-up is 𝐶4 = 𝐾2,2, and a triangle, whose 2-blow-up is 𝐾2,2,2.]
The 3-linear forms condition can be viewed as a “second moment” condition with respect to
triangles. It is needed in the proof of the sparse triangle counting lemma later.
We are now ready to state a precise formulation of the relative Roth theorem.
Remark 9.2.8 (History). The above formulations of relative Roth and Szemerédi theorems
are due to Conlon, Fox, & Zhao (2015). The original approach by Green & Tao (2008)
required in addition another technical hypothesis on 𝑆 known as the “correlation condition,”
which is no longer needed.
The threshold 𝐶𝑁 −1/(𝑘−1) is optimal up to the constant 𝐶. Indeed, the expected number
of 𝑘-APs in 𝑆 is 𝑂 ( 𝑝 𝑘 𝑁 2 ), which is less than half of E |𝑆| = 𝑝𝑁 if 𝑝 < 𝑐𝑁 −1/(𝑘−1) for
a sufficiently small constant 𝑐 > 0. One can delete from 𝑆 an element from each 𝑘-AP
contained in 𝑆. So with high probability, this process deletes at most half of 𝑆, and the
remaining subset of 𝑆 is 𝑘-AP-free.
The hypergraph container method gives another proof of the above result, plus much
more (Balogh, Morris, & Samotij 2015; Saxton & Thomason 2015). See the survey The
method of hypergraph containers by Balogh, Morris, & Samotij (2018) for more on this
topic.
Exercise 9.2.11 (Random sets and the linear forms condition). Let 𝑆 ⊆ Z/𝑁Z be a
random set where every element of Z/𝑁Z is included in 𝑆 independently with probability
𝑝.
Prove that there is some 𝑐 > 0 so that for every 𝜀 > 0 there is some 𝐶 > 0 so that as
long as 𝑝 > 𝐶𝑁 −𝑐 and 𝑁 is large enough, with probability at least 1 − 𝜀, 𝑆 satisfies the
3-linear forms condition with tolerance 𝜀 . What is the optimal 𝑐?
Hint: Use the second moment method; see Alon & Spencer (2016, Chapter 4).
Since the “dense model” B has size ≥ δN/2, by the counting version of Szemerédi's theorem, B has ≳_δ N² k-APs, and hence A has ≳_δ p^k N² k-APs by the sparse counting lemma. So in particular, A cannot be k-AP-free. This finishes the proof sketch of the relative Szemerédi theorem.
Now that we have seen the above outline, it remains to formulate and prove:
• a dense model theorem, and
• a sparse counting lemma.
We will focus on explaining the 3-AP case (i.e., relative Roth theorem) in the rest of this
chapter. The 3-AP setting is notationally simpler than that of 𝑘-AP. It is straightforward to
generalize the 3-AP proof to 𝑘-APs following the (𝑘 −1)-uniform hypergraph setup discussed
in the previous section.
This is essentially the graphon cut norm applied to the (not necessarily symmetric) function
Γ × Γ → R given by (𝑥, 𝑦) ↦→ 𝑓 (𝑥 + 𝑦).
As should be expected from the equivalence of DISC and EIG for quasirandom Cayley
graphs (Theorem 3.5.3), having small cut norm is equivalent to being Fourier uniform.
Exercise 9.4.1. Show that for all 𝑓 : Γ → R,
  c‖f̂‖_∞ ≤ ‖f‖_□ ≤ ‖f̂‖_∞,
where 𝑐 is some absolute constant (not depending on Γ or 𝑓 ).
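For a very small group one can verify the exercise numerically by brute force; the Python sketch below (not in the original text) computes ‖f‖_□ exactly by optimizing over all subsets A of Z/NZ (for each A, the optimal B keeps exactly those y for which g_A(y) = ∑_{x∈A} f(x + y) has a fixed sign) and compares it with max_r |f̂(r)|.

```python
import itertools
import numpy as np

N = 12
rng = np.random.default_rng(3)
f = rng.standard_normal(N)
f -= f.mean()                                               # a mean-zero test function on Z/NZ

def cut_norm(f):
    N = len(f)
    best = 0.0
    for A in itertools.product([0, 1], repeat=N):           # all subsets A of Z/NZ
        g = np.array([sum(f[(x + y) % N] for x in range(N) if A[x]) for y in range(N)])
        best = max(best, g[g > 0].sum(), -g[g < 0].sum())    # optimal B keeps one sign of g
    return best / N ** 2                                     # E_{x,y} normalization

fhat = np.fft.fft(f) / N                                     # f^(r) = E_x f(x) exp(-2 pi i r x / N)
print(cut_norm(f), np.max(np.abs(fhat)))                     # cut norm is at most ||f^||_inf
```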
Remark 9.4.2 (Generalizations to 𝑘 -APs). The above definition is tailored to 3-APs. For
4-APs, we should define the corresponding norm of 𝑓 as
  sup_{A,B,C ⊆ Γ×Γ} E_{x,y,z ∈ Γ} [f(x + y + z) 1_A(x, y) 1_B(x, z) 1_C(y, z)].
(The more obvious guess of using 1 𝐴 (𝑥)1 𝐵 (𝑦)1𝐶 (𝑧) instead of the above turns out to be insuf-
ficient for proving the relative Szemerédi theorem. A related issue in the context of hypergraph
regularity was discussed in Section 2.11.) The generalization to 𝑘-APs is straightforward.
However, for k ≥ 4, the above norm is no longer equivalent to Fourier uniformity. This is why we study the ‖f‖_□ norm instead of ‖f̂‖_∞ in this section.
Informally, the main result of this section says that if a sparse set 𝑆 is close to random
in normalized cut norm, then every subset 𝐴 ⊆ 𝑆 can be approximated by some dense
𝐵 ⊆ Z/𝑁Z in normalized cut norm.
Remark 9.4.4 (3-linear forms condition implies small cut norm). The cut norm hypothesis
is weaker than the 3-linear forms condition, as can be proved by two applications of the
Cauchy–Schwarz inequality (for example, see the proof of Lemma 9.5.2 in the next section).
In short, ‖ν − 1‖_□⁴ ≤ t(K_{2,2}, ν − 1).
Remark 9.4.5 (Set instead of function). We can replace the function 𝑔 by a random set
𝐵 ⊆ Γ where each 𝑥 ∈ Γ is included in 𝐵 with probability 𝑔(𝑥). By standard concentration
bounds, changing 𝑔 to 𝐵 induces a negligible effect on 𝜀 if Γ is large enough. It is important
here that 𝑔(𝑥) ∈ [0, 1] for all 𝑥 ∈ Γ.
So the above theorem says that, given a sparse pseudorandom host set S, any subset A ⊆ S can be modeled by a dense set B that is close to A with respect to the normalized cut norm.
It will be more natural to prove the above theorem a bit more generally where sets
𝐴 ⊆ 𝑆 ⊆ Γ are replaced by functional analogs. Since these are sparse sets, we should scale
indicator functions as follows:
𝑓 = 𝑝 −1 1 𝐴 and 𝜈 = 𝑝 −1 1𝑆 .
Then 𝑓 ≤ 𝜈 pointwise. Note that 𝑓 and 𝜈 take values in [0, 𝑝 −1 ], unlike 𝑔, which takes values
in [0, 1]. The normalization is such that E𝜈 = 1. Here is the main result of this section.
The rest of this section is devoted to proving the above theorem. First, we reformulate the
cut norm using convex geometry.
Let Φ denote the set of all functions Γ → R that can be written as a convex combination
of convolutions of the form 1 𝐴 ∗ 1 𝐵 or −1 𝐴 ∗ 1 𝐵 , where 𝐴, 𝐵 ⊆ Γ. In other words,
Φ = ConvexHull ({1 𝐴 ∗ 1 𝐵 : 𝐴, 𝐵 ⊆ Γ} ∪ {−1 𝐴 ∗ 1 𝐵 : 𝐴, 𝐵 ⊆ Γ}) .
Note that Φ is a centrally symmetric convex set of functions Γ → R.
we have
  ‖f‖_□ = sup_{A,B ⊆ Γ} |⟨f, 1_A ∗ 1_B⟩| = sup_{φ ∈ Φ} ⟨f, φ⟩.
Since Φ is a centrally symmetric convex body, ‖·‖_□ is indeed a norm. Its dual norm is thus given by, for any nonzero ψ : Γ → R,
  ‖ψ‖*_□ = sup_{f : Γ→R, ‖f‖_□ ≤ 1} ⟨f, ψ⟩ = inf{r > 0 : r^{−1}ψ ∈ Φ}.
In other words, Φ is the unit ball of the ‖·‖*_□ norm. The following inequality holds for all f, ψ : Γ → R:
  ⟨f, ψ⟩ ≤ ‖f‖_□ ‖ψ‖*_□.
Proof. The inequality is not affected if we multiply ψ and ψ′ each by a constant. So we can assume that ‖ψ‖*_□ = ‖ψ′‖*_□ = 1. Then ψ, ψ′ ∈ Φ. Hence ψψ′ ∈ Φ by Lemma 9.4.7. This implies that ‖ψψ′‖*_□ ≤ 1. □
We need two classical results from analysis and convex geometry.
We can show that ∥𝜓∥ ∗□ = 𝑂 𝜀 (1). As 𝑃 is a polynomial, by the triangle inequality and the
submultiplicativity of ∥ ∥ ∗□ , we find that ∥𝑃𝜓∥ ∗□ = 𝑂 𝜀 (1). And so
⟨𝜈 − 1, 𝑃𝜓⟩ ≤ ∥𝜈 − 1∥ □ ∥𝑃𝜓∥ ∗□ ≤ 𝛿 ∥𝑃𝜓∥ ∗□
can be made arbitrarily small by making 𝛿 small. We also have ⟨1, 𝑃𝜓⟩ ≈ ⟨1, 𝜓+ ⟩, which is
at most around 1. Together, we see that ⟨ 𝑓 , 𝜓⟩ is at most around 1, which would contradict
⟨ 𝑓 , 𝜓⟩ > 1 from earlier (assuming enough slack). ■
Proof of the dense model theorem (Theorem 9.4.6). We will show that the conclusion holds
with 𝛿 > 0 chosen to be sufficiently small as a function of 𝜀. We may assume that 0 < 𝜀 < 1/2.
We will prove the existence of a function 𝑔 : Γ → [0, 1 + 𝜀/2] such that ∥ 𝑓 − 𝑔∥ □ ≤ 𝜀/2.
(To obtain the function Γ → [0, 1] in the theorem, we can replace 𝑔 by min{𝑔, 1}.)
We are trying to prove that one can write f as g + g′ with
  g ∈ K := {functions Γ → [0, 1 + ε/2]}
and
  g′ ∈ K′ := {functions Γ → R with ‖·‖_□ ≤ ε/2}.
We can view the sets 𝐾 and 𝐾 ′ as convex bodies (both containing the origin) in the space of
all functions Γ → R. Our goal is to show that 𝑓 ∈ 𝐾 + 𝐾 ′ .
Let us assume the contrary. By the separating hyperplane theorem applied to 𝑓 ∉ 𝐾 + 𝐾 ′ ,
there exists a function 𝜓 : Γ → R (which is a normal vector to the separating hyperplane)
such that
(a) ⟨ 𝑓 , 𝜓⟩ > 1, and
(b) ⟨g + g′, ψ⟩ ≤ 1 for all g ∈ K and g′ ∈ K′.
Taking g = (1 + ε/2) 1_{ψ≥0} and g′ = 0 in (b), we have
  ⟨1, ψ₊⟩ ≤ 1/(1 + ε/2).   (9.1)
Here we write ψ₊ for the function ψ₊(x) := max{ψ(x), 0}.
On the other hand, setting g = 0, we have
  1 ≥ sup_{g′ ∈ K′} ⟨g′, ψ⟩ = sup_{‖g′‖_□ ≤ ε/2} ⟨g′, ψ⟩ = (ε/2) ‖ψ‖*_□.
So
  ‖ψ‖*_□ ≤ 2/ε.
Setting g = 0 and g′ = ±(ε/2) N 1_x for a single x ∈ Γ (i.e., g′ is supported on a single element of Γ), we have ‖g′‖_□ ≤ ε/2 and 1 ≥ ⟨g′, ψ⟩ = ±(ε/2) ψ(x). So |ψ(x)| ≤ 2/ε. This holds for every x ∈ Γ. Thus
  ‖ψ‖_∞ ≤ 2/ε.
By the Weierstrass polynomial approximation theorem, there exists some real polynomial P(x) = p_d x^d + · · · + p_1 x + p_0 such that
  |P(t) − max{t, 0}| ≤ ε/20 whenever |t| ≤ 2/ε.
[Figure: the polynomial P(t) approximating max{t, 0} on the interval |t| ≤ 2/ε.]
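As a numerical illustration of this approximation step (not in the original text), one can fit a polynomial to max{t, 0} on [−2/ε, 2/ε] and watch the uniform error decrease with the degree. A least-squares Chebyshev fit is used below purely for convenience; the theorem only needs existence, and a minimax approximation would give slightly better constants.

```python
import numpy as np

eps = 0.5
t = np.linspace(-2 / eps, 2 / eps, 4001)
target = np.maximum(t, 0.0)
for deg in (10, 30, 100):
    P = np.polynomial.Chebyshev.fit(t, target, deg)      # least-squares fit on the sampled grid
    print(deg, float(np.max(np.abs(P(t) - target))))     # approximate sup-norm error on [-2/eps, 2/eps]
```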
Set
  R = ∑_{i=0}^{d} |p_i| (2/ε)^i,
which is a constant that depends only on ε. (A more careful analysis gives R = exp(ε^{−O(1)}).)
Write Pψ : Γ → R to mean the function given by Pψ(x) = P(ψ(x)). By the triangle inequality and the submultiplicativity of ‖·‖*_□ (Lemma 9.4.8),
  ‖Pψ‖*_□ ≤ ∑_{i=0}^{d} |p_i| ‖ψ^i‖*_□ ≤ ∑_{i=0}^{d} |p_i| (‖ψ‖*_□)^i ≤ ∑_{i=0}^{d} |p_i| (2/ε)^i = R.
Let us choose
  δ = min{ε/(20R), 1}.
Then ‖ν − 1‖_□ ≤ δ implies that
  |⟨ν − 1, Pψ⟩| ≤ ‖ν − 1‖_□ ‖Pψ‖*_□ ≤ δR ≤ ε/20.   (9.2)
Earlier we showed that ‖ψ‖_∞ ≤ 2/ε, and also |P(t) − max{t, 0}| ≤ ε/20 whenever |t| ≤ 2/ε. Thus
  ‖Pψ − ψ₊‖_∞ ≤ ε/20.   (9.3)
Hence,
  ⟨ν, Pψ⟩ = ⟨1, Pψ⟩ + ⟨ν − 1, Pψ⟩
          ≤ ⟨1, Pψ⟩ + ε/20            [by (9.2)]
          ≤ ⟨1, ψ₊⟩ + ε/10            [by (9.3)]
          ≤ 1/(1 + ε/2) + ε/10.        [by (9.1)]
Also,
  ⟨ν − 1, 1⟩ ≤ ‖ν − 1‖_□ ≤ δ.
Thus
  ‖ν‖_1 = ⟨ν, 1⟩ = 1 + ⟨ν − 1, 1⟩ ≤ 1 + δ ≤ 2.
So by (9.3),
  ⟨ν, ψ₊ − Pψ⟩ ≤ ‖ν‖_1 ‖ψ₊ − Pψ‖_∞ ≤ 2 · (ε/20) ≤ ε/10.   (9.4)
For any tripartite graph F, we write t(F, f) for the F-density in f (and likewise with g and ν). For example,
  t(K₃, f) = E_{x,y,z} f(x, y) f(x, z) f(y, z)
and
  t(K_{2,1,1}, f) = E_{x,x′,y,z} f(x, y) f(x′, y) f(x, z) f(x′, z) f(y, z).
Throughout we assume that 𝜀 > 0 is sufficiently small, so that ≤ 𝜀 Ω(1) means ≤ 𝐶𝜀 𝑐 for
some absolute constants 𝑐, 𝐶 > 0 (which could change from line to line).
Here is the main result of this section, due to Conlon, Fox, & Zhao (2015).
You should now pause and review the proof of the “dense” triangle counting lemma from
Proposition 4.5.4, which says that if in addition we assume 0 ≤ 𝑓 ≤ 1 (that is, assuming
𝜈 = 1 identically), then
|𝑡 (𝐾3 , 𝑓 ) − 𝑡 (𝐾3 , 𝑔)| ≤ 3 ∥ 𝑓 − 𝑔∥ □ ≤ 3𝜀.
Roughly speaking, the proof of the dense triangle counting lemma proceeds by replacing 𝑓
by 𝑔 one edge at a time, each time incurring at most an ∥ 𝑓 − 𝑔∥ □ loss.
[Figure: the triangle densities being compared; the edges weighted by 𝑓 (majorized by 𝜈) are replaced by 𝑔 (majorized by 1) one edge at a time.]
Proof. The proof uses two applications of the Cauchy–Schwarz inequality. Let us write down
the proof in the case when none of the four 𝑓 ’s are replaced by 𝑔’s. The other cases are similar
(basically apply 𝑔 ≤ 1 instead of 𝑓 ≤ 𝜈 wherever appropriate).
Here is a figure illustrating the first application of the Cauchy–Schwarz inequality.
[Figure: the first application of the Cauchy–Schwarz inequality, illustrated with triangles whose edges are labeled 𝜈 − 1, 𝑓, and 𝜈.]
Here are the inequalities written out:
  ( E_{x,y,z,z′} (ν(x, y) − 1) f(x, z) f(x, z′) f(y, z) f(y, z′) )²
    = ( E_{y,z,z′} [ E_x (ν(x, y) − 1) f(x, z) f(x, z′) ] f(y, z) f(y, z′) )²
    ≤ ( E_{y,z,z′} [ E_x (ν(x, y) − 1) f(x, z) f(x, z′) ]² f(y, z) f(y, z′) ) ( E_{y,z,z′} f(y, z) f(y, z′) )
    ≤ ( E_{y,z,z′} [ E_x (ν(x, y) − 1) f(x, z) f(x, z′) ]² ν(y, z) ν(y, z′) ) ( E_{y,z,z′} ν(y, z) ν(y, z′) ).
Note that we are able to apply 𝑓 ≤ 𝜈 in the final step above due to the nonnegativity of the
square, which arose from the Cauchy–Schwarz inequality. We could not have applied 𝑓 ≤ 𝜈
at the very beginning.
The second factor above is at most 1 + 𝜀 due to the 3-linear forms condition. It remains to
show that the first factor is ≤ 𝜀 Ω(1) . The first factor expands to
E 𝑥, 𝑥 ′ ,𝑦,𝑧,𝑧 ′ (𝜈(𝑥, 𝑦) − 1)(𝜈(𝑥 ′ , 𝑦) − 1) 𝑓 (𝑥, 𝑧) 𝑓 (𝑥, 𝑧 ′ ) 𝑓 (𝑥 ′ , 𝑧) 𝑓 (𝑥 ′ , 𝑧 ′ )𝜈(𝑦, 𝑧)𝜈(𝑦, 𝑧 ′ ).
We can upper bound the above quantity as illustrated below, using a second application of
the Cauchy–Schwarz inequality.
[Figure: the second application of the Cauchy–Schwarz inequality, again illustrated with edge-labeled triangles.]
On the right-hand side, the first factor is ≤ 𝜀 Ω(1) by the 3-linear forms condition. Indeed,
|𝑡 (𝐹, 𝜈) − 1| ≤ 𝜀 for any 𝐹 ⊆ 𝐾2,2,2 . If we expand all the 𝜈 − 1 in the first factor above, then
it becomes an alternating sum of various 𝑡 (𝐹, 𝜈) ∈ [1 − 𝜀, 1 + 𝜀] with 𝐹 ⊆ 𝐾2,2,2 , with the
main contribution 1 from each term canceling each other out. The second factor is ≤ 1 + 𝜀
again by the 3-linear forms condition.
Putting everything together, this completes the proof of the lemma. □
Define ν∧, f∧, g∧ : X × Y → [0, ∞) by
  ν∧(x, y) := E_z ν(x, z) ν(y, z),
  f∧(x, y) := E_z f(x, z) f(y, z),
  g∧(x, y) := E_z g(x, z) g(y, z).
They represent codegrees. Even though 𝜈 and 𝑓 are possibly unbounded, the new weighted
graphs 𝜈∧ and 𝑓∧ behave like dense graphs because the sparseness is somehow smoothed out
(this is a key observation). On a first reading of the proof, you may wish to pretend that 𝜈∧
and 𝑓∧ are uniformly bounded above by 1 (in reality, we need to control the negligible bit of
𝜈 exceeding 1).
We have
  t(K₃, f) = ⟨f, f∧⟩ and t(K₃, g) = ⟨g, g∧⟩.
So
  t(K₃, f) − t(K₃, g) = ⟨f, f∧⟩ − ⟨g, g∧⟩ = ⟨f, f∧ − g∧⟩ + ⟨f − g, g∧⟩.
We have
  |⟨f − g, g∧⟩| ≤ ‖f − g‖_□ ≤ ε
by the same argument as in the dense triangle counting lemma (Proposition 4.5.4), as 0 ≤ g ≤ 1. So it remains to show |⟨f, f∧ − g∧⟩| ≤ ε^{Ω(1)}.
By the Cauchy–Schwarz inequality, we have
  ⟨f, f∧ − g∧⟩² = (E[f(f∧ − g∧)])² ≤ E[f(f∧ − g∧)²] · E f ≤ E[ν(f∧ − g∧)²] · E ν.
The second factor is E𝜈 ≤ 1 + 𝜀 by the 3-linear forms condition. So it remains to show that
E[𝜈( 𝑓∧ − 𝑔∧ ) 2 ] = ⟨𝜈, ( 𝑓∧ − 𝑔∧ ) 2 ⟩ ≤ 𝜀 Ω(1) .
By Lemma 9.5.2
⟨𝜈 − 1, ( 𝑓∧ − 𝑔∧ ) 2 ⟩ ≤ 𝜀 Ω(1)
(to see this inequality, first expand ( 𝑓∧ − 𝑔∧ ) 2 and then apply Lemma 9.5.2 term by term).
Thus
E[𝜈( 𝑓∧ − 𝑔∧ ) 2 ] ≤ E[( 𝑓∧ − 𝑔∧ ) 2 ] + 𝜀 Ω(1) .
Thus, to prove the induction step (as stated earlier) for the sparse triangle counting lemma, it
remains to prove the following.
Let us first sketch the idea of the proof of Lemma 9.5.3. Expanding, we have
LHS of (9.5) = ⟨ 𝑓∧ , 𝑓∧ ⟩ − ⟨ 𝑓∧ , 𝑔∧ ⟩ − ⟨𝑔∧ , 𝑓∧ ⟩ + ⟨𝑔∧ , 𝑔∧ ⟩. (9.6)
Each term represents some 4-cycle density.
So it suffices to show that each of the four terms above differs from ⟨𝑔∧ , 𝑔∧ ⟩ by ≤ 𝜀 Ω(1) .
We are trying to show that ⟨ 𝑓∧ , 𝑓∧ ⟩ ≈ ⟨𝑔∧ , 𝑔∧ ⟩. Expanding the second factor in each ⟨·, ·⟩, we
are trying to show that
  E_{x,y,z} f∧(x, y) f(x, z) f(y, z) ≈ E_{x,y,z} g∧(x, y) g(x, z) g(y, z).
However, this is just another instance of the sparse triangle counting lemma! And importantly,
this instance is easier than the one we started with. Indeed, we have ∥ 𝑓∧ − 𝑔∧ ∥ □ ≤ 𝜀 Ω(1) (this
can be proved by invoking the induction hypothesis). Furthermore, the first factor 𝑓∧ (𝑥, 𝑦)
now behaves more like a bounded function (corresponding to a dense graph rather than a
sparse graph). Let us pretend for a second that 𝑓∧ ≤ 1, ignoring the negligible part of 𝑓∧
exceeding 1. Then we have reduced the original problem to a new instance of the triangle
counting lemma, except that now 𝑓 ≤ 𝜈 on 𝑋 ×𝑌 has been replaced by 𝑓∧ ≤ 1 (this is the key
point where densification occurs). Lemma 9.5.3 then follows from the induction hypothesis
as we have reduced the sparsity of the pseudorandom host graph.
Coming back to the proof, as discussed earlier, while f∧ is not necessarily ≤ 1, it is almost so. We need to handle the error term arising from replacing f∧ by its capped version f̃∧ : X × Y → [0, 1], defined by
  f̃∧ = min{f∧, 1} pointwise.
We have
  0 ≤ f∧ − f̃∧ = max{f∧ − 1, 0} ≤ max{ν∧ − 1, 0} ≤ |ν∧ − 1|.   (9.7)
Also,
  (E|ν∧ − 1|)² ≤ E[(ν∧ − 1)²] = E ν∧² − 2 E ν∧ + 1 ≤ 3ε,   (9.8)
by the 3-linear forms condition, since E ν∧² and E ν∧ are both within ε of 1. So
  ⟨f∧, f∧⟩ − ⟨f̃∧, f∧⟩ = ⟨f∧ − f̃∧, f∧⟩ ≤ E[|ν∧ − 1| ν∧]
                       = E[|ν∧ − 1|(ν∧ − 1)] + E|ν∧ − 1|
                       ≤ E[(ν∧ − 1)²] + E|ν∧ − 1|
                       ≤ ε^{Ω(1)}.   [by (9.8)]   (9.9)
  |⟨f̃∧, f∧⟩ − ⟨g∧, g∧⟩| ≤ ε^{Ω(1)}.
Thus |⟨f∧, f∧⟩ − ⟨g∧, g∧⟩| ≤ ε^{Ω(1)}. Likewise, the other terms on the right-hand side of (9.6) are within ε^{Ω(1)} of ⟨g∧, g∧⟩ (Exercise!). The conclusion E[(f∧ − g∧)²] ≤ ε^{Ω(1)} then
follows. □
Exercise 9.5.5. State and prove a generalization of the sparse counting lemma to count
an arbitrary but fixed subgraph (replacing the triangle above). How about hypergraphs?
Exercise 9.6.2. Deduce the above version of Roth's theorem from the existence version (namely, that every 3-AP-free subset of [N] has size o(N)).
Proof of the relative Roth theorem (Theorem 9.2.5). Let 𝑝 = |𝑆| /𝑁. Define
𝜈 : Z/𝑁Z → [0, ∞) by 𝜈 = 𝑝 −1 1𝑆 .
Let 𝑋 = 𝑌 = 𝑍 = Z/𝑁Z. Consider the associated edge-weighted tripartite graph
𝜈 ′ : (𝑋 × 𝑌 ) ∪ (𝑋 × 𝑍) ∪ (𝑌 × 𝑍) → [0, ∞)
defined by, for 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌 , and 𝑧 ∈ 𝑍,
𝜈 ′ (𝑥, 𝑦) = 𝜈(2𝑥 + 𝑦), 𝜈 ′ (𝑥, 𝑧) = 𝜈(𝑥 − 𝑧), 𝜈 ′ (𝑦, 𝑧) = 𝜈(−𝑦 − 2𝑧).
Since 𝜈 satisfies the 3-linear forms condition (as a function on Z/𝑁Z), 𝜈 ′ also satisfies the
3-linear forms condition in the sense of Section 9.5. Likewise,
∥𝜈 − 1∥ □ = ∥𝜈 ′ − 1∥ □
where ‖ν − 1‖_□ on the left-hand side is in the sense of Section 9.4 and ‖ν′ − 1‖_□ is defined as in Section 9.5 with ν′ restricted to X × Y (the same would be true had we restricted to X × Z or Y × Z). Indeed,
  ‖ν − 1‖_□ = sup_{A⊆X, B⊆Y} E (ν(x + y) − 1) 1_A(x) 1_B(y),
whereas
  ‖ν′ − 1‖_□ = sup_{A⊆X, B⊆Y} E (ν′(x, y) − 1) 1_A(x) 1_B(y),
and these two expressions are equal to each other after a change of variables 𝑥 ↔ 2𝑥 (which
is a bijection as 𝑁 is odd).
By Lemma 9.5.2 (or simply two applications of the Cauchy–Schwarz inequality followed
by the 3-linear forms condition), we obtain
∥𝜈 − 1∥ □ ≤ 𝜀 Ω(1) .
Now suppose 𝐴 ⊆ 𝑆 and | 𝐴| ≥ 𝛿𝑁. Define 𝑓 : Z/𝑁Z → [0, ∞) by
𝑓 = 𝑝 −1 1 𝐴
so that 0 ≤ 𝑓 ≤ 𝜈 pointwise. Then by the dense model theorem (Theorem 9.4.6), there exists
a function 𝑔 : Z/𝑁Z → [0, 1] such that
∥ 𝑓 − 𝑔∥ □ ≤ 𝜂,
where 𝜂 = 𝜂(𝜀) is some quantity that tends to zero as 𝜀 → 0.
Define the associated edge-weighted tripartite graphs
𝑓 ′ , 𝑔 ′ : (𝑋 × 𝑌 ) ∪ (𝑋 × 𝑍) ∪ (𝑌 × 𝑍) → [0, ∞)
where, for 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌 , and 𝑧 ∈ 𝑍,
𝑓 ′ (𝑥, 𝑦) = 𝑓 (2𝑥 + 𝑦), 𝑓 ′ (𝑥, 𝑧) = 𝑓 (𝑥 − 𝑧), 𝑓 ′ (𝑦, 𝑧) = 𝑓 (−𝑦 − 2𝑧),
𝑔 ′ (𝑥, 𝑦) = 𝑔(2𝑥 + 𝑦), 𝑔 ′ (𝑥, 𝑧) = 𝑔(𝑥 − 𝑧), 𝑔 ′ (𝑦, 𝑧) = 𝑔(−𝑦 − 2𝑧).
Note that 𝑔 ′ takes values in [0, 1]. Then
∥ 𝑓 ′ − 𝑔 ′ ∥ □ = ∥ 𝑓 − 𝑔∥ □ ≤ 𝜂
when 𝑓 ′ − 𝑔 ′ is interpreted as restricted to 𝑋 × 𝑌 (and the same for 𝑋 × 𝑍 or 𝑌 × 𝑍). Thus
by the sparse triangle counting lemma (Theorem 9.5.1), we have
|𝑡 (𝐾3 , 𝑓 ′ ) − 𝑡 (𝐾3 , 𝑔 ′ )| ≤ 𝜂Ω(1) .
Note that
  t(K₃, f′) = E_{x,y,z} f′(x, y) f′(x, z) f′(y, z)
            = E_{x,y,z ∈ Z/NZ} f(2x + y) f(x − z) f(−y − 2z)
            = E_{x,d ∈ Z/NZ} f(x) f(x + d) f(x + 2d)
            = Λ₃(f).
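The change of variables in the last step is easy to confirm numerically; here is a brute-force Python check (not in the original text) for a toy modulus and a random function f.

```python
import itertools
import numpy as np

N = 15
rng = np.random.default_rng(7)
f = rng.random(N)

lhs = np.mean([f[(2 * x + y) % N] * f[(x - z) % N] * f[(-y - 2 * z) % N]
               for x, y, z in itertools.product(range(N), repeat=3)])
rhs = np.mean([f[a] * f[(a + d) % N] * f[(a + 2 * d) % N]
               for a, d in itertools.product(range(N), repeat=2)])
assert np.isclose(lhs, rhs)    # each 3-AP (a, a+d, a+2d) arises from exactly N triples (x, y, z)
print(lhs, rhs)
```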
Further Reading
The original paper by Green & Tao (2008) titled The Primes Contain Arbitrarily Long
Arithmetic Progressions is worth reading. Their follow-up paper Linear Equations in Primes
(2010a) substantially strengthens the result to asymptotically count the number of 𝑘-APs
in the primes, though the proof was conditional on several claims that were subsequently
proved, most notably the inverse theorem for Gowers uniformity norms (Green, Tao, &
Ziegler 2012).
A number of expository articles were written on this topic shortly after the breakthroughs:
Green (2007b, 2014), Tao (2007b), Kra (2006), Wolf (2013).
The graph-theoretic approach taken in this chapter is adapted from the article The Green–Tao Theorem: an Exposition by Conlon, Fox, & Zhao (2014). The article presents a full proof
of the Green–Tao theorem that incorporates various simplifications found since the original
work. The analytic number theoretic arguments, which were omitted from this chapter, can
also be found in that article.
Chapter Summary
• Green–Tao theorem. The primes contain arbitrarily long arithmetic progressions. Proof
strategy:
– Embed the primes in a slightly larger set, the “almost primes,” which enjoys certain
pseudorandomness properties.
– Show that every 𝑘-AP-free subset of such a pseudorandom set must have negligible
size.
• Relative Szemerédi theorem. If 𝑆 ⊆ Z/𝑁Z satisfies a 𝒌-linear forms condition, then
every 𝑘-AP-free subset of 𝑆 has size 𝑜(|𝑆|).
– The 3-linear forms condition is a pseudorandomness hypothesis. It says that the asso-
ciated tripartite graph has 𝐹-density close to random whenever 𝐹 ⊆ 𝐾2,2,2 .
• Proof of the relative Szemerédi theorem uses the transference principle to transfer
Szemerédi’s theorem from the dense setting to the sparse pseudorandom setting.
– First approximate 𝐴 ⊆ 𝑆 by a dense set 𝐵 ⊆ Z/𝑁Z (dense model theorem).
– Then show that the normalized count of 𝑘-APs in 𝐴 and 𝐵 are similar (sparse counting
lemma).
– Finally conclude using Szemerédi’s theorem that 𝐵 has many 𝑘-APs, and therefore so
must 𝐴.
• Dense model theorem. If a sparse set 𝑆 is close to random in normalized cut norm, then
every subset 𝐴 ⊆ 𝑆 can be approximated by some dense 𝐵 ⊆ Z/𝑁Z in normalized cut
norm.
• Sparse counting lemma. If two graphs (one sparse and one dense) are close in normalized cut norm, then they have similar triangle counts, provided that the sparse graph lies inside
a sparse pseudorandom graph satisfying the 3-linear forms condition (which says that the
densities of 𝐾2,2,2 and its subgraphs are close to random).
References
Gowers, W. T. (1998b)
Additive and combinatorial number theory, online lecture notes written by Jacques Verstraëte based on
a course given by W. T. Gowers, https://www.dpmms.cam.ac.uk/~wtg10/.
Gowers, W. T. (2001)
A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11, 465–588. MR:1844079
Gowers, W. T. (2006)
Quasirandomness, counting and regularity for 3-uniform hypergraphs, Combin. Probab. Comput. 15,
143–184. MR:2195580
Gowers, W. T. (2007)
Hypergraph regularity and the multidimensional Szemerédi theorem, Ann. of Math. 166, 897–946.
MR:2373376
Gowers, W. T. (2008)
Quasirandom groups, Combin. Probab. Comput. 17, 363–387. MR:2410393
Gowers, W. T. (2010)
Decompositions, approximate structure, transference, and the Hahn-Banach theorem, Bull. Lond. Math.
Soc. 42, 573–606. MR:2669681
Graham, Ronald L., Rothschild, Bruce L., & Spencer, Joel H. (1990)
Ramsey theory, second ed., Wiley. MR:1044995
Green, B. (2005a)
A Szemerédi-type regularity lemma in abelian groups, with applications, Geom. Funct. Anal. 15, 340–376.
MR:2153903
Green, Ben (2005b)
Roth’s theorem in the primes, Ann. of Math. (2) 161, 1609–1636. MR:2180408
Green, Ben (2005c)
Finite field models in additive combinatorics, Surveys in combinatorics 2005, Cambridge University
Press, pp. 1–27. MR:2187732
Green, Ben (2007a)
Montréal notes on quadratic Fourier analysis, Additive combinatorics, American Mathematical Society,
pp. 69–102. MR:2359469
Green, Ben (2007b)
Long arithmetic progressions of primes, Analytic Number Theory: A Tribute to Gauss and Dirichlet,
American Mathematical Society, pp. 149–167. MR:2362199
Green, Ben (2009a)
Additive combinatorics (book review), Bull. Amer. Math. Soc. 46, 489–497. MR:2507281
Green, Ben (2009b)
Additive combinatorics, lecture notes, http://people.maths.ox.ac.uk/greenbj/notes.html.
Green, Ben (2014)
Approximate algebraic structure, Proceedings of the International Congress of Mathematicians—Seoul
2014. Vol. 1, Kyung Moon Sa, pp. 341–367. MR:3728475
Green, Ben & Ruzsa, Imre Z. (2007)
Freiman’s theorem in an arbitrary abelian group, J. Lond. Math. Soc. 75, 163–175. MR:2302736
Green, Ben & Tao, Terence (2008)
The primes contain arbitrarily long arithmetic progressions, Ann. of Math. 167, 481–547. MR:2415379
Hosseini, Kaave, Lovett, Shachar, Moshkovitz, Guy, & Shapira, Asaf (2016)
An improved lower bound for arithmetic regularity, Math. Proc. Cambridge Philos. Soc. 161, 193–197.
MR:3530502
Ireland, Kenneth & Rosen, Michael (1990)
A classical introduction to modern number theory, second ed., Springer-Verlag. MR:1070716
Jordan, Herbert E. (1907)
Group-Characters of Various Types of Linear Groups, Amer. J. Math. 29, 387–405. MR:1506021
Kahn, Jeff (2001)
An entropy approach to the hard-core model on bipartite graphs, Combin. Probab. Comput. 10, 219–237.
MR:1841642
Katona, G. (1968)
A theorem of finite sets, Theory of graphs (Proc. Colloq., Tihany, 1966), pp. 187–207. MR:0290982
Kedlaya, Kiran S. (1997)
Large product-free subsets of finite groups, J. Combin. Theory Ser. A 77, 339–343. MR:1429085
Kedlaya, Kiran S. (1998)
Product-free subsets of groups, Amer. Math. Monthly 105, 900–906. MR:1656927
Keevash, Peter (2011)
Hypergraph Turán problems, Surveys in combinatorics 2011, Cambridge University Press, pp. 83–139.
MR:2866732
Khot, Subhash, Kindler, Guy, Mossel, Elchanan, & O’Donnell, Ryan (2007)
Optimal inapproximability results for MAX-CUT and other 2-variable CSPs?, SIAM J. Comput. 37,
319–357. MR:2306295
Kleinberg, Robert, Speyer, David E., & Sawin, Will (2018)
The growth of tri-colored sum-free sets, Discrete Anal., Paper No. 12, 10 pp. MR:3827120
Kollár, János, Rónyai, Lajos, & Szabó, Tibor (1996)
Norm-graphs and bipartite Turán numbers, Combinatorica 16, 399–406. MR:1417348
Komlós, J. & Simonovits, M. (1996)
Szemerédi’s regularity lemma and its applications in graph theory, Combinatorics, Paul Erdős is eighty,
Vol. 2 (Keszthely, 1993), János Bolyai Mathematical Society, pp. 295–352. MR:1395865
Komlós, János, Shokoufandeh, Ali, Simonovits, Miklós, & Szemerédi, Endre (2002)
The regularity lemma and its applications in graph theory, Theoretical aspects of computer science
(Tehran, 2000), Springer, pp. 84–112. MR:1966181
Konyagin, S. V. & Shkredov, I. D. (2015)
On sum sets of sets having small product set, Proc. Steklov Inst. Math. 290, 288–299. MR:3488800
Kővári, T., Sós, V. T., & Turán, P. (1954)
On a problem of K. Zarankiewicz, Colloq. Math. 3, 50–57. MR:65617
Kra, Bryna (2006)
The Green-Tao theorem on arithmetic progressions in the primes: an ergodic point of view, Bull. Amer.
Math. Soc. 43, 3–23. MR:2188173
Krivelevich, M. & Sudakov, B. (2006)
Pseudo-random graphs, More sets, graphs and numbers, Springer, pp. 199–262. MR:2223394
Margulis, G. A. (1988)
Explicit group-theoretic constructions of combinatorial schemes and their applications in the construction
of expanders and concentrators, Problemy Peredachi Informatsii 24, 51–60. MR:939574
Matiyasevich, Ju. V. (1970)
The Diophantineness of enumerable sets, Dokl. Akad. Nauk. SSSR. 191, 279–282. MR:0258744
Matoušek, Jiří (2010)
Thirty-three miniatures: Mathematical and algorithmic applications of linear algebra, American Math-
ematical Society. MR:2656313
Meshulam, Roy (1995)
On subsets of finite abelian groups with no 3-term arithmetic progressions, J. Combin. Theory Ser. A 71,
168–172. MR:1335785
Minkowski, Hermann (1896)
Geometrie der Zahlen, Teubner. MR:249269
Morgenstern, Moshe (1994)
Existence and explicit constructions of 𝑞 + 1 regular Ramanujan graphs for every prime power 𝑞, J.
Combin. Theory Ser. B 62, 44–62. MR:1290630
Moshkovitz, Guy & Shapira, Asaf (2016)
A short proof of Gowers’ lower bound for the regularity lemma, Combinatorica 36, 187–194. MR:3516883
Moshkovitz, Guy & Shapira, Asaf (2019)
A tight bound for hypergraph regularity, Geom. Funct. Anal. 29, 1531–1578. MR:4025519
Motzkin, T. S. (1967)
The arithmetic-geometric inequality, Inequalities (Proc. Sympos. Wright-Patterson Air Force Base, Ohio,
1965), Academic Press, pp. 205–224. MR:0223521
Motzkin, T. S. & Straus, E. G. (1965)
Maxima for graphs and a new proof of a theorem of Turán, Canadian J. Math. 17, 533–540. MR:175813
Mulholland, H. P. & Smith, C. A. B. (1959)
An inequality arising in genetical theory, Amer. Math. Monthly 66, 673–683. MR:110721
Nešetřil, Jaroslav & Rosenfeld, Moshe (2001)
I. Schur, C. E. Shannon and Ramsey numbers, a short story, Discrete Math. 229, 185–195. MR:1815606
Nikiforov, V. (2011)
The number of cliques in graphs of given order and size, Trans. Amer. Math. Soc. 363, 1599–1618.
MR:2737279
Nikolov, N. & Pyber, L. (2011)
Product decompositions of quasirandom groups and a Jordan type theorem, J. Eur. Math. Soc. (JEMS)
13, 1063–1077. MR:2800484
Nilli, A. (1991)
On the second eigenvalue of a graph, Discrete Math. 91, 207–210. MR:1124768
Pellegrino, Giuseppe (1970)
Sul massimo ordine delle calotte in 𝑆4,3 , Matematiche (Catania) 25, 149–157 (1971). MR:363952
Peluse, Sarah (2020)
Bounds for sets with no polynomial progressions, Forum Math. Pi 8, e16, 55 pp. MR:4199235
Sah, Ashwin, Sawhney, Mehtaab, Stoner, David, & Zhao, Yufei (2019)
The number of independent sets in an irregular graph, J. Combin. Theory Ser. B 138, 172–195.
MR:3979229
Sah, Ashwin, Sawhney, Mehtaab, Stoner, David, & Zhao, Yufei (2020)
A reverse Sidorenko inequality, Invent. Math. 221, 665–711. MR:4121160
Sah, Ashwin, Sawhney, Mehtaab, & Zhao, Yufei (2021)
Patterns without a popular difference, Discrete Anal., Paper No. 8, 30 pp. MR:4293329
Salem, R. & Spencer, D. C. (1942)
On sets of integers which contain no three terms in arithmetical progression, Proc. Natl. Acad. Sci. USA
28, 561–563. MR:7405
Sanders, Tom (2012)
On the Bogolyubov-Ruzsa lemma, Anal. PDE 5, 627–655. MR:2994508
Sanders, Tom (2013)
The structure theory of set addition revisited, Bull. Amer. Math. Soc. 50, 93–127. MR:2994996
Sárkőzy, A. (1978)
On difference sets of sequences of integers. I, Acta Math. Acad. Sci. Hungar. 31, 125–149. MR:466059
Saxton, David & Thomason, Andrew (2015)
Hypergraph containers, Invent. Math. 201, 925–992. MR:3385638
Schacht, Mathias (2016)
Extremal results for random discrete structures, Ann. of Math. 184, 333–365. MR:3548528
Schelp, Richard H. & Thomason, Andrew (1998)
A remark on the number of complete and empty subgraphs, Combin. Probab. Comput. 7, 217–219.
MR:1617934
Schoen, Tomasz (2011)
Near optimal bounds in Freiman’s theorem, Duke Math. J. 158, 1–12. MR:2794366
Schoen, Tomasz & Shkredov, Ilya D. (2014)
Roth’s theorem in many variables, Israel J. Math. 199, 287–308. MR:3219538
Schoen, Tomasz & Sisask, Olof (2016)
Roth’s theorem for four variables and additive structures in sums of sparse sets, Forum Math. Sigma 4,
e5, 28 pp. MR:3482282
Schrijver, Alexander (2003)
Combinatorial optimization: Polyhedra and efficiency, Springer-Verlag. MR:1956924
Schur, I. (1916)
Über die Kongruenz 𝑥^𝑚 + 𝑦^𝑚 ≡ 𝑧^𝑚 (mod 𝑝), Jber. Deutsch. Math.-Verein 25, 114–116.
Schur, J. (1907)
Untersuchungen über die Darstellung der endlichen Gruppen durch gebrochene lineare Substitutionen,
J. Reine Angew. Math. 132, 85–137. MR:1580715
Serre, Jean-Pierre (1977)
Linear representations of finite groups, Springer-Verlag. MR:0450380
Sheffer, Adam (2022)
Polynomial methods and incidence theory, Cambridge University Press. MR:4394303
Shkredov, I. D. (2006)
On a generalization of Szemerédi’s theorem, Proc. Lond. Math. Soc. 93, 723–760. MR:2266965
Sidorenko, A. F. (1991)
Inequalities for functionals generated by bipartite graphs, Diskret. Mat. 3, 50–65. MR:1138091
Sidorenko, Alexander (1993)
A correlation inequality for bipartite graphs, Graphs Combin. 9, 201–204. MR:1225933
Simonovits, M. (1974)
Extremal graph problems with symmetrical extremal graphs. Additional chromatic conditions, Discrete
Math. 7, 349–376. MR:337690
Singleton, Robert (1966)
On minimal graphs of maximum even girth, J. Combinatorial Theory 1, 306–332. MR:201347
Skokan, Jozef & Thoma, Lubos (2004)
Bipartite subgraphs and quasi-randomness, Graphs Combin. 20, 255–262. MR:2080111
Solymosi, József (2003)
Note on a generalization of Roth’s theorem, Discrete and computational geometry, Springer, pp. 825–827.
MR:2038505
Solymosi, József (2009)
Bounding multiplicative energy by the sumset, Adv. Math. 222, 402–408. MR:2538014
Soundararajan, K. (2007)
Additive combinatorics, online lecture notes, http://math.stanford.edu/~ksound/Notes.pdf.
Spielman, Daniel A. (2019)
Spectral and algebraic graph theory, textbook draft, http://cs-www.cs.yale.edu/homes/
spielman/sagt/.
Stein, Elias M. & Shakarchi, Rami (2003)
Fourier analysis: An introduction, Princeton University Press. MR:1970295
Sudakov, B., Szemerédi, E., & Vu, V. H. (2005)
On a question of Erdős and Moser, Duke Math. J. 129, 129–155. MR:2155059
Szegedy, Balázs (2015)
An information theoretic approach to Sidorenko’s conjecture. arXiv:1406.6738
Székely, László A. (1997)
Crossing numbers and hard Erdős problems in discrete geometry, Combin. Probab. Comput. 6, 353–358.
MR:1464571
Szemerédi, E. (1975)
On sets of integers containing no 𝑘 elements in arithmetic progression, Acta Arith. 27, 199–245.
MR:369312
Szemerédi, Endre & Trotter, William T., Jr. (1983)
Extremal problems in discrete geometry, Combinatorica 3, 381–392. MR:729791
Tao, Terence (2006)
A variant of the hypergraph removal lemma, J. Combin. Theory Ser. A 113, 1257–1280. MR:2259060
Tao, Terence (2007a)
Structure and randomness in combinatorics, 48th Annual IEEE Symposium on Foundations of Computer
Science (FOCS’07), pp. 3–15.
Tao, Terence (2007b)
The dichotomy between structure and randomness, arithmetic progressions, and the primes, International
Congress of Mathematicians. Vol. I, European Mathematical Society, pp. 581–608. MR:2334204
Wolf, J. (2015)
Finite field models in arithmetic combinatorics—ten years on, Finite Fields Appl. 32, 233–274.
MR:3293412
Wolf, Julia (2013)
Arithmetic and polynomial progressions in the primes [after Gowers, Green, Tao and Ziegler], Astérisque
352, 389–427. MR:3087352
Zarankiewicz, K. (1951)
Problem 101, Colloq. Math. 2, 201.
Zhao, Yufei (2010)
The number of independent sets in a regular graph, Combin. Probab. Comput. 19, 315–320. MR:2593625
Zhao, Yufei (2014)
An arithmetic transference proof of a relative Szemerédi theorem, Math. Proc. Cambridge Philos. Soc.
156, 255–261. MR:3177868
Zhao, Yufei (2017)
Extremal regular graphs: independent sets and graph homomorphisms, Amer. Math. Monthly 124,
827–843. MR:3722040
Index