0% found this document useful (0 votes)

11 views90 pages

Zyqwadawfafslecture09 Query Optimization

The lecture covers query optimization, focusing on cost-based optimization algorithms that evaluate alternative query plans to choose the one with the lowest estimated cost. It discusses the importance of considering various physical plans, the history of query optimization, and the foundational laws of relational algebra that guide the optimization process. Additionally, it highlights the role of semijoins and their significance in parallel databases.

Uploaded by

Linacero

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views90 pages

Zyqwadawfafslecture09 Query Optimization

Uploaded by

Linacero

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 90

Lecture 9

Query Optimization
November 24, 2010

Dan Suciu -- CSEP544 Fall 2010 1

Outline
• Chapter 15 in the textbook

• Paper: Query Optimizers: Time to

Rethink the Contract ?, by Surajit
Chaudhuri

• Next time: parallel databases, Bloom

filters
Dan Suciu -- CSEP544 Fall 2010 2
Query Optimization Algorithm
• Enumerate alternative plans

• Compute estimated cost of each plan

– Compute number of I/Os
– Compute CPU cost

• Choose plan with lowest cost

– This is called cost-based optimization
Dan Suciu -- CSEP544 Fall 2010 3
Example
Supplier(sid, sname, scity, SELECT sname
sstate) FROM Supplier x, Supply y
Supply(sid, pno, quantity) WHERE x.sid = y.sid
and y.pno = 2
• Some statistics
and x.scity = „Seattle‟
– T(Supplier) = 1000 records
and x.sstate = „WA‟
– T(Supply) = 10,000 records
– B(Supplier) = 100 pages
– B(Supply) = 100 pages
– V(Supplier,scity) = 20, V(Supplier,state) = 10
– V(Supply,pno) = 2,500
– Both relations are clustered
• M = 10
Dan Suciu -- CSEP544 Fall 2010 4
T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 10
T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10
V(Supply,pno) = 2,500

Physical Query Plan 1

(On the fly) sname

(On the fly)

scity=„Seattle‟ sstate=„WA‟  pno=2

(Block-nested loop)
sid = sid

Supplier Supply
(File scan) (File scan)
Dan Suciu -- CSEP544 Fall 2010 5
T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 10
T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10
V(Supply,pno) = 2,500

Physical Query Plan 1

(On the fly) sname Selection and project on-the-fly
-> No additional cost.
(On the fly)
scity=„Seattle‟ sstate=„WA‟  pno=2

(Block-nested loop) Total cost of plan is thus cost of join:

= B(Supplier)+B(Supplier)*B(Supply)/M
sid = sid = 100 + 10 * 100
= 1,100 I/Os

Supplier Supply
(File scan) (File scan)
Dan Suciu -- CSEP544 Fall 2010 6
T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 10
T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10
V(Supply,pno) = 2,500

Physical Query Plan 2

(On the fly) sname 

(Sort-merge join) 

sid = sid
(Scan
write to T1) (Scan
scity=„Seattle‟ sstate=„WA‟ write to T2)
 pno=2

Supplier Supply
(File scan) (File scan)
Dan Suciu -- CSEP544 Fall 2010 7
T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 10
T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10
V(Supply,pno) = 2,500

Physical Query Plan 2

Total cost
(On the fly) sname 
= 100 + 100 * 1/20 * 1/10 (1)
+ 100 + 100 * 1/2500 (2)
+ 2 (3)
(Sort-merge join)  + 0 (4)
sid = sid Total cost  204 I/Os
(Scan
write to T1) (Scan
scity=„Seattle‟ sstate=„WA‟ write to T2)
 pno=2

Supplier Supply
(File scan) (File scan)
Dan Suciu -- CSEP544 Fall 2010 8
T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 10
T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10
V(Supply,pno) = 2,500

Physical Query Plan 3

(On the fly)  sname
(On the fly)
 scity=„Seattle‟ sstate=„WA‟

 (Index nested loop)

sid = sid

(Use index)
 pno=2

Supply Supplier
(Index lookup on pno ) (Index lookup on sid)
Assume: clustered Doesn‟t matter if clustered or not9
T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 10
T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10
V(Supply,pno) = 2,500

Physical Query Plan 3

(On the fly)  sname
(On the fly)
 scity=„Seattle‟ sstate=„WA‟

 (Index nested loop)

sid = sid

(Use index) 4 tuples

 pno=2

Supply Supplier
(Index lookup on pno ) (Index lookup on sid)
Assume: clustered Doesn‟t matter if clustered or not10
T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 10
T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10
V(Supply,pno) = 2,500

Physical Query Plan 3

(On the fly)  sname
Total cost
(On the fly) = 1 (1)
+ 4 (2)
 scity=„Seattle‟ sstate=„WA‟ + 0 (3)
+ 0 (3)
Total cost  5 I/Os
 (Index nested loop)
sid = sid

(Use index) 4 tuples

 pno=2

Supply Supplier
(Index lookup on pno ) (Index lookup on sid)
Assume: clustered Doesn‟t matter if clustered or not11
Simplifications
• In the previous examples, we assumed
that all index pages were in memory

• When this is not the case, we need to

add the cost of fetching index pages
from disk

Dan Suciu -- CSEP544 Fall 2010 12

Lessons
1. Need to consider several physical plan
– even for one, simple logical plan

2. No plan is best in general

– need to have statistics over the data
– the B‟s, the T‟s, the V‟s

Dan Suciu -- CSEP544 Fall 2010 13

The Contract of the Optimizer
[Chaudhuri]
• high-quality execution plans for all
queries,
• while taking relatively small optimization
time, and
• with limited additional input such as
histograms.

Dan Suciu -- CSEP544 Fall 2010 14

Query Optimization
Three major components:

1. Search space

2. Algorithm for enumerating query plans

3. Cardinality and cost estimation

Dan Suciu -- CSEP544 Fall 2010 15
History of Query Optimization
• First query optimizer was for System R,
from IBM, in 1979
• It had all three components in place, and
defined the architecture of query
optimizers for years to come
• You will see often references to System R
• Read Section 15.6 in the book

Dan Suciu -- CSEP544 Fall 2010 16

1. Search Space
• This is the set of all alternative plans
that are considered by the optimizer

• Defined by the set of algebraic laws and

the set of plans used by the optimizer

• Will discuss these laws next

Dan Suciu -- CSEP544 Fall 2010 17
Left-Deep Plans and
Bushy Plans

R3 R1 R3 R1 R2 R4
Left-deep plan Bushy plan

System R considered only left deep plans,

and so do some optimizers
Dan Suciu today
-- CSEP544 Fall 2010 18
Relational Algebra Laws
• Selections
– Commutative: c1(c2(R)) = c2(c1(R))
– Cascading: c1c2(R) = c2(c1(R))
• Projections
• Joins
– Commutativity : R ⋈ S = S ⋈ R
– Associativity: R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T
– Distributivity: R ⨝ (S  T) = (R ⨝ S)  (R ⨝ T)
– Outer joins get more complicated
Dan Suciu -- CSEP544 Fall 2010 19
Example
Which plan is more efficient ?
R ⨝ (S ⨝ T) or (R ⨝ S) ⨝ T ?
• Assumptions:
– Every join selectivity is 10%
• That is: T(R ⨝ S) = 0.1 * T(R) * T(S) etc.
– B(R)=100, B(S) = 50, B(T)=500
– All joins are main memory joins
– All intermediate results are materialized
Dan Suciu -- CSEP544 Fall 2010 20
Simple Laws

 C AND C‟(R) =  C( C‟(R)) =  C(R)   C‟(R)

 C OR C‟(R) =  C(R)   C‟(R)
 C (R ⨝ S) =  C (R) ⨝ S

When C involves
 C (R – S) =  C (R) – S only attributes of R
 C (R  S) =  C (R)   C (S)
 C (R ⨝ S) =  C (R) ⨝ S
Dan Suciu -- CSEP544 Fall 2010 21
Example
• Example: R(A, B, C, D), S(E, F, G)
 F=3 (R ⨝ D=E S) = ?
 A=5 AND G=9 (R ⨝ D=E S) = ?

Dan Suciu -- CSEP544 Fall 2010 22

Simple Laws

PM(R ⨝ S) = PM(PP(R) ⨝ PQ(S))

PM(PN(R)) = PM(R) /* note that M ⊆ N */

• Example R(A,B,C,D), S(E, F, G)

PA,B,G(R ⨝ D=E S) = P ? (P?(R) ⨝ D=E P?(S))

Dan Suciu -- CSEP544 Fall 2010 23

Laws for Group-by and Join
if agg is
“duplicate
insensitive”
(A, agg(B)(R)) = A, agg(B)(R)
A, agg(B)((R)) = A, agg(B)(R)

Which of the following are “duplicate insensitive” ?

sum, count, avg, min, max

Dan Suciu -- CSEP544 Fall 2010 24

Example

A, agg(D)(R(A,B) ⨝ B=C S(C,D)) =

A, agg(D)(R(A,B) ⨝ B=C (C, agg(D)S(C,D)))

These are very powerful laws.

They were introduced only in the 90‟s.

Dan Suciu -- CSEP544 Fall 2010 25

Laws Involving Constraints
Foreign key
Product(pid, pname, price, cid)
Company(cid, cname, city, state)

Ppid, price(Product ⨝cid=cid Company) = Ppid, price(Product)

Need a second constraint for this law to hold. Which ?

Dan Suciu -- CSEP544 Fall 2010 26

Example Foreign key
Product(pid, pname, price, cid)
Company(cid, cname, city, state)

CREATE VIEW CheapProductCompany

SELECT *
FROM Product x, Company y
WHERE x.cid = y.cid and x.price < 100

SELECT pname, price SELECT pname, price

FROM CheapProductCompany FROM Product

Dan Suciu -- CSEP544 Fall 2010 27

Law of Semijoins
Recall the definition of a semijoin:
• R ⋉ S = P A1,…,An (R ⨝ S)
• Where the schemas are:
– Input: R(A1,…An), S(B1,…,Bm)
– Output: T(A1,…,An)
• The law of semijoins is:
R ⨝ S = (R ⋉ S) ⨝ S
Dan Suciu -- CSEP544 Fall 2010 28
Laws with Semijoins
• Very important in parallel databases
• Often combined with Bloom Filters (next
lecture)
• Read pp. 747 in the textbook

Dan Suciu -- CSEP544 Fall 2010 29

Semijoin Reducer
• Given a query: Q = R1 ⨝ R2 ⨝ . . . ⨝ Rn

Ri1 = Ri1 ⋉ Rj1

• A semijoin reducer for Q is
Ri2 = Ri2 ⋉ Rj2
.....
Rip = Rip ⋉ Rjp

such that the query is equivalent to:

Q = Rk1 ⨝ Rk2 ⨝ . . . ⨝ Rkn

• A full reducer is such that no dangling tuples remain

Dan Suciu -- CSEP544 Fall 2010 30

Example
• Example:
Q = R(A,B) ⨝ S(B,C)

• A semijoin reducer is:

R1(A,B) = R(A,B) ⋉ S(B,C)

• The rewritten query is:

Q = R1(A,B) ⨝ S(B,C)
31
Why Would We Do This ?
• Large attributes:
Q = R(A, B, D, E, F,…) ⨝ S(B, C, M, K, L, …)

• Expensive side computations

Q = γA,B,count(*)R(A,B,D) ⨝ σC=value(S(B,C))

R1(A,B,D) = R(A,B,D) ⋉ σC=value(S(B,C))

Q = γA,B,count(*)R1(A,B,D) ⨝ σC=value(S(B,C))

32
Semijoin Reducer
• Example:
Q = R(A,B) ⨝ S(B,C)

• A semijoin reducer is:

R1(A,B) = R(A,B) ⋉ S(B,C)

• The rewritten query is:

Q = R1(A,B) ⨝ S(B,C)

Are there dangling tuples ?

33
Semijoin Reducer
• Example:
Q = R(A,B) ⨝ S(B,C)

• A full semijoin reducer is:

R1(A,B) = R(A,B) ⋉ S(B,C)
S1(B,C) = S(B,C) ⋉ R1(A,B)

• The rewritten query is:

Q :- R1(A,B) ⨝ S1 (B,C)

No more dangling tuples 34

Semijoin Reducer
• More complex example:
Q = R(A,B) ⨝ S(B,C) ⨝ T(C,D,E)

• A full reducer is:

S‟(B,C) := S(B,C) ⋉ R(A,B)
T‟(C,D,E) := T(C,D,E) ⋉ S(B,C)
S‟‟(B,C) := S‟(B,C) ⋉ T‟(C,D,E)
R‟(A,B) := R (A,B) ⋉ S‟‟(B,C)

Q = R‟(A,B) ⨝ S‟‟(B,C) ⨝ T‟(C,D,E)

35
Semijoin Reducer
• Example:

Q = R(A,B) ⨝ S(B,C) ⨝ T(A,C)

• Doesn‟t have a full reducer (we can reduce forever)

Theorem a query has a full reducer iff it is “acyclic”

[Database Theory, by Abiteboul, Hull, Vianu]

Dan Suciu -- CSEP544 Fall 2010 36

Example with Semijoins
Emp(eid, ename, sal, did) [Chaudhuri‟98]
Dept(did, dname, budget)
DeptAvgSal(did, avgsal) /* view */

View: CREATE VIEW DepAvgSal As (

SELECT E.did, Avg(E.Sal) AS avgsal
FROM Emp E
GROUP BY E.did)

Query: SELECT E.eid, E.sal

FROM Emp E, Dept D, DepAvgSal V
WHERE E.did = D.did AND E.did = V.did
AND E.age < 30 AND D.budget > 100k
AND E.sal > V.avgsal
37
Goal: compute only the necessary part of the view
Example with Semijoins
Emp(eid, ename, sal, did) [Chaudhuri‟98]
Dept(did, dname, budget)
DeptAvgSal(did, avgsal) /* view */
CREATE VIEW LimitedAvgSal As (
New view SELECT E.did, Avg(E.Sal) AS avgsal
uses a reducer: FROM Emp E, Dept D
WHERE E.did = D.did AND D.buget > 100k
GROUP BY E.did)

SELECT E.eid, E.sal

New query: FROM Emp E, Dept D, LimitedAvgSal V
WHERE E.did = D.did AND E.did = V.did
AND E.age < 30 AND D.budget > 100k
AND E.sal > V.avgsal 38
Example with Semijoins
Emp(eid, ename, sal, did)
Dept(did, dname, budget) [Chaudhuri‟98]
DeptAvgSal(did, avgsal) /* view */
CREATE VIEW PartialResult AS
(SELECT E.eid, E.sal, E.did
FROM Emp E, Dept D
Full reducer: WHERE E.did=D.did AND E.age < 30
AND D.budget > 100k)

CREATE VIEW Filter AS

(SELECT DISTINCT P.did FROM PartialResult P)

CREATE VIEW LimitedAvgSal AS

(SELECT E.did, Avg(E.Sal) AS avgsal
FROM Emp E, Filter F
39
WHERE E.did = F.did GROUP BY E.did)
Example with Semijoins

New query:

SELECT P.eid, P.sal

FROM PartialResult P, LimitedDepAvgSal V
WHERE P.did = V.did AND P.sal > V.avgsal

Dan Suciu -- CSEP544 Fall 2010 40

Pruning the Search Space
• Prune entire sets of plans that are
unpromising

• The choice of partial plans influences how

effective we can prune

Dan Suciu -- CSEP544 Fall 2010 41

Complete Plans
R(A,B) SELECT *
S(B,C) FROM R, S, T
T(C,D) WHERE R.B=S.B and S.C=T.C and R.A<40

⨝
⨝
⨝ Pruning is
T difficult
σA<40 ⨝ here.
σA<40 S
R S T
R
Dan Suciu -- CSEP544 Fall 2010 42
Bottom-up Partial Plans
R(A,B) SELECT *
S(B,C) FROM R, S, T
T(C,D) WHERE R.B=S.B and S.C=T.C and R.A<40

Pruning can be done ⨝

more efficiently
⨝ ⨝ T

σA<40 ⨝ σA<40 S ⨝ σA<40 S

…..
R S T R R S R 43
Top-down Partial Plans
R(A,B) SELECT *
S(B,C) FROM R, S, T
T(C,D) WHERE R.B=S.B and S.C=T.C and R.A<40

⨝ ⨝ σA<40

⨝
T T SELECT R.A, T.D
SELECT * FROM R, S, T
FROM R, S WHERE R.B=S.B
WHERE R.B=S.B
SELECT * S and S.C=T.C …..
and R.A < 40
FROM R
WHERE R.A < 40 44
Query Optimization
Three major components:

1. Search space

2. Algorithm for enumerating query plans

3. Cardinality and cost estimation

Dan Suciu -- CSEP544 Fall 2010 45
2. Plan Enumeration Algorithms
• System R (in class)
– Join reordering – dynamic programming
– Access path selection
– Bottom-up; simple; limited
• Modern database optimizers (will not discuss)
– Rule-based: database of rules (x 100s)
– Dynamic programming
– Top-down; complex; extensible
Dan Suciu -- CSEP544 Fall 2010 46
Join Reordering
System R [1979]
• Push all selections down (=early) in the query plan
• Pull all projections up (=late) in the query plan
• What remains are joins:

SELECT list
FROM R1, …, Rn
WHERE cond1 AND cond2 AND . . . AND condk

Dan Suciu -- CSEP544 Fall 2010 47

SELECT list
FROM R1, …, Rn
WHERE cond1 AND cond2 AND . . . AND condk

Join Reordering
Dynamic programming

• For each subquery Q {R1, …, Rn},

compute the optimal join order for Q

• Store results in a table: 2n-1 entries

– Often much fewer entries
Dan Suciu -- CSEP544 Fall 2010 48
SELECT list
FROM R1, …, Rn
WHERE cond1 AND cond2 AND . . . AND condk

Join Reordering
Step 1: For each {Ri} do:
• Initialize the table entry for {Ri} with the cheapest
access path for Ri
Step 2: For each subset Q  {R1, …, Rn} do:
• For every partition Q = Q‟  Q‟‟
• Lookup optimal plan for Q‟ and for Q‟‟ in the table
• Compute the cost of the plan Q‟ ⨝ Q‟‟
• Store the cheapest plan Q‟ ⨝ Q‟‟ in table entry for Q
Dan Suciu -- CSEP544 Fall 2010 49
Reducing the Search Space
Restriction 1: only left linear trees (no bushy)

Restriction 2: no trees with cartesian product

R(A,B) ⨝ S(B,C) ⨝ T(C,D)

Plan: (R(A,B)⨝T(C,D)) ⨝ S(B,C)
has a cartesian product.
Most query optimizers will not consider it
Dan Suciu -- CSEP544 Fall 2010 50
Access Path Selection
• Access path: a way to retrieve tuples from a table
– A file scan
– An index plus a matching selection condition

• Index matches selection condition if it can be used to

retrieve just tuples that satisfy the condition
– Example: Supplier(sid,sname,scity,sstate)
– B+-tree index on (scity,sstate)
• matches scity=„Seattle‟
• does not match sid=3, does not match sstate=„WA‟

Dan Suciu -- CSEP544 Fall 2010 51

Access Path Selection
• Supplier(sid,sname,scity,sstate)

• Selection condition: sid > 300 scity=„Seattle‟

• Indexes: B+-tree on sid and B+-tree on scity

Dan Suciu -- CSEP544 Fall 2010 52

Access Path Selection
• Supplier(sid,sname,scity,sstate)

• Selection condition: sid > 300 scity=„Seattle‟

• Indexes: B+-tree on sid and B+-tree on scity

• Which access path should we use?

Dan Suciu -- CSEP544 Fall 2010 53

Access Path Selection
• Supplier(sid,sname,scity,sstate)

• Selection condition: sid > 300 scity=„Seattle‟

• Indexes: B+-tree on sid and B+-tree on scity

• Which access path should we use?

• We should pick the most selective access path

Dan Suciu -- CSEP544 Fall 2010 54

Access Path Selectivity
• Access path selectivity is the number of pages
retrieved if we use this access path
– Most selective retrieves fewest pages

• As we saw earlier, for equality predicates

– Selection on equality: a=v(R)
– V(R, a) = # of distinct values of attribute a
– 1/V(R,a) is thus the reduction factor
– Clustered index on a: cost B(R)/V(R,a)
– Unclustered index on a: cost T(R)/V(R,a)
– (we are ignoring I/O cost of index pages for simplicity)

Dan Suciu -- CSEP544 Fall 2010 55

Other Decisions for the
Optimization Algorithm
• How much memory to allocate to each
operator

• Pipeline or materialize (next)

Dan Suciu -- CSEP544 Fall 2010 56

Materialize Intermediate
Results Between Operators
HashTable  S

V2 ⋈ repeat read(R, x)
y  join(HashTable, x)
write(V1, y)

HashTable  T
⋈ U repeat read(V1, y)
z  join(HashTable, y)
V1 write(V2, z)

HashTable  U
⋈ T repeat read(V2, z)
u  join(HashTable, z)
write(Answer, u)

R S Dan Suciu -- CSEP544 Fall 2010 57

Materialize Intermediate
Results Between Operators
Question in class

Given B(R), B(S), B(T), B(U)

• What is the total cost of the plan ?

– Cost =
• How much main memory do we need ?
– M=

Dan Suciu -- CSEP544 Fall 2010 58

Pipeline Between Operators

⋈
HashTable1  S
HashTable2  T
HashTable3  U
⋈ U repeat read(R, x)
y  join(HashTable1, x)
z  join(HashTable2, y)
u  join(HashTable3, z)
write(Answer, u)
⋈ T

R S Dan Suciu -- CSEP544 Fall 2010 59

Pipeline Between Operators
Question in class

Given B(R), B(S), B(T), B(U)

• What is the total cost of the plan ?

– Cost =
• How much main memory do we need ?
– M=

Dan Suciu -- CSEP544 Fall 2010 60

Pipeline in Bushy Trees
⋈

⋈
⋈
⋈
⋈ ⋈
⋈ V
Z
R S T I X Y
Dan Suciu -- CSEP544 Fall 2010 61
Example (will skip in class)
• Logical plan is:

k blocks
U(y,z)
10,000 blocks
R(w,x) S(x,y)
5,000 blocks 10,000 blocks

• Main memory M = 101 buffers

Dan Suciu -- CSEP544 Fall 2010 62

Example (will skip in class)
M = 101

k blocks
U(y,z)
10,000 blocks
R(w,x) S(x,y)
5,000 blocks 10,000 blocks

Naïve evaluation:
• 2 partitioned hash-joins
• Cost = 3B(R) + 3B(S) + 4k + 3B(U) = 75000 + 4k

63
Example (will skip in class)
M = 101

k blocks
U(y,z)
10,000 blocks
R(w,x) S(x,y)
5,000 blocks 10,000 blocks
Smarter:
• Step 1: hash R on x into 100 buckets, each of 50 blocks; to disk
• Step 2: hash S on x into 100 buckets; to disk
• Step 3: read each Ri in memory (50 buffer) join with Si (1 buffer); hash
result on y into 50 buckets (50 buffers) -- here we pipeline
• Cost so far: 3B(R) + 3B(S) 64
Example (will skip in class)
M = 101

k blocks
U(y,z)
10,000 blocks
R(w,x) S(x,y)
5,000 blocks 10,000 blocks
Continuing:
• How large are the 50 buckets on y ? Answer: k/50.
• If k <= 50 then keep all 50 buckets in Step 3 in memory, then:
• Step 4: read U from disk, hash on y and join with memory
• Total cost: 3B(R) + 3B(S) + B(U) = 55,000 65
Example (will skip in class)
M = 101

k blocks
U(y,z)
10,000 blocks
R(w,x) S(x,y)
5,000 blocks 10,000 blocks
Continuing:
• If 50 < k <= 5000 then send the 50 buckets in Step 3 to disk
– Each bucket has size k/50 <= 100
• Step 4: partition U into 50 buckets
• Step 5: read each partition and join in memory
• Total cost: 3B(R) + 3B(S) + 2k + 3B(U) = 75,000 + 2k 66
Example (will skip in class)
M = 101

k blocks
U(y,z)
10,000 blocks
R(w,x) S(x,y)
5,000 blocks 10,000 blocks
Continuing:
• If k > 5000 then materialize instead of pipeline
• 2 partitioned hash-joins
• Cost 3B(R) + 3B(S) + 4k + 3B(U) = 75000 + 4k
67
Query Optimization
Three major components:

1. Search space

2. Algorithm for enumerating query plans

3. Cardinality and cost estimation

Dan Suciu -- CSEP544 Fall 2010 68
3. Cardinality and Cost
Estimation
• Collect statistical summaries of stored data

• Estimate size (=cardinality) in a bottom-up

fashion
– This is the most difficult part, and still inadequate
in today‟s query optimizers
• Estimate cost by using the estimated size
– Hand-written formulas, similar to those we used
for computing the cost of each physical operator
Dan Suciu -- CSEP544 Fall 2010 69
Statistics on Base Data
• Collected information for each relation
– Number of tuples (cardinality)
– Indexes, number of keys in the index
– Number of physical pages, clustering info
– Statistical information on attributes
• Min value, max value, number distinct values
• Histograms
– Correlations between columns (hard)
• Collection approach: periodic, using sampling
Dan Suciu -- CSEP544 Fall 2010 70
Size Estimation Problem

S = SELECT list
FROM R1, …, Rn
WHERE cond1 AND cond2 AND . . . AND condk

Given T(R1), T(R2), …, T(Rn)

Estimate T(S)

How can we do this ? Note: doesn‟t have to be exact.

Dan Suciu -- CSEP544 Fall 2010 71
Size Estimation Problem

S = SELECT list
FROM R1, …, Rn
WHERE cond1 AND cond2 AND . . . AND condk

Remark: T(S) ≤ T(R1) × T(R2) × … × T(Rn)

Dan Suciu -- CSEP544 Fall 2010 72

Selectivity Factor
• Each condition cond reduces the size
by some factor called selectivity factor

• Assuming independence, multiply the

selectivity factors

Dan Suciu -- CSEP544 Fall 2010 73

Example
R(A,B) SELECT *
S(B,C) FROM R, S, T
T(C,D) WHERE R.B=S.B and S.C=T.C and R.A<40

T(R) = 30k, T(S) = 200k, T(T) = 10k

Selectivity of R.B = S.B is 1/3

Selectivity of S.C = T.C is 1/10
Selectivity of R.A < 40 is ½

What is the estimated size of the query output ?

Dan Suciu -- CSEP544 Fall 2010 74
Rule of Thumb
• If selectivities are unknown, then:
selectivity factor = 1/10
[System R, 1979]

Dan Suciu -- CSEP544 Fall 2010 75

Using Data Statistics
• Condition is A = c /* value selection on R */
– Selectivity = 1/V(R,A)

• Condition is A < c /* range selection on R */

– Selectivity = (c - Low(R, A))/(High(R,A) - Low(R,A))T(R)

• Condition is A = B /* R ⨝A=B S */
– Selectivity = 1 / max(V(R,A),V(S,A))
– (will explain next)

Dan Suciu -- CSEP544 Fall 2010 76

Assumptions
• Containment of values: if V(R,A) <= V(S,B), then
the set of A values of R is included in the set of
B values of S
– Note: this indeed holds when A is a foreign key in R,
and B is a key in S

• Preservation of values: for any other attribute C,

V(R ⨝A=B S, C) = V(R, C) (or V(S, C))

Dan Suciu -- CSEP544 Fall 2010 77

Selectivity of R ⨝A=B S
Assume V(R,A) <= V(S,B)

• Each tuple t in R joins with T(S)/V(S,B) tuple(s) in S

• Hence T(R ⨝A=B S) = T(R) T(S) / V(S,B)

In general: T(R ⨝A=B S) = T(R) T(S) / max(V(R,A),V(S,B))

Dan Suciu -- CSEP544 Fall 2010 78

Size Estimation for Join
Example:
• T(R) = 10000, T(S) = 20000
• V(R,A) = 100, V(S,B) = 200
• How large is R ⨝A=B S ?

Dan Suciu -- CSEP544 Fall 2010 79

Histograms
• Statistics on data maintained by the
RDBMS
• Makes size estimation much more
accurate (hence, cost estimations are
more accurate)

Dan Suciu -- CSEP544 Fall 2010 80

Histograms
Employee(ssn, name, age)
T(Employee) = 25000, V(Empolyee, age) = 50
min(age) = 19, max(age) = 68

σage=48(Empolyee) = ? σage>28 and age<35(Empolyee) = ?

Dan Suciu -- CSEP544 Fall 2010 81

Histograms
Employee(ssn, name, age)
T(Employee) = 25000, V(Empolyee, age) = 50
min(age) = 19, max(age) = 68

σage=48(Empolyee) = ? σage>28 and age<35(Empolyee) = ?

Estimate = 25000 / 50 = 500 Estimate = 25000 * 6 / 60 = 2500

Dan Suciu -- CSEP544 Fall 2010 82
Histograms
Employee(ssn, name, age)
T(Employee) = 25000, V(Empolyee, age) = 50
min(age) = 19, max(age) = 68

σage=48(Empolyee) = ? σage>28 and age<35(Empolyee) = ?

Age: 0..20 20..29 30-39 40-49 50-59 > 60

Tuples 200 800 5000 12000 6500 500

Dan Suciu -- CSEP544 Fall 2010 83

Histograms
Employee(ssn, name, age)
T(Employee) = 25000, V(Empolyee, age) = 50
min(age) = 19, max(age) = 68

σage=48(Empolyee) = ? σage>28 and age<35(Empolyee) = ?

Age: 0..20 20..29 30-39 40-49 50-59 > 60

Tuples 200 800 5000 12000 6500 500

Estimate = 1200 Estimate = 280 + 5500 = 266084

Types of Histograms
• How should we determine the bucket
boundaries in a histogram ?

Dan Suciu -- CSEP544 Fall 2010 85

Types of Histograms
• How should we determine the bucket
boundaries in a histogram ?

• Eq-Width
• Eq-Depth
• Compressed
• V-Optimal histograms
Dan Suciu -- CSEP544 Fall 2010 86
Employee(ssn, name, age)
Histograms
Eq-width:

Age: 0..20 20..29 30-39 40-49 50-59 > 60

Tuples 200 800 5000 12000 6500 500

Eq-depth:

Age: 0..20 20..29 30-39 40-49 50-59 > 60

Tuples 1800 2000 2100 2200 1900 1800

Compressed: store separately highly frequent values: (48,1900)

Dan Suciu -- CSEP544 Fall 2010 87
V-Optimal Histograms
• Defines bucket boundaries in an optimal
way, to minimize the error over all point
queries
• Computed rather expensively, using
dynamic programming
• Modern databases systems use V-
optimal histograms or some variations

Dan Suciu -- CSEP544 Fall 2010 88

Difficult Questions on Histograms
• Small number of buckets
– Hundreds, or thousands, but not more
– WHY ?
• Not updated during database update,
but recomputed periodically
– WHY ?
• Multidimensional histograms rarely used
– WHY ?
Dan Suciu -- CSEP544 Fall 2010 89
Summary of Query
Optimization
• Three parts:
– search space, algorithms, size/cost estimation
• Ideal goal: find optimal plan. But
– Impossible to estimate accurately
– Impossible to search the entire space
• Goal of today‟s optimizers:
– Avoid very bad plans

Dan Suciu -- CSEP544 Fall 2010 90

CSE 544: Optimizations: Wednesday, 5/10/2006
No ratings yet
CSE 544: Optimizations: Wednesday, 5/10/2006
51 pages
Query Execution
No ratings yet
Query Execution
87 pages
Vu Lec 35
No ratings yet
Vu Lec 35
42 pages
Session - 10 Querying
No ratings yet
Session - 10 Querying
36 pages
Lecture11 Query Processing
No ratings yet
Lecture11 Query Processing
37 pages
4-Query Processing (Autosaved)
No ratings yet
4-Query Processing (Autosaved)
74 pages
UEU Basis Data Pertemuan 14
No ratings yet
UEU Basis Data Pertemuan 14
32 pages
CSE 544 Principles of Database Management Systems: Lecture 8 - Query Optimization
No ratings yet
CSE 544 Principles of Database Management Systems: Lecture 8 - Query Optimization
55 pages
Lec 14
No ratings yet
Lec 14
26 pages
Query Optimization
No ratings yet
Query Optimization
103 pages
13 QP1
No ratings yet
13 QP1
33 pages
Query Processing
No ratings yet
Query Processing
39 pages
CSE 444 Practice Problems
No ratings yet
CSE 444 Practice Problems
8 pages
QEII
No ratings yet
QEII
44 pages
Queryoptimization Examples
No ratings yet
Queryoptimization Examples
26 pages
CSE 453 Slide 3
No ratings yet
CSE 453 Slide 3
72 pages
CAS CS 460/660 Introduction To Database Systems Query Optimization
No ratings yet
CAS CS 460/660 Introduction To Database Systems Query Optimization
20 pages
ADE Unit-1: Entity Relational Model
No ratings yet
ADE Unit-1: Entity Relational Model
19 pages
05 Optimization
No ratings yet
05 Optimization
58 pages
22426913
No ratings yet
22426913
124 pages
DBMS Unit 2
No ratings yet
DBMS Unit 2
11 pages
Query Processing for CS Students
No ratings yet
Query Processing for CS Students
47 pages
Unit IV Part II
No ratings yet
Unit IV Part II
37 pages
Relational Query Optimization Guide
No ratings yet
Relational Query Optimization Guide
71 pages
CSE 444 Practice Problems
No ratings yet
CSE 444 Practice Problems
13 pages
Lecture 4
No ratings yet
Lecture 4
59 pages
4-Query - Processing (1) - PTIT
No ratings yet
4-Query - Processing (1) - PTIT
72 pages
Query Optimization
No ratings yet
Query Optimization
20 pages
DDBS Unit 2
No ratings yet
DDBS Unit 2
7 pages
Slides 12
No ratings yet
Slides 12
24 pages
DBMS Unit5 Lecture1
No ratings yet
DBMS Unit5 Lecture1
22 pages
Query Optimization in Database Systems
No ratings yet
Query Optimization in Database Systems
7 pages
Advanced Database Systems Lecture Notes
No ratings yet
Advanced Database Systems Lecture Notes
79 pages
Query Processing and Optimisation - Intr
No ratings yet
Query Processing and Optimisation - Intr
41 pages
Practice Optimizer Blank
No ratings yet
Practice Optimizer Blank
7 pages
DBMS Query Optimization Guide
No ratings yet
DBMS Query Optimization Guide
24 pages
Chapter15 1
No ratings yet
Chapter15 1
43 pages
Query Planning & Optimization: Intro To Database Systems Andy Pavlo
No ratings yet
Query Planning & Optimization: Intro To Database Systems Andy Pavlo
30 pages
QueryOptimization Siao
No ratings yet
QueryOptimization Siao
24 pages
CH 11
No ratings yet
CH 11
19 pages
Lec 17
No ratings yet
Lec 17
24 pages
Ch12-Query Processing
No ratings yet
Ch12-Query Processing
34 pages
Lecture Notes
No ratings yet
Lecture Notes
96 pages
Relational Query Optimization Guide
No ratings yet
Relational Query Optimization Guide
7 pages
Overview of Query Evaluation: R&G Chapter 12
No ratings yet
Overview of Query Evaluation: R&G Chapter 12
30 pages
Relational Algebra Optimization
No ratings yet
Relational Algebra Optimization
24 pages
Query Processing
No ratings yet
Query Processing
77 pages
Chapter 8
No ratings yet
Chapter 8
65 pages
12 Query Plan Space
No ratings yet
12 Query Plan Space
72 pages
Plan Cost
No ratings yet
Plan Cost
37 pages
QueryProcess Optim
No ratings yet
QueryProcess Optim
60 pages
CSE 444: Database Internals: Section 4: Query Optimizer
No ratings yet
CSE 444: Database Internals: Section 4: Query Optimizer
16 pages
Database Query Optimization Guide
No ratings yet
Database Query Optimization Guide
38 pages
Relational Query Optimization: CS186 R & G Chapters 12/15
No ratings yet
Relational Query Optimization: CS186 R & G Chapters 12/15
51 pages
Distributed Databases: CS347 May 30, 2001
No ratings yet
Distributed Databases: CS347 May 30, 2001
48 pages
Query Optimization in Relational Database Systems
No ratings yet
Query Optimization in Relational Database Systems
77 pages
PCS SmartVoucher Instructions 22110233 44325
No ratings yet
PCS SmartVoucher Instructions 22110233 44325
46 pages
Media Dfas in Manual 37 100 24
No ratings yet
Media Dfas in Manual 37 100 24
165 pages
2023 AFR Working Document Version 45 Low Res
No ratings yet
2023 AFR Working Document Version 45 Low Res
121 pages
Query Opt5235234534t34vt4wtwtw45t4w
No ratings yet
Query Opt5235234534t34vt4wtwtw45t4w
24 pages
CL 152342624563536 GBSDBTSRBSRTVSRVSR
No ratings yet
CL 152342624563536 GBSDBTSRBSRTVSRVSR
100 pages
INDUSTRIALCONNECT2025 Brochure35235325236674847552887
No ratings yet
INDUSTRIALCONNECT2025 Brochure35235325236674847552887
8 pages
DIL SCA Overview
No ratings yet
DIL SCA Overview
17 pages
07a 44dsafgvaebnmui, Fmntyjsukzx
No ratings yet
07a 44dsafgvaebnmui, Fmntyjsukzx
28 pages
19-Qo-Notes42354252 53252
No ratings yet
19-Qo-Notes42354252 53252
8 pages
Liberia CAC35254432245ww5gwf
No ratings yet
Liberia CAC35254432245ww5gwf
1 page
G01002832 Social Community Home Care
No ratings yet
G01002832 Social Community Home Care
25 pages
Teamapev150624 5234524wtw3twtw4t4ww45
No ratings yet
Teamapev150624 5234524wtw3twtw4t4ww45
2 pages
Disabled Facilities Grants Policy
No ratings yet
Disabled Facilities Grants Policy
4 pages
A Guide To Disabled Facilities Grants: Information Library Helpline
No ratings yet
A Guide To Disabled Facilities Grants: Information Library Helpline
3 pages
ZuJHzNgy Dsafafer Erafsdf 129238434253465
No ratings yet
ZuJHzNgy Dsafafer Erafsdf 129238434253465
3 pages
For Solicitation Information Call:: Offeror To Complete Blocks 12, 17, 23, 24, & 30
No ratings yet
For Solicitation Information Call:: Offeror To Complete Blocks 12, 17, 23, 24, & 30
12 pages
Direct Expansion Air Ha Ndling Unit
No ratings yet
Direct Expansion Air Ha Ndling Unit
14 pages
PG HR Prac CH 10-Awards Dec15
No ratings yet
PG HR Prac CH 10-Awards Dec15
16 pages
McCrady Tri-Foldv4 - 2019
No ratings yet
McCrady Tri-Foldv4 - 2019
2 pages
Modular Air Handling Unit 1
No ratings yet
Modular Air Handling Unit 1
16 pages
Annex 1 - Cabinet Paper For 25 July 2019
No ratings yet
Annex 1 - Cabinet Paper For 25 July 2019
9 pages
FOI Request: Disabled Facilities Grants
No ratings yet
FOI Request: Disabled Facilities Grants
3 pages
Disabled Facilities Grants (DFGS) : Lifetime Homes, Lifetime Neighbourhoods
No ratings yet
Disabled Facilities Grants (DFGS) : Lifetime Homes, Lifetime Neighbourhoods
16 pages
Document 23425234532
No ratings yet
Document 23425234532
1 page
32 Hong
No ratings yet
32 Hong
6 pages
Shared Ownership and DFGs
No ratings yet
Shared Ownership and DFGs
3 pages
$item 09a
No ratings yet
$item 09a
7 pages
13 الافعال الشاذة
No ratings yet
13 الافعال الشاذة
3 pages
Roman to Arabic Numerals Converter
No ratings yet
Roman to Arabic Numerals Converter
1 page
Lal Bahadur Shastri Training Institute 2
No ratings yet
Lal Bahadur Shastri Training Institute 2
10 pages
Making Primavera P6 Look-Ahead Schedule - Ultimate Guide - Planrama
No ratings yet
Making Primavera P6 Look-Ahead Schedule - Ultimate Guide - Planrama
17 pages
CH 01
No ratings yet
CH 01
268 pages
Volkswagen CI Manual
No ratings yet
Volkswagen CI Manual
21 pages
Maqasid Syariah Menurut Imam Al-Ghazali Dan Aplikasinya Dalam Penyusunan Undang-Undang Islam Di Indonesia
No ratings yet
Maqasid Syariah Menurut Imam Al-Ghazali Dan Aplikasinya Dalam Penyusunan Undang-Undang Islam Di Indonesia
12 pages
CTEVT Entrance Exam PAST Question Papers With Solutions 2071 Scholarship
No ratings yet
CTEVT Entrance Exam PAST Question Papers With Solutions 2071 Scholarship
17 pages
Why I Am A Christian - Revised - Corrected
No ratings yet
Why I Am A Christian - Revised - Corrected
12 pages
Linear and Nonlinear Multivariable Feedback Control A Classical Approach 1st Edition Oleg Gasparyan Download
100% (4)
Linear and Nonlinear Multivariable Feedback Control A Classical Approach 1st Edition Oleg Gasparyan Download
81 pages
Updated Approved List of Private Schools in Ekiti-2
No ratings yet
Updated Approved List of Private Schools in Ekiti-2
30 pages
Garbage Collection
No ratings yet
Garbage Collection
44 pages
The Systems Life Cycle - Final
No ratings yet
The Systems Life Cycle - Final
27 pages
CMS Supervisor ODBC and JDBC Data Connection Documentation
No ratings yet
CMS Supervisor ODBC and JDBC Data Connection Documentation
46 pages
G8 - Practice 1ST Term Test
No ratings yet
G8 - Practice 1ST Term Test
5 pages
Software Engineer or Staff Programmer or Support Systems Special
No ratings yet
Software Engineer or Staff Programmer or Support Systems Special
4 pages
CMP 204 Data Structure
No ratings yet
CMP 204 Data Structure
19 pages
PD Day-8
No ratings yet
PD Day-8
15 pages
Module 8 Active-Passive Voice
No ratings yet
Module 8 Active-Passive Voice
3 pages
360WebPlatform v2 InstallationGuide en-US
No ratings yet
360WebPlatform v2 InstallationGuide en-US
28 pages
乌班图k8s1.27.4+Cri Docker
No ratings yet
乌班图k8s1.27.4+Cri Docker
11 pages
Form 5 English Syllabus 2020
No ratings yet
Form 5 English Syllabus 2020
8 pages
Breast and Axillae Examination Guide
No ratings yet
Breast and Axillae Examination Guide
30 pages
NE9000 V800R022C00 Configuration Guide 01 Basic Configuration
No ratings yet
NE9000 V800R022C00 Configuration Guide 01 Basic Configuration
408 pages
سنوات دوائر - أساسيات تكنولوجيا
No ratings yet
سنوات دوائر - أساسيات تكنولوجيا
20 pages
G09 Creative Arts Visual Arts Term 1 SBA Module 8.5 Question Paper Paper 2
No ratings yet
G09 Creative Arts Visual Arts Term 1 SBA Module 8.5 Question Paper Paper 2
7 pages
AICT (Outline)
No ratings yet
AICT (Outline)
5 pages
JVM Performance Engineering 2024 4
100% (1)
JVM Performance Engineering 2024 4
397 pages
Parametric Sensitivity and Runaway in Tubular Reactors
No ratings yet
Parametric Sensitivity and Runaway in Tubular Reactors
9 pages
409 Prototyping
No ratings yet
409 Prototyping
8 pages