Cost-Based Query Optimization Guide

The document discusses cost-based query optimization and focuses on plan cost estimation. It explains that the database management system stores statistics about tables, attributes, and indexes to estimate the cost of executing a query plan. These statistics include the number of tuples and distinct values for attributes. It also discusses how to estimate the selectivity of different predicate types like equality, range, and complex predicates. The document notes limitations in assuming uniform data distributions and independent predicates. It introduces techniques like histograms and sampling to improve selectivity estimates.

Uploaded by

smumin011


Cost-Based Query Optimization

Lecture 20: Cost-Based Query Optimization

Recap

Query Optimization

• Approach 1: Heuristics / Rules

▶ Rewrite the query to remove stupid / inefficient things.
▶ These techniques may need to examine the catalog, but they do not need to examine the data.
• Approach 2: Cost-based Search
▶ Use a model to estimate the cost of executing a plan.
▶ Evaluate multiple equivalent plans for a query and pick the one with the lowest cost.

Today’s Agenda

• Plan Cost Estimation

• Plan Enumeration

Plan Cost Estimation

Cost Estimation

• How long will a query take?

▶ CPU: Small cost; tough to estimate
▶ Disk: Number of block transfers
▶ Memory: Amount of DRAM used
▶ Network: Number of messages
• How many tuples will be read/written?
• It is too expensive to run every possible plan to determine this information, so the
DBMS needs a way to derive it...

Statistics

• The DBMS stores internal statistics about tables, attributes, and indexes in its internal
catalog.
• Different systems update them at different times.
• Manual invocations:
▶ Postgres/SQLite: ANALYZE
▶ Oracle/MySQL: ANALYZE TABLE
▶ SQL Server: UPDATE STATISTICS
▶ DB2: RUNSTATS

Statistics

• For each relation R, the DBMS maintains the following information:

▶ N_R: Number of tuples in R.
▶ V(A, R): Number of distinct values for attribute A.

Derivable Statistics

• The selection cardinality SC(A, R) is the average number of records that share a given
value of attribute A: SC(A, R) = N_R / V(A, R)
• What could go wrong with this estimate?

• Note that this assumes data uniformity.
▶ 10,000 students, 10 colleges: how many students are in SCS? The formula answers 10,000 / 10 = 1,000, whatever the college's real enrollment is.
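The uniformity problem can be sketched in a few lines of Python. The college names and enrollment counts below are made-up illustration data, not figures from the lecture; only the 10,000 students and 10 colleges come from the slide.

```python
from collections import Counter

def selection_cardinality(n_tuples: int, n_distinct: int) -> float:
    """SC(A, R) = N_R / V(A, R): expected rows per value under uniformity."""
    return n_tuples / n_distinct

# Hypothetical skewed enrollment: 10,000 students across 10 colleges.
enrollment = Counter({"SCS": 400, "Engineering": 3000, "Humanities": 1600})
# Spread the remaining 5,000 students over 7 more made-up colleges.
for i in range(7):
    enrollment[f"College{i}"] = 5000 // 7 + (1 if i < 5000 % 7 else 0)

n_r = sum(enrollment.values())   # N_R
v = len(enrollment)              # V(college, students)
estimate = selection_cardinality(n_r, v)
actual = enrollment["SCS"]
print(f"estimated={estimate:.0f} actual={actual}")
```

With this invented data the estimate is 1,000 students while the table holds 400, so the optimizer would overestimate the result size by 2.5x.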

Selection Statistics

• Equality predicates on unique keys are easy to estimate.

• What about more complex predicates? What is their selectivity?
CREATE TABLE people (
    id INT PRIMARY KEY,
    val INT NOT NULL,
    age INT NOT NULL,
    status VARCHAR(16)
);
SELECT * FROM people WHERE id = 123;                      -- Easier
SELECT * FROM people WHERE val > 1000;                    -- Harder: range predicate
SELECT * FROM people WHERE age = 30 AND status = 'Lit';   -- Harder: complex predicate

Complex Predicates

• The selectivity (sel) of a predicate P is the fraction of tuples that qualify.

• Formula depends on type of predicate:
▶ Equality
▶ Range
▶ Negation
▶ Conjunction
▶ Disjunction

Selection – Complex Predicates

• Assume that age has five distinct values (0–4), so V(age, people) = 5, and that N_R = 5.
• Equality Predicate: A=constant
▶ sel(A=constant) = SC(A, R) / N_R = 1 / V(A, R)
▶ Example: sel(age=2) = 1/5
SELECT * FROM people WHERE age = 2

Selection – Complex Predicates

• Range Predicate:
▶ sel(A>=a) = (A_max – a) / (A_max – A_min)
▶ Example: sel(age>=2) ≈ (4 – 2) / (4 – 0) = 1/2
SELECT * FROM people WHERE age >= 2
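As a rough illustration, the equality and range formulas can be written as two small Python functions and checked against the five-tuple example. The function names are hypothetical, and the clamping to [0, 1] is an added safeguard rather than part of the slide's formula.

```python
def sel_equality(n_distinct: int) -> float:
    """sel(A = c) = SC(A, R) / N_R = 1 / V(A, R)."""
    return 1.0 / n_distinct

def sel_range_ge(a: float, a_min: float, a_max: float) -> float:
    """sel(A >= a) = (A_max - a) / (A_max - A_min), clamped to [0, 1]."""
    if a_max == a_min:
        # Single-valued column: every row qualifies or none does.
        return 1.0 if a <= a_min else 0.0
    return min(1.0, max(0.0, (a_max - a) / (a_max - a_min)))

ages = [0, 1, 2, 3, 4]   # running example: N_R = 5, V(age, people) = 5
est_eq = sel_equality(len(set(ages)))
est_ge = sel_range_ge(2, min(ages), max(ages))
true_ge = sum(1 for x in ages if x >= 2) / len(ages)
print(est_eq, est_ge, true_ge)
```

On this tiny table the range formula estimates 0.5 while 3 of the 5 tuples (0.6) actually qualify, which is why the slide writes the result with ≈: the formula treats the domain as continuous.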

Selection – Complex Predicates

• Negation Query:
▶ sel(not P) = 1 – sel(P)
▶ Example: sel(age != 2) = 1 – (1/5) = 4/5
• Observation: Selectivity ≈ Probability
SELECT * FROM people WHERE age != 2

Selection – Complex Predicates

• Conjunction:
▶ sel(P1 ∧ P2) = sel(P1) × sel(P2)
▶ Example: sel(age=2 ∧ name LIKE 'A%') = sel(age=2) × sel(name LIKE 'A%')
• This assumes that the predicates are independent.
• Not always true in practice!
SELECT * FROM people WHERE age = 2 AND name LIKE 'A%'

Selection – Complex Predicates

• Disjunction:
▶ sel(P1 ∨ P2) = sel(P1) + sel(P2) – sel(P1∧P2) = sel(P1) + sel(P2) – sel(P1) × sel(P2)
▶ Example: sel(age=2 OR name LIKE 'A%')
• This again assumes that the predicates are independent.
SELECT * FROM people WHERE age = 2 OR name LIKE 'A%'
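Treating selectivities as probabilities, the negation, conjunction, and disjunction rules reduce to three one-line functions. This is only a sketch: the function names are hypothetical, and the value 1/26 for sel(name LIKE 'A%') is an assumption (first letters spread evenly over 26 letters) that does not come from the lecture.

```python
def sel_not(p: float) -> float:
    """sel(not P) = 1 - sel(P)."""
    return 1.0 - p

def sel_and(p1: float, p2: float) -> float:
    """sel(P1 and P2) = sel(P1) * sel(P2), assuming independent predicates."""
    return p1 * p2

def sel_or(p1: float, p2: float) -> float:
    """sel(P1 or P2) = sel(P1) + sel(P2) - sel(P1) * sel(P2), same assumption."""
    return p1 + p2 - sel_and(p1, p2)

sel_age = 1 / 5     # sel(age = 2) from the running example
sel_name = 1 / 26   # assumed value for sel(name LIKE 'A%')

print(f"NOT: {sel_not(sel_age):.3f}")
print(f"AND: {sel_and(sel_age, sel_name):.4f}")
print(f"OR:  {sel_or(sel_age, sel_name):.4f}")
```

If the two predicates are correlated, both the AND and the OR results drift away from the true fractions, because each relies on the product sel(P1) × sel(P2).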

Selection Cardinality

• Assumption 1: Uniform Data

▶ The distribution of values (except for the heavy hitters) is the same.
• Assumption 2: Independent Predicates
▶ The predicates on attributes are independent.
• Assumption 3: Inclusion Principle
▶ The domains of join keys overlap such that each key in the inner relation will also exist in
the outer table.

Correlated Attributes

• Consider a database of automobiles:

▶ Number of Makes = 10, Number of Models = 100
• And the following query: (make = "Honda" AND model = "Accord")
• With the independence and uniformity assumptions, the selectivity is:
▶ 1/10 × 1/100 = 0.001
• But since only Honda makes Accords, the real selectivity is 1/100 = 0.01
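The gap between the two numbers can be reproduced on a synthetic table. In the sketch below, the make and model names and the 50 cars per model are invented; the only figures taken from the slide are 10 makes and 100 models, with each model belonging to exactly one make.

```python
# Hypothetical catalog: 10 makes, each producing 10 models of its own (100 models).
pairs = [(f"make{m}", f"model{m}_{k}") for m in range(10) for k in range(10)]
rows = pairs * 50   # 50 cars per (make, model) pair, so values are uniform

n = len(rows)
v_make = len({make for make, _ in rows})      # V(make, cars)
v_model = len({model for _, model in rows})   # V(model, cars)

# Estimate under the uniformity and independence assumptions.
independent_estimate = (1 / v_make) * (1 / v_model)
# True fraction of rows matching one make and one of its models.
actual = sum(1 for row in rows if row == ("make3", "model3_0")) / n

print(f"estimate={independent_estimate:.3f} actual={actual:.3f}")
```

Even though every value is perfectly uniform here, the estimate is ten times too low, because the model already determines the make and the make predicate filters out nothing extra.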

Cost Estimation

• Our formulas are nice, but we assume that data values are uniformly distributed.

Histograms With Quantiles

• Vary the width of buckets so that the total number of occurrences for each bucket is
roughly the same.
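A minimal sketch of a quantile (equi-depth) histogram, assuming the column fits in memory and can be sorted. The function names and the sample data are hypothetical. Unlike a production implementation, this version may place equal values in two neighboring buckets.

```python
def equi_depth_buckets(values, n_buckets):
    """Split sorted values into buckets holding roughly equal counts.

    Returns a list of (low, high, count) tuples.
    """
    data = sorted(values)
    n = len(data)
    buckets = []
    for b in range(n_buckets):
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets
        if start < end:
            chunk = data[start:end]
            buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets

def estimate_ge(buckets, a):
    """Estimate sel(A >= a), assuming uniformity only inside each bucket."""
    total = sum(count for _, _, count in buckets)
    qualifying = 0.0
    for low, high, count in buckets:
        if a <= low:
            qualifying += count                               # whole bucket qualifies
        elif a <= high:
            qualifying += count * (high - a) / (high - low)   # partial bucket
    return qualifying / total

values = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13, 21, 34]   # made-up skewed column
buckets = equi_depth_buckets(values, 4)
hist_est = estimate_ge(buckets, 5)
true_sel = sum(1 for v in values if v >= 5) / len(values)
uniform_est = (max(values) - 5) / (max(values) - min(values))
print(buckets)
print(f"histogram={hist_est:.3f} true={true_sel:.3f} uniform={uniform_est:.3f}")
```

On this data the histogram estimate lands close to the true fraction, while the single-range uniform formula is far off, since most values cluster near the low end.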

Sampling

• Modern DBMSs also collect samples from tables to estimate selectivities.

• Update samples when the underlying table changes significantly.
• Example: 1 billion tuples

SELECT AVG(age) FROM people WHERE age > 50

id    name   age  status
1001  Shiyi  58   Senior
1002  Rahul  41   Sophomore
1003  Peter  25   Freshman
1004  Mark   25   Junior
1005  Alice  38   Senior

• Estimate from a three-tuple sample: sel(age>50) = 1/3

SELECT AVG(age) FROM people WHERE age > 50

id name age status

1001 Shiyi 58 Senior
1003 Mark 25 Junior
1005 Alice 38 Senior
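The sampling estimate can be sketched as follows. The helper names, the seed, and the 100-row synthetic table are assumptions for illustration; the three tuples are the sampled rows shown above.

```python
import random

def draw_sample(rows, sample_size, seed=0):
    """Draw a uniform random sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(sample_size, len(rows)))

def selectivity_from_sample(sample, predicate):
    """Fraction of sampled rows that satisfy the predicate."""
    if not sample:
        raise ValueError("sample must not be empty")
    return sum(1 for row in sample if predicate(row)) / len(sample)

# The three sampled tuples shown above: (id, name, age, status).
slide_sample = [
    (1001, "Shiyi", 58, "Senior"),
    (1003, "Mark", 25, "Junior"),
    (1005, "Alice", 38, "Senior"),
]
sel = selectivity_from_sample(slide_sample, lambda row: row[2] > 50)
print(f"sel(age > 50) from the sample: {sel:.3f}")

# Hypothetical larger table: ages 0..99, one row per age.
table = [(i, f"person{i}", i, "n/a") for i in range(100)]
approx = selectivity_from_sample(draw_sample(table, 20), lambda row: row[2] > 50)
print(f"sel(age > 50) from a 20-row sample: {approx:.2f}")
```

The estimate only costs a scan of the sample, not of the full table, but its accuracy depends on the sample size and on how recently the sample was refreshed.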

Observation

• Now that we can (roughly) estimate the selectivity of predicates, what can we
actually do with them?

Plan Enumeration

Query Optimization

• After performing rule-based rewriting, the DBMS will enumerate different plans for
the query and estimate their costs.
▶ Single relation
▶ Multiple relations
• It chooses the best plan it has seen for the query after exhausting all plans or
after a timeout expires.

Single-Relation Query Planning

• Pick the best access method.

▶ Sequential Scan
▶ Binary Search (clustered indexes)
▶ Index Scan
• Predicate evaluation ordering.
• Simple heuristics are often good enough for this.
• OLTP queries are especially easy...

OLTP Query Planning

• Query planning for OLTP queries is easy because they are sargable (Search Argument
Able).
▶ It is usually just picking the best index.
▶ Joins are almost always on foreign key relationships with a small cardinality.
▶ Can be implemented with simple heuristics.

CREATE TABLE people (
    id INT PRIMARY KEY,
    val INT NOT NULL
);

SELECT * FROM people WHERE id = 123;

Multi-Relation Query Planning

• As the number of joins increases, the number of alternative plans grows rapidly.

▶ We need to restrict search space.
• Fundamental decision in System R: only left-deep join trees are considered.
▶ Modern DBMSs do not always make this assumption anymore.

• Left-deep trees allow for fully pipelined plans, where intermediate results are not written
to temp files.
▶ Not all left-deep trees are fully pipelined.

Multi-Relation Query Planning

• Enumerate the orderings

▶ Example: Left-deep tree 1, Left-deep tree 2, ...
• Enumerate the physical join operator for each logical join operator
▶ Example: Hash, Sort-Merge, Nested Loop, ...
• Enumerate the access paths for each table
▶ Example: Index 1, Index 2, Seq Scan, ...
• Use dynamic programming to reduce the number of cost estimations.

Dynamic Programming
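The idea of using dynamic programming to reduce the number of cost estimations can be sketched for left-deep join orders as follows. This is a simplified illustration, not the System R algorithm itself: the cost model (sum of intermediate result sizes), the cardinalities, and the join selectivities are all assumptions, and physical operators and access paths are ignored.

```python
from itertools import combinations

def best_left_deep_plan(card, join_sel):
    """Dynamic programming over sets of relations (left-deep trees only).

    card:     {relation: estimated tuple count}
    join_sel: {frozenset({r, s}): selectivity of the join predicate}
    Cost model (an assumption): sum of the sizes of all intermediate results.
    Returns (cost, join order as a tuple).
    """
    rels = sorted(card)
    # best[S] = (cost, output cardinality, order) of the cheapest plan joining S
    best = {frozenset([r]): (0.0, float(card[r]), (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in combinations(rels, size):
            s = frozenset(subset)
            for last in subset:               # relation joined last (right input)
                cost, rows, order = best[s - {last}]
                sel = 1.0
                for r in s - {last}:          # apply every predicate touching `last`
                    sel *= join_sel.get(frozenset({r, last}), 1.0)
                out_rows = rows * card[last] * sel
                candidate = (cost + out_rows, out_rows, order + (last,))
                if s not in best or candidate[0] < best[s][0]:
                    best[s] = candidate
    cost, _, order = best[frozenset(rels)]
    return cost, order

# Hypothetical statistics for R, S, T joined on R.a = S.a and S.b = T.b.
card = {"R": 1000, "S": 100, "T": 10}
join_sel = {frozenset({"R", "S"}): 0.01, frozenset({"S", "T"}): 0.1}
cost, order = best_left_deep_plan(card, join_sel)
print(cost, order)
```

Each subset of relations is costed once and its best plan is reused by every larger subset, instead of re-estimating it for each full ordering. The sketch also allows cross products (missing predicates default to selectivity 1.0), which a real optimizer would usually prune.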

Candidate Plan Example

• How to generate plans for the search algorithm:

▶ Enumerate relation orderings
▶ Enumerate join algorithm choices
▶ Enumerate access method choices
• No real DBMS does it this way. It is actually messier...
SELECT * FROM R, S, T
    WHERE R.a = S.a AND S.b = T.b;

Candidate Plans

• Step 1: Enumerate relation orderings

• Step 2: Enumerate join algorithm choices

• Step 3: Enumerate access method choices
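The three enumeration steps can be sketched as nested products for the R, S, T query. The access paths listed per relation are hypothetical (the lecture does not name specific indexes), and no pruning or costing is done here.

```python
from itertools import permutations, product

relations = ["R", "S", "T"]
join_algorithms = ["hash", "sort-merge", "nested-loop"]
# Hypothetical access paths per relation.
access_paths = {
    "R": ["seq-scan", "index"],
    "S": ["seq-scan"],
    "T": ["seq-scan", "index"],
}

def enumerate_left_deep_plans():
    """Return every (ordering, join algorithms, access paths) combination."""
    plans = []
    for order in permutations(relations):                              # Step 1
        for joins in product(join_algorithms, repeat=len(order) - 1):  # Step 2
            for paths in product(*(access_paths[r] for r in order)):   # Step 3
                plans.append((order, joins, paths))
    return plans

plans = enumerate_left_deep_plans()
print(len(plans))   # 6 orderings * 9 join choices * 4 access-path choices = 216
print(plans[0])
```

Even three relations produce 216 candidates under these assumptions, which is why the search space has to be restricted. Orderings such as (R, T, S) imply a cross product for this query and would normally be discarded early.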

Postgres Optimizer

• Examines all types of join trees

▶ Left-deep, Right-deep, bushy
• Two optimizer implementations:
▶ Traditional Dynamic Programming Approach
▶ Genetic Query Optimizer (GEQO)
• Postgres uses the traditional algorithm when the number of tables in the query is less than 12
and switches to GEQO when there are 12 or more.

Conclusion

Parting Thoughts

• Selectivity estimations
• Key assumptions in query optimization
▶ Uniformity
▶ Independence
▶ Histograms
▶ Join selectivity
• Dynamic programming for join orderings

Next Class

• Design Decisions in Query Optimization
