Yes, Vipul!
Here's a more detailed and GATE-focused explanation of Query Processing and
Optimization – broken down topic by topic with relevant concepts, formulas, examples, and
strategies.
1. Evaluation of Relational Algebra Expressions
What is it?
A relational algebra expression is an abstract representation of a query.
The evaluation is how a DBMS executes this expression using algorithms (e.g.,
for σ, π, ⨝).
Steps in Query Evaluation:
1. Parsing: SQL → Parse tree.
2. Translation: Parse tree → Relational algebra expression.
3. Optimization: Relational algebra → Efficient evaluation plan.
4. Evaluation: Execute plan using access methods (e.g., sequential scan, index
scan).
Evaluation Methods:
Materialization: Intermediate results are stored in temporary tables.
Pipelining: Results flow directly from one operation to the next (more memory
efficient).
Cost Estimation Factors:
Block accesses (I/Os)
CPU cost
Size of input/output relations
Use of indexes
Example:
SELECT name FROM Employee WHERE dept = 'IT';
Relational Algebra:
πname(σdept='IT'(Employee))
Evaluation plan:
Use index on dept for selection
Project only name attribute (reduce size)
2. Query Equivalence
Definition:
Two relational algebra expressions are equivalent if they give the same result for all valid input
relations.
Why is it important?
Helps optimizer transform a complex query into a simpler, faster one without
changing the result.
Common Equivalence Rules:
Rule Transformation
Selection Commutativity σc1(σc2(R)) ≡ σc2(σc1(R))
Selection over Join σc(R ⨝ S) ≡ (σc(R)) ⨝ S (if c applies only to R)
Projection Commutativity πA(πB(R)) ≡ πA(R) (if A ⊆ B)
Join Commutativity R⨝S≡S⨝R
Join Associativity (R ⨝ S) ⨝ T ≡ R ⨝ (S ⨝ T)
Cartesian Product with Selection σc(R × S) ≡ R ⨝c S
Use Case:
Helps in rewriting queries to minimize intermediate result size, which improves performance.
3. Join Strategies
JOIN operations are costly, so choosing the right strategy is essential.
(a) Nested Loop Join
Outer loop on R, inner loop on S.
Time: O(m × n), where m = |R|, n = |S|
Block Nested Loop Join: Loads blocks instead of rows — improves
performance.
(b) Index Nested Loop Join
Use index on join attribute of inner table.
Faster if an index is available.
Cost: O(|R| + (|R| × log |S|)) if index is B-Tree.
(c) Sort-Merge Join
Sort both R and S on join attribute, then merge.
Time: O(n log n + m log m + n + m)
Good when relations are already sorted.
(d) Hash Join
Partition both relations using a hash function.
Join matching partitions.
Time: O(n + m)
Works best when relations fit in memory.
Strategy Choice Criteria:
Size of relations
Index availability
Memory size
Already sorted? Use sort-merge.
Small table? Use nested loop.
4. Query Optimization Algorithms
Goal:
Select the best execution plan with the lowest cost.
Types of Optimization:
Heuristic-Based Optimization
Uses rules of thumb:
o Perform selection early to reduce row count.
o Apply projection early to reduce column size.
o Combine Cartesian product + selection → Join
o Perform most selective operations first.
Cost-Based Optimization
Uses statistics (table size, index, data distribution) to compute cost for each
plan.
Chooses the one with least estimated cost.
Uses dynamic programming for optimal join order.
Dynamic Programming for Join Order:
For N relations, O(2^N) plans possible.
Use memoization to avoid recomputing subplans.
Example: Query Plan Optimization
Given:
SELECT E.name, D.name
FROM Employee E, Department D
WHERE E.dept_id = D.id AND E.salary > 50000;
Possible Plans:
1. Join then filter
o E ⨝ D → σsalary>50000
o Costly if E ⨝ D is large.
2. Filter then Join (Better)
o σsalary>50000(E) ⨝ D
o Smaller intermediate result → faster
Conclusion for GATE:
Understand relational algebra operations
Know equivalence rules for rewriting queries
Study all join algorithms with cost comparison
Grasp heuristic and cost-based optimization
Practice query trees and plan selection
Let me know if you want diagrams, MCQs, or handwritten-style one-pager notes for revision.