Query processing in DDBMS
Query processing in a Distributed Database Management System (DDBMS) involves
several key phases to efficiently retrieve and manipulate data across multiple,
geographically dispersed databases. These phases are:
Parsing and Translation: The system checks the syntax of the user's query and
translates it into an internal representation, often using relational algebra. This
ensures the query is valid and prepares it for optimization.
Query Decomposition: The translated query is broken down into simpler, more
manageable components. This step involves normalization, analysis,
simplification, and restructuring to facilitate efficient processing.
Data Localization: The system identifies which fragments of data are required
and where they are located across the distributed network. It then transforms the
query to operate on these specific data fragments.
Global Query Optimization: Various execution strategies are evaluated to
determine the most efficient plan. This involves considering factors like data
location, network latency, and resource availability.
Distributed Execution: The optimized query plan is executed across the relevant
sites. Results from different locations are combined and presented to the user as a
single output.
Query Decomposition in Optimization
Decomposition is a key step during query optimization that breaks down a complex
relational algebra expression into smaller parts. This process enables more efficient
execution, especially in distributed database systems, by reducing the cost of
computation, network traffic, and disk access.
Example: Query in SQL
We are working with three relations: Student, Course, and Teacher.
Let’s consider the following query:
SELECT Student.Name, Course.CourseName, Teacher.TeacherName
FROM Student
JOIN Course ON Student.CourseID = Course.CourseID
JOIN Teacher ON Course.TeacherID = Teacher.TeacherID
WHERE Course.CourseName = 'Math';
1. Translation to Relational Algebra
First, convert the SQL query to relational algebra:
π_Name, CourseName, TeacherName (
(Student ⨝ Student.CourseID = Course.CourseID
(σ_CourseName='Math' (Course)))
⨝ Course.TeacherID = Teacher.TeacherID Teacher
)
2.Decomposition (Breaking into Smaller Subqueries)
Step 1: Apply Selection Early (Selection)
We apply selection directly on the Course table:
σ_CourseName = 'Math' (Course)
This reduces the number of tuples involved in the join operation later, improving
performance.
Step 2: Join with Student Table
Next, we perform a join between the Student table and the filtered Course records:
Student ⨝ Student.CourseID = Course.CourseID
(σ_CourseName = 'Math' (Course))
Only the relevant students enrolled in the 'Math' course are considered.
Step 3: Join with Teacher Table
Now, we join the result with the Teacher table:
(Previous Result) ⨝ Course.TeacherID = Teacher.TeacherID Teacher
This gives us complete information about the student, their course, and the teacher.
Step 4: Apply Projection (Column Selection)
Finally, we extract only the required columns:
π_Student.Name, Course.CourseName, Teacher.TeacherName (Final Result)
Why Decomposition
Performance Improvement: Filtering early reduces data volume during joins.
Better Optimization: Subqueries can be executed independently and even
distributed across sites.
Modularity: Easier to manage and optimize small queries than one large
complex query.