[go: up one dir, main page]

0% found this document useful (0 votes)
11 views63 pages

Unit 3

Uploaded by

reqmail2023
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views63 pages

Unit 3

Uploaded by

reqmail2023
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 63

1

Overview of Query Processing


• Energy efficiency is an important feature in designing and executing databases.
The aim of query processing is to transform a query written in a high level
language, typically SQL into a correct and efficient execution strategy expressed
in a low level language and to execute the strategy to retrieve the required
data.
• Query processing is the activities involved in parsing , validating, optimizing and
executing a query.
• A query has many possible execution strategies, and the process of choosing a
suitable one for processing a query is known as query processing.

2
Query Processing and Optimization
Query Processing includes translation of high-level queries into
low-level expression that can be used at the physical level of the
file system, query optimization and actual execution of the query
to get the result.
Query Optimization is the process in which multiple query-
execution plans for satisfying a query are examined and a most
efficient query plan is identified for execution.
The term optimization is actually a misnomer because in most of
the cases the chosen execution plan is not optimal strategy. It is
just a reasonably efficient or the best available strategy for
executing the query. Finding the optimal strategy is usually too
time consuming except for the simplest of queries.

3
Query Processing Steps

1. Parsing and translation: translate the query


into its internal form.
 This is then translated into relational
algebra.
 Query Tree, Graph
 Parser checks syntax, verifies relations
2. Optimization: The process of choosing a
suitable execution strategy for processing
a query.
3. Evaluation
 The query-evaluation engine takes a
query-execution plan,
 executes that plan,
 and returns the answers to the query.
4
Query Processing Steps

5
6
Query Tree
• It is used to represent relational algebra expressions.
• Tree Data Structure
• Input relations are represented as leaf nodes
• Relational algebra operation as internal nodes
• Example:
• Relation: Book
Bid B_name Category A_id Price
• SQL Query
Select B_name, Price
From Book
Where Price > 50;

7
Query Tree: Example
Select B_name, Price
From Book
Where Price > 50;

Price>50(B_name, Price(Book)) B_name, Price(Price>50(Book))

Price>50 B_name, Price

B_name, Price Price>50

Book Book
8
Query Evaluation Plan
• It is used to fully specify how to evaluate a query, each
operation in the query tree is annotated with
instructions which specify the algorithm or the index to
be used to evaluate that operation.

B_name, Price(Price>50(Book))

B_name, Price

Price>50

Book 9
Query Evaluation Plan
• Note:
• Different evaluation plans for a given query can have different costs. It is the
responsibility of the query optimizer to generate a least costly plan.
• Best Query Evaluation plan is finally submitted to the query
evaluation engine for actual execution.

10
Query Blocks
• Query submitted to the database system is first decomposed into
Query Blocks.
 A Query Block forms a basic unit that can be translated into

relational algebra expression –represented as a query tree data


structure-that is then optimized. Typically, SQL queries are
decomposed into query blocks, which form the basic units that
can be translated into the algebraic expression and optimized.
Nested queries within a query are identified as separate blocks.

A query block contains a single SELECT-FROM WHERE expression,


as well as GROUP BY and HAVING clause if these are part of the
block.
Hence, nested queries within a query are identified as separate
query blocks.
11
The optimizer would then choose an execution plan for
each query block

12
Algorithms for External Sorting

• Sorting is an often-used algorithm in query processing. For example, whenever


an SQL query specifies an ORDER BY-Clause, the query result must be sorted.
Sorting also a key component in sort-merge algorithms used for JOIN and
other operations and in duplicate elimination algorithms for project
operations
• External sorting:
• Algorithms suitable for large files that do not fit entirely in main memory
• Sort-merge strategy based on sorting smaller subfiles (runs) and merging the sorted
runs
• Requires buffer space in main memory
• DBMS cache

13
14
Algorithms for SELECT Operation

• SELECT operation : There are many algorithms for executing a SELECT operation,
which is basically a search operation to locate the records in a disk file that
satisfy a certain condition.

15
16
Search Methods for Complex Selections
• Conjunction: A conjunction selection is of the form:
• σ θ1^ θ2^…. ^Θn( r ).
• Disjunction: A disjunctive selection is a selection of the form:
• σ θ1v θ2v…. vΘn( r ). A disjunctive condition is satisfied by the union of all
records satisfying the individual, simple condition θi.
• Negation : The result of selection σ-θ( r ) is the set of tuples of r for
which the condition θ evaluates to be false.

17
Joining
• Like selection, the join operation can be implemented in a variety of
ways. In terms of disk access, the joining can be very expensive, so
implementing and utilizing efficient join algorithms is critical in
minimizing a query’s execution time.
• The following are 5 well-known types of join algorithms are:
• Nested-Loop Join
• Block Nested Loop Join
• Indexed Nested Loop Join
• Sort Nested Loop Join
• Sort Merge Join
• Hash Join
18
Nested Join Loop
• This algorithm consists of an inner for loop nested within an outer for loop. We
use following notations
• r,s : Relations r and s
• tr : Tuple in relation r
• ts : Tuple in relation s
• nr : Number of records in relation r
• ns : Number of records in relation s
• br : Number of blocks with records in relation r
• bs : Number of blocks with records in relation s

19
Nested Join Loop

• for each tuple tr in r {


• for each tuple ts in s do {
• begin
• if join condition is true for (tr,ts)
• then add tuple tr x ts to the result
•}
•}
• In the algorithm, ts and tr are the tuples of relations r and s respectively. The
relation ts*tr is a tuple constructed by concatenating the attribute values of
tuples tr and ts
20
Block Nested Loop Join
• When the buffer is to small to hold either relation entirely in memory, we can
still obtain a major saving in block accesses if we process the relations on a per
block, rather than on a per tuple basis. The following figure shows block nested
loop join, where every block of the inner relation is paired with every block of
outer relation. Within each pair of blocks, every tuple in one block is paired
with every tuple in outer block, to generate all pairs of tuples. As before, all
pairs of tuples that satisfy the join condition are added to the result.

• The primary difference in cost between the block nested loop join and the
basic nested-loop join is that, in the worst case, each block in the inner relation
s is read only once for each block in the outer relation, instead of once for each
tuple in the outer relation. Clearly, it is more efficient to use the smaller
relation as the outer relation, in case neither of the relations fits in memory.
21
Block Nested Loop Join
• for each block br of r {
• for each block bs of s {
• for each tuple tr in br {
• for each tuple ts in bs {
• if join condition is true for (tr, ts)
• add tuple tr*ts to the result
•}
•}
•}
•}
22
Sort Merge Join
• This algorithm can be used to perform natural joins and equi-joins and
requires that each relation r and r be sorted by the common attributes
between R ∩ S. The details for how this algorithm works will not be
presented here.
• However, it is notable to point out that each record in r and s is only
scanned once, thus producing a worst and best-case cost of br + bs.
• Variations of the Sort-Merge join algorithm are used for instance, when
the data files are in unsorted order, but there exist secondary indices for
the two relations.

23
Hash Join
• Like the sort-merge join, the hash join algorithm can be used to perform natura
joins and equi-joins. The concept behind the hash –join algorithm is to partition
the tuples of each given relation into sets.

• The partition is done on the basic of same hash value on join attributes.

• The has function provides the hash value. The main goal of using the hash
function in the algorithm is to reduce the number of comparisons and increase
the efficiency to complete the join operation on the relations.

24
Hash Join
• For example, suppose there are two tuples a and b where both of them
satisfy the join condition. It means they have the same hash value for the
join attributes.

• Suppose that both a and b tuples consists of a hash value as i. It implies that
tuple a should be in ai, and tuple b should be in bi.

• Thus, only compares tuples in ai with tuples of bi. There is no need to


compare the b tuples in any other partition.
• Therefore, in this way, the hash join operation works.

25
Evaluation of Expression
• One obvious way to evaluate an expression is simply to evaluate one
operation at a time, in an appropriate order. There are two
approaches how a query expression tree can be evaluated.
• Materialization : Compute the result of an evaluation primitive and
materialize (store) the new relation on the disk
• Pipelining : Pass on tuple to parent operation even while an
operation is still being executed.

26
Materialization
• It is easier to understand intuitively to evaluate an expression by looking at a
pictorial representation of the expression in an operator tree.
• Consider the expression: π staff_name (σdept_id=3(Department) ⋈ staff)

• π staff_name

• ⋈

• σdept_id=3 Staff

• Department 27
Materialization

• In this method, the given expression evaluates one relational operation at a


time. Also, each operation is evaluated in an appropriate sequence or order.
After evaluating all the operations, the outputs are materialized in a temporary
relation for their subsequent uses. The examples of above figure is computed
as follows:
1. Compute σdept_id=3(Department) and store in relation1
2. Compute relation1 ⋈ staff and store result int relation2
3. Compute π staff_name on materialized relation2
By repeating the process, we will eventually evaluate the operation at the root
of the tree, giving the final result of the expression.
The cost of this type of evaluation is always more leading to a disadvantages.
The disadvantages is that it needs to construct those temporary relations for
materializing the results of the evaluated relations. 28
Pipelining
• In this method, DBMS do not store the records into temporary tables. Instead,
it queries each and result of which will be passed to next query to express and
so on.
• It will process the query one after the other and each will use the result of
previous query for its processing.
• Pipelining evaluates multiple operations simultaneous by passing results of one
operation to the next one without storing the tuples on the disk. In the above
example, all three operations can be placed in a pipeline, which passes the
results of the selection to the join as they are generated.
• In turn, it passes the results of the join to the projection as they are generated.

29
Pipelining

• Creating a pipeline of operations can provide two benefits.


• It eliminates the cost of reading and writing temporary relations, reducing the
cost of query evaluation.

• It can start generating query results quickly, if the most operator of query
evaluation plan in combined in a pipeline with its inputs. This can be quite
useful if the results are displayed to user as they are generated.

30
Pipelining
• Pipeline can be executed in either of two ways.
• Demand-driven (or Lazy evaluation)Pipelining: In this method, the result of
lower level queries is not passed to the higher level automatically. It will be
passed to higher level only when it is requested by the higher level. In this
method, it retains the result value and state with it and it will be transformed
to the next level only when it is requested.

• Production-Driven (or Eager) Pipelining: In this method, the lower-level


queries eagerly pass the results to higher level queries. It does not wait for the
higher level queries for the results. In this method, lower-level query creates a
buffer to store the results and the higher level queries pulls the results for its
use. If the buffer is full, then the lower level query waits for the higher level
query to empty it. Hence it is also called as PULL and PUSH pipelining.
31
Query Optimization
• The function of query optimization engine is to find an evaluation plan
that reduces the overall execution cost of a query.
• It is clear that the costs for performing particular operations such as
select and join can very dramatically.
• Example: Consider two relations r and r with the following characteristics
• 10000 = nr = Number of tuples in r
• 1000 = ns = Number of tuples in s

32
Query Optimization
• Selecting a single record from a r on a non-key attribute can have,
• Cost of [log2(br)] = 100 [binary search]
• Cost of br/2 = 5000 (Linear search]
• It is noticed that the cost difference between the two selects differs by a
factor of 500

33
Cost Based Optimization
• This process of selecting a lower cost mechanism is known as cost based
optimization. This is based on the cost of the query. The query can use
different paths based on indexes, constraints, sorting methods etc.
• This method mainly uses the statistics like record size, number of records ,
number of records per block, number of blocks, table size, whether whole
table fits in a block, organization of tables, uniqueness of column values,
size of columns etc.
• Some of the features of the cost based optimization are as follows.
• It is based on the cost of the query that can be optimized.
• The query can use a lot of paths on the value of indexes, available sorting
methods, constraints etc.

34
Cost Based Optimization
• The aim of query optimization is to choose the most efficient path of
implementing the query at the possible lowest minimum cost in the
form of an algorithm.

• The cost of executing the algorithm needs to be provided by the


query optimizer so that the most suitable query can selected for an
operation.

• The cost of an algorithm also depends upon the cardinality of the


input

35
Heuristic Base Optimization
• Heuristic optimization transforms the query tree by using a set of rules
(Heuristics) that typically (but not in all cases) improve execution performance .
Some common heuristic rules are:
• Perform selections early (reduces the number of tuples)
• Perform the projection early (reduces the number of attributes)
• Perform most restrictive selection and join operations (with smallest result size )
before other similar operations
Initially query tree from SQL statement is generated. Query tree is transformed
into more complex tree, via a series of tree modification, each of which hopefully
reduces the execution time

36
Semantic Base Optimization

• This strategy uses constraints specified on the database schema such as unique
attributes and other more complex constraints-in order to modify one query into
another query that is more efficient to execute.
• Example: Consider the following query
• select e.first_name, last_name from Employee e, Employee m
• where e.super_ssn = m.ssn and e.salary > m.salary
• This query retrieves the names of employees who earn more than their
supervisors. Suppose that we have constraint on the database schema that stated
that no employee can earn more than his or her direct supervisor. If the semantic
query optimizer checks for the existence of this constraint, it does not need to
execute the query at all because it knows that the result of the query will be
empty.
37
Notations for Query Tree and Query Graphs

38
Expression and Tree Transformation
• After a high level query (SQL statement) has been parsed into an equivalent
relational algebra expression, the query optimizer can perform heuristic rules
on the expression and tree to transform the expression and tree into
equivalent, but optimal form.
• Example: Consider the following query:

• SELECT stu_name, marks_obtained


• FROM student, marks
• WHERE stud_id=10 and sub_id=20
• A corresponding relational algebra expression is:
• Π stu_name, marks_obtained (σstu_id = 10(σsub_id = 20)(student ⋈ Marks)
39
Expression and Tree Transformation

• Π stu_name, marks_obtained

• σstu_id = 10

• σsub_id = 20

• ⋈


• Student Marks
40
Expression and Tree Transformation
• Suppose the student and marks relations both have 100 records and the
number of stu_id = 10 is 50. Note that the Cartesian product resulting in 10000
records can be reduced by 50% if the σstu_id = 10 is performed first.
• We can also combine the σsub_id = 20 and Cartesian product operations into a
more efficient join operation, as well as eliminating any unneeded columns
before the expensive join in performed. The diagram below shows this better
“optimized” version of the tree.

41
Expression and Tree Transformation
• Π stu_name, marks_obtained

• ⋈stu_id = sub_id


• Π stu_name, marks_obtained Π stu_name, marks_obtained

σstu_id = 10
• σsub_id = 20

student Marks 42
Query Trees and Heuristics for Query Optimization
• Step 1: Scanner and parser generate initial query representation
• Step 2: Representation is optimized according to heuristic rules
• Step 3: Query execution plan is developed
• Execute groups of operations based on access paths available and files involved

43
Heuristic Optimization of Query Trees
• In general, many different relational algebra expressions and hence many
different query trees can be equivalent; that is, they can correspond to
the same query. The query parser will typically generate a standard initial
query tree to correspond to an SQL query, without doing any
optimization. It is now the job of heuristic query optimizer to transform
this initial query tree into a final query tree that is efficient to execute.
• The optimizer must include rules for equivalent among relational algebra
expressions that can be applied to the initial tree. The heuristic query
optimization rules then utilize these equivalence expressions to transform
the initial tree into the final, optimized query tree.

44
Query Trees and Heuristics for Query Optimization (cont’d.)
• Example Heuristic rule
• Apply SELECT and PROJECT before JOIN
• Reduces size of files to be joined
• Query Tree
• Represents relational algebra expression
• Query Graph
• Represents relational calculus expression

45
Example of Transforming a Query
• For every project located in ‘Stafford’, retrieve the project number,
the controlling department number, department manager’s last
name, address, and birthdate

46
Figure 19.1 Two query trees for
the query Q2. (a) Query tree
corresponding to the relational
algebra expression for Q2. (b)
Initial (canonical) query tree for
SQL query Q2. (c) Query graph
for Q2.

47
Query Trees and Heuristics for Query Optimization (cont’d.)
• Query tree represents a specific order of operations for executing a query
• Preferred to query graph for this reason
• Query graph
• Relation nodes displayed as single circles
• Constants represented by constant nodes
• Double circles or ovals
• Selection or join conditions represented as edges
• Attributes to be retrieved displayed in square brackets

48
Heuristic Optimization of Query Trees
• Many different query trees can be used to represent the query and get the
same results
• Figure b shows initial tree for Q2
• Very inefficient - will never be executed
• Optimizer will transform into equivalent final query tree

Query : Consider the following query Q on the following database schema: Find the last
names of employees born after 1957 who work on a project named ‘ Aquarius’.

49
Schema

50
Query Transformation Example

Figure 19.2 Steps in converting a query tree during heuristic


optimization. (a) Initial (canonical) query tree for SQL query Q.
51
Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization


(b) Moving SELECT operations down the query tree.
52
Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization


(c) Applying the more restrictive SELECT operation first.
53
Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization


(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.

54
Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization


(e) Moving PROJECT operations down the query tree.
55
General Transformation Rules for Relational
Algebra Operations
• There are many rules for transforming relational algebra operations into
equivalent ones. If two relations have the same set of attributes in a
different order but the two relations represent the same information, we
consider the relations equivalent. We can state some transformation rules
that are useful in query optimization without proving them.

56
General Transformation Rules for Relational
Algebra Operations

57
General Transformation Rules for Relational Algebra Operations

58
59
Summary of Heuristics for Algebraic Optimization
• Apply first the operations that reduce the size of intermediate results
• Perform SELECT and PROJECT operations as early as possible to reduce the number of
tuples and attributes
• The SELECT and JOIN operations that are most restrictive should be executed before
other similar operations

60
Choice of Query Execution Plans
1. Alternatives for Query Evaluation: An execution plan for a relational
algebra expression represented as a query tree includes information
about the access methods available for each relation as well as the
algorithms to be used in computing the relational operators
represented in the tree.
2. Nested Subquery Optimization : The process of removing the nested
query and converting the outer and inner query into one block is called
unnesting. Whenever possible, query optimizer tries to convert
queries with nested subqueries into a join operation.
3. Subquery (View) Merging Transformation : There are instances where
a subquery appears in the FROM clause of a query and amounts to
including a derived relation, which is similar to predefined view that is
involved in the query. This FROM clause subquery is often referred to as
inline view. In such cases, the transformation of the query can be 61
Choice of Query Execution Plans
• -referred to as a view-merging or subquery merging transformation.
• 4. Materialized Views : A view is defined in the database as a query, and
a materialized view stores the results of that query. Using materialized
views. A materialized view may be stored temporarily to allow more
queries to be processed against it or permanently.
• A materialized view constitutes derived data because its content can be
computed as a result of processing the defining query of the materialized
view. The main idea behind materialization is that it is much cheaper to
read it when needed and query against it than to recognize it from
scratch. The savings can be significant when the view involves costly
operations like join, aggregation and so forth.

62
Choice of Query Execution Plans
• Use of Selectivities in Cost Based Optimization: A query optimizer does not
depend solely on heuristic rules or query transformations: it also eliminates
and compares the costs of executing a query using different execution
strategies and algorithms, and it then chooses the strategy with the lower cost
estimates. For this approach to work, accurate cost estimates are required so
that different strategies can be compared fairly and realistically.

63

You might also like