[go: up one dir, main page]

0% found this document useful (0 votes)
8 views215 pages

Module 3-2

Module 3 covers SQL Data Manipulation Language (DML) and physical data organization, detailing SQL queries, integrity constraints, and various SQL operations such as SELECT, UNION, and INTERSECT. It explains how to manage data retrieval, including handling duplicates, using aliases, and employing nested queries. Additionally, it discusses the ordering of query results and the use of arithmetic and pattern matching operators.

Uploaded by

adsaj470
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views215 pages

Module 3-2

Module 3 covers SQL Data Manipulation Language (DML) and physical data organization, detailing SQL queries, integrity constraints, and various SQL operations such as SELECT, UNION, and INTERSECT. It explains how to manage data retrieval, including handling duplicates, using aliases, and employing nested queries. Additionally, it discusses the ordering of query results and the use of arithmetic and pattern matching operators.

Uploaded by

adsaj470
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 215

MODULE 3: SQL DML (Data

Manipulation Language),
Physical Data Organization
SYLLAB
US
• SQL DML (Data Manipulation Language)
▫ SQL queries on single and multiple queries
tables,
and Nested
non-correlated), Aggregation and grouping,
(correlated
Views, assertions, Triggers, SQL data types.
• Physical Data Organization
▫ Review of terms: physical and logical records, blocking factor,
pinned and unpinned organization. Heap files,
Indexing, Singe level indices, numerical examples,
Multi-level-indices, numerical examples, B-Trees & B+-Trees
(structure only, algorithms not required), Extendible
Hashing, Indexing on multiple keys – grid files
Data-manipulation
language(DML)
• The SQL DML provides the ability to query
information from the database and to insert
tuples into, delete tuples from, and modify
tuples in the database.
▫ Integrity
□ The SQL DDL includes commands for specifying
integrity constraints that the data stored in the
database must satisfy. Updates that violate
integrity constraints are disallowed.
▫ View definition
□ The SQL DDL includes commands for defining
views.
▫ Transaction control
Basic Retrieval Queries in
SQL
• SQL has one basic statement for retrieving information from a
database; the SELECT statement
• This is not the same as the SELECT operation of the relational
algebra
• Important distinction between SQL and the formal relational
model;
▫ SQL allows a table (relation) to have two or more tuples that are
identical in all their
attribute values
▫ Hence, an SQL relation (table) is a multi-set (sometimes called a
bag) of tuples;
▫ it is not a set of tuples SQL relations can be constrained to be sets by
specifying PRIMARY KEY or UNIQUE attributes, or by using the
DISTINCT option in a query
SELEC <attribute
T list>
FROM <table list>
• <attribute
WHER <condition>
list>
E ▫is a list of attribute names whose values
are to be retrieved by the query
• <table list>
▫is a list of the relation names required to
process the query
• <condition>
▫is a conditional (Boolean) expression that
identifies the tuples to be retrieved by
the query
PREPARED BY SHARIKA T
R,
SNGCE
QO. Retrieve the birth date andaddress of the
employee(s) whose name is ‘John B.
Smith’.
Q1. Retrieve the name and address of all
employees who work for the ‘Research’
department.
Q 2. For every project located in ‘Stafford’, list the
project number, the controlling department number,
and the department manager’s last name, address,
and birth date.
PREPARED BY SHARIKA T R

PREPARED BY SHARIKA T
R,
SNGCE
Ambiguous Attribute
Names, Aliasing,
Renaming, and Tuple
Variables
• In SQL, we can use the same name for two (or
more) attributes as long as the attributes are in
different relations
• a multitable query refers to two or more attributes
with the same name, we must qualify the attribute
name with the relation name to prevent ambiguity
• This is done by prefixing the relation name to the
attribute name and separating the two by a
Suppose
Dno and Lname attributes of
EMPLOYEE relation were
called Dnumber and Name and
the Dname attribute was called
Name. To prevent ambiguity Q1
should be rephrased as Q1A
Q8. For each employee, retrieve
the eemployee’s
first and last name and the first and last
name of his or her immediate
supervisor.
• Alternative relation names E and S are called
aliases or tuple variables, for the EMPLOYEE
relation.
• An alias follow the keyword AS
• It is also possible to rename the relation
attributes within the query in SQL by giving
them aliases.
Unspecified WHERE Clause and Use of the
Asterisk

missing WHERE clause indicates no condition
on tuple selection;
▫hence, all tuples of the relation specified in the
FROM sclause qualify and are selected for the query
result
• If more than one relation is specified in the
FROM clause and there is no WHERE clause,
then the CROSS PRODUCT all possible tuple
combinations of these relations is selected
Q 9 and 10. Select all EMPLOYEE Ssns(Q 9)
and all combinations of EMPLOYEE Ssn and
DEPARTMENT Dname (Q10) in the
database.
• To retrieve all the attribute values of the
selected tuples, we do not have to list the
attribute names explicitly in SQL;
• we just specify an asterisk (*), which
stands for all the attributes
Q1C: retrieves all the attribute values of
any EMPLOYEE who works in DEPARTMENT
number
5
Q1D: Retrieves all the attributes of an EMPLOYEE and
the attributes of the DEPARTMENT in which he or she
works for every employee of the ‘Research’
department
Q10A: specifies the CROSS
PRODUCT of the EMPLOYEE and
DEPARTMENT
relations
Tables as Sets in
SQL
• SQL usually treats a table not as a set but rather as a multiset;
▫ duplicate tuples can appear more than once in a table, and
in the result of a
query.
• SQL does not automatically eliminate duplicate tuples in the
results of queries, for the following reasons
▫ Duplicate elimination is an expensive operation.
One way to implement it is to sort the tuples first and then
eliminate duplicates.
▫ The user may want to see duplicate tuples in the result of
a query.
▫ When an aggregate function is applied to tuples, in most
cases we do not want to eliminate duplicates
DISTINCT
Keyword
• to eliminate duplicate tuples from the result of
an SQL querys we use the keyword DISTINCT
in the SELECT clause
• only distinct tuples should remain in the result
• a query with SELECT DISTINCT eliminates
duplicates, whereas a query with SELECT ALL
does not.
• SELECT with neither ALL nor DISTINCT is
equivalent to SELECT ALL
Q11 retrieves the salary of
every employee without
distinct
Q11A :retrieves the salary of every
employee using keyword DISTINCT
EXCEPT and
INTERSECT
• set union (UNION), set difference (EXCEPT),
and set intersection (INTERSECT)
operations.
• The relations resulting from these set operations are sets
of tuples; that is, duplicate tuples are eliminated from
the result.
• These set operations apply only to union-compatible
relations, so we must make sure that the two relations
on which we apply the operation have the same
attributes and that the attributes appear in the same
order in both relations
Q4. Make a list of all project numbers for projects that
involve an employee whose last name is ‘Smith’,
either as a worker or as a manager of the department
that controls the project

The first SELECT query retrieves the projects that involve a ‘Smith’ as
manager of the department that controls the project, and the second
retrieves the projects that involve a ‘Smith’ as a worker on the project.
Notice that if several employees have the last name ‘Smith’, the project
names involving any of them will be retrieved.
Applying the UNION operation to the two SELECT queries gives the
desired result.
UNION
ALL
• The UNION ALL command combines the result
set of two or more SELECT statements (allows
duplicate values).
• The following SQL statement returns the cities
(duplicate values also) from both the "Customers"
and the "Suppliers" table:
This SQL UNION ALL example would return the supplier_id multiple
times in the result set if that same value appeared in both the
suppliers and orders table. The SQL UNION ALL operator does not
remove duplicates. If you wish to remove duplicates, try using the
UNION operator.
SNGC
E
INTERSECT
Operator
• INTERSECT operator is
used to return the records
that are in common
between two SELECT
statements or data sets.
• If a record exists in one
query and not in the
other, it will be omitted
from the INTERSECT
results.

EXCEP
T• The SQL EXCEPT clause/operator is used to
combine two SELECT statements and returns
rows from the first SELECT statement that are
not returned by the second SELECT statement.
▫This means EXCEPT returns only rows, which
are not available in the second SELECT
statement.
• Just as with the UNION operator, the same
rules apply when using the EXCEPT operator.
Substring Pattern Matching
and Arithmetic Operators
• LIKE comparison operator
▫ This can be used for string pattern matching
▫ Partial strings are specified using two reserved
characters
□ % replaces an arbitrary number of zero or more
characters, and
□ the underscore (_) replaces a single character
Q 12. Retrieve all employees whose
address is in Houston, Texas.
Q 12A. Find all employees who were born
during the 1950s.

• To retrieve all employees who were born


during the 1950s,
• Here, ‘5’ must be the third character of
the string (according to our format for
date),
• so we use the value ‘_ _ 5 _ _ _ _ _ _ _’, with
each underscore serving as a placeholder
• If an underscore or % is needed as a literal
character in the string, the character should be
preceded by an escape character, which is
specified after the string using the keyword
ESCAPE.
• For example, ‘AB\_CD\%EF’ ESCAPE ‘\’ represents
the literal
• string ‘AB_CD%EF’ because \ is specified as the
escape character.
• Any character not used in the string can be
chosen as the escape character.
• Also, we need a rule to specify apostrophes
or single quotation marks (‘ ’) if they are to be
included in a string because they are used
to begin and end strings.
• If an apostrophe (’) is needed, it is represented as

The standard arithmetic operators for addition
(+), subtraction (–), multiplication (*), and
division (/) can be applied to numeric values or
attributes with numeric domains
Q13. Show the resulting salaries if every
employee working on the ‘ProductX’ project is
given a 10 percent raise.

suppose that we want to see the effect of giving all employees who work
on the ‘ProductX’ project a 10 percent raise; we can issue Query 13 to
see what their salaries would become. This example also shows how we
can rename an attribute in the query result using AS in the SELECT
clause.
• For string data types, the concatenate
operator || can be used in a query to
append two string values.
• For date, time, timestamp, and interval
data types, operators include
incrementing (+) or decrementing (–) a
date, time, or timestamp by an interval.
• In addition, an interval value is the result
of the difference between two date, time,
or timestamp values.
• Another comparison operator, which
can be used for convenience, is
BETWEEN
Q 14. Retrieve all employees in
department 5 whose salary is between
$30,000 and $40,000.

The condition (Salary BETWEEN 30000 AND


40000) in Q14 is equivalent to the condition
((Salary >= 30000) AND (Salary <= 40000))
Ordering of Query
Results
• The ORDER BY statement in sql is used to sort the
fetched data in either ascending or descending
according to one or more columns.
▫ By default ORDER BY sorts the data in ascending
order.
▫ We can use the keyword DESC to sort the data in
descending order and the keyword ASC to sort in
ascending order.
Q. Fetch all data from the table Student
and sort the result in ascending order
according to the column ROLL_NO

SELECT * FROM Student ORDER BY ROLL_NO ASC;

Student Table
SNGC
E
Sort according to multiple columns
Q. Fetch all data from the table Student and then
sort the result in ascending order first according to
the column Age. and then in descending order
according to the column ROLL_NO.
Q15. Retrieve a list of employee and the
projects they are working on,ordered by
department and, within each department,
ordered alphabetically by last name, then
first name.

if we want descending alphabetical order on Dname and ascending order on


Lname, Fname, the ORDER BY clause of Q15 can be written as
NESTING OF
QUERIES
• A complete SELECT query, called a nested query
, can be specified within the WHERE-clause of
another query, called the outer query
• Many of the previous queries can be
specified in an alternative form using
nesting
Q1: SELECT FNAME,LNAME,
ADDRESS FROM EMPLOYEE
WHERE DNO IN (SELECT
DNUMBER FROM DEPARTMENT
WHERE DNAME=’Research’)

• The nested query selects the number of the 'Research'


department
• The outer query select an EMPLOYEE tuple if its DNO
value is in the result of either nested query
• The comparison operator IN compares a value v with a set
(or multi- set) of values V, and evaluates to TRUE if v is one
of the elements in V
• In general, we can have several levels of nested queries
• A reference to an unqualified attribute refers to
the relation declared in the innermost nested query
• In this example, the nested query is not correlated
Q. Returns the names of
employees whose
salary is greater than the salary of
all theyees
emplo
in
department 5

The keyword ALL can be combined with each of >, >=, <, <=, and
<>, these operators. For example, the comparison condition (v > ALL
V) returns TRUE if the value v is greater than all the values in the set
(or multiset) V.
Q . Select the Essns of all employees who
work the same (project,
hours)combination on some project that
employee ‘John Smith’ (whose Ssn
=‘123456789’) works on.

In this example, the IN operator compares the subtuple of


values in parentheses (Pno, Hours) within each tuple in
WORKS_ON with the set of type-compatible tuples
produced by the nested query.
Q 16. Retrieve the name of each
employee who has a dependent with the
same first name and is the same sex as
the employee.

It is generally advisable to create tuple variables


(aliases) for all the tables referenced in an SQL query
to avoid potential errors and ambiguities
CORRELATED NESTED
QUERIES
• If a condition in the WHERE-clause of a nested
query references an attribute of a relation
declared in the outer query , the two queries
are said to be correlated
• The result of a correlated nested query is different
for each tuple (or combination of tuples) of the
relation(s) the outer query
Query: Retrieve the name of each
employee who has a dependent with the
same first name as the employee.

Here the nested query has a different result for


each tuple in the outer query.
A query written with nested SELECT... FROM...
WHERE... blocks and using the = or IN
comparison operators can alwaysbe expressed as
A Simple Retrieval Query in
SQL
• A simple retrieval query in SQL can consist of
up to four clauses,
▫ but only the first two SELECT and
FROM are mandatory
▫ the clauses between square brackets
[ ... ] being optional:
• The SELECT-clause lists the attributes or
functions to be retrieved
• The FROM-clause specifies all relations (or aliases)
needed in the query but not those needed in nested
queries
• The WHERE-clause specifies the conditions for
selection and join of tuples from the relations
specified in the FROM-clause
• GROUP BY specifies grouping attributes
• HAVING specifies a condition for selection of groups
• ORDER BY specifies an order for displaying the
result of a query
• A query is evaluated by first applying the WHERE-
clause, then GROUP BY and HAVING, and finally the
SELECT-clause
• There are three SQL commands to modify the
The EXISTS and UNIQUE Functions
in

SQL
EXISTS function in SQL is used to check whether
the result of a correlated nested query is empty
(contains no tuples) or not
• The result of EXISTS is a Boolean value
▫TRUE if the nested query result contains at least
one tuple, or
▫FALSE if the nested query result contains no tuples.
SELECT E.Fname,
E.Lname FROM
EMPLOYEE AS E
WHERE EXISTS
( SELECT *
FROM DEPENDENT AS
D
WHERE E.Ssn=D.Essn
AND E.Sex=D.Sex
);
OUTPU
T
• EXISTS(Q) returns TRUE if there is at least one
tuple in the result of the nested query Q, and it
returns FALSE otherwise.
• On the other hand, NOT EXISTS(Q) returns TRUE
if there are no tuples in the result of nested query
Q, and it returns FALSE otherwise
Q. Retrieve the names of employees
who have no dependents.
OUTPU
T
Aggregate Functions in
SQL
• Aggregate functions are used to summarize
information from multiple tuples into a single-
tuple summary.
• Grouping is used to create sub-groups of
tuples before summarization.
• A number of built-in aggregate functions exist:
1. COUNT,
2. SUM,
3. MAX,
4. MIN, and
5. AVG
• COUNT function returns the number of tuples or
values as specified in a query.
• The functions SUM, MAX, MIN, and AVG can be
applied to a set or multiset of numeric values and
return, respectively, the sum, maximum value,
minimum value, and average (mean) of those
values
• These functions can be used in the SELECT
clause or in a HAVING clause
Q19. Find the sum of the salaries
of all employees, the maximum
salary,
the minimum salary, and the
average salary.
Q 20. Find the sum of the salaries of all
employees of the
‘Research’department, as well as the
maximum salary, the
minimum salary, and the average
salary in this
department.
Q21. Retrieve the total
number of employees in the
company

SELECT
COUNT(*)
FROM
EMPLOYEE;
Q22. The number of employees in the
‘Research’ department

Here the asterisk (*) refers to the rows


(tuples), so COUNT (*) returns the
number of rows in the result of the
query. We may also use the COUNT
function to count values in a column
rather
than tuples
Query 23. Count the number of distinct
salary values in the database.

If we write COUNT(SALARY)
instead of COUNT(DISTINCT
SALARY) then
duplicate values will not be
eliminated. However, any tuples with
NULL for SALARY will not be
counted. In general, NULL values are
discarded when aggregate functions
are applied to a particular column
Q. Retrieve the names of all
employees who have two or
more dependents

The correlated nested query counts the


number of dependents that each
employee
has; if this is greater than or equal to
two, the employee tuple is selected.
Grouping: The GROUP BY and HAVING
Clauses
• In many cases we want to apply the aggregate functions to
subgroups of tuples in a relation, where the subgroups are
based on some attribute values.
• For example,
▫ we may want to find the average salary of employees in each
department or
▫ the number of employees who work on each
• In these
project.
cases we need to partition the relation into
nonoverlapping subsets (or groups) of tuples.
• Each group (partition) will consist of the tuples that have the
same value of some attributes called the grouping attributes.
GROUP
BY The GROUP BY clause specifies the grouping

attributes, which should also appear in the


SELECT clause,
• so that the value resulting from applying each
aggregate function to a group of tuples appears
along with the value of the grouping attributes
Q24. For each department, retrieve
the department number, the
number of employees in the
department, and their average
salary

the EMPLOYEE tuples are partitioned into groups—each group having


the same value for the grouping attribute Dno. Hence, each group
contains the employees who work in the same department.
If NULLs exist in the grouping attribute, then a separate group is
created for all tuples with a NULL value in the grouping attribute.
For example, if the EMPLOYEE table had some tuples that had
NULL for the grouping attribute Dno, there would be a separate
group for those tuples in the result of Q24
Query 25. For each project, retrieve the
project number, the project name,
and the number of employees
who work on that project.
HAVING
clause
• In GROUP BY the grouping and functions are
applied after the joining of the two relations.
• Sometimes we want to retrieve the values of these
functions only for groups that satisfy certain conditions
• HAVING clause, which can appear in
conjunction with a GROUP BY clause
• HAVING provides a condition on the summary
information regarding the group of tuples associated
with each value of the grouping attributes.
• Only the groups that satisfy the condition are
retrieved in the result of the query.
Q 26. For each project on which more than two
employees work, retrieve
the project number, the project name,
and the number of employees who
work
on the project.
After applying the WHERE clause but
before applying HAVING
After applying the HAVING
clause condition
Query 27. For each project, retrieve the
project number, the project name, and
the number of employees from
department 5 who work on the project.
SQL
JOIN
• How do I get data from multiple tables?
▫ A SQL JOIN combines records from two
tables.
▫ A JOIN locates related column values in the
two tables.
▫ A query can contain zero, one, or
multiple JOIN operations.
▫INNER JOIN is the same as JOIN; the keyword
INNER is optional
• Four different types of
JOINs
1. (INNER) JOIN:
□ Select records that matching values
have both tables. in
2. FULL (OUTER)
JOIN:
□ Selects all records match either left
that right table or
records.
3. □LEFT (OUTER)
Select records from thefirst (left-
JOIN: most) table with matching
right table records.
4. RIGHT (OUTER) JOIN:
□ Select records from the
second
JOIN
Syntax
• The general
syntax is
FROM table-
SELECT INNER
column-names JOIN
name1 table- column-name1
name2 ON =
column-name2
WHERE condition
• The INNER keyword is optional: it is the default
as well as the most commonly used JOIN
operation.
Example
• Problem: List all orders with customer information
SELECT OrderNumber,TotalAmount,FirstName, LastName,City,
Country
FROM Order JOIN Customer
ON Order.CustomerId = Customer.Id
SQL LEFT
JOIN
• A LEFT JOIN performs a join starting with the
first (left- most) table.
• Then, any matched records from the second
table (right- most) will be included.
• LEFT JOIN and LEFT OUTER JOIN are the
same.
The SQL LEFT JOIN
syntax
• The general LEFT JOIN syntax is
SELECT column-names
FROM table-name1 LEFT JOIN table-
name2 ON column-name1 = column-
name2
WHERE condition
• The general LEFT OUTER JOIN syntax is
SELECT column-names
FROM table-name1 LEFT OUTER
JOIN table-name2 ON column-
SQL LEFT JOIN
Example

• Problem: List all customers and the total amount


they spent irrespective whether they placed any
ordersOrderNumber,TotalAmount,FirstName,
SELECT or not. LastName,City,
Country
FROM Customer C LEFT JOIN
[Order] O ON O.CustomerId=
C.Id ORDER BY TotalAmount
• The ORDER BY TotalAmount shows the customers
without orders first (i.e. TotalMount is NULL).
SQL RIGHT
JOIN
• A RIGHT JOIN performs a join starting with the
second (right-most) table and then any matching
first (left-most) table records. RIGHT JOIN and
RIGHT OUTER JOIN are the same.
The SQL RIGHT JOIN
syntax
• The general syntax is
SELECT column-names
FROM table-name1 RIGHT JOIN
table-name2 ON column-name1 =
column-name2
WHERE condition
The general RIGHT OUTER JOIN syntax
is:
SELECT column-names
FROM table-name1 RIGHT OUTER JOIN
table-name2 ON column- =
SQL RIGHT JOIN
Examples

• Problem: List customers that have not placed


orders SELECT TotalAmount, FirstName,
LastName, City, Country
FROM [Order] O RIGHT JOIN Customer
C ON O.CustomerId = C.Id
WHERE TotalAmount IS
NULL
SQL FULL JOIN
Statement
• FULL JOIN returns all matching records from
both tables whether the other table matches or
not.
• Be aware that a FULL JOIN can potentially
return very large datasets.
• These two: FULL JOIN and FULL OUTER JOIN
are the same.
The SQL FULL JOIN
syntax
• The general syntax is:
SELECT column-names
FROM table-name1 FULL JOIN table-
name2 ON column-name1 = column-
name2
WHERE condition
• The general FULL OUTER JOIN syntax is:
SELECT column-names
FROM table-name1 FULL OUTER
JOIN table-name2 ON column-
SQL FULL JOIN
Examples

• Problem: Match all customers and suppliers by


country SELECT C.FirstName,
C.LastName, C.Country AS
CustomerCountry,
FROM S.Country
Customer CAS SupplierCountry,
FULL JOIN Supplier
S.CompanyName S
ON C.Country =
S.Country ORDER BY
C.Country, S.Country
• This returns suppliers that have no customers in their
country and customers that have no suppliers in their
country, and customers and suppliers that are from the
same country.
Views (Virtual Tables) in
SQL
• A view in SQL terminology is a single table that is
derived from other tables
• These other tables can be base tables or previously
defined views
• A view does not necessarily exist in physical form;
▫ it is considered to be a virtual table, in contrast to
base tables,
▫ whose tuples are always physically stored in the
database.
• This limits the possible update operations that can be
applied to views, but it does not provide any limitations
• We can think of a view as a way of
specifying a table that we need to
reference frequently, even though it
may not exist physically.
• Views, which are a type of virtual
tables allow users to do the following
▫Structure data in a way that users or
classes of users find natural or
intuitive.
▫Restrict access to the data in such a way
that a user can see and (sometimes)
modify exactly what they need and no
more.
Exampl
e• In COMPANY database we may frequently issue queries that
retrieve the employee name and the project names that the
employee works on.
• Rather than having to specify the join of the three tables
EMPLOYEE, WORKS_ON, and PROJECT every time we
issue this query, we can define a view that is specified as
the result of these joins.
• Then we can issue queries on the view, which are specified as
single table retrievals rather than as retrievals involving two
joins on three
tables.
• We call the EMPLOYEE, WORKS_ON, and PROJECT
tables the definingtables of the view.
Specification of Views in
SQL
• In SQL, the command to specify a view is CREATE
VIEW.
• The view is given a (virtual) table name (or view
name), a list of attribute names, and a query to specify
the contents of the view.
• If none of the view attributes results from applying
functions or arithmetic operations,
▫ we do not have to specify new attribute names for
the view,
▫ since they would be the same as the names of the
attributes of the defining tables in the default case
PREPARED BY SHARIKA T
R,
SNGCE

V
1
V
2
• We can specify SQL queries on a view in the
same way we specify queries involving base
tables.
• For example,
▫ to retrieve the last name and first name of all
employees who work on the ‘ProductX’ project,
we can utilize the WORKS_ON1 view and
specify the query as in QV1:
• A view is supposed to be always up-to-date;
▫ if we modify the tuples in the base tables on
which the view is defined, the view must
automatically reflect these changes.
• Hence, the view is not realized or materialized at the
time of view definition but rather at the time when we
specify a query on the view.
• It is the responsibility of the DBMS and not the user
to make sure that the view is kept up-to-date
DROP
•VIEW
If we do not need a view any more, we can use
the DROP VIEW command to dispose of it.
• For example, to get rid of the view V1, we can use
the SQL statement in V1A:
View Implementation, View Update, and
Inline
View• squery modification
▫ involves modifying or transforming the view
query (submitted by the user) into a query on
the underlying base tables.
▫ For example, the query QV1 would be
automatically modified to the following
query by the DBMS

▫ The disadvantage of this approach is that it is


inefficient for views defined via complex
queries that are time- consuming to execute,
▫ especially if multiple queries are going to be
applied to the same view within a short period
View
Materialization
• involves physically creating a temporary view table when the
view is first queried and keeping that table on the assumption
that other queries on the view will follow
• an efficient strategy for automatically updating the view table
when the base tables are updated must be developed in order to
keep the view up-to- date.
• Techniques using the concept of incremental update have been
developed for this purpose,
▫ where the DBMS can determine what new tuples must be
inserted, deleted, or modified in a materialized view table when
a database update is applied to one of the defining base tables

The view is generally kept as a materialized
(physically stored) table as long as it is being
queried.
• If the view is not queried for a certain period of
time, the system may then automatically remove
the physical table and recompute it from scratch
when future queries reference the view
Updating of views is complicated and can
be ambiguous
• an update on a view defined on a single table
without any aggregate functions can be mapped
to an update on the underlying base table under
certain conditions.
• For a view involving joins,
▫ an update operation may be mapped to
update operations on the underlying base
relations in multiple ways.
• Hence, it is often not possible for the DBMS to
determine which of the updates is intended.
• consider the WORKS_ON1 view, and suppose that
we issue the command to update the PNAME
attribute of ‘John Smith’ from ‘ProductX’ to
‘ProductY’.
• This view update is shown in UV1
• a view update is feasible when only one
possible update on the base relations can
accomplish the desired update effect on
the view.
• Whenever an update on the view can be
mapped to more than one update on the
underlying base relations,
▫we must have a certain procedure for
choosing one of the possible updates as
the most likely one.
▫ Some researchers have
developed methods for choosing
the most likely update, while other
researchers prefer to have the user
• A view with a single defining table is updatable if
the view attributes contain the primary key of the
base relation, as well as all attributes with the
NOT NULL constraint that do not have default
values specified.
• Views defined on multiple tables using joins are
generally not updatable.
• Views defined using grouping and aggregate
functions are not updatable.
Specifying General Constraints as
Assertions in SQL
• Assertions = conditions that the database must always satisfy
• Domain constraints and referential-integrity constraints are
specific forms of assertions
• CHECK – verify the assertion on one-table, one-attribute
• ASSERTION – verify one or more tables, one or more attributes
• Some constraints cannot be expressed by using only domain
constraints or referential-integrity constraints; for example,
▫ “Every department must have at least five courses offered
every semester”

must be expressed as an assertion
Example
o specify the constraint that the salary of an employee must
T1
not be greater than the salary of the manager of the
department that the employee works for
in SQL, we can write the following assertion:.
•The constraint name SALARY_CONSTRAINT is
followed by the keyword CHECK, which is followed
by a condition in parentheses that must hold true o n
every database state for the assertion to be satisfied.
•The constraint name can be used later to
refer to the constraint or to modify or drop
it.
•constraints can be specified using the EXISTS
and NOT EXISTS style of SQL conditions.
•Whenever some tuples in the database cause the
condition of an
ASSERTION statement to evaluate to FALSE, the
constraint is violated
TRIGGE
R A trigger is a stored procedure in database which

automatically invokes whenever a special event in the


database occurs.
• action to be taken when certain events occur
and when certain conditions are satisfied.
• For example,
▫ a trigger can be invoked when a row is inserted into a
specified table or when certain table columns are being
updated.
▫ it may be useful to specify a condition that, if
violated, causes some user to be informed of the
Benefits of
Triggers
• Generating some derived column values
automatically
• Enforcing referential integrity
• Event logging and storing information on table
access
• Auditing
• Synchronous replication of tables
• Imposing security authorizations
• Preventing invalid transactions
Creatin
g
Trigger
s
• CREATE [OR REPLACE] TRIGGER
trigger_name
▫ Creates or replaces an existing trigger
with the trigger_name.
• {BEFORE | AFTER | INSTEAD OF}
▫ This specifies when the trigger will be
executed.
▫ The INSTEAD OF clause is used for creating
trigger on a view.
• {INSERT [OR] | UPDATE [OR] | DELETE}
▫ This specifies the DML operation.
• [OF col_name]
▫ This specifies the column name that will be
updated.
• [ON table_name]
• [REFERENCING OLD AS o NEW AS n]
▫ This allows you to refer new and old values for
various DML statements, such as INSERT,
UPDATE, and DELETE.
• [FOR EACH ROW]
▫ This specifies a row-level trigger, i.e., the
trigger will be executed for each row being
affected.
▫ Otherwise the trigger will execute just once when
the SQL statement is executed, which is called a
table level trigger.
• WHEN (condition)
▫ This provides a condition for rows for which
the trigger would fire.
▫ This clause is valid only for row-level triggers.
Exampl
e • creates a row-level trigger
for the customers table
that would fire for
INSERT or UPDATE or
DELETE operations
performed on the
CUSTOMERS table.
• This trigger will display
the salary difference
between the old values
and new values
• The following points need to be considered
here −
▫ OLD and NEW references are not available for
table- level triggers, rather you can use them
for record-level triggers.
▫ If you want to query the table in the same
trigger, then you should use the AFTER
keyword,
□ because triggers can query the table or change
it again only after the initial changes are
applied and the table is back in a consistent
state.
▫ The above trigger has been written in such a
way that it will fire before any DELETE or
INSERT or UPDATE operation on the table,
□ but you can write your trigger on a
Triggering a
Trigger
• Here is one INSERT statement, which will create a new
record in the table

• When a record is created in the CUSTOMERS table, the


above create trigger, display_salary_changes will be fired
and it will display the following result

Because this is a new record, old salary is


not available and the above result comes
as null.
• The UPDATE statement will update an
existing record in the table

• When a record is updated in the


CUSTOMERS table, the above create
trigger, display_salary_changes will be
fired and it will display the following
result
Consider the relational database, where the primary keys are
underlined. Give an expression in SQL for each of the
following queries: Employee(person_name,street,city)
Works(person_name,company_name,salary)
Company(company_name,city)
Manages(person_name,manager_name)
1.Find the names of all employees who work for the SB
Corporation.
2.Find the names of all employees who live in the same city and
on the same street as do their managers.
3.Find the names, street address, and cities of residence of all
employees who work for First Bank Corporation and earn
more than $20,000 per annum.
4.Find the names of all employees in this database who live in
the same city as the company for which they work.
• Select person_name from Works Where company_name=’SB Corporation'

• SELECT e.person_name FROM Employee e, Employee m, Manages mng WHERE e.street =


m.street AND e.city = m.city AND mng.person_name = e.person_name AND
mng.manager_name = m.person_name

• SELECT e.person_name, e.street, e.city FROM Employee e, Works w WHERE e.person_name


= w.person_name AND w.company_name = 'First Bank Corporation' AND w.salary > 20000

• SELECT e.person_name FROM Employee e, Works w, Company c WHERE e.city = c.city AND
w.company_name = c.company_name AND e.person_name = w.person_name
SYLLAB
US
• SQL DML (Data Manipulation Language)
▫ SQL queries on single and multiple queries
tables,
and Nested
non-correlated), Aggregation and grouping,
(correlated
Views, assertions, Triggers, SQL data types.
• Physical Data Organization
▫ Review of terms: physical and logical records, blocking factor,
pinned and unpinned organization. Heap files,
Indexing, Singe level indices, numerical examples,
Multi-level-indices, numerical examples, B-Trees & B+-Trees
(structure only, algorithms not required), Extendible
Hashing, Indexing on multiple keys – grid files
Fixed and Variable length
records
• A file is a sequence of records.
• In many cases, all records in a file are of the
same record type.
• If every record in the file has exactly the
same size (in bytes), the file is said to be
made up of fixed-length records.
• If different records in the file have different sizes,
the file is said to be made up of variable-length
records.
Spanned and Unspanned
Organization
•Part of the record can be stored on one block and
the rest on another.
• A pointer at the end of the first block points to
the block containing the remainder of the
record.
• This organization is called spanned because
records can span more than one block.
• Whenever a record is larger than a block, we
must use a spanned organization.
• If records are not allowed to cross block
Blocking factor for the
file
• The records of a file must be allocated to disk blocks
because a block is the unit of data transfer between
disk and memory.
• When the block size is larger than the record size, each
block will contain numerous records, although some
files may have unusually large records that cannot fit
in one block.
• Suppose that the block size is B bytes.
• For a file of fixed-length records of size R bytes, with B ≥
R, we can fit bfr =B / R records per block.
• The value bfr is called the blocking factor for the file.
Index Structures
• Indexing is a data structure technique to
efficiently retrieve records from the database
files based on some attributes on which the
indexing has been done.
• An index on a database table provides a
convenient mechanism for locating a row
(data record) without scanning the entire
table and thus greatly reduces the time it
takes to process a query.
• The index is usually specified on one field of
the file.
• One form of an index is a file of entries
<field value, pointer to record>, which is
ordered by field value.
Types of
index
• Indexes can be characterized as
1. Dense index
2. Sparse index

A dense index has an index entry for
every search key value (and hence every
record) in the data file.

A sparse (or nondense) index, on the
other hand, has index entries for only
some of the search values.
• Advantages:
▫Stores and organizes data into computer
files.
▫Makes it easier to find and access data
at any given time.
▫It is a data structure that is added to
a file to provide faster access to the
data.
▫It reduces the number of blocks that the
DBMS has to check.
• Disadvantages
▫Index needs to be updated periodically for
insertion or deletion of records in the
Structure of
index
• An index is a small table having only two
columns.
• The first column contains a copy of the
primary or candidate key of a table
• The second column contains a set of pointers
holding the address of the disk block where that
particular key value can be found.
• If the indexes are sorted, then it is called as
ordered indices.
Exampl
e• Suppose we have an ordered file with 30,000
records and stored on a disk of block size 1024
bytes and records are of fixed size, unspanned
organisation. Record length = 100 bytes. How
many block access needed to search a record?
Types of Indexes
Primary
Index
• Primary index is defined on an ordered data file.
The data file is ordered on a key field.
• The key field is generally the primary key of the
relation.
• The first record in each block of the data file is called the
anchor record of the block or the block anchor.
• A primary index is a nondense (sparse) index, since
it includes an entry for each disk block of the data
file and the keys of its anchor record rather than
for every search value.
Figure 5.1: Primary Index on the Ordering Key Field
of the File
Exampl
e• Suppose we have an ordered file with 30,000
records and stored on a disk of block size 1024
bytes and records are of fixed size, unspanned
organisation.
• Record length = 100 bytes. How many block
access if using a primary index file, with an
ordering key field of the file 9 bytes and block
pointer size 6 bytes.
Clustering
Index
• Defined on an ordered data file.
• The data file is ordered on a non-key field unlike
primary index, which requires that the ordering field
of the data file have a distinct value for each record.
• Includes one index entry for each distinct value of the
field;
▫ the index entry points to the first data block that
contains records with that field value.
• It is another example of nondense index where
Insertion and Deletion is relatively straightforward
with a clustering index.
Figure 5.2: A Clustering Index on Dept_number Ordering Nonkey
Field of a File
Secondary
Index
• A secondary index provides a secondary means of
accessing a file for which some primary access
already exists.
• The secondary index may be on a field which is a
candidate key and has a unique value in every
record, or a non-key with duplicate values.
• The index is an ordered file with two fields.
• The first field is of the same data type as some non-
ordering field of the data file that is an indexing
field.
• The second field is either a block pointer or a
record pointer.
• There can be many secondary indexes (and
hence, indexing fields) for the same file.
• Includes one entry for each record in the data file;
Exampl
e• Suppose we have an ordered file with 30,000
records and stored on a disk of block size 1024
bytes and records are of fixed size, unspanned
organisation. Record length = 100 bytes. How
many block access if using a secondary index file.
Figure 5.3: Dense Secondary Index (with Block Pointer) on
a Non Ordering Key Field of the File
Figure 5.4: Secondary Index (with Record Pointer) on a Non Key Field implemented
using one level of indirection so that Index entries are of Fixed Length and have unique
field values
Single level and Multi-level
indexing
• Because a single-level index is an ordered
file, we can create a primary index to the
index itself;
• In this case, the original index file is called
the first- level index and the index to the
index is called the second-level index.
• We can repeat the process, creating a third,
fourth, ..., top level until all entries of the top
level fit in one disk block.
• A multi-level index can be created for any type
of first- level index (primary, secondary,
Figure 5.5: Two-Level Primary
Search Trees
• A search tree is slightly different from a
multilevel index.
• A search tree of order p is a tree such that each
node contains at most p − 1 search values and
p pointers in the order <P1, K1, P2, K2, ...,
Pq−1, Kq−1, Pq>, where q
≤ p.
• Each Pi is a pointer to a child node (or a
NULL pointer), and each Ki is a search
value from some ordered set of values.
• Two constraints must hold at all times on the
search tree:
1. Within each node, K1 < K2 < ... < Kq−1.
2. For all values X in the subtree pointed at
Figure 5.6: A node in a search tree with pointers to subtrees below
it
B-
Trees
•The B-tree has additional constraints that
ensure that the tree is always balanced.
• A B-tree of order p, when used as an access
structure on a key field to search for records in
a data file, can be defined as follows:
1. Each internal node in the B-tree is of the
form
□ <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq–1,
Prq–1>,
Pq> where q ≤ p. Each Pi is a tree pointer—
a pointer to another node in the Btree. Each
Pri is a data pointer—a pointer to the record
whose search key field value is equal to Ki.
2. Within each node, K1 < K2 < ... < Kq−1.
3. For all search key field values X in the subtree
pointed at by Pi, we have: Ki–1 < X < Ki for 1 <
i < q; X < Ki for i = 1; and Ki–1 < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has
at least
┌(p/2)┐tree pointers. The root node has at least
two tree pointers unless it is the only node in
the tree.
6. A node with q tree pointers, q ≤ p, has q – 1
search key field values (and hence has q – 1
data pointers).
7. All leaf nodes are at the same level. Leaf
nodes have the same structure as internal
nodes except that all of their tree pointers Pi
are NULL
Figure 5.7: B-tree structures. (a) A node in a B-tree with q –
1 search values.
(b) A B-tree of order p = 3.The values were inserted in
the order 8, 5, 1, 7, 3, 12, 9, 6.
• A B-tree starts with a single root node (which
is also a leaf node) at level 0 (zero).
• Once the root node is full with p – 1 search
key values and we attempt to insert another
entry in the tree, the root node splits into two
nodes at level 1.
• Only the middle value is kept in the root node,
and the rest of the values are split evenly
between the other two nodes.
• When a nonroot node is full and a new
entry is inserted into it,
▫ that node is split into two nodes at the
same level, and the middle entry is moved to
the parent node along with two
pointers to the new split nodes.
• If the parent node is full, it is also split.
• If deletion of a value causes a node to be less than
half full,
▫ it is combined with its neighboring nodes,
and this can also propagate all the way to
the root.
▫Hence, deletion can reduce the number of tree
levels.
Properties of a B-
tree
• For a tree to be classified as a B-tree, it must fulfill the
following conditions:
▫ the nodes in a B-tree of order m can have a
maximum of m children
▫ each internal node (non-leaf and non-root) can have at
least (m/2) children (rounded up)
▫ the root should have at least two children – unless it’s
a leaf
▫ a non-leaf node with k children should have k-1 keys
▫ all leaves must appear on the same level
Building a B-
tree
• Since we’re starting with an empty tree, the first item
we insert will become the root node of our tree.
• At this point, the root node has the key/value pair.
• The key is 1, but the value is depicted as a star to make
it easier to represent, and to indicate it is a reference
to a record.
• The root node also has pointers to its left and right
children shown as small rectangles to the left and
right of the key.
• Since the node has no children, those pointers will
be empty for now:
We know that this tree has order of 3, so it
can have only up to 2 keys in it. So we can
add the payload with key 2 to the root node in
ascending order:

Next, if we wanted to insert 3, for us to


keep the tree balanced and fulfilling the
conditions of a B-tree we need to perform
what is called a split operation.
We can determine how to split the node by
Now, let’s insert 4. To determine where this needs to be placed
we must remember that B-trees are organized such that sub-
trees on the right have greater keys than sub-trees on the left.
Consequently, Key 4 belongs in the right sub-tree. And since
the right sub-tree still has the capacity, we can simply add 4 to
it alongside 3 in ascending order:
Our right sub-tree is now at full capacity, so to insert 5 we need to
use the same splitting logic explained above. We split the node into
two so that Key 3 goes to a left sub-tree and 5 goes to a right sub-
tree leaving 4 to be promoted to the root node alongside 2.
This rebalancing gives us space in the rightmost sub-tree
to insert 6:
Next, we try to insert 7. However, since the rightmost tree is now at
full capacity we know that we need to do another split operation and
promote one of the keys. But wait! The root node is also at full
capacity, which means that it also needs to be split.
So, we end up doing this in two steps. First, we need to split the right
nodes 5 and 6 so that 7 will be on the right, 5 will be on the left, and
6 will be promoted.
Then, to promote 6, we need to split the root node such that 4 will
become a part of new root and 6 and 2 become the parents of the
Continuing in this way, we fill the tree by adding Keys 8,9 and
10 until we get the final tree:
B+-
Trees
• In a B+-tree, data pointers are stored only at the
leaf nodes of the tree; hence, the structure of leaf
nodes differs from the structure of internal nodes.
• The leaf nodes have an entry for every value of
the search field, along with a data pointer to the
record.
• The structure of the internal nodes of a
B+ tree of order p is as follows:
1. Each internal node is of the form <P1, K1,
P2, K2, ..., Pq
– 1, Kq –1, Pq> where q ≤ p and each Pi
is a tree pointer.
2. Within each internal node, K1 < K2 < ...
< Kq−1.
3. For all search field values X in the subtree
pointed at by Pi, we have Ki−1 < X≤ Ki for 1 <
i < q; X ≤ Ki for i = 1; and Ki−1 < X for i = q.
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least
┌(p/2)┐ tree pointers. The root node has at
least two tree pointers if it is an internal
• The structure of the leaf nodes of a
B+-tree of order p is as follows:
1. Each leaf node is of the form <<K1,
Pr1>, <K2, Pr2>, ..., <Kq–1, Prq–1>,
Pnext> where q ≤ p, each Pri is a data
pointer, and Pnext points to the next
leaf node of the B+-tree.
2. Within each leaf node, K1 ≤ K2 ... ,
Kq−1, q ≤ p.
3. Each Pri is a data pointer that points to
the record whose search field value is
Ki or to a file block containing the
record
4. Each leaf node has at least ┌(p/2)┐
Figure 5.8: The nodes of a B+-tree. (a) Internal node of a B+-tree
with q – 1 search values.
(b) Leaf node of a B+-tree with q – 1 search values and q – 1 data
pointers.
• When a leaf node is full and a new entry is inserted
there, the node overflows and must be split.
• The first j = ┌((pleaf + 1)/2)┐entries in the original
node are kept there, and the remaining entries are
moved to a new leaf node.
• The jth search value is replicated in the parent internal
node, and an extra pointer to the new node is created
in the parent.
• These must be inserted in the parent node in
their correct sequence.
• If the parent internal node is full, the new value will
cause it to overflow also, so it must be split.
• The entries in the internal node up to Pj—the jth
tree pointer after inserting the new value and
pointer, where j = ⎣((p + 1)/2)⎦—are kept, while the
jth search value is moved to the parent, not
replicated.
Building a
B+tree
When we come to insert Key 3, we find that in doing so we will
exceed the capacity of the root node.Similar to a normal B-tree
this means we need to perform a split operation. However,
unlike with the B-tree, we must copy-up the first key in the
new rightmost leaf node. As mentioned, this is so we can make
sure we have a key/value pair for Key 2 in the leaf nodes:
Next, we add Key 4 to the rightmost leaf node. Since it’s full,
we need to perform another a split operation and copy-up
Key 3 to the root node:
Now, let’s add 5 to the rightmost leaf node. Once again to keep the order,
we’ll split the leaf node and copy-up 4. Since that will overflow the root
node, we’ll have to perform another split operation splitting the root
node into two nodes and promoting 3 into a new root node
Notice the difference between splitting a leaf node and splitting an
internal node. When we split the internal node in the second split
operation we didn’t copy-up Key 3.
In the same way, we keep adding the keys from 6 to 10, each time
splitting and copying-up when necessary until we have reached our
final tree:
Searching a
B+tree
• Searching for a specific key within a B+tree is very
similar to searching for a key in a normal B-tree.
• Let’s see what it would be like to search for Key 6 again
but on the B+tree:
• The shaded nodes show us the path we have taken in
order to find our match. Deduction tells us that
searching within a B+tree means that we must go
all the way down to a leaf node to get the satellite
data. As opposed to B-trees where we could find the
data at any level.
• In addition to exact key match queries, B+trees
support range queries. This is enabled by the fact
that the B+tree leaf nodes are all linked together. To
perform a range query all we need to do is:
▫find an exact match search for the lowest key
▫and from there, follow the linked list until we reach the
leaf node with the maximum key
Example of an Insertion in a B+-tree

Figure 5.9: An example of insertion in a B+-tree with p = 3 and pleaf = 2.


The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6
Example of a Deletion in a B +-tree
• When an entry is deleted, it is always removed from the leaf
level. If it happens to occur in an internal node, it must also
be removed from there.
• In the latter case, the value to its left in the leaf node must
replace it in the internal node because that value is now the
rightmost entry in the subtree.
• Deletion may cause underflow by reducing the number of
entries in the leaf node to below the minimum required.
• In this case, we try to find a sibling leaf node
▫a leaf node directly to the left or to the right of the node
with underflow and
▫ redistribute the entries among the node and its sibling so
that both are at least half full;
▫ otherwise, the node is merged with its siblings and the
Figure 5.10: An example of deletion from a B+-tree with
Deletion sequence: 5, 12, 9
• Most multi-level indexes use B-tree or B+-tree data structures
because of the insertion and deletion problem.
• This leaves space in each tree node (disk block) to allow for
new index entries.
• These data structures are variations of search trees that
allow efficient insertion and deletion of new search values.
• In B-Tree and B+-Tree data structures, each node
corresponds to a disk block.
• Each node is kept between half-full and completely full.
• An insertion into a node that is not full is quite efficient.
• If a node is full the insertion causes a split into two nodes.
• Splitting may propagate to other tree levels.
• A deletion is quite efficient if a node does not become less
than half full.
• If a deletion causes a node to become less than half full, it must
be merged with neighboring nodes
B-trees in the Context of
Databases
• DBMSs leverage the logarithmic efficiency of B-
tree indexing to reduce the number of reads
required to find a particular record.
• B-trees are typically constructed so that each node
takes up a single page in memory and they are
designed to reduce the number of accesses by
requiring that each node be at least half full.
Difference between B-tree and B+-
tree
• In B+trees, search keys can be repeated but this is not the case
for B-trees
• B+trees allow satellite data to be stored in leaf nodes only,
whereas B-trees store data in both leaf and internal nodes
• In B+trees, data stored on the leaf node makes the
search more efficient since we can store more
keys in internal nodes
▫ this means we need to access fewer nodes
• Deleting data from a B+tree is easier and less time consuming
because we only need to remove data from leaf nodes
• Leaf nodes in a B+tree are linked together making
range search operations efficient and quick
• Finally, although B-trees are useful, B+trees are more
popular. In fact, 99% of database management
systems use B+trees for indexing.
• This is because the B+tree holds no data in the
internal nodes.
• This maximizes the number of keys stored in a node
thereby minimizing the number of levels needed in
a tree.
• Smaller tree depth invariably means faster search.
Extendible
Hashing
• https://www.geeksforgeeks.org/extendible-hashing-dynamic-app
roach-to
-dbms/
Query Optimization: Heuristics-based
Query Optimization
• Query Optimization is the process of choosing a suitable
execution strategy for processing a query.
• Process for heuristics optimization are:
▫ The parser of a high-level query generates an
initial internal representation.
▫ Apply heuristics rules to optimize the internal
representation.
▫ A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
• The main heuristic is to apply first the operations that reduce
the size of intermediate results.
• E.g, Apply SELECT and PROJECT operations before applying
Query
tree
• A tree data structure that corresponds to a
relational algebra expression.
• It represents the input relations of the query as
leaf nodes of the tree, and represents the
relational algebra operations as internal nodes.
• An execution of the query tree consists of
executing an internal node operation
whenever its operands are available and then
replacing that internal node by the relation
that results from executing the operation.
Query
graph
• A graph data structure that corresponds to a
relational calculus expression.
• It does not indicate an order on which
operations to perform first.
• There is only a single graph corresponding
to each query.
Exampl
e• For every project located in ‘Stafford’, retrieve
the project number, the controlling department
number and the department manager’s last
name, address and birthdate.
• Relation algebra:
PREPARED
BY

Figure 5.11: Two query trees for the query Q2.


(a) Query tree corresponding to the relational algebra expression
for Q2.
(b) Initial (canonical) query tree for SQL query Q2.
(c) Query graph for Q2.
Heuristic Optimization of Query
Trees
• The same query could correspond to many
different relational algebra expressions and
hence many different query trees.
• The task of heuristic optimization of query trees is
to find a final query tree that is efficient to
execute.
Example

(a) Initial (canonical) query tree for SQL query Q.


(b) Moving SELECT operations down the query
tree.

(c) Applying the more restrictive SELECT operation first.


(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN
operations

(e) Moving PROJECT operations down the query tree


General Transformation Rules for Relational

Algebra
Operations
PREPARED BY SHARIKA T
R,
SNGCE
PREPARED BY SHARIKA T
R,
SNGCE
Outline of a Heuristic Algebraic Optimization

Algorith
• Using rule 1, break up any select operations with conjunctive
m:conditions into a cascade of select operations.
• Using rules 2, 4, 6, and 10 concerning the commutativity of select
with other operations, move each select operation as far down the
query tree as is permitted by the attributes involved in the select
condition.
• Using rule 9 concerning associativity of binary operations, rearrange
the leaf nodes of the tree so that the leaf node relations with the most
restrictive select operations are executed first in the query tree
representation.
• Using Rule 12, combine a Cartesian product operation with a
subsequent select operation in the tree into a join operation.
• Using rules 3, 4, 7, and 11 concerning the cascading of project and the
commuting of project with other operations, break down and move
lists of projection attributes down the tree as far as possible by
creating new project operations as needed.
Summary of Heuristics
for Algebraic
Optimization
• The main heuristic is to apply first the operations that
reduce the size of intermediate results.
• Perform select operations as early as possible to
reduce the number of tuples and perform project
operations as early as possible to reduce the
number of attributes.
▫ This is done by moving select and project operations
as far down the
tree as possible.
• The select and join operations that are most restrictive
should be executed before other similar operations.
Query Execution
Plans
• An execution plan for a relational algebra query
consists of a combination of the relational algebra
query tree and information about the access
methods to be used for each relation as well as
the methods to be used in computing the
relational operators stored in the tree.
• Materialized evaluation: the result of an operation
is stored as a temporary relation.
• Pipelined evaluation:
EN
D

You might also like