[go: up one dir, main page]

0% found this document useful (0 votes)
43 views19 pages

Relational Algebra

The document discusses query languages used in databases, categorizing them into imperative, functional, and declarative types, with relational algebra serving as a procedural foundation for SQL. It explains the operations of relational algebra, including selection, projection, union, Cartesian product, and set difference, while contrasting it with relational calculus, which is non-procedural and declarative. The document highlights the significance of these languages in enabling efficient database communication and query execution.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views19 pages

Relational Algebra

The document discusses query languages used in databases, categorizing them into imperative, functional, and declarative types, with relational algebra serving as a procedural foundation for SQL. It explains the operations of relational algebra, including selection, projection, union, Cartesian product, and set difference, while contrasting it with relational calculus, which is non-procedural and declarative. The document highlights the significance of these languages in enabling efficient database communication and query execution.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

Relational Algebra

Query Language

A query language allows users to request information from a database. They are usually easier to use than standard
programming languages. Query languages can be categorised into three types: imperative, functional, or
declarative.

Imperative query languages require the user to give a sequence of operations to perform on the database.
They usually have the concept of state variables which are updated throughout the process.
Functional query languages are based on the evaluation of functions. They may operate on database data or
results of other functions. These functions are side-effect free and do not update the program state. The
relational algebra is a functional query language and is the theoretical foundation of the SQL query
language.
Declarative query languages only require the user to describe the information they want without needing to
state how to obtain it. The database system is responsible for determining how to obtain the information. Tuple
relational calculus and domain relational calculus are examples of declarative languages.

Query languages used in practice such as SQL include elements from all three approaches. Relational query
languages use relational algebra to process user requests and tell the database management system (DBMS) to
execute them. They are the means by which users communicate with the database.

Procedural query languages use a set of queries to tell the DBMS to perform a series of transactions in a
specific order to satisfy the user request. The user must specify both the data and the method to retrieve it.
Relational algebra is an example of a procedural query language.
Example: A ‘get_CGPA’ procedure may involve several queries to fetch a student's marks for each subject,
calculate the total marks, and then determine the CGPA based on that total.
Non-procedural query languages only use a single query across one or more tables to get the desired result
from the database. The language only specifies what needs to be done with the tables without defining how to
achieve it.
Relational calculus is a non-procedural language.
Example: A query that fetches the name and address of a student given their ID may only use a single query
on the student table.

SQL is largely based on tuple relational calculus. It is a powerful language that can define data structures, control
access to data and perform operations on the data itself. The language can be categorised into four types of
statements:

Data Definition Language (DDL) defines and modifies the structure of the database.
Data Manipulation Language (DML) allows users to manipulate data in the database.
Data Control Language (DCL) is used to manage user permissions for accessing database objects.
Transaction Control Language (TCL) is responsible for managing database transactions to ensure data
integrity.

Relational algebra

Relational algebra is a procedural query language that forms the theoretical basis of SQL. It is used to
manipulate relations, which are sets of tuples. Relational algebra is one of the languages used to express queries, but
queries must also be expressible in relational calculus to be relationally complete. Relational algebra defines a set of
algebraic operations that take one or more relations as input and return a relation as output. These operations can be
combined to create expressions that express the desired query. Although relational algebra forms the foundation of
SQL, database systems do not allow users to write queries in relational algebra.
There are two main types of relational operations:

Unary operations: operate on a single relation. Examples include select, project, and rename.
Binary operations: operate on two relations. Examples include union, Cartesian product, and set difference.

Database systems typically translate SQL queries into a lower-level representation based on relational
algebra. This representation is used to optimise and evaluate queries. Relational algebra makes it easier to find
efficient, equivalent alternative expressions.

Relational algebra is a set-based language. This means relations are viewed as sets of tuples and therefore
duplicates are not allowed. SQL, however, allows duplicates and requires the DISTINCT keyword to remove them.
Relational algebra provides the methods to retrieve data while SQL specifies what data is required.

Relational calculus

Relational calculus is a non-procedural query language that uses mathematical predicate calculus to define
queries. It expresses the desired result without specifying the steps for retrieval, unlike the procedural approach of
relational algebra. Relational calculus is considered to be more theoretical and closer to natural language than
relational algebra. Relational calculus forms the basis of SQL, but it is rarely used directly in practice due to its
complexity. SQL includes many of the features of relational calculus. It is a declarative language where users tell the
database what data they want, rather than how to obtain it.

There are two main types of relational calculus:

Tuple relational calculus (TRC) selects tuples from a relation based on a specified condition. The result can
include one or multiple tuples. TRC reflects traditional pre-relational file structures.
Domain Relational Calculus (DRC) selects attributes from a relation based on a condition. This type focuses on
specific attributes instead of entire tuples and is more similar to logic as a modelling language.

Relational calculus queries are expressed as: { t | P(t) } Where:

t represents the set of tuples


P(t) is a predicate or condition that is true for the set of tuples

Example of a TRC query: {t | EMPLOYEE (t) AND t.DEPTID = 10} This would select all tuples (_t) from the
EMPLOYEE relation where the employee's department ID (DEPT_ID) is 10.

Example of a DRC query: { | < EMPLOYEE > DEPTID = 10 } This would select the values from the domain where
the EMPLOYEE's _DEPT_ID is 10.

The expressiveness of relational calculus is used as a standard to judge the power of other query languages.
A database language is considered relationally complete if the queries written in it can also be expressed in relational
calculus.

Key differences between relational algebra and relational calculus:

Feature Relational Algebra Relational Calculus


Language Type Procedural Declarative
Query Focus How to obtain the result What result to obtain
Order of Operations Specified Not specified
Domain Dependency Independent Can be domain-dependent (due to DRC)
Feature Relational Algebra Relational Calculus
Similarity to... Programming language Natural language
SQL Influence SQL includes some features SQL is largely based on it
Relational Queries should also be expressible in relational Must be at least as powerful as relational calculus
Completeness calculus
Query Evaluation Relies on specified order Order of operations does not matter
Database Access Step-by-step solution for how to retrieve Specifies what is required and lets the system find the
information solution

In our previous conversation, it was established that relational algebra is a functional query language, and tuple
relational calculus and domain relational calculus are declarative languages. You asked about the specific
features of relational algebra. This response focuses on relational calculus, covering its types, expressive power and
how it compares to relational algebra.

Relational Algebra

Relational algebra is a procedural query language that provides a theoretical foundation for relational
databases and SQL. It is used to manipulate relations, which are essentially tables with rows and columns. Each
operation in relational algebra takes one or more relations as input and produces a new relation as output.
These operations can be combined to form complex expressions that represent database queries.

There are two main categories of relational operations: unary and binary. Unary operations work on a single relation,
while binary operations work on two relations.

Some of the key operations in relational algebra include:


Select (σ): This operation selects tuples (rows) from a relation based on a specified condition.
Project (Π): This operation creates a new relation by selecting specific attributes (columns) from an existing
relation and eliminates duplicate tuples.
Union (∪ ): This operation combines tuples from two relations, producing a new relation that contains all tuples
from both input relations, without duplicates.
Set Difference (-): This operation finds the tuples that are present in one relation but not in another.
Cartesian Product (×): This operation combines each tuple from the first relation with every tuple from the
second relation, resulting in a relation with all possible combinations.
Rename (ρ): This operation changes the name of a relation or an attribute within a relation.
Join Operations (⋈ ): These operations combine tuples from two relations based on a shared attribute or a
specific condition. Several types of joins exist, including natural join, theta join, and equi-join.

Relational algebra is procedural in nature, which means you need to specify the steps involved in obtaining the
desired result. This is in contrast to SQL, which is declarative and only requires you to specify what data you want
without detailing how to get it. However, relational algebra serves as the theoretical basis for SQL and is often
used by database systems to optimise and execute SQL queries.

Basic Operator

1. Projection

Projection (Π) is a fundamental operation in relational algebra that creates a new relation by selecting specific
attributes (columns) from an existing relation. It effectively reduces the width of the relation, keeping only the
desired columns. The projection operation also eliminates duplicate tuples, ensuring that the resulting relation
contains only unique rows**.

Here's a breakdown of the key aspects of projection:


Purpose: To extract specific columns from a relation.
Input: A relation (table).
Output: A new relation containing only the specified attributes.
Duplicate Elimination: Projection removes duplicate rows from the result, ensuring uniqueness.
Column | Vertical Projection
Notation:

Π Column Name (Table Name)

where:

Π represents the projection operation.


Column Name is a comma-separated list of attributes to be included in the new relation.
R is the name of the input relation.

Example:

Consider a relation named "Students" with attributes (StudentID, Name, Age, City). To obtain a new relation with only
the StudentID and Name columns, you would use the following projection operation:

Π StudentID, Name (Students)

This operation would create a new relation containing two columns, "StudentID" and "Name," with all unique
combinations of student IDs and names from the original "Students" relation.

In our previous conversation, you asked about relational algebra, and this response specifically addresses your
follow-up question about the "projection" operation**. This explanation expands on the concept of projection,
highlighting its purpose, notation, and the important aspect of duplicate elimination.

2. Selection

Selection (σ) is a fundamental operation in relational algebra that retrieves tuples (rows) from a relation
(table) based on a specified condition. It filters the relation, keeping only the tuples that satisfy the given condition.
The condition is expressed using a predicate, which involves comparison operators such as:

Row | Horizontal Selection

Equal to (=)
Not equal to (≠ or <>)
Less than (<)
Greater than (>)
Less than or equal to (≤ or <=)
Greater than or equal to (≥ or >=)

Selection is a unary operation, meaning it operates on a single relation.

Notation:

σ condition (Table Name)

Where:

σ represents the selection operation.


is the predicate that determines which tuples to select.
R is the name of the input relation.

Example:

Consider a relation named "Employees" with attributes (EmpID, Name, Department, Salary). To select employees
from the "Finance" department, you would use the following selection operation:

σ Department = 'Finance' (Employees)

This operation would create a new relation containing only those tuples from the "Employees" relation where
the "Department" attribute is equal to 'Finance'.

In our previous conversation about relational algebra, we discussed its various operations, including
selection. You have now asked for a more detailed explanation of selection. This response delves deeper into the
concept of selection, outlining its purpose, notation, and providing a specific example to illustrate its functionality. It
also highlights the role of comparison operators in formulating selection conditions.

*Combine both

Π column Name (σ condition (Table Name))

Union

The union operation (∪ ) in relational algebra combines all the tuples from two relations, producing a new
relation that contains all unique tuples from both input relations. It's similar to the concept of set union in
mathematics. However, for the union operation to be valid in relational algebra, the two input relations must be union-
compatible.
*Similar to Full Outer Join

Union-compatibility requires the following conditions to be met:

Both relations must have the same number of attributes ( same No. of Columns ).
The corresponding attributes in both relations must have compatible data types
i.e. Same Domain . For example, you can't perform a union on relations where one has an attribute for 'salary'
as a number and the other has an attribute for 'salary' as text.

The union operation automatically eliminates duplicate tuples, resulting in a relation with only distinct tuples.

Notation:

R∪S

where:

R and S are the input relations.


∪ represents the union operation.

Example:

Let's say you have two relations:

Students_A with attributes (StudentID, Name, Major)


Students_B with attributes (StudentID, Name, Major)

To combine all students from both relations into a single relation called "All_Students," you would use the following
union operation:

All_Students = Students_A ∪ Students_B

The resulting "All_Students" relation would contain all unique students from both "Students_A" and
"Students_B." If a student exists in both input relations, they would appear only once in the "All_Students" relation.

The union operation is commutative, meaning the order of the input relations doesn't affect the result:

R∪S=S∪R

Key points from our conversation history relevant to understanding the Union operation:

You previously asked about relational algebra, which is a procedural query language. The union operation is one
of the fundamental operations in relational algebra.
We discussed that relational algebra is set-based, meaning it works on sets of tuples. The union operation
embodies this concept by combining two sets of tuples.

This explanation goes beyond a simple definition of the union operation and focuses on its practical application,
including the conditions for its validity and the importance of duplicate elimination. The example clarifies the concept
further. By connecting it to our previous conversation about relational algebra's procedural and set-based nature, this
response enhances your overall understanding of the topic.

Cross Product (x)


The Cartesian Product (×), also known as the cross product, is a fundamental operation in relational algebra
that combines every tuple from the first relation with every tuple from the second relation . This results in a
new relation containing all possible combinations of tuples from the two input relations.

The Cartesian product is different from other join operations like Natural Join or Equi-join, which apply specific
conditions to filter the combinations. The Cartesian product doesn't apply any conditions; it simply creates all possible
pairings.
#Joins
Example:

Let's say you have two relations:

"Employees" with attributes (EmpID, EmpName, Department)


"Projects" with attributes (ProjectID, ProjectName, Manager)

The Cartesian product of "Employees" × "Projects" would create a new relation with attributes (EmpID, EmpName,
Department, ProjectID, ProjectName, Manager). This new relation would have a tuple for every possible
combination of an employee and a project. So, if there are 5 employees and 3 projects, the resulting relation would
have 15 (5 x 3) tuples.

Properties of Cartesian Product:

Number of Tuples: The number of tuples (row) in the resulting relation is the product of the number of tuples in
each input relation.
Number of Attributes: The number of attributes in the resulting relation is the sum of the number of attributes in
each input relation.
No Shared Attributes Required: Unlike some other join operations, the Cartesian product doesn't require the
input relations to have any shared attributes.

Significance in Relational Algebra:

Foundation for Other Joins: While the Cartesian product itself is rarely used directly in database queries due to
the large result it generates, it serves as the foundation for other more practical join operations like Natural Join,
Equi-join, and Theta Join. These other join types are essentially filtered versions of the Cartesian product,
selecting only specific combinations based on shared attributes or conditions.
Theoretical Basis: The Cartesian product helps in understanding the underlying mechanism of how different
tuples from relations can be combined, which is essential for comprehending other relational algebra operations.

Relationship to Our Conversation:

We've previously discussed relational algebra and its various operations, including selection, projection, and
union. The cross product adds to your understanding of the different ways relations can be manipulated in
relational algebra.
You specifically asked for an explanation of the cross product, highlighting your interest in this specific operation.
This response fulfills your request by providing a detailed explanation with an example, properties, and its
significance in the context of relational algebra.

It's important to note that while SQL databases use the concept of the Cartesian product, you rarely write queries
explicitly using "CROSS JOIN." Usually, you specify conditions using "WHERE" clauses, which effectively filter the
Cartesian product to get the desired results.

Set Difference
The Set Difference operation (-) in relational algebra finds tuples present in one relation but not in another .
It's analogous to the set difference operation in mathematics. The resulting relation contains only the tuples that are
unique to the first relation.

Conditions for Set Difference:

Same Number of Attributes: Both input relations must have the same number of attributes (same no. columns).
Compatible Data Types: The corresponding attributes in both relations must have compatible data types.

Notation:

R-S

Where:

R and S are the input relations.


'-' represents the Set Difference operation.

Example:

Let's say you have two relations:

"Borrower" with attributes (CustomerID, CustomerName, LoanAmount).


"Depositor" with attributes (CustomerID, CustomerName, AccountBalance).

To find customers who are borrowers but not depositors, you would use the Set Difference operation:
Borrowers_Only = Borrower - Depositor

The "Borrowers_Only" relation would contain tuples representing customers who have loans but don't have deposit
accounts.

Relationship to Our Conversation:

You've asked about Set Difference as part of your exploration of relational algebra, showing your interest in
understanding how to manipulate relations to extract specific information.
We've discussed relational algebra's set-based nature. Set Difference aligns with this concept by operating on
sets of tuples and finding distinct elements in one set that are not present in another.
This explanation builds on our previous discussions about other relational algebra operations like selection,
projection, union, and cross product.

The Set Difference operation is not commutative, meaning the order of input relations is crucial:

R-S≠S-R

Derived Operator

Division Operator

The Division operator (÷) in relational algebra is used to find tuples in one relation (the dividend) that are associated
with all tuples in another relation (the divisor). It's a less common operation compared to others like selection,
projection, or joins, but it's valuable for specific types of queries.
Imagine you have a relation A with attributes ( P , Q ) and another relation B with attribute ( Q ). The division A ÷ B
would result in a relation with attribute ( P ) containing those values of P in A that are associated with every value of
Q in B .

Here's a way to conceptualize it:

1. Match Q Values: For each tuple in A , check if its Q value appears in every tuple of relation B .
2. Select P : If a tuple in A satisfies the matching condition (its Q value is present for all Q values in B ), select
its P value for the resulting relation.

Example:

Let's say you have these relations:

Courses (CourseID, CourseName, Dept)


Students (StudentID, StudentName, CourseID)

Query: Find the students who have taken all the courses offered by the 'Computer Science' department.

Using Division:

1. Create a temporary relation CS_Courses with CourseIDs of all 'Computer Science' courses: CS_Courses = Π
CourseID (σ Dept = 'Computer Science' (Courses))
2. Apply the Division operator: Result = Π StudentID (Students) ÷ CS_Courses

The Result relation will contain the StudentIDs of students who have taken every course listed in the CS_Courses
relation.
Explanation in the Context of Our Conversation:

You've been systematically exploring relational algebra operations. We've discussed selection, projection, union,
cross product, and set difference. The division operator adds another tool to your relational algebra toolkit,
allowing you to answer queries about "all" relationships between tuples.
This explanation connects to the set-based nature of relational algebra we've discussed. The division operation is
based on set comparisons, ensuring a tuple is included in the result only if it meets the condition of being
associated with all elements in another set.

Important Considerations:

The division operation can be complex to implement efficiently.


It's often possible to express division queries using other relational algebra operations, although this might lead to
more complex expressions.
Source provides a formal definition of the division operator and challenges you to write a query using it and then
express the same query without division (Source).

SQL and Division:

Standard SQL doesn't have a direct equivalent of the relational algebra division operator. However, you can usually
express division queries using combinations of other SQL clauses like EXISTS , NOT EXISTS , or subqueries.

Domain Relational Calculus

Domain Relational Calculus (DRC) is a non-procedural query language that focuses on selecting specific
attributes (columns) from relations based on defined conditions. Instead of selecting entire tuples like Tuple
Relational Calculus (TRC), DRC pinpoints the attributes you want and the criteria they must meet.
Here's how DRC works:

Specify Desired Attributes: You list the attributes you want to retrieve in the result.
Define Conditions: You use logical expressions and predicates to specify the conditions that the attributes must
satisfy. These conditions often involve comparing attribute values to constants or to values from other attributes.
No Procedural Steps: You don't specify how to retrieve the data. The database system determines the most
efficient way to find the attributes that meet your criteria.

Example:

Let's consider a relation "Employees" with attributes: EmpID, EmpName, Department, Salary.

DRC Query: Find the names and salaries of employees in the Sales department.

1 {EmpName, Salary | <EmpName, Salary, Department> ∈ Employees ∧ Department = 'Sales'}

Explanation:

{EmpName, Salary} : Specifies the attributes to be retrieved (EmpName and Salary).


<EmpName, Salary, Department> ∈ Employees : Indicates that the values for EmpName, Salary, and Department
should come from the "Employees" relation.
∧ Department = 'Sales' : Defines the condition that the Department attribute must equal 'Sales'.

Key Differences from TRC:

Focus: DRC focuses on individual attributes, while TRC selects entire tuples.
Variable Representation: In DRC, variables represent values drawn from specific domains (like a specific
employee's salary). In TRC, variables represent entire tuples.

Relationship to SQL:

While SQL borrows more from TRC, DRC also influences its structure. The "SELECT" clause in SQL, where you
specify the columns to retrieve, aligns with the attribute selection in DRC.

Connecting to Our Conversation:

You've been exploring different Relational Algebra operations, which are procedural. DRC provides a declarative
contrast, emphasizing what data you want without outlining the steps.
We've discussed the concept of domains in relational databases. DRC's name reflects its emphasis on domains,
as the conditions you define often involve comparing attribute values within their specific domains.

DRC offers a more mathematically formal way to express queries compared to SQL. It's not as commonly used for
practical database interactions, but understanding DRC helps in grasping the theoretical foundation of query
languages and how they relate to the relational model.

You might also like