[go: up one dir, main page]

0% found this document useful (0 votes)
41 views65 pages

Lec3 1

Uploaded by

nathandyoung1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views65 pages

Lec3 1

Uploaded by

nathandyoung1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 65

Database Systems I

CMPT 354 Summer 2024


Zhengjie Miao

15 May 2024
Course Setup
• OH slots:
• Tue 2pm-3pm, Zhengjie@TASC 1 9407
• Mon 11am-12pm, Xinyi@ASB9812
• Wed 1pm-2pm, Obumneme@ASB9812
• Thu 3pm-4pm, Zahra@ASB9812

• Sign up for Piazza!


• The place to ask course-related questions
• https://piazza.com/sfu.ca/summer2024/cmpt354d100
• Access code: qc76kdzsz9l
Recap: MapReduce
• Which design is better for OLAP?
• MapReduce vs. Relational DBMS?
• Many problems can be processed in MapReduce pattern:
• Given a lot of unsorted data
• Map: extract something of interest from each record
• Shuffle: group the intermediate results in some way
• Reduce: further process (e.g., aggregate, summarize, analyze, transform)
each group and write final results
(Customize map and reduce for problem at hand)

3
MR Example - Map
1.com 2.com 3.org 4.edu key: document id
ABC BBC CBC AAC value: document contents

A, 1 C, 3 B, 2 A, 4 map(String key, String value):


B, 1 B, 3 C, 2 C, 4
C, 1 for each word w in value:
(Local disk)
Emit_Intermediate(w, document id);

4
MR Example - Shuffle & Reduce
A, 1 C, 3 B, 2 A, 4
B, 1 B, 3 C, 2 C, 4
C, 1

Values exchanged by shuffle process


key: a word
A, { 1, 4 } values: a list of doc id
B, { 1, 3, 2 }
C, { 3, 1, 2, 4 }

A, [1, 4] The reduce function accepts all pairs for a


B, [1, 2, 3] given word, sorts the corresponding
C, [1, 2, 3, 4] document IDs

5
Trends
• 1 Hybrid Transactional and Analytical Processing

• 2. Cloud-Native Databases

• 3. Lakehouse = Data Lake + Data Warehouse

• 4. Specialized Systems
• E.g. Time-series Database, GPU Acceleration, Vector Database

6
Outline
• An overview of data models

• Basics of the Relational Model

• Relational Algebra
• Core operators & Derived operators
• Extension
• Relational calculus

7
Recap: Data Models
• Query: Who have accounts with 0 balance managed by a branch in
Springfield? Branch
Account Id Location
234 Brockway
Id Name Balance Branch
1000 Ned Flanders 0 234
Id Location
1001 Homer Simpson 0 653
653 Springfield
1002 Montgomery Burns 1000000000 781

Id Location
781 Shelbyville

• Programmer controls “navigation”: accounts → branches


• How about branches → accounts? (e.g., find branches managing accounts
with 0 balance)
8
Data Model
• Data Model
• mathematical formalism (or conceptual way) for describing the data

• The description generally consists of three parts:


• Structure of the data
• Operations on the data
• Constraints on the data

9
Structure of the data
• Schema (e.g., table names, attribute names)
• Describe the conceptual structure of the data

• Different from data structure (e.g., list, array)


• Data structure can be seen as a physical data model

10
Operations on the data
• Query language (e.g., RA, SQL)
• Describe what operations that can be performed on data

• Two kinds of operations


• operations that retrieve information (“query” the data)
• operations that change the database (“update” the data)

• Different from programming languages (e.g., C, Java)


• Support a set of limited operations
• Allow for query optimizations

11
Constraints on the data.
• Constraints (e.g., balance >= 0, account id is unique)
• describe limitations on what the data can be.

• Different kinds of constraints


• Domain constraints
• Integrity constraints

• Why does it matter?


• Ensure the correctness of data

12
Commonly Used Data Models
• Relational Data Model

• Key-Value Data Model

• Semi-structured Data Model (e.g., Json, XML)

13
The Relational Model in Brief
Id Name Balance Branch Id Location
1000 Ned Flanders 0 234 234 Brockway
1001 Homer Simpson 0 653 653 Springfield
1002 Montgomery Burns 1000000000 781 781 Shelbyville

• Structure of the Data


• Table structure
• Query language
• RA, RC, SQL
• Constraints on the data
• E.g., id is unique, balance >= 0, name is not NULL
14
The Key-Value Model in Brief
Key à Value
1000 à (Ned Flanders, 0, 234)
1001 à (Homer Simpson, 0, 653)
1002 à (Montgomery Burns, 1000000000, 781)

• Structure of the Data


• (Key, Value) pairs
• Key is an integer/string, value can be any object
• Query language
• get(key), put(key, value)
• Constraints on the data
• E.g., key is unique, value is not NULL
15
The Semistructured Model in Brief
• Structure of the Data
• Tree structure (e.g., XML, json)
• Query language
• XQuery, MongoDB QL
• Constraints
• E.g., each <Account> has a <Id> element
and a <name> element nested within

Resembles the hierarchical model


“what goes around comes around”

16
Edgar F. Codd (1923-2003)

• Pilot in the Royal Air Force in WW2


• Inventor of the relational model
and algebra while at IBM
• Turing Award, 1981

17
http://en.wikipedia.org/wiki/File:Edgar_F_Codd.jpg
Relational data model
• A database is a collection of relations (or tables)
• Each relation has a set of attributes (or columns)
• Each attribute has a name and a domain (or type)
• Set-valued attributes are not allowed
• Each relation contains a set of tuples (or rows)
• Each tuple has a value for each attribute of the relation
• Duplicate tuples are not allowed
• Two tuples are duplicates if they agree on all attributes

• Simplicity is a virtue!

18
Example Group
User gid name
uid name age pop abc Book Club
142 Bart 10 0.9 gov Student Government
123 Milhouse 10 0.2 dps Dead Putting Society
857 Lisa 8 0.7 … …
456 Ralph 8 0.3
… … … …
Member uid gid
142 dps
123 gov
Ordering of rows doesn’t matter
857 abc
(even though output is
always in some order) 857 gov
456 abc
456 gov
… …
19
Schema vs. instance
• Schema (metadata)
• Specifies the logical structure of data
• Is defined at setup time
• Rarely changes
• Instance
• Represents the data content
• Changes rapidly, but always conforms to the schema
• Compare to types vs. collections of objects of these types in a
programming language

20
Example
• Schema
• User (uid int, name string, age int, pop float)
• Group (gid string, name string)
• Member (uid int, gid string)
• Instance
• User: {〈142, Bart, 10, 0.9〉, 〈857, Milhouse, 10, 0.2〉, …}
• Group: {〈abc, Book Club〉, 〈gov, Student Government〉, …}
• Member: {〈142, dps〉, 〈123, gov〉, …}

21
Relational algebra
A language for querying relational data based on “operators”

RelOp
RelOp

• Core operators:
• Selection, projection, cross product, union, difference, and renaming
• Additional, derived operators:
• Join, natural join, intersection, etc.
• Compose operators to make complex queries
22
Selection
• Input: a table 𝑅
• Notation: 𝜎! 𝑅
• 𝑝 is called a selection condition (or predicate)
• Purpose: filter rows according to some criteria
• Output: same columns as 𝑅, but only rows or 𝑅 that satisfy 𝑝

23
Selection example
• Users with popularity higher than 0.5
𝜎!"!#$.& 𝑈𝑠𝑒𝑟

uid name age pop uid name age pop


142 Bart 10 0.9 142 Bart 10 0.9
123 Milhouse 10 0.2
𝜎!"!#$.&
857 Lisa 8 0.7 857 Lisa 8 0.7
456 Ralph 8 0.3
… … … … … … … …

24
More on selection
• Selection condition can include any column of 𝑅, constants,
comparison (=, ≤, etc.) and Boolean connectives (∧: and, ∨: or, ¬:
not)
• Example: users with popularity at least 0.9 and age under 10 or above 12
𝜎"#"$%.' ∧ *+,-.% ∨ *+,0.1 𝑈𝑠𝑒𝑟
• You must be able to evaluate the condition over each single row of
the input table!
• Example: the most popular user
𝜎"#" $ ,2,34 "#" 56 78,3 𝑈𝑠𝑒𝑟
RO N G!
W
25
Projection
• Input: a table 𝑅
• Notation: 𝜋' 𝑅
• 𝐿 is a list of columns in 𝑅
• Purpose: output chosen columns
• Output: same rows, but only the columns in 𝐿

26
Projection example
• IDs and names of all users
𝜋()*,,-./ 𝑈𝑠𝑒𝑟

uid name age pop uid name


142 Bart 10 0.9 142 Bart
123 Milhouse 10 0.2 123 Milhouse
𝜋'(),+,-.
857 Lisa 8 0.7 857 Lisa
456 Ralph 8 0.3 456 Ralph
… … … … … …

27
More on projection
• Duplicate output rows are removed (by definition)
• Example: user ages
𝜋*+, 𝑈𝑠𝑒𝑟
uid name age pop age
142 Bart 10 0.9 10
123 Milhouse 10 0.2 10
𝜋,/.
857 Lisa 8 0.7 8
456 Ralph 8 0.3 8
… … … … …

28
Composing Select and Project
• Names of users with popularity higher than 0.5
𝜋,-./ (𝜎!"!#$.& 𝑈𝑠𝑒𝑟)

uid name age pop uid name age pop name


142 Bart 10 0.9 142 Bart 10 0.9 Bart
123 Milhouse 10 0.2
857 Lisa 8 0.7 Lisa
857 Lisa 8 0.7
… … … … …
456 Ralph 8 0.3
… … … …
𝜎%&%'(.* 𝜋!"#$

• What about: 𝜎!"!#$.& (𝜋'()* (𝑈𝑠𝑒𝑟))


• Invalid types. Input to 𝜎!"!#$.& does not contain pop.
Cross product
• Input: two tables 𝑅 and 𝑆
• Natation: 𝑅×𝑆
• Purpose: pairs rows from two tables
• Output: for each row 𝑟 in 𝑅 and each 𝑠 in 𝑆, output a row 𝑟𝑠
(concatenation of 𝑟 and 𝑠)

30
Cross product example
𝑈𝑠𝑒𝑟×𝑀𝑒𝑚𝑏𝑒𝑟 uid gid
uid name age pop
123 gov
123 Milhouse 10 0.2
857 abc
857 Lisa 8 0.7
857 gov
… … … … × … …

uid name age pop uid gid

• How many rows in result? 123 Milhouse 10 0.2 123 gov


123 Milhouse 10 0.2 857 abc
• |R|*|S|
123 Milhouse 10 0.2 857 gov
• Schema compatability? 857 Lisa 8 0.7 123 gov
• Not needed. 857 Lisa 8 0.7 857 abc

• Duplicates? 857 Lisa 8 0.7 857 gov


… … … … … …
• None generated. 31
A note on column ordering
• Ordering of columns is unimportant as far as contents are concerned
uid name age pop uid gid uid gid uid name age pop
123 Milhouse 10 0.2 123 gov 123 gov 123 Milhouse 10 0.2
123 Milhouse 10 0.2 857 abc 857 abc 123 Milhouse 10 0.2
123 Milhouse 10 0.2 857 gov 857 gov 123 Milhouse 10 0.2
857 Lisa 8 0.7 123 gov = 123 gov 857 Lisa 8 0.7
857 Lisa 8 0.7 857 abc 857 abc 857 Lisa 8 0.7
857 Lisa 8 0.7 857 gov 857 gov 857 Lisa 8 0.7
… … … … … … … … … … … …

• So cross product is commutative, i.e., for any 𝑅 and 𝑆, 𝑅×𝑆 = 𝑆×𝑅 (up
to the ordering of columns)

32
Derived operator: join
(A.k.a. “theta-join”)
• Input: two tables 𝑅 and 𝑆
• Notation: 𝑅 ⋈! 𝑆
• 𝑝 is called a join condition (or predicate)
• Purpose: relate rows from two tables according to some
criteria
• Output: for each row 𝑟 in 𝑅 and each row 𝑠 in 𝑆, output a
row 𝑟𝑠 if 𝑟 and 𝑠 satisfy 𝑝
• Shorthand for 𝜎! 𝑅×𝑆
33
Join example
• Info about users, plus IDs of their groups
𝑈𝑠𝑒𝑟 ⋈!"#$.&'()*#+,#$.&'( 𝑀𝑒𝑚𝑏𝑒𝑟
uid gid
uid name age pop
123 gov
123 Milhouse 10 0.2
857 abc
857

Lisa

8

0.7


×
!"#$.&'()
*#+,#$.&'( 857 gov
… …

uid name age pop uid gid


Prefix a column reference 123 Milhouse 10 0.2 123 gov
with table name and “.” to 123 Milhouse 10 0.2 857 abc
disambiguate identically named 123 Milhouse 10 0.2 857 gov
columns from different tables 857 Lisa 8 0.7 123 gov
857 Lisa 8 0.7 857 abc
857 Lisa 8 0.7 857 gov
… … … … … …
34
Derived operator: natural join
• Input: two tables 𝑅 and 𝑆
• Notation: 𝑅 ⋈ 𝑆
• Purpose: relate rows from two tables, and
• Enforce equality between identically named columns
• Eliminate one copy of identically named columns
• Shorthand for 𝜋" 𝑅 ⋈! 𝑆 , where
• 𝑝 equates each pair of columns common to 𝑅 and 𝑆
• 𝐿 is the union of column names from 𝑅 and 𝑆 (with duplicate columns
removed)

35
Natural join example
𝑈𝑠𝑒𝑟 ⋈ 𝑀𝑒𝑚𝑏𝑒𝑟 = 𝜋? 𝑈𝑠𝑒𝑟 ⋈? 𝑀𝑒𝑚𝑏𝑒𝑟
= 𝜋&'(,/0+#,01#,232,1'( 𝑈𝑠𝑒𝑟 ⋈ !"#$.&'() 𝑀𝑒𝑚𝑏𝑒𝑟
*#+,#$.&'(
uid gid
uid name age pop
123 gov
123 Milhouse 10 0.2
857 abc
857 Lisa 8 0.7 ⋈

!"#$.&'()
*#+,#$.&'( 857 gov
… … … …
… …

uid name age pop uid gid


123 Milhouse 10 0.2 123 gov

857 Lisa 8 0.7 857 abc


857 Lisa 8 0.7 857 gov
… … … … … …
36
Union
• Input: two tables 𝑅 and 𝑆
• Notation: 𝑅 ∪ 𝑆
• 𝑅 and 𝑆 must have identical schema
• Output:
• Has the same schema as 𝑅 and 𝑆
• Contains all rows in 𝑅 and all rows in 𝑆 (with duplicate rows removed)

uid gid uid gid uid gid


123 gov ∪ 123 gov = 123 gov
857 abc 857 gov 857 abc
857 gov

37
Difference
• Input: two tables 𝑅 and 𝑆
• Notation: 𝑅 − 𝑆
• 𝑅 and 𝑆 must have identical schema
• Output:
• Has the same schema as 𝑅 and 𝑆
• Contains all rows in 𝑅 that are not in 𝑆

uid gid uid gid uid gid


123 gov − 123 gov = 857 abc
857 abc 857 gov

38
Derived operator: intersection
• Input: two tables 𝑅 and 𝑆
• Notation: 𝑅 ∩ 𝑆
• 𝑅 and 𝑆 must have identical schema
• Output:
• Has the same schema as 𝑅 and 𝑆
• Contains all rows that are in both 𝑅 and 𝑆
• Shorthand for?
uid gid uid gid uid gid
123 gov ∩ 123 gov = 123 gov
857 abc 857 gov

39
Intersection
R ∩ S
• R∩S=R–?

= R – ?

40
Intersection
• R ∩ S = R – (R – S)
R ∩ S
• Also = S – (S – R)

= R –( R
? – S )
• How can you write it without –
𝑅⋈𝑆
41
Renaming
• Input: a table 𝑅
• Notation: 𝜌4 𝑅, 𝜌 5!,5",… 𝑅, or 𝜌4 5!,5",… 𝑅
• Purpose: “rename” a table and/or its columns
• Output: a table with the same rows as 𝑅, but called differently

Member M1
uid gid uid gid
123 gov 𝜌,- ./0-,2/0- 𝑀𝑒𝑚𝑏𝑒𝑟 = 123 gov
857 abc 857 gov

42
Renaming
• Used to
• Avoid confusion caused by identical column names
• Create identical column names for natural joins
• As with all other relational operators, it doesn’t modify the database
• Think of the renamed table as a copy of the original
Member M1
uid gid uid gid
123 gov 𝜌,- ./0-,2/0- 𝑀𝑒𝑚𝑏𝑒𝑟 = 123 gov
857 abc 857 gov

43
Renaming example
• IDs of users who belong to at least two groups
𝑀𝑒𝑚𝑏𝑒𝑟 ⋈? 𝑀𝑒𝑚𝑏𝑒𝑟
uid gid uid gid
123 gov 123 gov
857 abc ⋈? 857 abc
857 gov 857 gov
uid gid uid gid
123 gov 123 gov Same uid
123 gov 857 abc
123 gov 857 gov
857 abc 123 gov Different gids
857 abc 857 abc
857 abc 857 gov
857 gov 123 gov
857 gov 857 abc
857 gov 857 gov 44
Renaming example
• IDs of users who belong to at least two groups
𝑀𝑒𝑚𝑏𝑒𝑟 ⋈? 𝑀𝑒𝑚𝑏𝑒𝑟

𝜋&'( 𝑀𝑒𝑚𝑏𝑒𝑟 ⋈*#+,#$.&'()*#+,#$.&'( ∧ 𝑀𝑒𝑚𝑏𝑒𝑟


N G! *#+,#$.1'(9*#+,#$.1'(
WRO

𝜌 &'(!,1'(! 𝑀𝑒𝑚𝑏𝑒𝑟
𝜋&'(! ⋈&'(!)&'(" ∧ 1'(!91'("
𝜌 &'(",1'(" 𝑀𝑒𝑚𝑏𝑒𝑟

45
Expression tree notation

𝜋#$%,
⋈#$%, (#$%- ∧ '$%, +'$%-

𝜌 #$%, ,'$%, 𝜌 #$%- ,'$%-

𝑀𝑒𝑚𝑏𝑒𝑟 𝑀𝑒𝑚𝑏𝑒𝑟

46
Summary of core operators
• Selection: 𝜎2 𝑅
• Projection: 𝜋: 𝑅
• Cross product: 𝑅×𝑆
• Union: 𝑅 ∪ 𝑆
• Difference: 𝑅 − 𝑆
• Renaming: 𝜌4 5!,5",… 𝑅
• Does not really add “processing” power

47
Summary of derived operators
• Join: 𝑅 ⋈2 𝑆
• Natural join: 𝑅 ⋈ 𝑆
• Intersection: 𝑅 ∩ 𝑆

• Many more
• Semijoin, anti-semijoin, quotient, …

48
User (uid int, name string, age int, pop float)

An exercise
Group (gid string, name string)
Member (uid int, gid string)

• Names of users in Lisa’s groups


Writing a query bottom-up: Their names 𝜋,-./ name
Milhouse
gid uid
Lisa
abc
gov
123
857

Users in
𝜋#$%
gov 857

uid name age pop gid


Lisa’s groups 𝑈𝑠𝑒𝑟
857
857
Lisa
Lisa
8
8
0.7
0.7
abc
gov

uid name age pop
Lisa’s groups 𝜋'$% 𝑀𝑒𝑚𝑏𝑒𝑟
857 Lisa 8 0.7
Who’s Lisa? ⋈ uid
123
gid
gov
𝜎,-./(""$1-" 𝑀𝑒𝑚𝑏𝑒𝑟 857 abc
857 gov
𝑈𝑠𝑒𝑟 … …
49
User (uid int, name string, age int, pop float)

Another exercise
Group (gid string, name string)
Member (uid int, gid string)

• IDs of groups that Lisa doesn’t belong to


Writing a query top-down:

All group IDs IDs of Lisa’s groups

𝜋'$% 𝜋'$%

𝐺𝑟𝑜𝑢𝑝 ⋈

𝑀𝑒𝑚𝑏𝑒𝑟 𝜎,-./(""$1-"

𝑈𝑠𝑒𝑟 50
User (uid int, name string, age int, pop float)

A trickier exercise
Group (gid string, name string)
Member (uid int, gid string)

• Who are the most popular?


• Who do NOT have the highest pop rating?
• Whose pop is lower than somebody else’s?

𝜋#$% 𝜋21/3, .#$%

𝑈𝑠𝑒𝑟 ⋈21/3, .!5!621/3- .!5!


𝜌21/3, 𝜌21/3-
A deeper question:
When (and why) is “−” needed?
𝑈𝑠𝑒𝑟 𝑈𝑠𝑒𝑟 51
Monotone operators
RelOp What happens
Add more rows to the output?
to the input...
• If some old output rows may need to be removed
• Then the operator is non-monotone
• Otherwise the operator is monotone
• That is, old output rows always remain “correct” when more rows are
added to the input
• Formally, for a monotone operator 𝑜𝑝:
𝑅 ⊆ 𝑅 ; implies 𝑜𝑝 𝑅 ⊆ 𝑜𝑝 𝑅 ; for any 𝑅, 𝑅 ;

52
Monotone operators
RelOp What happens
Add more rows to the output?
to the input...
• If some old output rows may need to be removed
• Then the operator is non-monotone
• Otherwise the operator is monotone
• That is, old output rows always remain “correct” when more rows are
added to the input
uid gid uid gid uid gid
123 gov − 123 gov = 857 abc
857 abc 857 gov
857 abc
53
Monotone operators
RelOp What happens
Add more rows to the output?
to the input...
• If some old output rows may need to be removed
• Then the operator is non-monotone
• Otherwise the operator is monotone
• That is, old output rows always remain “correct” when more rows are
added to the input The old row is always
uid gid uid gid uid gid “correct” no matter
123 gov − 123 gov = 857 abc what is added to R
857 abc 857 gov 693 abc
857 gov
693 abc 54
Classification of relational operators
• Selection: 𝜎2 𝑅 Monotone
• Projection: 𝜋: 𝑅 Monotone
• Cross product: 𝑅×𝑆 Monotone
• Join: 𝑅 ⋈2 𝑆 Monotone
• Natural join: 𝑅 ⋈ 𝑆 Monotone
• Union: 𝑅 ∪ 𝑆 Monotone
• Difference: 𝑅 − 𝑆 Monotone w.r.t. 𝑅; non-monotone w.r.t 𝑆
• Intersection: 𝑅 ∩ 𝑆 Monotone?

55
Is intersection monotonic?
R ∩ S
•R ∩ S = R – (R – S)

= R –( R
? – S )
• Yes!
R1 ⊆ R2 ⇒ S ∩ R1 ⊆ S ∩ R2
Why is “−” needed for “highest”?
• Composition of monotone operators produces a monotone query
• Old output rows remain “correct” when more rows are added to the input
• Is the “highest” query monotone?
• No!
• Current highest pop is 0.9
• Add another row with pop 0.91
• Old answer is invalidated
• So it must use difference!

57
Why do we need core operator 𝑋?
• Difference
• The only non-monotone operator
• Projection
• The only operator that removes columns
• Cross product
• The only operator that adds columns
• Union
• The only operator that allows you to add rows?
• A more rigorous argument?
• Selection?
• Left as an exercise for the viewer

58
Extensions to relational algebra
• Duplicate handling (“bag algebra”)
• Grouping and aggregation

• All these will come up when we talk about SQL


• But for now we will stick to standard relational algebra without
these extensions

59
Why is RA a good query language?
• Simple
• A small set of core operators
• Semantics are easy to grasp
• Declarative?
• Yes, compared with older languages like CODASYL
• Though assembling operators into a query does feel somewhat
“procedural”
• Complete?
• With respect to what?

60
User (uid int, name string, age int, pop float)

Relational Calculus
Group (gid string, name string)
Member (uid int, gid string)

• 𝑢. 𝑢𝑖𝑑 𝑢 ∈ 𝑈𝑠𝑒𝑟 ∧
¬ ∃𝑢. ∈ 𝑈𝑠𝑒𝑟: 𝑢. 𝑝𝑜𝑝 < 𝑢. . 𝑝𝑜𝑝 }, or
• 𝑢. 𝑢𝑖𝑑 𝑢 ∈ 𝑈𝑠𝑒𝑟 ∧
∀𝑢. ∈ 𝑈𝑠𝑒𝑟: 𝑢. 𝑝𝑜𝑝 ≥ 𝑢. . 𝑝𝑜𝑝 }
• Relational algebra = “safe” relational calculus
• Every query expressible as a safe relational calculus query is also expressible as a
relational algebra query
• And vice versa
• Example of an “unsafe” relational calculus query
• 𝑢. 𝑛𝑎𝑚𝑒 ¬ 𝑢 ∈ 𝑈𝑠𝑒𝑟
• Cannot evaluate it just by looking at the database

61
Relational Calculus
• RA and RC form the basis for “real” query languages (e.g. SQL),
and for implementation
• Relational Algebra
• Queries are expressed by applying a sequence of operations to relations
• More operational/procedural, very useful for representing how query executes
• Relational Calculus
• Let’s users describe WHAT they want in first-order logic
• Rather than HOW to compute it
• Non-operational, declarative

62
Codd’s Theorem
• Established equivalence in expressivity between :
• Relational Calculus
• Relational Algebra

• Why an important result?


• Connects declarative representation of queries with operational
description
• Constructive: we can “compile” SQL into relational algebra

63
Limits of relational algebra
• Relational algebra has no recursion
• Example: given relation Friend(uid1, uid2), who can Bart reach in his social
network with any number of hops?
• Writing this query in r.a. is impossible!
• So RA is not as powerful as general-purpose languages (not Turing-
complete)
• But why not?
• Optimization becomes undecidable
• Simplicity is empowering
• Besides, you can always implement it at the application level (and
recursion is added to SQL nevertheless 😱)

65
What’s Next
• Next class:
• Database design in E/R model (recording)
• Sign up Piazza
• https://piazza.com/sfu.ca/summer2024/cmpt354d100
• Access code: qc76kdzsz9l
• Assignment 1 releasing soon
• Stay tuned for RA help tools

66

You might also like