Database Systems
Zubaira Naz
zubaira.naz@itu.edu.pk
Information Technology University (ITU)
01|Course Intro & Relational Model
Today’s Agenda
• Overview
• Course Logistics
• Data vs Information
• File System Critics
• Database Management Systems
• Data Models
3
Course Overview
• This course is on the design and implementation of disk-
oriented database management systems.
• This is not a course on how to use a database to build
applications or how to administer a database.
• Course Format
– Two 90-min lectures / week
– 3 hours Lab / week
4
Textbook
• Database System Concepts
– 7th Edition, Silberschatz, Korth, &Sudarshan
• A First Course in Database Systems
– 3rd Edition, Ullman and Widom
• Lecture Notes are important
5
Tentative Grading Breakdown
• Assignment (5%)
• Quiz (10%)
• Lab (10%)
• Project (25%)
• Mid Term (20%)
• Final Term (30%)
• Project is compulsory
6
Project
• Select an application that needs a database
• Build a database application from start to
finish
• Will be done in stages
– you will submit some work at the end of each stage
• You must show a demo at end of the semester
7
Project
• For the project
– lots of programming will be required
– in a high-level language of your own choosing (or rather
your team’s choosing)
– could be Java, C, C++, Perl, Python
– must know how to build a Web based application or be
willing to learn
• Project will be done in group of 2-3 students
8
Databases
Data vs Information
Example:
• A car company has two divisions, and the two divisions have
1,380,456 and 1,453,907 invoices, respectively.
• Each invoice has invoice number, date, and amount
• The period is from the first quarter of 1997 to first quarter of
2002
• Total 2,834,363 (>2M) records
10
Data vs Information
… … …
Data 3000124 12-Jan-2002 $121.98
… … …
Information:----------?
11
12
Data vs Information
• Data constitute building blocks of information.
• Information produced by processing data.
• Information reveals meaning of data.
• Good, timely, relevant information key to decision-making.
• Good decision-making key to organizational survival
13
Data vs Information
Data Information
Data is unorganized raw facts that need processing Information is a processed, organized data presented in a
without which it is seemingly random and useless to given context and is useful to humans.
humans
Data is an individual unit that contains raw material Information is a group of data that collectively carry a
which does not carry any specific meaning. logical meaning.
Data doesn’t depend on information. Information depends on data.
Data is never suited to the specific needs of a designer. Information is specific to the expectations and
requirements because all the irrelevant facts and figures
are removed, during the transformation process.
An example of data is a student’s test score The average score of a class is the information derived
from the given data.
14
Database
• Database is shared, integrated computer structure that stores a
collection of data:
– End user data (raw data)
– Metadata (data about data, it contains data characteristics and
relationships)
• Organized collection of inter-related data that models some aspect of
the real-world.
• Databases are the core component of most computer applications.
15
Metadata
• Data that describe the properties or characteristics of the end user
data, and the context of that data.
§ Metadata describe the properties of data but are separate from data.
• Properties of data:
§ Data types, size, length, allowable values
• Data Context:
§ Source of data, ownership, where its stored, usage
Metadata
Database Example
• Create a database that models a digital music store to
keep track of the artists and their albums.
• Things we need to store:
– Information about Artists
– What Albums those Artists released
18
Flat File Strawman
• Store our database as comma-separated value
(CSV) files that we manage in our own code.
– Use a separate file perentity (Artist & Album).
– The application must parse the files
each time to read/update records.
19
Flat File Strawman
• Create a database that models a digital music store.
20
Flat File Strawman
Example: Get the year that Ice Cube went USA
21
Flat File Strawman
Example: Get the year that Ice Cube went USA
22
Flat File : Data Integrity
• How do we ensure that the artist is the same for each
album entry?
• What if somebody overwrites the album year with
an invalid string?
• How do we store that there are multiple artists on an
album?
23
Flat File : Implementation
• How do you find a particular record?
• Data Dependence
– Change in file’s data characteristics requires modification of data access programs
• Changes in the characteristics of data, such as changing a field from integer
to decimal, require changes in all the programs that access the file.
– Changing the customer address field length from 30 to 40 characters, changing
integer field to float etc.
– Changes in the data storage characteristics affect the application program’s
ability to access the data.
– Must tell program what to do and how to do
24
Flat File : Implementation
• How do you find a particular record?
• Data Dependence
– Change in file’s data characteristics requires modification of data access programs
• Changes in the characteristics of data, such as changing a field from integer to decimal, require
changes in all the programs that access the file.
– Changing the customer address field length from 30 to 40 characters, changing integer field to float etc.
– Changes in the data storage characteristics affect the application program’s ability to access the data.
– Must tell program what to do and how to do
• What if now these files are required by another department/company, and they want to have
write access?
• What if two threads try to write to the same file at the same time?
25
Flat File : Implementation
• What if now these files are required by another department/company,
and they want to have write access?
• Data Redundancy: (Unnecessary Duplication of data)
– Results of uncontrolled data redundancy
• Data anomalies
– Insert
– Update
– Delete
• Data inconsistency (Different and conflicting versions of same data)
– Lack of data integrity
• What if two threads try to write to the same file at the same time?
26
Flat File : Durability
• What if the machine crashes while our program
is updating a record?
• What if we want to replicate the database on
multiple machines for high availability?
27
Database Management
• Database Management System (DBMS): software
system (collection of software) helps to manage the
data contents
– Manages Database structure
– Controls access to data
– Contains query language
28
29
Importance of DBMS
• Makes data management more efficient and effective.
• Query language allows quick answers to ad hoc queries.
• Provides better access to more and better-managed data.
• Promotes integrated view of organization’s operations.
• Reduces the probability of inconsistent data.
• Improved data sharing.
• Improved data security.
30
Database Industry
• Relational databases are a great success of theoretical ideas.
• Big DBMS companies are among the largest software
companies in the world.
• Oracle
• IBM (with DB2)
• Microsoft (SQL Server, Microsoft Access)
• Others
• $54B industry
31
Building an Application with a DBMS
• Requirements modeling (conceptual, pictures)
• Decide what entities should be part of the application and how they
should be linked.
• Schema design and implementation (Conceptual and
Implementation Models)
• Decide on a set of tables, attributes.
• Define the tables in the database system.
• Populate database (insert tuples).
• Write application programs using the DBMS
• way easier now that the data management is taken care of
• Banking, universities, airlines etc. 34
Conceptual Modelling
name category name
cid
ssn
Takes Course
Student
quarter
Advises Teaches
Professor
address name field 33
Data Model
• A data model is collection of concepts for describing the data in
a database.
• Conceptual Models: logical nature of data representation; it
emphasizes on what entity is presented; it is used for database
design as blueprint (ERDs)
• Implementation Models: emphasis on how the data are
represented in the database
• Aschema is a description of a particular collection of data, using a given data
model.
34
Schema
• A database schema is the skeleton structure that represents the logical view of the
entire database.
• Internal Schema: Actual physical storage structure and access paths.
• Logical Schema: It describes the Database structure of the whole database for the
community of users. This schema hides information about the physical storage
structures and focuses on describing data types, entities, relationships etc.
• External Schema: Describes the part of
the database which specific user is
interested in. For example, a user from
the sales department will see only sales
related data. 37
Data Model
• Relational
• Key/Value
• Graph
• Document
• Column-family
• Array / Matrix
• Hierarchical
• Network
38
Data Model
• Relational • Most Databases
• Key/Value • RDBMS
• Comprises of tables, rows and columns
• Graph • SQL databases are a better fit for heavy
• Document duty or complex transactions because it's
more stable and ensures data integrity.
• Column-family
• Array / Matrix
• Hierarchical
• Network
39
Data Model : Implementation Models
• Relational • NoSql
• Key/Value • Other than the tabular relations
• Big data and real-time web apps
• Graph • Examples: MongoDB, Hbase
• Document • More scalable, suitable for big data
analysis
• Column-family • Designed for unstructured data
• Array / Matrix
• Hierarchical
• Network
38
Data Model : Implementation Models
• Relational
• Key/Value
• Graph
• Document
• Column-family • Data structure is in arrays
• Array / Matrix • Machine Learning
• Hierarchical • Google Earth Engine
• Network
39
Data Model : Implementation Models
• Relational
• Key/Value
• Graph
• Document
• Column-family
• Array / Matrix
• Hierarchical • Obsolete / rare
• Network • Tree like structure
40
Practice Questions
1. Define "data" and "information." How do these two concepts differ?
Provide examples to illustrate your points.
2. Consider a business scenario where data is collected from customer
transactions. Explain how this data can be transformed into actionable
information that can help in decision-making.
3. Critically analyze the limitations of traditional file systems for managing
large volumes of data. Include aspects such as data redundancy,
inconsistency, and lack of data security.
4. Discuss a real-world example where the limitations of a file system led to
significant problems. How would a DBMS have addressed these issues?
5. Provide an overview of what a Database Management System is. What
are its primary functions and benefits compared to traditional file systems?
41