[go: up one dir, main page]

0% found this document useful (0 votes)
41 views39 pages

Unit 1

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1/ 39

1: Introduction to databases

Outline
 Introduction
 Traditional File based Systems
 Database Approach
 Roles in Database Environment
 History of Database management systems
 Advantages and Disadvantages of DBMS’s.
 Structure of Relational Databases
 Database Schema
 Keys
 Schema Diagram
 Relational Query Languages.
 Relational Operations
Introduction
 Data
 Information
 DBMS (Database Management System)
 Relational Queries
 SQL (Structured Query Language)
 Indexing
 Hashing
 Transactions
 Concurrency Control
 Recovery Systems
Database Management System
(DBMS)
 DBMS contains information about a particular enterprise
 Collection of interrelated data
 Set of programs to access the data
 An environment that is both convenient and efficient to use
 Database Applications:
 Banking: transactions
 Airlines: reservations, schedules
 Universities: registration, grades
 Sales: customers, products, purchases
 Online retailers: order tracking, customized
recommendations
 Manufacturing: production, inventory, orders, supply chain
 Human resources: employee records, salaries, tax
deductions
 Databases can be very large.
 Databases touch all aspects of our lives
University Database Example
 Application program examples
 Add new students, instructors, and courses
 Register students for courses, and generate
class rosters
 Assign grades to students, compute grade
point averages (GPA) and generate transcripts
 In the early days, database applications were
built directly on top of file systems
Traditional File based Systems
 File-based systems were an early attempt to
computerize the manual filing system that we are all
familiar with.
 We may have divisions in the filing system or separate
folders for different types of item that are in some way
logically related.
 The manual filing system works well while the number
of items to be stored is small.
 It even works quite adequately when there are large
numbers of items and we have only to store and retrieve
them.
Drawbacks of using file systems to
store data

 Data redundancy and inconsistency


 Multiple file formats, duplication of information in
different files
 Difficulty in accessing data
 Need to write a new program to carry out each new
task
 Data isolation
 Multiple files and formats
 Integrity problems
 Integrity constraints (e.g., account balance > 0)
become “buried” in program code rather than being
stated explicitly
 Hard to add new constraints or change existing ones
Drawbacks of using file systems to store data
(Cont.)

 Atomicity of updates
 Failures may leave database in an inconsistent state with
partial updates carried out
 Example: Transfer of funds from one account to another
should either complete or not happen at all
 Concurrent access by multiple users
 Concurrent access needed for performance
 Uncontrolled concurrent accesses can lead to inconsistencies
 Example: Two people reading a balance (say 100) and
updating it by withdrawing money (say 50 each) at the same
time
 Security problems
 Hard to provide user access to some, but not all, data

Database systems offer solutions to all the above problems


Database Approach
 A shared collection of logically related data, and a
description of this data, designed to meet the information
needs of an organization.
 The database holds not only the organization’s operational
data but also a description of this data.
 The Database Management System (DBMS)
 DDL
 DML
 Query Language
 Structured Query Language
 It may provide:
 a security system, which prevents unauthorized users
accessing the database;
 an integrity system, which maintains the consistency
of stored data;
 a concurrency control system, which allows shared
access of the database;
Roles in Database Environment
 We can identify four distinct types of people that
participate in the DBMS environment:
 Data and database administrators
 database designers
 application developers
 the end-users.
Data and Database administrators
 The Data Administrator (DA) is responsible for the
management of the data resource including database
planning, development and maintenance of standards,
policies and procedures, and conceptual/logical
database design.
 The Database Administrator (DBA) is responsible for the
physical realization of the database, including physical
database design and implementation, security and
integrity control, maintenance of the operational
system, and ensuring satisfactory performance of the
applications for users.
Database designers
 In large database design projects, we can distinguish
between two types of designer:
 Logical database designers
 Physical database designers.
 The logical database designer is concerned with
identifying the data (that is, the entities and attributes),
the relationships between the data, and the constraints
on the data that is to be stored in the database.
 The logical database designer must have a thorough and
complete understanding of the organization’s data and
any constraints on this data.
 The physical database designer decides how the logical
database design is to be physically realized. This
involves:
 mapping the logical database design into a set of
tables and integrity constraints;
 selecting specific storage structures and access
methods for the data to achieve good performance;
 designing any security measures required on the
data.
Application developers and end-
users
 Once the database has been implemented, the
application programs that provide the required
functionality for the end-users must be implemented.
This is the responsibility of the application developers.

 The end-users are the ‘clients’ for the database, which


has been designed and implemented, and is being
maintained to serve their information needs.
 End-users can be classified according to the way they
use the system:
 Naive users are typically unaware of the DBMS
 Sophisticated users.
History of Database Systems
 1950s and early 1960s:
 Data processing using magnetic tapes for storage
 Tapes provided only sequential access
 Punched cards for input
 Late 1960s and 1970s:
 Hard disks allowed direct access to data
 Network and hierarchical data models in widespread
use
 Ted Codd defines the relational data model
 Would win the ACM Turing Award for this work
 IBM Research begins System R prototype
 UC Berkeley begins Ingres prototype
 High-performance (for the era) transaction processing
History (cont.)
 1980s:
 Research relational prototypes evolve into commercial
systems
 SQL becomes industrial standard
 Parallel and distributed database systems
 Object-oriented database systems
 1990s:
 Large decision support and data-mining applications
 Large multi-terabyte data warehouses
 Emergence of Web commerce
 Early 2000s:
 XML and XQuery standards
 Automated database administration
 Later 2000s:
 Giant data storage systems
 Google BigTable, Yahoo PNuts, Amazon, ..
Advantages and Disadvantages of
DBMS’s
Structure of Relational Databases
 A relational database consists of a collection of tables,
tables
each of which is assigned a unique name.
 A row in a table represents a relationship among a set of
values.
 In the relational model the term relation is used to refer
to a table,
table while the term tuple is used to refer to a row.
row
Similarly, the term attribute refers to a column of a
table.
Relation Schema and Instance
 A1, A2, …, An are attributes

 R = (A1, A2, …, An ) is a relation schema


Example:
instructor = (ID, name, dept_name, salary)
 Formally, given sets D1, D2, …. Dn a relation r is a subset of

D 1 x D 2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where
 The current values (relation instance) of a relation are
each ai  Di
specified by a table
 An element t of r is a tuple, represented by a row in a
table
 For each attribute of a relation, there is a set of
permitted values, called the domain of that attribute.
 We require that, for all relations r, the domains of all
attributes of r be atomic.
atomic
 A domain is atomic if elements of the domain are
considered to be indivisible units.
Database Schema
 Database schema is the logical design of the database,
and the database instance,
instance which is a snapshot of the
data in the database at a given instant in time.
 In general, a relation schema consists of a list of
attributes and their corresponding domains.
 The schema of a relation does not generally change.
Example Schema
Keys
 Super key
 An attribute, or set of attributes, that uniquely
identifies a tuple within a relation.
 Candidate Key
 A super key such that no proper subset is a super key
within the relation.
 When a key consists of more than one attribute, we
call it a composite key.
 Primary
 The candidate key that is selected to identify tuples
uniquely within the relation.
 Foreign
 An attribute, or set of attributes, within one relation
that matches the candidate key of some (possibly the
same) relation.
Keys
 Let K  R
 K is a superkey of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
 Example: {ID} and {ID,name} are both superkeys of
instructor.
 Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
 One of the candidate keys is selected to be the primary key.
 which one?
 Foreign key constraint: Value in one relation must appear in
another
 Referencing relation
 Referenced relation
 Example – dept_name in instructor is a foreign key from
instructor referencing department
Schema Diagrams
 A database schema, along with primary key and foreign
key dependencies, can be depicted by schema diagrams
Relational Query Languages
 A query language is a language in which a user requests
information from the database. These languages are
usually on a level higher than that of a standard
programming language.
 Query languages can be categorized as either
procedural or nonprocedural.
 In a procedural language, the user instructs the system
to perform a sequence of operations on the database to
compute the desired result.
 In a nonprocedural language, the user describes the
desired information without giving a specific procedure
for obtaining that information.
Relational Operations
 RELATIONAL ALGEBRA
 The relational algebra defines a set of operations on
relations, paralleling the usual algebraic operations
such as addition, subtraction or multiplication, which
operate on numbers.
Select Operation – selection of rows
(tuples)
 Relation r

A=B ^ D > 5 (r)


Project Operation – selection of columns
(Attributes)

 Relation r:

 A,C (r)
Union of two relations
 Relations r, s:

 r  s:
Set difference of two relations
 Relations r, s:

 r – s:
Set intersection of two relations

 Relation r, s:

 rs

Note: r  s = r – (r – s)
joining two relations -- Cartesian-
product
 Relations r, s:

 r x s:
Cartesian-product – naming issue
 Relations r, s: B

 r x s: r.B s.B
Notes about Relational Languages
 Each Query input is a table (or set of tables)
 Each query output is a table.
 All data in the output table appears in one of the input
tables
 Relational Algebra is not Turning complete
 Can we compute:
 SUM
 AVG
 MAX
 MIN
Summary of Relational Algebra
Operators
Symbol (Name) Example of Use
σ
σ
(Selection) salary > = 85000 (instructor)
Return rows of the input relation that satisfy the predicate.
Π
Π
(Projection) ID, salary (instructor)
Output specified attributes from all rows of the input relation. Remove
duplicate tuples from the output.
x
(Cartesian Product) instructor x department
Output pairs of rows from the two input relations that have the same value on
all attributes that have the same name.

Π ∪ Π
(Union) name (instructor) name (student)
Output the union of tuples from the two input relations.
-
Π -- Π
(Set Difference) name (instructor) name (student)
Output the set difference of tuples from the two input relations.

(Natural Join) instructor ⋈ department
Output pairs of rows from the two input relations that have the same value on
all attributes that have the same name.
End of Unit 1

You might also like