Outline
• Introduction
➡ What is a distributed DBMS
➡ Distributed DBMS Architecture
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.1/1
File Systems
[Figure: three programs (program 1, 2, 3), each with its own data description (1, 2, 3), access File 1, File 2, and File 3.]
Database Management
[Figure: application programs 1, 2, and 3 (each with data semantics) access the database through a DBMS that handles description, manipulation, and control.]
Motivation
[Figure: database technology (integration) and computer networks (distribution) combine into distributed database systems, whose goal is integration.]
integration ≠ centralization
Distributed Computing
• A number of autonomous processing elements (not necessarily
homogeneous) that are interconnected by a computer network and that
cooperate in performing their assigned tasks.
• What is being distributed?
➡ Processing logic
➡ Function
➡ Data
➡ Control
What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network.
A distributed database management system (D–DBMS) is the software
that manages the DDB and provides an access mechanism that makes this
distribution transparent to the users.
Distributed database system (DDBS) = DDB + D–DBMS
What is not a DDBS?
• A timesharing computer system
• A loosely or tightly coupled multiprocessor system
• A database system which resides at one of the nodes of a network of
computers - this is a centralized database on a network node
Centralized DBMS on a Network
[Figure: Sites 1 to 5 connected by a communication network; the database resides at a single site.]
Distributed DBMS Environment
[Figure: Sites 1 to 5 connected by a communication network; data is stored at multiple sites.]
Implicit Assumptions
• Data is stored at a number of sites; each site logically consists of a single
processor.
• Processors at different sites are interconnected by a computer network,
so this is not a multiprocessor system
➡ Parallel database systems
• A distributed database is a database, not a collection of files; the data is
logically related, as exhibited in the users’ access patterns
➡ Relational data model
• A D-DBMS is a full-fledged DBMS
➡ Not a remote file system, not a TP system
Data Delivery Alternatives
• Delivery modes
➡ Pull-only
➡ Push-only
➡ Hybrid
• Frequency
➡ Periodic
➡ Conditional
➡ Ad-hoc or irregular
• Communication Methods
➡ Unicast
➡ One-to-many
• Note: not all combinations make sense
Distributed DBMS Promises
Transparent management of distributed, fragmented, and replicated data
Improved reliability/availability through distributed transactions
Improved performance
Easier and more economical system expansion
Transparency
• Transparency is the separation of the higher level semantics of a system
from the lower level implementation issues.
• The fundamental issue is to provide data independence in the distributed environment
➡ Network (distribution) transparency
➡ Replication transparency
➡ Fragmentation transparency
✦ horizontal fragmentation: selection
✦ vertical fragmentation: projection
✦ hybrid
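The two basic fragmentation styles listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a D-DBMS mechanism: the EMP rows, the LOC attribute, and the helper names (select, project, join_on_key) are hypothetical. Horizontal fragments are selections and are reconstructed by union; vertical fragments are projections that keep the key and are reconstructed by a join on that key.

```python
# Hypothetical EMP relation as a list of dicts; ENO is assumed to be the key.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng.", "LOC": "Paris"},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Analyst", "LOC": "Boston"},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng.", "LOC": "Paris"},
]

def select(relation, predicate):
    """Horizontal fragmentation: keep the rows that satisfy a predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Vertical fragmentation: keep only the listed attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]

def join_on_key(left, right, key):
    """Rebuild full rows from two vertical fragments that share the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal fragments, one per location; their union reconstructs EMP.
emp_paris = select(EMP, lambda r: r["LOC"] == "Paris")
emp_boston = select(EMP, lambda r: r["LOC"] == "Boston")

# Vertical fragments; both keep ENO so that a join reconstructs EMP.
emp_names = project(EMP, ["ENO", "ENAME"])
emp_jobs = project(EMP, ["ENO", "TITLE", "LOC"])

rebuilt_h = sorted(emp_paris + emp_boston, key=lambda r: r["ENO"])
rebuilt_v = join_on_key(emp_names, emp_jobs, "ENO")
```

A hybrid scheme would apply project to the output of select (or the reverse); fragmentation transparency means users keep querying EMP and never see these pieces.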
Example
Transparent Access
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
[Figure: sites in Tokyo, Boston, Paris, Montreal, and New York connected by a communication network; each site stores some of the employees, projects, and assignments data (for example, Paris employees, Boston projects, Montreal assignments).]
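To make the idea of transparent access concrete, here is a small sketch using Python's standard sqlite3 module. It rests on several assumptions that are not in the slides: a single in-memory SQLite database stands in for the whole network, EMP is split into two hypothetical fragments (EMP_PARIS and EMP_BOSTON), the column lists are reduced to what the query needs, and the sample rows are invented. The point is that the user's query is written against EMP, ASG, and PAY and does not change when EMP is fragmented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fragments of EMP, as if stored at two different sites.
    CREATE TABLE EMP_PARIS  (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE EMP_BOSTON (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER);
    CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);

    -- The global relation is a view that hides the fragmentation.
    CREATE VIEW EMP AS
        SELECT * FROM EMP_PARIS UNION ALL SELECT * FROM EMP_BOSTON;

    -- Invented sample data.
    INSERT INTO EMP_PARIS  VALUES ('E1', 'J. Doe', 'Elect. Eng.');
    INSERT INTO EMP_BOSTON VALUES ('E2', 'M. Smith', 'Analyst');
    INSERT INTO ASG VALUES ('E1', 'P1', 12), ('E2', 'P1', 24);
    INSERT INTO PAY VALUES ('Elect. Eng.', 40000), ('Analyst', 34000);
""")

# The query from the slide, unchanged: no site or fragment names appear.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()
print(rows)
```

In a real D-DBMS the mapping from EMP to its fragments and replicas lives in the distributed directory rather than in a local view, and the fragments sit on different machines.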
Distributed Database - User View
[Figure: users see a single distributed database.]
Distributed DBMS - Reality
[Figure: several sites, each running DBMS software and serving user queries or user applications, linked by a communication subsystem.]
Types of Transparency
• Data independence
• Network transparency (or distribution transparency)
➡ Location transparency
➡ Naming transparency
• Replication transparency
• Fragmentation transparency
Reliability Through Transactions
• Replicated components and data should make distributed DBMS more
reliable.
• Distributed transactions provide
➡ Concurrency transparency
➡ Failure atomicity
• Distributed transaction support requires implementation of
➡ Distributed concurrency control protocols
➡ Commit protocols
• Data replication
➡ Great for read-intensive workloads, problematic for updates
➡ Replication protocols
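As an illustration of what a commit protocol has to achieve, the following is a minimal sketch of two-phase commit (2PC), one widely used commit protocol; the slide itself does not name a specific one. The Participant class, its can_commit flag, and the site names are hypothetical, and the sketch leaves out logging, timeouts, and recovery, which a real protocol needs in order to provide failure atomicity.

```python
class Participant:
    """A hypothetical site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A real site would force a log record first.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed at every site."""
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: the coordinator broadcasts the global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


good_sites = [Participant("Paris"), Participant("Boston")]
ok = two_phase_commit(good_sites)

bad_sites = [Participant("Paris"), Participant("Boston", can_commit=False)]
failed = two_phase_commit(bad_sites)
```

A single "no" vote aborts the transaction everywhere, so the outcome is all-or-nothing across sites.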
Potentially Improved Performance
• Proximity of data to its points of use
➡ Requires some support for fragmentation and replication
• Parallelism in execution
➡ Inter-query parallelism
➡ Intra-query parallelism
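Intra-query parallelism can be sketched as running the same subquery on every fragment at once and merging the partial results. In the sketch below, threads from Python's concurrent.futures stand in for sites, and the fragment contents and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of ASG as (ENO, DUR) pairs, one per site.
FRAGMENTS = {
    "Paris": [("E1", 12), ("E3", 36)],
    "Boston": [("E2", 24)],
    "Montreal": [("E4", 6), ("E5", 48)],
}

def scan_fragment(rows, min_dur):
    """Local subquery: employees whose assignment duration exceeds min_dur."""
    return [eno for eno, dur in rows if dur > min_dur]

def parallel_select(fragments, min_dur):
    """Run the same subquery on every fragment concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(lambda rows: scan_fragment(rows, min_dur),
                            fragments.values())
        return sorted(eno for part in partials for eno in part)

result = parallel_select(FRAGMENTS, 12)
```

Inter-query parallelism, by contrast, would run different queries at the same time, each possibly at a different site.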
Parallelism Requirements
• Have as much of the data required by each application at the site where the
application executes
➡ Full replication
• How about updates?
➡ Mutual consistency
➡ Freshness of copies
System Expansion
• Issue is database scaling
• Emergence of microprocessor and workstation technologies
➡ Demise of Grosh's law
➡ Client-server model of computing
• Data communication cost vs telecommunication cost
Distributed DBMS Issues
• Distributed Database Design
➡ How to distribute the database
➡ Replicated & non-replicated database distribution
➡ A related problem in directory management
• Query Processing
➡ Convert user transactions to data manipulation instructions
➡ Optimization problem
✦ min{cost = data transmission + local processing}
➡ General formulation is NP-hard
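The cost expression above can be made concrete with a toy comparison of two execution strategies. Everything numeric here is an assumption for illustration: the unit costs, the tuple counts, and the strategy descriptions are invented, and a real optimizer estimates such figures from statistics in the directory.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    tuples_shipped: int    # tuples sent across the network
    tuples_processed: int  # tuples touched by local operators

# Assumed unit costs; real systems calibrate these per site and per link.
COST_PER_TUPLE_SHIPPED = 10.0
COST_PER_TUPLE_PROCESSED = 1.0

def total_cost(s):
    """cost = data transmission + local processing."""
    return (s.tuples_shipped * COST_PER_TUPLE_SHIPPED
            + s.tuples_processed * COST_PER_TUPLE_PROCESSED)

candidates = [
    Strategy("ship whole relations to one site", 1400, 1400),
    Strategy("select locally, ship only matches", 200, 1600),
]

# Exhaustive search is fine for two candidates; it does not scale, which
# is why practical optimizers rely on heuristics and restricted search.
best = min(candidates, key=total_cost)
```

With these assumed numbers the second strategy does more local work but ships far less data, so it wins once transmission is the expensive term.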
Distributed DBMS Issues
• Concurrency Control
➡ Synchronization of concurrent accesses
➡ Consistency and isolation of transactions' effects
➡ Deadlock management
• Reliability
➡ How to make the system resilient to failures
➡ Atomicity and durability
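Deadlock management, listed above under concurrency control, is harder in a distributed setting because a deadlock may be invisible at every individual site. The sketch below assumes a wait-for graph representation (a dict mapping each transaction to the set of transactions it waits for); the function names and the two local graphs are hypothetical. Neither site sees a cycle on its own, but the merged global graph has one.

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph given as {txn: set of txns}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(txn):
        color[txn] = GREY
        for blocker in wait_for.get(txn, ()):
            state = color.get(blocker, WHITE)
            if state == GREY:
                return True  # back edge: a cycle of waiting transactions
            if state == WHITE and visit(blocker):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in wait_for)

def merge_graphs(*graphs):
    """Union the local wait-for graphs into one global graph."""
    merged = {}
    for g in graphs:
        for txn, blockers in g.items():
            merged.setdefault(txn, set()).update(blockers)
    return merged

# Hypothetical local graphs: T1 waits for T2 at one site, T2 for T1 at another.
local_paris = {"T1": {"T2"}}
local_boston = {"T2": {"T1"}}
global_graph = merge_graphs(local_paris, local_boston)
```

Shipping every local graph to one detector is only one option; it is used here because it is the simplest to show.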
Relationship Between Issues
[Figure: diagram relating Directory Management, Query Processing, Distribution Design, Reliability, Concurrency Control, and Deadlock Management.]
Related Issues
• Operating System Support
➡ Operating system with proper support for database operations
➡ Dichotomy between general purpose processing requirements and database
processing requirements
• Open Systems and Interoperability
➡ Distributed Multidatabase Systems
➡ More probable scenario
➡ Parallel issues