CT80A0000 – DATA-INTENSIVE SYSTEMS
DISTRIBUTED DATA CONTROL
Lecture
Jiri Musto, D.Sc.
SEMANTIC DATA CONTROL
Objective is to make sure that authorized users perform correct operations on the
database
Aim to maintain database integrity
Methods for data control:
1. View management
2. Security control
3. Integrity control
3
VIEWS
A view is a (virtual) relation generated from a base relation
In SQL, a SELECT-query is stored as a VIEW
Views can be queried like normal relations
Main benefit of views:
Simplify complex queries by using views
Restrict access to data with views
Prevent accidental modification of data (view updates are heavily restricted)
4
VIEWS IN DISTRIBUTED DBMS
Views might be derived from fragments.
Views might be costly if base relations are distributed
You can use materialized views
Instead of a virtual view, the view is cached as a table
Used in practice to have local access instead of remote access
Used to speed up processes
Useful if having small number of data or query requires significant processing
5
TAKEN FROM SNOWFLAKE INC.
Performance Security Simplifies Supports Uses Storage
Benefits Benefits Query Logic Clustering
Regular table ✔
Regular view ✔ ✔
Cached query ✔
result
Materialized view ✔ ✔ ✔ ✔
6
VIEW REFRESHING
1. Immediate mode
When underlying data is updated
View is always consistent but increases update times
2. Deferred mode (preferred in practice)
Update views separately
Can be triggered at different times
▪ Lazily: Before querying a view
▪ Periodically: Every hour, every day, etc.
▪ Forcedly: After a certain number of updates
Views can be updated either
Fully
Incrementally
7
INCREMENTAL VIEW MAINTENANCE
You can utilize:
1. Counting algorithm
• Count the changes done to the base relation to update the views
2. Data skew
• Use different join plans based on differences in the amount of data and changes
• Rebalance
8
DATA SECURITY
Data protection
Prevents the content of data to be understood by unauthorized users
Encryption / decryption
Access control
Prevent the access and operations of data by unauthorized users
Discretionary access control (DAC)
▪ Authorization granted to subject within the database
Mandatory access control (MAC)
▪ Access control by the database based on the sensitiveness of data
▪ Done with security labels (e.g. LBAC)
9
DISCRETIONARY ACCESS CONTROL
Main actors
Subjects (users, groups of users) who execute operations
Operations (queries, functions, procedures)
Objects (data attribute, entity, row, table, etc.)
Check whether a subject may perform an operation on an object
In SQL: defined using GRANT OR REVOKE
Can have one admin class or multiple
Multiple is flexible but recursive authorization may be problematic
10
MANDATORY ACCESS CONTROL
Often associated with Multi-level access control (MLS)
Different security levels (clearances)
Top Secret > Secret > Confidential > Unclassified
Access controlled by 2 rules:
No read up
▪ Can only read data that is within or below users’ clearance level
▪ This is to avoid having access to data that is beyond one’s clearance level
No write down:
▪ Cannot write data below users’ clearance level
▪ This is to avoid accidentally sharing higher level data with lower level security
11
MAC IN RELATIONAL DB
A relation can be classified at different levels:
Relation: all tuples have the same clearance
Tuple: every tuple has its own clearance
Attribute: every attribute has its own clearance
A classified relation is thus multilevel
Appears differently (with different data) to subjects with different clearances
12
https://www.ekransystem.com/en/blog/mac-vs-dac
13
PROS AND CONS
PROS CONS
DAC Flexible Low level of protection
Easy to maintain User privilege overlap
User-friendly Access management
Granular maintenance
MAC High level of protection Difficult to scale
Granularity Manual maintenance
Fewer errors
https://www.ekransystem.com/en/blog/mac-vs-dac
14
ADDITIONAL PROBLEMS IN A DISTRIBUTED
ENVIRONMENT
Remote user authentication
Usually done using a service
The service should be replicated to multiple sites
Management of DAC rules
Problem if users’ group can span multiple sites
Rules stored at some directory based on user groups’ location
Accessing rules may incur remote queries
MAC covert channels
Accessing unauthorized data indirectly (query may contain data that is on a higher security level than
what the database contains)
15
SEMANTIC INTEGRITY CONTROL
Maintain database consistency
Structural constraints
Unique keys, primary and foreign keys, etc.
Behavioral constraints
Dependencies in the relational model
Two components
Integrity constraint specification
Integrity constraint enforcement
16
DIFFERENT CONSTRAINTS
Data model related (predefined)
Primary, foreign, unique key
Attribute types, not-null
Functional dependency between data
Update related (precompiled)
NEW and OLD tuples
Value restrictions
Methods:
IF and CHECK conditions
17
INTEGRITY ENFORCEMENT
1. Detection
• Execute a given query
• Detect an inconsistency across the databases
• Fix or undo
2. Preventive
• Execute a query only if the data will stay consistent within the databases
• Need to determine what state counts as consistent (based on update rules)
• Can use various methods in relational databases, for example:
• CHECK conditions
• ASSERTIONS (check condition on multiple tables)
• TRIGGERS
18
PROBLEMS WITH DISTRIBUTED INTEGRITY
CONTROL
Definition of constraints
How are the constraints defined?
Need to consider what fragments exist
Where to store
Where are the constraints stored?
How is data replicated across databases that have constraints?
Enforcement
Where should the constraints be enforced to minimize costs or increase efficiency?
19
CONCLUSION
Solutions initially designed for centralized systems
Extended for distributed systems
Materialized views and group-based discretionary access control
Semantic integrity control has received less attention
Generally not well supported by distributed DBMS
Full data control is more complex and costly in distributed systems
Definition and storage of the rules (site selection)
Design of enforcement algorithms which minimize communication costs
20