Data Warehousing - Denormalization (Lecture 8 & 9)
■ Lecture 8: Denormalization Techniques
1. Purpose of Denormalization
- Reduces runtime joins.
- Optimizes query performance in DSS environments.
- Can involve combining or splitting tables.
2. Table Splitting Techniques
A. Horizontal Splitting
- Splits rows based on values (e.g., campus, year).
- Benefits: Parallelism, security, faster access, smaller B-trees.
B. Vertical Splitting
- Splits columns into separate tables, primary key repeated.
- Benefits: Smaller record size, better I/O, useful for large text fields.
3. Pre-Joining Tables
- Joins master-detail tables ahead of time.
- Useful for 1:M relationships. Cost: Data repetition.
4. Adding Redundant Columns
- Adds frequently referenced data to avoid joins.
- Risks: Storage increase, update complexity.
5. Derived Attributes
- Examples: Age = Current Date – DoB, GP = Grade × Credits.
- Benefit: Faster queries via pre-computed values.
■ Lecture 9: Issues of Denormalization
1. General Issues
- Effects are unpredictable; normalize to 3NF first.
- Document logical and physical models.
2. Storage Issues
- Increases size (e.g., 12.5% more with pre-joining).
- Redundant columns increase table size.
3. Performance Issues
- Pre-joining: Count distinct, larger headers, more data.
- Redundant columns: Slower scans, risk of inconsistency.
- Vertical: Faster for frequent, slower for infrequent queries.
- Horizontal: Good for parallelism; combining is complex.
4. Maintenance & Usability
- Denormalization increases update and integrity issues.
5. Industry Ratios
- Health Care: 1:2 | Video Rental: 1:3 | Retail: 1:30