Lecture 16

This document provides an overview of storage and indexing concepts. It discusses different file organizations like heap files, sorted files, and indexed files. It also covers different types of indexes like hash-based and tree-based indexes. The document compares the costs of common operations like scans, searches, inserts and deletes on different file organizations and indexes. It provides assumptions used for the cost analysis and comparisons.

Uploaded by

Chandrika Surya


Overview of Storage and Indexing

Chapter 8

Comp 521 – Files and Databases Fall 2016 1

Data on External Storage
• Solid-state disks, Secure Digital (SD) non-volatile memory:
  - Block-addressable storage devices, relatively symmetric R/W speeds, access latency
• Disks: Can retrieve a random page at fixed cost
  - But reading consecutive pages is much cheaper than reading them in random order
• Tapes: Can only read pages sequentially
  - Cheaper than disks; used for archival storage
• File organization: Method of arranging a file of records on external storage.
  - A record id (rid) is sufficient to physically locate a record
  - Indexes are data structures that allow us to find the record ids of records with given values in index search key fields
• Architecture: The buffer manager stages pages from external storage to the main-memory buffer pool. File and index layers make calls to the buffer manager.
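
A minimal sketch of the architecture bullet, under stated assumptions: a rid is modeled as a (page id, slot) pair, the "disk" is a Python dict, and the replacement policy is LRU. None of these details come from the slides; `BufferPool` and `fetch_record` are hypothetical names used only for illustration.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny buffer pool with LRU replacement (assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for external storage: page_id -> records
        self.capacity = capacity      # number of frames in main memory
        self.frames = OrderedDict()   # page_id -> page contents, least recently used first
        self.io_count = 0             # number of simulated page reads

    def get_page(self, page_id):
        if page_id in self.frames:            # hit: no I/O needed
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        self.io_count += 1                    # miss: stage the page from "disk"
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # evict the least recently used page
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]


def fetch_record(pool, rid):
    """The file layer asks the buffer manager for the page named in the rid."""
    page_id, slot = rid
    return pool.get_page(page_id)[slot]


disk = {0: ["ann", "bob"], 1: ["cat", "dan"], 2: ["eve", "fay"]}
pool = BufferPool(disk, capacity=2)

first = fetch_record(pool, (1, 0))    # miss: page 1 is read
second = fetch_record(pool, (1, 1))   # hit: page 1 is already buffered
third = fetch_record(pool, (0, 1))    # miss: page 0 is read
print(first, second, third, pool.io_count)
```

Only misses count as I/O here, which is why the cost analysis later in the lecture counts page reads rather than record reads.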


Alternative File Organizations
Many alternatives exist, each ideal for some situations and not so good in others:
• Heap (random order) files: Suitable when typical access is a file scan retrieving all records.
• Sorted files: Best if records must be retrieved in some order, or only a "range" of records is needed.
• Indexes: Data structures to organize records via trees or hashing.
  - Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields.
  - Updates are much faster than in sorted files.


Indexes
• An index is an auxiliary data structure that speeds up selections on the search key fields of the index.
  - Any subset of attributes from a relation can be a search key.
  - A search key is not necessarily a relation key (a set of fields that uniquely identifies a tuple in a relation).
• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.
  - Given data entry k*, we can find the record with key k in at most one disk I/O. (Details soon …)


Hash-Based Index
• Place all records with a common attribute together.
• Index is a collection of buckets.
  - Bucket = primary page plus zero or more overflow pages.
  - Buckets contain data entries.
• Hashing function, h(key): a mapping from the index's search key to the bucket in which the (data entry for) record r belongs.

[Figure: a key is passed through the hash function H(x) to select a bucket.]
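
The bucket structure above can be sketched as follows. This is an illustrative toy, not the lecture's implementation: the page capacity of 2, the use of Python's built-in `hash` modulo the bucket count as h(key), and the names `Bucket` and `HashIndex` are all assumptions.

```python
from dataclasses import dataclass, field

PAGE_CAPACITY = 2  # data entries per page; deliberately tiny (assumption)


@dataclass
class Bucket:
    # pages[0] is the primary page; any later pages are overflow pages.
    pages: list = field(default_factory=lambda: [[]])

    def add(self, entry):
        if len(self.pages[-1]) >= PAGE_CAPACITY:
            self.pages.append([])      # chain a new overflow page
        self.pages[-1].append(entry)


class HashIndex:
    def __init__(self, num_buckets=4):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # Stand-in for h(key): maps a search key to one bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, rid):
        self._bucket_for(key).add((key, rid))   # data entry <k, rid>

    def lookup(self, key):
        # Only one bucket (primary page plus overflow pages) is examined.
        bucket = self._bucket_for(key)
        return [rid for page in bucket.pages for (k, rid) in page if k == key]


index = HashIndex()
for rid, age in enumerate([25, 30, 25, 41, 25]):
    index.insert(age, rid)

print(index.lookup(25))  # prints [0, 2, 4]
```

Keys 25 and 41 land in the same bucket here, so that bucket grows an overflow page; a lookup still touches only that one bucket.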


Tree-Based Index

[Figure: a tree of non-leaf pages above a row of leaf pages; the leaf pages are "ordered" by search key.]

• Leaf pages contain data entries, and are chained (prev & next).
• Non-leaf pages have index entries, which are only used to direct searches.

Index entry layout: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm
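
As a rough sketch of how a non-leaf page directs a search, the function below picks the child pointer to follow. It assumes the common B+ tree convention that pointer Pi covers keys k with Ki <= k < Ki+1 (the slide does not state the convention), and `choose_child` is a hypothetical name.

```python
from bisect import bisect_right


def choose_child(keys, pointers, search_key):
    """keys = [K1, ..., Km], pointers = [P0, ..., Pm]; return the pointer to follow."""
    if len(pointers) != len(keys) + 1:
        raise ValueError("a page with m keys must have m + 1 pointers")
    # bisect_right counts the keys <= search_key, which is the index of the
    # pointer whose key range contains search_key.
    return pointers[bisect_right(keys, search_key)]


keys = [10, 20, 30]
pointers = ["P0", "P1", "P2", "P3"]

print(choose_child(keys, pointers, 5))    # prints P0
print(choose_child(keys, pointers, 10))   # prints P1
print(choose_child(keys, pointers, 25))   # prints P2
print(choose_child(keys, pointers, 30))   # prints P3
```

Repeating this step from the root down to a leaf page reaches the data entries for the search key.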


Alternative Data/Index Organizations
• In data entry k* we store one of the following:
  - The actual data record with its key k (clustered)
  - <k, rid of data record with search key value k>
  - <k, list of rids of data records with search key k>
• The choice of data organization is independent of the indexing method.
  - Clustered indices save on accesses, but you can have only one clustered index per relation.
  - Unclustered alternatives trade off uniformity of index entries versus size considerations.
  - Often, indices contain auxiliary information.
Index Classifications
• Primary vs. secondary: If the search key contains the primary key, then it is called a primary index.
  - Unique index: Search key contains a candidate key.
• Clustered vs. unclustered:
  - Clustered: tuples are sorted by search key and stored sequentially in data blocks.
  - A file can be clustered on at most one search key.
  - Unclustered: search keys are stored with record ids (rids) that identify the block containing the associated tuple.


Clustered vs. Unclustered Index
• Index type (hash or tree) is independent of the data's organization (clustered or unclustered).
• To build a clustered index, we must first sort the records (perhaps allowing for some free space on each page for future inserts).
• Later inserts might create overflow pages. Thus, the eventual order of data records is "close to", but not identical to, the sort order.
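
To illustrate why the data's organization matters, the small simulation below counts how many data pages hold the records matching a range selection. The numbers are assumptions (1,000 records, 10 records per page), and a shuffled list stands in for a heap file behind an unclustered index.

```python
import random

RECORDS_PER_PAGE = 10  # assumed page capacity


def pages_touched(layout, lo, hi):
    """layout[i] is the key stored at position i; position i lives on page i // RECORDS_PER_PAGE."""
    return len({pos // RECORDS_PER_PAGE
                for pos, key in enumerate(layout)
                if lo <= key <= hi})


keys = list(range(1000))

clustered_layout = keys[:]                      # data records stored in search-key order
unclustered_layout = keys[:]
random.Random(0).shuffle(unclustered_layout)    # data records stored in arbitrary order

clustered_pages = pages_touched(clustered_layout, 200, 299)
unclustered_pages = pages_touched(unclustered_layout, 200, 299)

print("clustered:", clustered_pages)      # 100 matches on 10 consecutive pages
print("unclustered:", unclustered_pages)  # up to one page per matching record
```

With the clustered layout the 100 matching records sit on 10 adjacent pages; with the shuffled layout they are scattered, so far more pages must be read.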

[Figure: CLUSTERED vs. UNCLUSTERED. In both cases, index entries direct the search for data entries (index blocks), and the data entries lead to the data records (data blocks).]

Costs / Benefits of Indexing
• Adding an index incurs
  - Storage overhead
  - Maintenance overhead
• Without indexing, searching the records of a database for a particular record would require, on average,

    Number of Records * Cost to read a Record * 0.5

  (assumes records are in random order)
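
A quick worked instance of the formula above, with made-up numbers (1,000 records, 2 time units per read). The second function checks where the 0.5 comes from: if the target is equally likely to be at any position, a linear search reads (N + 1) / 2 records on average, which is about half of N.

```python
def expected_search_cost(num_records, cost_per_read):
    """The slide's estimate: Number of Records * Cost to read a Record * 0.5."""
    return num_records * cost_per_read * 0.5


def exact_average_reads(num_records):
    """Average reads when the target is equally likely at positions 1..N."""
    return sum(range(1, num_records + 1)) / num_records


print(expected_search_cost(1000, 2))   # prints 1000.0
print(exact_average_reads(1000))       # prints 500.5
```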


Cost Model for Our Analysis
We ignore CPU costs, for simplicity:
• B: The number of data pages
• R: Number of records per page
• D: (Average) time to read or write a block
• Measuring the number of page I/Os ignores gains of pre-fetching a sequence of pages; thus, even I/O cost is only approximated.
• Average-case analysis; based on several simplistic assumptions.

Good enough to show the overall trends!

Comparing File Organizations
• Heap file (random record order; insert at EOF)
• Sorted file, sorted on <age, sal>
• Clustered B+ tree file, clustered on search key <age, sal>
• Heap file with unclustered B+ tree index on search key <age, sal>
• Heap file with unclustered hash index on search key <age, sal>


Operations to Compare
• Scan: Fetch all records from disk
    SELECT *
    FROM Emp
• Equality search
    SELECT *
    FROM Emp
    WHERE Age = 25
• Range selection
    SELECT *
    FROM Emp
    WHERE Age > 30
• Insert a record
    INSERT
    INTO Emp(Name, Age, Salary)
    VALUES('Jordan', 49, 3000000)
• Delete a record
    DELETE
    FROM Emp
    WHERE Name = 'Bristow'
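
For readers who want to try the five operations, here is a small runnable sketch using Python's standard `sqlite3` module with an in-memory database. The column types and the three starting rows are assumptions made up for the example; only the five statements come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; the slides only show the column names.
conn.execute("CREATE TABLE Emp (Name TEXT, Age INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp(Name, Age, Salary) VALUES (?, ?, ?)",
    [("Bristow", 25, 50000), ("Smith", 32, 64000), ("Lee", 25, 58000)],  # made-up rows
)

scan = conn.execute("SELECT * FROM Emp").fetchall()                      # scan
equality = conn.execute("SELECT * FROM Emp WHERE Age = 25").fetchall()   # equality search
range_sel = conn.execute("SELECT * FROM Emp WHERE Age > 30").fetchall()  # range selection

conn.execute("INSERT INTO Emp(Name, Age, Salary) VALUES ('Jordan', 49, 3000000)")  # insert
conn.execute("DELETE FROM Emp WHERE Name = 'Bristow'")                             # delete

remaining = conn.execute("SELECT Name FROM Emp ORDER BY Name").fetchall()
print(len(scan), len(equality), len(range_sel))  # prints 3 2 1
print(remaining)  # prints [('Jordan',), ('Lee',), ('Smith',)]
conn.close()
```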


Assumptions in Our Analysis
• Heap files:
  - Equality selection is on key => exactly one match
• Sorted files:
  - Files compacted after deletions.
• Indexes:
  - Search key overhead = 10% size of record
  - Hash: No overflow buckets.
      - 80% page occupancy => file size = 1.25 data size
  - Tree: 67% occupancy (this is typical).
      - Implies file size = 1.5 data size
      - Tree fan-out = F
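
The file-size factors follow directly from the occupancy figures. The short derivation below also shows, under the assumption that data entries are one tenth the size of a record (the 10% overhead above), where the 0.15B and 0.125B terms in the Cost of Operations table come from.

```latex
\[
\text{file size} = \frac{\text{data size}}{\text{occupancy}}:
\qquad \frac{1}{0.80} = 1.25,
\qquad \frac{1}{0.67} \approx 1.5
\]
\[
\text{pages of tree data entries} \approx 0.10 \times 1.5\,B = 0.15\,B,
\qquad
\text{pages of hash data entries} \approx 0.10 \times 1.25\,B = 0.125\,B
\]
```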


Assumptions (contd.)
• Scans:
  - Leaf levels of a tree index are chained.
  - Index data entries plus the actual file are scanned for unclustered indexes.
• Range searches:
  - We use tree indexes to restrict the set of data records fetched, but ignore hash indexes.


Cost of Operations

File Type              | Scan        | Equality Search   | Range Search                 | Insert        | Delete
-----------------------+-------------+-------------------+------------------------------+---------------+------------
Heap                   | BD          | 0.5BD             | BD                           | 2D            | Search + D
Sorted                 | BD          | D log2 B          | D log2 B + #matches          | Search + BD   | Search + BD
Clustered              | 1.5BD       | D logF 1.5B       | D logF 1.5B + #matches       | Search + D    | Search + D
Unclustered tree index | BD(R+0.15)  | D(1 + logF 0.15B) | D(1 + logF 0.15B + #matches) | D(logF 0.15B) | Search + 2D
Unclustered hash index | BD(R+0.125) | 2D                | BD                           | 4D            | Search + 2D

• Several assumptions underlie these (rough) estimates! We'll cover them in the next few lectures.
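
To get a feel for the magnitudes, the sketch below evaluates the Scan and Equality Search columns of the table for one hypothetical setting (B = 1024 pages, R = 100 records per page, fan-out F = 100, and D = 1 so that costs are reported in units of D). The parameter values and function names are assumptions chosen for illustration.

```python
import math


def log_base(x, base):
    return math.log(x) / math.log(base)


def scan_cost(file_type, B, R, D):
    costs = {
        "heap": B * D,
        "sorted": B * D,
        "clustered": 1.5 * B * D,
        "unclustered_tree": B * D * (R + 0.15),
        "unclustered_hash": B * D * (R + 0.125),
    }
    return costs[file_type]


def equality_cost(file_type, B, D, F):
    costs = {
        "heap": 0.5 * B * D,
        "sorted": D * math.log2(B),
        "clustered": D * log_base(1.5 * B, F),
        "unclustered_tree": D * (1 + log_base(0.15 * B, F)),
        "unclustered_hash": 2 * D,
    }
    return costs[file_type]


B, R, D, F = 1024, 100, 1.0, 100  # hypothetical values; D = 1 gives costs in units of D
for file_type in ("heap", "sorted", "clustered", "unclustered_tree", "unclustered_hash"):
    print(f"{file_type:17s} scan={scan_cost(file_type, B, R, D):10.1f} "
          f"equality={equality_cost(file_type, B, D, F):7.2f}")
```

Even with rough numbers, the pattern in the table shows up: every index makes equality search far cheaper than a heap scan, while scanning through an unclustered index is much more expensive than scanning the file directly.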


Indexes and Workload
• For each query in the workload:
  - Which relations does it access?
  - Which attributes are retrieved?
  - Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
• For each update in the workload:
  - Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
  - The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.


Index-Only Plans
• Some queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.
• Index on <E.dno>: the index stores a count of tuples with the same key.
    SELECT E.dno, COUNT(*)
    FROM Emp E
    GROUP BY E.dno
• A tree index on <E.dno, E.sal> would give the answer.
    SELECT E.dno, MIN(E.sal)
    FROM Emp E
    GROUP BY E.dno
• Index on <E.age, E.sal> or <E.sal, E.age>: average the index keys.
    SELECT AVG(E.sal)
    FROM Emp E
    WHERE E.age=25 AND
          E.sal BETWEEN 3000 AND 5000
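
The three index-only plans can be mimicked with sorted lists of index keys standing in for the leaf level of a tree index. This is only a sketch: the sample entries are invented, and the point is that each answer is computed from the keys alone, without touching any Emp tuples.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby

# Hypothetical keys of a tree index on <dno, sal>, kept in key order.
dno_sal = sorted([(10, 4000), (20, 3500), (10, 3000), (30, 5200), (20, 6100), (10, 4500)])
# Hypothetical keys of a tree index on <age, sal>, kept in key order.
age_sal = sorted([(25, 2500), (25, 3200), (25, 4800), (25, 5500), (31, 4000)])


def count_by_dno(entries):
    """SELECT dno, COUNT(*) ... GROUP BY dno, using index keys only."""
    return {dno: sum(1 for _ in group)
            for dno, group in groupby(entries, key=lambda e: e[0])}


def min_sal_by_dno(entries):
    """SELECT dno, MIN(sal) ... GROUP BY dno: the first key of each dno group."""
    return {dno: next(group)[1]
            for dno, group in groupby(entries, key=lambda e: e[0])}


def avg_sal(entries, age, lo, hi):
    """SELECT AVG(sal) WHERE age = ? AND sal BETWEEN lo AND hi, from index keys."""
    start = bisect_left(entries, (age, lo))
    stop = bisect_right(entries, (age, hi))
    sals = [sal for _, sal in entries[start:stop]]
    return sum(sals) / len(sals) if sals else None


print(count_by_dno(dno_sal))             # prints {10: 3, 20: 2, 30: 1}
print(min_sal_by_dno(dno_sal))           # prints {10: 3000, 20: 3500, 30: 5200}
print(avg_sal(age_sal, 25, 3000, 5000))  # prints 4000.0
```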
Summary
• Alternative file organizations, each suited for different situations.
• If selection queries are frequent, data organization and indices are important.
  - Hash-based indexes
  - Sorted files
  - Tree-based indexes
• An index maps search keys to associated tuples.
• Understanding the workload of an application, and its performance goals, is essential for a good design.