BTree 1
BTree 1
B-trees are balanced search trees designed to work well on magnetic disks or other direct-access secondary
storage devices, because they are better at minimizing disk I/O operations.
Many database systems use B-trees (or variants of B-trees such as B+-tree, B*-tree) to store information.
B-trees generalize binary search trees in a natural manner. Following diagram shows a simple B-
tree.
If an internal B-tree node x contains n[x] keys, then x has n[x] + 1 children. The keys in node x
are used as dividing points separating the range of keys handled by x into n[x] + 1 subranges, each
handled by one child of x. When searching for a key in a B-tree, we make an (n[x] + 1)-way
decision based on comparisons with the n[x] keys stored at node x.
Definition of B-Tree:
A B-tree T is a rooted tree (whose root is root[T]) having the following properties:
(1) Every node x has the following fields:
a. n[x]: the number of keys currently stored in node x,
b. the n[x] keys themselves, stored in nondecreasing order, so that
key1[x] ≤key2[x] ≤ · · · ≤ keyn[x][x],
c. leaf [x]: a boolean value that is TRUE if x is a leaf and FALSE if x is an internal node.
(2) Each internal node x also contains n[x]+1 pointers c1[x], c2[x], . . . , cn[x]+1[x] to its children.
Leaf nodes have no children, so their ci fields are undefined.
(3) The keys keyi[x] separate the ranges of keys stored in each subtree: if ki is any key stored
in the subtree with root ci[x], then
(4) All leaves have the same depth, which is the tree’s height h.
(5) There are lower and upper bounds on the number of keys a node can contain. These bounds
can be expressed in terms of a fixed integer t ≥ 2 called the minimum degree of the B-
tree:
a. Every node other than the root must have at least t − 1 keys.
(⇒ Every internal node other than the root thus has at least t children.)
If the tree is nonempty, the root must have at least one key.
b. Every node can contain at most 2t − 1 keys.
(⇒Therefore, an internal node can have at most 2t children.)
We say that a node is full if it contains exactly 2t − 1 keys.
Note:
1) The simplest B-tree occurs when t = 2. Every internal node then has either 2, 3, or 4
children, and we have a 2-3-4 tree. (⇒ Each node can contain x keys where 1≤ x ≤ 3) In
practice, however, much larger values of t are typically used.
2) B-tree node is usually as large as a whole disk page. The number of children a B-tree node
can have is therefore limited by the size of a disk page.
3) For a large B-tree stored on a disk, branching factors between 50 and 2000 are often used,
depending on the size of a key relative to the size of a page. A large branching factor
dramatically reduces both the height of the tree and the number of disk accesses required
to find any key.
Eg:
Figure below shows a B-tree with a branching factor of 1001 and height 2 that can store
over one billion keys; nevertheless, since the root node can be kept permanently in main
memory, only two disk accesses at most are required to find any key in this tree!