[go: up one dir, main page]

0% found this document useful (0 votes)
75 views14 pages

Data Structures Lab Project: Implementation and Optimization of Some Lesser-Known BST

The document describes a student project to implement and optimize lesser-known self-balancing binary search trees (BSTs). The project will explore Splay trees, Scapegoat trees, AA trees, 2-3 trees, and Treaps, and attempt to find realistic applications where their performance exceeds common BSTs like AVL and Red-Black trees. The implementation will be done in C++, with testing of execution times. Key aspects of the trees like operations, advantages, and uses will be discussed in the report.

Uploaded by

Shreyash Rautela
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
75 views14 pages

Data Structures Lab Project: Implementation and Optimization of Some Lesser-Known BST

The document describes a student project to implement and optimize lesser-known self-balancing binary search trees (BSTs). The project will explore Splay trees, Scapegoat trees, AA trees, 2-3 trees, and Treaps, and attempt to find realistic applications where their performance exceeds common BSTs like AVL and Red-Black trees. The implementation will be done in C++, with testing of execution times. Key aspects of the trees like operations, advantages, and uses will be discussed in the report.

Uploaded by

Shreyash Rautela
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Department of CSE/IT

Jaypee Institute of Information Technology,


Noida
June-July 2021

Data Structures Lab


Project
(15B17CI371)

Implementation and Optimization


of Some Lesser-Known BST

Submitted by: - Submitted to: -


Abhinav Sharma 9919103107 Prof. Shariq Murtuza
Chirag Sharma 9919103108
Shreyash Rautela 9919103110
Title of The Project
Implementation and Optimization of Some Lesser-Known BST

Abstract

A self-balancing binary search tree is any node-based binary search tree that automatically keeps
its height small in the face of arbitrary item insertions and deletions. These structures provide
efficient implementations for mutable ordered lists, and can be used for other abstract data
structures such as associative arrays, priority queues and sets.

There are numerous types of self-balancing bst such as AVL tree (Adelson-Velsky and Landis),
Red-Black tree, B tree, Splay tree and many other modifications of these trees. Among these trees
the most commonly used ones are AVL tree and Red-Black tree, because these are by far the most
efficient and optimized data structures to be used in various real-life applications such as database
management system wherein insertion and deletion of data are fewer but there are frequent
lookups for data required.

There are a lot of little known (or less used) self-balancing bst implementations such as 2-3 tree,
Splay tree, Scapegoat tree, AA tree, Treap, which are not commonly preferred over AVL and
Red-Black tree when it comes to implementing them. Some of these are modified AVL and Red-
Black tress having some specific properties which are applicable in some specific scenarios. In
this project we are going to explore and implement such data structures and try to find out a semi-
realistic application where its implementation is faster than AVL tree and Red-Black tree. For
example, Splay trees can outperform Red-Black trees in cases with serious temporal locality.

Tools and Technologies Used

All of the implementation (coding) is done in C++ language, editor used is Visual Studio Code
and compiler is GNU/GCC. Online C++ compiler are used to measure the execution time of
the programs.

Dataset Description

It is mainly a report-based project through which we will get know about these specific data
structures and all the scenarios where they can be preferably used over the commonly used data
structures like AVL tree and Red-Black tree. This will help us to optimize the database
applications which are very common in today’s world where data is exponentially increasing.
Improved data retrieval is our main key of focus in this project.

Design of the Project

The three basic tree implementation that we are going to work with are AVL, Red Black, B trees.
The lesser-known trees which we will be discussing are derived from these trees.
1.Splay tree and Scapegoat tree, from AVL tree
2.AA tree from Red Black tree
3.2-3 tree from B tree
4.Treap, combination of BST and heap
Splay Tree
A splay tree is a binary search tree with the additional property that recently accessed elements are
quick to access again. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.
All normal operations on a binary search tree are combined with one basic operation, called
splaying. Splaying the tree for a certain element rearranges the tree so that the element is placed at
the root of the tree.

Operations-
a) Splaying-When a node x is accessed, a splay operation is performed on x to move it to the root.
To perform a splay operation, we carry out a sequence of splay steps, each of which moves x
closer to the root. By performing a splay operation on the node of interest after every access, the
recently accessed nodes are kept near the root and the tree remains roughly balanced, so that we
achieve the desired amortized time bounds.
Each particular step depends on three factors:
• Whether x is the left or right child of its parent node, p,
• Whether p is the root or not, and if not
• Whether p is the left or right child of its parent, g (the grandparent of x).
There are two types of splay steps, each of which has two symmetric variants: left- and right-
handed. The two types of splay steps are:
Zig rotation: It is similar to the single right rotation in AVL tree.

Zag rotation: It is similar to the single left rotation in AVL tree.


Zig Zag rotation: It is sequence of zig rotation followed by zag rotation.

Zag Zig rotation: It is a sequence of zag rotation followed by zig rotation.

These steps makes this splay tree a partially balanced binary search tree.

b) Insertion- Insertion in the splay tree is similar to insertion in binary search tree, since in AVL
tree after insertion of every new node a balancing is required to maintain the height. In the case of
splay tree, splaying is the operation followed by every insertion making the newly inserted node
closer to the root.
c) Deletion- To delete a node the same methodology is used as binary search tree. Whenever a
node is deleted, splaying of the parent of the deleted node is done.
d) Searching- Same as that of BST.

Advantages-
1. The main idea of splay tree is to bring the recently accessed item to root of the tree, this makes
the recently searched item to be accessible in O(1) time if accessed again. The idea is to use
locality of reference.
2. The search operation in Splay tree does the standard BST search, in addition to search, it also
splays (move a node to the root). If the search is successful, then the node that is found is splayed
and becomes the new root. Else the last node accessed prior to reaching the NULL is splayed and
becomes the new root.
3. Small memory footprint: Splay trees do not need to store any bookkeeping data.

Uses-
Since it uses locality of reference as its principle idea, many practical applications such as
implementing caches and garbage collection algorithms.
Scrapegoat
A scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson and again by
Igal Galperin and Ronald L. Rivest.
Unlike most other self-balancing binary search trees which provide worst case O(log n) lookup
time, scapegoat trees have no additional per-node memory overhead compared to a regular binary
search tree: a node stores only a key and two pointers to the child nodes.
This makes scapegoat trees easier to implement and, due to data structure alignment, can reduce
node overhead by up to one-third.

Unlike other self-balancing BSTs, ScapeGoat tree doesn’t require extra space per node. For
example, Red Black Tree nodes are required to have color.
AVL tree have to store the height of each node,therefore it is memory efficient as compared to avl
tree

A binary search tree is said to be weight-balanced if half the nodes are on the left of the root, and
half on the right. An α-weight-balanced node is defined as meeting a relaxed weight balance
criterion:
size(left) ≤ α*size(node)
size(right) ≤ α*size(node)

Where size can be defined recursively as:


function size(node) is
if node = nil then
return 0
else
return size(node->left) + size(node->right) + 1
end if
end function

A binary search tree that is α-weight-balanced must also be α-height-balanced, that is


height(tree) ≤ ⌊log1/α(size(tree))⌋

Operations: -
Lookup: - Lookup is not modified from a standard binary search tree, and has a worst-case time of
O(log n).The reduced node memory overhead compared to
other self-balancing binary search trees can further improve locality of reference and caching.

Insertion: - To insert value x in a Scapegoat Tree:


Create a new node u and insert x using the BST insert algorithm.
If the depth of u is greater than logα n where n is number of nodes in tree then we need to make
tree balanced. To make balanced, we use below step
to find a scapegoat.
Walk up from u until we reach a node w with size(w) > (α)*size(w.parent). This node is scapegoat
Rebuild the subtree rooted at w.parent.
In rebuilding, we simply convert the subtree to the most possible balanced BST. We first store
inorder traversal of BST in an array, then we build a new BST from array by recursively dividing it
into two halves.
Deletion: -Scapegoat trees are unusual in that deletion is easier than insertion. To enable deletion,
scapegoat trees need to store an additional value with the tree data structure.
This property, which we will call MaxNodeCount simply represents the highest achieved
NodeCount. It is set to NodeCount whenever the entire tree is rebalanced, and after insertion
is set to max(MaxNodeCount, NodeCount).
To perform a deletion, we simply remove the node as you would in a simple binary search tree, but
if
NodeCount ≤ α*MaxNodeCount
then we rebalance the entire tree about the root, remembering to set MaxNodeCount to
NodeCount.
This gives deletion its worst-case performance of O(n) time; however, it is amortized to O(log n)
average time.

Advantages: -
1)The advantage of the Scapegoat Tree compared to other self-balancing trees is that it is does not
need additional memory to store information about the tree to maintain a balanced tree. The
Scapegoat Tree is relatively straightforward to implement, so it has become well-known for its
simplicity and low memory usage.
2)There is a parameter, alpha, for the data structure which allows flexibility in deciding how
balanced the tree should be. This parameter lets the programmer decide on whether
the tree will be more balanced (which takes more rebuilding) or not as balanced (which takes less
rebuilding).
Red Black Tree
A red–black tree is a kind of self-balancing binary search tree. Each node stores an extra bit
representing "color" ("red" or "black"), used to ensure that the tree remains balanced during
insertions and deletions. It was invented by Rudolf Bayer in 1972.
When the tree is modified, the new tree is rearranged and "repainted" to restore the coloring
properties that constrain how unbalanced the tree can become in the worst case. The properties are
designed such that this rearranging and recoloring can be performed efficiently.

Properties-
In addition to the requirements imposed on a binary search tree the following must be satisfied by
a red–black tree:
1.Each node is either red or black.
2.All NIL leaves (figure 1) are considered black.
3.If a node is red, then both its children are black.
4.Every path from a given node to any of its descendant NIL leaves goes through the same number
of black nodes.

Uses-
Completely Fair Scheduler in Linux Kernel
Computational Geometry Data structures
AA Tree
An AA tree is a form of balanced tree used for storing and retrieving ordered data efficiently. AA
trees are named for Arne Andersson, their inventor. They are a variation of the red–black tree
which supports efficient addition and deletion of entries. Unlike red–black trees, red nodes on an
AA tree can only be added as a right subchild, which means that no red node can be a left sub-
child.

Operations-
a) Skew Operation-Skew is a right rotation to replace a subtree containing a left horizontal link
with one containing a right horizontal link instead.

b) Split Operation-Split is a left rotation and level increase to replace a subtree containing two or
more consecutive right horizontal links with one containing two fewer consecutive right horizontal
links.

c) Insertion-Insertion begins with the normal binary tree search and insertion procedure. Then, as
the call stack unwinds (assuming a recursive implementation of the search), it's easy to check the
validity of the tree and perform any rotations as necessary. If a horizontal left link arises, a skew
will be performed, and if two horizontal right links arise, a split will be performed, possibly
incrementing the level of the new root node of the current subtree.
d) Deletion-As in most balanced binary trees, the deletion of an internal node can be turned into
the deletion of a leaf node by swapping the internal node with either its closest predecessor or
successor, depending on which are in the tree or on the implementor's whims.
e) Searching-It is as same as it is in BST.

Advantages-
An AA tree has the same advantages as that in Red Black tree. But an AA tree makes more
rotations than a red-black tree, the simpler algorithms tend to be faster, and all of this balances out
to result in similar performance. A red-black tree is more consistent in its performance than an AA
tree, but an AA tree tends to be flatter, which results in slightly faster search times.
B Tree
A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches,
sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary
search tree, allowing for nodes with more than two children. Unlike other self-balancing binary
search trees, the B-tree is well suited for storage systems that read and write relatively large blocks
of data, such as disks. It is commonly used in databases and file systems. It was invented in 1970
by Rudolf Bayer and Edward M. McCreight.

Properties-
1.All leaves are at the same level.
2.A B-Tree is defined by the term minimum degree ‘t’. The value of t depends upon disk block
size.
3.Every node except root must contain at least (ceiling)([t-1]/2) keys. The root may contain
minimum 1 key.
4.All nodes (including root) may contain at most t – 1 key.
5.Number of children of a node is equal to the number of keys in it plus 1.
6.All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains
all keys in the range from k1 and k2.
7.B-Tree grows and shrinks from the root which is unlike Binary Search Tree. Binary Search Trees
grow downward and also shrink from downward.
8.Like other balanced Binary Search Trees, time complexity to search, insert and delete is O(log
n).
9.Insertion of a Node in B-Tree happens only at Leaf Node.

Uses-
A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and
deletions in logarithmic amortized time. Unlike self-balancing binary search trees, it is optimized
for systems that read and write large blocks of data. It is most commonly used in database and file
systems.
2-3 Trees
A 2-3 Tree is a tree data structure where every node with children has either two children and one
data element or three children and two data elements. A node with 2 children is called a 2-NODE
and a node with 3 children is called a 3-NODE. A 4-NODE, with three data elements, may be
temporarily created during manipulation of the tree but is never persistently stored in the tree.

Nodes on the outside of the tree i.e. the leaves have no children and have either one or two data
elements. All its leaves must be on the same level so that a 2-3 tree is always height balanced. A 2-
3 tree is a special case of a B-Tree of order 3. Below is an example of a 2-3 tree.

Properties-
A 2-3 tree follows the below mentioned properties:
1.Every internal node in the tree is a 2-node or a 3-node i.e. it has either one value or two values.
2.A node with one value is either a leaf node or has exactly two children. Values in left sub tree <
value in node < values in right sub tree.
3.A node with two values is either a leaf node or has exactly 3 children. It cannot have 2 children.
Values in left sub tree < first value in node < values in middle sub tree < second value in node
< value in right sub tree.
4.All leaf nodes are at the same level.

Operations-
a) Insertion-The insert operation takes care of the balanced property of a 2-3 tree. The insertion
algorithm into a two-three tree is quite different from the insertion algorithm for a binary search
tree. In a two-three tree, the algorithm will be as follows:
1.If the tree is empty, create a node and put value into the node.
2.Otherwise find the leaf node where the value belongs.
3.If the leaf node has only one value, put the new value into the node.
4.If the leaf node has more than two values, split the node and promote the median of the three
values to parent.
5.If the parent then has three values, continue to split and promote, forming a new root node if
necessary.

b) Deletion-Deleting a data element d is similar to inserting. There is a special case when T is just
a single node containing d. In this case, the tree is made empty. In other cases, the parent of the
node to be deleted is found, then the tree is fixed up (if necessary) so that it is still a 2-3 tree

c) Searching-Searching for an element in a 2-3 tree is very similar as searching for an item in a
binary search tree. Since the tree is sorted, the search function will be directed to the correct
subtree and eventually to the correct node which contains the element.
Let d be the data element we want to search for in the 2-3 tree T. The base cases are as follows:
1.If the tree T is empty, then d is not in the tree and we return FALSE.
2.If the current node contains value which is equal to d, we return TRUE.
3.If we reach the root node and it does not contain d, we return FALSE.

Advantages-
1.The 2-3 tree has the advantage over the BST in that the 2-3 tree can be kept height balanced at
relatively low cost.
2. In 2-3 trees you have faster inserts at the expense of slower searches (since height is more
compared to AVL trees).
3. A persistent data structure based on 2-3 tree is better than the one based on AVL trees. A
persistent data structure is one which makes it possible to roll back your algorithm to a previous
step (so it preserves its previous versions).

Uses-
Linux Kernel.
Completely Fair Scheduler
To keep track of Virtual Memory Segments of a Process.
Treaps
Treap is a data structure which combines binary tree and binary heap (hence the name: tree + heap
⇒ Treap).

More specifically, treap is a data structure that stores pairs (X, Y) in a binary tree in such a way
that it is a binary search tree by X and a binary heap by Y. Assuming that all X and all Y are
different, we can see that if some node of the tree contains values (X0, Y0), all nodes in the left
subtree have X<X0, all nodes in the right subtree have X>X0, and all nodes in both left and right
subtrees have Y<Y0. Treaps have been proposed by Siedel and Aragon in 1989.
Every node of Treap maintains two values.
1) Key Follows standard BST ordering (left is smaller and right is greater)
2) Priority Randomly assigned value that follows Max-Heap property.

Operations:-
a) Insertion: -To insert a new key x into the treap, we generate a random priority y for x. Then
binary search for x in the tree, and create a new node at the leaf position where the binary search
determines a node for x should exist. Then, as long as x is not the root of the tree and has a larger
priority number than its parent z, perform a tree rotation that reverses the parent-child relation
between x and z.
b) Deletion: -
1) If node to be deleted is a leaf, delete it.
2) Else replace node's priority with minus infinite ( -INF ), and do appropriate rotations to bring
the node down to a leaf.
c) Split: -
The power of treaps comes from their unique operations, which are impossible to implement in a
efficient way using others BSTs. We want to split the original treap into two parts L and R such
that both L and R are treaps as well, all nodes of L are smaller or equal to a given value X and all
nodes of R are greater than X. It’s real simple to implement it. Just insert a new node with key X
and infinity priority. The heap property will make that artificial node become the root of the treap,
and the BST property will guarantee that its left and right child satisfy the conditions we want.

Advantages: -
priorities allow to uniquely specify the tree that will be constructed (of course, it does not depend
on the order in which values are added), which can be proven using corresponding theorem.
Obviously, if you choose the priorities randomly, you will get non-degenerate trees on average,
which will ensure O(logN) complexity for the main operations. Hence another name of this data
structure-randomized binary search tree.

Uses: -
One way of implementing ordered sets are red black trees, but if we want to accept probabilistic
assurances of performance then there we use treaps, as they are easier to implement and faster too.
Since it uses randomness to create priority in treaps.Alse, fast set operations are possible when
using treaps like union(merge), intersection and difference. A randomized hash map is possible
only through treaps
References

1. https://en.wikipedia.org/wiki/Splay_tree
2. http://www.btechsmartclass.com/data_structures/splay-trees.html
3. https://www.cs.cornell.edu/courses/cs3110/2013sp/recitations/rec08-splay/rec08.html
4. https://en.wikipedia.org/wiki/Scapegoat_tree
5. https://www.geeksforgeeks.org/scapegoat-tree-set-1-introduction-insertion/
6. https://people.scs.carleton.ca/~maheshwa/Honor-Project/scapegoat.pdf
7. https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
8. https://www.geeksforgeeks.org/red-black-tree-set-1-introduction-2/
9. https://en.wikipedia.org/wiki/AA_tree
10. https://ycpcs.github.io/cs350-fall2017/lectures/AA-tree_lecture.pdf
11. http://www.cse.chalmers.se/edu/year/2018/course/DIT961/files/lectures/dit961_lecture_10.pdf
12. https://en.wikipedia.org/wiki/B-tree
13. http://courses.washington.edu/css502/zander/Notes/06AVL-Btree.pdf
14. https://www.cse.chalmers.se/edu/year/2016/course/DIT960/slides/9.pdf
15. https://en.wikipedia.org/wiki/Treap
16. https://medium.com/carpanese/a-visual-introduction-to-treap-data-structure-part-1-
6196d6cc12ee

You might also like