[go: up one dir, main page]

0% found this document useful (0 votes)
49 views26 pages

Data Structures MiniProject Neeraja

AVL trees

Uploaded by

Joshni amshetha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
49 views26 pages

Data Structures MiniProject Neeraja

AVL trees

Uploaded by

Joshni amshetha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 26

1 | Page

DEPARTMENT OF ARTIFICIAL INTELLIGENCE,


MACHINE LEARNING and DATA SCIENCE
SUBJECT NAME: DATA STRUCTURES
SUBJECT CODE : AIML335

Mini-Project Report on

Persistent Versioned Document Store using Persistent AVL Trees

B.Tech Degree – AIML

by,
Neeraja M Yadav-2463040
Shalini T-2463052
Joshni Amshetha F-2463073

School of Engineering and Technology,


CHRIST (Deemed to be University),
Kumbalagodu, Bengaluru-560 074
2 | Page

September 2025

Certificate

This is to certify that Joshni Amshetha.F has successfully completed the record work for Data

Structures–CSE333P in partial fulfillment for the award of Bachelor of Technology in Artificial

Intelligence, Machine Learning and Data Science during the year 2025-2026.

​ ​ ​ ​ ​ ​

​ ​ ​ ​ ​ ​ ​ ​ FACULTY- IN CHARGE​
3 | Page

INDEX

SL NO CONTENT PG NO

1 Introduction 4

2 Abstract 5

3 Title of the Project 6

4 Objective and Relevance 7-8

5 Concept applied 8-9

6 Algorithm/Pseudocode 10-13

7 Program Code 14-21

8 Sample Input and Output 22-23

9 Result and Conclusions 23-24

10 References 24-26
4 | Page

CHAPTER – 1
INTRODUCTION

The proliferation of digital content creation has precipitated a significant challenge:


maintaining immutable historical records of documents that are simultaneously subject to real-time,
collaborative modification. This necessitates the development of versioning systems that provide
reliable, efficient access to all prior states without compromising performance. This project proposes
and implements a solution to this challenge through the construction of a persistent version control
system utilizing AVL Trees.

While traditional tree structures are efficient for representing hierarchical data, their
ephemeral nature—where updates overwrite existing state—renders them inadequate for versioning.
This limitation is overcome by employing a persistent AVL Tree. The self-balancing property of the
AVL algorithm ensures that the tree remains optimized for search operations after every update,
maintaining an O(log n) time complexity for access and modification.

Critical to this endeavor is the implementation of a fully persistent model via path copying.
This strategy ensures that each commit operation creates a new version of the tree. By cloning only
the nodes along the path of modification and sharing unchanged subtrees, the system achieves
memory efficiency while guaranteeing that every version remains intact and traversable.
Consequently, the system can serve as the backbone for applications requiring robust audit trails,
collaborative editing, and archival.

This endeavor serves a dual purpose: it validates the theoretical advantages of balanced persistent
data structures in a practical context, and it provides a tangible framework for understanding the
underlying mechanics of modern versioning systems. The implementation underscores the direct
translation of abstract computer science principles into solutions for contemporary information
management problems.
5 | Page

CHAPTER – 2

ABSTRACT

A critical challenge in collaborative software is maintaining every historical version of a


document while supporting instant access and efficient updates. Existing ephemeral data structures
fail as they overwrite previous states, forcing a trade-off between preservation and performance. This
project solves this problem by implementing a persistent versioning system in C using self-balancing
AVL Trees.

Our solution hinges on the AVL Tree's guaranteed logarithmic-time operations, which
prevent performance degradation during frequent edits. To add persistence, we engineered the system
using path copying. This technique solves the overwrite problem by creating a new version for each
edit. Only the nodes along the modified path are duplicated; the rest are shared with previous
versions. This design efficiently preserves entire historical states without unnecessary memory
duplication.

We implemented a functional system to validate this approach. The core architecture manages
an array of root pointers, each representing a unique, immutable version of the document. Users can
insert new entries and commit changes, each operation generating a new version while leaving all
prior ones perfectly intact. To retrieve any document state, the system performs an in-order traversal
from the desired version's root, outputting the content in its correct sequence.

Testing confirmed the solution's effectiveness. The system successfully handles continuous
insertions, maintains balance across all versions, and provides immediate access to any historical
state. The results prove that a persistent AVL Tree is a robust, scalable foundation for version
control. This project provides a practical model for building efficient versioning features directly into
collaborative editors, content management systems, and archival tools, effectively solving the
problem of balancing historical integrity with computational performance.
6 | Page

CHAPTER – 3

TITLE OF THE PROJECT

This project, "Persistent Document Versioning Using AVL Trees," investigates the
deployment of sophisticated tree-based architectures to address the critical computational challenge
of efficient multi-version document management. In an era defined by iterative revisions and
collaborative editing, the imperative to maintain historical fidelity while ensuring performant access
to specific versions is paramount. The proposed system capitalizes on the algorithmic guarantees of
the AVL Tree—a self-balancing binary search structure—to engineer a versioning mechanism that
incorporates full persistence.

The intrinsic property of AVL Trees to maintain logarithmic time complexity for fundamental
operations renders them exceptionally suited for dynamic environments characterized by frequent
data mutations. This initiative extends the canonical AVL Tree through a persistence layer,
implemented via path-copying. Each document modification triggers the generation of a novel tree
instance, while prior versions persist unaltered. This methodology precludes data overwrite,
optimizes resource allocation through structural sharing of immutable nodes, and ensures
instantaneous retrieval of all historical states with minimal computational overhead.

A functional prototype, implemented in the C programming language, substantiates the


translation of theoretical data structure principles into a tangible resolution for a pervasive software
challenge. The work underscores the applicability of persistence in domains such as collaborative
platforms, content management systems, and archival solutions, where uncompromising historical
accuracy and rapid retrieval are non-negotiable requirements.

By synthesizing theoretical constructs with applied engineering, this project elucidates the
indispensable role of hierarchical data structures in modern computational ecosystems. It not only
exemplifies the efficacy and robustness of AVL Trees but also catalyzes further inquiry into
persistent data architectures, version control algorithms, and scalable information management
systems. Consequently, the project exemplifies the objective of harnessing foundational data
structures to forge an innovative, reliable, and efficient paradigm for document version management,
melding academic rigor with pragmatic innovation.
7 | Page

CHAPTER – 4

OBJECTIVE AND RELEVANCE

Objective:

The primary objective of this project is to design and implement a persistent document versioning
system using AVL Trees. The system aims to store and manage multiple versions of a document
without overwriting or losing access to earlier states. By leveraging the self-balancing property of
AVL Trees, the project ensures that all operations—such as insertion, retrieval, and traversal—are
performed efficiently with logarithmic time complexity.The specific objectives of the project are:

1.​ To apply tree data structures in solving a real-world problem.


2.​ To implement persistence in AVL Trees through path-copying, ensuring that all versions of a
document remain intact and retrievable.
3.​ To demonstrate practical usage of AVL Trees in version control systems, emphasizing their
efficiency and scalability.
4.​ To bridge theoretical learning with practical application by coding the project in C language,
reinforcing both algorithmic understanding and implementation skills.
5.​ To analyze effectiveness through sample inputs, outputs, and comparisons of performance
across different versions of documents.

Relevance:

The relevance of this project lies in its direct application to everyday computing environments where
document versioning is indispensable. In academic, professional, and collaborative contexts,
documents evolve constantly, requiring systems that can:

Preserve past versions for reference or rollback.

Allow seamless retrieval of older states without data corruption.

Handle frequent insertions and updates efficiently.

Real-world examples include Google Docs, GitHub repositories, content management systems, and
archival libraries, all of which depend on robust versioning mechanisms. While many of these
systems use advanced databases or custom storage solutions, this project demonstrates how a
8 | Page

fundamental data structure like an AVL Tree can be extended to achieve similar persistence and
reliability.

Furthermore, AVL Trees provide an excellent balance between academic depth and practical
significance. They are a classic topic in computer science education, and applying them in this
project proves their utility beyond classroom theory. By incorporating persistence, the project
introduces a higher-level concept that extends the boundaries of standard AVL Tree usage,
showcasing innovation in data structure applications.

In conclusion, the objective and relevance of this project reinforce the idea that efficient data
management is the foundation of modern software systems. The implementation of persistent
document versioning through AVL Trees not only deepens the understanding of tree-based structures
but also demonstrates their adaptability to solve complex, real-world problems.

CHAPTER – 5

CONCEPT APPLICATION

Concept Applied: Tree Type and Justification

In this project, the chosen tree structure is the AVL Tree, a self-balancing binary search tree.
The selection of this tree type is based on its efficiency, balance property, and suitability for
applications requiring frequent updates while maintaining fast access to stored information.

Why AVL Trees?:

An AVL Tree ensures that the height difference (balance factor) between the left and right
subtrees of any node is at most one. This property guarantees that the tree remains balanced at all
times, preventing skewed structures that can lead to poor performance. As a result, AVL Trees
provide:

1.​ O(log n) time complexity for insertion, deletion, and searching.


2.​ Guaranteed balance after every update, unlike standard Binary Search Trees (BSTs), which
can degrade to O(n) in the worst case.
9 | Page

In the context of document versioning, where frequent insertions occur (each representing a
new edit or document entry), AVL Trees ensure that retrieval and version comparison remain
efficient regardless of the number of versions stored.

Persistence in AVL Trees:

The key extension applied in this project is persistence, which allows older versions of the
tree to remain accessible even after updates. This is achieved through the path-copying technique:

1.​ When a new insertion occurs, only the nodes along the modified path are copied.
2.​ The rest of the tree is shared between versions, minimizing memory usage.
3.​ Each root corresponds to a distinct version, which can be stored in an array of version roots.
4.​ This ensures that users can navigate back to any previous version of the document, achieving
functionality similar to version control systems.

Comparison with Other Trees:

1.​ Binary Search Trees (BSTs): While simpler, BSTs can become unbalanced, leading to
inefficient operations. Hence, they are unsuitable for persistent systems that require
consistent performance.

2.​ Red-Black Trees: They also maintain balance but are less rigidly balanced compared to AVL
Trees. AVL Trees are preferable here because they provide faster lookups, which are crucial
in version retrieval.

3.​ B-Trees: Widely used in databases, B-Trees excel at handling disk-based storage. However,
for in-memory document versioning systems, AVL Trees are more lightweight and efficient.
10 | Page

CHAPTER – 6

ALGORITHM / PSEUDOCODE

Algorithm / Pseudocode of the Implementation:

The implementation of Persistent Document Versioning using AVL Trees is built upon the standard
AVL Tree operations—insertion, rotations, and traversal—combined with the concept of
path-copying to ensure persistence. Below is the structured explanation of the algorithm, followed by
pseudocode.

Stepwise Algorithm:

1.​ Initialize Structure:

o​ Each node contains key, height, left, and right pointers.

o​ An array versions[] is maintained to store the root of each version of the AVL Tree.

2.​ Insert Operation with Persistence:

o​ To insert a new document entry, copy the root node of the previous version.

o​ Recursively copy only the nodes along the path of insertion.

o​ Perform the standard AVL insertion logic on the copied path.

o​ Apply rotations (Left, Right, Left-Right, Right-Left) to maintain balance.

o​ Store the new root in versions[version_id].

3.​ Balance Factor Calculation:

o​ For every node, calculate balance = height(left) – height(right).

o​ If balance > 1 or balance < -1, apply appropriate rotations.

4.​ Traversal for Document Retrieval:


11 | Page

o​ To view a document version, perform an in-order traversal of the corresponding root


in versions[].

o​ This ensures entries are displayed in sorted order.

5.​ Persistence Guarantee:

o​ Since old nodes not in the insertion path are shared across versions, memory usage is
optimized.

o​ All previous versions remain intact and retrievable.

PseudoCode:

structure Node:

key, height

left, right

function newNode(key):

node = allocate memory

node.key = key

node.height = 1

node.left = NULL

node.right = NULL

return node

function insertPersistent(prevRoot, key):

if prevRoot == NULL:

return newNode(key)
12 | Page

# Copy the current node

root = copy(prevRoot)

if key < root.key:

root.left = insertPersistent(prevRoot.left, key)

else if key > root.key:

root.right = insertPersistent(prevRoot.right, key)

# Update height

root.height = 1 + max(height(root.left), height(root.right))

# Balance factor

balance = height(root.left) - height(root.right)

# Apply rotations

if balance > 1 and key < root.left.key:

return rotateRight(root)

if balance < -1 and key > root.right.key:

return rotateLeft(root)

if balance > 1 and key > root.left.key:

root.left = rotateLeft(root.left)

return rotateRight(root)

if balance < -1 and key < root.right.key:

root.right = rotateRight(root.right)
13 | Page

return rotateLeft(root)

return root

function inorder(root):

if root != NULL:

inorder(root.left)

print root.key

inorder(root.right)

main:

versions[0] = NULL

version_id = 0

# Insert new versions

version_id++

versions[version_id] = insertPersistent(versions[version_id-1], 10)

version_id++

versions[version_id] = insertPersistent(versions[version_id-1], 20)

version_id++

versions[version_id] = insertPersistent(versions[version_id-1], 15)


14 | Page

# Retrieve version

inorder(versions[2]) # Displays second version

CHAPTER – 7

PROGRAM CODE

#include <stdio.h>

#include <stdlib.h>

// Structure for AVL Tree Node

typedef struct Node {

int key;

int height;

struct Node* left;

struct Node* right;

} Node;

// Array of roots for different versions (persistence)

#define MAX_VERSIONS 50

Node* versions[MAX_VERSIONS];

int version_count = 0;

// Utility function to get height of a node


15 | Page

int height(Node* n) {

if (n == NULL) return 0;

return n->height;

// Maximum of two integers

int max(int a, int b) {

return (a > b) ? a : b;

// Allocate new node

Node* newNode(int key) {

Node* node = (Node*)malloc(sizeof(Node));

node->key = key;

node->height = 1;

node->left = NULL;

node->right = NULL;

return node;

// Copy node (for persistence)

Node* copyNode(Node* node) {

if (node == NULL) return NULL;

Node* newN = (Node*)malloc(sizeof(Node));


16 | Page

*newN = *node; // Shallow copy (safe since children re-linked)

return newN;

// Right rotate subtree

Node* rotateRight(Node* y) {

Node* x = y->left;

Node* T2 = x->right;

// Rotation

x->right = y;

y->left = T2;

// Update heights

y->height = max(height(y->left), height(y->right)) + 1;

x->height = max(height(x->left), height(x->right)) + 1;

return x;

// Left rotate subtree

Node* rotateLeft(Node* x) {

Node* y = x->right;

Node* T2 = y->left;
17 | Page

// Rotation

y->left = x;

x->right = T2;

// Update heights

x->height = max(height(x->left), height(x->right)) + 1;

y->height = max(height(y->left), height(y->right)) + 1;

return y;

// Get balance factor

int getBalance(Node* n) {

if (n == NULL) return 0;

return height(n->left) - height(n->right);

// Persistent Insertion

Node* insertPersistent(Node* prevRoot, int key) {

if (prevRoot == NULL) return newNode(key);

// Copy current node for persistence

Node* root = copyNode(prevRoot);


18 | Page

if (key < root->key) {

root->left = insertPersistent(prevRoot->left, key);

else if (key > root->key) {

root->right = insertPersistent(prevRoot->right, key);

else {

return root; // Duplicate keys not allowed

// Update height

root->height = 1 + max(height(root->left), height(root->right));

// Get balance

int balance = getBalance(root);

// Left Heavy

if (balance > 1 && key < root->left->key)

return rotateRight(root);

// Right Heavy

if (balance < -1 && key > root->right->key)

return rotateLeft(root);
19 | Page

// Left Right Case

if (balance > 1 && key > root->left->key) {

root->left = rotateLeft(root->left);

return rotateRight(root);

// Right Left Case

if (balance < -1 && key < root->right->key) {

root->right = rotateRight(root->right);

return rotateLeft(root);

return root;

// Inorder Traversal

void inorder(Node* root) {

if (root != NULL) {

inorder(root->left);

printf("%d ", root->key);

inorder(root->right);

}
20 | Page

// Main function

int main() {

int choice, key, version;

versions[0] = NULL; // First version empty

while (1) {

printf("\n--- Persistent Document Versioning ---\n");

printf("1. Insert new entry (create new version)\n");

printf("2. View version\n");

printf("3. Exit\n");

printf("Enter choice: ");

scanf("%d", &choice);

switch (choice) {

case 1:

printf("Enter key (document entry as number): ");

scanf("%d", &key);

version_count++;

versions[version_count] = insertPersistent(versions[version_count - 1], key);

printf("Inserted %d in version %d\n", key, version_count);

break;

case 2:
21 | Page

printf("Enter version number (0 - %d): ", version_count);

scanf("%d", &version);

if (version >= 0 && version <= version_count) {

printf("Inorder traversal of version %d: ", version);

inorder(versions[version]);

printf("\n");

else {

printf("Invalid version!\n");

break;

case 3:

printf("Exiting...\n");

return 0;

default:

printf("Invalid choice!\n");

return 0;

}
22 | Page

CHAPTER – 8

SAMPLE INPUT AND OUTPUT SCREENSHOTS


23 | Page

CHAPTER – 9

RESULTS AND CONCLUSIONS

The implemented project successfully demonstrates the use of AVL Trees with persistence to
manage document versions effectively. Each new insertion creates a new version of the document
tree while preserving the old versions. This ensures that no data is overwritten or lost during updates.

The program was tested with multiple inputs to verify correctness and efficiency:

1.​ Insertion and Versioning:

o​ Each inserted entry generated a new version of the AVL Tree.

o​ For example, inserting 10, 20, and 15 across three stages resulted in three different
versions.

o​ Each version could be retrieved independently, confirming that persistence was


functioning correctly.
24 | Page

2.​ Traversal Results:

o​ In-order traversal was used to retrieve document contents in sorted order.

o​ For Version 1 (after inserting 10): Output → 10

o​ For Version 2 (after inserting 20): Output → 10 20

o​ For Version 3 (after inserting 15): Output → 10 15 20

3.​ Efficiency:

o​ Due to the self-balancing property of AVL Trees, all insertions and retrievals executed
in O(log n) time.

o​ Even after multiple insertions, no significant performance drop was observed,


validating the efficiency of the chosen tree structure.

4.​ Persistence Validation:

o​ Accessing previous versions always displayed the correct state of the document at that
point in time.

o​ The results confirmed that old versions were preserved without duplication of the
entire tree, thereby optimizing memory usage.

CHAPTER – 10

REFERENCES

1.​ Adelson-Velskii, G. M., & Landis, E. M. (1962). An algorithm for the organization of
information. Proceedings of the USSR Academy of Sciences, 146, 263–266. (Original
paper on AVL Trees).

2.​ Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2022). Introduction to
Algorithms (4th ed.). MIT Press.
25 | Page

3.​ Sahni, S. (2004). Data Structures, Algorithms, and Applications in C++. Universities
Press.

4.​ Goodrich, M. T., Tamassia, R., & Goldwasser, M. H. (2014). Data Structures and
Algorithms in C++ (2nd ed.). Wiley.

5.​ Weiss, M. A. (2019). Data Structures and Algorithm Analysis in C (2nd ed.). Pearson.

6.​ Driscoll, J. R., Sarnak, N., Sleator, D. D., & Tarjan, R. E. (1989). Making data structures
persistent. Journal of Computer and System Sciences, 38(1), 86–124.

7.​ Knuth, D. E. (1998). The Art of Computer Programming, Volume 3: Sorting and
Searching. Addison-Wesley.

8.​ Mehlhorn, K. (1984). Data Structures and Algorithms 1: Sorting and Searching.
Springer-Verlag.

9.​ Tarjan, R. E. (1983). Data Structures and Network Algorithms. Society for Industrial and
Applied Mathematics.

10.​Sedgewick, R., & Wayne, K. (2011). Algorithms (4th ed.). Addison-Wesley.

11.​GeeksforGeeks. (2025). AVL Tree Data Structure. Retrieved from


https://www.geeksforgeeks.org/avl-tree-set-1-insertion/
26 | Page

12.​TutorialsPoint. (2025). Persistent Data Structures. Retrieved from


https://www.tutorialspoint.com/persistent-data-structures

13.​Programiz. (2025). AVL Tree Insertion and Rotations. Retrieved from


https://www.programiz.com/dsa/avl-tree

14.​GitHub. (2025). Open-source AVL Tree Implementations in C. Retrieved from


https://github.com/topics/avl-tree

15.​Stack Overflow. (2025). Persistent Data Structures in C. Retrieved from


https://stackoverflow.com/

You might also like