Basics of Disjoint Data Structures
The efficiency of an algorithm sometimes depends on the data structure that is used. An efficient
data structure, like the disjoint-set-union, can reduce the execution time of an algorithm.
Assume that you have a set of N elements that are into further subsets and you have to track the
connectivity of each element in a specific subset or connectivity of subsets with each other. You
can use the union-find algorithm (disjoint set union) to achieve this.
Let’s say there are 5 people A, B, C, D, and E.
A is B's friend, B is C's friend, and D is E's friend, therefore, the following is true:
1. A, B, and C are connected to each other
2. D and E are connected to each other
You can use the union-find data structure to check each friend is connected to the other directly
or indirectly. You can also determine the two different disconnected subsets. Here, 2 different
subsets are {A, B, C} and {D, E}.
You have to perform two operations:
1. Union(A, B): Connect two elements A and B
2. Find(A, B): Find whether the two elements A and B are connected
Example You have a set of elements S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Here there are 10 elements
(N = 10). Use an array arr to manage the connectivity of the elements. Arr[ ] that is indexed by
elements of sets, which are of size N (as N elements in set), can be used to manage the
operations of union and find.
Assumption
A and B objects are connected only if arr[ A ] = arr[ B ].
Implementation
To implement the operations of union and find, do the following:
1. Find(A, B): Check whether arr[ A ] = arr[ B ]
2. Union(A, B): Connect A to B and merge the components that comprise A and B by
replacing elements that have a value of arr[ A ] with the value of arr[ B ].
Initially there are 10 subsets and each subset has one element in it.
When each subset contains only one element, the array arr is as follows:
Let’s perform some operations.
1. Union(2, 1)
The arr is as follows:
1. Union(4, 3)
2. Union(8, 4)
3. Union(9, 3)
The arr is as follows:
1. Union(6, 5)
The arr is as follows:
After performing some operations of Union (A ,B), there are now 5 subsets as follows:
1. First subset comprises the elements {3, 4, 8, 9}
2. Second subset comprises the elements {1, 2}
3. Third subset comprises the elements {5, 6}
4. Fourth subset comprises the elements {0}
5. Fifth subset comprises the elements {7}
The elements of a subset, which are connected to each other directly or indirectly, can be
considered as the nodes of a graph. Therefore, all these subsets are called connected
components.
The union-find data structure is useful in graphs for performing various operations like
connecting nodes, finding connected components etc.
Let’s perform some find(A, B) operations.
1. Find(0, 7): 0 and 7 are disconnected, and therefore, you will get a false result
2. Find(8, 9): Although 8 and 9 are not connected directly, there is a path that connects both
the elements, and therefore, you will get a true result
When these operations are viewed in terms of components, then:
1. Union(A, B): Replace components that comprise the two objects A and B with their union
2. Find(A, B): Check whether the two objects A and B are in the same component
If you perform the operation union(5, 2) on the components mentioned above, then it will be as
follows:
The arr is as follows: