Data Structure
3.0 Understanding Set Relation and String Structure
15th August 2024
Table of Contents
• What is Set Data Structure?
• Need for Set Data Structure
• Types of Set Data Structure
• Set Data Structure in Different Languages
• Set in C++
• Set in Java
• Set in Python
• Set in C#
• Set in JavaScript
• Difference between Array, Set, and Map Data Structure
• Internal Implementation of Set Data Structure
• Operations on Set Data Structure
• Implementation of Set Data Structure
• Complexity Analysis of Operations on Set Data Structure:
• Properties of Set Data Structure
• Applications of Set Data Structure
• Advantages of Set Data Structure
• Disadvantages of Set Data Structure
• Some Standard Problems Associated with Set Data Structure
What is Set Data Structure?
In computer science, a set data structure is defined as a data structure that stores a
collection of distinct elements.
It is a fundamental Data Structure that is used to store and manipulate a group of objects,
where each object is unique . The Signature property of the set is that it doesn’t allow
duplicate elements.
A set is a mathematical model for a collection of different things, a set contains elements
or members, which can be mathematical objects of any kind numbers, symbols, points in
space, lines, other geometrical shapes, variables, or even other sets.
A set can be implemented in various ways but the most common ways are:
1. Hash-Based Set: the set is represented as a hash table where each element in the set is
stored in a bucket based on its hash code.
2. Tree-based set: In this implementation, the set is represented as a binary search tree
where each node in the tree represents an element in the set.
Need for Set Data Structure:
Set data structures are commonly used in a variety of computer
science applications, including algorithms, data analysis, and
databases. The main advantage of using a set data structure is that it
allows you to perform operations on a collection of elements in an
efficient and organized way.
Types of Set Data Structure
The set data structure can be classified into the following two categories:
1. Unordered Set
An unordered set is an unordered associative container implemented using a hash
table where keys are hashed into indices of a hash table so that the insertion is
always randomized. All operations on the unordered set take constant time O(1)
on an average which can go up to linear time O(n) in the worst case which
depends on the internally used hash function, but practically they perform very
well and generally provide a constant time lookup operation.
2. Ordered Set
An Ordered set is the common set data structure we are familiar with. It is
generally implemented using balanced BSTs and it supports O(log n) lookups,
insertions and deletion operations.
Set Data Structure in Different Languages
Set in C++
Set in C++ internally implemented as (Self-Balancing Binary Search Tree)
Set in C++ STL are a type of associative container in which each element has to
be unique because the value of the element identifies it. The values are stored in a
specific sorted order, i.e., ascending or descending.
The std::set class is the part of C++ Standard Template Library (STL) and it is
defined inside the <set> header file.
Types of set in C++ STL
1. Set
2. Unordered Set
3. Multiset
Syntax:
std::set <data_type> set_name;
Note: The set can take any data type depending on the values, e.g. int,
char, float, etc.
Set in Java
Set in Java internally implemented as (Hash-Table)
Set is an interface , objects cannot be created of the typeset. We always need a class that extends this
list in order to create an object. And also, after the introduction of Generics in Java 1.5, it is possible to
restrict the type of object that can be stored in the Set. This type-safe set can be defined as:
Types of set in Java
1. HashSet
2. TreeSet
3. LinkedHashSet
Syntax:
// Obj is the type of object to be stored in Set
Set<Obj> set = new HashSet<Obj> ();
Set in Python
Set in Python are internally implemented as (Hash-Table)
A Set in Python is an unordered collection data type that is iterable , mutable and has no
duplicate elements.
Note: Iterable is an object which can be looped over or iterated over with the help of a for
loop. Objects like lists, tuples, sets, dictionaries, strings, etc. are called iterables. In short
and simpler terms, iterable is anything that you can loop over.
Syntax:
Set are represented by { } (values enclosed in curly braces)
Set in C#
Set in C# internally implemented as (Hash-Table)
Set in C# is an unordered collection of unique elements. It comes under
System.Collections.Generic namespace. It is used in a situation where we want to prevent
duplicates from being inserted in the collection. As far as performance is concerned, it is
better in comparison to the list.
Syntax:
HashSet<int> set = new HashSet<int>();
Set in JavaScript
Set in JavaScript internally implemented as (Hash-Table)
Set in JavaScript is a collection of items that are unique i.e. no element can be
repeated. Set in ES6 are ordered: elements of the set can be iterated in the
insertion order. A set can store any type of value whether primitive or objects.
Note: ES6 is a programming language in JavaScript
Syntax:
new Set([it]);
Example:
array = [1,2,2,3,3,4,4,5] // Repeated values
Set = set(array)
SET(1,2,3,4,5) // only unique values
Difference between Array, Set, and Map
Data Structure:
Features : Array Set Map
Duplicate Duplicate Values Unique Values
keys are unique, but the
values values can be duplicated
Order Ordered Collection Unordered Collection Unordered Collection
Size Static Dynamic Dynamic
Elements in an array can
Iterate over the set to Elements can be retrieved
Retrieval be accessed using their
retrieve the value. using their key
index
Maps are used for
Operatio Adding, removing, and Set operations like union, operations like adding,
ns accessing elements intersection, and difference. removing, and accessing
key-value pairs.
Stored as contiguous Implemented using linked Implemented using linked
Memory blocks of memory lists or trees lists or trees
Internal Implementation of Set Data Structure
A set is a data structure that stores a collection of unique elements , with no duplicates allowed. Sets can be
implemented using a variety of data structures, including arrays, linked lists, binary search trees, and hash tables.
Basically, a Set is language dependent Data Structure. Every language uses a different data structure to implement a set
data structure internally like C++ uses Self-Balancing BST. Java, Python, C#, and JavaScript use Hash tables.
Sets in C++ use Self-Balancing Binary Tree(BST) . In this approach, the elements are stored in nodes of a binary tree,
with the property that the left subtree of any node contains only elements smaller than the node’s value, and the right
subtree contains only elements larger than the node’s value. This property ensures that the elements in the tree are
always sorted in ascending order.
To add an element to the set implemented with a BST, you start at the root of the tree and compare the element to the
node’s value. If the element is less than the node’s value, you traverse to the left subtree and repeat the process.
If the element is greater than the node’s value, you traverse to the right subtree and repeat the process.
If you reach a node with the same value as the element, you do not add the element to the tree since duplicates are not
allowed.
To remove an element from the set implemented with a BST, you first search for the node containing the element,
following the same process as adding an element.
If the element is not found in the tree, there is nothing to remove.
If the element is found, there are three cases to consider:
The node has no children: In this case, you simply remove the node from the tree.
The node has one child: In this case, you replace the node with its child.
The node has two children: In this case, you find the minimum element in the right subtree of the node (i.e., the
leftmost node in the right subtree) and replace the node’s value with that element. Then, you remove the duplicate
element from the right subtree
In the case of implementation of Set using Hash table (as happens in Python) the
implementation happens in the following way:
• To add an element to the set:
• Generate a hash value (say entry ) for the key to be inserted.
• Go to the entry in the hash table and check if there are already other entries or not.
• If the slot is empty add the new element there.
• Otherwise, if there are other entries, traverse to the end of that entry list and add
the new element there.
• To remove an element from the set:
• First, generate the hash value for the element.
• If the hash value is invalid or there are no entries then return an error message.
• Otherwise, go to that slot and find the element from the list present in that slot.
• Then remove that element from the hash table.
Operations on Set Data Structure
Here are some common operations that can be performed on a set data
structure in C++ using the set container.
1. Insert an element
You can insert an element into a set using the insert function. For
example:
For hash table implementations it will be like the following:
2. Check if an element is present
You can check if an element is present in a set using the count function.
The function returns 1 if the element is present, and 0 otherwise.
3. Remove an element
You can remove an element from a set using the erase function. For
example:
In the case of Hash table implementation it will be like the following
4. Find the minimum/maximum element
You can find the minimum and maximum elements in a set using
the begin and end iterators. The begin iterator points to the first element
in the set, and the end iterator points to one past the last element.
In the case of hash table implementation in Python, the max() and min() functions
return the maximum and the minimum respectively.
5. Get the size of the set
You can get the size of a set using the size function.
C++
#include <iostream>
#include <set>
using namespace std;
int main()
{
set<int> s1; // Declaring set
// inserting elements in set
s1.insert(10);
s1.insert(5);
s1.insert(12);
s1.insert(4);
// printing elements of set
for (auto i : s1) {
cout << i << ' ';
cout << endl;
// check if 10 present inside the set
if (s1.count(10) == 1) {
cout << "Element is present in the set:" << endl;
// erasing 10 from the set
s1.erase(5);
// printing element of set
for (auto it : s1) {
cout << it << " ";
cout << endl;
cout << "Minimum element: " << *s1.begin()
<< endl; // Printing maximum element
cout << "Maximum element: " << *(--s1.end())
<< endl; // Printing minimum element
cout << "Size of the set is: " << s1.size()
<< endl; // Printing the size of the set
return 0;
// Java program Illustrating Set Interface
Complexity Analysis of Operations on Set Data
Structure:
Operation Time Complexity Explanation
Inserting an element into a balanced
Insertion O(log n) binary search tree takes O(log n) time due
to the tree’s height balancing.
Searching for an element in a balanced
Searching O(log n) binary search tree takes O(log n) time due
to the tree’s height balancing.
Deleting an element from a balanced
Deletion O(log n) binary search tree takes O(log n) time due
to the tree’s height balancing.
Accessing Accessing the minimum/maximum
element in a set implemented as a
Minimum/M O(1)
balanced binary search tree can be done
aximum in O(1) time.
Size of the Accessing the number of elements in the
O(1) set takes constant time as it is stored
Set separately.
Properties of Set Data Structure
1. Storing order – The set stores the elements in sorted order.
2. Values Characteristics – All the elements in a set have unique values .
3. Values Nature – The value of the element cannot be modified once it is added
to the set, though it is possible to remove and then add the modified value of
that element. Thus, the values are immutable.
4. Search Technique – Sets follow the Binary search tree implementation.
5. Arranging order – The values in a set are unindexed .
Applications of Set Data Structure
Sets are abstract data types that can be used to store unique elements in a collection. Here
are some common applications of sets:
Removing duplicates: If you have a large collection of data, you can use a set to easily
remove duplicates and store only unique elements.
Membership testing: Sets provide efficient operations for checking if an element is a
member of the set, which is useful in various algorithms and data structures.
Set operations: Sets can be used to perform operations such as union, intersection, and
difference, which are useful in mathematical set theory and computer science.
Graph algorithms: Sets can be used to represent vertices and edges in graph algorithms,
allowing for efficient operations such as finding connected components, detecting cycles,
and computing minimum spanning trees.
Cache replacement: Sets can be used to implement cache replacement policies, where the
oldest or least recently used elements are removed when the cache is full.
Database indexing : Sets can be used to implement indexes in databases, allowing for fast
searching and retrieval of data based on specific keys or attributes.
Advantages of Set Data Structure
• Set can be used to store unique values in order to avoid duplications of
elements present in the set.
• Elements in a set are stored in a sorted fashion which makes it efficient to
check if an element is present in the set or not.
• Set is dynamic, so there is no error of overflowing of the set.
• Searching operation takes O(logN) time complexity.
• Sets can be implemented using different data structures, such
as HashSets and TreeSets , each with its own advantages and use cases.
• Sets can be used in a variety of applications, including algorithms, data
analysis, and databases.
Disadvantages of Set Data
Structure
• Elements in a set can only be accessed with pointers, there is no indexing in a
set like arrays.
• Set is very complex to implement because of its structure and properties.
• A set takes O(logN) time complexity for basic operations like insertion and
deletion.
• Sets can only store elements of a specific data type.
• Sets can use more memory than other data structures, such as arrays or lists
because they store each element in a separate location.
THE END
THANK YOU