Module III
GREEDY METHOD
Contents
1. Introduction to the Greedy method
General method
Coin Change Problem
Knapsack Problem
Job sequencing with deadlines
2. Minimum cost spanning trees
Prim's Algorithm
Kruskal's Algorithm
3. Single-source shortest paths
Dijkstra's Algorithm
4. Huffman Trees and Codes
5. Heaps and Heap Sort
General Method
The greedy method is one of the most straightforward design techniques and is applicable to a wide variety of problems.
Most of these problems have n inputs and require us to obtain a solution that is a subset of the inputs satisfying certain constraints.
A solution that contains a subset of the inputs and satisfies the given constraints is known as a feasible solution.
A feasible solution that either maximizes or minimizes a given objective function is known as an optimal solution.
The greedy method suggests devising an algorithm that works in stages (the subset paradigm), considering the inputs in an order determined by some selection procedure.
The selection procedure is based on some optimization measure: at every stage, examine an input to see whether it leads to an optimal solution.
If the inclusion of the input into the partial solution would yield an infeasible solution, discard the input; otherwise, add it to the partial solution.
The control abstraction for the greedy method is given below:
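Since the pseudocode figure is not reproduced here, the following is a minimal Python sketch of the subset-paradigm control abstraction. The routine names select, feasible, and union follow the classic formulation and are passed in as problem-specific parameters; the demo values are illustrative assumptions.

def greedy(inputs, select, feasible, union):
    """Subset-paradigm greedy control abstraction."""
    inputs = list(inputs)
    solution = []                      # start with an empty partial solution
    while inputs:
        x = select(inputs)             # pick the next input by the greedy criterion
        inputs.remove(x)
        if feasible(solution, x):      # does adding x keep the solution feasible?
            solution = union(solution, x)
    return solution

# Demo: pick numbers that keep the running sum <= 20, largest first
print(greedy([8, 15, 3, 9, 2],
             select=max,
             feasible=lambda s, x: sum(s) + x <= 20,
             union=lambda s, x: s + [x]))   # -> [15, 3, 2]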
Optimal Solution: The feasible solution that maximizes or minimizes the given objective function is an optimal solution. An optimal solution need not be unique, but every optimal solution attains the same best value of the objective function.
Coin Change Problem
Example: First, consider finding the number of ways of making change for a particular amount of cents, n, using a given set of denominations C = {c1, …, cd} (e.g., the US coin system: {1, 5, 10, 25, 50, 100}).
If n = 4 and C = {1, 2, 3}, the feasible solutions are {1, 1, 1, 1}, {1, 1, 2}, {2, 2}, and {1, 3}.
Second, consider minimizing the number of coins returned for a particular quantity of change.
If the available coins are {1, 5, 10, 25} and n = 30 cents, then, for example:
Solution 1: {10, 10, 10} (3 coins)
Solution 2: {5, 5, 10, 10} (4 coins)
Solution 3: {5, 5, 5, 5, 10} (5 coins)
Solution 4: {25, 5} (2 coins)
Solutions 1, 2, and 3 are feasible solutions to the problem, whereas Solution 4 is the optimal solution, as it returns the fewest coins.
The greedy solution to the coin change problem is very intuitive and is called the cashier's algorithm.
The basic principle is: at every iteration, take the largest coin that fits into the amount remaining to be changed at that point. For coin systems such as the US one, this procedure ends with an optimal solution.
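A minimal Python sketch of the cashier's algorithm (the function name is ours; the coin set and amount are the ones from the text):

def cashiers_change(amount, denominations):
    """Greedy coin changing: repeatedly take the largest coin that fits."""
    coins = []
    for coin in sorted(denominations, reverse=True):  # largest coin first
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    return coins

# Example from the text: n = 30 with coins {1, 5, 10, 25}
print(cashiers_change(30, [1, 5, 10, 25]))  # -> [25, 5]

Note that this greedy rule is optimal for canonical coin systems such as the US one, but not for arbitrary denominations: with C = {1, 3, 4} and n = 6 it returns {4, 1, 1} instead of the optimal {3, 3}.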
Knapsack Problem
Given n objects, each having a weight wi and a profit pi, and a knapsack of total capacity m.
More formally, let xi denote the fraction of object i to be included in the knapsack, 0 ≤ xi ≤ 1, for 1 ≤ i ≤ n. The problem is to find values for the xi.
If a fraction xi (0 ≤ xi ≤ 1) of object i is placed in the bag, a profit pi · xi is earned.
The objective of the problem is to obtain the maximum total profit by filling the bag with the given objects.
If it is not possible to include an object entirely, a fraction of the object can be included, and accordingly a fraction of the profit is earned.
The knapsack problem can be stated mathematically as follows:
maximize Σ pi xi, 1 ≤ i ≤ n,
subject to Σ wi xi ≤ m and 0 ≤ xi ≤ 1, 1 ≤ i ≤ n,
where n is the number of objects.
Example: Consider 3 objects whose profits and weights are defined as follows: number of objects n = 3, knapsack capacity m = 20, (P1, P2, P3) = (25, 24, 15) and (W1, W2, W3) = (18, 15, 10). The problem is to determine the optimal strategy for placing the objects into the knapsack.
The problem can be solved by the greedy approach, wherein the inputs are arranged according to a selection process (greedy strategy) and the problem is solved in stages. The various greedy strategies for the problem could be as follows:
(1) Greedy about profit: select the object with the highest profit first, i.e., arrange the objects in descending order of profits.
(2) Greedy about weight: select the object with the lowest weight first, i.e., arrange the objects in ascending order of weights.
(3) Greedy about the profit/weight ratio: select the object with the highest profit per unit of weight first, i.e., arrange the objects in descending order of pi/wi. For the fractional knapsack problem this strategy yields the optimal solution (a Python sketch of it follows this list).
(4) If an additional constraint of including each and every object is placed, then the greedy strategy has to be modified accordingly.
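As a sketch of strategy (3), here is a Python implementation of the greedy fractional-knapsack rule, applied to the example data above (function and variable names are ours):

def fractional_knapsack(profits, weights, capacity):
    """Greedy by profit/weight ratio; returns (total profit, fractions x_i)."""
    n = len(profits)
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    x = [0.0] * n
    total = 0.0
    for i in order:
        if weights[i] <= capacity:          # take the whole object
            x[i] = 1.0
            capacity -= weights[i]
            total += profits[i]
        else:                               # take the fitting fraction and stop
            x[i] = capacity / weights[i]
            total += profits[i] * x[i]
            break
    return total, x

# Example from the text: n = 3, m = 20, P = (25, 24, 15), W = (18, 15, 10)
print(fractional_knapsack([25, 24, 15], [18, 15, 10], 20))
# -> (31.5, [0.0, 1.0, 0.5]): all of object 2, half of object 3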
Job Sequencing with Deadlines
We are given a set of n jobs. Associated with job i are an integer deadline di ≥ 0 and a profit pi > 0; the profit pi is earned if and only if the job is completed by its deadline. Each job takes one unit of time on a single machine, and only one job can be processed at a time.
A feasible solution for the problem is a subset J of jobs such that each job in this subset can be completed by its deadline. The value of a feasible solution J is the sum of the profits of the jobs in J.
An optimal solution is a feasible solution with the maximum value of profit.
The problem involves identifying a subset of jobs that can be completed by their deadlines. Therefore the problem suits the subset paradigm and can be solved by the greedy method.
Example: Obtain the optimal sequence for the following jobs, where n = 4, profits (P1, P2, P3, P4) = (100, 10, 15, 27), and deadlines (d1, d2, d3, d4) = (2, 1, 2, 1).
The following table provides the solution for the above problem:
Feasible solution    Processing sequence    Value
{1, 2}               2, 1                   110
{1, 3}               1, 3 or 3, 1           115
{1, 4}               4, 1                   127
{2, 3}               2, 3                   25
{3, 4}               4, 3                   42
{1}                  1                      100
{2}                  2                      10
{3}                  3                      15
{4}                  4                      27
In this solution only jobs 1 and 4 are processed, and the value is 127. These jobs must be processed in the order j4 followed by j1.
The processing of job 4 begins at time 0 and ends at time 1, and the processing of job 1 begins at time 1 and ends at time 2. Therefore both jobs are completed within their deadlines.
The optimization measure for determining the next job to be selected into the solution is the profit: the next job to include is the one that increases Σ pi the most, subject to the constraint that the resulting J is a feasible solution.
Therefore the greedy strategy for obtaining an optimal solution is to consider the jobs in decreasing order of profits.
If job i has not yet been assigned a processing time, then assign it to the slot [α − 1, α], where α is the largest integer r such that 1 ≤ r ≤ di and the slot [r − 1, r] is free; if no such slot exists, job i is rejected.
For another instance of the problem, this strategy yields: jobs selected = {J1, J2, J4}, processing sequence of jobs = (J2, J1, J4), profit obtained = 40.
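A Python sketch of the greedy job-sequencing procedure, applied to the n = 4 example above (a simple array of slots stands in for the more efficient set-union structure of the classic treatment):

def job_sequencing(profits, deadlines):
    """Greedy job sequencing: consider jobs in decreasing profit order and
    place each one in the latest free slot at or before its deadline."""
    n = len(profits)
    max_d = max(deadlines)
    slot = [None] * max_d                      # slot[t] holds the job run in [t, t+1]
    order = sorted(range(n), key=lambda i: profits[i], reverse=True)
    total = 0
    for i in order:
        for t in range(min(deadlines[i], max_d) - 1, -1, -1):  # latest slot first
            if slot[t] is None:
                slot[t] = i
                total += profits[i]
                break                           # job placed; otherwise rejected
    return total, [j + 1 for j in slot if j is not None]       # 1-based job numbers

# Example from the text: P = (100, 10, 15, 27), d = (2, 1, 2, 1)
print(job_sequencing([100, 10, 15, 27], [2, 1, 2, 1]))  # -> (127, [4, 1])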
Prim's Algorithm
Prim's algorithm constructs a minimum spanning tree through a sequence of expanding subtrees.
The initial subtree in such a sequence consists of a single vertex selected arbitrarily from the set V of the graph's vertices.
On each iteration, the algorithm expands the current tree in the greedy manner by simply
attaching to it the nearest vertex not in that tree. (By the nearest vertex, we mean a vertex
not in the tree connected to a vertex in the tree by an edge of the smallest weight. Ties can
be broken arbitrarily.)
The algorithm stops after all the graph's vertices have been included in the tree being constructed.
Since the algorithm expands a tree by exactly one vertex on each of its iterations, the
total number of such iterations is n − 1, where n is the number of vertices in the graph.
The tree generated by the algorithm is obtained as the set of edges used for these expansions.
A sketch of this algorithm is given below.
Figure: trace of Prim's algorithm; the labels of the remaining vertices in the middle column indicate the nearest tree vertex and the connecting edge's weight; selected vertices and edges are shown in bold.
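In place of the pseudocode figure, here is a Python sketch of Prim's algorithm using adjacency lists and Python's heapq min-heap; the example graph is an assumption, not the graph from the source figure.

import heapq

def prim(graph, start):
    """Prim's MST. graph: dict vertex -> list of (weight, neighbor).
    Returns the list of MST edges (u, v, weight)."""
    in_tree = {start}
    mst_edges = []
    heap = [(w, start, v) for w, v in graph[start]]   # candidate edges
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)                 # nearest vertex not in tree
        if v in in_tree:
            continue                                   # stale entry; skip it
        in_tree.add(v)
        mst_edges.append((u, v, w))
        for w2, x in graph[v]:                         # new candidate edges
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return mst_edges

# Illustrative graph (assumed, not from the source figure)
g = {
    'a': [(3, 'b'), (5, 'c')],
    'b': [(3, 'a'), (1, 'c'), (4, 'd')],
    'c': [(5, 'a'), (1, 'b'), (6, 'd')],
    'd': [(4, 'b'), (6, 'c')],
}
print(prim(g, 'a'))  # -> [('a', 'b', 3), ('b', 'c', 1), ('b', 'd', 4)]

This lazy-deletion variant leaves stale entries in the heap instead of decreasing priorities, which keeps the running time in O(|E| log |V|).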
Analysis of Efficiency
The efficiency of Prim's algorithm depends on the data structures chosen for the graph itself and for the priority queue of the set V − VT, whose vertex priorities are the distances to the nearest tree vertices.
If a graph is represented by its weight matrix and the priority queue is implemented as an unordered array, the algorithm's running time will be in θ(|V|²). Indeed, on each of the |V| − 1 iterations, the array implementing the priority queue is traversed to find and delete the minimum and then to update, if necessary, the priorities of the remaining vertices.
We can also implement the priority queue as a min-heap (A min-heap is a complete binary
tree in which every element is less than or equal to its children). Deletion of the smallest
element from and insertion of a new element into a min-heap of size n are O(log n)
operations, and so is the operation of changing an element's priority.
If a graph is represented by its adjacency lists and the priority queue is implemented as
a min-heap, the running time of the algorithm is in O(|E| log |V |).
This is because the algorithm performs |V| − 1 deletions of the smallest element and makes |E| verifications and, possibly, changes of an element's priority in a min-heap of size not exceeding |V|. Each of these operations is an O(log |V|) operation.
Kruskal's Algorithm
Kruskal's algorithm looks at a minimum spanning tree of a weighted connected graph as an acyclic subgraph with |V| − 1 edges for which the sum of the edge weights is the smallest. The algorithm constructs a minimum spanning tree as an expanding sequence of subgraphs that are always acyclic but are not necessarily connected: it sorts the graph's edges in nondecreasing order of their weights and scans this sorted list, adding the next edge to the current subgraph if and only if its inclusion does not create a cycle.
It is not difficult to see that a new cycle is created if and only if the new edge connects two vertices already connected by a path, i.e., if and only if the two vertices belong to the same connected component (in the figure below, (a) leads to a cycle, whereas (b) does not).
Note also that each connected component of a subgraph generated by Kruskal's algorithm is a tree because it has no cycles.
In view of these observations, it is convenient to use a slightly different interpretation of Kruskal's algorithm.
We can consider the algorithm's operations as a progression through a series of forests containing all the vertices of a given graph and some of its edges.
The initial forest consists of |V | trivial trees, each comprising a single vertex of the graph.
The final forest consists of a single tree, which is a minimum spanning tree of the graph.
On each iteration, the algorithm takes the next edge (u, v) from the sorted list of the graph's edges, finds the trees containing the vertices u and v, and, if these trees are not the same, unites them in a larger tree by adding the edge (u, v).
Analysis of efficiency
The crucial check of whether two vertices belong to the same tree can be performed using union-find algorithms.
With an efficient union-find algorithm, the running time of Kruskal's algorithm will be dominated by the time needed for sorting the edge weights of a given graph.
Hence, with an efficient sorting algorithm, the time efficiency of Kruskal's algorithm will be in O(|E| log |E|).
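A Python sketch of Kruskal's algorithm with a union-find structure (path compression plus union by rank); the example graph is an assumption:

def kruskal(n, edges):
    """Kruskal's MST. n vertices labeled 0..n-1; edges is a list of
    (weight, u, v) triples."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):                      # root of x's tree, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):                  # merge two trees; False if already one tree
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1
        return True

    mst = []
    for w, u, v in sorted(edges):     # sorting dominates: O(|E| log |E|)
        if union(u, v):               # accept the edge only if it joins two trees
            mst.append((u, v, w))
            if len(mst) == n - 1:
                break
    return mst

# Illustrative graph (assumed): 4 vertices, weighted edges
print(kruskal(4, [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 1, 3), (5, 2, 3)]))
# -> [(0, 1, 1), (1, 2, 2), (1, 3, 4)]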
Example: An example of Kruskal's algorithm is shown below. Selected edges are shown in bold.
Figure: Graph
Dijkstra's Algorithm
Dijkstra's algorithm solves the single-source shortest-paths problem: for a given vertex, called the source, in a weighted connected graph with nonnegative weights, find the shortest paths to all its other vertices.
Working: Dijkstra's algorithm finds the shortest paths to a graph's vertices in order of their distance from the given source.
First, it finds the shortest path from the source to the vertex nearest to it, then to a second-nearest one, and so on.
In general, before its ith iteration commences, the algorithm has already identified the shortest paths to i − 1 other vertices nearest to the source. These vertices, the source, and the edges of the shortest paths leading to them from the source form a subtree of the given graph; the vertices adjacent to this subtree's vertices are called fringe vertices, and they are the candidates from which the algorithm selects the next vertex nearest to the source.
To identify the ith nearest vertex, the algorithm computes, for every fringe vertex u, the sum of the distance to the nearest tree vertex v (given by the weight of the edge (v, u)) and the length dv of the shortest path from the source to v (previously determined by the algorithm), and then selects the vertex with the smallest such sum. The fact that it suffices to compare the lengths of such special paths is the central insight of Dijkstra's algorithm.
To facilitate the algorithm's operations, we label each vertex with two labels.
The numeric label d indicates the length of the shortest path from the source to this
vertex found by the algorithm so far; when a vertex is added to the tree, d indicates the
length of the shortest path from the source to that vertex.
The other label indicates the name of the next-to-last vertex on such a path, i.e., the parent
of the vertex in the tree being constructed. (It can be left unspecified for the source s
and vertices that are adjacent to none of the current tree vertices.) With such labeling,
finding the next nearest vertex u* becomes a simple task of finding a fringe vertex with
the smallest d value. Ties can be broken arbitrarily.
After we have identified a vertex u* to be added to the tree, we need to perform
two operations:
o Move u* from the fringe to the set of tree vertices.
o For each remaining fringe vertex u that is connected to u* by an edge of weight w(u∗,
u) such that du∗ + w(u∗, u) < du, update the labels of u by u* and du∗ +
w(u∗,u), respectively.
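Here is a Python sketch of Dijkstra's algorithm with a min-heap; the example graph is an assumption whose edge weights were chosen to reproduce the shortest paths listed in the illustration below.

import heapq

def dijkstra(graph, source):
    """Dijkstra's single-source shortest paths.
    graph: dict vertex -> list of (weight, neighbor), nonnegative weights.
    Returns (d, parent): shortest distances and next-to-last vertices."""
    d = {v: float('inf') for v in graph}
    parent = {source: None}
    d[source] = 0
    heap = [(0, source)]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)            # fringe vertex with smallest d
        if u in done:
            continue                            # stale entry; skip it
        done.add(u)
        for w, v in graph[u]:
            if du + w < d[v]:                  # relax: shorter path to v via u
                d[v] = du + w
                parent[v] = u
                heapq.heappush(heap, (d[v], v))
    return d, parent

# Assumed graph consistent with the example paths from a below
g = {
    'a': [(3, 'b'), (7, 'd')],
    'b': [(3, 'a'), (4, 'c'), (2, 'd')],
    'c': [(4, 'b'), (6, 'e')],
    'd': [(7, 'a'), (2, 'b'), (4, 'e')],
    'e': [(6, 'c'), (4, 'd')],
}
print(dijkstra(g, 'a')[0])  # -> {'a': 0, 'b': 3, 'c': 7, 'd': 5, 'e': 9}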
Illustration: An example of Dijkstra's algorithm is shown below. The next closest vertex is
shown in bold.
The shortest paths (identified by following nonnumeric labels backward from a destination
vertex in the left column to the source) and their lengths (given by numeric labels of the tree
vertices) are as follows:
from a to b: a − b, of length 3
from a to d: a − b − d, of length 5
from a to c: a − b − c, of length 7
from a to e: a − b − d − e, of length 9
The pseudocode of Dijkstra's algorithm is given below. Note that in the following pseudocode, VT contains a given source vertex and the fringe contains the vertices adjacent to it after iteration 0 is completed.
Analysis:
The time efficiency of Dijkstra's algorithm depends on the data structures used for implementing the priority queue and for representing the input graph itself. It is in θ(|V|²) for graphs represented by their weight matrix with the priority queue implemented as an unordered array. For graphs represented by their adjacency lists with the priority queue implemented as a min-heap, it is in O(|E| log |V|).
Applications
Transportation planning and packet routing in communication networks, including the
Internet
Finding shortest paths in social networks, speech recognition, document formatting,
robotics, compilers, and airline crew scheduling.
To encode a text that comprises symbols from some n-symbol alphabet, each symbol is assigned a sequence of bits called a codeword.
Fixed-length encoding: This method assigns to each symbol a bit string of the same length m (m ≥ log2 n). This is exactly what the standard ASCII code does.
One way of getting a coding scheme that yields a shorter bit string on the average is based
on the old idea of assigning shorter codewords to more frequent symbols and longer
codewords to less frequent symbols.
Variable-length encoding: This method assigns codewords of different lengths to different symbols, which introduces a problem that fixed-length encoding does not have: how can we tell how many bits of an encoded text represent the first (or, more generally, the ith) symbol?
To avoid this complication, we can limit ourselves to the so-called prefix-free (or simply prefix) codes, in which no codeword is a prefix of a codeword of another symbol. With such a code, we can simply scan a bit string from left to right until we find the first group of bits that is a codeword for some symbol.
Huffman Trees and Codes
Huffman's algorithm
Step 1: Initialize n one-node trees and label them with the symbols of the alphabet given. Record the frequency of each symbol in its tree's root to indicate the tree's weight. (More generally, the weight of a tree will be equal to the sum of the frequencies in the tree's leaves.)
Step 2: Repeat the following operation until a single tree is obtained: find two trees with the smallest weights, make them the left and right subtrees of a new tree, and record the sum of their weights in the root of the new tree as its weight.
A tree constructed by the above algorithm is called a Huffman tree. It defines, in the manner described above, a Huffman code.
EXAMPLE: Consider the five-symbol alphabet {A, B, C, D, _} with the following occurrence frequencies in a text made up of these symbols:
symbol       A       B       C       D       _
frequency    0.35    0.1     0.2     0.2     0.15
The Huffman tree for this input is constructed as shown in the figure below.
Figure: Example of constructing a Huffman coding tree.
The resulting codewords are as follows:
symbol       A       B       C       D       _
codeword     11      100     00      01      101
Hence, DAD is encoded as 011101, and 10011011011101 is decoded as BAD_AD.
With the occurrence frequencies given and the codeword lengths obtained, the average number of bits per symbol in this code is 2 · 0.35 + 3 · 0.1 + 2 · 0.2 + 2 · 0.2 + 3 · 0.15 = 2.25.
Had we used a fixed-length encoding for the same alphabet, we would have to use at least 3 bits per symbol. Thus, for this example, Huffman's code achieves a compression ratio (a standard measure of a compression algorithm's effectiveness) of (3 − 2.25)/3 · 100% = 25%. In other words, Huffman's encoding of the text will use 25% less
memory than its fixed-length encoding.
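A Python sketch of Huffman's algorithm using a min-heap of trees. The tie-breaking counter and the left/right orientation are implementation choices; with the choices below, the example alphabet happens to reproduce the codewords given above.

import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree from {symbol: frequency} and return the code table."""
    tie = count()                                   # tie-breaker so trees never compare
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)           # two smallest-weight trees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, code):                            # left edge = 0, right edge = 1
        if isinstance(node, tuple):                  # internal node: recurse
            walk(node[0], code + '0')
            walk(node[1], code + '1')
        else:                                        # leaf: record the codeword
            codes[node] = code or '0'
    walk(heap[0][2], '')
    return codes

# Example from the text: {A, B, C, D, _} with the given frequencies
print(huffman_codes({'A': 0.35, 'B': 0.1, 'C': 0.2, 'D': 0.2, '_': 0.15}))
# -> {'C': '00', 'D': '01', 'B': '100', '_': '101', 'A': '11'}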
Transform-and-conquer: There are three major variations of this idea that differ by what we transform a given instance to (figure below):
instance simplification: transformation to a simpler or more convenient instance of the same problem;
representation change: transformation to a different representation of the same instance; and
problem reduction: transformation to an instance of a different problem for which an algorithm is already available.
The heap, discussed next, is a classic example of representation change.
Heaps
A heap is a partially ordered data structure that is especially suitable for implementing
priority queues.
A priority queue is a multiset of items with an orderable characteristic called an item's priority, with the following operations:
Finding an item with the highest (i.e., largest) priority
Deleting an item with the highest priority
Adding a new item to the multiset
Notion of the Heap
Definition: A heap can be defined as a binary tree with keys assigned to its nodes, one key per node, provided the following two conditions are met:
1. The shape property: the binary tree is essentially complete (or simply complete), i.e., all its levels are full except possibly the last level, where only some rightmost leaves may be missing.
2. The parental dominance or heap property: the key in each node is greater than or equal to the keys in its children. (This condition is considered automatically satisfied for all the leaves of the tree.)
Illustration:
The illustration of the definition of a heap is shown below: only the leftmost tree is a heap. The second one is not a heap, because the tree's shape property is violated: the left child of the subtree rooted at the node with key 5 cannot be empty. And the third one is not a heap, because parental dominance fails for the node with key 5.
Figure: Illustration of the definition of heap: only the leftmost tree is a heap.
Properties of Heaps
Here is a list of important properties of heaps:
1. There exists exactly one essentially complete binary tree with n nodes. Its height is equal to ⌊log2 n⌋.
2. The root of a heap always contains its largest element.
3. A node of a heap considered with all its descendants is also a heap.
4. A heap can be implemented as an array by recording its elements in the top-down, left-to-right fashion. It is convenient to store the heap's elements in positions 1 through n of such an array, leaving H[0] either unused or putting there a sentinel whose value is greater than every element in the heap. In such a representation,
a. the parental node keys will be in the first ⌊n/2⌋ positions of the array, while the leaf keys will occupy the last ⌈n/2⌉ positions;
b. the children of a key in the array's parental position i (1 ≤ i ≤ ⌊n/2⌋) will be in positions 2i and 2i + 1, and, correspondingly, the parent of a key in position i (2 ≤ i ≤ n) will be in position ⌊i/2⌋.
Thus, we could also define a heap as an array H[1..n] in which every element in position i in the first half of the array is greater than or equal to the elements in positions 2i and 2i + 1, i.e., H[i] ≥ max{H[2i], H[2i + 1]} for i = 1, . . . , ⌊n/2⌋.
Constructions of Heap
There are two principal alternatives for constructing a heap:
1) Bottom-up heap construction
2) Top-down heap construction
The bottom-up heap construction algorithm is illustrated below. It initializes the essentially complete binary tree with n nodes by placing keys in the order given and then "heapifies" the tree as follows.
Starting with the last parental node, the algorithm checks whether the parental dominance holds for the key in this node. If it does not, the algorithm exchanges the node's key K with the larger key of its children and checks whether the parental dominance holds for K in its new position. This process continues until the parental dominance for K is satisfied. (Eventually it has to, because it holds automatically for any key in a leaf.)
After completing the "heapification" of the subtree rooted at the current parental node, the algorithm proceeds to do the same for the node's immediate predecessor.
The algorithm stops after this is done for the root of the tree.
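A Python sketch of bottom-up heap construction with the sift-down ("heapify") operation, using the 1-based array representation described earlier and the example list from the illustration below:

def heapify(h):
    """Bottom-up construction of a max-heap in the array h[1..n]; h[0] unused."""
    n = len(h) - 1
    for i in range(n // 2, 0, -1):       # last parental node down to the root
        sift_down(h, i, n)

def sift_down(h, k, n):
    """Restore parental dominance for the key at position k in h[1..n]."""
    v = h[k]
    while 2 * k <= n:
        j = 2 * k                        # left child
        if j < n and h[j + 1] > h[j]:    # pick the larger of the two children
            j += 1
        if v >= h[j]:                    # parental dominance holds; stop
            break
        h[k] = h[j]                      # move the larger child up
        k = j
    h[k] = v

# Example from the text: the list 2, 9, 7, 6, 5, 8
h = [None, 2, 9, 7, 6, 5, 8]
heapify(h)
print(h[1:])  # -> [9, 6, 8, 2, 5, 7]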
Illustration: Bottom-up construction of a heap for the list 2, 9, 7, 6, 5, 8 is shown below. The double-headed arrows show key comparisons verifying the parental dominance.
Analysis: Let h be the height of the heap's tree. In the worst case, a key on level i of the tree travels down to the leaf level h. Since moving to the next level down requires two comparisons (one to find the larger child and the other to determine whether the exchange is required), the total number of key comparisons involving a key on level i is 2(h − i).
Therefore, the total number of key comparisons in the worst case will be
Cworst(n) = Σ (over levels i = 0, …, h − 1 and the keys on level i) 2(h − i) = 2(n − log2(n + 1)),
where the validity of the last equality can be proved either by using the closed-form formula for the sum Σ i·2^i or by mathematical induction. Thus, with this bottom-up algorithm, a heap of size n can be constructed with fewer than 2n comparisons.
Top-down heap construction builds a heap by successive insertions of a new key into a previously constructed heap. To insert a new key K, first attach a new node with key K after the last leaf of the existing heap; then sift K up to its appropriate place: compare K with its parent's key, and if the parent's key is smaller, swap the two keys and repeat the comparison at the new position.
Obviously, this insertion operation cannot require more key comparisons than the heap's height. Since the height of a heap with n nodes is about log2 n, the time efficiency of insertion is in O(log n).
Illustration: Inserting a key (10) into the heap constructed above. The new key is sifted up via swaps with its parent until it is not larger than its parent (or it reaches the root).
Delete an item from a heap: Deleting the root's key from a heap can be done with the following algorithm (Maximum Key Deletion from a heap):
1. Exchange the root's key with the last key K of the heap.
2. Decrease the heap's size by 1.
3. "Heapify" the smaller tree by sifting K down the tree exactly as in the bottom-up heap construction algorithm. That is, verify the parental dominance for K: if it holds, we are done; if not, swap K with the larger of its children and repeat this operation until the parental dominance condition holds for K in its new position.
The efficiency of deletion is determined by the number of key comparisons needed to "heapify" the tree after the swap has been made and the size of the tree is decreased by 1. Since this cannot require more key comparisons than twice the heap's height, the time efficiency of deletion is in O(log n) as well.
Illustration: The key to be deleted, the root key 9, is swapped with the last key, 1, after which the smaller tree is "heapified" by exchanging the new root key with the larger of its children until the parental dominance requirement is satisfied.
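A Python sketch of top-down insertion and maximum-key deletion, reusing sift_down and the heap h from the construction sketch above (the key 10 is the one inserted in the text's illustration):

def heap_insert(h, key):
    """Top-down insertion: append key as the new last leaf, then sift it up."""
    h.append(key)
    k = len(h) - 1
    while k > 1 and h[k // 2] < h[k]:    # swap with parent while parent is smaller
        h[k // 2], h[k] = h[k], h[k // 2]
        k //= 2

def heap_delete_max(h):
    """Exchange root with last key, shrink the heap, sift the new root down."""
    h[1], h[-1] = h[-1], h[1]
    largest = h.pop()
    if len(h) > 1:
        sift_down(h, 1, len(h) - 1)      # sift_down from the construction sketch
    return largest

heap_insert(h, 10)                        # the text's example: insert 10
print(h[1:])                              # -> [10, 6, 9, 2, 5, 7, 8]
print(heap_delete_max(h), h[1:])          # -> 10 [9, 6, 8, 2, 5, 7]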
Heap Sort
Heapsort is an interesting sorting algorithm discovered by J. W. J. Williams. It is a two-stage algorithm that works as follows.
Stage 1 (heap construction): construct a heap for a given array.
Stage 2 (maximum deletions): apply the root-deletion operation n − 1 times to the remaining heap.
As a result, the array elements are deleted in decreasing order. But since, under the array implementation of heaps, an element being deleted is placed last, the resulting array will be exactly the original array sorted in increasing order.
A trace of heapsort on a specific input is shown below:
Analysis: Heap construction (Stage 1) requires fewer than 2n comparisons, as shown above, and so is in O(n). For Stage 2, the number of key comparisons C(n) needed to carry out the n − 1 root deletions satisfies C(n) ≤ 2(n − 1) log2(n − 1). This means that C(n) ∈ O(n log n) for the second stage of heapsort. For both stages, we get O(n) + O(n log n) = O(n log n). A more detailed analysis shows that the time efficiency of heapsort is, in fact, in θ(n log n) in both the worst and average cases. Thus, heapsort's time efficiency falls in the same class as that of mergesort; unlike the latter, however, heapsort is in-place, i.e., it requires no extra storage beyond the array being sorted.
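A Python sketch of heapsort, reusing sift_down from the heap-construction sketch above and tracing the same example list:

def heapsort(a):
    """Stage 1: build a max-heap; Stage 2: repeatedly swap the root with the
    last element of the shrinking heap and sift the new root down."""
    h = [None] + list(a)                 # 1-based heap array; h[0] unused
    n = len(h) - 1
    for i in range(n // 2, 0, -1):       # stage 1: bottom-up heap construction
        sift_down(h, i, n)
    for end in range(n, 1, -1):          # stage 2: n - 1 maximum deletions
        h[1], h[end] = h[end], h[1]      # deleted maximum lands in its final place
        sift_down(h, 1, end - 1)
    return h[1:]

print(heapsort([2, 9, 7, 6, 5, 8]))      # -> [2, 5, 6, 7, 8, 9]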