Advanced Algorithms
Sam Khosravi samkh@kth.se,
Emil Hammarberg emiham@kth.se,
Julia Wollter wollter@kth.se,
Noella Rahman noella@kth.se
January 18, 2024
Retake Group Assignment 1
Problem
The task is to investigate potential fraud within a business called BananaPhone company, by
leveraging its hierarchical structure, which is represented as a binary tree of employees. In this
binary tree, each node represents an employee and has at most two children, corresponding to the
employees directly managed by that person. The goal is to identify the optimal subset of employees
to place under observation to maximize the overall number of employees monitored. We must do
this using a polynomial-time algorithm. The root is the founder of the company, who cannot be
observed. Each employee is managed by exactly one other employee above them, and choosing to
observe some employee will places all other employees, under that person, under observation by
proxy.
Solution
The algorithm will start by finding some node w, which is closest to the root and has at most one
child. This would be either a leaf node or a node with one child, since the hierarchy is represented
by a binary tree. The reason for choosing this lies in the structural property of binary trees, as
the node w would be where the tree first becomes narrow.
Let w be located on level k. Then for each level, our algorithm marks the node for observation that
is not on the path from the root to node w. By doing this, we make sure that the larger subtrees
are included in our observation. Let S = {v1 , v2 , ..., vk } be the set of nodes that are chosen to be
put under “observation” by our algorithm, and let S ′ = {u1 , u2 , ..., um } be the set of nodes that
are on the path between the root and w, and are siblings to v1, v2, ... if such siblings exist (sibling
means the other node that share the same parent as vi ). In the figure below we see an example of
this problem and its optimal solution. With the algorithm we assign each node ui that is on the
path to w to S ′ and assign each sibling vi belonging to ui , if there is one, to S being the set of
1
nodes that the optimal algorithm picks to observe.
Analysis
We can choose to store the tree in an array, with each element at index i corresponding to a node,
with indices 2i + 1 and 2i + 2 forming its children. For nodes with less than two children, the
missing children are stored as null values in the array. We can then find the node w by simply
traversing the array and picking the node with the lowest index that has at most one child node
with no further children. This is done in O(n) time. Marking the nodes for observation also takes
linear time, as you only need to visit each level once. The total running time of the algorithm is
then O(n).
Given that n is the number of nodes (not the root included) in the tree and |S ′ | the size of the set
of nodes that are not put under observation (either directly or by proxy), then the total number
of optimally observed nodes is: Opt= n − |S ′ |. Maximizing Opt then implies that |S ′ | should be
minimized, i.e. minimize the number of nodes in the path to w.
To maximize observation, the algorithm assumes that some node on every level of the binary tree
should be observed. If a level were to be skipped, then adding any node from that level of the
tree would give us more observed nodes, which would contradict the assumption our algorithm
has optimality. Once a node has been observed by the algorithm, all nodes in its following subtree
are considered to also be observed, which is why avoiding the path to node w will maximize the
number of observed nodes, as it does not let us go down smaller subtrees.
Base case: The base case is trivial, and the root only has 1 child. Observing that single child
means that the entire tree (besides the root) is observed. From this, we can say that all nodes
(which in this case is only 1) besides the root node, are observed, which is the optimal solution
since the root is not allowed to be observed.
Inductive step: Our inductive hypothesis will be that we assume that the algorithm optimally
works for a tree with height t. If the tree has height t + 1, then the root must have two children,
which would in turn create two subtrees. The algorithm would choose to observe one of these
subtrees, based on the location of node w. The subtree that does not contain w would be chosen
and therefore fully observed, as it would represent a larger part of the tree. The subtree in which
w resides would be partially observed, where the path to w would be the only unobserved section.
Using this approach, we minimize the number of unobserved nodes |S ′ |, since the path to w is also
2
the shortest possible path to a leaf or node with one child. If we continue applying this mindset
or rule to each level, the algorithm would make sure that the maximum number of nodes are put
under observation. So if the algorithm is optimal for a tree with the height t, the it would continue
to be optimal for a tree with height t + 1, which tells us that the algorithm is effective for any
height t of a tree.
An optimal algorithm would select S in a manner that aims to minimize the magnitude of |S ′ |.
This selection would be based on the premise that the nodes in S ′ would create a path within the
tree. This path would conclude at either a leaf or at a node that has only a single child. Lastly,
by definition, the desired outcome is realized when S ′ is selected as the path leading from the root
to node w, which is described in the algorithm’s explanation.