
BINARY SEARCH TREES

Finding the successor of a node x


 Two cases
 Case 1: If node x has a right child, then x’s successor is the left-most descendant of this right child
 Case 2: If node x doesn’t have a right child, then x’s successor is x’s most recent (lowest) ancestor whose left child is also an ancestor of x
[Figure: successor examples on a BST with nodes E, L, P, Q, X, Y; for Case 2, keep going up until you meet a left link]
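A minimal Python sketch of the two successor cases (not from the notes; the Node class and function names are illustrative):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = self.right = self.parent = None

    def tree_minimum(x):
        # Left-most descendant of x
        while x.left is not None:
            x = x.left
        return x

    def tree_successor(x):
        if x.right is not None:               # Case 1: left-most node of the right subtree
            return tree_minimum(x.right)
        y = x.parent                          # Case 2: climb until we arrive via a left link
        while y is not None and x is y.right:
            x, y = y, y.parent
        return y                              # None if x holds the maximum key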
Deleting a node z
 If z is a leaf (i.e., has no children), all we have to do is store a NULL pointer in the node that used to be z’s parent
 If z has only one child, we just “splice out” or bypass node z (z’s parent will now refer to z’s [only] child, rather than z)
 If z has two children, then we find its successor y and replace z’s data with y’s data. The remainder of z’s original right subtree becomes y’s new right subtree, and z’s original left subtree becomes y’s new left subtree (a sketch of all three cases follows)
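A matching sketch of the three deletion cases, reusing the hypothetical Node and tree_minimum from the successor sketch above; the two-children case copies the successor’s data into z and then splices the successor out:

    def bst_delete(root, z):
        # Returns the (possibly new) root after deleting node z.
        if z.left and z.right:            # Case 3: two children
            y = tree_minimum(z.right)     # z's successor
            z.key = y.key                 # replace z's data with y's data
            z = y                         # now splice out the successor instead
        child = z.left or z.right         # Cases 1 & 2: zero or one child
        if child:
            child.parent = z.parent
        if z.parent is None:
            return child                  # z was the root
        if z is z.parent.left:            # becomes a NULL pointer if z was a leaf
            z.parent.left = child
        else:
            z.parent.right = child
        return root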

INSERTION SORT
• Good for sorting small lists of keys

 Best case: the array is already sorted → O(n)
 Worst case: the array is in reverse order → O(n²)
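A minimal insertion-sort sketch in Python (illustrative, not from the notes):

    def insertion_sort(a):
        # In place: O(n) if already sorted, O(n^2) if in reverse order.
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:   # shift larger keys one slot right
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a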

MERGESORT

Run Time: O(N lg N) overall; the MERGE step alone runs in Θ(N)
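A short Python sketch showing both pieces (illustrative): the recursion gives the O(N lg N) total, and the merge loop is the Θ(N) part:

    def merge_sort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):   # the Theta(N) MERGE step
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]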

HUFFMAN ENCODING

 “THIS IS AN EXAMPLE OF A HUFFMAN TREE”

 {T, H, I, S, space, A, N, E, X, M, P, L, O, F, U, R}

 The original “alphabet” for this message consists of 16 characters (2⁴ = 16 → 4 bits per character)

 Character counts: 7 (space) + (4 × 2) (A, E) + 3 (F) + (2 × 6) (T, H, I, S, N, M) + (1 × 6) (X, P, L, O, U, R) = 36 characters, so the fixed-length encoding needs 36 × 4 bits = 144 bits


HUFFMAN DECODING
 We add padding bits to fill the void at the end

 If we need padding bits at all, then we need somewhere


between 1 and 7 of them, if we're using 8-bit bytes (1 – 3 if
using 4-bit “bytes”)
 We may not need any padding bits at all (if our encoded
message happens to be some multiple of 8 bits long – that fits
perfectly)
COUNTING SORT

 If k >> n, this isn’t a good sort


 Suppose we were sorting ten 32-bit unsigned integers, and we don’t
know anything about their range. With ten values, n = 10
 Absent any knowledge about the range of the values, we must
assume k = 2³² → 4 billion+
 The A and B arrays [1..n] would have 10 values
 The C array [1..k] would have 4 billion+ values
 The O(k) loops would dominate by 8 orders of magnitude!
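A minimal counting-sort sketch (illustrative), zero-based rather than the notes’ [1..n] arrays:

    def counting_sort(A, k):
        # Stable sort for integer keys in 0..k; O(n + k) time and space.
        C = [0] * (k + 1)
        for a in A:                    # count each key
            C[a] += 1
        for i in range(1, k + 1):      # prefix sums: C[i] = # of keys <= i
            C[i] += C[i - 1]
        B = [0] * len(A)
        for a in reversed(A):          # scan right-to-left to keep stability
            C[a] -= 1
            B[C[a]] = a
        return B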

RADIX SORT
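A sketch of LSD radix sort for non-negative decimal integers (illustrative); one stable pass per digit gives the O(d × N) bound listed under TIME COMPLEXITY below:

    def radix_sort(A):
        if not A:
            return A
        for d in range(len(str(max(A)))):      # one pass per decimal digit
            buckets = [[] for _ in range(10)]
            for a in A:
                buckets[(a // 10**d) % 10].append(a)   # stable bucketing
            A = [a for bucket in buckets for a in bucket]
        return A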
QUICKSORT

PARTITION(A, p, r)        // pivot x = the first element
    x = A[p]
    i = p
    for j = p + 1 to r
        if A[j] ≤ x
            i = i + 1
            exchange A[i] with A[j]
    exchange A[i] with A[p]
    return i

1. Best / Average case (random data): O(n lg n)


2. Worst case (already sorted data): O(n²)
 1 2 3 4 5 6 7 8 9 10 (already sorted data)
 This would be partitioned into an empty lower “half” and nearly everything else in the “high” half
 Each partitioning would make n – 1 comparisons, but would not divide the list in half!
 n passes of n – 1 comparisons each → n² behavior!
3. Divide-and-conquer approach (recursive)
4. Can use lots of stack space (recursive calls)
5. Not good when there are a significant number of duplicates
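A runnable Python version of the partition pseudocode above, plus the recursive driver (illustrative):

    def partition(A, p, r):
        x = A[p]                        # pivot = first element
        i = p
        for j in range(p + 1, r + 1):
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i], A[p] = A[p], A[i]         # put the pivot between the halves
        return i

    def quicksort(A, p=0, r=None):
        # O(n lg n) on random data; O(n^2) on already-sorted data,
        # where every partition leaves one side empty.
        if r is None:
            r = len(A) - 1
        if p < r:
            q = partition(A, p, r)
            quicksort(A, p, q - 1)
            quicksort(A, q + 1, r)
        return A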

HEAPSORT (Combines best of Merge Sort (O(n lg n)) & Insertion Sort (In-
place sort))

1. A nearly-full binary tree


 The tree is full, except for the right part of the lowest level
2. Not a Binary Search Tree (BST)
 We don’t go down through the tree comparing keys to find
information
 The BST property does NOT hold on a heap
3. The heap is stored in a one-based array, with subscripts encoding the tree structure
4. The root is stored in A[1]; heaps are not typically stored as zero-based arrays
5. Subscripts: Left child = 2i; Right child = 2i + 1; Parent = ⌊i/2⌋; therefore, no pointer is needed

6. Max-Heap property:
 For every node i other than the root: A[PARENT(i)] ≥ A[i]
 The value at a node is at least as large as its child(ren)
 Applied recursively, this means the largest value is always at the root, A[1]
 MAX-HEAPIFY, which restores this property, runs in O(lg n) time
7. Height of a Heap
 Height of a node = the number of edges on the longest downward path from that node to a leaf
 Height of the entire heap = Height of the root

 Height of the heap = Θ (lg n)


MAX-HEAPIFY = the basic operation on a heap (cost proportional to at most the tree height)
 Find the largest of A[i] and its children (if either child doesn’t exist, don’t consider it); largest will be either i, l, or r
 If largest is i (not one of i’s children), then we’re done. Otherwise (if largest is one of i’s children), swap A[i] with that child (A[largest]), and then (recursively) MAX-HEAPIFY from that child down (see the sketch below)
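The sketch below is a zero-based Python rendering of MAX-HEAPIFY (illustrative; the notes use a one-based array, so the child subscripts shift by one):

    def max_heapify(A, i, heap_size):
        # Float A[i] down until the max-heap property holds; O(lg n).
        l, r = 2 * i + 1, 2 * i + 2         # zero-based children
        largest = i
        if l < heap_size and A[l] > A[largest]:
            largest = l
        if r < heap_size and A[r] > A[largest]:
            largest = r
        if largest != i:                    # a child is larger: swap, recurse down
            A[i], A[largest] = A[largest], A[i]
            max_heapify(A, largest, heap_size)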
8. A max-priority queue supports the following operations:
 INSERT(S, x) – inserts the element x into the set S; O(lg n)
 MAXIMUM(S) – returns the element of S with the largest key (the highest-priority item in the queue); Θ(1)
 EXTRACT-MAX(S) – removes and returns the element of S with the largest key; all else is Θ(1), only the MAX-HEAPIFY call is O(lg n)
 INCREASE-KEY(S, x, k) – increases the value of element x’s key to the new value k, which is assumed to be at least as large as x’s current key value (i.e., raise x’s priority to k, maintaining the priority queue’s structure); runs in O(lg n), since the updated key may travel up as far as the root and the heap height is Θ(lg n)

APPLICATION OF TREE
HUFFMAN ENCODING
 Motivation: compression
 The contents of a node are only meaningful at a leaf; internal nodes are of little
consequence
 The path to each leaf is of great importance
 Path used to encode (and compress) data
 Allows recoding a fixed "alphabet" with variable-length bit strings

BINARY SEARCH TREE


 For data lookup
 The data stored in every node is important; the paths are of no consequence
 An in-order traversal will output the values in increasing (or decreasing) order
 Can support the (7) dynamic set operations:
• SEARCH: O(h) operations, where h is the height of the tree
• INSERT: requires a SEARCH (O(h)), plus a couple of pointer changes to attach the
new node, so it’s O(h)
• DELETE: requires Search (O(h)), and may require Successor (O(h)), and a constant
number of pointer changes, so it, too, is O(h)
• PREDECESSOR and SUCCESSOR: require SEARCH (O(h)) and may require MIN /
MAX (O(h)), so they’re O(h)
• MINIMUM and MAXIMUM: require walking from the root to the left/right-most leaf
(which depends on the height of the tree), so they’re O(h)
 The height of a tree containing a given number of nodes (n) depends on how well-
balanced the tree is. If the tree is complete, lg(n); if it degenerates into a linked list, n.
 Completely full trees will have 2^h – 1 nodes; 2^(h–1) leaves; 2^(h–1) – 1 internal.

AVL TREES (h ≤ (~1.4404 lg N))


1. Tree is not maintained with perfect balance; however, it does stay “nearly balanced” or
“balanced enough” (−1 ≤ BF ≤ +1).
2. BF = height of the node’s LEFT subtree - height of the node’s RIGHT subtree
3. Properties of AVL Trees
• Time to insert: O(h)
• Time to search: O(h)
• Time to find successor / predecessor: O(h)
• Time to delete: O(h)
• Time to find min / max: O(h)
RED-BLACK TREES
1. It’s still a BST, and we keep it “mostly balanced”.
2. Uses a single bit for “color” (every node is either “red” or “black”) and some very
special rules.
3. The leaves are always empty – our data resides in the internal nodes.
Furthermore, it simplifies some of the coding (and saves space) if all leaves are the same
empty node, which Cormen calls T.NIL. The root’s parent is also T.NIL.
4. The “Ground Rules”:
i. Every node is colored either “Red” or “Black”
ii. The root is (always) black
iii. Every leaf, T.NIL, is always black
iv. If a node is red, then both of its children are black (hence, there can be no two
consecutive red nodes on any path from root to leaf)
v. For every node, the number of black nodes along every path from that node down to the leaves is the same
5. The height (h) at a node is the number of edges (links) between the node and the
leaves along the longest path
6. The black height (bh) at a node X is the number of black nodes (including T.NIL)
along the path from X to the leaves, not counting X.

7. SEARCH, SUCCESSOR, PREDECESSOR, MIN, and MAX work just like any other BST, and are all O(h) = O(lg n)
8. INSERT and DELETE – not so easy
• O(lg n) time to run RBT-INSERT up to the call of RBT-INSERT-FIXUP
• Within RBT-INSERT-FIXUP :
• Each iteration takes O(1) time
• Each iteration is either the last one or it moves z up 2 levels
• O(lg n) levels. O(lg n) time
• Also note that there are at most 2 rotations overall
• Thus, insertion into a red-black tree takes O(lg n) time

RBT Insertion – The Three cases:


Case 1: z is red, z.p is red, & z’s uncle y is red
• Re-color z, z’s parent, and z’s uncle (no rotation)
• Move z up two levels (z  z’s parent’s parent), and try again as long as z’s parent is red

Case 2: z and z.p are both red, z’s uncle y is black, z is a right child
• Do a left rotation around z, making z a left child. Takes us immediately to…

Case 3: z and z.p are both red, z’s uncle y is black, z is a left child
• Recolor z’s parent and grandparent
• Do a right rotation around z’s grandparent
GRAPHS
 Notation: A graph G has sets of both vertices and edges. The edges connect the vertices.
Any vertex can have any number (zero or more) of edges connecting it to other vertices
 Vertices are sometimes referred to as nodes
 In an undirected graph, the edges are bidirectional.
 In a directed graph, some of the edges are one-way (directional), and some are two-way
(bidirectional). Arrows indicate the direction(s) of the edges. A bidirectional edge is
counted as two edges in a directed graph.
 The degree of a node is the number of edges connected to it.
 In Euler’s graph, Vertex W is of degree 5; nodes N, E, and S are of degree 3 (Note: All
four nodes have odd degrees)

 Euler proved that for there to be a path that crosses every bridge exactly once, there
must be no more than two vertices of odd degree
 If there’s one vertex of odd degree, it must be either the starting OR ending point
of the path
 If there are two odd degrees, then they must be the starting AND ending points
 If there are more than two nodes of odd degree, then such a path does not exist
 In a directed graph, we also refer to the in-degree (the number of edges coming into
the node) and the out-degree (outbound edges) of a node
 The degree of a node in a directed graph is the sum of the two

 The number of edges is denoted by |E|, and the number of vertices is |V|.
 The search time might be Θ(E+V), which means Θ(|E|+ |V|).
 The vertex set of a graph G is G.V, and the edge set of a graph G is G.E
 Given graph G = (V, E):
 There are two common ways to represent a graph for algorithms:
 Adjacency lists – a listing of the edges from each vertex, to each vertex
Better for sparse graphs, where there are not a large number of edges (it’s
more compact)
 Adjacency matrix – a |V| × |V| matrix that has a 1 where there’s an edge between a pair of vertices, a 0 if not
Preferred for graphs where there are many edges (dense graphs, where |E| → |V|²)
Adjacency List
 An array Adj of |V| lists, one per vertex
 Directed and undirected graphs: Vertex u’s list, Adj[u], has all vertices v such that there’s
an edge (u, v)  E
 If edges have weights, we can put the weights in the lists: Weight: w : E → R
 Disadvantage: there has to be a search algorithm through the list of a vertex to see if there
is an edge to another vertex.
 Advantage: Less space to store than a matrix
 Directed graph: There is a total of |E| items across all of the adjacency lists
 Undirected graph: There are 2|E|

Adjacency Matrix

Undirected graph:
 Symmetric about its diagonal
 the adjacency matrix equals its transpose (A = Aᵀ)

A fully-connected or complete graph is one in which every vertex has an edge to every other
vertex.
Breadth-First Search (BFS)
• A shortest-path algorithm that works on unweighted graphs (graphs in which all edges are
assumed to have a weight of 1)

u. =
s.
NIL =
NIL

v. = u

BFS for Undirected and Directed Graphs
[Figure: BFS traces on an undirected and a directed graph; the directed version adds one extra instruction, shown as a red arrow in the original algorithm]
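A compact BFS sketch (illustrative) that works on both directed and undirected adjacency lists; the parent dict plays the role of v.π:

    from collections import deque

    def bfs(adj, s):
        # adj: dict mapping vertex -> list of neighbors; s: source vertex.
        dist, parent = {s: 0}, {s: None}    # s.d = 0, s.pi = NIL
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in dist:           # v is still "white": discover it
                    dist[v] = dist[u] + 1
                    parent[v] = u           # v.pi = u
                    q.append(v)
        return dist, parent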

Depth-First Search (DFS)


 Goes “deeper” before it goes “wider”
 Doesn’t start from a particular vertex (but still hits all vertices and edges)
 Each vertex v has two timestamps – one, v.d, is when we discover the vertex; the other,
v.f, is when we are finished with the vertex
 To accomplish this, each vertex is one of three colors:
• White – we haven’t discovered that vertex yet (time < v.d)
• Gray – vertex has been discovered, but we're not finished exploring beyond it (v.d <
time < v.f)
• Black – we're finished with that vertex (and everything beyond it) (time > v.f)

 Run Time: Θ(V + E)

Note: In the boxed graph, D is initially gray; D does not have any adjacent white vertex, so color it black, increment the time from 4 to 5, and then pop back up the recursion (same concept for F).
 Each edge (u, v) can be classified by the color of v when the edge is first explored:
• If v is white: (u, v) is a tree edge
• If v is gray: (u, v) is a back edge
• If v is black: (u, v) is either a forward or a cross edge
 If u.d < v.d then (u, v) is a forward edge
 If u.d > v.d then (u, v) is a cross edge
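A DFS sketch with discovery/finish timestamps (illustrative); a vertex is white until it enters d, gray until it enters f, then black:

    def dfs(adj):
        d, f, time = {}, {}, [0]
        def visit(u):
            time[0] += 1; d[u] = time[0]     # u turns gray (discovered)
            for v in adj.get(u, []):
                if v not in d:               # v is white: (u, v) is a tree edge
                    visit(v)
            time[0] += 1; f[u] = time[0]     # u turns black (finished)
        for u in adj:                        # no particular starting vertex
            if u not in d:
                visit(u)
        return d, f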

Topological Sort
 Directed, Acyclic Graph (must be acyclic to do a linear ordering)
 Linear ordering (a listing) of its vertices V such that for every edge (u, v), u appears
(somewhere) before v in the ordered list
 Partial ordering – there’s not one single sequence in which all of the events have to
occur, but some events have to precede some others
 Not unique
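Since a vertex finishes only after everything reachable from it, listing vertices in decreasing finish time gives one valid topological ordering; a DFS-based sketch (illustrative):

    def topological_sort(adj):
        # adj: DAG as dict mapping vertex -> list of successors.
        seen, order = set(), []
        def visit(u):
            seen.add(u)
            for v in adj.get(u, []):
                if v not in seen:
                    visit(v)
            order.append(u)          # u finishes after all its descendants
        for u in adj:
            if u not in seen:
                visit(u)
        return order[::-1]           # decreasing finish time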

Greedy Algorithm
 Locally-optimized decisions
 i.e., they make the most advantageous choice at any given point, without regard for any possible past or future knowledge about the problem
 As such, the most appealing choice at the moment may turn out, in the long run, to be a
poor choice
 Greedy algorithms never back up and “change their mind” based on what they later learn
 The Huffman tree-building algorithm is an example of a greedy algorithm
 It always picks the two remaining items with the lowest weights to select next for
merging into a new subtree
 In the case of Huffman, the greedy approach does happen to build the optimal
encoding tree (as Huffman proved)
 Example: Count out 36 cents in change
 Total change = 36 cents (QUARTER + DIME + PENNY = 25 + 10 + 1); we’re done with 3 coins (see the sketch below)
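A tiny sketch of that greedy choice (illustrative):

    def make_change(amount, coins=(25, 10, 5, 1)):
        # Always take the largest coin that fits; 36 -> [25, 10, 1].
        out = []
        for c in coins:
            while amount >= c:
                amount -= c
                out.append(c)
        return out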

SPANNING TREES
 Given a connected, undirected graph G = (V, E), and a weight w(u, v) for each edge (u, v) ∈ E
 We want to find a subset of E, which we’ll call T (i.e., T ⊆ E), such that the total weight w(T) = Σ w(u, v), summed over all edges (u, v) in T, is minimized
 Since T is acyclic, undirected, and connects all of the vertices, it must form a (single) tree, which “spans” the graph, making it a spanning tree (obtainable directly from the DFS algorithm)
 The tree T that minimizes the total weight is called the minimum spanning tree, or more precisely the “minimum-weight spanning tree”
 Greedy algorithms are not always guaranteed to find the optimal solutions, but for
minimum spanning tree problems, they do always find an optimal solution

Note: The MST may not be unique, but there is a single minimum weight, which for this graph is 37; the graph has no spanning tree with a weight < 37.
 Minimum Spanning Tree (MST) Properties
 It has |V| – 1 edges
 It has no cycles
 It might not be unique for a given graph
 i.e., more than one tree with the minimum weight may exist

 We can use the loop invariant (A is a subset of some MST) to show that this generic
algorithm works
 Initialization: The empty set trivially satisfies the loop invariant
 Maintenance: Since we add only safe edges, A remains a subset of some MST
 Termination: All edges added to A are in an MST, so when we stop, A is a spanning
tree that is also an MST

Kruskal’s Algorithm
 In Kruskal’s algorithm, the set A is a forest (multiple trees). The safe edge we add always
connects two distinct trees.
 Kruskal’s algorithm depends on the existence of a few set-based functions:
 MAKE-SET(x) creates a set containing the single item x
 FIND-SET(x) looks through the sets it is maintaining and determines which set x
belongs to
 UNION(u, v) merges two sets (one containing u and the other containing v) into one
set (u)
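A sketch of Kruskal’s algorithm with a simple union-find (illustrative; no union-by-rank, which a production version would add):

    def kruskal(vertices, edges):
        # edges: list of (w, u, v) tuples, so sorting orders them by weight.
        parent = {v: v for v in vertices}      # MAKE-SET for every vertex

        def find(x):                           # FIND-SET with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        mst = []
        for w, u, v in sorted(edges):
            ru, rv = find(u), find(v)
            if ru != rv:                       # edge joins two distinct trees: safe
                parent[rv] = ru                # UNION
                mst.append((u, v, w))
        return mst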
Negative-Weight Edges

 If we follow edge (d, c), that takes the distance from s to c to 8 via the path s, c, d, c, with weights 5 + 6 – 3 = 8, which is not a new minimum, so we don’t consider it. This cycle has a net weight of 6 – 3 = +3, so following the cycle only lengthens the path.
 If we follow (s, e) and then (e, f), we have a total distance from s to f of 5. But if we then follow (f, e), the distance from s to e drops to –1. Following (e, f) and then (f, e) again drops the distance further, to –4. We’re looking for the shortest-distance path, so following the cycle, which has a net weight of 3 – 6 = –3, does minimize the total. But we get stuck in it, taking the total path weight to –∞. Thus the shortest-distance path from s to e, f, and g is –∞.
 Despite the fact that {h, i, j} contains a negative-weight cycle (net weight: –3), the fact that nodes h, i, and j are all unreachable from s means that the distance from s to each of {h, i, j} is +∞.

Cycles
 Clearly, reachable, negative-weight cycles can’t be part of a shortest-length path
 Similarly, positive-weight cycles would not be taken, as following the cycle would only
raise the path’s length above the minimum
 Zero-weight cycles can be removed without changing the net path length, so they
effectively don’t raise or lower the total
 Therefore, we restrict our shortest-path discussion to paths with no cycles
 In the BELLMAN-FORD algorithm, edges can be relaxed many times
 DIJKSTRA’s algorithm relaxes each edge exactly once

BELLMAN-FORD Algorithm
 Disadvantage: if the graph contains a negative-weight cycle reachable from the source, no shortest paths exist; the algorithm detects this and reports failure.

 Run Time: O(E × (V-1)) = O(VE)

 Relax the edges V – 1 times; for this graph, 5 – 1 = 4 times
 Relaxation: check every edge (u, v) to see if v.d > u.d + w(u, v); if so, reduce v.d to u.d + w(u, v)
 After the V – 1 passes, if some v.d can still be reduced (i.e., some edge still has v.d > u.d + w(u, v)), the Bellman-Ford algorithm reports failure: the graph has a negative-weight cycle
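A sketch of BELLMAN-FORD as just described (illustrative): V − 1 relaxation passes, then one more pass to detect a reachable negative-weight cycle:

    def bellman_ford(vertices, edges, s):
        # edges: list of (u, v, w). Returns (dist, ok); ok = False means
        # a negative-weight cycle is reachable from s.
        dist = {v: float('inf') for v in vertices}
        dist[s] = 0
        for _ in range(len(vertices) - 1):     # relax every edge V - 1 times
            for u, v, w in edges:
                if dist[u] + w < dist[v]:      # RELAX: v.d > u.d + w(u, v)
                    dist[v] = dist[u] + w
        for u, v, w in edges:                  # still relaxable? negative cycle
            if dist[u] + w < dist[v]:
                return dist, False
        return dist, True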

DIJKSTRA’s Algorithm
 Edge weights must be ≥ 0
 Weighted version of BFS
 Instead of a FIFO queue, uses a priority queue
 Keys are shortest-path weights (v.d)
 Two sets of vertices:
 S: vertices whose final shortest-path weights have been determined, and
 Q: a priority queue containing {V – S} (all of the vertices in the graph except those in
S)
 The running time depends on implementation of priority queue
 If binary heap, each operation takes O(lg V) time, and there are E of them, so O(E lg
V)
 If a Fibonacci heap: O(V lg V + E)
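A binary-heap Dijkstra sketch using Python’s heapq (illustrative; stale heap entries are skipped instead of doing a true DECREASE-KEY):

    import heapq

    def dijkstra(adj, s):
        # adj: dict mapping u -> list of (v, w) pairs with w >= 0.
        dist = {s: 0}
        done = set()                        # the set S of finalized vertices
        pq = [(0, s)]                       # priority queue keyed on v.d
        while pq:
            d, u = heapq.heappop(pq)
            if u in done:
                continue                    # stale entry
            done.add(u)
            for v, w in adj.get(u, []):
                if v not in done and d + w < dist.get(v, float('inf')):
                    dist[v] = d + w         # RELAX
                    heapq.heappush(pq, (dist[v], v))
        return dist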
All Pairs Shortest Paths
 All-source shortest paths (i.e., the shortest path from every vertex to every other vertex)
 We could use the single-source algorithms |V| times
 If there are no negative weights, we can use Dijkstra’s algorithm
 If there are negative weights, we can’t use Dijkstra’s algorithm at all
 If we use BELLMAN-FORD, the run-time is O(V²E). If the graph is dense, then |E| approaches |V|², so the run time becomes O(V⁴)

Floyd-Warshall Algorithm
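The notes only list its Θ(n³) bound (see TIME COMPLEXITY below); a sketch of the triple loop (illustrative):

    def floyd_warshall(W):
        # W: n x n weight matrix, 0 on the diagonal and float('inf')
        # where there is no edge. D[i][j] ends as the shortest i -> j distance.
        n = len(W)
        D = [row[:] for row in W]
        for k in range(n):                  # allow intermediate vertices 0..k
            for i in range(n):
                for j in range(n):
                    if D[i][k] + D[k][j] < D[i][j]:
                        D[i][j] = D[i][k] + D[k][j]
        return D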
B-Tree
 B-Trees raise the branching factor, lowering the height.
 When we have t keys, we have t + 1 children
 To search the tree:
• Values < k1 follow c1
• Values between k1 and k2 follow c2
• Values between k2 and k3 follow c3
• Values > k3 follow c4
Most of the nodes have more than one key; the number of children per node = the number of keys in the node + 1 (e.g., a node containing the 2 keys D and H has 3 children)

 Insert the new key into an existing leaf node, rather than appending a new leaf node (as
we do with BST/ AVL/ RBT)
 If that node is currently full, the node has to be split into two nodes, both of which are guaranteed to have some new empty space, into which the new key will be guaranteed to fit
 B-TREE-SPLIT-CHILD splits the node y along its median, so half the keys can go in one node, the other half in the other node
 The median key is denoted y.key_t (where t is the tree’s minimum degree)
 The two nodes become the children of the old node y’s parent, and y’s original median key y.key_t is moved up (“promoted”) into y’s parent
 There is a possibility that y’s parent is full, so the tree-split routine has to be run
again, possibly all the way back up the tree to (and including) the root
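A sketch of B-TREE-SEARCH following the key ranges above (illustrative; the node attributes keys and children are my own naming):

    def btree_search(x, k):
        # x.keys: sorted list of keys; x.children: len(x.keys) + 1 child
        # nodes, or an empty list if x is a leaf.
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1                           # find the first key >= k
        if i < len(x.keys) and k == x.keys[i]:
            return (x, i)                    # found k in this node
        if not x.children:
            return None                      # leaf reached: k is absent
        return btree_search(x.children[i], k)   # descend into child c_(i+1)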
TIME COMPLEXITY
1. INSERTION SORT
 Best Case: O(N)
 Worst Case: O(N²)
2. MERGE: Θ(N)
3. MERGE SORT: O(N lg N)
 Divide: just compute q as the average of p and r ⇒ D(n) = O(1)
 Conquer: recursively solve 2 subproblems, each of size n/2 ⇒ 2T(n/2)
 Combine: MERGE on an n-element subarray takes O(n) time ⇒ C(n) = O(n)
4. HUFFMAN TREE: O(N log N)
5. COUNTING SORT: O(2N + 2k) ≈ O(N + k) ≈ O(N)
6. RADIX SORT: O(d * N) ≈ O(N)
7. QUICKSORT
 Average Case: O (N lg N)
 Worst Case (Already Sorted): O(N²)
8. HEAPSORT: O(n lg n)
 Height of Heap: Θ (lg n)
 MAX-HEAPIFY: Θ (lg n)
 HEAP-MAXIMUM: Θ(1)
 HEAP-EXTRACT-MAX: All else is Θ(1), except MAX-HEAPIFY
 HEAP-INCREASE-KEY: O(lg n), may move new value up heap to root;
heap height is Θ(lg n)
 MAX-HEAP-INSERT: O(lg n)
9. AVL Trees
 Height of tree (h): Tree is complete = lg(n), Tree degenerated into a
linked list = n
 insert: O(h)
 search: O(h)
 find successor / predecessor: O(h)
 delete: O(h)
 find min / max: O(h)
10. RED-BLACK TREE
 INSERT and DELETE: O(lg n)
11. BREADTH FIRST SEARCH (BFS): O(V+E)
 O(V) because every vertex enqueued at most once
 O(E) because every vertex dequeued at most once and we examine
edge (u, v) only when u is dequeued
 Therefore, every edge examined at most once if directed, at most
twice if undirected
 Either way, that’s still O(E)
12. DEPTH FIRST SEARCH (DFS): Θ(V+E)
 Θ, not just O, since it is guaranteed to examine every vertex and
every edge
13. TOPOLOGICAL SORT ON A DAG (Directed Acyclic Graph): O(V
+ E)
14. KRUSKAL Algorithm: O(E lg V)
15. INITIALIZE-SINGLE-SOURCE
 Used in both the BELLMAN-FORD and DIJKSTRA algorithms: Θ(V)
16. BELLMAN-FORD Algorithm
 Single-source: O(V × E)
 Running the single-source algorithm |V| times (all pairs): O(V²E)
 If the graph is dense, then |E| approaches |V|², so the run time becomes O(V⁴)
17. DIJKSTRA’s Algorithm
 Binary heap: O(E lg V) [Each operation takes O(lg V) time, and there
are E of them]
 Fibonacci heap: O(V lg V + E)
18. Floyd-Warshall Algorithm: Θ(n³)
19. B-Tree
 Search, insert, delete: O(log N)
