Algorithms in Python
There's nothing to fear but fear itself. That's called recursion, and that would lead you to infinite fear.
Hello, human! Welcome to my book on Python and algorithms! If you are reading this, you probably agree with me that those two can be a lot of fun together (or you might be lost, in which case I suggest you give it a try anyway!). Also, many of the examples shown here are available in my git repository, together with several other (more advanced) examples for abstract data structures, trees, graphs, and solutions for the Euler Project and the Topcoder website. Don't forget to check them out! This text was written purely for fun (I know, I know, this is a broad definition of the word fun...) with no pretensions of anything big, so please forgive me (or better, let me know) if you find any typo or mistake. I am not a computer scientist by training (I am actually an almost-I-swear-it-is-close Ph.D. in Physics), so this maybe makes things a little less usual (or riskier?). I hope you have fun!
Contents

Part I: Flying with Python

1 Numbers
  1.1 Integers
  1.2 Floats
  1.3 Complex Numbers
  1.4 The fractions Module
  1.5 The decimal Module
  1.6 Other Representations
  1.7 Additional Exercises

2 Built-in Sequence Types
  2.1 Strings
  2.2 Tuples
  2.3 Lists
  2.4 Bytes and Byte Arrays

3 Collection Data Structures
  3.1 Sets
  3.2 Dictionaries
  3.3 Python's collection Data Types
  3.4 Additional Exercises

4 Python's Structure and Modules
  4.1 Modules in Python
  4.2 Control Flow
  4.3 File Handling
  4.4 Multiprocessing and Threading
  4.5 Error Handling in Python
  4.6 Debugging and Profiling
  4.7

5 Object-Oriented Design
  5.1 Classes and Objects
  5.2 Principles of OOP
  5.3 Python Design Patterns
  5.4 Additional Exercises

Part II

6 Additional Abstract Data Structures
  6.1 Stacks
  6.2 Queues
  6.3 Deques
  6.4 Priority Queues and Heaps
  6.5 Linked Lists
  6.6 Additional Exercises

7 Asymptotic Analysis
  7.1 Complexity Classes
  7.2 Recursion
  7.3 Runtime in Functions

8 Sorting
  8.1 Quadratic Sort
  8.2 Linear Sort
  8.3 Loglinear Sort
  8.4 Comparison Between Sorting Methods
  8.5 Additional Exercises

9 Searching
  9.1 Sequential Search
  9.2 Binary Search
  9.3 Additional Exercises

10 Dynamic Programming
  10.1 Memoization
  10.2 Additional Exercises

Part III

11 Introduction to Graphs
  11.1 Basic Definitions
  11.2 The Neighborhood Function
  11.3 Introduction to Trees

12 Binary Trees
  12.1 Basic Concepts
  12.2 Representing Binary Trees
  12.3 Binary Search Trees
  12.4 Self-Balancing BST
  12.5 Additional Exercises

13 Traversals and Problems on Graphs and Trees
  13.1 Depth-First Search
  13.2 Breadth-First Search
  13.3 Representing Tree Traversals
  13.4 Additional Exercises
Part I
Chapter 1
Numbers
When you learn a new language, the first thing you usually do (after our dear hello world) is to play with some arithmetic operations. Numbers can be integers, floating-point numbers, or complex numbers. They are usually given in decimal representation but can be represented in other bases such as binary, octal, or hexadecimal. In this section we will learn how Python deals with numbers.
1.1
Integers
Python represents integers (positive and negative whole numbers) using the int (immutable) type. For immutable objects, there is no difference between a variable and the object it refers to. The size of Python's integers is limited only by the machine's memory, not by a fixed number of bytes (the range depends on the C or Java compiler that Python was built with). Usually plain integers are at least 32 bits long (4 bytes)1. To see how many bits an integer needs to be represented, starting in Python 3.1, the int.bit_length() method is available:
>>> (999).bit_length()
10
To cast a string to an integer in some base, or to change the base of an integer, we simply use int(s, base):
>>> s = '11'
1. To have an idea of how much this means, think that 1 KB of disk memory has 1024 x 8 bits = 2^10 bytes.
>>> d = int(s)
>>> print(d)
11
>>> b = int(s, 2)
>>> print(b)
3
It will raise a ValueError2 on failure (for example, if we had s = '12' and tried to find the binary representation). The optional base argument must be an integer between 2 and 36, inclusive.
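For instance, a quick sketch of these conversions (the literal values here are my own illustration):

```python
# Casting strings to integers in different bases with int(s, base).
print(int("11"))       # default base 10 -> 11
print(int("11", 2))    # binary -> 3
print(int("ff", 16))   # hexadecimal -> 255

# A digit that is invalid for the given base raises ValueError:
try:
    int("12", 2)       # '2' is not a binary digit
except ValueError as e:
    print("ValueError:", e)
```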
1.2
Floats
Numbers with a fractional part are represented by the immutable type float. In the case of single precision, a 32-bit float is represented by 1 bit for the sign (negative being 1, positive being 0) + 23 bits for the significant digits (or mantissa) + 8 bits for the exponent. In the case of double precision, the mantissa has 52 bits instead (53 significant bits, counting the implicit leading bit). Also, the exponent is usually represented using biased notation, where you add the number 127 to the original value3.
Comparing Floats
We should never compare floats for equality, nor subtract them carelessly. The reason is that floats are represented in binary fractions, and there are many numbers that are exact in a decimal base but not exact in a binary base (for example, the decimal 0.1). Equality tests should instead be done in terms of some predefined precision. For example, we can use the same approach that Python's unittest module takes with assertAlmostEqual:
>>> def a(x, y, places=7):
...     return round(abs(x - y), places) == 0
Float numbers can also be compared by their bit patterns in memory. First we need to handle sign comparison separately: if both numbers are negative, we may compare them by flipping their signs and returning the opposite answer. Patterns with the same exponent are compared according to their mantissa.
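As an aside (my addition, not from the original text): since Python 3.5 the standard library also offers math.isclose for this kind of tolerance-based comparison, so we rarely need to roll our own:

```python
import math

# Direct equality fails because 0.1 has no exact binary representation:
print(0.1 + 0.2 == 0.3)               # False

# math.isclose compares within a tolerance (default rel_tol=1e-09):
print(math.isclose(0.1 + 0.2, 0.3))   # True

# The round-based approach shown above gives the same verdict:
def almost_equal(x, y, places=7):
    return round(abs(x - y), places) == 0

print(almost_equal(0.1 + 0.2, 0.3))   # True
```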
2. We will learn about exceptions and errors in Python in the following chapters.
3. Try to figure out why!
The method round(x, n) returns x rounded to n digits before the decimal point if n is a negative int, or x rounded to n decimal places if n is a positive int. The returned value has the same type as x:
>>> round(100.96, -2)
100.0
>>> round(100.96, 2)
100.96
The method as_integer_ratio() gives the integer fractional representation of a float:
>>> 2.75.as_integer_ratio()
(11, 4)
1.3
Complex Numbers
The complex data type is an immutable type that holds a pair of floats: z = 3 + 4j, with attributes and methods such as z.real, z.imag, and z.conjugate(). Complex math functions come from the cmath module, which provides complex-number versions of most of the trigonometric and logarithmic functions that are in the math module, plus some complex-specific functions such as cmath.phase(), cmath.polar(), and cmath.rect(), and the constants cmath.pi and cmath.e.
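A short sketch of these attributes and cmath functions in action (the sample value z = 3 + 4j is my own choice):

```python
import cmath

z = 3 + 4j
print(z.real, z.imag)      # 3.0 4.0
print(z.conjugate())       # (3-4j)

# Polar form: modulus and phase (in radians).
r, phi = cmath.polar(z)
print(r)                   # 5.0
print(phi == cmath.phase(z))   # True

# Convert back from polar coordinates (approximately (3+4j),
# up to floating-point error):
print(cmath.rect(r, phi))
```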
1.4

The fractions Module

Python has the fractions module to deal with fractions. For instance, the following snippet shows the basic methods of this module:4
[general_problems/numbers/testing_floats.py]
from fractions import Fraction

def rounding_floats(number1, places):
    ''' some operations with float() '''
    return round(number1, places)

def float_to_fractions(number):
    return Fraction(*number.as_integer_ratio())

def get_denominator(number1, number2):
    a = Fraction(number1, number2)
    return a.denominator

def get_numerator(number1, number2):
    a = Fraction(number1, number2)
    return a.numerator

def test_testing_floats(module_name='this module'):
    number1 = 1.25
    number2 = 1
    number3 = -1
    number4 = 5/4
    number6 = 6
    assert(rounding_floats(number1, number2) == 1.2)
    assert(rounding_floats(number1*10, number3) == 10)
    assert(float_to_fractions(number1) == number4)
    assert(get_denominator(number2, number6) == number6)
    assert(get_numerator(number2, number6) == number2)
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))

if __name__ == '__main__':
    test_testing_floats()
4. All the code shown in this book indicates the directory structure of where you can find it in my git repository. Also notice that, when you write your own code, the PEP 8 (Python Enhancement Proposal) guidelines recommend four spaces per level of indentation, and only spaces (no tabs). This is not explicit here because of the way LaTeX formats the text.
1.5

The decimal Module

When we need exact decimal floating-point numbers, Python has an additional immutable type, decimal.Decimal. It can take any integer or even a string as an argument (and, starting from Python 3.1, also floats, with the decimal.Decimal.from_float() function). This is an efficient alternative when we do not want to deal with the rounding, equality, and subtraction problems that floats have:
>>> sum(0.1 for i in range(10)) == 1.0
False
>>> from decimal import Decimal
>>> sum(Decimal("0.1") for i in range(10)) == Decimal("1.0")
True
While the math and cmath modules cannot be used with the decimal module, Decimal's built-in methods such as decimal.Decimal.exp(x) are enough for most problems.
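A brief sketch of decimal arithmetic (my example; note that exp() and sqrt() are methods of Decimal instances, and the working precision is controlled by the context):

```python
from decimal import Decimal, getcontext

getcontext().prec = 6                  # six significant digits

print(Decimal("1") / Decimal("7"))     # 0.142857
print(Decimal(2).sqrt())               # 1.41421
print(Decimal(1).exp())                # 2.71828
```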
1.6
Other Representations
1.7
Additional Exercises
By swapping all the occurrences of 10 with any other base in our previous method, we can create a function that converts a decimal number to another base (from 2 up to 10):
[general_problems/numbers/convert_from_decimal.py]
def convert_from_decimal(number, base):
    multiplier, result = 1, 0
    while number > 0:
        result += number % base * multiplier
        multiplier *= 10
        number = number // base
    return result

def test_convert_from_decimal():
    number, base = 9, 2
    assert(convert_from_decimal(number, base) == 1001)
    print('Tests passed!')
If the base is above 10, then we must use non-numeric characters to represent these digits. We can let 'A' stand for 10, 'B' stand for 11, and so on. The following code will convert a number from a decimal base to any other base (up to 20):
[general_problems/numbers/convert_from_decimal_larger_bases.py]
def convert_from_decimal_larger_bases(number, base):
    strings = "0123456789ABCDEFGHIJ"
    result = ""
    while number > 0:
        digit = number % base
        result = strings[digit] + result
        number = number // base
    return result

def test_convert_from_decimal_larger_bases():
    number, base = 31, 16
    assert(convert_from_decimal_larger_bases(number, base) == '1F')
    print('Tests passed!')

if __name__ == '__main__':
    test_convert_from_decimal_larger_bases()
Fibonacci Sequences
The module below shows how to find the nth number in a Fibonacci sequence in three ways: (a) with a recursive O(2^n) runtime; (b) with an iterative O(n) runtime; and (c) using a closed formula, which gives an O(1) runtime but is not precise after around the 70th element:
[general_problems/numbers/find_fibonacci_seq.py]
import math

def find_fibonacci_seq_rec(n):
    if n < 2:
        return n
    return find_fibonacci_seq_rec(n - 1) + find_fibonacci_seq_rec(n - 2)

def find_fibonacci_seq_iter(n):
    if n < 2:
        return n
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

def find_fibonacci_seq_form(n):
    sq5 = math.sqrt(5)
    phi = (1 + sq5) / 2
    return int(math.floor(phi ** n / sq5))

def test_find_fib():
    n = 10
    assert(find_fibonacci_seq_rec(n) == 55)
    assert(find_fibonacci_seq_iter(n) == 55)
    assert(find_fibonacci_seq_form(n) == 55)
    print('Tests passed!')

if __name__ == '__main__':
    test_find_fib()
Primes
The following program finds whether a number is prime in three ways: (a) by brute force; (b) by rejecting all candidates up to the square root of the number; and (c) by using Fermat's little theorem with probabilistic tests:
[general_problems/numbers/finding_if_prime.py]
import math
import random

def finding_prime(number):
    num = abs(number)
    if num < 4:
        return True
    for x in range(2, num):
        if num % x == 0:
            return False
    return True

def finding_prime_sqrt(number):
    num = abs(number)
    if num < 4:
        return True
    for x in range(2, int(math.sqrt(num)) + 1):
        if number % x == 0:
            return False
    return True
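The extracted listing above stops before variant (c). A possible sketch of the Fermat probabilistic test (the function name finding_prime_fermat and the k parameter are my own choices, following the file's naming style, not necessarily the original code):

```python
import random

def finding_prime_fermat(number, k=10):
    # Fermat's little theorem: if p is prime, then for any a with
    # 1 < a < p, a**(p-1) % p == 1. Some composites (Carmichael
    # numbers) can fool this check, so the test is only probabilistic.
    if number < 4:
        return number in (2, 3)
    for _ in range(k):
        a = random.randint(2, number - 2)
        if pow(a, number - 1, number) != 1:
            return False
    return True

print(finding_prime_fermat(97))    # True
print(finding_prime_fermat(100))   # False
```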
The following program uses Python's random module to generate n-bit prime numbers:
[general_problems/numbers/generate_prime.py]
import math
import random
import sys
from finding_prime import finding_prime_sqrt

def generate_prime(number=3):
    while 1:
        p = random.randint(pow(2, number-2), pow(2, number-1)-1)
        p = 2 * p + 1
        if finding_prime_sqrt(p):
            return p

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Usage: generate_prime.py number")
        sys.exit()
    else:
        number = int(sys.argv[1])
        print(generate_prime(number))
NumPy arrays are also much more efficient than Python's lists, as we can see in the benchmark tests below:
[general_problems/numbers/testing_numpy_speed.py]
import numpy
import time

def trad_version():
    t1 = time.time()
    X = range(10000000)
    Y = range(10000000)
    Z = []
    for i in range(len(X)):
        Z.append(X[i] + Y[i])
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = numpy.arange(10000000)
    Y = numpy.arange(10000000)
    Z = X + Y
    return time.time() - t1

if __name__ == '__main__':
    print(trad_version())
    print(numpy_version())
Chapter 2

Built-in Sequence Types
Mutability
Another property that every data type holds is mutability. Numbers are obviously immutable; however, when it comes to sequence types, we have mutable types too. For instance, tuples, strings, and bytes are immutable, while lists and byte arrays are mutable. Immutable types are more efficient than mutable ones, and some collection data types2 can only work with immutable data types. Since every variable is an object reference in Python, copying mutable objects can be tricky. When you say a = b you are actually pointing a to where b points. Therefore, to make a deep copy in Python you need to use special procedures:
To make a copy of a list:

>>> newList = myList[:]
>>> newList2 = list(myList2)

To make a copy of a set (we will see sets in the next chapter), use:

>>> people = {"Buffy", "Angel", "Giles"}
2. Collection data types are the subject of the next chapter; they include, for example, sets and dictionaries.
>>> slayers = people.copy()
>>> slayers.discard("Giles")
>>> slayers.remove("Angel")
>>> slayers
{'Buffy'}
>>> people
{'Giles', 'Buffy', 'Angel'}

To make a copy of a dict (also in the next chapter), use the following:

>>> newDict = myDict.copy()

To make a copy of some other object, use the copy module:

>>> import copy
>>> newObj = copy.copy(myObj)        # shallow copy
>>> newObj2 = copy.deepcopy(myObj2)  # deep copy
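The difference between a shallow and a deep copy shows up with nested mutable objects; a brief sketch (my example):

```python
import copy

team = [["Buffy", "Willow"], ["Spike"]]

shallow = copy.copy(team)      # new outer list, but the same inner lists
deep = copy.deepcopy(team)     # everything duplicated recursively

team[0].append("Xander")
print(shallow[0])   # ['Buffy', 'Willow', 'Xander'] - inner list is shared
print(deep[0])      # ['Buffy', 'Willow'] - unaffected
```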
2.1
Strings
Python represents strings, i.e. sequences of characters, using the immutable str type. In Python, all objects have two output forms: string forms are designed to be human-readable, while representational forms are designed to produce an output that, if fed to a Python interpreter, reproduces the represented object. In the future, when we write our own classes, it will be important to define the string representation of our objects.
Unicode Strings
Python's Unicode encoding is used to include special characters in a string (for example, whitespace). Starting from Python 3, all strings are Unicode, not just plain bytes. To create a Unicode string, we use the 'u' prefix:
>>> u'Goodbye\u0020World !'
'Goodbye World !'
In the example above, the escape sequence indicates the Unicode character with the ordinal value 0x0020. It is also useful to remember that, in general, ASCII representations are given by only 8 bits, while Unicode code points may need 16 bits or more.
The rjust(width[, fillchar]) and ljust(width[, fillchar]) Methods: Some formatting (aligning) can be obtained with the methods rjust() (pads on the left, right-justifying the string) and ljust() (pads on the right, left-justifying the string):
>>> name = "Agent Mulder"
>>> name.rjust(50, '-')
'--------------------------------------Agent Mulder'
From Python 3.1 it is possible to omit field names, in which case Python will, in effect, fill them in for us using numbers starting from 0. For example:
>>> "{} {} {}".format("Python", "can", "count")
'Python can count'
However, using the + operator would allow a more concise style here. The format method also allows three conversion specifiers: s to force string form, r to force representational form, and a to force representational form but using only ASCII characters:
>>> import decimal
>>> "{0} {0!s} {0!r} {0!a}".format(decimal.Decimal("99.9"))
"99.9 99.9 Decimal('99.9') Decimal('99.9')"
The split(t, n) Method: Returns a list of strings splitting at most n times on string t. If n is not given, it splits as many times as possible. If t is not given, it splits on whitespace:
>>> slayers = "Buffy*Slaying-Vamps*16"
>>> fields = slayers.split("*")
>>> fields
['Buffy', 'Slaying-Vamps', '16']
>>> job = fields[1].split("-")
>>> job
['Slaying', 'Vamps']
We can use split() to write our own method for erasing spaces from strings:
>>> def erase_space_from_string(string):
...     s1 = string.split(" ")
...     s2 = "".join(s1)
...     return s2
The program below uses strip() to list every word and the number of times it occurs, in alphabetical order, for some file:
[general_problems/strings/count_unique_words.py]
import string
import sys

def count_unique_word():
    words = {}  # create an empty dictionary
    strip = string.whitespace + string.punctuation + string.digits + "\"'"
    for filename in sys.argv[1:]:
        with open(filename) as file:
            for line in file:
                for word in line.lower().split():
                    word = word.strip(strip)
                    if len(word) > 2:
                        words[word] = words.get(word, 0) + 1
    for word in sorted(words):
        print("{0} occurs {1} times.".format(word, words[word]))
Similar methods are lstrip(), which returns a copy of the string with all whitespace at the beginning stripped away, and rstrip(), which returns a copy with all whitespace at the end stripped away.

In the same way, capitalize() returns a copy of the string with only the first character in uppercase; lower() returns a copy of the original string, but with all characters in lowercase; and upper() returns a copy with all characters in uppercase.
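A quick sketch of these case methods (my example):

```python
s = "buffy the vampire slayer"

print(s.capitalize())    # Buffy the vampire slayer
print(s.upper())         # BUFFY THE VAMPIRE SLAYER
print("SLAYER".lower())  # slayer
```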
Adaptations of the previous methods are rfind(string), which returns the index within the string of the last (rightmost) occurrence of string, and rindex(string), which does the same but raises an error if the substring cannot be found. The count(t, start, end) Method: Returns the number of occurrences of the string t in the string s:
>>> slayer = "Buffy is Buffy is Buffy"
>>> slayer.count("Buffy", 0, -1)
2
>>> slayer.count("Buffy")
3
The replace(t, u, n) Method: Returns a copy of the string with every (or a maximum of n if given) occurrences of string t replaced with string u:
>>> slayer = "Buffy is Buffy is Buffy"
>>> slayer.replace("Buffy", "who", 2)
'who is who is Buffy'
2.2
Tuples
It is possible to create tuples that contain mutable objects, such as lists. Where strings have a character at every position, tuples have an object reference at each position. Empty tuples are constructed by an empty pair of parentheses. A tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses):
>>> empty = ()
>>> t1 = 'hello',
>>> len(empty)
0
>>> len(t1)
1
>>> t1
('hello',)
Tuple Unpacking
In Python, any iterable can be unpacked using the sequence unpacking operator, *. When used with two or more variables on the left-hand side of an assignment, one of which preceded by *, items are assigned to the variables, with all those left over assigned to the starred variable:
>>> x, *y = (1, 2, 3, 4)
>>> x
1
>>> y
[2, 3, 4]
Named Tuples
Python's collections package4 contains a sequence data type called a named tuple. This behaves just like the built-in tuple, with the same performance characteristics, but it also carries the ability to refer to items in the tuple by name as well as by index position. This allows the creation of aggregates of data items:
>>> import collections
>>> MonsterTuple = collections.namedtuple("Monsters", "name age power")
>>> monster = MonsterTuple('Vampire', 230, 'immortal')
>>> monster
Monsters(name='Vampire', age=230, power='immortal')
The first argument to collections.namedtuple is the name of the custom tuple data type to be created. The second argument is a string of space-separated names, one for each item that the custom tuple will take. The first argument, and the names in the second argument, must be valid Python identifiers. The example below shows a structured way of using named tuples to organize a data structure:
[general_problems/tuples/namedtuple_example.py]
from collections import namedtuple

def namedtuple_example():
    ''' show an example for named tuples
        >>> namedtuple_example()
        slayer
    '''
    sunnydale = namedtuple('name', ['job', 'age'])
    buffy = sunnydale('slayer', 17)
    print(buffy.job)

if __name__ == '__main__':
    namedtuple_example()
2.3
Lists
In computer science, arrays are a very simple data structure where elements are sequentially stored in contiguous memory, while linked lists are structures where several separate nodes link to each other. Iterating over the contents of the data structure is equally efficient for both kinds, but directly accessing an element at a given index has O(1) (complexity) runtime5 in an array, while it is O(n) in a linked list with n nodes (where you would have to traverse the list from the beginning). Furthermore, in a linked list, once you know where you want to insert something, insertion is O(1), no matter how many elements the list has. For arrays, an insertion would have to move all elements to the right of the insertion point, or move all the elements to a larger array if needed, and is therefore O(n). In Python, the closest object to an array is a list, which is a dynamically resizing array and has nothing to do with linked lists. Why mention linked lists? Linked lists are a very important abstract data structure (we will see more about them in a following chapter), and it is fundamental to understand what makes them so different from arrays (or Python's lists) for when we need to select the right data structure for a specific problem.
5. The Big-O notation is key to understanding algorithms! We will learn more about it in the following chapters and use the concept extensively in our studies. For now, just keep in mind that O(1) is better than O(n), which is better than O(n^2), etc.
Lists in Python are created with comma-separated values between square brackets. List items do not all need to have the same data type. Unlike strings, which are immutable, it is possible to change individual elements of a list (lists are mutable):
>>> q = [2, 3]
>>> p = [1, q, 4]
>>> p[1].append("buffy")
>>> p
[1, [2, 3, 'buffy'], 4]
>>> q
[2, 3, 'buffy']
For inserting items, lists perform best (O(1)) when items are added or removed at the end, using the methods append() and pop(), respectively. The worst performance (O(n)) occurs when we perform operations that need to search for items in the list, for example using remove() or index(), or using in for membership testing. If fast searching or membership testing is required, a collection type such as a set or a dictionary may be a more suitable choice (as we will see in the next chapter). Alternatively, lists can provide fast searching if they are kept sorted (we will see searching methods that perform in O(log n) for sorted sequences, in particular binary search, in the following chapters).
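For sorted lists, the O(log n) searching mentioned above is available in the standard bisect module; a minimal sketch (my example, not from the book):

```python
import bisect

data = [1, 4, 9, 16, 25, 36]   # must already be sorted

# Locate the insertion point for a value in O(log n):
i = bisect.bisect_left(data, 16)
print(i, data[i] == 16)        # 3 True

# Insert while keeping the list sorted
# (finding the spot is O(log n); the shift itself is still O(n)):
bisect.insort(data, 20)
print(data)                    # [1, 4, 9, 16, 20, 25, 36]
```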
The extend(c) Method: This method extends the list by appending all the items from the given iterable. Equivalent to a[len(a):] = L or to using +=:
>>> people = ["Buffy", "Faith"]
>>> people.extend("Giles")
>>> people
['Buffy', 'Faith', 'G', 'i', 'l', 'e', 's']
>>> people += "Willow"
>>> people
['Buffy', 'Faith', 'G', 'i', 'l', 'e', 's', 'W', 'i', 'l', 'l', 'o', 'w']
>>> people += ["Xander"]
>>> people
['Buffy', 'Faith', 'G', 'i', 'l', 'e', 's', 'W', 'i', 'l', 'l', 'o', 'w', 'Xander']
The insert(i, x) Method: Inserts an item at a given position i: the first argument is the index of the element before which to insert:
>>> people = ["Buffy", "Faith"]
>>> people.insert(1, "Xander")
>>> people
['Buffy', 'Xander', 'Faith']
Calling remove(x) when x is not in the list raises a ValueError:

>>> people.remove("Buffy")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
The pop() Method: Removes the item at the given position in the list, and then returns it. If no index is specied, pop() returns the last item in the list:
>>> people = ["Buffy", "Faith"]
>>> people.pop()
'Faith'
>>> people
['Buffy']
The del Statement: del deletes an object reference, not the content; i.e., it is a way to remove an item from a list given its index instead of its value. It can also be used to remove slices from a list:
>>> a = [-1, 4, 5, 7, 10]
>>> del a[0]
>>> a
[4, 5, 7, 10]
>>> del a[2:3]
>>> a
[4, 5, 10]
>>> del a  # also used to delete an entire variable
When an object reference is deleted and if no other object refers to its data, Python schedules the data item to be garbage-collected.7
7. Garbage is memory occupied by objects that are no longer referenced, and garbage collection is a form of automatic memory management that frees the memory occupied by the garbage.
The count(x) Method: Returns the number of times x appears in the list:
>>> people = ["Buffy", "Faith", "Buffy"]
>>> people.count("Buffy")
2
List Unpacking
Similar to tuple unpacking:
Python also has a related concept called starred arguments, which can be used to unpack a sequence into the arguments of a function call:
>>> def example_args(a, b, c):
...     return a * b * c  # here * is the multiplication operator
>>> L = [2, 3, 4]
>>> example_args(*L)
24
>>> example_args(2, *L[1:])
24
List Comprehensions
A list comprehension is an expression and a loop (with an optional condition) enclosed in brackets:
[item for item in iterable]
[expression for item in iterable]
[expression for item in iterable if condition]
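For instance, each of the three forms in use (my examples):

```python
# [item for item in iterable]
letters = [c for c in "Buffy"]
print(letters)      # ['B', 'u', 'f', 'f', 'y']

# [expression for item in iterable]
squares = [x * x for x in range(5)]
print(squares)      # [0, 1, 4, 9, 16]

# [expression for item in iterable if condition]
evens = [x for x in range(10) if x % 2 == 0]
print(evens)        # [0, 2, 4, 6, 8]
```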
>>> d
[3.1, 3.14, 3.142, 3.1416, 3.14159]
>>> words = 'Buffy is awesome and a vampire slayer'.split()
>>> e = [[w.upper(), w.lower(), len(w)] for w in words]
>>> for i in e:
...     print(i)
...
['BUFFY', 'buffy', 5]
['IS', 'is', 2]
['AWESOME', 'awesome', 7]
['AND', 'and', 3]
['A', 'a', 1]
['VAMPIRE', 'vampire', 7]
['SLAYER', 'slayer', 6]
The Google Python Style Guide advocates that list comprehensions should only be used for simple cases, when each portion fits on one line (no multiple for clauses or filter expressions):
[Good]
result = []
for x in range(10):
    for y in range(5):
        if x * y > 10:
            result.append((x, y))

for x in range(5):
    for y in range(5):
        if x != y:
            for z in range(5):
                if y != z:
                    yield (x, y, z)

return ((x, complicated_transform(x))
        for x in long_generator_function(parameter)
        if x is not None)

squares = [x * x for x in range(10)]

eat(jelly_bean for jelly_bean in jelly_beans
    if jelly_bean.color == 'black')

[Bad]
result = [(x, y) for x in range(10) for y in range(5) if x * y > 10]
"""
The results are:
('concat ', 2.366791009902954, 'milliseconds')
('append ', 0.16743111610412598, 'milliseconds')
('comprehension ', 0.06446194648742676, 'milliseconds')
('list range ', 0.021029949188232422, 'milliseconds')

So we see the following pattern for lists:

Operation           Big-O Efficiency
index []            O(1)
index assignment    O(1)
append              O(1)
pop()               O(1)
pop(i)              O(n)
insert(i, item)     O(n)
del operator        O(n)
iteration           O(n)
contains (in)       O(n)
get slice [x:y]     O(k)
del slice           O(n)
set slice           O(n+k)
reverse             O(n)
concatenate         O(k)
sort                O(n log n)
multiply            O(nk)
"""
2.4

Bytes and Byte Arrays

Python provides two data types for handling raw bytes: bytes, which is immutable, and bytearray, which is mutable. Both types hold a sequence of zero or more 8-bit unsigned integers in the range 0...255. The bytes type is very similar to the string type, and bytearray provides mutating methods similar to lists.
Chapter 3

Collection Data Structures
3.1
Sets
In Python, a set is an unordered collection data type that is iterable, mutable, and has no duplicate elements. Sets are used for membership testing and eliminating duplicate entries. Sets have O(1) insertion, so the runtime of union is O(m + n). For intersection, it is only necessary to traverse the smaller set, so the runtime is O(n).1
1. Python's collections package has support for ordered sets. This data type enforces some predefined comparison for its members.
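A quick sketch of these set operations (my examples, using the operator forms):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)   # union: {1, 2, 3, 4, 5}
print(a & b)   # intersection: {3, 4}
print(a - b)   # difference: {1, 2}

# Membership testing is O(1) on average:
print(3 in a)  # True
```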
Frozen Sets
Frozen sets are immutable objects that support only methods and operators that produce a result without affecting the frozen set or sets to which they are applied.
The s.update(t) and s |= t Methods: Both update s, adding elements from t. The s.union(t) and s | t Methods: Both perform the union of the two sets. The s.intersection(t) and s & t Methods: Both return a new set with the items that appear in both sets:
>>> people = {"Buffy", "Angel", "Giles", "Xander"}
>>> people.intersection({"Angel", "Giles", "Willow"})
{'Giles', 'Angel'}
The s.difference(t) and s - t Methods: Both return a new set with the items that are in s but not in t:

>>> people = {"Buffy", "Angel", "Giles", "Xander"}
>>> vampires = {"Spike", "Angel", "Drusilia"}
>>> people.difference(vampires)
{'Xander', 'Giles', 'Buffy'}
The discard(x), remove(x), and pop() Methods: discard(x) removes the item x from the set. remove(x) removes the item x from the set or raises a KeyError exception if the element is not in the set. pop() returns and removes a random item from the set or raises a KeyError exception if the set is empty.
    print(subtraction_items)   # {('b', 2), ('c', 3)}
    # we can remove keys from a dict doing:
    d3 = {key: d2[key] for key in d2.keys() - {'c', 'd'}}
    print(d3)                  # {'a': 1, 'e': 4}

if __name__ == '__main__':
    set_operations_with_dict()
3.2
Dictionaries
Dictionaries in Python are implemented using hash tables. Hash functions compute an integer value from an arbitrary object in constant time, which can be used as an index into an array:
>>> hash(42)
42
>>> hash("hello")
355070280260770553
A dict is a collection mapping type that is iterable and supports the membership operator in and the size function len(). Mappings are collections of key-value items, providing methods for accessing items and their keys and values. When iterated, unordered mapping types provide their items in an arbitrary order. Accessing a dictionary has runtime O(1), so dictionaries are used to keep counts of unique items (for example, counting the number of each unique word in a file) and for fast membership tests. Dictionaries are mutable, so we can easily add or remove items, but since they are unordered, they have no notion of index position (so they cannot be sliced or strided):
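The word-counting use case mentioned above can be sketched with a plain dict:

```python
# counting unique words with a plain dict: one O(1) lookup per word
text = "the quick brown fox jumps over the lazy dog the end"
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1
print(counts["the"])  # 3
```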
>>> tarantino = {}
>>> tarantino['name'] = 'Quentin Tarantino'
>>> tarantino['job'] = 'director'
>>> tarantino
{'job': 'director', 'name': 'Quentin Tarantino'}
>>>
>>> sunnydale = dict({"name": "Buffy", "age": 16, "hobby": "slaying"})
>>> sunnydale
{'hobby': 'slaying', 'age': 16, 'name': 'Buffy'}
>>> sunnydale = dict(name="Giles", age=45, hobby="Watch")
>>> sunnydale
{'hobby': 'Watch', 'age': 45, 'name': 'Giles'}
>>>
>>> sunnydale = dict([("name", "Willow"), ("age", 15), ("hobby", "nerding")])
>>> sunnydale
{'hobby': 'nerding', 'age': 15, 'name': 'Willow'}
def test_setdef(module_name='this module'):
    dict_data = (('key1', 'value1'),
                 ('key1', 'value2'),
                 ('key2', 'value3'),
                 ('key2', 'value4'),
                 ('key2', 'value5'),)
    print(usual_dict(dict_data))
    print(setdefault_dict(dict_data))
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))
The update([other]) Method: Updates the dictionary with the key/value pairs from other, overwriting existing keys.
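A one-liner makes the overwriting behavior concrete:

```python
d = {"a": 1, "b": 2}
d.update({"b": 20, "c": 3})  # existing key "b" is overwritten
print(d)  # {'a': 1, 'b': 20, 'c': 3}
```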
The items(), values(), and keys() Methods: The items(), keys(), and values() methods all return dictionary views. A dictionary view is effectively a read-only iterable object that appears to hold the dictionary's items, keys, or values:
>>> sunnydale = dict(name="Xander", age=17, hobby="winning")
>>> sunnydale.items()
dict_items([('hobby', 'winning'), ('age', 17), ('name', 'Xander')])
>>> sunnydale.values()
dict_values(['winning', 17, 'Xander'])
>>> sunnydale.keys()
dict_keys(['hobby', 'age', 'name'])
The results (container size, list membership time, dict membership time, in seconds) are:

 10000   0.192   0.002
 30000   0.600   0.002
 50000   1.000   0.002
 70000   1.348   0.002
 90000   1.755   0.002
110000   2.194   0.002
130000   2.635   0.002
150000   2.951   0.002
170000   3.405   0.002
190000   3.743   0.002
210000   4.142   0.002
230000   4.577   0.002
250000   4.797   0.002
270000   5.371   0.002
290000   5.690   0.002
310000   5.977   0.002

So we can see the linear time for lists, and constant time for the dict!

Big-O Efficiency of Python Dictionary Operations

Operation       Big-O Efficiency
copy            O(n)
get item        O(1)
set item        O(1)
delete item     O(1)
contains (in)   O(1)
iteration       O(n)
Dictionaries also support reverse iteration using reversed(). In addition, it is good to note that the Google Python Style Guide advises that default iterators should be used for types that support them:
[Good]
for key in adict: ...
if key not in adict: ...

[Bad]
for key in adict.keys(): ...
if not adict.has_key(key): ...
can be reduced to:

functions = dict(a=add_to_dict, e=edit_dict, ...)
functions[action](db)
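A self-contained sketch of this dispatch-table idiom (the function bodies and the db argument here are illustrative, not from the book's elided example):

```python
def add_to_dict(db):
    db["count"] = db.get("count", 0) + 1
    return "added"

def edit_dict(db):
    db["edited"] = True
    return "edited"

# mapping actions to functions replaces an if/elif chain
functions = dict(a=add_to_dict, e=edit_dict)

db = {}
action = "a"
print(functions[action](db))  # added
```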
3.3 Python's collections Data Types
Pythons collections module implements specialized container data types providing high-performance alternatives to the general purpose built-in containers.
Default Dictionaries
Default dictionaries are an additional unordered mapping type provided by Python's collections.defaultdict. They have all the operators and methods that a built-in dictionary provides, but they also gracefully handle missing keys:
[general_examples/dicts/defaultdict_example.py]
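The referenced file is not reproduced in this excerpt; a minimal sketch of the same idea might look like:

```python
from collections import defaultdict

def defaultdict_example():
    pairs = [("a", 1), ("b", 2), ("a", 3)]
    # with a plain dict, we must check for missing keys ourselves
    d1 = {}
    for key, value in pairs:
        if key not in d1:
            d1[key] = []
        d1[key].append(value)
    # defaultdict calls its factory (here, list) for every missing key
    d2 = defaultdict(list)
    for key, value in pairs:
        d2[key].append(value)
    return d1, dict(d2)

print(defaultdict_example())
```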
Ordered Dictionaries
Ordered dictionaries are an ordered mapping type provided by Python's collections.OrderedDict. They have all the methods and properties of a built-in dict, but in addition they store items in insertion order:
[general_examples/dicts/OrderedDict_example.py]
from collections import OrderedDict

pairs = [('a', 1), ('b', 2), ('c', 3)]

d1 = {}
for key, value in pairs:
    if key not in d1:
        d1[key] = []
    d1[key].append(value)
for key in d1:
    print(key, d1[key])

d2 = OrderedDict(pairs)
for key in d2:
    print(key, d2[key])
If we change a key's value, the order is not changed. To move an item to the end, we should delete and re-insert it. We can also call popitem() to remove and return the last key-value item in the ordered dictionary, or call popitem(last=False) to do it in FIFO order.3 In general, using an ordered dictionary to produce a sorted dictionary makes sense only if we expect to iterate over the dictionary multiple times, and if we do not expect to do any insertions (or very few).
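These behaviors can be sketched in a short session:

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
d["a"] = 10                   # changing a value keeps the original order
print(list(d))                # ['a', 'b', 'c']
print(d.popitem())            # ('c', 3): LIFO by default
print(d.popitem(last=False))  # ('a', 10): FIFO order
```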
Counter Dictionaries
A specialised Counter type (a dict subclass for counting hashable objects) is provided by Python's collections.Counter:
[general_examples/dicts/Counter_example.py] from collections import Counter
3 In computer science, FIFO means first-in, first-out. Python's lists append and pop items at the end, so they are LIFO: last-in, first-out.
def Counter_example():
    '''Counter maps the items to the number of occurrences.'''
    seq1 = [1, 2, 3, 5, 1, 2, 5, 5, 2, 5, 1, 4]
    seq_counts = Counter(seq1)
    print(seq_counts)
    # we can increment manually or use the update() method
    seq2 = [1, 2, 3]
    seq_counts.update(seq2)
    print(seq_counts)
    seq3 = [1, 4, 3]
    for key in seq3:
        seq_counts[key] += 1
    print(seq_counts)
    # also, we can use set operations such as a - b or a + b
    seq_counts_2 = Counter(seq3)
    print(seq_counts_2)
    print(seq_counts + seq_counts_2)
    print(seq_counts - seq_counts_2)

if __name__ == '__main__':
    Counter_example()

"""
Counter({5: 4, 1: 3, 2: 3, 3: 1, 4: 1})
Counter({1: 4, 2: 4, 5: 4, 3: 2, 4: 1})
Counter({1: 5, 2: 4, 5: 4, 3: 3, 4: 2})
Counter({1: 1, 4: 1, 3: 1})
Counter({1: 6, 2: 4, 5: 4, 3: 4, 4: 3})
Counter({1: 4, 2: 4, 5: 4, 3: 2, 4: 1})
"""
3.4 Additional Exercises
def test_find_top_N_recurring_words(module_name='this module'):
    seq = 'buffy angel monster xander a willow gg buffy the monster super buffy angel'
    N = 3
    assert(find_top_N_recurring_words(seq, N) ==
           [('buffy', 3), ('monster', 2), ('angel', 2)])
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))
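The implementation of find_top_N_recurring_words is not shown in this excerpt; one minimal version uses collections.Counter (note that the relative order of equal-count words returned by most_common() may differ from the order the test above expects):

```python
from collections import Counter

def find_top_N_recurring_words(seq, N):
    # count whitespace-separated words; most_common(N) returns the N
    # highest counts (ties come back in first-seen order)
    return Counter(seq.split()).most_common(N)

seq = ('buffy angel monster xander a willow gg buffy '
       'the monster super buffy angel')
print(find_top_N_recurring_words(seq, 3))
```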
Anagrams
The following program finds whether two words are anagrams. Since sets do not count occurrences, and sorting a list is O(n log n), hash tables can be the best solution in this case. The procedure is: we scan the first string, incrementing the count of each character occurrence; then we scan the second string, decrementing all the character occurrences. In the end, if all the entries are zero, the strings are anagrams:
[general_problems/dicts/verify_two_strings_are_anagrams.py]
import string

def verify_two_strings_are_anagrams(str1, str2):
    ana_table = {key: 0 for key in string.ascii_lowercase}
    for i in str1:
        ana_table[i] += 1
    for i in str2:
        ana_table[i] -= 1
    # verify whether all the entries are 0
    return not any(ana_table.values())

def test_verify_two_strings_are_anagrams():
    str1 = 'marina'
    str2 = 'aniram'
    assert(verify_two_strings_are_anagrams(str1, str2) == True)
    str1 = 'google'
    str2 = 'gouglo'
    assert(verify_two_strings_are_anagrams(str1, str2) == False)
    print('Tests passed!')
Another way to find whether two words are anagrams is to use the properties of hashing functions, where every different collection of characters should give a different result (note, however, that different strings can collide to the same hash value, so this method can report false positives). In the following program, ord() returns an integer representing the Unicode code point of the character when the argument is a unicode object, or the value of the byte when the argument is an 8-bit string:
[general_problems/dicts/find_anagram_hash_function.py]
def hash_func(astring, tablesize):
    sum = 0
    for pos in range(len(astring)):
        sum = sum + ord(astring[pos])
    return sum % tablesize

def find_anagram_hash_function(word1, word2):
    tablesize = 11
    return hash_func(word1, tablesize) == hash_func(word2, tablesize)

def test_find_anagram_hash_function(module_name='this module'):
    word1 = 'buffy'
    word2 = 'bffyu'
    word3 = 'bffya'
    assert(find_anagram_hash_function(word1, word2) == True)
    assert(find_anagram_hash_function(word1, word3) == False)
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))

if __name__ == '__main__':
    test_find_anagram_hash_function()
Sums of Paths
The following program uses two different dictionary containers to determine the number of ways two dice can sum to a certain value:
[general_problems/dicts/find_dice_probabilities.py]
from collections import Counter, defaultdict

def find_dice_probabilities(S, n_faces=6):
    if S > 2*n_faces or S < 2:
        return None
    cdict = Counter()
    ddict = defaultdict(list)
    for dice1 in range(1, n_faces+1):
        for dice2 in range(1, n_faces+1):
            t = [dice1, dice2]
            cdict[dice1+dice2] += 1
            ddict[dice1+dice2].append(t)
    return [cdict[S], ddict[S]]

def test_find_dice_probabilities(module_name='this module'):
    n_faces = 6
    S = 5
    results = find_dice_probabilities(S, n_faces)
    print(results)
    assert(results[0] == len(results[1]))

if __name__ == '__main__':
    test_find_dice_probabilities()
Finding Duplicates
The program below uses dictionaries to find and delete all the duplicated characters in a string:
[general_problems/dicts/delete_duplicate_char_str.py]
import string

def delete_unique_word(str1):
    table_c = {key: 0 for key in string.ascii_lowercase}
    for i in str1:
        table_c[i] += 1
    for key, value in table_c.items():
        if value > 1:
            str1 = str1.replace(key, '')
    return str1
def test_delete_unique_word():
    str1 = "google"
    assert(delete_unique_word(str1) == 'le')
    print('Tests passed!')

if __name__ == '__main__':
    test_delete_unique_word()
Chapter 4

Python's Structure and Modules
4.1 Modules in Python

In Python, functions are defined with the built-in keyword def (a module is simply a file containing such definitions and statements). When def is executed, a function object is created together with its object reference. If we do not define a return value, Python automatically returns None (as in C, we call a function that does not return a value a procedure).

Activation Records

An activation record is created every time we invoke a method: information is put on the stack to support the invocation. Activation records are processed in the following order:

1. the actual parameters of the method are pushed onto the stack,
2. the return address is pushed onto the stack,
3. the top-of-stack index is incremented by the total amount of memory required by the local variables within the method,
4. a jump to the method is made.

The process of unwinding an activation record happens in the following order:

1. the top-of-stack index is decremented by the total amount of memory consumed by the method,
2. the return address is popped off the stack,
3. the top-of-stack index is decremented by the total amount of memory consumed by the actual parameters.
Default Values in Modules

Whenever you create a module, remember that mutable objects should not be used as default values in a function or method definition:
[Good]
def foo(a, b=None):
    if b is None:
        b = []

[Bad]
def foo(a, b=[]):
    ...
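The reason is that the default object is created only once, when def is executed, and is then shared by every call; a short demonstration:

```python
def bad(a, b=[]):        # the default list is created once, at def time
    b.append(a)
    return b

def good(a, b=None):     # None sentinel: a fresh list per call
    if b is None:
        b = []
    b.append(a)
    return b

print(bad(1))   # [1]
print(bad(2))   # [1, 2]  the same list object is reused across calls!
print(good(1))  # [1]
print(good(2))  # [2]
```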
Packages

A package is a directory that contains a set of modules and a file called __init__.py. This file is required to make Python treat the directory as containing a package, preventing directories with a common name (such as string) from unintentionally hiding valid modules that occur later on the module search path:
>>> import foldername.filemodulename
In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable:

__all__ = ["file1", ...]

so that from package import * imports the names listed in the module's global __all__ variable, or, if __all__ is not defined, every object in the module except those whose names begin with an underscore.
Checking the Existence of a Module

To check the existence of a module, we use the interpreter's -c flag:
$ python -c "import decimas" Traceback (most recent call last): File "<string>", line 1, in <module> ImportError: No module named decimas
The __name__ Variable

Whenever a module is imported, Python creates a variable for it called __name__ and stores the module's name in this variable. In that case, everything below the statement
if __name__ == __main__:
will not be executed. On the other hand, if we run the .py file directly, Python sets __name__ to "__main__", and every instruction following the above statement will be executed.
The variables sys.ps1 and sys.ps2 define the strings used as primary and secondary prompts. The variable sys.argv allows us to use the arguments passed on the command line inside our programs:
import sys

def main():
    print('command line arguments:')
    for arg in sys.argv[1:]:
        print(arg)

if __name__ == "__main__":
    main()
The built-in method dir() is used to find which names a module defines (all types of names: variables, modules, functions). It returns a sorted list of strings:
>>> import sys
>>> dir(sys)
['__name__', 'argv', 'builtin_module_names', 'copyright', 'exit',
 'maxint', 'modules', 'path', 'ps1', 'ps2', 'setprofile', 'settrace',
 'stderr', 'stdin', 'stdout', 'version']
It does not list the names of built-in functions and variables. Therefore, we can see that dir() is useful for finding all the methods or attributes of an object.
4.2 Control Flow

if

The if statement runs a block of code only when its condition evaluates to true; optional elif and else clauses handle the remaining cases.
for
The for statement in Python differs from that of C or Pascal. Rather than always iterating over an arithmetic progression of numbers (as in Pascal), or giving the user the ability to define both the iteration step and halting condition (as in C), Python's for statement iterates over the items of any sequence (e.g., a list or a string), in the order that they appear in the sequence:
>>> a = ["buffy", "willow", "xander", "giles"]
>>> for i in range(len(a)):
...     print(a[i])
buffy
willow
xander
giles
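When the index itself is not needed, iterating directly over the items (or using enumerate() when it is) is usually preferable to range(len(a)); a quick sketch:

```python
a = ["buffy", "willow", "xander", "giles"]
# iterating directly over the items is more idiomatic than indexing
for name in a:
    print(name)
# enumerate() provides the index when it is really needed
for i, name in enumerate(a):
    print(i, name)
```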
The Google Python Style guide sets the following rules for using implicit False in Python: Never use == or != to compare singletons, such as the built-in variable None. Use is or is not instead. Beware of writing if x: when you really mean if x is not None.
Never compare a boolean variable to False using ==; use if not x: instead. If you need to distinguish False from None, then chain the expressions, such as if not x and x is not None:. For sequences (strings, lists, tuples), use the fact that empty sequences are False, so if not seq: or if seq: is preferable to if len(seq): or if not len(seq):. When handling integers, implicit False may involve more risk than benefit, such as accidentally handling None as 0:
[Good]
if not users:
    print('no users')
if foo == 0:
    self.handle_zero()
if i % 10 == 0:
    self.handle_multiple_of_ten()

[Bad]
if len(users) == 0:
    print('no users')
if foo is not None and not foo:
    self.handle_zero()
if not i % 10:
    self.handle_multiple_of_ten()
Generators are very robust and efficient, and they should be considered every time you deal with a function that returns a sequence or creates a loop. For example, the following program implements a Fibonacci sequence using the iterator paradigm:
def fib_generator():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

if __name__ == '__main__':
    fib = fib_generator()
    print(next(fib))
    print(next(fib))
    print(next(fib))
    print(next(fib))
>>> list(range(4))
[0, 1, 2, 3]
>>> list(range(4, 8))
[4, 5, 6, 7]
>>> list(range(0, 10, 3))
[0, 3, 6, 9]
4.3 File Handling
File handling is very easy in Python. For example, the program below reads a file and deletes all of its blank lines:
[general_problems/modules/remove_blank_lines.py]
import os
import sys

def read_data(filename):
    lines = []
    fh = None
    try:
        fh = open(filename)
        for line in fh:
            if line.strip():
                lines.append(line)
    except (IOError, OSError) as err:
        print(err)
    finally:
        if fh is not None:
            fh.close()
    return lines

def write_data(lines, filename):
    fh = None
    try:
        fh = open(filename, "w")
        for line in lines:
            fh.write(line)
    except (IOError, OSError) as err:
        print(err)
    finally:
        if fh is not None:
            fh.close()
The read(size) Method: Reads some quantity of data and returns it as a string. size is an optional numeric argument; when it is omitted or negative, the entire contents of the file will be read and returned. If the end of the file has been reached, read() will return an empty string:
>>> f.read()
'This is the entire file.\n'
>>> f.read()
''
The readline() Method: Reads a single line from the file. A newline character is left at the end of the string, and is only omitted on the last line of the file if the file does not end in a newline. This makes the return value unambiguous. The readlines() Method: Returns a list containing all the lines of data in the file. If given an optional parameter size, it reads that many bytes from the file, plus enough more to complete a line, and returns the lines from that. This is often used to allow efficient reading of a large file by lines, without having to load the entire file into memory. Only complete lines will be returned:
>>> f.readlines()
['This is the first line of the file.\n', 'Second line of the file\n']
The write() Method: Writes the contents of a string to the file. Write a bytes/bytearray object to a file opened in binary mode, or a str object to a file opened in text mode:
>>> f.write(This is a test\n)
The tell() and seek() Methods: tell() returns an integer giving the file object's current position in the file, measured in bytes from the beginning of the file. To change the file object's position, use seek(offset, from_what). The position is computed by adding offset to a reference point, and the reference point is selected by the from_what argument: a from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point.

The close() Method: Closes the file and frees up any system resources taken up by the open file; afterwards, the file object's closed attribute is True.

The input() Function: Accepts input from the user. This function takes an optional string argument (which will be printed in the console), then waits for the user to type a response and finish by pressing Enter (or Return). If the user does not type any text but just presses Enter, the function returns an empty string; otherwise, it returns a string containing what the user typed, without any line terminator.
>>> def get_int(msg): ... while True: ... try: ... i = int(input(msg)) ... return i ... except ValueError as err: ... print(err) >>> age = get_int("Enter your age: ")
The peek(n) Method: Returns n bytes without moving the file pointer position. The fileno() Method: Returns the underlying file's file descriptor (available only for file objects that have file descriptors).
[general_problems/files/change_ext_file.py]
import os
import sys
import shutil

def change_file_ext():
    if len(sys.argv) < 3:
        print("Usage: change_ext.py filename.old_ext new_ext")
        sys.exit()
    name = os.path.splitext(sys.argv[1])[0] + "." + sys.argv[2]
    print(name)
    try:
        shutil.copyfile(sys.argv[1], name)
    except OSError as err:
        print(err)

if __name__ == '__main__':
    change_file_ext()
Exporting Data with Pickle

The example below shows a use of pickle:
[general_problems/files/export_pickle.py]
import gzip
import os
import pickle
import sys

def export_pickle(data, filename='test.dat', compress=False):
    fh = None
    try:
        if compress:
            fh = gzip.open(filename, "wb")  # write binary
        else:
            fh = open(filename, "wb")  # compact binary pickle format
        pickle.dump(data, fh, pickle.HIGHEST_PROTOCOL)
        return True
    except (EnvironmentError, pickle.PicklingError) as err:
        print("{0}: export error: {1}".format(
            os.path.basename(sys.argv[0]), err))
        return False
    finally:
        if fh is not None:
            fh.close()

def test_export_pickle():
    mydict = {'a': 1, 'b': 2, 'c': 3}
    export_pickle(mydict)

if __name__ == '__main__':
    test_export_pickle()
In general, booleans, numbers, and strings can be pickled, as can instances of classes and built-in collection types (provided they contain only picklable objects, i.e., their __dict__ is picklable).

Reading Data with Pickle

The example below shows how to read pickled data:
[general_problems/files/import.py]
import pickle

def import_pickle(filename):
    fh = None
    try:
        fh = open(filename, "rb")
        return pickle.load(fh)
    except (EnvironmentError, pickle.UnpicklingError) as err:
        print(err)
        return None
    finally:
        if fh is not None:
            fh.close()
4.4 Multiprocessing and Threading
Each program in the operating system is a separate process, and each process has one or more threads. If a process has several threads, they appear to run simultaneously. Two approaches can be used to spread the workload in a program:

Multiple processes: Separate processes have separate regions of memory and can only communicate by special mechanisms; the processor loads and saves a separate set of registers for each one. This is inconvenient for communication and data sharing, and is handled by the subprocess module.

Multiple threads: Multiple threads in a single process have access to the same memory. They communicate simply by sharing data, provided we ensure that only one thread at a time touches it; this is handled by the threading module. Threads share the process's resources, including the heap space, but each thread still has its own stack.

Although Python has a threading mechanism, it does not support true parallel execution of threads. However, it is possible to use parallel processes, which in modern operating systems are really efficient.
The queue.Queue class can handle all the locking internally: we can rely on it to serialize accesses, meaning that only one thread at a time has access to the data (FIFO). A program will not terminate while it has any non-daemon threads running. This can create a problem: once the worker threads have done their work they are finished, but technically they are still running. The solution is to turn them into daemon threads; in that case, the program terminates as soon as only daemon threads remain. The method queue.Queue.join() blocks until every queued item has been processed.
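A minimal sketch of this daemon-worker pattern (the doubling "work" is a placeholder):

```python
import queue
import threading

def worker(q, results):
    while True:
        item = q.get()            # blocks until an item is available
        results.append(item * 2)  # placeholder for real work
        q.task_done()

q = queue.Queue()
results = []
for _ in range(3):
    t = threading.Thread(target=worker, args=(q, results), daemon=True)
    t.start()
for item in range(10):
    q.put(item)
q.join()  # blocks until every queued item has been processed
print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Because the workers are daemons, the program exits cleanly after q.join() even though the threads are still blocked on q.get().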
4.5 Error Handling in Python
There are two distinguishable kinds of errors in a Python program: syntax errors (parsing errors) and exceptions (errors detected during execution, which are not unconditionally fatal). While syntax errors never allow the program to run, exceptions can only be detected at runtime, and for this reason they should be carefully handled.
Handling Exceptions
When an exception is raised but not handled, Python outputs a traceback along with the exception's error message. A traceback (sometimes called a backtrace) is a list of all the calls made from the point where the unhandled exception occurred back to the top of the call stack. We can handle predictable exceptions using the try-except-finally paradigm:
try:
    try_suite
except exception1 as variable1:
    exception_suite1
...
except exceptionN as variableN:
    exception_suiteN
If the statements in the try block's suite are all executed without raising an exception, the except blocks are skipped. If an exception is raised inside the try block, control is immediately passed to the suite corresponding to the first matching exception. This means that any statements in the suite that follow the one that caused the exception will not be executed:
while True:
    try:
        x = int(input("Please enter a number: "))
        break
    except ValueError:
        print("That was not a valid number. Try again...")
The raise statement allows the programmer to force a specified exception to occur:
import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as err:
    print("I/O error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
Never use catch-all except: statements, or catch Exception or StandardError, unless you are re-raising the exception or are in the outermost block of your thread (and printing an error message). Minimize the amount of code in a try/except block: the larger the body of the try, the more likely it is that an exception will be raised by a line of code that you didn't expect to raise an exception, and in those cases the try/except block hides a real error. Use the finally clause to execute code whether or not an exception is raised in the try block; this is often useful for clean-up, e.g., closing a file. When capturing an exception, use as rather than a comma. For example:
try:
    raise Error
except Error as error:
    pass
4.6 Debugging and Profiling

Debugging
Interactive Running: If you have some code in a source le and you want to explore it interactively, you can run Python with the -i switch, like this: python -i example.py. pdb: The debugger pdb can be used in the command line:
$ python3 -m pdb program.py
To perform the inspection, type: s for step, p for print, and c for continue.
Profiling
If a program runs very slowly or consumes far more memory than we expect, the problem is most often due to our choice of algorithms or data structures, or due to some inefficient implementation. Some performance guidelines are useful though: prefer tuples to lists for read-only data; use generators rather than large lists or tuples for iteration; when creating large strings out of small strings, instead of concatenating the small ones, accumulate them all in a list and join the list of strings at the end. A good example is given by the Google Python Style Guide:
[Good]
items = ['<table>']
for last_name, first_name in employee_list:
    items.append('<tr><td>%s, %s</td></tr>' % (last_name, first_name))
items.append('</table>')
employee_table = ''.join(items)

[Bad]
employee_table = '<table>'
for last_name, first_name in employee_list:
    employee_table += '<tr><td>%s, %s</td></tr>' % (last_name, first_name)
employee_table += '</table>'
The cProfile Package: Provides a detailed breakdown of call times and can be used to find performance bottlenecks.
The timeit Package: Used for timing small pieces of the code:
>>> import timeit
>>> timeit.timeit("x = 2 + 2")
0.034976959228515625
>>> timeit.timeit("x = sum(range(10))")
0.92387008666992188

$ python -m timeit -s "import mymodule as m" "m.myfunction()"
4.7 Unit Testing
It is good practice to write tests for individual functions, classes, and methods, to ensure they behave as expected. Python's standard library provides two unit-testing modules: doctest and unittest. There are also third-party testing tools: nose and py.test.
doctest
Use it when writing the tests inside the modules' and functions' docstrings. Then just add three lines at the end of the module:
if __name__ == "__main__":
    import doctest
    doctest.testmod()
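A complete module using this pattern might look like the following sketch (the add function is illustrative):

```python
def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add('x', 'y')
    'xy'
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # silent when every docstring example passes
```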
To run a program's doctests, there are two approaches. Importing the doctest module and then running the program:
$ python3 program.py -v
Or creating a separate test program using the unittest module, which is modelled on Java's JUnit unit-testing library:
import doctest
import unittest
import module_to_be_tested

suite = unittest.TestSuite()
suite.addTest(doctest.DocTestSuite(module_to_be_tested))
runner = unittest.TextTestRunner()
print(runner.run(suite))
Test Nomenclature
Test fixtures The code necessary to set up a test (for example, creating an input file for testing and deleting it afterwards). Test cases The basic unit of testing. Test suites Collections of test cases, created by subclassing unittest.TestCase, where each test method's name begins with test.
Test runner An object that executes one or more test suites.
Chapter 5
Object-Oriented Design
Suppose we want to define an object in Python to represent a circle. We could remember Python's collections module and create a named tuple for our circle:
>>> circle = collections.namedtuple("Circle", "x y radius")
>>> circle
<class '__main__.Circle'>
>>> circle = circle(13, 84, 9)
>>> circle
Circle(x=13, y=84, radius=9)
However, many things are missing here. First, there are no guarantees that anyone who uses our circle data is not going to type an invalid input value, such as a negative number for the radius. Second, how could we also associate with our circle some operations that properly belong to it, such as its area or perimeter? For the first problem, we can see that the inability to validate when creating an object is a really bad aspect of taking a purely procedural approach to programming. Even if we decide to include many exceptions handling the invalid inputs for our circles, we would still have a data container that is not intrinsically made and validated for its real purpose. Imagine now if we had chosen a list instead of the named tuple: how would we handle the fact that lists have sorting properties? It is clear from the example above that we need to find a way to create an object that has only the properties that we expect it to have. In other words, we want to find a way to package data and restrict its methods. That is what object-oriented programming allows us to do: to create our own custom data types.
5.1 Classes and Objects
Classes are the way we gather special predefined data and methods together. We use them by creating objects, which are instances of a particular class. The simplest form of a class in Python looks like the following snippet:
class ClassName:
    <statement-1>
    ...
    <statement-N>

>>> x = ClassName()  # class instantiation
Class Instantiation

Class instantiation uses function notation to create objects in a known initial state. The instantiation operation creates an empty object which has individuality. However, multiple names (in multiple scopes) can be bound to the same object (also known as aliasing). In Python, when an object is created, first the special method __new__() is called (the constructor) and then __init__() initializes it.

Attributes

Objects have the attributes of their classes, which are methods and data. Method attributes are functions whose first argument is the instance on which they are called to operate (which in Python is conventionally called self). An attribute is any name following a dot. References to names in modules are attribute references: in the expression modname.funcname, modname is a module object and funcname is one of its attributes. Attributes may be read-only or writable. Writable attributes may be deleted with the del statement.

Namespaces

A namespace is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries. Examples of namespaces
are: the set of built-in names, the global names in a module, and the local names in a function invocation. The statements executed by the top-level invocation of the interpreter, either read from a script file or entered interactively, are considered part of a module called __main__, so they have their own global namespace.

Scope

A scope is a textual region of a Python program where a namespace is directly accessible. Although scopes are determined statically, they are used dynamically. Scopes are determined textually: the global scope of a function defined in a module is that module's namespace. When a class definition is entered, a new namespace is created and used as the local scope.
5.2 Principles of OOP
Specialization
Specialization (or inheritance) is the procedure of creating a new class that inherits all the attributes from the super class (also called the base class). Any method can be overridden (reimplemented) in a subclass (in Python, all methods are virtual). Inheritance is described as an is-a relationship. Furthermore, the Google Python Style Guide advises that if a class inherits from no other base classes, we should explicitly inherit it from Python's highest class, object:
class OuterClass(object): class InnerClass(object):
Polymorphism
Polymorphism (or dynamic method binding) is the principle whereby methods can be redefined inside subclasses. In other words, if we have an object of a subclass and we call a method that is also defined in the superclass, Python will use the method defined in the subclass. If, for instance, we need to recover the superclass's method, we can easily call it using the built-in super(). For example, all instances of a custom class are hashable by default in Python. This means that the __hash__() attribute can be called, allowing
them to be used as dictionary keys and to be stored in sets. However, if we reimplement the __eq__() method, we change this property (which can result in our instances no longer being hashable).
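This effect is easy to demonstrate in Python 3 (the class names here are illustrative):

```python
class Plain:
    def __init__(self, x):
        self.x = x

class WithEq:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return self.x == other.x

print(isinstance(hash(Plain(1)), int))  # True: hashable by default
try:
    hash(WithEq(1))  # defining __eq__ alone sets __hash__ to None
except TypeError as err:
    print("TypeError:", err)
```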
Aggregation
Aggregation (or composition) defines the process whereby a class includes one or more instance variables that are from other classes. It is a has-a relationship. In Python, every class uses inheritance (they are all custom classes derived from the object base class), and most use aggregation, since most classes have instance variables of various types.
    def edge_distance_from_origin(self):
        return abs(self.distance_from_origin() - self.radius)

    def area(self):
        return math.pi * (self.radius ** 2)

    def circumference(self):
        return 2 * math.pi * self.radius

    def __eq__(self, other):
        # avoid infinite recursion
        return self.radius == other.radius and super().__eq__(other)

    def __repr__(self):
        return "circle ({0.radius!r}, {0.x!r})".format(self)

    def __str__(self):
        return repr(self)

>>> import ShapeClass as shape
>>> a = shape.Point(3, 4)
>>> a
point (3, 4)
>>> repr(a)
'point (3, 4)'
>>> str(a)
'(3, 4)'
>>> a.distance_from_origin()
5.0
>>> c = shape.Circle(3, 2, 1)
>>> c
circle (3, 2)
>>> repr(c)
'circle (3, 2)'
>>> str(c)
'circle (3, 2)'
>>> c.circumference()
18.84955592153876
>>> c.edge_distance_from_origin()
0.7639320225002102
5.3 Design Patterns
Design patterns are an attempt to bring a formal definition of correctly designed structures to software engineering. There are many different design patterns for solving different general problems.
Decorator Pattern
Decorators (also known as the @ notation) are a tool to elegantly specify some transformation of functions and methods. The decorator pattern allows us to wrap an object that provides core functionality with other objects that alter that functionality. For example, the snippet below was copied from the Google Python Style Guide:
class C(object):
    def method(self):
        ...
    method = my_decorator(method)
can be written as
class C(object):
    @my_decorator
    def method(self):
        ...
import random

@benchmark
def random_tree(n):
    temp = [n for n in range(n)]
    for i in range(n+1):
        temp[random.choice(temp)] = random.choice(temp)
    return temp
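The @benchmark decorator applied above is defined elsewhere in the book; a plausible sketch of such a timing decorator (the exact implementation here is an assumption) is:

```python
import functools
import time

def benchmark(func):
    """Print how long each call to func takes (a sketch, not the book's exact version)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print("{} took {:.6f}s".format(func.__name__, time.time() - start))
        return result
    return wrapper

@benchmark
def double(n):
    return 2 * n
```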
The most common decorators are @classmethod and @staticmethod, for converting ordinary methods to class or static methods.
Observer Pattern
The observer pattern is useful when we want to have a core object that maintains certain values, and some observer objects that keep synchronized views of those values. It can be supported by using the @property decorator, placed before our functions (before def). This controls attribute access, for example, making an attribute read-only. Properties are used for accessing or setting data instead of simple accessor or setter methods:
    @property
    def radius(self):
        return self.__radius
Singleton Pattern
A class follows the singleton pattern if it allows exactly one instance of a certain object to exist. Since Python does not have private constructors, we use the __new__ class method to ensure that only one instance is ever created. When we override it, we first check whether our singleton instance has been created. If not, we create it with a superclass call:
>>> class SinEx:
...     _sing = None
...     def __new__(self, *args, **kwargs):
...         if not self._sing:
...             self._sing = super(SinEx, self).__new__(self, *args, **kwargs)
...         return self._sing
>>> x = SinEx()
>>> x
<__main__.SinEx object at 0xb72d680c>
>>> y = SinEx()
>>> x == y
True
>>> y
<__main__.SinEx object at 0xb72d680c>
The two objects are equal and live at the same address, so they are the same object.
5.4 Additional Exercises
Part II
Chapter 6
Abstract Data Structures
6.1 Stacks
A stack is a linear data structure that can be accessed only at one of its ends (which we will refer to as the top) for either storing or retrieving. In other words, access to elements in a stack is restricted: it is an example of a last-in-first-out (LIFO) structure. You can think of a stack as a huge pile of books on your desk. Stacks need to have the following operations running in O(1):

push: insert an item at the top of the stack.
pop: remove an item from the top of the stack.
top/peek: look up the element at the top.
empty/size: check whether the stack is empty, or return its size.

Stacks in Python can be implemented with lists and the methods append() and pop() (without an explicit index):
[adt/stacks/stack.py]

class Stack(object):
    """Define the stack class."""
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.isEmpty():
            return self.items.pop()
        else:
            raise Exception('Stack is empty!')

    def peek(self):
        return self.items[-1]

    def size(self):
        return len(self.items)

def main():
    stack = Stack()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    print(stack.size())
    print(stack.peek())
    print(stack.pop())
    print(stack.peek())

if __name__ == '__main__':
    main()
Another approach we can use to implement a stack (or any abstract structure) is by thinking of it as a container of nodes (objects) following a LIFO order:1
class Node(object):
    def __init__(self, value=None):
        self.value = value
        self.next = None
1 We will use a similar Node class in many examples in the rest of these notes.
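The node-based stack class itself falls on another page of the book; a minimal sketch consistent with the Node class above and with the driver below (the name StackwithNodes is taken from that driver, and Node is repeated here so the sketch runs on its own) could be:

```python
class Node(object):
    def __init__(self, value=None):
        self.value = value
        self.next = None

class StackwithNodes(object):
    """A sketch of a stack as a container of nodes, in LIFO order."""
    def __init__(self):
        self.head = None

    def isEmpty(self):
        return self.head is None

    def push(self, value):
        node = Node(value)
        node.next = self.head   # new node points at the old top
        self.head = node

    def pop(self):
        if self.head is None:
            raise Exception('Stack is empty!')
        node = self.head
        self.head = node.next
        return node.value

    def peek(self):
        return self.head.value

    def size(self):
        count, node = 0, self.head
        while node:
            count += 1
            node = node.next
        return count
```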
def main():
    stack = StackwithNodes()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    print(stack.size())
    print(stack.peek())
    print(stack.pop())
    print(stack.peek())

if __name__ == '__main__':
    main()
Stacks are suitable for depth-first traversal algorithms on graphs, as we will see in future chapters.
6.2 Queues
A queue, unlike a stack, is a structure in which the first enqueued element (at the back) will be the first one to be dequeued (when it reaches the front), i.e., a queue is a first-in-first-out (FIFO) structure. You can think of a queue as a line of people waiting for a roller-coaster ride. Access to elements in queues is also restricted, and queues should have the following operations running in O(1):

enqueue: insert an item at the back of the queue.
dequeue: remove an item from the front of the queue.
peek/front: retrieve the item at the front of the queue without removing it.
empty/size: check whether the queue is empty or give its size.
The example below shows a class for a queue in Python:
[adt/queues/queue.py]

class Queue(object):
    """A class for a queue."""
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

    def size(self):
        return len(self.items)

    def peek(self):
        if not self.isEmpty():
            return self.items[-1]
        else:
            raise Exception('Queue is empty.')
def main():
    queue = Queue()
    queue.enqueue(1)
    queue.enqueue(2)
    queue.enqueue(3)
    print(queue.size())
    print(queue.peek())
    print(queue.dequeue())
    print(queue.peek())

if __name__ == '__main__':
    main()
However, we have learned that the method insert() for lists in Python is very inefficient (remember, lists only operate in O(1) when we append to or pop from their end; otherwise all of the other elements have to be shifted in memory). We can be smarter than that and write an efficient queue using two stacks (two lists) instead of one:
[adt/queues/queue_from_two_stacks.py]

class Queue(object):
    """An example of a queue implemented from 2 stacks."""
    def __init__(self):
        self.in_stack = []
        self.out_stack = []

    def enqueue(self, item):
        return self.in_stack.append(item)

    def dequeue(self):
        if self.out_stack:
            return self.out_stack.pop()
        while self.in_stack:
            self.out_stack.append(self.in_stack.pop())
        if not self.out_stack:
            raise Exception("Queue empty!")
        return self.out_stack.pop()

    def size(self):
        return len(self.in_stack) + len(self.out_stack)

    def peek(self):
        if self.out_stack:
            return self.out_stack[-1]
        while self.in_stack:
            self.out_stack.append(self.in_stack.pop())
        if self.out_stack:
            return self.out_stack[-1]
        else:
            return None

def main():
    queue = Queue()
    queue.enqueue(1)
    queue.enqueue(2)
    queue.enqueue(3)
    print(queue.size())
    print(queue.peek())
    print(queue.dequeue())
    print(queue.peek())

if __name__ == '__main__':
    main()
Another approach is to implement a queue as a container for nodes, as we have done for stacks, but now the nodes are inserted and removed in a FIFO order:
[adt/queues/linked_queue.py]

class Node(object):
    def __init__(self, value):
        self.value = value
        self.next = None
class LinkedQueue(object):
    """A queue acting as a container for nodes (objects) that are inserted and removed according to FIFO."""
    def __init__(self):
        self.front = None
        self.back = None

    def isEmpty(self):
        return self.front is None

    def dequeue(self):
        if self.front:
            value = self.front.value
            self.front = self.front.next
            return value
        raise Exception('Queue is empty, cannot dequeue.')

    def enqueue(self, value):
        node = Node(value)
        if self.front:
            self.back.next = node
        else:
            self.front = node
        self.back = node
        return True
    def size(self):
        num_nodes = 0
        node = self.front
        while node:
            num_nodes += 1
            node = node.next
        return num_nodes

    def peek(self):
        return self.front.value

def main():
    queue = LinkedQueue()
    queue.enqueue(1)
    queue.enqueue(2)
    queue.enqueue(3)
    print(queue.size())
    print(queue.peek())
    print(queue.dequeue())
    print(queue.peek())

if __name__ == '__main__':
    main()
Queues are necessary for breadth-first traversal algorithms on graphs, as we will see in future chapters.
6.3 Deques
A deque is a double-ended queue, which can roughly be seen as a union of a stack and a queue:
[adt/queues/dequeue.py]

class Deque(object):
    """A class for a double-ended queue."""
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def addFront(self, item):
        self.items.append(item)

    def addRear(self, item):
        self.items.insert(0, item)

    def removeFront(self):
        return self.items.pop()

    def removeRear(self):
        return self.items.pop(0)

    def size(self):
        return len(self.items)

    def __repr__(self):
        return '{}'.format(self.items)

def main():
    dq = Deque()
    dq.addFront(1)
    dq.addFront(2)
    dq.addFront(3)
    dq.addRear(40)
    dq.addRear(50)
    print(dq.size())
    print(dq)

if __name__ == '__main__':
    main()
We see again the problem of inserting/removing items at any position of a Python list other than the end. The good news is that Python is your friend and its collections.deque gives us an efficient deque, with fast appending and popping from both ends:
>>> from collections import deque
>>> q = deque(["buffy", "xander", "willow"])
>>> q
deque(['buffy', 'xander', 'willow'])
>>> q.append("giles")
>>> q
deque(['buffy', 'xander', 'willow', 'giles'])
>>> q.popleft()
'buffy'
>>> q.pop()
'giles'
>>> q
deque(['xander', 'willow'])
>>> q.appendleft('angel')
>>> q
deque(['angel', 'xander', 'willow'])
Note that we can also specify the size of our deque. For example, we could have written q = deque(maxlen=4) in the example above. Another interesting method for deques is rotate(n), which rotates the deque n steps to the right or, if n is negative, to the left. Interestingly, deques in Python are based on a doubly linked list,2 not on dynamic arrays. This means that operations such as inserting or removing an item at either end are fast (O(1)), but arbitrary index access can be slow (O(n)).
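For instance, rotate(n) behaves like this:

```python
from collections import deque

q = deque([1, 2, 3, 4, 5])
q.rotate(2)            # two steps to the right
print(q)               # deque([4, 5, 1, 2, 3])
q.rotate(-2)           # two steps back to the left
print(q)               # deque([1, 2, 3, 4, 5])
```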
6.4 Priority Queues and Heaps
A priority queue is an abstract data type similar to a regular queue or stack, but where each element has a priority associated with it. If two elements have the same priority, they are served according to their order in the queue. A sensible implementation of a priority queue is given by a heap data structure, and we will use one for most of our examples.
Heaps
Conceptually, a heap is a binary tree in which each node is smaller (larger) than its children. We will learn about trees in the next chapters, but we should already keep in mind that, in a balanced tree, we can repair the structure after a modification with O(log n) runtime. Heaps are generally useful for applications that repeatedly access the smallest (largest) element in a list. Moreover, a min-(max-)heap lets you find the smallest (largest) element in O(1) and extract/add/replace it in O(log n).
2 Linked lists are another abstract data structure that we will learn about at the end of this chapter. Doubly here means that their nodes have links to the next and to the previous node.
Once we have a heap, the heapq.heappush(heap, item) method is used to push the item onto it:
>>> import heapq
>>> h = []
>>> heapq.heappush(h, (1, 'food'))
>>> heapq.heappush(h, (2, 'have fun'))
>>> heapq.heappush(h, (3, 'work'))
>>> heapq.heappush(h, (4, 'study'))
>>> h
[(1, 'food'), (2, 'have fun'), (3, 'work'), (4, 'study')]
The method heapq.heappop(heap) is used to pop and return the smallest item from the heap:
>>> list1
[1, 4, 8, 6]
>>> heapq.heappop(list1)
1
>>> list1
[4, 6, 8]
The method heapq.heappushpop(heap, item) pushes the item onto the heap, then pops and returns the smallest item from the heap. In a similar way, heapq.heapreplace(heap, item) pops and returns the smallest item from the heap, and then pushes the new item. The heap size does not change in either of these methods, and they are more efficient than calling the two steps separately. In addition, many operations can be performed using the heap property. For example, heapq.merge(*iterables) will merge multiple sorted inputs into a single sorted output.
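For instance, starting from a small heap, the two methods behave as follows:

```python
import heapq

heap = [1, 4, 8, 6]
heapq.heapify(heap)

# push 3 onto the heap, then pop the smallest item
print(heapq.heappushpop(heap, 3))   # 1
# pop the smallest item, then push 9; the heap size is unchanged
print(heapq.heapreplace(heap, 9))   # 3
print(sorted(heap))                 # [4, 6, 8, 9]
```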
The methods heapq.nlargest(n, iterable[, key]) and heapq.nsmallest(n, iterable[, key]) will return a list with the n largest or smallest elements from the dataset defined by iterable.
    def __max_heapify__(self, i):
        largest = i
        left = self.left_child(i)
        right = self.right_child(i)
        n = len(self.data)
        if left < n and self.data[left] > self.data[largest]:
            largest = left
        if right < n and self.data[right] > self.data[largest]:
            largest = right
        if i != largest:
            self.data[i], self.data[largest] = self.data[largest], self.data[i]
            self.__max_heapify__(largest)

    def extract_max(self):
        n = len(self.data)
        max_element = self.data[0]
        self.data[0] = self.data[n - 1]
        self.data = self.data[:n - 1]
        self.__max_heapify__(0)
        return max_element
def test_Heapify():
    l1 = [3, 2, 5, 1, 7, 8, 2]
    h = Heapify(l1)
    assert(h.extract_max() == 8)
    print("Tests passed!")

if __name__ == '__main__':
    test_Heapify()
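The PriorityQueue class exercised below does not appear on this page; a plausible sketch based on heapq (priorities are negated so that the highest priority pops first, and an insertion counter breaks ties) is:

```python
import heapq

class PriorityQueue(object):
    """A sketch: items with the highest priority are popped first."""
    def __init__(self):
        self._queue = []
        self._index = 0   # tie-breaker: preserves insertion order

    def push(self, item, priority):
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]
```

With this sketch, the test below pops Item('test2') first, since it has the highest priority (4).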
class Item:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Item({!r})".format(self.name)

def test_PriorityQueue():
    """Push and pop are all O(log n)."""
    q = PriorityQueue()
    q.push(Item('test1'), 1)
    q.push(Item('test2'), 4)
    q.push(Item('test3'), 3)
    assert(str(q.pop()) == "Item('test2')")
    print('Tests passed!'.center(20, '*'))

if __name__ == '__main__':
    test_PriorityQueue()
6.5 Linked Lists
A linked list is like a stack (elements added to the head) or a queue (elements added to the tail), except that we can inspect any element in the structure, not only the elements at the ends. In general, a linked list is simply a linear list of nodes, each containing a value and a pointer (a reference) to the next node; the last node's reference points to None:
>>> class Node(object):
...     def __init__(self, value=None, next=None):
...         self.value = value
...         self.next = next
We can adapt this node class to accept some get and set methods:
class Node(object):
    def __init__(self, value):
        self.value = value
        self.next = None

    def getData(self):
        return self.value

    def getNext(self):
        return self.next

    def setData(self, newdata):
        self.value = newdata

    def setNext(self, newnext):
        self.next = newnext
def main():
    ll = LinkList()
    print(ll.length)
    ll.addNode(1)
    ll.addNode(2)
    ll.addNode(3)
    print(ll.length)
    ll.printList()
    ll.deleteNode(4)
    ll.printList()
    print(ll.length)

if __name__ == '__main__':
    main()
from collections import Counter

def main():
    ll = LinkList()
    for i in range(1, 10):
        ll.addNode(i)
        ll.addNode(i+1)
    print('Linked List with duplicates:')
    ll.printList()
    print('Length before deleting duplicates is: ', ll.length)
    ll.removeDupl()
    ll.printList()
    print('Length after deleting duplicates is: ', ll.length)
    ll = LinkList()
    for i in range(1, 10):
Linked lists have a dynamic size at runtime and are good when you have an unknown number of items to store. Insertion at the head is O(1), but deletion and searching can be O(n), because locating an element in a linked list is slow; it is done by a sequential search. Traversing backward or sorting a linked list are even worse, both being O(n^2). A good trick to obtain O(1) deletion of a node i is to copy the data from node i+1 into node i and then delete the node i+1.
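This deletion trick can be sketched on a bare node chain (assuming the node to delete is not the tail):

```python
class Node(object):
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def delete_in_place(node):
    """Delete node in O(1) by copying its successor's data and bypassing it."""
    if node.next is None:
        raise ValueError("this trick does not work on the tail node")
    node.value = node.next.value
    node.next = node.next.next

# build 1 -> 2 -> 3 and delete the node holding 2
head = Node(1, Node(2, Node(3)))
delete_in_place(head.next)

values = []
node = head
while node:
    values.append(node.value)
    node = node.next
print(values)   # [1, 3]
```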
6.6 Additional Exercises
Stacks
Stacks are very useful when data has to be stored and retrieved in reverse order. In the example below we use our Stack class to reverse a string:
[adt/stacks/reverse_string_with_stack.py]

import sys
import stack

def reverse_string_with_stack(str1):
    s = stack.Stack()
    revStr = ''
    for c in str1:
        s.push(c)
    while not s.isEmpty():
        revStr += s.pop()
    return revStr

def test_reverse_string_with_stack():
    str1 = 'Buffy is a Slayer!'
    assert(reverse_string_with_stack(str1) == '!reyalS a si yffuB')
    print('Tests passed!')

if __name__ == '__main__':
    test_reverse_string_with_stack()
The following example uses a stack to transform a decimal number into a binary number:
[adt/stacks/dec2bin_with_stack.py]

from stack import Stack

def dec2bin_with_stack(decnum):
    s = Stack()
    str_aux = ''
    while decnum > 0:
        dig = decnum % 2
        decnum = decnum//2
        s.push(dig)
    while not s.isEmpty():
        str_aux += str(s.pop())
    return str_aux
def test_dec2bin_with_stack(module_name='this module'):
    decnum = 9
    assert(dec2bin_with_stack(decnum) == '1001')
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))

if __name__ == '__main__':
    test_dec2bin_with_stack()
The following example implements a stack that has O(1) minimum lookup:
[adt/stacks/stack_with_min.py]

class Stack(list):
    def push(self, value):
        if len(self) > 0:
            last = self[-1]
            minimum = self._find_minimum(value, last)
        else:
            minimum = value
        self.minimum = minimum
        self.append(NodeWithMin(value, minimum))

    def _find_minimum(self, value, last_value):
        if value < last_value.minimum:
            return value
        return last_value.minimum

    def min(self):
        return self.minimum

class NodeWithMin(object):
    def __init__(self, value, minimum):
        self.value = value
        self.minimum = minimum

    def __repr__(self):
        return str(self.value)

    def min(self):
        return self.minimum

def main():
    stack = Stack()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    node = stack.pop()
    print(node.min())

if __name__ == '__main__':
    main()
The following example implements a set of stacks, composed of several stacks: it creates a new stack when the previous stack reaches capacity, while the push and pop methods behave exactly as with a single stack:
[adt/stacks/set_of_stacks.py]

class SetOfStacks(list):
    def __init__(self, capacity=4):
        self.stacks = []
        self.last_stack = []
        self.capacity = capacity
        self.stacks.append(self.last_stack)

    def __repr__(self):
        return str(self.stacks)

    def push(self, value):
        last_stack = self.last_stack
        if len(last_stack) == self.capacity:
            last_stack = []
            self.last_stack = last_stack
            self.stacks.append(last_stack)
        last_stack.append(value)

    def pop(self):
        last_stack = self.last_stack
        value = last_stack.pop()
        if len(last_stack) == 0:
            self.stacks.pop()
            self.last_stack = self.stacks[-1]
        return value
def main():
    stack = SetOfStacks()
    stack.push(1)
    stack.push(2)
    stack.push(3)
    stack.push(4)
    stack.push(5)
    stack.push(6)
    print(stack)
    stack.pop()
    stack.pop()
    stack.pop()
    print(stack)

if __name__ == '__main__':
    main()
Queues
The example below uses the concept of a queue to rotate an array from right to left by a given number n:3
[adt/queues/rotating_array.py]

def rotating_array(seq, n):
    myqueue = []
    for i in range(n):
        myqueue.append(seq.pop())
    myqueue.reverse()
    return myqueue + seq
def test_rotating_array(module_name='this module'):
    seq = [1, 2, 3, 4, 5, 6, 7]
    n = 4
    assert(rotating_array(seq, n) == [4, 5, 6, 7, 1, 2, 3])
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))

if __name__ == '__main__':
    test_rotating_array()
3 We could get the same effect using collections.deque with the method rotate(n).
Deques
A nice application for a double-ended queue is verifying whether a string is a palindrome:
[adt/queues/palindrome_checker_with_deque.py]

import sys
import string
import collections

def palindrome_checker_with_deque(str1):
    d = collections.deque()
    eq = True
    strip = string.whitespace + string.punctuation + "\""
    for s in str1.lower():
        if s not in strip:
            d.append(s)
    while len(d) > 1 and eq:
        first = d.pop()
        last = d.popleft()
        if first != last:
            eq = False
    return eq

def test_palindrome_checker_with_deque():
    str1 = 'Madam Im Adam'
    str2 = 'Buffy is a Slayer'
    assert(palindrome_checker_with_deque(str1) == True)
    assert(palindrome_checker_with_deque(str2) == False)
    print('Tests passed!')

if __name__ == '__main__':
    test_palindrome_checker_with_deque()
[adt/heap/find_N_largest_smallest_items_seq.py]

import heapq

def find_N_largest_items_seq(seq, N):
    return heapq.nlargest(N, seq)

def find_N_smallest_items_seq(seq, N):
    return heapq.nsmallest(N, seq)

def find_smallest_items_seq_heap(seq):
    """Find the smallest item using heapify first; heap[0] is always the smallest item."""
    heapq.heapify(seq)
    return heapq.heappop(seq)

def find_smallest_items_seq(seq):
    """If it is only one item, min() is faster."""
    return min(seq)

def find_N_smallest_items_seq_sorted(seq, N):
    """If N ~ len(seq), it is better to sort instead."""
    return sorted(seq)[:N]

def find_N_largest_items_seq_sorted(seq, N):
    """If N ~ len(seq), it is better to sort instead."""
    return sorted(seq)[len(seq)-N:]

def test_find_N_largest_smallest_items_seq(module_name='this module'):
    seq = [1, 3, 2, 8, 6, 10, 9]
    N = 3
    assert(find_N_largest_items_seq(seq, N) == [10, 9, 8])
    assert(find_N_largest_items_seq_sorted(seq, N) == [8, 9, 10])
    assert(find_N_smallest_items_seq(seq, N) == [1, 2, 3])
    assert(find_N_smallest_items_seq_sorted(seq, N) == [1, 2, 3])
    assert(find_smallest_items_seq(seq) == 1)
    assert(find_smallest_items_seq_heap(seq) == 1)
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))
if __name__ == '__main__':
    test_find_N_largest_smallest_items_seq()
The following example uses Python's heapq package to merge two sorted sequences with little overhead:4
[adt/heap/merge_sorted_seqs.py]

import heapq

def merge_sorted_seqs(seq1, seq2):
    result = []
    for c in heapq.merge(seq1, seq2):
        result.append(c)
    return result

def test_merge_sorted_seqs(module_name='this module'):
    seq1 = [1, 2, 3, 8, 9, 10]
    seq2 = [2, 3, 4, 5, 6, 7, 9]
    seq3 = seq1 + seq2
    assert(merge_sorted_seqs(seq1, seq2) == sorted(seq3))
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))

if __name__ == '__main__':
    test_merge_sorted_seqs()
Linked Lists
The following example implements a linked list class from stack methods:
[adt/linked_lists/linked_list_from_stack.py]

class Node(object):
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

    def setnext(self, next):
        self.next = next
4 Note that the result would not be sorted if we just added both lists.
    def __str__(self):
        return "%s" % self.data

class LinkedListStack(object):
    def __init__(self, max=0):
        self.max = max
        self.head = None
        self.z = None
        self.size = 0

    def push(self, data):
        node = Node(data=data)
        node.setnext(self.head)
        self.head = node
        self.size += 1

    def pop(self):
        node = self.head.next
        self.head = node
        self.size -= 1

    def isEmpty(self):
        return self.size == 0

    def __str__(self):
        d = ""
        if self.isEmpty():
            return ""
        else:
            temp = self.head
            d += "%s\n" % temp
            while temp.next != None:
                temp = temp.next
                d += "%s\n" % temp
        return d

def test_ll_from_stack():
    ll = LinkedListStack(max=20)
    ll.push("1")
    ll.push("2")
    ll.push("3")
    ll.push("4")
    print(ll)

if __name__ == '__main__':
    test_ll_from_stack()
The following snippet shows an example of an ordered linked list. In this case, the list is a collection of items where each item holds a relative position that is based upon some underlying characteristic of the item. The ordering is typically either ascending or descending, and we assume that list items have a meaningful comparison operation that is already defined. Many of the ordered list operations are the same as those of the unordered list:
[adt/linked_lists/ordered_list.py]

from Node import Node

class OrderedList(object):
    def __init__(self):
        self.head = None

    def add(self, item):
        """This method is different from the unordered linked list's."""
        current = self.head
        previous = None
        stop = False
        while current != None and not stop:
            if current.getData() > item:
                stop = True
            else:
                previous = current
                current = current.getNext()
        temp = Node(item)
        if previous == None:
            temp.setNext(self.head)
            self.head = temp
        else:
            temp.setNext(current)
            previous.setNext(temp)

    def length(self):
        current = self.head
        count = 0
        while current != None:
            count += 1
            current = current.getNext()
        return count
def test_OrderedList(module_name='this module'):
    olist = OrderedList()
    olist.add(31)
    olist.add(22)
    olist.add(10)
    assert(olist.search(22) == True)
    olist.remove(22)
    assert(olist.search(22) == False)
    s = 'Tests in {name} have {con}!'
    print(s.format(name=module_name, con='passed'))

if __name__ == '__main__':
    test_OrderedList()
Chapter 7
Asymptotic Analysis
Asymptotic analysis is a method to describe the limiting behavior and the performance of algorithms when applied to very large input datasets. To understand why asymptotic analysis is important, suppose you have to sort a billion numbers (n = 10^9)1 on a common desktop computer. Suppose that this computer has a CPU clock of 1 GHz, which roughly means that it executes 10^9 processor cycles (or operations) per second.2 Then, for an algorithm that has a runtime of O(n^2), it would take approximately one billion seconds to finish the sorting (in the worst case), which means more than 30 years! Another way of visualizing the importance of asymptotic analysis is to look directly at the functions' behaviour. In Fig. 7 we have many classes of functions plotted together, and it is clear that as n increases, the number of operations for any polynomial or exponential algorithm becomes infeasible.
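The arithmetic above can be checked quickly (pure back-of-the-envelope numbers, not code from the book):

```python
n = 10**9                       # one billion numbers
operations = n**2               # an O(n^2) algorithm, worst case
cycles_per_second = 10**9       # a 1 GHz CPU: ~10^9 operations/second
seconds = operations // cycles_per_second
years = seconds / (60 * 60 * 24 * 365)
print(seconds)                  # 1000000000 (one billion seconds)
print(round(years, 1))          # 31.7
```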
7.1 Complexity Classes
A complexity class is a set of problems with related complexity. A reduction is a transformation of one problem into another problem which is at least as difficult as the original problem. The most commonly used reduction is a polynomial-time reduction, meaning that the reduction process takes polynomial time. A problem is hard for a class of problems if every problem in the class can be reduced to the original problem.

1 Remember that for memory, gigabytes means 1024^3 = 2^30 bytes, while for storage it means 1000^3 = 10^9 bytes. Also, integers usually take 2 or 4 bytes each. However, for this example we are simplifying by saying that a number takes 1 byte.
2 In this exercise we are not considering other factors that would make the processing slower, such as RAM latency, cache copy operations, etc.
P
The complexity class of decision problems that can be solved on a deterministic Turing machine in polynomial time (in the worst case). If we can turn a problem into a decision problem, the result would belong to P.
NP
The complexity class of decision problems that can be solved on a nondeterministic Turing machine (NTM) in polynomial time. In other words, it includes all decision problems whose yes instances can be solved in polynomial time with the NTM. A problem is called complete if all problems in the class can be reduced to it. Therefore, the subclass called NP-complete (NPC) contains the hardest problems in all of NP. Any problem that is at least as hard (as determined by polynomial-time reduction) as any problem in NP, but that need not itself be in NP, is called NP-hard. One example is finding the shortest route through a graph,
which is called the Travelling Salesman (or Salesrep) problem (TSP).
P=NP?
The class co-NP is the class of the complements of NP problems: for every yes answer we have the no, and vice versa. If NP is truly asymmetric, then these two classes are different, although there is overlap between them, because all of P lies in their intersection: both the yes and no instances in P can be solved in polynomial time with an NTM. What would happen if an NP-complete problem were found in the intersection of NP and co-NP? First, it would mean that all of NP would be inside co-NP, so we would show NP = co-NP and the asymmetry would disappear. Second, since all of P is in this intersection, P = NP. If P = NP, we could solve any (decision) problem that had a practical (verifiable) solution. However, it is (strongly) believed that NP and co-NP are different. For instance, no polynomial solution to the problem of factoring numbers has been found, and this problem is in both NP and co-NP.
7.2 Recursion
The three laws of recursion are:

1. A recursive algorithm must have a base case.
2. A recursive algorithm must change its state and move toward the base case.
3. A recursive algorithm must call itself, recursively.

For every recursive call, the recursive function has to allocate memory on the stack for arguments, return address, and local variables, costing time to push and pop these data onto the stack. Recursive algorithms take at least O(n) space, where n is the depth of the recursive call. Recursion is very costly when there are duplicated calculations and/or overlap among subproblems; in some cases this can even cause the stack to overflow. For this reason, where subproblems overlap, iterative solutions may be a better approach. For example, in the case of the Fibonacci series, the iterative solution runs in O(n) while the recursive solution runs in exponential time.
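A quick sketch of the iterative alternative for Fibonacci (not one of the book's listings):

```python
def fib_iter(n):
    """Return the nth Fibonacci number iteratively: O(n) time, O(1) extra space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iter(10))   # 55
```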
136
Recursive Relations
To describe the running time of recursive functions, we use recurrence relations:

T(n) = a T(g(n)) + f(n),

where a represents the number of recursive calls, g(n) is the size of each subproblem to be solved recursively, and f(n) is any extra work done in the function. The following table shows examples of recurrence relations:
T(n) = T(n-1) + 1     O(n)        Processing a sequence
T(n) = T(n-1) + n     O(n^2)      Handshake problem
T(n) = 2T(n-1) + 1    O(2^n)      Towers of Hanoi
T(n) = T(n/2) + 1     O(ln n)     Binary search
T(n) = T(n/2) + n     O(n)        Randomized select
T(n) = 2T(n/2) + 1    O(n)        Tree traversal
T(n) = 2T(n/2) + n    O(n ln n)   Sort by divide and conquer
7.3 Runtime in Functions
We are now ready to estimate algorithm runtimes. First of all, if the algorithm does not make any recursive calls, we only need to analyse its data structures and flow blocks. In this case, complexities of code blocks executed one after the other are added, and complexities of nested loops are multiplied. If the algorithm has recursive calls, we can use the recurrence relations from the previous section to find the runtime. When we write a recurrence relation for a function, we must write two equations, one for the general case and one for the base case (which should be O(1), so that T(1) = 1). Keeping this in mind, let us take a look at the algorithm that finds the nth element of the Fibonacci sequence, which is known to be exponential:
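The listing being analysed sits on another page of the book; the classic recursive version in question looks like this (two recursive calls and constant extra work per invocation):

```python
def fib(n):
    """Naive recursive Fibonacci: exponential runtime."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))   # 55
```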
Here, the g(n)'s are n-1 and n-2, a is 2, and f(n) is 1, so the recurrence relation in this case is T(n) = 2T(n-1) + 1. Now let us expand this equation for each successive recursion:

T(n) = 2T(n-1) + 1 = 2^2 T(n-2) + 2 + 1 = ... = 2^k T(n-k) + 2^k - 1.

We need to make sure that the function reaches O(1) in the base case, where T(1) = 1; this means that n - k = 1, or k = n - 1. Plugging back into the equation, we have:

T(n) = 2^(n-1) + 2^(n-1) - 1 = 2^n - 1 ~ O(2^n).   (7.3.1)
We have indeed proved that this algorithm is exponential! The same process can be applied to each recurrence relation, and the following table shows the runtime results for many algorithms:
O(1)       constant     hash table lookup/modification
O(ln n)    logarithmic  algorithms breaking the problem into smaller chunks per invocation, e.g. searching a binary search tree
O(n)       linear       iteration over a list
O(n ln n)  loglinear    algorithms breaking the problem into smaller chunks per invocation and then sticking the results together, e.g. quick and merge sort
O(n^2)     quadratic    insertion and selection sort algorithms
O(n^k)     polynomial   k-nested for loops over n
O(2^n)     exponential  producing every subset of n items
O(n!)      factorial    producing every ordering of n values
Chapter 8
Sorting
The simplest way of sorting a group of items is to start by removing the smallest item from the group and putting it first, then removing the next smallest and putting it next, and so on. This is clearly an O(n^2) algorithm, so we need to find a better solution. In this chapter we will look at many examples of sorting algorithms and analyse their characteristics and runtimes. An in-place sort does not use any additional memory to do the sorting (for example, swapping elements in an array). A stable sort preserves the relative order of otherwise identical data elements (for example, if two data elements have identical values, the one that was ahead of the other stays ahead). In any comparison sort problem, a key is the value (or values) that determines the sorting order. A comparison sort requires only that there is a way to determine whether a key is less than, equal to, or greater than another key. Most sorting algorithms are comparison sorts, for which the worst-case running time can be no better than O(n ln n).
8.1 Quadratic Sort
Insertion Sort
Insertion sort is a simple sorting algorithm with a best-case runtime of O(n) and average- and worst-case runtimes of O(n^2). It sorts by repeatedly inserting the next unsorted element into an initial sorted segment of the array. For small or already mostly sorted data sets, it can be preferable to more advanced algorithms such as merge sort or quicksort (it is a good way to add new elements to a presorted list):
[sorting/insertion_sort.py]

def insertion_sort(seq):
    for i in range(1, len(seq)):
        j = i
        while j > 0 and seq[j-1] > seq[j]:
            seq[j-1], seq[j] = seq[j], seq[j-1]
            j -= 1
    return seq

def insertion_sort_rec(seq, i=None):
    if i is None:
        i = len(seq) - 1
    if i == 0:
        return seq
    insertion_sort_rec(seq, i-1)
    j = i
    while j > 0 and seq[j-1] > seq[j]:
        seq[j-1], seq[j] = seq[j], seq[j-1]
        j -= 1
    return seq

def test_insertion_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2, 5, 4, 1, 5, 3]
    assert(insertion_sort(seq) == sorted(seq))
    assert(insertion_sort_rec(seq) == sorted(seq))
    print('Tests passed!')

if __name__ == '__main__':
    test_insertion_sort()
Selection Sort
Selection sort is based on finding the smallest or largest element in a list and swapping it into the first position, then finding the second smallest, and so on, until the end is reached. Even when the list is already sorted it is O(n²), and it is not stable:
[sorting/selection_sort.py]
def selection_sort(seq):
    for i in range(len(seq) - 1, 0, -1):
        max_j = i
        for j in range(max_j):
            if seq[j] > seq[max_j]:
                max_j = j
        seq[i], seq[max_j] = seq[max_j], seq[i]
    return seq
Gnome Sort
Gnome sort works by moving forward to find a misplaced value and then moving backward to swap it into its right position:
[sorting/gnome_sort.py]
def gnome_sort(seq):
    i = 0
    while i < len(seq):
        if i == 0 or seq[i-1] <= seq[i]:
            i += 1
        else:
            seq[i], seq[i-1] = seq[i-1], seq[i]
            i -= 1
    return seq

def test_gnome_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2, 5, 4, 1, 5, 3]
    assert(gnome_sort(seq) == sorted(seq))
    print("Tests passed!")

if __name__ == "__main__":
    test_gnome_sort()
8.2
Linear Sort
Count Sort
Count sort sorts integers with a small value range by counting occurrences and using the counts to place the numbers directly in the result. There is a loglinear limit on how fast you can sort if all you know about your data is that the items are comparable (greater or less than each other). However, if you can also count occurrences of the keys, sorting becomes linear in time, O(n + k):
[sorting/count_sort.py]
from collections import defaultdict

def count_sort_dict(a):
    b, c = [], defaultdict(list)
    for x in a:
        c[x].append(x)
    for k in range(min(c), max(c) + 1):
        b.extend(c[k])
    return b

def test_count_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2, 5, 4, 1, 5, 3]
    assert(count_sort_dict(seq) == sorted(seq))
    print("Tests passed!")

if __name__ == "__main__":
    test_count_sort()
If several values have the same key, they keep their original order with respect to each other, so the algorithm is stable.
8.3
Loglinear Sort
a very efficient implementation of Python's timsort algorithm¹. With the sorted() function, the original object is not changed. The sorted() function can also be customized through optional arguments, for example: reverse=True; key=len; key=str.lower (treating uppercase and lowercase the same); or even a custom sorting function.
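The optional arguments mentioned above can be combined; a small sketch of each:

```python
words = ["Banana", "apple", "cherry"]

# Case-insensitive sort, using str.lower as the key function:
print(sorted(words, key=str.lower))           # ['apple', 'Banana', 'cherry']

# Sort by length, longest first (ties keep their input order):
print(sorted(words, key=len, reverse=True))   # ['Banana', 'cherry', 'apple']

# A custom key: sort by the last character of each word.
print(sorted(words, key=lambda w: w[-1]))     # ['Banana', 'apple', 'cherry']
```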
Merge Sort
Merge sort divides the list in half to create two unsorted lists, which are recursively sorted and merged by continually calling the merge-sort algorithm, until lists of size 1 are reached. The algorithm is stable, as well as fast for large data sets. However, since it is not in-place, it requires much more memory than many other algorithms. The space complexity is O(n) for arrays and O(ln n) for linked lists². The best, average, and worst case times are all O(n ln n). Merge sort is a good choice when the data set is too large to fit into memory: the subsets can be written to disk in separate files until they are small enough to be sorted in memory, and the merging then just involves reading single elements at a time from each file and writing them to the final file in the correct order:
[sorting/merge_sort.py]
# O(log(n)) levels of recursion
def merge_sort(seq):
    if len(seq) < 2:
        return seq
    mid = len(seq)//2
    left, right = None, None
    if seq[:mid]:
        left = merge_sort(seq[:mid])
    if seq[mid:]:
        right = merge_sort(seq[mid:])
    return merge_n(left, right)

# O(2n)
def merge_2n(left, right):
    if not left or not right:
        return left or right
    result = []
¹ Timsort is a hybrid sorting algorithm, derived from merge sort and insertion sort, invented by Tim Peters for Python.
² Never consider sorting a linked list though; it is probably the worst option you have in terms of runtime complexity.
    while left and right:
        if left[-1] >= right[-1]:
            result.append(left.pop())
        else:
            result.append(right.pop())
    result.reverse()
    return (left or right) + result
# O(n)
def merge_n(left, right):
    if not left or not right:
        return left or right
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    if i < len(left):
        result += left[i:]
    elif j < len(right):
        result += right[j:]
    return result

def test_merge_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(merge_sort(seq) == sorted(seq))
    print("Tests passed!")

if __name__ == "__main__":
    test_merge_sort()
Quick Sort
Quick sort works by choosing a pivot and partitioning the array so that the elements that are smaller than the pivot go to the left. Then it recursively sorts the left and right parts. The choice of the pivot value is key to the performance. It can be shown that always choosing the value in the middle of the set is the best choice for already-sorted data and no worse than most other choices for random unsorted data.
The worst case is O(n²), in the rare cases when partitioning keeps producing a region of n − 1 elements (when the pivot is the minimum value). The best case produces two n/2-sized lists; this and the average case are both O(n ln n). The algorithm is not stable.
[sorting/quick_sort.py]
def quick_sort(seq):
    if len(seq) < 2:
        return seq
    mid = len(seq)//2
    pi = seq[mid]
    seq = seq[:mid] + seq[mid+1:]
    lo = [x for x in seq if x <= pi]
    hi = [x for x in seq if x > pi]
    return quick_sort(lo) + [pi] + quick_sort(hi)

def test_quick_sort():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(quick_sort(seq) == sorted(seq))
    print("Tests passed!")

if __name__ == "__main__":
    test_quick_sort()
Heap Sort
Heap sort is similar to selection sort, except that the unsorted region is a heap, so finding the largest element n times gives a loglinear runtime. In a heap, for every node other than the root, the value of the node is at least (at most) the value of its parent; thus, the smallest (largest) element is stored at the root, and the subtrees rooted at a node contain larger (smaller) values than the node itself. Although appending an element is only O(1), restoring the heap order is O(ln n), and searching (traversing) is O(n). In Python, a heap sort can be implemented by pushing all values onto a heap and then popping off the smallest values one at a time:
[sorting/heap_sort1.py]
import heapq
def heap_sort1(seq):
    """heap sort with Python's heapq"""
    h = []
    for value in seq:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]

def test_heap_sort1():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(heap_sort1(seq) == sorted(seq))
    print("Tests passed!")

if __name__ == "__main__":
    test_heap_sort1()
If we decide to use the heap class that we wrote in the previous chapters, we can write a heap sort simply as:
[sorting/heap_sort2.py]
from heap import Heap

def heap_sort2(seq):
    heap = Heap(seq)
    res = []
    for i in range(len(seq)):
        res.insert(0, heap.extract_max())
    return res

def test_heap_sort2():
    seq = [3, 5, 2, 6, 8, 1, 0, 3, 5, 6, 2]
    assert(heap_sort2(seq) == sorted(seq))
    print("Tests passed!")

if __name__ == "__main__":
    test_heap_sort2()
8.4
8.5
Additional Exercises
Quadratic Sort
The following program implements a bubble sort, a very inefficient sorting algorithm:
[searching/bubble_sort.py]
def bubble_sort(seq):
    size = len(seq) - 1
    for num in range(size, 0, -1):
        for i in range(num):
            if seq[i] > seq[i+1]:
                temp = seq[i]
                seq[i] = seq[i+1]
                seq[i+1] = temp
    return seq

def test_bubble_sort(module_name="this module"):
    seq = [4, 5, 2, 1, 6, 2, 7, 10, 13, 8]
    assert(bubble_sort(seq) == sorted(seq))
    s = "Tests in {name} have {con}!"
    print(s.format(name=module_name, con="passed"))

if __name__ == "__main__":
    test_bubble_sort()
Linear Sort
The example below shows a simple count sort for people's ages:
def counting_sort_age(A):
    oldestAge = 100
    timesOfAge = [0]*oldestAge
    ageCountSet = set()
    B = []
    for i in A:
        timesOfAge[i] += 1
        ageCountSet.add(i)
    for j in sorted(ageCountSet):  # sets are unordered, so sort the ages
        count = timesOfAge[j]
        while count > 0:
            B.append(j)
            count -= 1
    return B
The example below uses the quick sort partitioning scheme to find the kth largest element in a sequence:
[sorting/find_k_largest_seq_quicksort.py]
import random

def swap(A, x, y):
    tmp = A[x]
    A[x] = A[y]
    A[y] = tmp

def qselect(A, k, left=None, right=None):
    left = left or 0
    right = right or len(A) - 1
    pivot = random.randint(left, right)
    pivotVal = A[pivot]
    swap(A, pivot, right)
    swapIndex, i = left, left
    while i <= right - 1:
        if A[i] < pivotVal:
            swap(A, i, swapIndex)
            swapIndex += 1
        i += 1
    swap(A, right, swapIndex)
    rank = len(A) - swapIndex
    if k == rank:
        return A[swapIndex]
    elif k < rank:
        return qselect(A, k, left=swapIndex+1, right=right)
    else:
        return qselect(A, k, left=left, right=swapIndex-1)
Chapter 9
Searching
The most common searching algorithms are the sequential search and the binary search. If the input array is not sorted, or if the input elements are held in dynamic containers (such as linked lists), the search has to be sequential. If the input is a sorted array, the binary search algorithm is the best choice. If we are allowed to use auxiliary memory, a hash table can help: with it, a value can be located by key in O(1) time.
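The O(1) hash-table lookup mentioned above is exactly what Python's dict provides; a small sketch (the names and values are made up for illustration):

```python
# Sequential search is O(n); a dict lookup is O(1) on average.
names = ['ada', 'alan', 'grace', 'edsger']
ages = {'ada': 36, 'alan': 41, 'grace': 85, 'edsger': 72}  # made-up values

print('grace' in names)   # O(n) scan through the list -> True
print(ages['grace'])      # O(1) hash lookup by key -> 85
print(ages.get('knuth'))  # missing key -> None
```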
9.1
Sequential Search
The following example illustrates the runtime of a sequential search for items stored in a collection. If the item is present, the best case is O(1), the average case is O(n/2), and the worst case is O(n). If the item is not present, all three cases are O(n):
[searching/sequential_search.py]
def sequential_search(seq, n):
    for item in seq:
        if item == n:
            return True
    return False

def test_sequential_search(module_name="this module"):
    seq = [1, 5, 6, 8, 3]
    n1 = 5
    n2 = 7
    assert(sequential_search(seq, n1) == True)
    assert(sequential_search(seq, n2) == False)
    s = "Tests in {name} have {con}!"
    print(s.format(name=module_name, con="passed"))
Now, if we sort the sequence first, we can improve the sequential search in the case when the item is not present, giving it the same runtimes as when the item is present:
[searching/ordered_sequential_search.py]
def ordered_sequential_search(seq, n):
    for item in seq:
        if item > n:
            return False
        if item == n:
            return True
    return False

def test_ordered_sequential_search(module_name="this module"):
    seq = [1, 2, 4, 5, 6, 8, 10]
    n1 = 10
    n2 = 7
    assert(ordered_sequential_search(seq, n1) == True)
    assert(ordered_sequential_search(seq, n2) == False)
    s = "Tests in {name} have {con}!"
    print(s.format(name=module_name, con="passed"))

if __name__ == "__main__":
    test_ordered_sequential_search()
9.2
Binary Search
A binary search finds the position of a specified input value (the key) within a sorted array. In each step, the algorithm compares the search key with the key of the middle element of the array. If the keys match, the item's index (position) is returned. Otherwise, if the search key is less than the middle element's key, the algorithm repeats the process on the left subarray; if the search key is greater, on the right subarray.
Note that the module returns the index after the key, which is where you should place the new value. Other available functions are bisect_right() and bisect_left().
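For reference, a minimal sketch of how the bisect module behaves on a sorted list with duplicate keys:

```python
from bisect import bisect, bisect_left, bisect_right, insort

seq = [1, 3, 4, 4, 6]

# bisect() is an alias for bisect_right(): insertion point *after* equal keys.
print(bisect(seq, 4))        # 4
print(bisect_left(seq, 4))   # 2, insertion point *before* equal keys
print(bisect_right(seq, 4))  # 4

# insort() inserts a value while keeping the list sorted.
insort(seq, 5)
print(seq)                   # [1, 3, 4, 4, 5, 6]
```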
9.3
Additional Exercises
Searching in a Matrix
The following module searches for an entry in a matrix where the rows and columns are sorted: every row is increasingly sorted from left to right, and every column is increasingly sorted from top to bottom. The runtime is linear, O(m + n):
[general_problems/numbers/search_entry_matrix.py]
def find_elem_matrix_bool(m1, value):
    found = False
    row = 0
    col = len(m1[0]) - 1
    while row < len(m1) and col >= 0:
        if m1[row][col] == value:
            found = True
            break
        elif m1[row][col] > value:
            col -= 1
        else:
            row += 1
    return found

def test_find_elem_matrix_bool(module_name="this module"):
    m1 = [[1, 2, 8, 9], [2, 4, 9, 12], [4, 7, 10, 13], [6, 8, 11, 15]]
    assert(find_elem_matrix_bool(m1, 8) == True)
    assert(find_elem_matrix_bool(m1, 3) == False)
    m2 = [[0]]
    assert(find_elem_matrix_bool(m2, 0) == True)
    s = "Tests in {name} have {con}!"
    print(s.format(name=module_name, con="passed"))

if __name__ == "__main__":
    test_find_elem_matrix_bool()
The following problem searches for an element in a matrix where, in every row, the values increase from left to right, and the last number in each row is smaller than the first number in the next row. The naive brute-force solution scans all numbers and costs O(mn). However, since the numbers are already sorted, the matrix can be viewed as a 1D sorted array and we can use binary search with efficiency O(ln mn):
[searching/searching_in_a_matrix.py]
import numpy

def searching_in_a_matrix(m1, value):
    rows = len(m1)
    cols = len(m1[0])
    lo = 0
    hi = rows*cols
    while lo < hi:
        mid = (lo + hi)//2
        row = mid//cols
        col = mid%cols
        v = m1[row][col]
        if v == value:
            return True
        elif v > value:
            hi = mid
        else:
            lo = mid + 1
    return False

def test_searching_in_a_matrix():
    a = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
    b = numpy.array([(1, 2), (3, 4)])
    assert(searching_in_a_matrix(a, 13) == True)
    assert(searching_in_a_matrix(a, 14) == False)
    assert(searching_in_a_matrix(b, 3) == True)
    assert(searching_in_a_matrix(b, 5) == False)
    print("Tests passed!")

if __name__ == "__main__":
    test_searching_in_a_matrix()
Unimodal Arrays
An array is unimodal if it consists of an increasing sequence followed by a decreasing sequence. The example below shows how to find the maximum of a unimodal array using binary search:
[searching/find_max_unimodal_array.py]
def find_max_unimodal_array(A):
    if len(A) <= 2:
        return None
    left = 0
    right = len(A) - 1
    while right > left + 1:
        mid = (left + right)//2
        if A[mid] > A[mid-1] and A[mid] > A[mid+1]:
            return A[mid]
        elif A[mid] > A[mid-1] and A[mid] < A[mid+1]:
            left = mid
        else:
            right = mid
    return None
def test_find_max_unimodal_array():
    seq = [1, 2, 5, 6, 7, 10, 12, 9, 8, 7, 6]
    assert(find_max_unimodal_array(seq) == 12)
    print("Tests passed!")

if __name__ == "__main__":
    test_find_max_unimodal_array()
Intersection of Arrays
The snippet below shows three ways to perform the intersection of two sorted arrays. The simplest way is to use sets; however, this will not preserve the ordering. The second example uses an adaptation of merge sort. The
third example is suitable when one of the arrays is much larger than the other. In this case, binary search is the best option:
[searching/intersection_two_arrays.py]
def intersection_two_arrays_sets(seq1, seq2):
    """find the intersection of two arrays using set properties"""
    set1 = set(seq1)
    set2 = set(seq2)
    return set1.intersection(set2)  # same as set1 & set2

def intersection_two_arrays_ms(seq1, seq2):
    """find the intersection of two arrays using merge sort"""
    res = []
    while seq1 and seq2:
        if seq1[-1] == seq2[-1]:
            res.append(seq1.pop())
            seq2.pop()
        elif seq1[-1] > seq2[-1]:
            seq1.pop()
        else:
            seq2.pop()
    res.reverse()
    return res

from binary_search import binary_search

def intersection_two_arrays_bs(seq1, seq2):
    """find the intersection of two arrays using binary search"""
    if len(seq1) > len(seq2):
        seq, key = seq1, seq2
    else:
        seq, key = seq2, seq1
    intersec = []
    for item in key:
        if binary_search(seq, item):
            intersec.append(item)
    return intersec
Chapter 10
Dynamic Programming
Dynamic programming simplifies a complicated problem by breaking it down into simpler subproblems by means of recursion. A problem may be solved by dynamic programming if it has an optimal substructure and overlapping subproblems. Optimal substructure means that the solution to a given optimization problem can be obtained from a combination of optimal solutions to its subproblems, so the first step is to check whether the problem exhibits such optimal substructure. The second step is to exploit the overlapping subproblems: each subproblem is solved only once, and its solution is stored to be retrieved later. In dynamic programming a choice is made at each step, the choice depending on the solutions to subproblems, built in a bottom-up manner from smaller subproblems to larger ones.
10.1
Memoization
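The recursive Fibonacci definition that the decorator below is applied to is not shown on this page; a standard naive version, which is exponential without memoization, might look like the following (the exact base-case convention used in the missing listing is an assumption here — this variant starts the sequence 1, 1, 2, 3, ...):

```python
def fibonacci(n):
    """Naive recursive Fibonacci: exponential time without memoization."""
    if n < 2:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # 89
```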
from functools import wraps

def memo(func):
    cache = {}
    @wraps(func)
    def wrap(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrap

>>> fibonacci = memo(fibonacci)
>>> fibonacci(130)
1066340417491710595814572169
10.2
Additional Exercises
from itertools import combinations

def naive_longest_inc_subseq(seq):
    """exponential solution to the longest increasing subsequence problem"""
    for length in range(len(seq), 0, -1):
        for sub in combinations(seq, length):
            if list(sub) == sorted(sub):
                return len(sub)
from bisect import bisect

def longest_inc_subseq1(seq):
    """iterative solution for the longest increasing subsequence problem"""
    end = []
    for val in seq:
        idx = bisect(end, val)
        if idx == len(end):
            end.append(val)
        else:
            end[idx] = val
    return len(end)
def longest_inc_subseq2(seq):
    """another iterative algorithm for the longest increasing subsequence
    problem"""
¹ See other versions of this problem at the end of the chapter about lists in Python.
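The body of longest_inc_subseq2 is cut off in this copy. A common "another iterative" formulation is the classic O(n²) dynamic-programming table, sketched here under that assumption (using the same non-strict comparison as the memoized version below):

```python
def longest_inc_subseq2(seq):
    """O(n^2) DP sketch: L[cur] holds the length of the longest
    increasing subsequence that ends at position cur."""
    if not seq:
        return 0
    L = [1] * len(seq)
    for cur in range(len(seq)):
        for pre in range(cur):
            if seq[pre] <= seq[cur]:
                L[cur] = max(L[cur], 1 + L[pre])
    return max(L)

print(longest_inc_subseq2([3, 1, 4, 1, 5, 9, 2, 6]))  # 4
```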
def memoized_longest_inc_subseq(seq):
    """memoized recursive solution to the longest increasing subsequence
    problem"""
    @memo
    def L(cur):
        res = 1
        for pre in range(cur):
            if seq[pre] <= seq[cur]:
                res = max(res, 1 + L(pre))
        return res
    return max(L(i) for i in range(len(seq)))
@benchmark
def test_naive_longest_inc_subseq():
    print(naive_longest_inc_subseq(s1))

@benchmark
def test_longest_inc_subseq1():
    print(longest_inc_subseq1(s1))

@benchmark
def test_longest_inc_subseq2():
    print(longest_inc_subseq2(s1))

@benchmark
def test_memoized_longest_inc_subseq():
    print(memoized_longest_inc_subseq(s1))
if __name__ == "__main__":
    from random import randrange
    s1 = [randrange(100) for i in range(40)]
    print(s1)
    test_naive_longest_inc_subseq()
    test_longest_inc_subseq1()
    test_longest_inc_subseq2()
    test_memoized_longest_inc_subseq()
Part III
Chapter 11
Introduction to Graphs
11.1 Basic Denitions
A graph is an abstract network, consisting of nodes (or vertices, V) connected by edges (or arcs, E). A graph can be defined as a pair of sets, G = (V, E), where the node set V is any finite set and the edge set E is a set of node pairs. For example, a graph can be represented simply by the node set V = {a, b, c, d} and the edge set E = {{a, b}, {b, c}, {c, d}, {d, a}}.
Direction of a Graph
If a graph has no direction, it is referred to as undirected. In this case, nodes with an edge between them are adjacent, and adjacent nodes are neighbors. If the edges have a direction, the graph is directed (a digraph); in this case, the graph has leaves. The edges are then no longer unordered: an edge between nodes u and v is now either an edge (u, v) from u to v, or an edge (v, u) from v to u. We can say that in a digraph G, the function E(G) is a relation over V(G).
Subgraphs
A subgraph of G consists of a subset of V and E. A spanning subgraph contains all the nodes of the original graph.
Completeness of a Graph
If all the nodes in a graph are pairwise adjacent, the graph is called complete.
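The example node and edge sets above can be written directly in Python; the helper function here is my own, for illustration:

```python
# G = (V, E) for the 4-cycle a-b-c-d-a described above.
V = {'a', 'b', 'c', 'd'}
E = {frozenset(p) for p in [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]}

def adjacent(u, v):
    """Edges are unordered pairs, so membership uses frozensets."""
    return frozenset((u, v)) in E

print(adjacent('a', 'b'))  # True
print(adjacent('a', 'c'))  # False, no edge across the diagonal
```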
The number of undirected edges incident on a node is called its degree; zero-degree nodes are called isolated. For directed graphs, we can split this number into the in-degree (incoming edges, from parents) and the out-degree (outgoing edges, to children).
Paths, Walks, and Cycles
A path in G is a subgraph where the edges connect the nodes in a sequence, without revisiting any node. In a directed graph, a path has to follow the directions of the edges. A walk is an alternating sequence of nodes and edges that allows nodes and edges to be visited multiple times. A cycle is like a path except that the last edge links the last node to the first.
Length of a Path
The length of a path or walk is the value given by its edge count.
Weight of an Edge
Associating weights with each edge in G gives us a weighted graph. The weight of a path or cycle is the sum of its edge weights, so for unweighted graphs it is simply the number of edges.
Planar Graphs
A graph that can be drawn on the plane without crossing edges is called planar. Such a graph has regions, which are areas bounded by the edges. Euler's formula for connected planar graphs says that V − E + F = 2, where V, E, F are the numbers of nodes, edges, and regions, respectively.
Graph Traversal
A traversal is a walk through all the connected components of a graph. The main difference between graph traversals is the ordering of the to-do list among the unvisited nodes that have been discovered.
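Euler's formula above can be checked numerically on small planar graphs; the counts below are standard examples, remembering that the unbounded outer area counts as a region:

```python
# V - E + F = 2 for any connected planar graph.

# The cube graph, drawn without crossings: 8 nodes, 12 edges, 6 regions.
V, E, F = 8, 12, 6
print(V - E + F)  # 2

# A triangle: 3 nodes, 3 edges, 2 regions (inside plus outside).
print(3 - 3 + 2)  # 2
```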
An undirected graph is connected if there is a path from every node to every other node. A directed graph is connected if its underlying undirected graph is connected. A connected component is a maximal subgraph that is connected. Connected components can be found using traversal algorithms such as depth-first search (DFS) or breadth-first search (BFS), as we will see in the following chapters. If there is a path from every node to every other node in a directed graph, the graph is called strongly connected, and a strongly connected component (SCC) is a maximal subgraph that is strongly connected.
Trees and Forests
A forest is a cycle-free graph, and a tree is a connected, cycle-free graph, so a forest consists of one or more trees. In a tree, any two nodes are connected by exactly one path: adding any new edge creates a cycle, and removing any edge yields unconnected components.
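A minimal sketch of finding connected components with a BFS-style traversal, assuming the dictionary-of-adjacency-sets representation introduced later in this chapter (the function name is my own):

```python
from collections import deque

def connected_components(graph):
    """graph: dict mapping node -> set of neighbors (undirected)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in graph[node] - seen:  # only undiscovered neighbors
                seen.add(nbr)
                queue.append(nbr)
        components.append(comp)
    return components

G = {'a': {'b'}, 'b': {'a'}, 'c': {'d'}, 'd': {'c'}, 'e': set()}
print(connected_components(G))  # three components: {a, b}, {c, d}, {e}
```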
11.2
A graph's neighborhood function, N(V), is a container (or iterable object) of all neighbors of V. The best-known data structures used to represent neighborhood functions are adjacency lists and adjacency matrices.
Adjacency Lists
For each node in an adjacency list, we have access to a list (or set, or other container or iterable) of its neighbors. Supposing we have n nodes, each adjacency (or neighbor) list is just a list of such numbers. We place the lists into a main list of size n, indexable by node number, where the order within each list is usually arbitrary.
Using Sets as Adjacency Lists: We can use Python's set type to implement adjacency lists:
>>> a,b,c,d,e,f = range(6) # nodes
>>> N = [{b,c,d,f}, {a,d,f}, {a,b,d,e}, {a,e}, {a,b,c}, {b,c,d,e}]
>>> b in N[a]  # membership
True
>>> b in N[b]  # membership
False
>>> len(N[f])  # degree
4
Using Lists as Adjacency Lists: We can also use Python's lists to implement adjacency lists, which lets you efficiently iterate over N(V) for any node V. Replacing sets with lists makes membership checking O(n). If all your algorithm does is iterate over neighbors, lists may be preferable; however, if the graph is dense (many edges), adjacency sets are the better solution:
>>> a,b,c,d,e,f = range(6)  # nodes
>>> N = [[b,c,d,f], [a,d,f], [a,b,d,e], [a,e], [a,b,c], [b,c,d,e]]
>>> b in N[a]  # membership
True
>>> b in N[b]  # membership
False
>>> len(N[f])  # degree
4
Deleting objects from the middle of a Python list is O(n), but deleting from the end is only O(1). If the order of neighbors is not important, you can delete an arbitrary neighbor in O(1) time by swapping it with the last item in the list and then calling pop().
Using Dictionaries as Adjacency Lists: Finally, we can use dictionaries as adjacency lists. In this case, the neighbors are the keys, and we can associate each of them with some extra value, such as an edge weight:
>>> a,b,c,d,e,f = range(6)  # nodes
>>> N = [{b:2,c:1,d:4,f:1}, {a:4,d:1,f:4}, {a:1,b:1,d:2,e:4},
...      {a:3,e:2}, {a:3,b:4,c:1}, {b:1,c:2,d:4,e:3}]
>>> b in N[a]  # membership
True
>>> len(N[f])  # degree
4
A more flexible approach for node labels is to use a dictionary as the main structure as well. For instance, we can use a dictionary with adjacency sets:
>>> N = {'a': set('bcdf'), 'b': set('adf'), 'c': set('abde'),
...      'd': set('ae'), 'e': set('abc'), 'f': set('bcde')}
>>> 'b' in N['a']  # membership
True
>>> len(N['f'])  # degree
4
Adjacency Matrices
In adjacency matrices, instead of listing all the neighbors for each node, we have one row with one position for each possible neighbor, filled with True and False values. The simplest implementation of an adjacency matrix is given by nested lists. Note that the diagonal is always False:
>>> a,b,c,d,e,f = range(6)  # nodes
>>> N = [[0,1,1,1,0,1], [1,0,0,1,0,1], [1,1,0,1,1,0],
...      [1,0,0,0,1,0], [1,1,1,0,0,0], [0,1,1,1,1,0]]
>>> N[a][b]  # membership
1
>>> N[a][e]
0
>>> sum(N[f])  # degree
4
An adjacency matrix for an undirected graph is always symmetric. To add weights to an adjacency matrix, we replace the True and False values with actual weights. In this case, non-existent edges can be represented by infinite weights (float('inf'), or None, −1, or a very large value):
>>> _ = float('inf')  # infinity, marking non-existent edges
>>> N = [[_,2,1,4,_,1], [4,_,_,1,_,4], [1,1,_,2,4,_],
...      [3,_,_,_,2,_], [3,4,1,_,_,_], [1,2,_,4,3,_]]
>>> N[a][b] < _  # membership
True
>>> sum(1 for w in N[f] if w < _)  # degree
4
Looking up an edge in an adjacency matrix is O(1), while iterating over a node's neighbors is O(n).
11.3
Introduction to Trees
While in a graph there may be multiple references to any given node, in a tree each node (data element) is referenced by at most one other node, its parent node. The root node is the node that has no parent. The nodes referenced by a parent node are called its children. A tree is said to be full and complete if all of its leaves are at the bottom and all of the non-leaf nodes have exactly two children.
Height or Depth of a Tree
The height (or depth) of a tree is the length of the path from the root to the deepest node in the tree. It is equal to the maximum level of any node in the tree. The depth of the root is zero. If the height of a tree is expressed as the log of the number of leaves, the integer part of the log may also be called the depth.
Level or Depth of a Node
The level (or depth) of a node is the length of the path from the root to that node. The set of all nodes at a given depth in a tree is also called a level of the tree.
Representing Trees
The simplest way of representing a tree is by nested lists:
>>> T = ['a', ['b', ['d', 'f']], ['c', ['e', 'g']]]
>>> T[0]
'a'
>>> T[1][0]
'b'
>>> T[1][1][0]
'd'
>>> T[1][1][1]
'f'
>>> T[2][0]
'c'
However, this becomes very hard to handle if we simply add a couple more branches. A better way to work with trees is to represent them as a collection of nodes. Let us start with a simple example, where we define a simple tree class with an attribute for the value, another for the children, and a method for printing the result:
[trees/trees/simple_tree/tree.py]
class SimpleTree(object):
    def __init__(self, value=None, children=None):
        self.value = value
        self.children = children or []

    def __repr__(self, level=0):
        ret = "\t"*level + repr(self.value) + "\n"
        for child in self.children:
            ret += child.__repr__(level+1)
        return ret

def main():
    """
    'a'
        'b'
            'd'
            'e'
        'c'
            'h'
            'g'
    """
    st = SimpleTree('a', [SimpleTree('b', [SimpleTree('d'), SimpleTree('e')]),
                          SimpleTree('c', [SimpleTree('h'), SimpleTree('g')])])
    print(st)

if __name__ == "__main__":
    main()
In the next chapter we will learn how to improve this class, including many features and methods that a tree can hold. For now, it is useful to
keep in mind that when we are prototyping data structures such as trees, we should always be able to come up with a flexible class that lets us specify arbitrary attributes in the constructor. The following program implements what is referred to as a bunch class: a generic tool that is a specialization of Python's dict class and that lets you create and set arbitrary attributes on the fly:
[trees/simple_trees/bunchclass.py]
class BunchClass(dict):
    def __init__(self, *args, **kwds):
        super(BunchClass, self).__init__(*args, **kwds)
        self.__dict__ = self

def main():
    """
    {'left': {'left': 'Buffy', 'right': 'Angel'},
     'right': {'left': 'Willow', 'right': 'Xander'}}
    """
    bc = BunchClass  # notice the absence of ()
    tree = bc(left=bc(left="Buffy", right="Angel"),
              right=bc(left="Willow", right="Xander"))
    print(tree)

if __name__ == "__main__":
    main()
In the example above, the function's arguments *args and **kwds can hold an arbitrary number of positional arguments and an arbitrary number of keyword arguments, respectively.
Chapter 12
Binary Trees
12.1 Basic Concepts
Binary trees are tree data structures where each node has at most two child nodes: the left and the right. Child nodes may contain references to their parents. The root of a tree (the ancestor of all nodes) can exist either inside or outside the tree. Binary trees can be seen as a way of passing an initial number n of tokens down, meaning that at any level of the tree the sum over all the nodes of that level is n. The degree of every node is at most two. Supposing that an arbitrary rooted tree has m internal nodes and each internal node has exactly two children, then if the tree has n leaves, counting the branches in two ways gives 2m = n + m − 1, hence m = n − 1; in general, a tree with n nodes has exactly n − 1 branches (edges).
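The counting argument above can be checked numerically; the sketch below uses a perfect binary tree of height h, where the leaf and internal-node counts are standard closed forms:

```python
def count_nodes_and_edges(h):
    """A perfect binary tree of height h has 2**h leaves and
    2**h - 1 internal nodes; every internal node contributes 2 edges."""
    leaves = 2 ** h
    internal = 2 ** h - 1
    nodes = leaves + internal
    edges = 2 * internal
    return nodes, edges, leaves, internal

nodes, edges, leaves, internal = count_nodes_and_edges(3)
print(nodes, edges)  # 15 14: a tree with n nodes has n - 1 edges
print(internal)      # 7 = leaves - 1, i.e. m = n - 1
```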
12.2
The simplest (and silliest) way to represent a binary tree is using Python's lists. The following module constructs a list with a root and two empty sublists for the children. To add a left subtree to the root of a tree, we insert a new list into the second position of the root list. Note that this algorithm is not very efficient, due to the restrictions that Python's lists have on inserting and popping in the middle:
Figure 12.1: The height (h) and width (number of leaves) of a (perfectly balanced) binary tree.
[trees/binary_trees/BT_lists.py]
def BinaryTreeList(r):
    return [r, [], []]

def insertLeft(root, newBranch):
    t = root.pop(1)
    if len(t) > 1:
        root.insert(1, [newBranch, t, []])
    else:
        root.insert(1, [newBranch, [], []])
    return root

def insertRight(root, newBranch):
    t = root.pop(2)
    if len(t) > 1:
        root.insert(2, [newBranch, [], t])
    else:
        root.insert(2, [newBranch, [], []])
    return root

def getRootVal(root):
    return root[0]

def setRootVal(root, newVal):
    root[0] = newVal
However, this method is not very practical when we have many branches (or at least it needs many improvements, for example in how it manages the creation of new lists and how it displays or searches for elements). A more natural way to handle binary trees is (again) to represent them as a collection of nodes. A simple node in a binary tree should carry attributes for the value and for the left and right children, and it can have a method to identify leaves:
[trees/binary_trees/BT.py]
class BT(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def is_leaf(self):
        return not self.left and not self.right
def tests_BT():
    """
          1
        2   3
       4 5 6 7
    """
    tree = BT(1)
    tree.insert_left(2)
    tree.insert_right(3)
    tree.left().insert_left(4)
    tree.left().insert_right(5)
    tree.right().insert_left(6)
    tree.right().insert_right(7)
    print(tree.right().right())
    tree.right().right().value(8)
    print(tree.right().right())
    assert(tree.right().is_leaf() == False)
    assert(tree.right().right().is_leaf() == True)
    print("Tests Passed!")

if __name__ == "__main__":
    tests_BT()
12.3
A binary search tree (BST) is a node-based binary tree data structure with the following properties:
1. The left subtree of a node contains only nodes with keys less than the node's key.
2. The right subtree of a node contains only nodes with keys greater than the node's key.
3. Both the left and right subtrees must also be binary search trees.
4. There must be no duplicate nodes.
If the binary search tree is balanced, the following operations are O(ln n): (i) finding a node with a given value (lookup), (ii) finding a node with the maximum or minimum value, and (iii) insertion or deletion of a node.
    def insert(self, value):
        if self.value is None:
            self.value = value
        else:
            if value > self.value:
                self.right = self.right and self.right.insert(value) or BST(value)
            else:
                self.left = self.left and self.left.insert(value) or BST(value)
        return self

    def find(self, value):
        if value == self.value:
            return self
        elif value > self.value and self.right:
            return self.right.find(value)
        elif self.left:
            return self.left.find(value)
        return None
def main():
    """
          4
        2   6
       1 3 5 7
    """
    tree = BST()
    tree.insert(4)
    tree.insert(2)
    tree.insert(6)
    tree.insert(1)
    tree.insert(3)
    tree.insert(7)
    tree.insert(5)
    print(tree.get_right())
    print(tree.get_right().get_left())
    print(tree.get_right().get_right())
    print(tree.get_left())
    print(tree.get_left().get_left())
    print(tree.get_left().get_right())
    assert(tree.find(30) == None)
There are many other ways a tree can be created. We could, for instance, use two classes: one simply for the nodes and a second one to control those nodes. This is not much different from the previous example (and at the end of this chapter we will see a third, hybrid, example of these two):
[trees/binary_trees/BST_with_Nodes.py]

class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def __repr__(self):
        return "{}".format(self.value)

class BSTwithNodes(object):
    def __init__(self):
        self.root = None

    def insert(self, value):
        if not self.root:
            self.root = Node(value)
        else:
            current = self.root
            while True:
                if value < current.value:
                    if current.left:
                        current = current.left
                    else:
                        current.left = Node(value)
                        break
                elif value > current.value:
                    if current.right:
                        current = current.right
                    else:
                        current.right = Node(value)
                        break
                else:
                    break

def main():
    """
    BST:
            4
          2   6
         1 3 5 7
    """
    tree = BSTwithNodes()
    l1 = [4, 2, 6, 1, 3, 7, 5]
    for i in l1:
        tree.insert(i)
    print(tree.root)
    print(tree.root.right)
    print(tree.root.right.left)
    print(tree.root.right.right)
    print(tree.root.left)
    print(tree.root.left.left)
    print(tree.root.left.right)

if __name__ == '__main__':
    main()
12.4 Self-Balancing BSTs
A balanced tree is a tree where the difference between the heights of the two subtrees of every node is at most 1. A self-balancing binary search tree is any node-based binary search tree that automatically keeps itself balanced. By enforcing a balance condition we ensure that the worst-case runtime of the common tree operations is at most O(log n).

Balancing Factor of a Tree

A balancing factor can be attributed to each internal node in a tree: the difference between the heights of its left and right subtrees. There are many balancing methods for trees, but they are usually based on two operations:

Node splitting (and merging): nodes are not allowed to have more than two children, so when a node becomes overfull it splits into two.
Node rotations: the process of switching edges. If x is the parent of y, we make y the parent of x, and x must take over one of the children of y.
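The balancing factor itself is simple to compute. A minimal sketch of the definition (the tuple-based `Node` type here is made up just for illustration):

```python
from collections import namedtuple

# Hypothetical throwaway node type, just for illustration.
Node = namedtuple("Node", "value left right")

def height(node):
    """Height of a subtree; an empty subtree has height -1 by convention."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """Left height minus right height; an AVL tree keeps this in {-1, 0, 1}."""
    return height(node.left) - height(node.right)

# A left-leaning chain 1 -> 2 -> 3 has balance factor 2 at the root,
# which an AVL tree would repair with a rotation:
leaf = Node(3, None, None)
chain = Node(1, Node(2, leaf, None), None)
```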
AVL Trees
An AVL tree is a binary search tree with a self-balancing condition: the difference between the heights of the left and right subtrees cannot be more than one. To implement an AVL tree, we can start by adding a self-balancing method to our BST classes, called every time we add a new node to the tree. The method works by continuously checking the height of the tree, which is stored as a new attribute:
def height(node):
    if node is None:
        return -1
    else:
        return node.height

def update_height(node):
    node.height = max(height(node.left), height(node.right)) + 1
Now we can implement the rebalancing method for our tree. The method checks whether the difference between the new heights of the right and left subtrees is at most 1. If it is not, it performs the rotations:
def rebalance(self, node):
    while node is not None:
        update_height(node)
        if height(node.left) >= 2 + height(node.right):
            if height(node.left.left) >= height(node.left.right):
                self.right_rotate(node)
            else:
                self.left_rotate(node.left)
                self.right_rotate(node)
        elif height(node.right) >= 2 + height(node.left):
            if height(node.right.right) >= height(node.right.left):
                self.left_rotate(node)
            else:
                self.right_rotate(node.right)
                self.left_rotate(node)
        node = node.parent  # walk up toward the root (nodes keep parent links)
The rotation methods are straightforward: each takes a node and rotates it with its right or left child:
def left_rotate(self, x):
    # note: this assumes each node stores a reference to its parent
    y = x.right
    y.parent = x.parent
    if y.parent is None:
        self.root = y
    else:
        if y.parent.left is x:
            y.parent.left = y
        elif y.parent.right is x:
            y.parent.right = y
    x.right = y.left
    if x.right is not None:
        x.right.parent = x
    y.left = x
    x.parent = y

def right_rotate(self, x):
    y = x.left
    y.parent = x.parent
    if y.parent is None:
        self.root = y
    else:
        if y.parent.left is x:
            y.parent.left = y
        elif y.parent.right is x:
            y.parent.right = y
    x.left = y.right
    if x.left is not None:
        x.left.parent = x
    y.right = x
    x.parent = y
We are now ready to write the entire AVL tree class! In the following code we use our old BST class as a superclass, together with the methods described above. In addition, two traversal methods are used, which we will explain in more detail in the next chapter. For now, it is good to keep this example in mind and to note that this AVL tree indeed supports insert, find, and delete-min operations in O(log n) time:
[trees/binary_trees/avl.py]

from BST_with_Nodes import BSTwithNodes, Node

class AVL(BSTwithNodes):
    def __init__(self):
        self.root = None

    def left_rotate(self, x):
        y = x.right
        y.parent = x.parent
        if y.parent is None:
            self.root = y
        else:
            if y.parent.left is x:
                y.parent.left = y
            elif y.parent.right is x:
                y.parent.right = y
        x.right = y.left
        if x.right is not None:
            x.right.parent = x
        y.left = x
        x.parent = y
        update_height(x)
        update_height(y)

    def right_rotate(self, x):
        y = x.left
        y.parent = x.parent
        if y.parent is None:
            self.root = y
        else:
            if y.parent.left is x:
                y.parent.left = y
            elif y.parent.right is x:
                y.parent.right = y
        x.left = y.right
        if x.left is not None:
            x.left.parent = x
        y.right = x
        x.parent = y
        update_height(x)
        update_height(y)
    def insert_item(self, value):
        # returns the node that was inserted (or found), so that
        # rebalance() can walk up from it
        if self.root is None:
            self.root = Node(value)
            return self.root
        current = self.root
        while True:
            if value < current.value:
                if current.left:
                    current = current.left
                else:
                    current.left = Node(value)
                    current.left.parent = current
                    return current.left
            elif value > current.value:
                if current.right:
                    current = current.right
                else:
                    current.right = Node(value)
                    current.right.parent = current
                    return current.right
            else:
                return current

    def insert(self, value):
        node = self.insert_item(value)
        self.rebalance(node)
    def rebalance(self, node):
        while node is not None:
            update_height(node)
            if height(node.left) >= 2 + height(node.right):
                if height(node.left.left) >= height(node.left.right):
                    self.right_rotate(node)
                else:
                    self.left_rotate(node.left)
                    self.right_rotate(node)
            elif height(node.right) >= 2 + height(node.left):
                if height(node.right.right) >= height(node.right.left):
                    self.left_rotate(node)
                else:
                    self.right_rotate(node.right)
                    self.left_rotate(node)
            node = node.parent  # walk up toward the root
Red-black Trees
Red-black trees are an evolution of binary search trees that aim to keep the tree balanced without affecting the complexity of the primitive operations. This is done by coloring each node in the tree either red or black and preserving a set of properties that guarantees that the deepest path in the tree is no longer than twice the shortest one. Red-black trees have the following properties:

1. Every node is colored either red or black.
2. All leaf (nil) nodes are colored black; if a node's child is missing, we assume that it has a nil child in that place, and this nil child is always colored black.
3. Both children of a red node must be black nodes.
4. Every path from a node n to a descendant leaf contains the same number of black nodes (not counting node n). We call this number the black height of n.
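The black-height property lends itself to a recursive check. A hedged sketch, assuming a made-up `(color, left, right)` tuple encoding in which `None` stands for the black nil leaves:

```python
def black_height(node):
    """Return the black height of a subtree, or None if the red-black
    properties are violated somewhere below. Nodes are hypothetical
    (color, left, right) tuples; nil children are None and count as black."""
    if node is None:
        return 1  # nil leaves are black
    color, left, right = node
    lh = black_height(left)
    rh = black_height(right)
    if lh is None or rh is None or lh != rh:
        return None  # black counts disagree somewhere below
    if color == "red":
        # a red node must have two black children
        for child in (left, right):
            if child is not None and child[0] == "red":
                return None
        return lh
    return lh + 1

# A black root with two red children is valid (black height 2, counting nils)...
valid = ("black", ("red", None, None), ("red", None, None))
# ...but a red node with a red child violates property 3.
invalid = ("black", ("red", ("red", None, None), None), ("red", None, None))
```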
Binary Heaps
Binary heaps are complete balanced binary trees. The heap property makes it easier to maintain the structure, i.e., the balance of the tree. There is no need to modify the structure of the tree by splitting or rotating nodes in a heap: the only operation needed is swapping parent and child nodes. In a binary heap, the root (the smallest or largest element) is always found in h[0]. For a node at index i, the parent is found at index (i - 1)/2 (using integer division), and its children at indices 2i + 1 and 2i + 2.
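This index arithmetic can be sketched directly, and Python's standard `heapq` module maintains exactly such a min-heap on a plain list (the helper names below are made up for illustration):

```python
import heapq

def parent(i):
    """Index of the parent of node i in an array-based binary heap."""
    return (i - 1) // 2

def children(i):
    """Indices of the left and right children of node i."""
    return 2 * i + 1, 2 * i + 2

# heapq keeps the heap property on a plain list: h[0] is always the
# minimum, and every node is <= its children at the computed indices.
h = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(h, x)

assert h[0] == min(h)
for i in range(1, len(h)):
    assert h[parent(i)] <= h[i]
```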
12.5 Additional Exercises
Implementation of a binary tree and its properties. For example, the following bt:

                1           ---> level 0
              2   3         ---> level 1
            4   5           ---> level 2
          6   7             ---> level 3
        8   9               ---> level 4
has the following properties:

SIZE OR NUMBER OF NODES: n = 9
NUMBER OF BRANCHES OR INTERNAL NODES: b = n - 1 = 8
VALUE OF ROOT = 1
MAX_DEPTH OR HEIGHT: h = 4
IS BALANCED? NO
IS BST? NO
PREORDER DFT: 1, 2, 4, 6, 8, 9, 7, 5, 3
INORDER DFT: 8, 6, 9, 4, 7, 2, 5, 1, 3
POSTORDER DFT: 8, 9, 6, 7, 4, 5, 2, 3, 1
BFT: 1, 2, 3, 4, 5, 6, 7, 8, 9

from collections import deque

class NodeBT(object):
    def __init__(self, item=None, level=0):
        """Constructor for a Node in the Tree"""
        self.item = item
        self.level = level
        self.left = None
        self.right = None
        self.traversal = []
    def _addNextNode(self, value, level_here=1):
        """Aux for self.addNode(value)"""
        self.traversal = []
        new_node = NodeBT(value, level_here)
        if not self.item:
            self.item = new_node
        elif not self.left:
            self.left = new_node
        elif not self.right:
            self.right = new_node
        else:
            self.left = self.left._addNextNode(value, level_here + 1)
        return self
    # METHODS TO PRINT/SHOW NODE ATTRIBUTES
    def __repr__(self):
        """Private method for this class' string representation"""
        return "{}".format(self.item)

    def _getDFTpreOrder(self, node):
        """Traversal pre-order, O(n)"""
        if node:
            if node.item:
                self.traversal.append(node.item)
            self._getDFTpreOrder(node.left)
            self._getDFTpreOrder(node.right)
        return self

    def _printDFTpreOrder(self, noderoot):
        """Fill the pre-order traversal array"""
        self.traversal = []
        self._getDFTpreOrder(noderoot)
        return self.traversal

    def _getDFTinOrder(self, node):
        """Traversal in-order, O(n)"""
        if node:
            self._getDFTinOrder(node.left)
            if node.item:
                self.traversal.append(node.item)
            self._getDFTinOrder(node.right)
    def _printDFTpostOrder(self, noderoot):
        """Fill the post-order traversal array"""
        self.traversal = []
        self._getDFTpostOrder(noderoot)
        return self.traversal

    def _searchForNode(self, value):
        """Traverse the tree looking for the node"""
        if self.item == value:
            return self
        else:
            # search both subtrees, left first
            found = None
            if self.left:
                found = self.left._searchForNode(value)
            if not found and self.right:
                found = self.right._searchForNode(value)
            return found
class BinaryTree(object):
    """
    >>> bt = BinaryTree()
    >>> for i in range(1, 10): bt.addNode(i)
    >>> bt.hasNode(7)
    True
    >>> bt.hasNode(12)
    False
    >>> bt.printTree()
    [1, 2, 4, 6, 8, 9, 7, 5, 3]
    >>> bt.printTree('pre')
    [1, 2, 4, 6, 8, 9, 7, 5, 3]
    >>> bt.printTree('bft')
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> bt.printTree('post')
    [8, 9, 6, 7, 4, 5, 2, 3, 1]
    >>> bt.printTree('in')
    [8, 6, 9, 4, 7, 2, 5, 1, 3]
    """
    def __init__(self):
        """Constructor for the Binary Tree, which is a container of Nodes"""
        self.root = None
    def getNodeLevel(self, item):
        """Return the level of the node, best O(1), worst O(n)"""
        node = self.root._searchForNode(item)
        if node:
            return node.level
        else:
            raise Exception("Node not found")

    def getSizeTree(self):
        """Return how many nodes are in the tree, O(n)"""
        return len(self.root._printDFTpreOrder(self.root))

    def isRoot(self, value):
        if (value1right and not value2right) or \
           (value2right and not value1right):
            return root
        else:
            return self._getAncestorPrePost(preorder,
                                            postorder[:i] + postorder[i+1:],
                                            value1, value2)

    def _getAncestorInPost(self, inorder, postorder, value1, value2):
        """Return the ancestor of two nodes with in- and post-order"""
        root = postorder[-1]
        postorder = postorder[:-1]
        value1left, value2left = False, False
        i = 0
        item = inorder[i]
        while i < len(inorder) and item != root:
            if item == value1:
                value1left = True
            elif item == value2:
                value2left = True
            i += 1
            item = inorder[i]
        if (value1left and not value2left) or \
           (value2left and not value1left):
            return root
        else:
            return self._getAncestorInPost(inorder[:i] + inorder[i+1:],
                                           postorder, value1, value2)

    def _getAncestorBST(self, preorder, value1, value2):
        """Return the ancestor of two nodes if the tree is a BST"""
        while preorder:
            current = preorder[0]
            if current < value1:
                try:
                    preorder = preorder[2:]
                except:
                    return current
Implementation of a binary search tree and its properties. We use the BinaryTree class and its Node class as superclasses, and we modify the methods that are needed to create a BST (polymorphism). For example, the following bst:

                7            ---> level 0
            4       9        ---> level 1
          2   5   8   10     ---> level 2
         1     6             ---> level 3
has the following properties:

SIZE OR NUMBER OF NODES: n = 10
NUMBER OF BRANCHES OR INTERNAL NODES: b = n - 1 = 9
VALUE OF ROOT = 7
MAX_DEPTH OR HEIGHT: h = 3
IS BALANCED? YES
IS BST? YES
PREORDER DFT: 7, 4, 2, 1, 5, 6, 9, 8, 10
INORDER DFT: 1, 2, 4, 5, 6, 7, 8, 9, 10
POSTORDER DFT: 1, 2, 6, 5, 4, 8, 10, 9, 7
BFT: 7, 4, 9, 2, 5, 8, 10, 1, 6
class NodeBST(NodeBT):
    def _addNextNode(self, value, level_here=1):
        """Aux for self.addNode(value); for a BST, best O(1), worst O(log n)"""
        self.traversal = []
        new_node = NodeBST(value, level_here)
        if not self.item:
            self.item = new_node
        elif value < self.item:
            self.left = self.left and \
                self.left._addNextNode(value, level_here + 1) or new_node
        else:
            self.right = self.right and \
                self.right._addNextNode(value, level_here + 1) or new_node
        return self
    def _searchForNode(self, value):
        """Traverse the tree looking for the node; for a BST it is O(log n)"""
        if self.item == value:
            return self
        elif value > self.item and self.right:
            return self.right._searchForNode(value)
        elif value < self.item and self.left:
            return self.left._searchForNode(value)
        return None
class BinarySearchTree(BinaryTree):
    """
    >>> bst = BinarySearchTree()
    >>> l1 = [7, 4, 5, 9, 2, 8, 1, 6, 10]
    >>> for i in l1: bst.addNode(i)
    >>> bst.hasNode(3)
    False
    >>> bst.hasNode(10)
    True
    >>> bst.printTree('pre')
    [7, 4, 2, 1, 5, 6, 9, 8, 10]
    >>> bst.printTree('post')
    [1, 2, 6, 5, 4, 8, 10, 9, 7]
    >>> bst.printTree('in')
    [1, 2, 4, 5, 6, 7, 8, 9, 10]
    >>> bst.printTree('bft')
    [7, 4, 9, 2, 5, 8, 10, 1, 6]
    >>> bst.getHeight()
    3
    >>> bst.isBST()
    True
    >>> bst.isBalanced()
    False
    >>> bst.isBalanced(2)
    False
    >>> bst.getAncestor(2, 9)
    7
    >>> bst.getAncestor(2, 9, 'bst')
    7
    >>> bst.getAncestor(2, 9, 'pre-post')
    7
    >>> bst.getAncestor(2, 9, 'post-in')
    7
    def addNode(self, value):
        """Add a new node to the tree, by the left first, O(n). The only
        difference from the BinaryTree class is that the node class is
        NodeBST, not NodeBT."""
        if not self.root:
            self.root = NodeBST(value)
        else:
            self.root._addNextNode(value)

if __name__ == '__main__':
    import doctest
    doctest.testmod()
Chapter 13

Traversals and Problems on Graphs and Trees
13.1 Depth-First Search
Depth-first traversal, or depth-first search (DFS), algorithms search as deep as possible first in a graph or a tree. The difference between the two cases is that, for graphs, it is necessary to mark nodes as visited (otherwise we might get stuck in a loop). DFS algorithms are called once for every node that is reachable from the start node, looking at its successors. The runtime is O(number of reachable nodes + total number of outgoing edges from these nodes) = O(V + E).

Figure 13.1: Binary tree traversals: preorder, inorder, postorder, and breadth-first search.

DFSs are usually implemented using a LIFO structure such as a stack to keep track of the discovered nodes, and they can be divided into three different strategies:

Preorder: visit a node before traversing its subtrees (root → left → right):
def preorder(root):
    if root is not None:
        yield root.value
        yield from preorder(root.left)
        yield from preorder(root.right)
Postorder: visit a node after traversing all of its subtrees (left → right → root):
def postorder(root):
    if root is not None:
        yield from postorder(root.left)
        yield from postorder(root.right)
        yield root.value
Inorder: visit a node after traversing its left subtree but before its right subtree (left → root → right):
def inorder(root):
    if root is not None:
        yield from inorder(root.left)
        yield root.value
        yield from inorder(root.right)
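For graphs, the visited-set bookkeeping mentioned above is essential. A minimal iterative sketch with an explicit stack (the adjacency list here is made up for illustration):

```python
def dfs(graph, start):
    """Iterative depth-first search over an adjacency list,
    marking nodes as visited so cycles do not loop forever."""
    visited, stack = set(), [start]
    order = []
    while stack:
        node = stack.pop()          # LIFO: deepest discovered node first
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # push successors; reversed() keeps a left-to-right visit order
        stack.extend(reversed(graph.get(node, [])))
    return order

# A small cyclic graph: note the back edge 'd' -> 'a'.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"]}
```

Without the `visited` set, the back edge from `'d'` to `'a'` would make this loop forever; with it, each node is expanded exactly once.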
13.2 Breadth-First Search
Breadth-first traversal, or breadth-first search (BFS), algorithms yield the values of all nodes of a particular depth before going to any deeper node. Problems that use BFS usually ask for the fewest number of steps (or the shortest path) needed to reach a certain end point from the starting point. Traditionally, BFSs are implemented using a list to store the values of the visited nodes and a FIFO queue to store those nodes that have yet to be visited. The total runtime is also O(V + E).
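This FIFO scheme can be sketched in a few lines with `collections.deque` (the `Node` class below is a throwaway stand-in for the node classes used elsewhere in this book):

```python
from collections import deque

class Node:
    """Stand-in node class, just for illustration."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def bfs(root):
    """Level-by-level traversal using a FIFO queue."""
    if root is None:
        return []
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()      # FIFO: shallowest pending node first
        order.append(node.value)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return order

# The complete tree from the start of the previous chapter.
root = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6), Node(7)))
```

Swapping `popleft()` for `pop()` would turn this into a DFS: the queue discipline alone decides the visit order.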
13.3 Implementing Traversals
There are many ways we could write traversals. In the following code we use the BST with nodes class, defined in the last chapter, to implement each of the traversal algorithms. For the DFS cases, we have also tested two different methods:
[trees/traversals/BST_with_Nodes_traversal.py]

from BST_with_Nodes import BSTwithNodes, Node

class BSTTraversal(BSTwithNodes):
    def __init__(self):
        self.root = None
        self.nodes_BFS = []
        self.nodes_DFS_pre = []
        self.nodes_DFS_post = []
        self.nodes_DFS_in = []

    def BFS(self):
        self.root.level = 0
        queue = [self.root]
        current_level = self.root.level
        while len(queue) > 0:
            current_node = queue.pop(0)
            if current_node.level > current_level:
                current_level += 1
            self.nodes_BFS.append(current_node.value)
            if current_node.left:
                current_node.left.level = current_level + 1
                queue.append(current_node.left)
            if current_node.right:
                current_node.right.level = current_level + 1
                queue.append(current_node.right)
13.4 Additional Exercises
More Traversals in a BST As an alternative to the example shown in this chapter, in the following program we implement traversals for our old BST class (without nodes):
[traversals/BST_traversal.py]

from BST import BST

class TranversalBST(object):
    def __init__(self):
        self.bst = BST(None)
        self.nodes = []

    def insert(self, value):
        if not self.bst.value:
            self.bst.value = value
        else:
            self.bst.insert(value)

    def contains(self, value):
        return bool(self.bst.find(value))

    def get(self, index):
        for i, value in enumerate(self.inorder()):
            if i == index:
                return value

    def inorder(self):
        current = self.bst
        self.nodes = []
        stack = []
        while len(stack) > 0 or current is not None:
            if current is not None:
                stack.append(current)
                current = current.left
            else:
                current = stack.pop()
                self.nodes.append(current.value)
                current = current.right
        return self.nodes

    def preorder(self):
        # recursive version (the iterative variant is preorder2 below)
        self.nodes = []
        def _visit(node):
            if node is not None and node.value is not None:
                self.nodes.append(node.value)
                _visit(node.left)
                _visit(node.right)
        _visit(self.bst)
        return self.nodes
    def preorder2(self):
        self.nodes = []
        current = self.bst
        stack = []
        while len(stack) > 0 or current is not None:
            if current is not None:
                self.nodes.append(current.value)
                stack.append(current)
                current = current.left
            else:
                current = stack.pop()
                current = current.right
        return self.nodes
def main():
    """
            10
          5    15
         1 6  11 50
    """
    t = TranversalBST()
    t.insert(10)
    t.insert(5)
    t.insert(15)
    t.insert(1)
    t.insert(6)
    t.insert(11)
    t.insert(50)
    print(t.preorder())
    print(t.preorder2())
    print(t.inorder())

if __name__ == '__main__':
    main()
Balance and Depth in a BST In the following example we extend the previous class with methods to find the maximum and minimum depths, to check whether the tree is balanced, and to find a key's position in the preorder and inorder traversals:
[trees/traversals/BST_extra_methods.py]

from BST_traversal import TranversalBST
from BST import BST

class BSTwithExtra(TranversalBST):
    def __init__(self):
        self.bst = BST(None)
        self.nodes = []

    def get_inorder(self, k):
        for i, value in enumerate(self.inorder()):
            if value == k:
                return i

    def get_preorder(self, k):
        for i, value in enumerate(self.preorder()):
            if value == k:
                return i

    def is_balanced(self, threshold=1):
        maxd = self.get_max_depth()
        mind = self.get_min_depth()
        print("Max depth: " + str(maxd))
        print("Min depth: " + str(mind))
        return maxd - mind

    def get_min_depth(self, node=None, initial=1):
        if node is None and initial == 1:
            node = self.bst
        if node.left and node.right:
            return 1 + min(self.get_min_depth(node.left, 0),
                           self.get_min_depth(node.right, 0))
        else:
def main():
    """
            10
          5    15
         1 6  11 50
                    60
                       70
                          80
    """
    t = BSTwithExtra()
    l1 = [10, 5, 15, 1, 6, 11, 50, 60, 70, 80]
    for i in l1:
        t.insert(i)
    print(t.inorder())
    print(t.preorder())
    assert(t.get_max_depth() == 5)
    assert(t.get_min_depth() == 2)
    assert(t.is_balanced() == 3)
    assert(t.get_inorder(10) == 3)
    assert(t.get_preorder(10) == 0)
Ancestor in a BST The example below finds the lowest-level common ancestor of two nodes in a binary search tree:
[trees/traversals/BST_ancestor.py]

from BST_traversal import TranversalBST

def find_ancestor(path, low_value, high_value):
    """Find the lowest common ancestor in a BST, given its preorder path"""
    while path:
        current_value = path[0]
        if current_value < low_value:
            try:
                path = path[2:]
            except:
                return current_value
        elif current_value > high_value:
            try:
                path = path[1:]
            except:
                return current_value
        elif low_value <= current_value <= high_value:
            return current_value
    return None
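For comparison, the same ancestor can be found without materializing any traversal: walk down from the root and stop at the first node whose value falls between the two keys. A hedged sketch on made-up `(value, left, right)` tuples, assuming both keys are present and `low <= high`:

```python
def lca_bst(node, low, high):
    """Lowest common ancestor of two keys in a BST, walking from the root.
    Nodes are hypothetical (value, left, right) tuples."""
    while node is not None:
        value, left, right = node
        if high < value:
            node = left          # both keys lie in the left subtree
        elif low > value:
            node = right         # both keys lie in the right subtree
        else:
            return value         # the keys straddle (or equal) this node

# The BST from the earlier examples: 10 with subtrees (5: 1, 6) and (15: 11, 50).
tree = (10, (5, (1, None, None), (6, None, None)),
            (15, (11, None, None), (50, None, None)))
```

Because each step descends one level, this runs in O(h) time for a tree of height h, with no extra space for a traversal list.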
Bibliography
Websites:
[Interactive Python] http://interactivepython.org
[My free GIT Repository] https://bitbucket.org/steinkirch/exercises-in-python: All the examples in this book are there, plus many exercises that I have not included here.
Books:
[A nice Book for Software Eng. Interviews] Cracking the Coding Interview, Gayle Laakmann McDowell, 2013
[A nice Python 3 Book] Programming in Python 3: A Complete Introduction to the Python 3.1 Language, Mark Summerfield, 2011
[A nice Python Book] Learn Python The Hard Way, Zed A. Shaw, 2010
[A nice Algorithms Book] Mastering Basic Algorithms in the Python Language, Magnus Lie Hetland, 2010
[Another nice Algorithms Book] The Algorithm Design Manual, S.S. Skiena, 2008
[Another nice Python Book] Python 3 Object Oriented Programming, Dusty Phillips, 2010
[Another nice guide for Software Eng. Interviews] Programming Pearls, Jon Bentley, 1986