Strings in Python
CS100 / CS101
Outline
●
Unicode
●
Common Sequence Operations
●
String Indexing
●
split, join, partition
String
●
immutable sequence of Unicode code points
●
can include letters, diacritical marks (é, ï, ô, ...),
numbers, currency symbols, emoji,
punctuation, space and line break characters,
and more.
●
Example Devanagari string
üŋíc0de
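A minimal sketch of what "immutable sequence of code points" means in practice. The sample text is the Latin-script example above, written with escapes so the code points are unambiguous; the variable names are illustrative only.

```python
# "\u00fc\u014b\u00edc0de" spells the sample text üŋíc0de.
text = "\u00fc\u014b\u00edc0de"

# len() counts code points, not bytes.
code_point_count = len(text)
byte_count = len(text.encode("utf-8"))

# Strings are immutable: item assignment raises TypeError.
try:
    text[0] = "u"
    mutated = True
except TypeError:
    mutated = False

# "Changing" a string really means building a new one.
new_text = "u" + text[1:]

print(code_point_count, byte_count, mutated, new_text)
```

The original `text` is left untouched; `new_text` is a separate string object.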
Unicode
●
Unicode 14.0: 144,697 characters, 159 modern and historic scripts,
symbols, emoji, non-visual control and formatting
codes
●
Standard: How to store and display
– Normalization rules, decomposition, collation, rendering,
and bidirectional text display order, ...
– Encoding, representation
●
ASCII vs. Unicode
https://home.unicode.org, Unicode wikipage
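The difference between a code point, its encoding, and its normalization forms can be explored from the standard library. The sketch below uses one accented letter as an assumed example; it is only an illustration of the ideas on this slide, not a full treatment.

```python
import unicodedata

letter = "\u00e9"  # é as a single precomposed code point

code_point = ord(letter)               # code point number
same_letter = chr(code_point)          # chr() is the inverse of ord()
name = unicodedata.name(letter)        # official Unicode character name
utf8_bytes = letter.encode("utf-8")    # encoding: code point -> bytes

# ASCII only covers code points 0..127, so é cannot be encoded as ASCII.
try:
    letter.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# Normalization: NFD splits é into 'e' plus a combining accent,
# NFC recombines them into one code point.
decomposed = unicodedata.normalize("NFD", letter)
recomposed = unicodedata.normalize("NFC", decomposed)

print(code_point, name, utf8_bytes, fits_in_ascii, len(decomposed))
```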
Common Sequence Operations
●
These operations are supported by most sequence
types in Python (str, list, tuple, range, ...)
String Indexing
 N   a   t   i   o   n
 0   1   2   3   4   5
-6  -5  -4  -3  -2  -1
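The index table above can be tried out directly. This is a small sketch using the same word; the variable names are illustrative.

```python
word = "Nation"

first = word[0]             # index 0 is the first character
last = word[-1]             # index -1 is the last character
same_char = word[5] == word[-1]

middle = word[1:4]          # from index 1 up to, but not including, index 4
every_other = word[::2]     # step of 2
backwards = word[::-1]      # negative step walks the string in reverse

# Slices clamp to the string's bounds; a single index does not.
tail = word[2:100]
try:
    word[6]
    index_ok = True
except IndexError:
    index_ok = False

print(first, last, middle, every_other, backwards, tail, index_ok)
```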
Common Sequence Operations
x in s                 True if an item of s is equal to x, else False
x not in s             False if an item of s is equal to x, else True
s + t                  the concatenation of s and t
s * n or n * s         equivalent to adding s to itself n times
s[i]                   ith item of s, origin 0
s[i:j]                 slice of s from i to j
s[i:j:k]               slice of s from i to j with step k
len(s)                 length of s
min(s)                 smallest item of s
max(s)                 largest item of s
s.index(x[, i[, j]])   index of the first occurrence of x in s (at or after index i and before index j)
s.count(x[, i[, j]])   total number of occurrences of x in s (for strings, optionally between index i and index j)
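The operations in the table can be applied to a string as follows. The sample word is an assumption chosen for illustration. Note that for strings, `in` also matches substrings, not just single characters.

```python
s = "banana"

has_nan = "nan" in s            # substring test on strings
no_x = "x" not in s
joined = s + "!"                # concatenation
repeated = "ab" * 3             # repetition
length = len(s)
smallest, largest = min(s), max(s)   # compared by code point
first_n = s.index("n")          # first occurrence
later_n = s.index("n", 3)       # first occurrence at or after index 3
count_a = s.count("a")
count_ana = s.count("ana")      # occurrences are counted without overlap

print(has_nan, no_x, joined, repeated, length, smallest, largest)
print(first_n, later_n, count_a, count_ana)
```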
Strings
●
Single quotes, Double quotes
●
Multiline strings in Triple quotes
– Docstrings
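A short sketch of the three quoting styles. The function `greet` is a hypothetical example, included only to show where a docstring goes.

```python
# Single and double quotes create the same kind of string;
# pick whichever avoids escaping the quotes inside the text.
single = 'She said "hi"'
double = "It's fine"

# Triple quotes may span several lines.
multi = """line one
line two"""


def greet(name):
    """Return a greeting for name."""
    return "Hello, " + name


line_count = len(multi.splitlines())
doc = greet.__doc__   # the docstring is stored on the function

print(single, double, line_count, doc)
```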
Operations on Strings
●
Convert int to str, vice-versa
●
Convert list to str, vice-versa
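The conversions listed above might look like this in practice. This is a minimal sketch; the sample values are assumptions.

```python
# int <-> str
number_text = str(42)
number = int("42")

# int() rejects text that is not a valid integer literal.
try:
    int("42x")
    parsed = True
except ValueError:
    parsed = False

# str <-> list of characters
letters = list("abc")
word = "".join(letters)

# list of numbers <-> delimited string
numbers = [1, 2, 3]
csv_line = ",".join(str(n) for n in numbers)
round_trip = [int(part) for part in csv_line.split(",")]

print(number_text, number, parsed, letters, word, csv_line, round_trip)
```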
Split Operations
>>> '1,2,3'.split(',')
['1', '2', '3']
>>> '1,2,3'.split(',', maxsplit=1)
['1', '2,3']
>>> '1,2,,3,'.split(',')
['1', '2', '', '3', '']
>>> '1 2 3'.split()
['1', '2', '3']
>>> '1 2 3'.split(maxsplit=1)
['1', '2 3']
>>> ' 1 2 3 '.split()
['1', '2', '3']
Join, Partition
>>> chickens = ["hen", "egg", "rooster"]
>>> ' '.join(chickens)
'hen egg rooster'
>>> ' :: '.join(chickens)
'hen :: egg :: rooster'
>>> 'foo.bar'.partition('.')
('foo', '.', 'bar')
Iterate through characters of the String
>>> for code_point in some_string:
...     print(code_point)
>>> for index, code_point in enumerate(some_string):
...     print(index, ": ", code_point)
Find, Replace
s.replace(<old>, <new>[, <count>])
s.capitalize()
s.swapcase()
s.lower(), s.upper(), s.title()
s.count(<sub>[, <start>[, <end>]])
s.startswith(<prefix>[, <start>[, <end>]]), s.endswith(<suffix>[, <start>[, <end>]])
s.find(<sub>[, <start>[, <end>]]), s.rfind(<sub>[, <start>[, <end>]])
s.index(<sub>[, <start>[, <end>]]), s.rindex(<sub>[, <start>[, <end>]])
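A brief sketch of the methods listed above, run against an assumed sample sentence. The main difference to remember is that `find` returns -1 for a missing substring while `index` raises `ValueError`.

```python
s = "hello world, hello python"

replaced_all = s.replace("hello", "bye")
replaced_once = s.replace("hello", "bye", 1)   # replace only the first match

first = s.find("hello")      # lowest index of the substring
last = s.rfind("hello")      # highest index of the substring
missing = s.find("java")     # find() signals "not found" with -1

try:
    s.index("java")          # index() raises instead
    index_failed = False
except ValueError:
    index_failed = True

count = s.count("hello")
starts = s.startswith("hello")
ends = s.endswith("python")

title = s.title()            # first letter of every word uppercased
capital = s.capitalize()     # only the first character uppercased
swapped = "Hello".swapcase()

print(replaced_all, replaced_once, first, last, missing, count)
```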
Character Classification
s.isalnum()
s.isalpha()
s.isdigit()
s.islower(), s.isupper()
s.strip([<chars>]), s.lstrip([<chars>]), s.rstrip([<chars>])
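The classification and strip methods can be checked on a few assumed sample strings. One point that often surprises: the `<chars>` argument of `strip` is treated as a set of characters to remove, not as a prefix or suffix.

```python
alnum = "abc123".isalnum()        # letters and digits only
alnum_space = "abc 123".isalnum() # the space makes this False
alpha = "abc123".isalpha()
digit = "123".isdigit()
empty_alpha = "".isalpha()        # empty strings are never alphabetic
lower = "hello".islower()
upper = "HELLO".isupper()

stripped = "  padded  ".strip()   # whitespace removed from both ends
left = "  padded  ".lstrip()
right = "  padded  ".rstrip()

# Every leading/trailing character found in the set "cmowz." is removed.
host = "www.example.com".strip("cmowz.")

print(alnum, alnum_space, alpha, digit, empty_alpha, lower, upper)
print(repr(stripped), repr(left), repr(right), host)
```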
Strings Lab Assignments
Isogram
CS100/CS101
Outline
●
Example: Isogram
●
Lab Homework: Simple Cipher
●
Submit both
Isogram
●
Determine if a word or phrase is an isogram.
●
An isogram (also known as a "non-pattern word") is a
word or phrase without a repeating letter; spaces
and hyphens, however, may appear multiple times.
●
Examples of isograms:
– lumberjacks, background, downstream, six-year-old, isogram
●
The word isograms, however, is not an isogram, because
the s repeats.
https://en.wikipedia.org/wiki/Isogram, exercism.org
Isogram
●
Input: A word or a phrase (String)
●
Output – True / False
– True (if the supplied input is an Isogram)
– False (if the supplied input is NOT an Isogram)
Steps to do
●
Propose a solution
●
Formalize the plain English solution
– Flowchart OR Pseudocode
●
We’ll run it through a bunch of sample inputs
●
If the expected output is returned every time,
let’s translate the solution to Python code
Isogram – Solution
Isogram – Flowchart
Isogram – Pseudocode
Isogram – Sample Inputs
●
“” (Empty String)
●
Example words from the first slide, plus uncopyrightable (a
15-letter isogram)
●
First# Clan! (String contains punctuation marks)
●
“BackGround” (String contains upper- and lowercase letters)
●
Non-isograms, ranging from a single repeated letter to
several repeated letters
●
...
Isogram – Solution
●
If the string is Empty, return True
●
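For every letter in the String
– If the letter (ignoring case) was already seen, return False
– Otherwise, remember the letter
●
If no letter repeats, return True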
Isogram – Python
def is_isogram(string):
    # Lowercase letters seen so far; spaces, hyphens and
    # punctuation are never stored, so they may repeat.
    seen = set()
    for char in string:
        lowered = char.lower()
        if lowered in seen:
            return False
        if char.isalpha():
            seen.add(lowered)
    return True
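As a cross-check, here is an alternative sketch (not the solution from the slide) that compares the number of letters with the number of distinct letters, run against the sample inputs listed earlier. The `samples` dictionary and the expected values are taken from those sample inputs.

```python
def is_isogram(text):
    # Keep letters only, ignoring case; spaces, hyphens and
    # punctuation are dropped before comparing.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    return len(letters) == len(set(letters))


# Sample inputs mapped to the expected answer.
samples = {
    "": True,
    "lumberjacks": True,
    "six-year-old": True,
    "uncopyrightable": True,
    "First# Clan!": True,
    "BackGround": True,
    "isograms": False,
}

results = {text: is_isogram(text) for text in samples}
all_match = results == samples
print(all_match)
```

Building the whole list first is simpler to read, while the loop version on the slide can stop at the first repeated letter.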
Homework – Encryption, Decryption
http://en.wikipedia.org/wiki/Substitution_cipher, Simple Cipher at exercism.org
Encryption, Decryption – Key
The ROT13 key can be written as
“nnnnnnnnnnnnn”
Each letter of the Key is the letter that replaces
‘a’ at that position, so it sets the shift applied
to the message letter at the same position
Examples.
Key is “aaaaa”. “hello” ==> “hello”
Key is “ddddd”. “hello” ==> “khoor”
Key is “nnnnn”. “hello” ==> “uryyb”
Key is “abcde”. “hello” ==> “hfnos”
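The mapping from key letter to shift can be sketched as below. The helper names `key_offset` and `shift_letter` are hypothetical, and the sketch assumes lowercase a-z input with a key at least as long as the message; it is meant to reproduce the four examples above, not to serve as the homework solution.

```python
import string

ALPHABET = string.ascii_lowercase


def key_offset(key_letter):
    # 'a' -> 0, 'd' -> 3, 'n' -> 13, ...
    return ALPHABET.index(key_letter)


def shift_letter(letter, key_letter):
    # Move the letter forward by the key's offset, wrapping after 'z'.
    position = (ALPHABET.index(letter) + key_offset(key_letter)) % 26
    return ALPHABET[position]


# Pair each message letter with the key letter at the same position.
examples = {
    key: "".join(shift_letter(m, k) for m, k in zip("hello", key))
    for key in ("aaaaa", "ddddd", "nnnnn", "abcde")
}
print(examples)
```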
Homework –
Encryption, Decryption
●
Implement a Substitution Cipher
●
A substitution cipher replaces plaintext with
ciphertext of identical length
●
Ciphers render text less readable while still
allowing easy deciphering
Substitution Cipher
●
Create 2 functions: encode() and decode().
●
encode(text, key="dddddddddddddddddddd"):
– Text: plain text to be encrypted
– Key: key used to encrypt. Keep the default as ROT3.
– Encode implements a Simple Shift Cipher using the key (e.g., Caesar
Cipher)
●
encode("dontlookup") should return "grqworrnxs"
●
encode("dontlookup", "abcdefghij") should return "dppwpturcy"
Substitution Cipher
●
Create 2 functions: encode() and decode().
●
decode(text, key="dddddddddddddddddddd"):
– Text: cipher text to be decrypted to plaintext
– Key: key used to decrypt (the same key that encrypted the text). Default is ROT3.
– Decode reverses the Simple Shift Cipher using the key (e.g., Caesar
Cipher)
●
decode("grqworrnxs") should return "dontlookup"
●
decode("dppwpturcy", "abcdefghij") should return "dontlookup"
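Decoding is the inverse of encoding: each letter moves backward by the key letter's offset instead of forward. The sketch below illustrates that symmetry with a single hypothetical helper, `shift_text`, whose `direction` parameter is an assumption of this example. It assumes lowercase a-z input and is one possible starting point rather than the required design.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase


def shift_text(text, key, direction):
    # direction is +1 to encode and -1 to decode.
    # cycle(key) repeats the key for as long as the text needs.
    shifted = []
    for letter, key_letter in zip(text, cycle(key)):
        offset = direction * ALPHABET.index(key_letter)
        shifted.append(ALPHABET[(ALPHABET.index(letter) + offset) % 26])
    return "".join(shifted)


encoded = shift_text("dontlookup", "d", +1)
decoded = shift_text(encoded, "d", -1)
decoded_keyed = shift_text("dppwpturcy", "abcdefghij", -1)
print(encoded, decoded, decoded_keyed)
```

Python's `%` always returns a non-negative result for a positive modulus, so the backward shift wraps from 'a' to 'z' without extra handling.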
Substitution Cipher – Assumptions
●
If key supplied is “abc”, then the full key is
‘abcabcabc...’ (as long as the message is)
●
Assume there are no spaces and punctuation
marks in the message
●
Max message size is 100 letters
●
State any other assumptions in the comments
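The assumptions above can be made explicit in code. The following is a sketch with hypothetical helper names (`expand_key`, `is_valid_message`); treating an empty message as valid is an extra assumption of this example.

```python
import string
from itertools import cycle, islice

MAX_MESSAGE_LENGTH = 100  # limit stated in the assumptions


def expand_key(key, length):
    # Repeat the key until it is exactly `length` letters long.
    return "".join(islice(cycle(key), length))


def is_valid_message(message):
    # Only lowercase a-z, no spaces or punctuation, at most 100 letters.
    return (
        len(message) <= MAX_MESSAGE_LENGTH
        and all(ch in string.ascii_lowercase for ch in message)
    )


full_key = expand_key("abc", 9)
short_key = expand_key("abcdefghij", 4)
print(full_key, short_key)
print(is_valid_message("dontlookup"), is_valid_message("dont look up"))
```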
Summary
●
Unicode
●
Common Sequence Operations
●
String Indexing
●
split, join, partition