ch2_9
Data types describe the kind of value a variable refers to. Python variables do not need an
explicit declaration to reserve memory space; the declaration happens automatically when you assign a value to a
variable.
1. Sets - They are mutable, and new elements can be added once a set is defined.
2. Frozen Sets - They are immutable, and new elements cannot be added after a frozen set is defined.
b = frozenset('asdfagsa')
print(b)
> frozenset({'f', 'g', 'd', 'a', 's'})
cities = frozenset(["Frankfurt", "Basel","Freiburg"])
print(cities)
> frozenset({'Frankfurt', 'Basel', 'Freiburg'})
list = [123,'abcd',10.2,'d'] #can be an array of any data type or single data type.
list1 = ['hello','world']
print(list) #will output the whole list. [123,'abcd',10.2,'d']
print(list[0:2]) #will output the first two elements of list. [123,'abcd']
print(list1 * 2) #will give list1 two times. ['hello','world','hello','world']
print(list + list1) #will give the concatenation of both lists. [123,'abcd',10.2,'d','hello','world']
dic={'name':'red','age':10}
print(dic) #will output all the key-value pairs. {'name':'red','age':10}
print(dic['name']) #will output only value with 'name' key. 'red'
print(dic.values()) #will output the values in dic. dict_values(['red', 10]) in Python 3, a plain list in Python 2
print(dic.keys()) #will output the keys. dict_keys(['name', 'age']) in Python 3, a plain list in Python 2
tuple = (123,'hello')
tuple1 = ('world',) #the trailing comma makes this a one-element tuple; ('world') is just a string
print(tuple) #will output whole tuple. (123,'hello')
print(tuple[0]) #will output first value. 123
print(tuple + tuple1) #will output (123,'hello','world')
tuple[1]='update' #this will give you a TypeError, because tuples are immutable.
class ExampleClass:
    #Every function belonging to a class must be indented equally
    def __init__(self):
        name = "example"

#If a function is not indented to the same level it will not be considered as part of the parent class
def separateFunction(b):
    for i in b:
        #Loops are also indented and nested conditions start a new indentation
        if i == 1:
            return True
    return False

separateFunction([2,3,5,6,1])
Spaces or Tabs?
The recommended indentation is 4 spaces, but tabs or spaces can be used so long as they are consistent. Do not
mix tabs and spaces in Python, as this will cause an error in Python 3 and can cause errors in Python 2.
The lexical analyzer uses a stack to store indentation levels. At the beginning, the stack contains just the value 0,
which is the leftmost position. Whenever a nested block begins, the new indentation level is pushed on the stack,
and an "INDENT" token is inserted into the token stream which is passed to the parser. There can never be more
than one "INDENT" token in a row (IndentationError).
When a line is encountered with a smaller indentation level, values are popped from the stack until a value is on top
which is equal to the new indentation level (if none is found, a syntax error occurs). For each value popped, a
"DEDENT" token is generated. Obviously, there can be multiple "DEDENT" tokens in a row.
The lexical analyzer skips empty lines (those containing only whitespace and possibly comments), and will never
generate either "INDENT" or "DEDENT" tokens for them.
At the end of the source code, "DEDENT" tokens are generated for each indentation level left on the stack, until just
the 0 is left.
For example:
is analyzed as:
The parser then handles the "INDENT" and "DEDENT" tokens as block delimiters.
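The token stream described above can be inspected with the standard tokenize module. The following is a minimal sketch, not the original example from this section; the sample source and the names foo and bar are assumptions chosen for illustration. Tokenizing does not execute the code, so those names do not need to exist.

```python
import io
import tokenize

# Hypothetical sample source: two nested blocks followed by an else branch.
source = (
    "if foo:\n"
    "    if bar:\n"
    "        x = 42\n"
    "else:\n"
    "    print(foo)\n"
)

# generate_tokens() takes a readline callable and yields token tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Keep only the block-delimiting tokens, in the order they are emitted.
block_tokens = [
    tokenize.tok_name[tok.type]
    for tok in tokens
    if tok.type in (tokenize.INDENT, tokenize.DEDENT)
]
print(block_tokens)
# ['INDENT', 'INDENT', 'DEDENT', 'DEDENT', 'INDENT', 'DEDENT']
```

Two levels are pushed for the nested if statements, both are popped when else returns to column 0, and the else body pushes and pops one more level.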
a = 7
if a > 5:
  print("foo")
else:
  print("bar")
 print("done") # indented by one space, matching no enclosing block: IndentationError
Or if the line following a colon is not indented, an IndentationError will also be raised:
if True:
print("true")
# Adding indentation where it does not belong also raises an IndentationError:
if True:
    a = 6
        b = 5
If you forget to un-indent functionality could be lost. In this example None is returned instead of the expected False:
def isEven(a):
    if a%2 ==0:
        return True
        #this next line should be even with the if
        return False
print(isEven(7))
Python ignores comments, and so will not execute code in there, or raise syntax errors for plain English sentences.
Single-line comments begin with the hash character (#) and are terminated by the end of line.
Inline comment:
Comments spanning multiple lines have """ or ''' on either end. This is the same as a multiline string, but
they can be used as comments:
"""
This type of comment spans multiple lines.
These are mostly used for documentation of functions, classes and modules.
"""
An example function
def func():
    """This is a function that does nothing at all"""
    return
print(func.__doc__)
help(func)
func()
help(greet)
greet(name, greeting='Hello')
Just putting no docstring or a regular comment in a function makes it a lot less helpful.
print(greet.__doc__)
None
help(greet)
greet(name, greeting='Hello')
def hello(name):
    """Greet someone.
    Print a greeting ("Hello") for the person with the given name.
    """
    print("Hello "+name)
class Greeter:
"""An object used to greet people.
The value of the docstring can be accessed within the program and is - for example - used by the help command.
Syntax conventions
PEP 257
PEP 257 defines a syntax standard for docstring comments. It basically allows two types:
One-line Docstrings:
According to PEP 257, they should be used with short and simple functions. Everything is placed in one line, e.g:
def hello():
    """Say hello to your friends."""
    print("Hello my friends!")
The docstring shall end with a period, the verb should be in the imperative form.
Multi-line Docstrings:
Multi-line docstring should be used for longer, more complex functions, modules or classes.
Arguments:
name: the name of the person
language: the language in which the person should be greeted
"""
print(greeting[language]+" "+name)
They start with a short summary (equivalent to the content of a one-line docstring) which can be on the same line
as the quotation marks or on the next line, give additional detail and list parameters and return values.
Note: PEP 257 defines what information should be given within a docstring; it doesn't define in which format it
should be given. This was the reason for other parties and documentation parsing tools to specify their own
standards for documentation, some of which are listed below.
Sphinx
Sphinx is a tool to generate HTML based documentation for Python projects based on docstrings. Its markup
language is reStructuredText. It defines its own standards for documentation; pythonhosted.org hosts a
very good description of them. The Sphinx format is, for example, used by the PyCharm IDE.
print(greeting[language]+" "+name)
return 4
Google has published the Google Python Style Guide, which defines coding conventions for Python, including
documentation comments. In comparison to Sphinx/reST, many people say that documentation written according to
Google's guidelines is more readable for humans.
The pythonhosted.org page mentioned above also provides some examples for good documentation according to
the Google Style Guide.
Using the Napoleon plugin, Sphinx can also parse documentation in the Google Style Guide-compliant format.
A function would be documented like this using the Google Style Guide format:
Args:
name: the name of the person as string
language: the language code string
Returns:
A number.
"""
print(greeting[language]+" "+name)
return 4
UTC offset in the form +HHMM or -HHMM (empty string if the object is naive).
import datetime
dt = datetime.datetime.strptime("2016-04-15T08:27:18-0500", "%Y-%m-%dT%H:%M:%S%z")
For other versions of Python, you can use an external library such as dateutil, which makes parsing a string with
a timezone into a datetime object quick.
import dateutil.parser
dt = dateutil.parser.parse("2016-04-15T08:27:18-0500")
For time zones that are a fixed offset from UTC, in Python 3.2+, the datetime module provides the timezone class, a
concrete implementation of tzinfo, which takes a timedelta and an (optional) name parameter:
print(dt.tzname())
# UTC+09:00
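A minimal sketch of the fixed-offset construction described above, using only the standard library (Python 3.2+). The variable names JST, dt and dt_named are chosen here for illustration.

```python
from datetime import datetime, timedelta, timezone

# A fixed offset of +09:00 without an explicit name.
JST = timezone(timedelta(hours=9))
dt = datetime(2015, 1, 1, 12, 0, 0, tzinfo=JST)
print(dt.utcoffset())
# 9:00:00

# The optional second argument gives the zone a name.
dt_named = datetime(2015, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=9), "JST"))
print(dt_named.tzname())
# JST

# Both objects describe the same instant; the name does not affect comparison.
print(dt == dt_named)
# True
```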
For Python versions before 3.2, it is necessary to use a third party library, such as dateutil. dateutil provides an
equivalent class, tzoffset, which (as of version 2.5.3) takes arguments of the form dateutil.tz.tzoffset(tzname,
offset), where offset is specified in seconds:
For zones with daylight saving time, the Python standard library did not provide a concrete class before Python 3.9
(which added the zoneinfo module), so on older versions it is necessary to use a third party library. pytz and
dateutil are popular libraries providing time zone classes.
In addition to static time zones, dateutil provides time zone classes that use daylight savings time (see the
documentation for the tz module). You can use the tz.gettz() method to get a time zone object, which can then
be passed directly to the datetime constructor:
CAUTION: As of version 2.5.3, dateutil does not handle ambiguous datetimes correctly, and will always default to
the later date. There is no way to construct an object with a dateutil timezone representing, for example
2015-11-01 1:30 EDT-4, since this is during a daylight savings time transition.
All edge cases are handled properly when using pytz, but pytz time zones should not be directly attached to
datetimes through the constructor. Instead, a pytz time zone should be attached using the time zone's localize
method:
import pytz
from datetime import datetime
PT = pytz.timezone('US/Pacific')
dt_pst = PT.localize(datetime(2015, 1, 1, 12))
dt_pdt = PT.localize(datetime(2015, 11, 1, 0, 30))
print(dt_pst)
# 2015-01-01 12:00:00-08:00
print(dt_pdt)
# 2015-11-01 00:30:00-07:00
Be aware that if you perform datetime arithmetic on a pytz-aware datetime, you must either perform the
calculations in UTC (if you want absolute elapsed time), or you must call normalize() on the result:
delta = now-then
print(delta.days)
# 60
print(delta.seconds)
# 40826
import datetime
# Date object
today = datetime.date.today()
new_year = datetime.date(2017, 1, 1) #datetime.date(2017, 1, 1); leading zeros such as 01 are a SyntaxError in Python 3
# Time object
noon = datetime.time(12, 0, 0) #datetime.time(12, 0)
# Current datetime
now = datetime.datetime.now()
Arithmetic operations for these objects are only supported within same datatype and performing simple arithmetic
with instances of different types will result in a TypeError.
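The restriction above can be illustrated with a short sketch. The fixed date and the names today, millennium_turn and midnight_today are assumptions made so that the result is reproducible.

```python
import datetime

today = datetime.date(2016, 4, 15)                        # a date object
millennium_turn = datetime.datetime(2000, 1, 1, 0, 0, 0)  # a datetime object

# Subtracting a datetime from a date mixes two types and fails.
try:
    today - millennium_turn
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
# False

# Promote the date to a datetime first, then subtract like from like.
midnight_today = datetime.datetime.combine(today, datetime.time(0, 0))
elapsed = midnight_today - millennium_turn
print(elapsed.days)
# 5949
```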
# Do this instead
print('Time since the millenium at midnight: ',
datetime.datetime(today.year, today.month, today.day) - millenium_turn)
# Or this
print('Time since the millenium at noon: ',
datetime.datetime.combine(today, noon) - millenium_turn)
from datetime import datetime
from dateutil import tz
utc = tz.tzutc()
local = tz.tzlocal()
utc_now = datetime.utcnow()
utc_now # Not timezone-aware.
utc_now = utc_now.replace(tzinfo=utc)
utc_now # Timezone-aware.
local_now = utc_now.astimezone(local)
local_now # Converted to local time.
import datetime
today = datetime.date.today()
print('Today:', today)
Today: 2016-04-15
Yesterday: 2016-04-14
Tomorrow: 2016-04-16
Difference between tomorrow and yesterday: 2 days, 0:00:00
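Output of this kind can be produced with timedelta arithmetic. This is a sketch that uses a fixed date instead of date.today() so that it reproduces the values shown; the name one_day is chosen here for illustration.

```python
import datetime

today = datetime.date(2016, 4, 15)   # fixed date so the output is reproducible
one_day = datetime.timedelta(days=1)

yesterday = today - one_day
tomorrow = today + one_day

print('Yesterday:', yesterday)
# Yesterday: 2016-04-14
print('Tomorrow:', tomorrow)
# Tomorrow: 2016-04-16
print('Difference between tomorrow and yesterday:', tomorrow - yesterday)
# Difference between tomorrow and yesterday: 2 days, 0:00:00
```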
import time
from datetime import datetime
seconds_since_epoch=time.time() #1469182681.709
import calendar
from datetime import date
import datetime
import dateutil.relativedelta
d = datetime.datetime.strptime("2013-03-31", "%Y-%m-%d")
d2 = d - dateutil.relativedelta.relativedelta(months=1) #datetime.datetime(2013, 2, 28, 0, 0)
But these 2 forms need a different format for strptime. Furthermore, before Python 3.7 strptime does not support
parsing timezone offsets that have a : in them at all; thus 2016-07-22 09:25:59+0300 can be parsed, but the
standard format 2016-07-22 09:25:59+03:00 cannot.
There is a single-file library called iso8601 which properly parses ISO 8601 timestamps and only them.
It supports fractions and timezones, and the T separator all with a single function:
import iso8601
iso8601.parse_date('2016-07-22 09:25:59')
# datetime.datetime(2016, 7, 22, 9, 25, 59, tzinfo=<iso8601.Utc>)
iso8601.parse_date('2016-07-22 09:25:59+03:00')
# datetime.datetime(2016, 7, 22, 9, 25, 59, tzinfo=<FixedOffset '+03:00' ...>)
iso8601.parse_date('2016-07-22 09:25:59Z')
# datetime.datetime(2016, 7, 22, 9, 25, 59, tzinfo=<iso8601.Utc>)
iso8601.parse_date('2016-07-22T09:25:59.000111+03:00')
# datetime.datetime(2016, 7, 22, 9, 25, 59, 111, tzinfo=<FixedOffset '+03:00' ...>)
If no timezone is set, iso8601.parse_date defaults to UTC. The default zone can be changed with the default_timezone
keyword argument. Notably, if this is None instead of the default, then those timestamps that do not have an
explicit timezone are returned as naive datetimes instead:
iso8601.parse_date('2016-07-22T09:25:59', default_timezone=None)
# datetime.datetime(2016, 7, 22, 9, 25, 59)
iso8601.parse_date('2016-07-22T09:25:59Z', default_timezone=None)
# datetime.datetime(2016, 7, 22, 9, 25, 59, tzinfo=<iso8601.Utc>)
datetime.now().isoformat()
# Out: '2016-07-31T23:08:20.886783'
datetime.now(tzlocal()).isoformat()
# Out: '2016-07-31T23:09:43.535074-07:00'
datetime.now(tzlocal()).replace(microsecond=0).isoformat()
# Out: '2016-07-31T23:10:30-07:00'
See ISO 8601 for more information about the ISO 8601 format.
Section 5.11: Parsing a string with a short time zone name into
For dates formatted with short time zone names or abbreviations, which are generally ambiguous (e.g. CST, which
could be Central Standard Time, China Standard Time, Cuba Standard Time, etc.) or not
necessarily available in a standard database, it is necessary to specify a mapping between time zone abbreviation
and tzinfo object.
ET = tz.gettz('US/Eastern')
CT = tz.gettz('US/Central')
MT = tz.gettz('US/Mountain')
PT = tz.gettz('US/Pacific')
dt_est
# datetime.datetime(2014, 1, 2, 4, 0, tzinfo=tzfile('/usr/share/zoneinfo/US/Eastern'))
dt_pst
# datetime.datetime(2016, 3, 11, 16, 0, tzinfo=tzfile('/usr/share/zoneinfo/US/Pacific'))
It is worth noting that if using a pytz time zone with this method, it will not be properly localized:
EST = pytz.timezone('America/New_York')
dt = parse('2014-02-03 09:17:00 EST', tzinfos={'EST': EST})
If using this method, you should probably re-localize the naive portion of the datetime after parsing:
dt_fixed = dt.tzinfo.localize(dt.replace(tzinfo=None))
dt_fixed.tzinfo # Now it's EST.
# <DstTzInfo 'America/New_York' EST-1 day, 19:00:00 STD>
dt is now a datetime object and you would see datetime.datetime(2047, 1, 1, 8, 21) printed.
import datetime
start_date = datetime.date.today()
end_date = start_date + 7*day_delta
Which produces:
2016-07-21
2016-07-22
2016-07-23
2016-07-24
2016-07-25
2016-07-26
2016-07-27
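A sketch of a loop that yields a sequence like the one above. The definition of day_delta as a one-day timedelta is an assumption inferred from the expression 7*day_delta, and the start date is fixed here so that the output is reproducible.

```python
import datetime

day_delta = datetime.timedelta(days=1)    # assumed step of one day
start_date = datetime.date(2016, 7, 21)   # fixed instead of date.today()
end_date = start_date + 7 * day_delta

# Step from start_date up to, but not including, end_date.
days = [start_date + i * day_delta for i in range((end_date - start_date).days)]
for day in days:
    print(day)
```

The end date itself is excluded, which is why seven dates are printed rather than eight.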
from datetime import datetime
a = datetime(2016,10,6,0,0,0)
b = datetime(2016,10,1,23,59,59)
a-b
# datetime.timedelta(4, 1)
(a-b).days
# 4
(a-b).total_seconds()
# 345601.0
from enum import Enum
class Color(Enum):
    red = 1
    green = 2
    blue = 3
print(Color.red) # Color.red
print(Color(1)) # Color.red
print(Color['red']) # Color.red
class Color(Enum):
    red = 1
    green = 2
    blue = 3
# Intersection
{1, 2, 3, 4, 5}.intersection({3, 4, 5, 6}) # {3, 4, 5}
{1, 2, 3, 4, 5} & {3, 4, 5, 6} # {3, 4, 5}
# Union
{1, 2, 3, 4, 5}.union({3, 4, 5, 6}) # {1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5} | {3, 4, 5, 6} # {1, 2, 3, 4, 5, 6}
# Difference
{1, 2, 3, 4}.difference({2, 3, 5}) # {1, 4}
{1, 2, 3, 4} - {2, 3, 5} # {1, 4}
# Superset check
{1, 2}.issuperset({1, 2, 3}) # False
{1, 2} >= {1, 2, 3} # False
# Subset check
{1, 2}.issubset({1, 2, 3}) # True
{1, 2} <= {1, 2, 3} # True
# Disjoint check
{1, 2}.isdisjoint({3, 4}) # True
{1, 2}.isdisjoint({1, 4}) # False
# Existence check
2 in {1,2,3} # True
4 in {1,2,3} # False
4 not in {1,2,3} # True
s = {1, 2, 3, 4}
s.discard(3) # s == {1,2,4}
s.discard(5) # s == {1,2,4}
s.remove(2) # s == {1,4}
s.remove(2) # KeyError!
Set operations return new sets, but have the corresponding in-place versions:
For example:
s = {1, 2}
s.update({3, 4}) # s == {1, 2, 3, 4}
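The other in-place methods follow the same pattern as update. A brief sketch with arbitrarily chosen values:

```python
s = {1, 2, 3, 4}

s.update({5, 6})                        # in-place union, same as s |= {5, 6}
# s == {1, 2, 3, 4, 5, 6}
s.intersection_update({2, 3, 4, 5, 7})  # in-place intersection, same as s &= ...
# s == {2, 3, 4, 5}
s.difference_update({2})                # in-place difference, same as s -= {2}
# s == {3, 4, 5}
s.symmetric_difference_update({5, 8})   # in-place symmetric difference, same as s ^= ...
# s == {3, 4, 8}
print(s == {3, 4, 8})
# True
```

Each of these methods modifies s and returns None, unlike union, intersection and the other methods that return a new set.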
Note that the set is not in the same order as the original list; that is because sets are unordered, just like dicts.
This can easily be transformed back into a List with Python's built in list function, giving another list that is the
same list as the original but without duplicates:
list(unique_restaurants)
# ['Chicken Chicken', "McDonald's", 'Burger King']
Now any operations that could be performed on the original list can be done again.
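A sketch of the round trip described above. The contents of the restaurants list are assumed for illustration; only the three distinct names come from the output shown.

```python
# Hypothetical input list containing a duplicate entry.
restaurants = ["McDonald's", "Burger King", "McDonald's", "Chicken Chicken"]

unique_restaurants = set(restaurants)    # duplicates collapse, order is not kept
unique_list = list(unique_restaurants)   # back to a list, in arbitrary order
print(len(unique_list))
# 3

# If the original order matters, dict.fromkeys keeps the first occurrence of
# each item (insertion order is guaranteed from Python 3.7).
ordered_unique = list(dict.fromkeys(restaurants))
print(ordered_unique)
# ["McDonald's", 'Burger King', 'Chicken Chicken']
```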
leads to:
>>> a = {1, 2, 2, 3, 4}
>>> b = {3, 3, 4, 4, 5}
NOTE: {1} creates a set of one element, but {} creates an empty dict. The correct way to create an
empty set is set().
>>> a.intersection(b)
{3, 4}
Union
>>> a.union(b)
{1, 2, 3, 4, 5}
Difference
>>> a.difference(b)
{1, 2}
>>> b.difference(a)
{5}
Symmetric Difference
a.symmetric_difference(b) returns a new set with elements present in either a or b but not in both
>>> a.symmetric_difference(b)
{1, 2, 5}
>>> b.symmetric_difference(a)
{1, 2, 5}
>>> c = {1, 2}
>>> c.issubset(a)
True
>>> a.issuperset(c)
True
Method                       Operator
a.intersection(b)            a & b
a.union(b)                   a | b
a.difference(b)              a - b
a.symmetric_difference(b)    a ^ b
a.issubset(b)                a <= b
a.issuperset(b)              a >= b
Disjoint sets
>>> d = {5, 6}
>>> a.isdisjoint(b) # {3, 4} are in both sets
False
>>> a.isdisjoint(d)
True
Testing membership
>>> 1 in a
True
>>> 6 in a
False
Length
The builtin len() function returns the number of elements in the set
>>> len(a)
4
>>> len(b)
3
By saving the strings 'a', 'b', 'b', 'c' into a set data structure we've lost the information on the fact that 'b'
occurs twice. Of course, saving the elements to a list would retain this information,
but a list data structure introduces an extra unneeded ordering that will slow down our computations.
For implementing multisets Python provides the Counter class from the collections module (starting from version
2.7):
Counter is a dictionary where elements are stored as dictionary keys and their counts are stored as
dictionary values. As with any dictionary, do not rely on it for a meaningful ordering of its elements
(dictionaries only preserve insertion order from Python 3.7).
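A minimal sketch of a Counter used as a multiset, with the same four strings mentioned above:

```python
from collections import Counter

counter = Counter(['a', 'b', 'b', 'c'])

print(counter['b'])   # the count is kept, unlike in a set
# 2
print(counter['z'])   # missing elements count as zero instead of raising KeyError
# 0

# elements() expands the counts back into individual items.
print(sorted(counter.elements()))
# ['a', 'b', 'b', 'c']
```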
The numbers module contains the abstract base classes for the numerical types:
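A short sketch of how these classes can be used with isinstance. The sample values are arbitrary, and the original table of types is not reproduced here.

```python
import numbers
from fractions import Fraction

# The classes form a tower: Integral < Rational < Real < Complex < Number.
checks = [
    isinstance(3, numbers.Integral),               # int is Integral
    isinstance(Fraction(1, 2), numbers.Rational),  # Fraction is Rational
    isinstance(2.5, numbers.Real),                 # float is Real ...
    isinstance(2.5, numbers.Rational),             # ... but not Rational
    isinstance(1j, numbers.Complex),               # complex is Complex ...
    isinstance(1j, numbers.Real),                  # ... but not Real
]
print(checks)
# [True, True, True, False, True, False]
```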
a, b, c, d, e = 3, 2, 2.0, -3, 10
In Python 2 the result of the ' / ' operator depends on the type of the numerator and denominator.
a / b # = 1
a / c # = 1.5
d / b # = -2
b / a # = 0
d / e # = -1
Note that because both a and b are ints, the result is an int.
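To get true division in Python 2, make one operand a float, or put from __future__ import division at the top of the module: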
a / (b * 1.0) # = 1.5
1.0 * a / b # = 1.5
a / b * 1.0 # = 1.0 (careful with order of operations)
float(a) / b # = 1.5
a / float(b) # = 1.5
The ' // ' operator in Python 2 forces floored division regardless of type.
a // b # = 1
a // c # = 1.0
In Python 3 the / operator performs 'true' division regardless of types. The // operator performs floor division and
maintains type.
a / b # = 1.5
e / b # = 5.0
a // b # = 1
a // c # = 1.0
Note: the + operator is also used for concatenating strings, lists and tuples:
(a ** b) # = 8
pow(a, b) # = 8
import math
math.pow(a, b) # = 8.0 (always float; does not allow complex results)
import operator
operator.pow(a, b) # = 8
Another difference between the built-in pow and math.pow is that the built-in pow can accept three arguments:
a, b, c = 2, 3, 2
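A sketch of the three-argument form with the values assigned above. The third argument is a modulus, so the call computes (a ** b) % c.

```python
import math

a, b, c = 2, 3, 2

result = pow(a, b, c)   # (2 ** 3) % 2
print(result)
# 0

# math.pow accepts exactly two arguments, so a third one is rejected.
try:
    math.pow(a, b, c)
    three_args_ok = True
except TypeError:
    three_args_ok = False
print(three_args_ok)
# False
```

The three-argument form is mainly useful for modular exponentiation with large integers, where it avoids building the full power first.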
Special functions
import math
import cmath
c = 4
math.sqrt(c) # = 2.0 (always float; does not allow complex results)
cmath.sqrt(c) # = (2+0j) (always complex)
To compute other roots, such as a cube root, raise the number to the reciprocal of the degree of the root. This
could be done with any of the exponential functions or operator.
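As a sketch, a cube root can be written with any of the exponentiation forms. The value 8 is chosen for illustration, and the exponent is written as 1.0 / 3 so that it is not truncated to 0 by integer division in Python 2.

```python
import math

x = 8
exponent = 1.0 / 3   # reciprocal of the degree of the root

root_operator = x ** exponent           # ** operator
root_builtin = pow(x, exponent)         # built-in pow
root_math = math.pow(x, exponent)       # math.pow, always returns a float

# Results are floats and may carry rounding error, so compare with a tolerance.
print(math.isclose(root_operator, 2.0))
# True
```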
math.exp(0) # 1.0
math.exp(1) # 2.718281828459045 (e)
The function math.expm1(x) computes e ** x - 1. When x is small, this gives significantly better precision than
math.exp(x) - 1.
math.expm1(0) # 0.0
math.exp(1e-6) - 1 # 1.0000004999621837e-06
math.expm1(1e-6) # 1.0000005000001665e-06
# exact result # 1.000000500000166666708333341666...
import math
Note that math.hypot(x, y) is also the length of the vector (or Euclidean distance) from the origin (0, 0)
to the point (x, y).
To compute the Euclidean distance between two points (x1, y1) & (x2, y2) you can use math.hypot as
follows
math.hypot(x2-x1, y2-y1)
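A worked instance of the distance formula above, with two points chosen for illustration:

```python
import math

x1, y1 = 1, 2
x2, y2 = 4, 6

# The differences are 3 and 4, the legs of a 3-4-5 right triangle.
distance = math.hypot(x2 - x1, y2 - y1)
print(distance)
# 5.0
```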
To convert from radians -> degrees and degrees -> radians respectively use math.degrees and math.radians
a = 1
math.degrees(a)
# Out: 57.29577951308232
math.radians(57.29577951308232)
# Out: 1.0
a = a + 1
or
a = a * 2
a += 1
# and
a *= 2
Any mathematical operator can be used before the '=' character to make an in-place operation:
Other in-place operators exist for the bitwise operators (^, | etc).
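A brief sketch that chains several in-place operators on one integer, with the value after each step shown in the comments. The starting value is arbitrary.

```python
a = 10
a -= 3    # a == 7
a //= 2   # a == 3   (floor division)
a **= 2   # a == 9   (exponentiation)
a %= 5    # a == 4   (modulus)
a <<= 2   # a == 16  (left shift)
a |= 1    # a == 17  (bitwise or)
a ^= 3    # a == 18  (bitwise xor)
a &= 6    # a == 2   (bitwise and)
print(a)
# 2
```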
a * b # = 6
import operator
Note: The * operator is also used for repeated concatenation of strings, lists, and tuples:
3 * 'ab' # = 'ababab'
3 * ('a', 'b') # = ('a', 'b', 'a', 'b', 'a', 'b')
import math
import cmath
math.log(5) # = 1.6094379124341003
# optional base argument. Default is math.e
math.log(5, math.e) # = 1.6094379124341003
cmath.log(5) # = (1.6094379124341003+0j)
math.log(1000, 10) # 3.0 (always returns float)
cmath.log(1000, 10) # (3+0j)
# Logarithm base 2
math.log2(8) # = 3.0
# Logarithm base 10
math.log10(100) # = 2.0
cmath.log10(100) # = (2+0j)
3 % 4 # 3
10 % 2 # 0
6 % 4 # 2
import operator
operator.mod(3 , 4) # 3
-9 % 7 # 5
9 % -7 # -5
-9 % -7 # -2
If you need to find the result of integer division and modulus, you can use the divmod function as a shortcut:
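A minimal sketch of divmod, which returns the pair (x // y, x % y). The operands are chosen for illustration.

```python
quotient, remainder = divmod(9, 4)
print(quotient, remainder)
# 2 1

# With a negative operand the quotient is floored, so the remainder takes the
# sign of the divisor, matching the % results for negative numbers.
print(divmod(-9, 7))
# (-2, 5)
```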