Python's dataclasses create an initializer method (__init__) automatically.
But what if your dataclass needs additional initialization logic?
We have a dataclass here that accepts a list for its recipients field:
from dataclasses import dataclass
from typing import List

@dataclass
class EmailMessage:
    sender: str
    recipients: List[str]
    subject: str
    body: str
When we pass a list into this class, a reference to that same list object is stored on the new class instance:
>>> sender = "[email protected]"
>>> recipients = ["[email protected]", "[email protected]"]
>>> subject = "Hello"
>>> body = "Hi there!"
>>> message = EmailMessage(sender, recipients, subject, body)
>>> message.recipients
['[email protected]', '[email protected]']
So the list that we passed in is actually identical to the list that's on this object:
>>> message.recipients is recipients
True
This might be a problem for us. For example, if we pass in a list and then we change the list, the dataclass attribute will see the same change:
>>> recipients.append("[email protected]")
>>> message.recipients
['[email protected]', '[email protected]', '[email protected]']
This happens because both recipients and message.recipients refer to the same list object.
Also, most classes and functions that accept lists actually accept any iterable.
For example, the tuple constructor will accept a list, a generator object, or any other iterable:
>>> tuple([2, 1, 3, 4])
(2, 1, 3, 4)
>>> tuple(n**2 for n in range(2, 6))
(4, 9, 16, 25)
>>> tuple("letters")
('l', 'e', 't', 't', 'e', 'r', 's')
It might make more sense to allow our class to accept any iterable, rather than only accepting lists.
What we need is a way to copy the given iterable into a new list each time a new instance of our dataclass is initialized.
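As a quick sketch of that idea, the built-in list constructor accepts any iterable and always returns a new list object. The variable names below are illustrative only and are not part of the EmailMessage class:

```python
original = ["a", "b"]
copied = list(original)      # a new list containing the same items

original.append("c")         # mutating the original list...
print(copied)                # ...does not affect the copy
print(copied is original)    # the two names refer to different objects

# list() accepts other iterables too, not just lists
from_tuple = list(("x", "y"))
from_generator = list(n**2 for n in range(3))
print(from_tuple, from_generator)
```

Note that this is a shallow copy: the new list is independent, but the items inside it are the same objects. That's fine for immutable items like strings.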
Let's see how we could do that.
__init__ in a dataclass

It's possible to specify your own initializer method on a dataclass.
Here's a version of our dataclass with a custom __init__ method:
from dataclasses import dataclass
from typing import Iterable

@dataclass
class EmailMessage:
    sender: str
    recipients: Iterable[str]
    subject: str
    body: str

    def __init__(self, sender: str, recipients: Iterable[str], subject: str, body: str):
        self.sender = sender
        self.recipients = list(recipients)
        self.subject = subject
        self.body = body
Our __init__ method specifies all the fields, and it makes sure to copy the given iterable into a new list.
This does solve our problem from before. The list that's stored on our class instance is now independent from the list that was originally passed in:
>>> sender = "[email protected]"
>>> recipients = ["[email protected]", "[email protected]"]
>>> subject = "Hello"
>>> body = "Hi there!"
>>> message = EmailMessage(sender, recipients, subject, body)
>>> recipients.append("[email protected]")
>>> message.recipients
['[email protected]', '[email protected]']
This works, but it does feel a little bit redundant.
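Part of why it feels redundant: the dataclass decorator leaves an __init__ method that's defined in the class body alone, so we have to write every assignment ourselves, while it still generates the other methods (like __repr__ and __eq__) for us. Here's a small sketch of that behavior. The example.com addresses are placeholders chosen for illustration:

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class EmailMessage:
    sender: str
    recipients: Iterable[str]
    subject: str
    body: str

    # Our hand-written __init__ is kept; @dataclass does not replace it
    def __init__(self, sender, recipients, subject, body):
        self.sender = sender
        self.recipients = list(recipients)
        self.subject = subject
        self.body = body

first = EmailMessage("a@example.com", ("b@example.com",), "Hello", "Hi there!")
second = EmailMessage("a@example.com", ["b@example.com"], "Hello", "Hi there!")

print(first)             # the generated __repr__ still works
print(first == second)   # the generated __eq__ compares field values
```

Both instances end up with a list of recipients, so they compare as equal even though one was given a tuple and the other a list.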
__post_init__

There's a better way to perform initialization steps on a dataclass.
Dataclasses support a __post_init__ method, which is automatically called by the default dataclass initializer method:
from dataclasses import dataclass
from typing import Iterable

@dataclass
class EmailMessage:
    sender: str
    recipients: Iterable[str]
    subject: str
    body: str

    def __post_init__(self):
        self.recipients = list(self.recipients)
This __post_init__ method loops over the given iterable to turn it into a new list:
>>> sender = "[email protected]"
>>> recipients = ["[email protected]", "[email protected]"]
>>> subject = "Hello"
>>> body = "Hi there!"
>>> message = EmailMessage(sender, recipients, subject, body)
>>> recipients.append("[email protected]")
>>> message.recipients
['[email protected]', '[email protected]']
This works just like our custom __init__ method did before, except we don't need to specially handle all of the fields in our dataclass.
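Because __post_init__ passes whatever it was given to list, the same class should also handle non-list iterables. Here's a minimal sketch of that, using a tuple and a generator expression. The example.com addresses are made up for illustration:

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class EmailMessage:
    sender: str
    recipients: Iterable[str]
    subject: str
    body: str

    def __post_init__(self):
        # Runs at the end of the generated __init__, after fields are set
        self.recipients = list(self.recipients)

from_tuple = EmailMessage(
    "a@example.com", ("b@example.com", "c@example.com"), "Hello", "Hi!"
)
addresses = (f"user{n}@example.com" for n in range(2))
from_generator = EmailMessage("a@example.com", addresses, "Hello", "Hi!")

print(from_tuple.recipients)       # a list, even though we passed a tuple
print(from_generator.recipients)   # the generator was consumed into a list
```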
There's one more thing I should mention about the __post_init__ method that's specifically relevant for frozen dataclasses.
Here's a frozen dataclass with a property that's derived from concrete attributes on the dataclass:
from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    @property
    def area(self):
        return self.width * self.height
This property's logic will be run every time the property is accessed:
>>> r = Rectangle(3, 4)
>>> r.area
12
>>> r.area
12
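To make that recomputation visible, here's a small sketch that records each run of the property body. The module-level calls list is a hypothetical helper added purely for demonstration:

```python
from dataclasses import dataclass

calls = []  # hypothetical log: one entry per run of the property body

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    @property
    def area(self):
        calls.append("area")  # record that the computation happened
        return self.width * self.height

r = Rectangle(3, 4)
r.area
r.area
print(len(calls))  # one entry for each attribute access
```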
Since we're working with a frozen dataclass, our class instances are meant to be immutable: the width and height never change, so the area never changes either. That means we should be able to compute the area once and store it as a concrete attribute on each instance of our dataclass.
Let's add a __post_init__ method that does exactly that:
from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    def __post_init__(self):
        self.area = self.width * self.height
This might seem like a simple change, but unfortunately, it doesn't work. When we try to make a new instance of our dataclass, we'll see an exception:
>>> r = Rectangle(3, 4)
Traceback (most recent call last):
  File "<python-input-6>", line 1, in <module>
    r = Rectangle(3, 4)
  File "<string>", line 5, in __init__
  File "/home/trey/rectangle.py", line 9, in __post_init__
    self.area = self.width * self.height
    ^^^^^^^^^
  File "<string>", line 17, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'area'
We are trying to update an attribute on a frozen dataclass, and that's not allowed.
That's frustrating!
We're the owners of this dataclass.
Shouldn't there be some way to add new attributes to it as new instances are being created?
There is!
This exception is raised by the __setattr__ method on our dataclass:
  File "<string>", line 17, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'area'
That's the method that gets called for each attribute assignment.
Setting frozen=True on our dataclass will override this method with one that always raises an exception.
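To see that __setattr__ really is involved in every attribute assignment, here's a sketch using a hypothetical Tracked class (not part of the Rectangle example) that logs each assignment before delegating to its parent class:

```python
log = []  # names of attributes that were assigned

class Tracked:
    """Hypothetical class that records every attribute assignment."""

    def __init__(self):
        self.x = 1  # assignments inside __init__ go through __setattr__ too

    def __setattr__(self, name, value):
        log.append(name)
        # Delegate to the parent class to actually store the attribute
        super().__setattr__(name, value)

t = Tracked()
t.y = 2
print(log)
print(t.x, t.y)
```

A frozen dataclass uses the same hook, except its __setattr__ raises an exception instead of delegating.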
We can avoid this exception by bypassing our class's __setattr__ method, and instead calling the __setattr__ method in our parent class:
from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    def __post_init__(self):
        super().__setattr__('area', self.width * self.height)
This will call the __setattr__ method on object.
All classes inherit from the built-in object class, and the __setattr__ method on object does the actual attribute-setting behind the scenes.
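Since object is the only parent class here, the same thing can be spelled explicitly by calling object.__setattr__ directly. This is a sketch of that equivalent spelling, not a change in behavior:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    def __post_init__(self):
        # Explicit form of super().__setattr__(...) for this class
        object.__setattr__(self, "area", self.width * self.height)

r = Rectangle(3, 4)
print(Rectangle.__mro__)  # Rectangle first, then object
print(r.area)
```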
So we can make new instances of this dataclass without an exception being raised:
>>> r = Rectangle(3, 4)
And our area attribute acts just as we'd expect it to:
>>> r.area
12
And we still can't directly assign to any of the attributes on this new dataclass because it's frozen, which is exactly what we want:
>>> r.width = 5
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    r.width = 5
    ^^^^^^^
  File "<string>", line 17, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'width'
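One consequence of setting area this way is that it's a plain instance attribute rather than a dataclass field, so it won't appear in the generated repr or take part in equality comparisons. If that matters, one possible variation (a sketch, not part of the original example) is to declare area as a field with field(init=False). That keeps it out of the __init__ arguments while still letting __post_init__ fill it in:

```python
from dataclasses import dataclass, field, fields

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # a field, but not an __init__ argument

    def __post_init__(self):
        # Still bypasses the frozen __setattr__, as before
        object.__setattr__(self, "area", self.width * self.height)

r = Rectangle(3, 4)
print(r)                             # the repr now includes area
print([f.name for f in fields(r)])   # area is listed among the fields
```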
__post_init__ instead of __init__ in dataclasses

When you need to customize the initialization of a dataclass, don't make an __init__ method.
Instead, make a __post_init__ method (the post-initialization method).
The post-initialization method will be automatically called after the dataclass does its initialization in its automatically generated __init__ method.