Customizing dataclass initialization

Series: Dataclasses

Trey Hunner

5 min. read • 4 min. video • Python 3.10–3.14 • Oct. 18, 2024

Python's dataclasses create an initializer method (__init__) automatically. But what if your dataclass needs additional initialization logic?

A dataclass with mutable arguments

We have a dataclass here that accepts a list for its recipients field:

from dataclasses import dataclass
from typing import List

@dataclass
class EmailMessage:
    sender: str
    recipients: List[str]
    subject: str
    body: str

When we pass a list into this class, a reference to that list object is simply stored on the new class instance:

>>> sender = "[email protected]"
>>> recipients = ["[email protected]", "[email protected]"]
>>> subject = "Hello"
>>> body = "Hi there!"
>>> message = EmailMessage(sender, recipients, subject, body)
>>> message.recipients
['[email protected]', '[email protected]']

So the list that we passed in is actually identical to the list that's on this object:

>>> message.recipients is recipients
True

This might be a problem for us. For example, if we pass in a list and then we change the list, the dataclass attribute will see the same change:

>>> recipients.append("[email protected]")
>>> message.recipients
['[email protected]', '[email protected]', '[email protected]']

This happens because both recipients and message.recipients refer to the same list object.

Also, most classes and functions that accept lists actually accept any iterable. For example, the tuple constructor will accept a list, a generator object, or any other iterable:

>>> tuple([2, 1, 3, 4])
(2, 1, 3, 4)
>>> tuple(n**2 for n in range(2, 6))
(4, 9, 16, 25)
>>> tuple("letters")
('l', 'e', 't', 't', 'e', 'r', 's')

It might make more sense to allow our class to accept any iterable, rather than only accepting lists.

What we need is a way to copy the given iterable into a new list each time a new instance of our dataclass is initialized.

Let's see how we could do that.

Overriding `__init__` in a dataclass

It's possible to specify your own initializer method on a dataclass. Here's a version of our dataclass with a custom __init__ method:

from dataclasses import dataclass
from typing import Iterable

@dataclass
class EmailMessage:
    sender: str
    recipients: Iterable[str]
    subject: str
    body: str

    def __init__(self, sender: str, recipients: Iterable[str], subject: str, body: str):
        self.sender = sender
        self.recipients = list(recipients)
        self.subject = subject
        self.body = body

Our __init__ method specifies all the fields, and it makes sure to copy the given iterable into a new list.

This does solve our problem from before. The list that's stored on our class instance is now independent from the list that was originally passed in:

>>> sender = "[email protected]"
>>> recipients = ["[email protected]", "[email protected]"]
>>> subject = "Hello"
>>> body = "Hi there!"
>>> message = EmailMessage(sender, recipients, subject, body)
>>> recipients.append("[email protected]")
>>> message.recipients
['[email protected]', '[email protected]']

This works, but it does feel a little bit redundant.

Customizing dataclass initialization with `__post_init__`

There's a better way to perform initialization steps on a dataclass.

Dataclasses support a __post_init__ method, which is automatically called at the end of the default dataclass initializer method:

from dataclasses import dataclass
from typing import Iterable

@dataclass
class EmailMessage:
    sender: str
    recipients: Iterable[str]
    subject: str
    body: str

    def __post_init__(self):
        self.recipients = list(self.recipients)

This __post_init__ method loops over the given iterable to turn it into a new list:

>>> sender = "[email protected]"
>>> recipients = ["[email protected]", "[email protected]"]
>>> subject = "Hello"
>>> body = "Hi there!"
>>> message = EmailMessage(sender, recipients, subject, body)
>>> recipients.append("[email protected]")
>>> message.recipients
['[email protected]', '[email protected]']

This works just like our custom __init__ method did before, except we don't need to specially handle all of the fields in our dataclass.
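Since the recipients field is annotated as an Iterable, it's worth checking that non-list iterables work too. Here's a minimal sketch of that, using the same class with a tuple and a generator expression (the example.com addresses are just placeholders):

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EmailMessage:
    sender: str
    recipients: Iterable[str]
    subject: str
    body: str

    def __post_init__(self):
        # Copy whatever iterable was given into a new list
        self.recipients = list(self.recipients)


# A tuple and a generator expression are both accepted
from_tuple = EmailMessage("a@example.com", ("b@example.com",), "Hi", "Hello")
addresses = (f"{name}@example.com" for name in ["c", "d"])
from_generator = EmailMessage("a@example.com", addresses, "Hi", "Hello")

print(from_tuple.recipients)      # ['b@example.com']
print(from_generator.recipients)  # ['c@example.com', 'd@example.com']
```

Either way, the recipients attribute ends up being a list that belongs to that one instance.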

Adding attributes to frozen dataclasses

There's one more thing I should mention about the __post_init__ method that's specifically relevant for frozen dataclasses.

Here's a frozen dataclass with a property that's derived from concrete attributes on the dataclass:

from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    @property
    def area(self):
        return self.width * self.height

This property's logic will be run every time the property is accessed:

>>> r = Rectangle(3, 4)
>>> r.area
12
>>> r.area
12

Since we're working with a frozen dataclass, our class instances should be immutable. The width and height can never change, so we should be able to compute the area once and store it as a concrete attribute on each instance of our dataclass.

Let's add a __post_init__ method that does exactly that:

from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    def __post_init__(self):
        self.area = self.width * self.height

This might seem like a simple change, but unfortunately, it doesn't work. When we try to make a new instance of our dataclass, we'll see an exception:

>>> r = Rectangle(3, 4)
Traceback (most recent call last):
  File "<python-input-6>", line 1, in <module>
    r = Rectangle(3, 4)
  File "<string>", line 5, in __init__
  File "/home/trey/rectangle.py", line 9, in __post_init__
    self.area = self.width * self.height
    ^^^^^^^^^
  File "<string>", line 17, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'area'

We are trying to set an attribute on an instance of a frozen dataclass, and that's not allowed.

That's frustrating!

We're the owners of this dataclass. Shouldn't there be some way to add new attributes to it as new instances are being created?

There is!

This exception is raised by the __setattr__ method on our dataclass:

  File "<string>", line 17, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'area'

That's the method that gets called for each attribute assignment. Setting frozen=True on our dataclass will override this method with one that always raises an exception.
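To see that for ourselves, here's a small sketch that uses a made-up Point class (it's not part of the example above). It checks that the frozen dataclass really does define its own __setattr__ method, and that an ordinary attribute assignment ends up raising the exception:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:  # hypothetical class, only for illustration
    x: float
    y: float


p = Point(1, 2)

# frozen=True adds a __setattr__ method directly to the class
has_own_setattr = "__setattr__" in vars(Point)
print(has_own_setattr)  # True

message = None
try:
    p.x = 10  # attribute assignment calls Point.__setattr__
except FrozenInstanceError as error:
    message = str(error)

print(message)  # cannot assign to field 'x'

# FrozenInstanceError is a subclass of AttributeError
print(issubclass(FrozenInstanceError, AttributeError))  # True
```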

We can avoid this exception by bypassing our class's __setattr__ method, and instead calling the __setattr__ method in our parent class:

from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float

    def __post_init__(self):
        super().__setattr__('area', self.width * self.height)

This will call the __setattr__ method on object.

All classes inherit from the built-in object class, and the __setattr__ method on object does the actual attribute-setting behind the scenes.

So we can make new instances of this dataclass without an exception being raised:

>>> r = Rectangle(3, 4)

And our area attribute acts just as we'd expect it to:

>>> r.area
12

And we still can't directly assign to any of the attributes on this new dataclass because it's frozen, which is exactly what we want:

>>> r.width = 5
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    r.width = 5
    ^^^^^^^
  File "<string>", line 17, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'width'
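One possible variation on this class is sketched below. It calls object.__setattr__ directly, which should have the same effect as super().__setattr__ here since object is the only parent class. It also declares area as a field with field(init=False), which is an addition of this sketch rather than part of the class above, so that area shows up in the string representation without becoming an __init__ argument:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float
    # Declared as a field, but not accepted by the generated __init__
    area: float = field(init=False)

    def __post_init__(self):
        # Call object's __setattr__ to bypass the frozen __setattr__
        object.__setattr__(self, "area", self.width * self.height)


r = Rectangle(3, 4)
print(r.area)  # 12
print(r)       # Rectangle(width=3, height=4, area=12)
```

Whether area should be a declared field is a design choice. Declared fields take part in the generated __repr__ and __eq__ methods, while an attribute that's only set in __post_init__ does not.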

Use `__post_init__` instead of `__init__` in dataclasses

When you need to customize the initialization of a dataclass, don't make an __init__ method. Instead, make a __post_init__ method (the post-initialization method).

The post-initialization method will be automatically called after the dataclass does its initialization in its automatically generated __init__ method.
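Here's a rough sketch of that ordering, using a hypothetical Point class and an events list that exist only for this demonstration. By the time __post_init__ runs, every field has already been assigned, including fields that fell back to their default values:

```python
from dataclasses import dataclass

events = []  # records what __post_init__ saw (demonstration only)


@dataclass
class Point:  # hypothetical class, only for illustration
    x: int
    y: int = 0

    def __post_init__(self):
        # The generated __init__ has already set x and y at this point
        events.append((self.x, self.y))


p = Point(5)
print(events)  # [(5, 0)]
```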