All You Need To Know About Dataclass in Python
It is common for us to work with simple data structures such as a Tuple (tuple) or a Dictionary (dict) in Python. We use them almost everywhere, every day of our lives as programmers, to store data.
For instance, we can represent a car object with the code example below:
# Using Dictionary
car = {"name": "Model X", "brand": "Tesla", "price": 120_000}
# OR using Tuple
car = ("Model X", "Tesla", 120_000)
Yet, these basic data structures become less ideal when we have to deal with more complex data. Here, we would need to remember that car represents a car Dictionary or Tuple in our app, not some string or integer.
Using a Tuple to represent our car object in the example above works just fine if we only have three fields (name, brand, and price). As we add more fields to our car object such as manufacturer, condition, etc., we would need to remember our attributes' order.
In the case of a Dictionary, we would not be able to use dot notation (i.e. car.name) to access our attributes. Plus, a deeply nested Dictionary tends to be very messy to work with.
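To make both pain points concrete, here is a minimal sketch using the same car data as above. The variable names (car_tuple, car_dict, and so on) are ours, chosen only for illustration:

```python
car_tuple = ("Model X", "Tesla", 120_000)
car_dict = {"name": "Model X", "brand": "Tesla", "price": 120_000}

# Tuple: fields are only reachable by position, so the reader has to
# remember that index 2 means "price".
price_from_tuple = car_tuple[2]

# Dictionary: the keys are readable, but dot notation is not available.
price_from_dict = car_dict["price"]
try:
    car_dict.name
except AttributeError as error:
    dot_access_error = str(error)

print(price_from_tuple, price_from_dict)
print(dot_access_error)
```

Neither structure is wrong here; they simply push the bookkeeping onto whoever reads the code.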
TL;DR
In this article, we are going to talk about better alternatives to our regular Dictionary or Tuple. Here are the topics covered:
- Named Tuple
- Data Classes, a better alternative to Named Tuple
- Customizing Data Classes
- When to use Data Classes
Let’s start!
Named Tuple To The Rescue
To improve our code readability, a more common approach is to use a Named Tuple (namedtuple) from the collections module in Python's standard library.
Using our car example above, here is what a Named Tuple would look like:
from collections import namedtuple
Car = namedtuple('Car', ['name', 'brand', 'price'])
car = Car('Model X', 'Tesla', 120_000)
Much better. So, why not just use Named Tuple all the time?
Well, Named Tuple does come with its own set of restrictions. Default values are awkward to assign (they are only available through the defaults argument added in Python 3.7, and only for the rightmost fields), and Named Tuple is immutable by nature.
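As a quick illustration of the immutability restriction, here is a small sketch. The names mutation_error and discounted_car are ours, used only for this example:

```python
from collections import namedtuple

Car = namedtuple("Car", ["name", "brand", "price"])
car = Car("Model X", "Tesla", 120_000)

try:
    car.price = 100_000  # Named Tuple fields cannot be reassigned
except AttributeError as error:
    mutation_error = error

# The usual workaround is to build a new instance with _replace,
# which leaves the original untouched.
discounted_car = car._replace(price=100_000)

print(type(mutation_error).__name__)
print(car.price, discounted_car.price)
```

This is fine for values that never change, but it gets clumsy once an object needs to be updated in place.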
PEP 557, the proposal that introduced Data Classes, explains why we shouldn’t just use Named Tuple.
Moreover, a Dictionary, Tuple, or even Named Tuple does not give us a natural place for custom methods, which raises the question: why not just use the regular Python Class?
Python Class
In Python, everything is an object, and most objects have attributes and methods. Typically, we would use class in Python to create our own custom objects with their own properties and methods.
Using our previous example to create a simple car object:
class Car:
    def __init__(self, name: str, brand: str, price: int) -> None:
        self.name = name
        self.brand = brand
        self.price = price
car1 = Car('Model X', 'Tesla', 120_000)
car2 = Car('Model X', 'Tesla', 120_000)
car1 == car2 # False. We need to write our own __eq__ method to handle this.
Every time a new property is added to our car object, we would need to add it to the __init__ method as well. What if we wanted a more descriptive representation of our car object and had to write a __repr__ method? What if we need to compare two instances of the Car class?
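To show how quickly this adds up, here is a sketch of the same class with a hand-written __repr__ and __eq__. This is just one reasonable way to write them, not the only one:

```python
class Car:
    def __init__(self, name: str, brand: str, price: int) -> None:
        self.name = name
        self.brand = brand
        self.price = price

    def __repr__(self) -> str:
        return f"Car(name={self.name!r}, brand={self.brand!r}, price={self.price!r})"

    def __eq__(self, other: object) -> bool:
        # Only compare against other Car instances.
        if not isinstance(other, Car):
            return NotImplemented
        return (self.name, self.brand, self.price) == (
            other.name,
            other.brand,
            other.price,
        )


car1 = Car("Model X", "Tesla", 120_000)
car2 = Car("Model X", "Tesla", 120_000)
print(car1)          # Car(name='Model X', brand='Tesla', price=120000)
print(car1 == car2)  # True
```

Every field now appears in three places, and each new field has to be added to all of them.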
Honestly, things aren’t that bad when we’re only dealing with a single car object. But what if we have to add more classes such as Manufacturer, CarDealer, etc.?
As you can already tell, the signs of code duplication are everywhere, and it smells! Truth be told, unless we actually need custom methods, we might be better off using Named Tuple.
Unfortunately, that is often not the case in real life.
Enter Data Classes
Introduced in Python 3.7, Data Classes (dataclasses) provide us with an easy way to make our classes less verbose. Put simply, Data Classes are just regular classes that help us abstract away a ton of boilerplate code.
To rewrite our previous example with a Data Class, we simply have to decorate our basic class with @dataclass:
from dataclasses import dataclass
@dataclass
class Car:
    name: str # Supports typing out of the box!
    brand: str
    price: int
car1 = Car('Model X', 'Tesla', 120_000)
car2 = Car('Model X', 'Tesla', 120_000)
car1 == car2 # True. __eq__ is generated automatically.
car2.name # Supports dot notation!
The best part of Data Class is that it automatically generates common dunder methods for the class, such as __init__, __repr__, and __eq__, eliminating all the duplicated code.
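If you are curious about what actually gets generated, a small sketch like the one below lets us peek at it. Using inspect.signature here is our own choice for illustration:

```python
import inspect
from dataclasses import dataclass


@dataclass
class Car:
    name: str
    brand: str
    price: int


car = Car("Model X", "Tesla", 120_000)

# The generated __repr__ lists every field by name.
print(car)  # Car(name='Model X', brand='Tesla', price=120000)

# The generated __init__ takes the fields in declaration order.
init_parameters = list(inspect.signature(Car).parameters)
print(init_parameters)  # ['name', 'brand', 'price']
```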
Customizing Data Class
1. In certain cases, we might need to customize our Data Class fields:
from dataclasses import dataclass, field
@dataclass
class Car:
    name: str = field(compare=False) # To exclude this field from comparison
    brand: str = field(repr=False) # To hide fields in __repr__
    price: int = 120_000
    condition: str = field(default='New')
2. To customize what happens right after __init__ runs inside our newly created Data Class, we can declare a __post_init__ method. For example, we can easily adjust the price of the car based on its initial condition:
from dataclasses import dataclass, field
@dataclass
class Car:
    name: str = field(compare=False)
    brand: str = field(repr=False)
    price: int = 120_000
    condition: str = field(default='New')
    def __post_init__(self):
        if self.condition == "Old":
            self.price -= 30_000
old_car = Car('Model X', 'Tesla', 130_000, 'Old')
# Car(name='Model X', price=100000, condition='Old')
3. To make our Data Class immutable, we simply have to use @dataclass(frozen=True) as our decorator.
4. Another good use case of Data Class is when we need to deal with a nested Dictionary. Here’s a simple example of what a Data Class could do:
# ...
from typing import List
@dataclass
class CarDealer:
    cars: List[Car]
car3 = Car('Model S', 'Tesla', 89_000)
car4 = Car('Model Y', 'Tesla', 54_000)
car_dealer = CarDealer(cars=[car3, car4])
# CarDealer(cars=[Car(name='Model S', price=89000, condition='New'), Car(name='Model Y', price=54000, condition='New')])
5. Lastly, in case it wasn’t obvious, Data Classes support inheritance too, as they behave just like our good old regular classes.
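Points 3 and 5 above come without an example, so here is a minimal sketch covering both. ElectricCar and its range_km field are hypothetical, made up purely for illustration:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Car:
    name: str
    brand: str
    price: int = 120_000


# Hypothetical subclass: it inherits the parent's fields and adds one.
# A subclass of a frozen Data Class has to be frozen as well.
@dataclass(frozen=True)
class ElectricCar(Car):
    range_km: int = 500  # needs a default because `price` has one


car = ElectricCar("Model X", "Tesla", 120_000, range_km=560)

try:
    car.price = 100_000  # assignment is rejected on a frozen instance
except FrozenInstanceError:
    was_blocked = True
else:
    was_blocked = False

print(car)
print(was_blocked)
```

Note that fields without defaults cannot follow fields with defaults, and this ordering rule applies across the inheritance chain too.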
So, when to use Data Class?
vs. Named Tuple
The use of Data Class is most often compared with the use of Named Tuple. For the most part, a Data Class offers the same advantages as a Named Tuple, if not more.
In the case where you need to unpack your variables, you might want to consider using Named Tuple instead.
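Here is a short sketch of the difference. CarTuple is a name we made up so that both versions can sit side by side:

```python
from collections import namedtuple
from dataclasses import astuple, dataclass

CarTuple = namedtuple("CarTuple", ["name", "brand", "price"])


@dataclass
class Car:
    name: str
    brand: str
    price: int


# A Named Tuple unpacks directly, like any other tuple.
name, brand, price = CarTuple("Model X", "Tesla", 120_000)

# A Data Class instance is not iterable, so unpacking needs an
# explicit conversion with astuple first.
dc_name, dc_brand, dc_price = astuple(Car("Model X", "Tesla", 120_000))

print(name, brand, price)
print(dc_name, dc_brand, dc_price)
```

So unpacking is still possible with a Data Class, just with one extra step.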
vs. Dictionary
When our Dictionary has a fixed set of keys whose corresponding values have fixed types, it is almost always better to use a Data Class.
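Moving between the two is cheap, as this sketch suggests. The misspelled "prize" key is deliberate, to show what happens when a key does not match a field:

```python
from dataclasses import asdict, dataclass


@dataclass
class Car:
    name: str
    brand: str
    price: int


car_dict = {"name": "Model X", "brand": "Tesla", "price": 120_000}

# Keys that match the field names can be unpacked straight into the class.
car = Car(**car_dict)

# asdict converts the instance back into a plain Dictionary.
round_trip = asdict(car)

# An unexpected key fails loudly instead of being stored silently.
try:
    Car(**{"name": "Model X", "brand": "Tesla", "prize": 120_000})
except TypeError:
    typo_rejected = True
else:
    typo_rejected = False

print(car.name, round_trip == car_dict, typo_rejected)
```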
In short, the rule of thumb is rather simple: if you create a dictionary or a class that mostly consists of attributes about the underlying data, use a Data Class. It saves you a bunch of time.
Finally, Data Class also preserves type information for each property, which is a huge added advantage!
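One caveat worth keeping in mind: the type information is kept, but it is not checked at runtime. A small sketch:

```python
from dataclasses import dataclass, fields


@dataclass
class Car:
    name: str
    brand: str
    price: int


# The declared type of every field is available for inspection.
field_types = {f.name: f.type for f in fields(Car)}
print(field_types)

# Nothing stops a wrong type at runtime, though. Catching this is the
# job of a static type checker or an explicit check in __post_init__.
odd_car = Car("Model X", "Tesla", "expensive")
print(odd_car.price)
```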
Closing Thoughts
Again, there is nothing wrong with just creating regular classes in Python. However, that could mean writing a lot of repetitive and boilerplate code just to set up our class instance.
To summarize what we went through, Data Class is great because it:
- Saves time and reduces code duplication
- Offers more flexibility, as it can be mutable or immutable
- Supports inheritance
- Allows for customization and default values
Don’t get me wrong. Not every class in Python needs to be a Data Class. A Data Class is not a silver bullet.
We should always keep in mind that we shouldn’t complicate things if we don’t have to. As long as we’re not dealing with something overly complex, a good old Dictionary might just do the job.
Thank you for reading!