- Introduction
- Variables
- Functions
- Objects and Data Structures
- Classes
- Don't repeat yourself (DRY)
- Translation
Software engineering principles, from Robert C. Martin's book Clean Code, adapted for Python. This is not a style guide. It's a guide to producing readable, reusable, and refactorable software in Python.
Not every principle herein has to be strictly followed, and even fewer will be universally agreed upon. These are guidelines and nothing more, but they are ones codified over many years of collective experience by the authors of Clean Code.
Inspired by clean-code-javascript
Targets Python 3.7+
Bad:
import datetime
ymdstr = datetime.date.today().strftime("%y-%m-%d")
Good:
import datetime
current_date: str = datetime.date.today().strftime("%y-%m-%d")
Bad: Here we use three different names for the same underlying entity:
def get_user_info(): pass
def get_client_data(): pass
def get_customer_record(): pass
Good: If the entity is the same, you should be consistent in referring to it in your functions:
def get_user_info(): pass
def get_user_data(): pass
def get_user_record(): pass
Even better: Python is (also) an object-oriented programming language. If it makes sense, package the functions together with the concrete implementation of the entity in your code, as instance attributes, property methods, or methods:
from typing import Union, Dict

class Record:
    pass

class User:
    info: str

    @property
    def data(self) -> Dict[str, str]:
        return {}

    def get_record(self) -> Union[Record, None]:
        return Record()
We will read more code than we will ever write. It's important that the code we do write is readable and searchable. By not naming variables that end up being meaningful for understanding our program, we hurt our readers. Make your names searchable.
Bad:
import time
# What is the number 86400 for again?
time.sleep(86400)
Good:
import time
# Declare them in the global namespace for the module.
SECONDS_IN_A_DAY = 60 * 60 * 24
time.sleep(SECONDS_IN_A_DAY)
Bad:
import re
address = "One Infinite Loop, Cupertino 95014"
city_zip_code_regex = r"^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$"
matches = re.match(city_zip_code_regex, address)
if matches:
    print(f"{matches[1]}: {matches[2]}")
Not bad:
It's better, but we are still heavily dependent on regex.
import re
address = "One Infinite Loop, Cupertino 95014"
city_zip_code_regex = r"^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$"
matches = re.match(city_zip_code_regex, address)
if matches:
    city, zip_code = matches.groups()
    print(f"{city}: {zip_code}")
Good:
Decrease dependence on regex by naming subpatterns.
import re
address = "One Infinite Loop, Cupertino 95014"
city_zip_code_regex = r"^[^,\\]+[,\\\s]+(?P<city>.+?)\s*(?P<zip_code>\d{5})?$"
matches = re.match(city_zip_code_regex, address)
if matches:
    print(f"{matches['city']}, {matches['zip_code']}")
Don't force the reader of your code to translate what the variable means. Explicit is better than implicit.
Bad:
seq = ("Austin", "New York", "San Francisco")
for item in seq:
    #do_stuff()
    #do_some_other_stuff()
    # Wait, what's `item` again?
    print(item)
Good:
locations = ("Austin", "New York", "San Francisco")
for location in locations:
    #do_stuff()
    #do_some_other_stuff()
    # ...
    print(location)
If your class/object name tells you something, don't repeat that in your variable name.
Bad:
class Car:
    car_make: str
    car_model: str
    car_color: str
Good:
class Car:
    make: str
    model: str
    color: str
Tricky:
Why write:
import hashlib

def create_micro_brewery(name):
    name = "Hipster Brew Co." if name is None else name
    slug = hashlib.sha1(name.encode()).hexdigest()
    # etc.
... when you can specify a default argument instead? This also makes it clear that you are expecting a string as the argument.
Good:
import hashlib

def create_micro_brewery(name: str = "Hipster Brew Co."):
    slug = hashlib.sha1(name.encode()).hexdigest()
    # etc.
Limiting the number of function parameters is incredibly important because it makes testing your function easier. Having more than three leads to a combinatorial explosion where you have to test tons of different cases with each separate argument.
Zero arguments is the ideal case. One or two arguments is ok, and three should be avoided. Anything more than that should be consolidated. Usually, if you have more than two arguments then your function is trying to do too much. In cases where it's not, most of the time a higher-level object will suffice as an argument.
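The "combinatorial explosion" can be made concrete with a small sketch. It assumes, purely for illustration, that every parameter is a boolean flag, so each one doubles the number of input combinations a thorough test suite would have to cover. `count_flag_combinations` is a hypothetical helper and is not part of the guide.

```python
from itertools import product


def count_flag_combinations(flag_count: int) -> int:
    """Count every True/False combination for `flag_count` boolean parameters."""
    # product(...) yields one tuple per distinct combination of flag values.
    return sum(1 for _ in product((True, False), repeat=flag_count))


for flag_count in range(1, 5):
    # Each extra flag doubles the number of cases: 2, 4, 8, 16.
    print(flag_count, count_flag_combinations(flag_count))
```

Real parameters usually have far more than two interesting values, so the growth is typically steeper than this sketch suggests.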
Bad:
def create_menu(title, body, button_text, cancellable):
    pass
Java-esque:
class Menu:
    def __init__(self, config: dict):
        self.title = config["title"]
        self.body = config["body"]
        # ...

menu = Menu(
    {
        "title": "My Menu",
        "body": "Something about my menu",
        "button_text": "OK",
        "cancellable": False
    }
)
Also good:
class MenuConfig:
    """A configuration for the Menu.

    Attributes:
        title: The title of the Menu.
        body: The body of the Menu.
        button_text: The text for the button label.
        cancellable: Can it be cancelled?
    """
    title: str
    body: str
    button_text: str
    cancellable: bool = False

def create_menu(config: MenuConfig) -> None:
    title = config.title
    body = config.body
    # ...

config = MenuConfig()
config.title = "My delicious menu"
config.body = "A description of the various items on the menu"
config.button_text = "Order now!"
# The instance attribute overrides the default class attribute.
config.cancellable = True
create_menu(config)
Fancy:
from typing import NamedTuple

class MenuConfig(NamedTuple):
    """A configuration for the Menu.

    Attributes:
        title: The title of the Menu.
        body: The body of the Menu.
        button_text: The text for the button label.
        cancellable: Can it be cancelled?
    """
    title: str
    body: str
    button_text: str
    cancellable: bool = False

def create_menu(config: MenuConfig):
    title, body, button_text, cancellable = config
    # ...

create_menu(
    MenuConfig(
        title="My delicious menu",
        body="A description of the various items on the menu",
        button_text="Order now!"
    )
)
Even fancier:
from dataclasses import astuple, dataclass

@dataclass
class MenuConfig:
    """A configuration for the Menu.

    Attributes:
        title: The title of the Menu.
        body: The body of the Menu.
        button_text: The text for the button label.
        cancellable: Can it be cancelled?
    """
    title: str
    body: str
    button_text: str
    cancellable: bool = False

def create_menu(config: MenuConfig):
    title, body, button_text, cancellable = astuple(config)
    # ...

create_menu(
    MenuConfig(
        title="My delicious menu",
        body="A description of the various items on the menu",
        button_text="Order now!"
    )
)
Even fancier, Python 3.8+ only:
from typing import TypedDict

class MenuConfig(TypedDict):
    """A configuration for the Menu.

    Attributes:
        title: The title of the Menu.
        body: The body of the Menu.
        button_text: The text for the button label.
        cancellable: Can it be cancelled?
    """
    title: str
    body: str
    button_text: str
    cancellable: bool

def create_menu(config: MenuConfig):
    title = config["title"]
    # ...

create_menu(
    # You need to supply all the parameters
    MenuConfig(
        title="My delicious menu",
        body="A description of the various items on the menu",
        button_text="Order now!",
        cancellable=True
    )
)
This is by far the most important rule in software engineering. When functions do more than one thing, they are harder to compose, test, and reason about. When you can isolate a function to just one action, it can be refactored easily and your code will read much more cleanly. If you take nothing else away from this guide, you'll still be ahead of many developers.
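As a rough illustration of why single-purpose functions are easier to test and compose, here is a minimal sketch. The `Order` class and the three function names are hypothetical and do not come from the guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    price: float
    paid: bool


def unpaid_orders(orders: List[Order]) -> List[Order]:
    """Select the orders that have not been paid yet."""
    return [order for order in orders if not order.paid]


def total_price(orders: List[Order]) -> float:
    """Add up the prices of the given orders."""
    return sum(order.price for order in orders)


def outstanding_balance(orders: List[Order]) -> float:
    """Compose the two single-purpose functions above."""
    return total_price(unpaid_orders(orders))


orders = [Order(10.0, paid=True), Order(5.0, paid=False), Order(2.5, paid=False)]
print(outstanding_balance(orders))  # 7.5
```

Each helper can be checked on its own with a couple of plain inputs, and the composed function adds no logic of its own.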
Bad:
from typing import List

class Client:
    active: bool

def email(client: Client) -> None:
    pass

def email_clients(clients: List[Client]) -> None:
    """Filter active clients and send them an email.
    """
    for client in clients:
        if client.active:
            email(client)
Good:
from typing import List

class Client:
    active: bool

def email(client: Client) -> None:
    pass

def get_active_clients(clients: List[Client]) -> List[Client]:
    """Filter active clients.
    """
    return [client for client in clients if client.active]

def email_clients(clients: List[Client]) -> None:
    """Send an email to a given list of clients.
    """
    for client in get_active_clients(clients):
        email(client)
Do you see an opportunity for using generators now?
Even better:
from typing import Generator, Iterable

class Client:
    active: bool

def email(client: Client):
    pass

def active_clients(clients: Iterable[Client]) -> Generator[Client, None, None]:
    """Only active clients"""
    return (client for client in clients if client.active)

def email_clients(clients: Iterable[Client]) -> None:
    """Send an email to the active clients among those given.
    """
    for client in active_clients(clients):
        email(client)
Bad:
class Email:
    def handle(self) -> None:
        pass

message = Email()
# What is this supposed to do again?
message.handle()
Good:
class Email:
    def send(self) -> None:
        """Send this message"""

message = Email()
message.send()
When you have more than one level of abstraction, your function is usually doing too much. Splitting up functions leads to reusability and easier testing.
Bad:
# type: ignore
def parse_better_js_alternative(code: str) -> None:
    regexes = [
        # ...
    ]
    statements = code.split('\n')
    tokens = []
    for regex in regexes:
        for statement in statements:
            pass
    ast = []
    for token in tokens:
        pass
    for node in ast:
        pass
Good:
from typing import Tuple, List, Dict

REGEXES: Tuple = (
    # ...
)

def parse_better_js_alternative(code: str) -> None:
    tokens: List = tokenize(code)
    syntax_tree: List = parse(tokens)
    for node in syntax_tree:
        pass

def tokenize(code: str) -> List:
    # Split on newlines, matching the statement splitting in the "Bad" version.
    statements = code.split('\n')
    tokens: List[Dict] = []
    for regex in REGEXES:
        for statement in statements:
            pass
    return tokens

def parse(tokens: List) -> List:
    syntax_tree: List[Dict] = []
    for token in tokens:
        pass
    return syntax_tree
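The example above leaves the regexes and loop bodies elided. A runnable sketch of the same split might look like the following. Everything specific here is an assumption made for illustration: the toy token patterns in `TOKEN_PATTERNS`, the whitespace-separated input, the dictionary shapes, and the fact that the top-level function returns the nodes instead of looping over them.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical (kind, pattern) pairs for a toy language; the guide elides REGEXES.
TOKEN_PATTERNS: Tuple[Tuple[str, str], ...] = (
    ("number", r"\d+"),
    ("name", r"[A-Za-z_]\w*"),
    ("operator", r"[=+\-*/]"),
)


def tokenize(code: str) -> List[Dict[str, str]]:
    """Turn source text into a flat list of {"kind", "text"} tokens."""
    tokens: List[Dict[str, str]] = []
    for statement in code.split("\n"):
        # Assumes tokens are separated by whitespace.
        for word in statement.split():
            for kind, pattern in TOKEN_PATTERNS:
                if re.fullmatch(pattern, word):
                    tokens.append({"kind": kind, "text": word})
                    break
    return tokens


def parse(tokens: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Wrap each token in a node; a real parser would build a tree here."""
    return [{"node": token["kind"], "value": token["text"]} for token in tokens]


def parse_better_js_alternative(code: str) -> List[Dict[str, str]]:
    """Top level: only orchestrates the two lower-level steps."""
    return parse(tokenize(code))


for node in parse_better_js_alternative("x = 1\ny = x + 2"):
    print(node)
```

Because `tokenize` and `parse` each stay at one level of abstraction, either can be exercised directly with a short input without running the other.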