Python Typing: Resisting the Any type

With typing in Python, we aim to reject as many invalid programs as possible before they’re ever run. This post covers some useful features for tightening up our types:

TypeVar for when an unknown type appears multiple times in the same context
@overload for when you have a function whose behaviour depends on its input
Protocol for supporting any type with the desired attributes and methods

Introduction #

Python itself doesn’t have a static type checker, but does provide a syntax for adding type annotations to your code:

# basics.py

from typing import List
from dataclasses import dataclass


# We can annotate the input and output types of functions.
def fibonacci(n: int) -> int:
    assert n >= 0
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


@dataclass
class Document:
    # We can annotate class attributes too.
    # We're using the generic type 'List' with parameter 'str'.
    # This means the `lines` attribute is a list of strings.
    lines: List[str]


# This is invalid:
broken_doc = Document(lines=[1, 2, 3])

# This is valid:
fibonacci_doc = Document(
    lines=[
        f"{n}th fib number is {fibonacci(n)}"
        for n in range(20)
    ]
)

IDEs like PyCharm and type-checking tools like Mypy and Pyre use these annotations to identify type errors. The intended rules of this type-checking are mostly standardised in Python Enhancement Proposals (PEPs), so the behaviour is similar between the tools.

Getting started with Mypy #

Here, we’ll use Mypy for type-checking. To get started, install it with pip install mypy. Here’s an example of running Mypy on the basics.py file from above:

# In your terminal:
virtualenv mypy-venv -p python3.8
source mypy-venv/bin/activate
pip install mypy
mypy --version
# mypy 0.770
mypy --strict python-typing-playground/basics.py
# warm-up/basics.py:22: error: List item 0 has incompatible type "int"; expected "str"
# warm-up/basics.py:22: error: List item 1 has incompatible type "int"; expected "str"
# warm-up/basics.py:22: error: List item 2 has incompatible type "int"; expected "str"
# Found 3 errors in 1 file (checked 1 source file)

I’m using Python 3.8. Many interesting typing features were added to the standard library in 3.8, but if you can’t use Python 3.8, typing-extensions often has your back.

Mypy lets you get started with a large existing untyped code base and gradually add type hints over time. It will simply silently treat all objects as Any if it can’t find type hints for them. I recommend using the --strict flag when first trying it out. This mostly stops Mypy from silently assuming things are Any, which I think is helpful when trying to understand what Mypy can and cannot infer.
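To make the "silently treated as Any" behaviour a little more concrete, here is a minimal sketch. The function names are made up for illustration and are not part of basics.py.

```python
def untyped_add(a, b):
    # No annotations: Mypy treats `a`, `b` and the return value as Any,
    # so argument types at call sites are not checked.
    # With --strict, Mypy reports the missing annotations instead.
    return a + b


def typed_add(a: int, b: int) -> int:
    # Fully annotated: Mypy checks the body and the argument types of every call.
    return a + b


# Mypy's default mode accepts this call without complaint,
# even though adding an int to a str fails at runtime with a TypeError.
try:
    untyped_add(1, "two")
except TypeError as error:
    failure = str(error)
```

Running Mypy on this file with and without --strict is a quick way to see the difference between the two modes.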

What’s wrong with `Any` anyway? #

Adding types restricts the way our functions, classes and variables can be used. This restriction is good because it helps catch whole categories of bugs before the code is ever run whilst also (usually) making the code easier to think about. Typically, our goal with typing in Python is:

to restrict as many ‘invalid’ usages as possible,
whilst allowing all ‘valid’ usages,
and keeping our code relatively tidy.

Sometimes, using only the syntax given above, we might get stuck on how to express the type of something and turn to the Any type. The Any type is a magical type which tells the type-checker that it supports any operation you could possibly want, and which is compatible with every other type in both directions (any value can be used where Any is expected, and an Any value can be used where any other type is expected):

from typing import Any


def do_something_shady(magic_input: Any) -> Any:
    # We can do any operation we want on an Any object.
    # The type-checker will allow it 
    # and assume the result is an Any object
    return magic_input.thing["stuff"] + 42


# All types are compatible with Any,
# so even if we pass in a str, this is valid.
# If we try to run this, it will fail,
# since a string doesn't have a 'thing' attribute.
do_something_shady(magic_input="just a string")

Clearly, using Any can allow many invalid programs. I’d like to share a few useful Python type-system features that we can consider before giving in to the Any type’s magical allure, or otherwise loosening our types.
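As a rough sketch of what that failure looks like, the snippet below runs the shady function and captures the runtime error, then contrasts it with a more honestly typed variant. The Widget class and do_something_honest function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


def do_something_shady(magic_input: Any) -> Any:
    return magic_input.thing["stuff"] + 42


# The type-checker accepts this call, but it fails when run.
try:
    do_something_shady(magic_input="just a string")
except AttributeError as error:
    message = str(error)


@dataclass
class Widget:  # hypothetical type, for illustration only
    thing: Dict[str, int]


def do_something_honest(widget: Widget) -> int:
    return widget.thing["stuff"] + 42


# Valid:
result = do_something_honest(Widget(thing={"stuff": 1}))

# Mypy would reject this before the program is ever run:
# do_something_honest("just a string")
```

With the narrower annotation, the mistake moves from a runtime crash to a type-checking error.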

Polymorphism with `TypeVar` #

Type variables let us specify that some unknown type will appear multiple times in the same context, whether that be multiple times in the same function signature, or multiple times in methods of a generic class. We don’t know what the type will be, but we know it will be the same in all the places where it appears.

from typing import List, TypeVar

# Declare a type variable
ValueType = TypeVar("ValueType")  # The string in this constructor must match the variable name


def repeat(value: ValueType, count: int) -> List[ValueType]:
    """Returns the input value repeated `count` times in a list."""
    return [value] * count

# This is valid
", ".join(repeat("a", count=5))

# This is invalid (can't do a string join on a list of ints)
", ".join(repeat(3, count=5))

Because ValueType is used in two different places in the same signature, Mypy will check, for each call to this function, that the values used in those two places match types. In this case, it checks that the type of value returned from the function is always a list of the type of value given to the function. This also means Mypy knows more about the result of calling the function, and can be stricter there too.
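The same idea applies to the generic-class case mentioned earlier. Here is a minimal sketch, assuming a small Stack class invented for this example, where one type variable ties together the methods of the class.

```python
from typing import Generic, List, TypeVar

ItemType = TypeVar("ItemType")


class Stack(Generic[ItemType]):
    """A tiny stack whose items all share one (unknown) type."""

    def __init__(self) -> None:
        self._items: List[ItemType] = []

    def push(self, item: ItemType) -> None:
        self._items.append(item)

    def pop(self) -> ItemType:
        return self._items.pop()


names: Stack[str] = Stack()
names.push("spots")
names.push("buzz")

# Mypy knows `pop` returns a str here, so str methods are allowed.
shouted = names.pop().upper()

# Mypy would reject this, since `names` is a Stack[str]:
# names.push(42)
```

Within one Stack instance, ItemType stands for the same type in both push and pop, which is what lets the type-checker link what goes in to what comes out.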

Using bound= #

So TypeVar is great for capturing this idea of an unknown type appearing multiple times. Sometimes we do know something about the type, and maybe we want to use a certain field that we know will exist on the type, but we still want to capture and check the fact that it appears multiple times.

By default, a TypeVar will bind to any type, all the way up to object, but we can put an ‘upper bound’ on this using the bound parameter. This says ‘only let this TypeVar bind to a subtype of X’, where X is some type that we care about.

from typing import List, TypeVar
from dataclasses import dataclass

@dataclass
class Animal:
    name: str


class Dog(Animal):
    def pat(self) -> None:
        print(f"gave {self.name} a good pat")


class Cat(Animal):
    def stroke(self) -> None:
        print(f"stroked {self.name}")


AnimalType = TypeVar('AnimalType', bound=Animal)


def sort_animals_by_name(items: List[AnimalType]) -> List[AnimalType]:
    return sorted(items, key=lambda animal: animal.name)


# This is valid (all the inputs are dogs so all have the 'pat' method)
sorted_dogs = sort_animals_by_name([Dog(name="spots"), Dog(name="buzz")])
sorted_dogs[0].pat()

# This is invalid
sorted_dogs[0].stroke()

# This is invalid (cannot use str due to bound=Animal)
not_animals = ["John", "Joan", "Jan"]
sort_animals_by_name(not_animals)

Because we specified that AnimalType must be a subtype of Animal, the type-checker allows us to use the name attribute from Animal within our function. Note that we still aren’t able to use a more specific method like pat within the body of sort_animals_by_name, since it doesn’t always exist on the upper bound, Animal.
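It can help to compare a bound TypeVar with simply annotating everything as Animal. The sketch below uses two hypothetical helper functions (first_by_name_loose and first_by_name) and lets pat return a string so the result is easy to inspect.

```python
from dataclasses import dataclass
from typing import Sequence, TypeVar


@dataclass
class Animal:
    name: str


class Dog(Animal):
    def pat(self) -> str:
        return f"gave {self.name} a good pat"


AnimalType = TypeVar("AnimalType", bound=Animal)


def first_by_name_loose(animals: Sequence[Animal]) -> Animal:
    # The result is only known to be an Animal.
    return min(animals, key=lambda animal: animal.name)


def first_by_name(animals: Sequence[AnimalType]) -> AnimalType:
    # The result has the same type as the items that were passed in.
    return min(animals, key=lambda animal: animal.name)


dogs = [Dog(name="spots"), Dog(name="buzz")]

# Valid: the result is known to be a Dog, so `pat` is available.
pat_message = first_by_name(dogs).pat()

# Mypy would reject this, because the result is only known to be an Animal:
# first_by_name_loose(dogs).pat()
```

Both functions behave identically at runtime; the bound TypeVar only changes how much the type-checker knows about the result.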

Misconceptions about scoping #

Because of the strange syntax for declaring a type variable, where we create a global object, it’s easy to get confused about their scoping rules:

Question: Is the following valid?

from typing import List, TypeVar

T = TypeVar("T")

def repeat(value: T, times: int) -> List[T]:
    return [value] * times

def identity(x: T) -> T:
    return x

repeat("echo", times=5)
repeat(42, times=5)
identity(42)

After all, we seem to be asking T to be a string and then to be an int.

The answer is yes, this is valid. Generic functions using TypeVars can happily be called multiple times with different input types. The two calls to the function are unrelated as far as this TypeVar is concerned. Similarly, TypeVars can happily be recycled across multiple independent functions. You often only need multiple TypeVars if you need multiple distinct types within the same function signature.

See Scoping rules for type variables within PEP-484 for more examples.
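For the case where distinct types are needed within one signature, here is a small sketch using two type variables. The names K, V, swap and invert are chosen for this example only.

```python
from typing import Dict, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def swap(pair: Tuple[K, V]) -> Tuple[V, K]:
    """Returns the pair with its two elements exchanged."""
    first, second = pair
    return second, first


def invert(mapping: Dict[K, V]) -> Dict[V, K]:
    """Returns a new dict mapping each value back to its key."""
    return {value: key for key, value in mapping.items()}


swapped = swap(("a", 1))       # Mypy infers Tuple[int, str]
inverted = invert({"a": 1})    # Mypy infers Dict[int, str]
```

K and V are reused across both functions without any conflict, but within each signature they keep the two types apart.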

Overloading with `@overload` #

Sometimes we have an overloaded function, one which has different behaviour depending on the input given.

Overloading in Python is quite different to other languages. In other languages, we might write multiple function implementations with the same function name but different arguments (either in type or in name) and when that function is called, the correct implementation would be chosen based on the given arguments.

In Python, we are not going to define multiple implementations (remember our type annotations aren’t read at runtime). In Python we just have one implementation, which manually checks the types and does the right thing. All @overload lets us do is tell the type checker which combinations of parameters and outputs are valid.

We do this by providing multiple @overload signatures before defining our actual implementation.

from typing import Optional, overload
import math

@overload
def get_circle_area(*, radius: float) -> float: ...

@overload
def get_circle_area(*, circumference: float) -> float: ...

def get_circle_area(*,
     radius: Optional[float] = None,
     circumference: Optional[float] = None
) -> float:
    """
    Takes either a radius or circumference of a circle,
    and returns the area of that circle.
    """

    # Check we've been given exactly one of the two forms of input
    if radius is not None and circumference is not None:
        raise ValueError("Can't use both radius and circumference")
    elif radius is not None:
        canonical_radius = radius
    elif circumference is not None:
        canonical_radius = circumference / math.tau
    else:
        raise ValueError("Give either a radius or circumference")
    
    return math.pi * canonical_radius * canonical_radius


# Invalid (want exactly one of radius or circumference):
get_circle_area()
get_circle_area(radius=None)
get_circle_area(circumference=None)
get_circle_area(radius=1.0, circumference=None)
get_circle_area(radius=1.0, circumference=3.0)
get_circle_area(3.0)  # ambiguous without the argument keyword

# Valid:
get_circle_area(radius=1.0)
get_circle_area(circumference=math.tau)

We used two @overload signatures (with empty bodies) before specifying the implementation itself. The type signature of the implementation is only used for type-checking the implementation’s body, not for type-checking usages of the function. The type checker will also make sure that the implementation’s signature is compatible with all of the @overload signatures. For example, if we have an @overload which accepts a string argument, but the implementation only accepts floats, Mypy will report an error.

As a side note, calling this function with a number but without the argument keyword (e.g. radius=) would be ambiguous: we wouldn’t know if the number was a radius or a circumference. The asterisk in the function signatures enforces that keywords be used. For more explanation, see Keyword-Only Arguments — Specification.
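Another common use of @overload is when the return type depends on the type of the input. The following is a minimal sketch of that pattern, using a hypothetical double function that is not part of the example above.

```python
from typing import Union, overload


@overload
def double(value: int) -> int: ...


@overload
def double(value: str) -> str: ...


def double(value: Union[int, str]) -> Union[int, str]:
    # One implementation that manually checks the type and does the right thing.
    if isinstance(value, int):
        return value * 2
    return value + value


# Valid: Mypy knows the result is an int, so arithmetic is allowed.
doubled_number = double(21) + 0

# Valid: Mypy knows the result is a str, so str methods are allowed.
doubled_text = double("ab").upper()

# Mypy would reject this, since no overload accepts a float:
# double(1.5)
```

Without the two @overload signatures, callers would only see the Union return type and would have to narrow it themselves before using the result.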

Static Duck Typing with `Protocol` #

If we’re restricting the input types of a function, we sometimes only care that it has certain attributes and methods, not whether it’s a subclass of a particular class. Furthermore, it’s not always possible to modify a class to inherit from a certain base class, because it might be from a library that you don’t control.

Protocol lets us capture these attribute and method constraints without requiring the type to inherit from any particular class.

# If you can't use Python 3.8, you can also import Protocol from typing-extensions
# https://pypi.org/project/typing-extensions/
from typing import Protocol
from dataclasses import dataclass

# Setup: we'll define a few classes which happen to have a 'name'

@dataclass
class Person:
    first_name: str
    last_name: str
    height: float

    @property
    def name(self) -> str:
        return self.first_name + " " + self.last_name

    def greet(self) -> str:
        return f"Hi, my name is {self.name}."


@dataclass
class Company:
    name: str
    address: str

    def greet(self) -> str:
        return f"We are {self.name}, find us at {self.address}."


@dataclass
class Dog:
    name: str
    color: str

    def greet(self) -> str:
        return "Woof!"



class Greetable(Protocol):
    """
    Matches any type with
    a readable (but not necessarily writeable) `name`
    and a `greet` method that returns a string.
    """
    @property
    def name(self) -> str: ...

    def greet(self) -> str: ...


def introduce(greetable: Greetable) -> None:
    print(
        f"This thing is named {greetable.name}"
        f" and it says '{greetable.greet()}'"
    )


class Renameable(Protocol):
    """Matches any type with a read/write `name`."""
    name: str


def rename(renameable: Renameable, new_name: str) -> None:
    renameable.name = new_name


spots = Dog(name="spots", color="white")
acme = Company(name="Acme Inc.", address="123 Industry Ave.")
john = Person(first_name="John", last_name="Johnson", height=1.96)

# Valid things
introduce(spots)
introduce(acme)
introduce(john)
rename(spots, "spotsy")

# Invalid things
introduce(42)
introduce("blah")
rename(john, "J-man")  # Person's `name` is read-only

None of the classes we defined inherit from Greetable (a class may subclass a Protocol explicitly, but it doesn’t have to), yet they all conform to it because they have a name and a greet method with the appropriate types. As such, they can all be passed to a function that takes a Greetable.

When specifying attributes that must exist on the type, we saw both:

the @property form, which lets us specify that we only need to read this value,
and the name: str form which is a shorthand for an attribute that is both readable and writeable.

We can also opt in to these Protocols being checkable at runtime. For more detail, see the @runtime_checkable decorator section of PEP 544.
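As a rough sketch of the runtime check, the snippet below decorates a small protocol with @runtime_checkable and uses isinstance on it. The Greeter protocol and Robot class are hypothetical names for this example. Note that the runtime check only looks at whether the required members exist, not at their type signatures.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Greeter(Protocol):
    def greet(self) -> str: ...


@dataclass
class Robot:
    model: str

    def greet(self) -> str:
        return f"Beep, I am {self.model}."


# Robot never inherits from Greeter, but it has a `greet` method.
robot_is_greeter = isinstance(Robot(model="R2"), Greeter)

# An int has no `greet` method, so it does not match.
int_is_greeter = isinstance(42, Greeter)
```

This is handy for branching at runtime, but the static check remains the stricter of the two.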

Conclusion #

In this post we’ve seen some useful features for tightening our types:

Use TypeVar to express that a single unknown type will appear multiple times in the same context
Use @overload when you have a function whose behaviour depends on its input
Use Protocol when you want to support any type which happens to have the attributes and methods that you need.
