Running Mypy in Pre-commit

The only thing worse than not type-checking your code is thinking you are type-checking it when you aren’t.

This post is about running Mypy in a Git pre-commit hook using the Pre-commit framework. Running Mypy is a little fiddly in itself, and pre-commit/mirrors-mypy (the de facto way to call Mypy in Pre-commit) calls Mypy in a slightly opinionated way that may cause further confusion or hide errors you want to see.

Three take-away points if you’re in a hurry:

Make sure you run Mypy on all files, not just those that have changed
Make sure Mypy has access to the installed dependencies of the code it is type-checking
Be careful with the use of flags that reduce the strictness of Mypy like --ignore-missing-imports

Here, I show you how to make your own Mypy hook that suits your needs, in 3 only-somewhat-fiddly steps:

Running Mypy correctly outside of Pre-commit
Creating your own Pre-commit hook
Giving Mypy access to your project dependencies

A solution that works in my case

Before discussing the gory details and alternatives, here’s a solution that works for my project.

I add a mypy.ini:

[mypy]
# mypy_path will vary (and may not be necessary) 
# for your project layout.
mypy_path=./src:./tests

# Explicitly blacklist modules in use
# that don't have type stubs.
[mypy-pytest.*]
ignore_missing_imports = True
[mypy-pyproj.*]
ignore_missing_imports = True

and then add a script at ./run-mypy:

#!/usr/bin/env bash

# A script for running mypy, 
# with all its dependencies installed.

set -o errexit

# Change directory to the project root directory.
cd "$(dirname "$0")"

# Install the dependencies into the mypy env.
# Note that this can take seconds to run.
# In my case, I need to use a custom index URL.
# Avoid pip spending time quietly retrying since 
# likely cause of failure is lack of VPN connection.
pip install --editable . \
  --index-url https://custom-index-url.com/simple \
  --retries 1 \
  --no-input \
  --quiet

# Run on all files, 
# ignoring the paths passed to this script,
# so as not to miss type errors.
# My repo makes use of namespace packages.
# Use the namespace-packages flag 
# and specify the package to run on explicitly.
# Note that we do not use --ignore-missing-imports, 
# as this can give us false confidence in our results.
mypy --package acme --namespace-packages

and then define a custom Pre-commit hook that runs that script in ./.pre-commit-config.yaml:

# .pre-commit-config.yaml

repos:
- repo: local
  # We do not use pre-commit/mirrors-mypy, 
  # as it comes with opinionated defaults 
  # (like --ignore-missing-imports)
  # and is difficult to configure to run 
  # with the dependencies correctly installed.
  hooks:
    - id: mypy
      name: mypy
      entry: "./run-mypy"
      language: python
      # use your preferred Python version
      language_version: python3.7
      additional_dependencies: ["mypy==0.790"]
      types: [python]
      # use require_serial so that script
      # is only called once per commit
      require_serial: true
      # Print the number of files as a sanity-check 
      verbose: true

You’ll have to adapt this to your own project structure and strictness/performance needs. To expose all the issues this tries to cover, we’ll build it up in 3 steps.

Step 1: Running Mypy correctly outside of Pre-commit

Before thinking about Pre-commit, we should make sure we can run Mypy directly in the desired way.

Running on the correct files

Running mypy . in the root of your project will often not do what you need it to. You should play around, keeping an eye on Mypy output, to make sure Mypy is running on all the files that you want. This involves choosing:

Whether to specify the files to type-check as a package, a module, a directory, or a file path
Whether to specify a MYPYPATH
Whether to add the --namespace-packages option
What working directory to invoke Mypy from

The "Running mypy and managing imports" section of the Mypy documentation is helpful for getting this right. Pay extra attention if you are using namespace packages (packages without __init__.py files).
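
To make the namespace-package point concrete, here is a minimal runtime sketch. It assumes a hypothetical package called acme_ns_demo (a stand-in for my acme package) living under a src directory with no __init__.py. It only shows how Python itself resolves the name; Mypy's own lookup is controlled separately by mypy_path and --namespace-packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Tuple

# Hypothetical package name, chosen to avoid clashing with anything installed.
PACKAGE = "acme_ns_demo"


def import_namespace_module() -> Tuple[ModuleType, ModuleType]:
    """Create src/<PACKAGE>/widgets.py without an __init__.py and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        package_dir = src / PACKAGE
        package_dir.mkdir(parents=True)
        # Deliberately no __init__.py, which makes this a namespace package.
        (package_dir / "widgets.py").write_text(
            "def count() -> int:\n    return 3\n"
        )

        # Put 'src' itself on the search path, not the package directory,
        # so the module is known by its fully qualified name.
        sys.path.insert(0, str(src))
        try:
            importlib.invalidate_caches()
            package = importlib.import_module(PACKAGE)
            module = importlib.import_module(f"{PACKAGE}.widgets")
        finally:
            sys.path.remove(str(src))
    return package, module


if __name__ == "__main__":
    package, module = import_namespace_module()
    print(module.__name__)
```

If the package directory were put on the path instead of src, the same file would be importable only as a top-level widgets module, which is the kind of mismatch to watch for when choosing Mypy's search path.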

I’m writing this whilst v0.790 is the latest release. Simplifying how Mypy is called, and how it handles imports, is a current priority for the maintainers. See, for example, the umbrella issue #8584 (Redesign import handling). Various improvements have already been merged to the master branch.

Following the right rules

Once Mypy is running on the correct files, you’ll want to get it running the right checks for your codebase so that it passes whilst also checking what you want it to check. This may involve:

Making changes to your codebase to meet new rules that you want to enforce
Setting various strictness settings. For example: --no-implicit-optional, --disallow-untyped-defs, --no-strict-optional or the umbrella option --strict
Deciding which imported modules to treat as Any. Sometimes Mypy will complain that it can’t find a certain module or its stubs. This can indicate that Mypy does not have access to these dependencies, which you should fix (see below), but it can also mean the library doesn’t have any type stubs. In the latter case, it’s sensible to treat those modules as Any in a mypy.ini file:
```
# mypy.ini
    
[mypy]
# this section is required
# you can add a mypy_path here, if you need one.
  
# example of explicitly ignoring missing stubs
# for a dependency and its subpackages.
# This is safer than ignoring everything 
# with the --ignore-missing-imports option.

[mypy-pyproj.*]
ignore_missing_imports = True
```
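
It helps to see what treating a module as Any costs you. The sketch below is illustrative only: load_setting is a hypothetical stand-in for a function from a library whose imports are ignored, so its return value is Any as far as Mypy is concerned.

```python
from typing import Any


def load_setting() -> Any:
    """Hypothetical stand-in for a call into a module that mypy sees as Any."""
    return "8080"


def next_port() -> int:
    # mypy accepts this assignment, because Any is compatible with every type.
    port: int = load_setting()
    # At runtime the value is still a str, despite the annotation.
    return port


if __name__ == "__main__":
    print(type(next_port()).__name__)  # prints: str
```

This is why I prefer to ignore missing imports module by module: each ignored module is a place where checks like this quietly stop.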

You may even want to temporarily introduce errors in certain files to make sure Mypy will notice them. See also the "No errors reported for obviously wrong code" section of the Mypy docs.
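
As a rough sketch of that sanity check, the snippet below writes a throwaway file containing an obvious type error and confirms that Mypy rejects it. The file name canary.py and the function name are my own inventions, and the check runs in a temporary directory, so it only proves that Mypy itself works. To test your real configuration you would place a similar file inside your project and call Mypy the way your script does.

```python
import importlib.util
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional

# An assignment that any working mypy run should reject.
CANARY_SOURCE = 'count: int = "not a number"\n'


def mypy_catches_canary() -> Optional[bool]:
    """Return True if mypy rejects the canary, None if mypy is not installed."""
    if importlib.util.find_spec("mypy") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        canary = Path(tmp) / "canary.py"
        canary.write_text(CANARY_SOURCE)
        # Run from the temporary directory so no project config is picked up.
        completed = subprocess.run(
            [sys.executable, "-m", "mypy", canary.name],
            cwd=tmp,
            capture_output=True,
            text=True,
        )
    # mypy exits with a non-zero status when it reports errors.
    return completed.returncode != 0


if __name__ == "__main__":
    print(mypy_catches_canary())
```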

Bake it into a script

Now that you know precisely how you want to call Mypy, create a script called run-mypy that captures the arguments you want to use. For example, in my case, I have a namespace package in the src/acme directory, and my script ended up looking like this:

#!/usr/bin/env bash

set -o errexit

# Change directory to the project root directory.
cd "$(dirname "$0")"

# Because I'm using namespace packages,
# I have used --package acme rather than using 
# the path 'src/acme', which would correctly
# collect my files but erroneously add 
# 'src/acme' to the Mypy search path.
# We only want 'src' in the path so that Mypy
# knows our modules by their fully qualified names.
mypy --package acme --namespace-packages

I also had to add a mypy_path in mypy.ini:

[mypy]
mypy_path=./src

Step 2: Creating our own Pre-commit hook

Now that we know how to run Mypy for our project, we can think about running it in Pre-commit. First, a brief primer on how Pre-commit works so that we can consider what might go wrong.

How Pre-commit runs hooks

Pre-commit installs each Python hook in a separate virtualenv. Before each commit, the list of staged files is passed to that hook. Any unstaged changes are stashed and only restored after all hooks have run.

Problem: Only running on changed files

With Mypy, we probably don’t want to pass it just the list of changed files:

It will miss type errors that result from the staged changes but occur elsewhere. For example, if you have changed the definition of a function but not a usage of that function in another file, then that usage may now be invalid but won’t be checked.
As mentioned above, you may need more control over how Mypy is invoked anyway.
Running on all files is cheaper than it sounds. Mypy uses an incremental mode by default: it stores calculated type information, so re-running on all files after only a few changes doesn’t take as long. For faster incremental runs, consider using a long-running Mypy daemon.
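
Here is a small illustration of the first point. The two hypothetical files, pricing.py and checkout.py, are shown in one listing so that it runs as-is. Only pricing.py was edited, so only pricing.py would be in the list of staged files.

```python
from decimal import Decimal


# --- pricing.py (edited and staged in this commit) ---
# The parameter used to be annotated as 'float'; it is now 'Decimal'.
def format_price(amount: Decimal) -> str:
    return f"{amount:.2f}"


# --- checkout.py (untouched, so not in the list of staged files) ---
def receipt_line() -> str:
    # Still passes a float. mypy reports an incompatible argument type
    # here, but only if this file is among the files it checks.
    return format_price(9.99)


if __name__ == "__main__":
    print(receipt_line())
```

The code still runs, which is exactly why the mistake is easy to miss without a type check covering both files.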

We’ll solve this by using our own run-mypy script and ignoring the file list that Pre-commit passes to it.
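
My script is written in Bash, but the idea of ignoring the file list is language-independent. As a sketch, a hypothetical Python entry point could do the same thing. The command is copied from my run-mypy script, and the function name is made up for illustration.

```python
import sys
from typing import List, Sequence

# Assumed invocation, copied from the run-mypy script in this post.
MYPY_COMMAND = ["mypy", "--package", "acme", "--namespace-packages"]


def build_command(staged_files: Sequence[str]) -> List[str]:
    """Return the mypy command, deliberately ignoring the staged files."""
    # Pre-commit appends the staged file paths as arguments.
    # We discard them so that the whole package is always checked.
    del staged_files
    return list(MYPY_COMMAND)


if __name__ == "__main__":
    # A real hook would pass this to subprocess.run and exit with its status.
    print(" ".join(build_command(sys.argv[1:])))
```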

Problem: Running in an isolated virtualenv

Mypy running in a separate virtualenv is also problematic, since it won’t have access to all the dependencies installed in your main development environment. This means it can’t type check usages of those dependencies. We’ll solve this in Step 3.
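
One rough way to see the problem is to ask the interpreter in the hook environment which of your dependencies it can find. This is only a proxy, since Mypy does its own module lookup, and the module names you pass in are whatever your project depends on.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that this interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for some dotted or malformed names; treat as missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # 'json' is in the standard library; the second name is made up.
    print(missing_modules(["json", "no_such_module_for_mypy_demo"]))
```

Running this with the interpreter from the hook's virtualenv and again with the one from your dev environment should show the gap.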

Setting up the hook

We can solve both of these problems with a properly configured hook, which we’ll set up ourselves. To get started, create a new repository-local hook by adding the following to your .pre-commit-config.yaml:

# .pre-commit-config.yaml

repos:
- repo: local
  hooks:
    - id: mypy
      name: mypy
      entry: "./run-mypy"
      language: python
      # use your preferred Python version
      language_version: python3.7
      additional_dependencies: ["mypy==0.790"]
      # trigger for commits changing Python files
      types: [python]
      # use require_serial so that script
      # is only called once per commit
      require_serial: true
      # print the number of files as a sanity-check
      verbose: true

Step 3: Giving the Mypy hook access to dependencies

Mypy needs an environment where the dependencies are installed so that it can check for type errors in their usage. Here are a few options for doing that, with differing levels of convenience and speed:

Option 1: Use `language: system` to run Mypy in an existing environment

Replace language: python in your hook definition with language: system. Remove the additional_dependencies line and install Mypy into your environment directly. Now, Pre-commit will not create a separate virtualenv for the hook and will run it in whatever environment you happen to be in when you run git commit or pre-commit run. This means you always run Mypy directly in your dev environment, but it breaks if any of the developers on the project want to trigger Pre-commit from outside the dev environment. For example, this won’t work with a GUI Git client, as the correct virtualenv probably won’t be activated.
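
If you go this way, it may be worth failing loudly when the hook is triggered outside a virtual environment. The check below is a sketch of one common heuristic. It assumes your dev environment is a venv or virtualenv, and it will not recognise other setups such as Conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix instead of base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        print("Warning: no virtualenv is active; mypy may not see dependencies.")
```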

Option 2: Point Mypy to a specific environment with `--python-executable`

If it’s possible to automatically figure out the path to the appropriate Python interpreter (the one associated with the existing installation of your dependencies, which may or may not be in a virtual environment), then you can point Mypy to that path using the --python-executable option on mypy.
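
As a sketch of what "automatically figure out the path" might look like, the snippet below assumes the project keeps its virtualenv in a .venv directory at the project root. That location is purely an assumption for illustration, and both function names are hypothetical. The rest of the command is the one from my run-mypy script.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_project_python(project_root: Path) -> Optional[Path]:
    """Look for an interpreter in a '.venv' directory (an assumed location)."""
    if os.name == "nt":
        relative = Path("Scripts") / "python.exe"
    else:
        relative = Path("bin") / "python"
    candidate = project_root / ".venv" / relative
    return candidate if candidate.exists() else None


def mypy_command_for(project_root: Path) -> List[str]:
    """Build the mypy command, adding --python-executable when one is found."""
    command = ["mypy", "--package", "acme", "--namespace-packages"]
    python = find_project_python(project_root)
    if python is not None:
        command += ["--python-executable", str(python)]
    return command


if __name__ == "__main__":
    print(" ".join(mypy_command_for(Path.cwd())))
```

When no interpreter is found, the sketch falls back to the plain command. Depending on how strict you want to be, raising an error there may be the safer choice.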

Option 3: Install specific dependencies with the `additional_dependencies` hook option

If you only care about type-checking the usages of a few third-party modules, then you can install those specific modules into the hook environment like so:

# .pre-commit-config.yaml

repos:
- repo: local
  hooks:
    - id: mypy
      name: mypy
      entry: "./run-mypy"
      language: python
      # Replace with appropriate version
      language_version: python3.7
      # install Mypy, and the dependencies
      additional_dependencies:
        - "mypy==0.790"
        - "structlog==20.1.0"
      types: [python]
      # use require_serial so that script
      # is only called once per commit
      require_serial: true
      # Print the number of files as sanity-check
      verbose: true

This is relatively fast, as Pre-commit remembers the dependencies it installed in the environment [source code]. The downside is that it means duplicating your list of dependencies (at least those that have type stubs).

The additional_dependencies are just sent directly to Pip [source code] so you can happily add, for example, a --index-url argument in this array. Just be aware that Pre-commit will only re-run Pip when the list of additional_dependencies changes, so don’t expect to put “requirements.txt” in this array and have it figure out when you’ve changed that file.

Option 4: Running a full `pip install` in the hook

This is not fast. The speed we mostly care about is that of running the hook on each commit, not its initial setup. However, running pip install takes many seconds even when the dependencies are already installed. It is, however, a pretty reliable and easy way to make sure your dependencies are installed if the performance hit is acceptable to you and your team.

To do this, simply add the appropriate pip command into your run-mypy script. For example:

#!/usr/bin/env bash

set -o errexit

# Change directory to the project root directory.
cd "$(dirname "$0")"

# Install the dependencies into the mypy environment.
# Note that this can take seconds to run.
pip install --editable . --no-input --quiet

mypy --package acme --namespace-packages

This is the option I have gone for so far, since it minimises the likelihood of us making mistakes without imposing many restrictions on local project setup. For example, teammates can develop in whatever Python environment they like.

Bonus step: Running in CI

If you use Pre-commit locally it’s often a good idea to run pre-commit run -a in your CI pipeline. The setup I gave at the top of this post works fine in CI too. However, if you’d rather have a different Mypy setup in CI than locally, you can run Pre-commit with a SKIP environment variable in CI to skip the Mypy hook, and then run Mypy however you want in a separate CI job: SKIP=mypy pre-commit run -a. See Pre-commit - Temporarily disabling hooks.

Summary

We have seen many potential issues of running Mypy in Pre-commit:

Changing one file may cause a type error in another file, so we need to run Mypy on all files, not just those that have changed
We need to give Mypy access to the installed dependencies of the code it is type-checking, otherwise it can’t check the usages of those dependencies
Flags that reduce the strictness of Mypy like --ignore-missing-imports can give us false confidence

We saw how to address these issues by making our own custom hook. There doesn’t appear to be a neat, one-size-fits-all solution, so it’s worth giving some thought to this setup for each project.