itsdrike.com/content/posts/python-type-checking.md

450 lines
20 KiB
Markdown
Raw Permalink Normal View History

---
title: A guide to type checking in python
date: 2024-10-04
tags: [programming, python, typing]
sources:
- https://dev.to/decorator_factory/type-hints-in-python-tutorial-3pel
- https://docs.basedpyright.com/#/type-concepts
- https://mypy.readthedocs.io/en/stable/
- https://typing.readthedocs.io/en/latest/spec/special-types.html
---
Python is often known for its dynamic typing, which can be a drawback for those who prefer static typing due to its
benefits in catching bugs early and enhancing editor support. However, what many people don't know is that Python does
actually support specifying the types and it is even possible to enforce these types and work in a statically
type-checked Python environment. This article is an introduction to using Python in this way.
## Regular python
In regular python, you might end up writing a function like this:
```python
def add(x, y):
return x + y
```
In this code, you have no idea what the type of `x` and `y` arguments should be. So, even though you may have intended
for this function to only work with numbers (ints), it's actually entirely possible to use it with something else. For
example, running `add("hello", "world)` will return `"helloworld"` because the `+` operator works on strings too.
The point is, there's nothing telling you what the type of these parameters should be, and that could lead to
misunderstandings. Even though in some cases, you can judge what the type of these variables should be just based on
the name of that function, in most cases, it's not that easy to figure out and often requires looking through docs, or
just going over the code of that function.
Annoyingly, python doesn't even prevent you from passing in types that are definitely incorrect, like: `add(1, "hi")`.
Running this would cause a `TypeError`, but unless you have unit-tests that actually run that code, you won't find out
about this bug until it actually causes an issue and at that point, it might already be too late, since your code has
crashed a production app.
Clearly then, this isn't ideal.
## Type-hints
While python doesn't require it, it does have support for specifying "hints" that indicate what type should a given
variable have. So, when we take a look at the function above, adding type-hints to it would look like this:
```python
def add(x: int, y: int) -> int:
return x + y
```
We've now made the types very explicit to the programmer, which means they'll no longer need to spend a bunch of time
looking through the implementation of that function, or going through the documentation just to know how to use this
function. Instead, the type hints will tell just you.
This is incredibly useful, because most editors will be able to pick up these type hints, and show them to you while
calling the function, so you know what to pass right away, without even having to look at the function definition where
the type-hints are defined.
Not only that, specifying a type-hint will greatly improve the development experience in your editor / IDE, because
you'll get much better auto-completion. The thing is, if you have a parameter like `x`, but your editor doesn't know
what type it should have, it can't really help you if you start typing `x.remove`, looking for the `removeprefix`
function. However, if you tell your editor that `x` is a string (`x: str`), it will now be able to go through all of
the methods that strings have, and show you those that start with `remove` (being `removeprefix` and `removesuffix`).
This makes type-hints great at saving you time while developing, even though you have to do some additional work when
specifying them.
## Run-time behavior
Even though type-hints are a part of the Python language, the Python interpreter doesn't actually care about them. That
means that there isn't any optimizations or checking performed when you're running your code, so even with type hints
specified, they will not be enforced! This means that you can actually just choose to ignore them, and call the
function with incorrect types, like: `add(1, "hi")` without it causing any immediate runtime errors.
Most editors are configured very loosely when it comes to type-hints. That means they will show you these hints when
you're working with the function, but they won't produce warnings. That's why they're called "type hints", they're only
hints that can help you out, but they aren't actually enforced.
## Static type checking tools
Even though python on it's own indeed doesn't enforce the type-hints you specify, there are tools that can run static
checks against your code to check for type correctness.
{{< notice tip >}}
A static check is a check that works with your code in it's textual form. It will read the contents of your python
files without actually running that file and analyze it purely based on that text content.
{{< /notice >}}
Using these tools will allow you to analyze your code for typing mistakes before you ever even run your program. That
means having a function call like `add(1, "hi")` anywhere in your code would be detected and reported as an issue. This
is very similar to running a linter like [`flake8`](https://flake8.pycqa.org/en/latest/) or
[`ruff`](https://docs.astral.sh/ruff/).
Since running the type-checker manually could be quite annoying, so most of them have integrations with editors / IDEs,
which will allow you to see these errors immediately as you code. This makes it much easier to immediately notice any
type inconsistencies, which can help you catch or avoid a whole bunch of bugs.
### Most commonly used type checkers
- [**Pyright**](https://github.com/microsoft/pyright): Known for its speed and powerful features, it's written in
TypeScript and maintained by Microsoft.
- [**MyPy**](https://mypy.readthedocs.io/en/stable/): The most widely used type-checker, developed by the official
Python community. It's well integrated with most IDEs and tools, but it's known to be slow to adapt new features.
- [**PyType**](https://google.github.io/pytype/): Focuses on automatic type inference, making it suitable for codebases
with minimal type annotations.
- [**BasedPyright**](https://docs.basedpyright.com/): A fork of pyright with some additional features and enhancements,
my personal preference.
## When to use type hints?
Like you saw before with the `add` function, you can specify type-hints on functions, which allows you to describe what
types can be passed as parameters of that function alongside with specifying a return-type:
```python
def add(x: int, y: int) -> int:
...
```
You can also add type-hints directly to variables:
```python
my_variable: str = "hello"
```
That said, doing this is usually not necessary, since most type-checkers can "infer" what the type of `my_variable`
should be, based on the value it's set to have. However, in some cases, it can be worth adding the annotation, as the
inference might not be sufficient. Let's consider the following example:
```python
my_list = []
```
In here, a type-checker can infer that this is a `list`, but they can't recognize what kind of elements will this list
contain. That makes it worth it to specify a more specific type:
```python
my_list: list[int] = []
```
Now the type-checker will recognize that the elements inside of this list will be integers.
## Special types
While in most cases, it's fairly easy to annotate something with the usual types, like `int`, `str`, `list`, `set`, ...
in some cases, you might need some special types to represent certain types.
### None
This isn't very special at all, but it may be surprising for beginners at first. You've probably seen the `None` type
in python before, but what you may not realize is that if you don't add any return statements into your function, it
will automatically return a `None` value. That means if your function doesn't return anything, you should annotate it
as returning `None`:
```python
def my_func() -> None:
print("I'm a simple function, I just print something, but I don't explicitly return anything")
x = my_func()
assert x is None
```
### Union
A union type is a way to specify that a type can be one of multiple specified types, allowing flexibility while still
enforcing type safety.
There are multiple ways to specify a Union type. In modern versions of python (3.10+), you can do it like so:
```python
x: int | str = "string"
```
If you need to support older python versions, you can also using `typing.Union`, like so:
```python
from typing import Union
x: Union[int, str] = "string"
```
As an example this function takes a value that can be of various types, and parses it into a bool:
```python
def parse_bool_setting(value: str | int | bool) -> bool:
if isinstance(value, bool):
return value
if isinstance(value, int):
if value == 0:
return False
if value == 1:
return True
raise ValueError(f"Value {value} can't be converted to boolean")
# value can only be str now
if value.lower() in {"yes", "1", "true"}:
return True
if value.lower() in {"no", "0", "false"}:
return False
raise ValueError(f"Value {value} can't be converted to boolean")
```
One cool thing to notice here is that after the `isinstance` check, the type-checker will narrow down the type, so that
when inside of the block, it knows what type `value` has, but also outside of the block, the type-checker can narrow
the entire union and remove one of the variants since it was already handled. That's why at the end, we didn't need the
last `isinstance` check, the type checker knew the value was a string, because all the other options were already
handled.
### Any
In some cases, you might want to specify that your function can take in any type. This can be useful when annotating a
specific type could be way too complex / impossible, or you're working with something dynamic where you just don't care
about the typing information.
```python
from typing import Any
def foo(x: Any) -> None:
# a type checker won't warn you about accessing unknown attributes on Any types,
# it will just blindly allow anything
print(x.foobar)
```
{{< notice warning >}}
Don't over-use `Any` though, in vast majority of cases, it is not the right choice. I will touch more on it in the
section below, on using the `object` type.
{{< /notice >}}
The most appropriate use for the `Any` type is when you're returning some dynamic value from a function, where the
developer can confidently know what the type will be, but which is impossible for the type-checker to figure out,
because of the dynamic nature. For example:
```python
from typing import Any
global_state = {}
def get_state_variable(name: str) -> Any:
return global_state[name]
global_state["name"] = "Ian"
global_state["surname"] = "McKellen"
global_state["age"] = 85
###
# Notice that we specified the annotation here manually, so that the type-checker will know
# what type we're working with. But we only know this type because we know what we stored in
# our dynamic state, so the function itself can't know what type to give us
full_name: str = get_state_variable("name") + " " + get_state_variable("surname")
```
### object
In many cases where you don't care about what type is passed in, people mistakenly use `typing.Any` when they should
use `object` instead. Object is a class that every other class subclasses. That means every value is an `object`.
The difference between doing `x: object` and `x: Any` is that with `Any`, the type-checker will essentially avoid
performing any checks whatsoever. That will mean that you can do whatever you want with such a variable, like access a
parameter that might not exist (`y = x.foobar`) and since the type-checker doesn't know about it, `y` will now also be
considered as `Any`. With `object`, even though you can still assign any value to such a variable, the type checker
will now only allow you to access attributes that are shared to all objects in python. That way, you can make sure that
you don't do something that not all types support, when your function is expected to work with all types.
For example:
```python
def do_stuff(x: object) -> None:
print(f"The do_stuff function is now working with: {x}")
if isinstance(x, str):
# We can still narrow the type down to a more specific type, now the type-checker
# knows `x` is a string, and we can do some more things, that strings support, like:
print(x.removeprefix("hello"))
if x > 5: # A type-checker will mark this as an error, because not all types support comparison against ints
print("It's bigger than 5")
```
### Collection types
Python also provides some types to represent various collections. We've already seen the built-in `list` collection
type before. Another such built-in collection types are `tuple`, `set`, `forzenset` and `dict`. All of these types are
what we call "generic", which means that we can specify an internal type, which in this case represents the items that
these collections can hold, like `list[int]`.
Here's a quick example of using these generic collection types:
```python
def print_items(lst: list[str]) -> None:
for index, item in enumerate(lst):
# The type-checker knows `item` variable is a string now
print(f"-> Item #{index}: {item.strip()}")
print_items([1, 2, 3])
```
That said, in many cases, instead of using these specific collection types, you can use a less specific collection, so
that your function will work with multiple kinds of collections. Python has abstract classes for general collections
inside of the `collections.abc` module. One example would be the `Sequence` type:
```python
from collections.abc import Sequence
def print_items2(lst: Sequence[str]) -> None:
for index, item in enumerate(lst):
# The type-checker knows `item` variable is a string now
print(f"Item #{index}: {item.strip()}")
print_items([1, 2, 3]) # fine
print_items((1, 2, 3)) # nope
print_items2([1, 2, 3]) # works
print_items2((1, 2, 3)) # works
print_items2({1, 2, 3}) # works
```
You may think that you could also just use a union like: `list[str] | set[str] | tuple[str, ...]`, however that still
wouldn't quite cover everything, since people can actually make their own custom classes that have `__getitem__` and
work like a sequence, yet doesn't inherit from `list` or any of the other built-in types. By specifying
`collections.abc.Sequence` type-hint, even these custom classes that behave like sequences will work with your function.
There are various other collections classes like these and it would take pretty long to explain them all here, so you
should do some research on them on your own to know what's available.
{{< notice warning >}}
It is important to note that the built-in collection types like `list` weren't subscriptable in earlier versions of
python (before 3.9). If you still need to maintain compatibility with such older python versions, you can instead use
`typing.List`, `typing.Tuple`, `typing.Set` and `typing.Dict`. These types will support being subscripted even in those
older versions.
Similarly, this also applies to the `collections.abc` abstract types, like `Sequence`, which also wasn't subscriptable
in these python versions. These also have alternatives in `typing` module: `typing.Sequence`, `typing.Mapping`,
`typing.MutableSequence`, `typing.Iterable`, ...
{{< /notice >}}
#### Tuple type
Python tuples are a bit more complicated than the other collection types, since we can specify which type is at which
position of the tuple. For example: `tuple[int, str, float]` will represent a tuple like: `(1, "hi", 5.3)`. The tricky
thing here is that specifying `tuple[int]` will not mean a tuple of integers, it will mean a tuple with a single
integer: `(1, )`. If you do need to specify a tuple with any amount of items of the same type, what you actually need
to do is: `tuple[int, ...]`. This annotation will work for `(1, )` or `(1, 1, 1)` or `(1, 1, 1, 1, 1)`.
The reason for this is that we often use tuples to allow returning multiple values from a function. Yet these values
usually don't have the same type, so it's very useful to be able to specify these types individually:
```python
def some_func() -> tuple[int, str]:
return 1, "hello"
```
That said, a tuple can also be useful as a sequence type, with the major difference between it and a list being that
tuples are immutable. This can make them more appropriate for storing certain sequences than lists.
## Type casts
Casting is a way to explicitly specify the type of a variable, overriding the type inferred by the type-checker.
This can be very useful, as sometimes, we programmers have more information than the type-checker does, especially when
it comes to some dynamic logic that is hard to statically evaluate. The type checker's inference may end up being too
broad or sometimes even incorrect.
For example:
```python
from typing import cast
my_list: list[str | int] = []
my_list.append("Foo")
my_list.append(10)
my_list.append("Bar")
# We know that the first item in the list is a string
# the type-checker would otherwise infer `x: str | int`
x = cast(str, my_list[0])
```
Another example:
```python
from typing import cast
def foo(obj: object, type_name: str) -> None:
if type_name == "int":
obj = cast(int, obj)
... # some logic
elif type_name == "str":
obj = cast(str, obj)
... # some logic
else:
raise ValueError(f"Unknown type name: {type_name}")
```
{{< notice warning >}}
It is important to mention that unlike the casts in languages like Java or C#, in Python, type casts do not perform any
runtime checks to ensure that the variable really is what we claim it to be. Casts are only used as a hint to the
type-checker, and on runtime, the `cast` function just returns the value back without any extra logic.
If you do wish to also perform a runtime check, you can use assertions to narrow the type:
```python
def foo(obj: object) -> None:
print(obj + 1) # can't add 'object' and 'int'
assert isinstance(obj, int)
print(obj + 1) # works
```
Alternatively, you can just check with if statements:
```python
def foo(obj: object) -> None:
print(obj + 1) # can't add 'object' and 'int'
if not isinstance(obj, int):
raise TypeError("Expected int")
print(obj + 1) # works
```
{{< /notice >}}
## Closing notes
In summary, Pythons type hints are a powerful tool for improving code clarity, reliability, and development
experience. By adding type annotations to your functions and variables, you provide valuable information to both your
IDE and fellow developers, helping to catch potential bugs early and facilitating easier code maintenance.
Type hints offer significant benefits:
- Enhanced Readability: Clearly specifies the expected types of function parameters and return values, making the code
more self-documenting.
- Improved Development Experience: Provides better auto-completion and in-editor type checking, helping you avoid
errors and speeding up development.
- Early Error Detection: Static type checkers can catch type-related issues before runtime, reducing the risk of bugs
making it into production.
For further exploration of Pythons type hints and their applications, you can refer to additional resources such as:
- The [Type Hinting Cheat Sheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html) from mypy for a quick
reference on various type hints and their usage.
- My other articles on more advanced typing topics like [TypeVars]({{< ref "posts/type-vars" >}}) and [Generics]({{< ref
"posts/generics-and-variance" >}}) for deeper insights into Python's typing system.
Embracing type hints can elevate your Python programming experience, making your code more robust and maintainable in
the long run.