---
title: A guide to type checking in python
date: 2024-10-04
tags: [programming, python, typing]
sources:
  - https://dev.to/decorator_factory/type-hints-in-python-tutorial-3pel
  - https://docs.basedpyright.com/#/type-concepts
  - https://mypy.readthedocs.io/en/stable/
  - https://typing.readthedocs.io/en/latest/spec/special-types.html
---

Python is well known for its dynamic typing, which can be a drawback for those who prefer static typing and its
benefits: catching bugs early and better editor support. What many people don't know is that Python does actually
support specifying types, and that it's even possible to enforce them and work in a statically type-checked Python
environment. This article is an introduction to using Python in this way.

## Regular python

In regular python, you might end up writing a function like this:

```python
def add(x, y):
    return x + y
```

In this code, nothing tells you what the types of the `x` and `y` arguments should be. So, even though you may have
intended for this function to only work with numbers (ints), it's entirely possible to use it with something else. For
example, running `add("hello", "world")` will return `"helloworld"`, because the `+` operator works on strings too.

The point is, there's nothing telling you what the type of these parameters should be, and that can lead to
misunderstandings. In some cases, you can guess the types from the name of the function, but in most cases it's not
that easy, and figuring it out requires looking through the docs or reading the code of that function.

Annoyingly, python doesn't even prevent you from passing in types that are definitely incorrect, like `add(1, "hi")`.
Running this would cause a `TypeError`, but unless you have unit tests that actually run that code, you won't find out
about the bug until it causes an issue. At that point it might already be too late, since your code has crashed a
production app.

Clearly, this isn't ideal.

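
To make both failure modes concrete, here is a minimal sketch using the untyped `add` function from this section. The
`try` / `except` wrapper is only there so the script keeps running after the error:

```python
def add(x, y):
    return x + y


# Works, even though the function was meant for numbers
print(add("hello", "world"))  # helloworld

# Only fails once this line actually runs
try:
    add(1, "hi")
except TypeError as exc:
    print(f"TypeError: {exc}")
```
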
## Type-hints

While python doesn't require it, it does support specifying "hints" that indicate what type a given variable should
have. Adding type-hints to the function above would look like this:

```python
def add(x: int, y: int) -> int:
    return x + y
```

We've now made the types explicit to the programmer, which means they'll no longer need to spend time looking through
the implementation of the function or its documentation just to know how to use it. Instead, the type hints will just
tell you.

This is incredibly useful, because most editors will pick up these type hints and show them to you while you're
calling the function, so you know what to pass right away, without having to look at the function definition where
the type-hints are defined.

Not only that, specifying type-hints will greatly improve the development experience in your editor / IDE, because
you'll get much better auto-completion. If you have a parameter like `x`, but your editor doesn't know what type it
should have, it can't really help you when you start typing `x.remove`, looking for the `removeprefix` method.
However, if you tell your editor that `x` is a string (`x: str`), it will be able to go through all of the methods
that strings have, and show you those that start with `remove` (`removeprefix` and `removesuffix`).

This makes type-hints great at saving you time while developing, even though you have to do some additional work to
specify them.

## Run-time behavior

Even though type-hints are a part of the Python language, the Python interpreter doesn't actually care about them. That
means there aren't any optimizations or checks performed based on them when you're running your code, so even with type
hints specified, they will not be enforced! You can choose to ignore them and call the function with incorrect types,
like `add("hello", "world")`, and Python will run it without complaint. (A call like `add(1, "hi")` still raises a
`TypeError`, but only because `+` can't add an int and a string, not because of the hints.)

Most editors are configured very loosely when it comes to type-hints. They will show you these hints when you're
working with the function, but they won't produce warnings. That's why they're called "type hints": they're only hints
that can help you out, they aren't actually enforced.

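
As a small sketch of this run-time behavior, the snippet below reuses the annotated `add` function. The hints are
stored on the function object (in `__annotations__`), but nothing checks them when the function is called:

```python
def add(x: int, y: int) -> int:
    return x + y


# The hints are kept as plain metadata on the function object
print(add.__annotations__)

# Strings violate the hints, yet the call succeeds, because `+` also works on strings
print(add("hello", "world"))  # helloworld
```
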
## Static type checking tools

Even though python on its own doesn't enforce the type-hints you specify, there are tools that can run static checks
against your code to verify type correctness.

{{< notice tip >}}
A static check is a check that works with your code in its textual form. It reads the contents of your python files
without actually running them, and analyzes the code purely based on that text.
{{< /notice >}}

Using these tools allows you to analyze your code for typing mistakes before you ever run your program. That means a
function call like `add(1, "hi")` anywhere in your code would be detected and reported as an issue. This is very
similar to running a linter like [`flake8`](https://flake8.pycqa.org/en/latest/) or
[`ruff`](https://docs.astral.sh/ruff/).

Since running the type-checker manually could be quite annoying, most of them have integrations with editors / IDEs,
which allow you to see these errors immediately as you code. This makes it much easier to notice type inconsistencies
right away, which can help you catch or avoid a whole bunch of bugs.

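
Here is a small sketch of the difference between what runs and what gets checked. It assumes the code is saved in a
file called `example.py` (a made-up name) and checked with something like `mypy example.py` or `pyright example.py`.
The `typing.TYPE_CHECKING` constant is `False` at run time, so the incorrect call never executes, but type checkers
still analyze that branch and should report the `str` argument:

```python
from typing import TYPE_CHECKING


def add(x: int, y: int) -> int:
    return x + y


print(add(1, 2))  # passes the type check and runs fine

if TYPE_CHECKING:
    # Never executed at run time, but still analyzed by the type-checker,
    # which should flag the second argument as incompatible with `int`
    add(1, "hi")
```
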
### Most commonly used type checkers

- [**Pyright**](https://github.com/microsoft/pyright): Known for its speed and powerful features, it's written in
  TypeScript and maintained by Microsoft.
- [**MyPy**](https://mypy.readthedocs.io/en/stable/): The most widely used type-checker, developed by the official
  Python community. It's well integrated with most IDEs and tools, but it's known to be slow to adopt new features.
- [**PyType**](https://google.github.io/pytype/): Focuses on automatic type inference, making it suitable for codebases
  with minimal type annotations.
- [**BasedPyright**](https://docs.basedpyright.com/): A fork of pyright with some additional features and enhancements,
  my personal preference.

## When to use type hints?

As you saw before with the `add` function, you can specify type-hints on functions, which lets you describe what types
can be passed as parameters of that function, along with its return type:

```python
def add(x: int, y: int) -> int:
    ...
```

You can also add type-hints directly to variables:

```python
my_variable: str = "hello"
```

That said, doing this is usually not necessary, since most type-checkers can "infer" what the type of `my_variable`
should be, based on the value it's set to. However, in some cases, it can be worth adding the annotation, as the
inference might not be sufficient. Consider the following example:

```python
my_list = []
```

Here, a type-checker can infer that this is a `list`, but it can't tell what kind of elements this list will contain.
That makes it worth specifying a more precise type:

```python
my_list: list[int] = []
```

Now the type-checker will recognize that the elements inside of this list will be integers.

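
The same reasoning applies to other containers that start out empty. The sketch below uses made-up variable names
(`word_counts`, `seen_ids`) and assumes Python 3.9+, where the built-in collection types can be subscripted:

```python
# Empty containers give the type-checker nothing to infer the element types from,
# so these annotations carry information it could not work out on its own
word_counts: dict[str, int] = {}
seen_ids: set[int] = set()

for word in ["spam", "eggs", "spam"]:
    word_counts[word] = word_counts.get(word, 0) + 1
seen_ids.add(42)

print(word_counts)  # {'spam': 2, 'eggs': 1}
```
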
## Special types

While in most cases it's fairly easy to annotate something with the usual types, like `int`, `str`, `list`, `set`, ...,
sometimes you'll need special types to express what you mean.

### None

This isn't very special at all, but it may be surprising for beginners. You've probably seen `None` in python before,
but what you may not realize is that if your function doesn't contain any return statement, it will automatically
return `None`. That means if your function doesn't return anything, you should annotate it as returning `None`:

```python
def my_func() -> None:
    print("I'm a simple function, I just print something, but I don't explicitly return anything")


x = my_func()
assert x is None
```

### Union

A union type is a way to specify that a value can be one of multiple specified types, allowing flexibility while still
enforcing type safety.

There are multiple ways to specify a Union type. In modern versions of python (3.10+), you can do it like so:

```python
x: int | str = "string"
```

If you need to support older python versions, you can also use `typing.Union`, like so:

```python
from typing import Union

x: Union[int, str] = "string"
```

As an example, this function takes a value that can be of various types, and parses it into a bool:

```python
def parse_bool_setting(value: str | int | bool) -> bool:
    if isinstance(value, bool):
        return value

    if isinstance(value, int):
        if value == 0:
            return False
        if value == 1:
            return True
        raise ValueError(f"Value {value} can't be converted to boolean")

    # value can only be str now
    if value.lower() in {"yes", "1", "true"}:
        return True
    if value.lower() in {"no", "0", "false"}:
        return False
    raise ValueError(f"Value {value} can't be converted to boolean")
```

One cool thing to notice here is that after the `isinstance` check, the type-checker narrows the type: inside the
block, it knows exactly what type `value` has, and after the block, it can remove that variant from the union, since it
was already handled (every path through the block returns or raises). That's why we didn't need a final `isinstance`
check at the end: the type checker knew the value was a string, because all the other options were already handled.

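
The same narrowing works for the very common "value or `None`" union. Below is a minimal sketch with a made-up `shout`
function. The `__future__` import is an addition of mine, so that the `|` syntax in annotations also works on Python
versions older than 3.10:

```python
from __future__ import annotations  # keeps `str | None` annotations working before Python 3.10


def shout(text: str | None) -> str:
    if text is None:
        # Inside this branch the type-checker treats `text` as None
        return "(nothing to say)"

    # The None variant was handled above, so `text` is narrowed to `str` here
    return text.upper() + "!"


print(shout("hello"))  # HELLO!
print(shout(None))  # (nothing to say)
```
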
### Any

In some cases, you might want to specify that your function can take in any type. This can be useful when annotating a
specific type would be way too complex or impossible, or when you're working with something dynamic where you just
don't care about the typing information.

```python
from typing import Any

def foo(x: Any) -> None:
    # a type checker won't warn you about accessing unknown attributes on Any types,
    # it will just blindly allow anything
    print(x.foobar)
```

{{< notice warning >}}
Don't over-use `Any` though, in the vast majority of cases it is not the right choice. I will touch more on it in the
section below, on using the `object` type.
{{< /notice >}}

The most appropriate use for the `Any` type is when you're returning some dynamic value from a function, where the
developer can confidently know what the type will be, but the type-checker can't figure it out because of the dynamic
nature of the code. For example:

```python
from typing import Any

global_state: dict[str, Any] = {}

def get_state_variable(name: str) -> Any:
    return global_state[name]


global_state["name"] = "Ian"
global_state["surname"] = "McKellen"
global_state["age"] = 85


###


# Notice that we specified the annotation here manually, so that the type-checker will know
# what type we're working with. But we only know this type because we know what we stored in
# our dynamic state, so the function itself can't know what type to give us
full_name: str = get_state_variable("name") + " " + get_state_variable("surname")
```

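
The flip side is that a wrong annotation on an `Any` value goes unnoticed until the code runs. Here is a sketch
reusing the `get_state_variable` idea from the example above (the `age_text` variable is made up for illustration):

```python
from typing import Any

global_state: dict[str, Any] = {"name": "Ian", "age": 85}


def get_state_variable(name: str) -> Any:
    return global_state[name]


# The annotation claims `str`, and a type-checker accepts it because the function
# returns `Any`, but the stored value is really an int
age_text: str = get_state_variable("age")

try:
    print(age_text.upper())  # a type-checker sees nothing wrong with this line
except AttributeError as exc:
    print(f"Failed at run time: {exc}")
```
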
### object

In many cases where you don't care about what type is passed in, people mistakenly use `typing.Any` when they should
use `object` instead. `object` is a class that every other class subclasses, which means every value is an `object`.

The difference between `x: object` and `x: Any` is that with `Any`, the type-checker will essentially skip all checks.
You can do whatever you want with such a variable, like access an attribute that might not exist (`y = x.foobar`), and
since the type-checker doesn't know anything about it, `y` will also be considered `Any`. With `object`, you can still
assign any value to such a variable, but the type checker will only allow you to access attributes that are shared by
all objects in python. That way, you can make sure you don't do something that not all types support, when your
function is expected to work with all types.

For example:

```python
def do_stuff(x: object) -> None:
    print(f"The do_stuff function is now working with: {x}")

    if isinstance(x, str):
        # We can still narrow the type down to a more specific type, now the type-checker
        # knows `x` is a string, and we can do some more things, that strings support, like:
        print(x.removeprefix("hello"))

    if x > 5:  # A type-checker will mark this as an error, because not all types support comparison against ints
        print("It's bigger than 5")
```

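
As a sketch of how a function taking `object` can stay fully type-checked, the made-up `describe` function below
narrows the value step by step with `isinstance`, and only uses type-specific operations after the matching check:

```python
def describe(value: object) -> str:
    if isinstance(value, bool):
        # bool is checked before int, because bool is a subclass of int
        return f"a boolean: {value}"
    if isinstance(value, int):
        # Narrowed to int, so comparing against a number is accepted
        return "a big number" if value > 5 else "a small number"
    if isinstance(value, str):
        # Narrowed to str, so len() is accepted
        return f"a string of length {len(value)}"
    # Still just `object` here, so only operations every object supports are allowed
    return f"something else: {value!r}"


print(describe(10))  # a big number
print(describe("hi"))  # a string of length 2
print(describe(None))  # something else: None
```
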
### Collection types

Python also provides types to represent various collections. We've already seen the built-in `list` collection type.
Other built-in collection types are `tuple`, `set`, `frozenset` and `dict`. All of these types are what we call
"generic", which means we can specify an inner type, which in this case represents the items that these collections
can hold, like `list[int]`.

Here's a quick example of using these generic collection types:

```python
def print_items(lst: list[str]) -> None:
    for index, item in enumerate(lst):
        # The type-checker knows `item` variable is a string now
        print(f"-> Item #{index}: {item.strip()}")

print_items(["a", "b", "c"])
```

That said, in many cases, instead of using these specific collection types, you can use a less specific collection, so
that your function will work with multiple kinds of collections. Python has abstract classes for general collections
inside of the `collections.abc` module. One example is the `Sequence` type:

```python
from collections.abc import Sequence

def print_items2(lst: Sequence[str]) -> None:
    for index, item in enumerate(lst):
        # The type-checker knows `item` variable is a string now
        print(f"Item #{index}: {item.strip()}")

# The comments below describe what a type-checker reports, not what happens at run time
print_items(["a", "b", "c"])  # fine
print_items(("a", "b", "c"))  # nope, a tuple is not a list

print_items2(["a", "b", "c"])  # works
print_items2(("a", "b", "c"))  # works
print_items2({"a", "b", "c"})  # nope, a set is not a sequence (it has no order or indexing)
```

You may think that you could also just use a union like `list[str] | tuple[str, ...]`, however that still wouldn't
quite cover everything, since people can make their own custom classes that implement the sequence interface
(`__getitem__` and `__len__`) and work like a sequence, yet don't inherit from `list` or any of the other built-in
types. With a `collections.abc.Sequence` type-hint, custom classes that subclass `Sequence` will work with your
function too.

There are various other collection classes like these (for example `Iterable` or `Collection`, which also accept
sets), and it would take pretty long to explain them all here, so you should do some research on your own to know
what's available.

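
To illustrate the custom sequence case, here is a sketch of a made-up `Countdown` class that subclasses `Sequence`,
together with a hypothetical `total` function that accepts any sequence of ints. The `overload` declarations are only
there to describe indexing and slicing to the type-checker, and the `__future__` import is an addition so that the
`|` annotations work before Python 3.10:

```python
from __future__ import annotations  # keeps the `int | slice` annotation working before Python 3.10

from collections.abc import Sequence
from typing import overload


class Countdown(Sequence[int]):
    """Hypothetical custom sequence holding start, start - 1, ..., 1."""

    def __init__(self, start: int) -> None:
        self._values = tuple(range(start, 0, -1))

    @overload
    def __getitem__(self, index: int) -> int: ...
    @overload
    def __getitem__(self, index: slice) -> Sequence[int]: ...
    def __getitem__(self, index: int | slice) -> int | Sequence[int]:
        return self._values[index]

    def __len__(self) -> int:
        return len(self._values)


def total(values: Sequence[int]) -> int:
    # Accepts lists, tuples and any class that subclasses Sequence
    return sum(values)


print(total([1, 2, 3]))  # 6
print(total(Countdown(3)))  # 6
```
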
{{< notice warning >}}
It is important to note that the built-in collection types like `list` weren't subscriptable in earlier versions of
python (before 3.9). If you still need to maintain compatibility with such older python versions, you can instead use
`typing.List`, `typing.Tuple`, `typing.Set` and `typing.Dict`. These types support being subscripted even in those
older versions.

Similarly, this also applies to the `collections.abc` abstract types, like `Sequence`, which also weren't subscriptable
in these python versions. These also have alternatives in the `typing` module: `typing.Sequence`, `typing.Mapping`,
`typing.MutableSequence`, `typing.Iterable`, ...
{{< /notice >}}

#### Tuple type

Python tuples are a bit more complicated than the other collection types, since we can specify which type is at which
position of the tuple. For example, `tuple[int, str, float]` represents a tuple like `(1, "hi", 5.3)`. The tricky thing
here is that `tuple[int]` does not mean a tuple of integers, it means a tuple with a single integer: `(1,)`. If you do
need to specify a tuple with any number of items of the same type, what you actually need is `tuple[int, ...]`. This
annotation will work for `(1,)`, `(1, 1, 1)` or `(1, 1, 1, 1, 1)`.

The reason for this is that we often use tuples to return multiple values from a function. These values usually don't
have the same type, so it's very useful to be able to specify their types individually:

```python
def some_func() -> tuple[int, str]:
    return 1, "hello"
```

That said, a tuple can also be useful as a sequence type, with the major difference between it and a list being that
tuples are immutable. This can make them more appropriate than lists for storing certain sequences.

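
Both tuple forms can appear in the same signature. The sketch below uses a made-up `min_max` function that takes a
tuple of any length and returns a tuple of exactly two items (assuming Python 3.9+ for the `tuple[...]` syntax):

```python
def min_max(values: tuple[int, ...]) -> tuple[int, int]:
    # `values` may hold any number of ints, the result always holds exactly two
    return min(values), max(values)


lowest, highest = min_max((4, 8, 15, 16, 23, 42))
print(lowest, highest)  # 4 42
```
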
## Type casts

Casting is a way to explicitly specify the type of a value, overriding the type inferred by the type-checker.

This can be very useful, as sometimes we programmers have more information than the type-checker does, especially when
it comes to dynamic logic that is hard to evaluate statically. The type checker's inference may end up being too
broad, or sometimes even incorrect.

For example:

```python
from typing import cast

my_list: list[str | int] = []
my_list.append("Foo")
my_list.append(10)
my_list.append("Bar")

# We know that the first item in the list is a string
# the type-checker would otherwise infer `x: str | int`
x = cast(str, my_list[0])
```

Another example:

```python
from typing import cast

def foo(obj: object, type_name: str) -> None:
    if type_name == "int":
        obj = cast(int, obj)
        ...  # some logic
    elif type_name == "str":
        obj = cast(str, obj)
        ...  # some logic
    else:
        raise ValueError(f"Unknown type name: {type_name}")
```

{{< notice warning >}}
It is important to mention that unlike casts in languages like Java or C#, type casts in Python do not perform any
runtime checks to ensure that the value really is what we claim it to be. Casts are only a hint to the type-checker,
and at runtime, the `cast` function just returns the value back without any extra logic.

If you do wish to also perform a runtime check, you can use assertions to narrow the type:

```python
def foo(obj: object) -> None:
    print(obj + 1)  # type-checker error: can't add 'object' and 'int'
    assert isinstance(obj, int)
    print(obj + 1)  # works
```

Alternatively, you can just check with if statements:

```python
def foo(obj: object) -> None:
    print(obj + 1)  # type-checker error: can't add 'object' and 'int'
    if not isinstance(obj, int):
        raise TypeError("Expected int")
    print(obj + 1)  # works
```

{{< /notice >}}

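
To see for yourself that `cast` does nothing at run time, here is a short sketch with a deliberately wrong cast (the
variable names are made up for illustration):

```python
from typing import cast

value: object = "not a number"

# `cast` hands back the very same object: no conversion and no validation happens
number = cast(int, value)
print(number is value)  # True
print(type(number).__name__)  # str

# An `isinstance` check, by contrast, inspects the real run-time type
if isinstance(value, int):
    print(value + 1)
else:
    print(f"Refusing to treat {value!r} as an int")
```
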
## Closing notes

In summary, Python’s type hints are a powerful tool for improving code clarity, reliability, and development
experience. By adding type annotations to your functions and variables, you provide valuable information to both your
IDE and fellow developers, helping to catch potential bugs early and facilitating easier code maintenance.

Type hints offer significant benefits:

- Enhanced Readability: Clearly specifies the expected types of function parameters and return values, making the code
  more self-documenting.
- Improved Development Experience: Provides better auto-completion and in-editor type checking, helping you avoid
  errors and speeding up development.
- Early Error Detection: Static type checkers can catch type-related issues before runtime, reducing the risk of bugs
  making it into production.

For further exploration of Python’s type hints and their applications, you can refer to additional resources such as:

- The [Type Hinting Cheat Sheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html) from mypy for a quick
  reference on various type hints and their usage.
- My other articles on more advanced typing topics like [TypeVars]({{< ref "posts/type-vars" >}}) and
  [Generics]({{< ref "posts/generics-and-variance" >}}) for deeper insights into Python's typing system.

Embracing type hints can elevate your Python programming experience, making your code more robust and maintainable in
the long run.