Add post about making great commits

ItsDrike 2023-04-17 15:38:01 +02:00
parent 43615f816c
commit 52fdfc5c2a
Signed by: ItsDrike
GPG key ID: FA2745890B7048C0

---
title: Making great commits
date: 2023-04-17
tags: [programming, git]
sources:
- <https://chris.beams.io/posts/git-commit/>
- <https://dhwthompson.com/2019/my-favourite-git-commit>
- <https://dev.to/samuelfaure/how-atomic-git-commits-dramatically-increased-my-productivity-and-will-increase-yours-too-4a84>
- <https://thoughtbot.com/blog/5-useful-tips-for-a-better-commit-message>
- <https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html>
---
A well-structured git log is key to a project's maintainability; it provides insight into when and why things were done,
for future maintainers of the project, ... and yet, so many people pay very little attention to how their commits are
structured.
The problem isn't necessarily that they don't even attempt to write good commit messages, it's that the commit they
made is not actually easy to compose a commit message for.
Another, perhaps even bigger issue is that a lot of people don't even know that there's a reason to care about their
git history, because they simply don't see a benefit in it. The problem with this argument is that these people have
simply never explored git enough, and therefore aren't even familiar with the benefits they could gain.
So then, in this post, I'll try to explain both what benefits you can get, and how to make your commits
clean and easy to read and find in git history later on.
## Commit message
The purpose of every commit is always to simply represent some change that was made in the source code.
The commit message should then describe this change, however what many people get wrong is that they just state
**what** was changed, without explaining **why** it was changed. There is always a reason for why a change is made, and
while the contents of the commit (being the actual changes made in the code - diff) can tell you what was done, the
only way to figure out why it was done, is through the commit message.
Therefore, when thinking of a good commit message, you should always ask yourself not just "What does this commit
change?", but also, and perhaps more importantly, ask "Why is this change necessary?" and "What does this change
achieve?".
Knowing why something was added can then be incredibly beneficial for someone looking at `git blame`, which allows you
to find out the commit that was responsible for adding/modifying any particular line. In the vast majority of cases, when
you look at git blame, you're not interested in what that single line of code is doing, but rather why it's even there.
Without having this information in the commit itself, you'd likely have to go look for the actual pull request that
added that commit, and read its description, which might not even contain that reason anyway.
### Commit isn't just the first line
A huge number of people are used to committing changes with a simple `git commit -m "My message"`, and while this is
enough and perfectly fine in many cases, sometimes you just need more space to describe what a change truly achieves.
Surprisingly, many people don't even know that they can make a commit that has more in its message than just the
title/first line, which then leads to poorly documented changes, because a single line sometimes simply isn't enough. To
create a commit with a bigger commit message, you can simply run `git commit` without the `-m` argument. This should
open your terminal text editor, allowing you to write out the message in multiple lines.
{{< notice tip >}}
I'd actually recommend making the simple `git commit` the default way you make new commits, since it invites you to
write more about it, by just seeing that you have that space available. We usually don't even know what exactly we'll
write in our new commit message before getting to typing it out, and knowing you have that extra space if you need it
will naturally lead to using it, even if you didn't know you needed it ahead of time.
{{< /notice >}}
That said, not every commit requires both a subject and a body, sometimes a single line is fine, especially when the
change is so simple that no further context is necessary, and including some would just waste the reader's time. For
example:
```markdown
Fix typo in README
```
In this case, there's no need for anything extra. Some people like to include what the typo was, but if you want to know
that, you can use `git show` or `git diff`, or `git log --patch`, showing you the actual changes made to the code, so
this information isn't necessary either. So, while in some cases, having extra context can be very valuable, you also
shouldn't overdo it.
### Make commits searchable
It can be very beneficial to include some keywords that people could then easily find this commit by, when searching
for changes in the codebase. As an example, you can include the name of an exception, such as `InvalidDataStreamError`,
if your commit addresses a bug that causes this exception.
You can then add an explanation on why this error was getting raised, and why your change fixed that. With that, anyone
who found your commit by searching for this exception can immediately find out what this exception is, why it was
getting raised and what to do to fix it.
This is especially useful with internal APIs, whether it's custom exceptions, or just functions or names of classes.
People don't search the commit history very often, but if you do encounter a case where you think someone might search
for your change at some point, it's worth it to make it as easy for them as you can.
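To make this a bit more concrete, here's a small sketch of what such a search could look like. The throwaway repository, the file name and the commit messages are all made up for illustration; `--grep` (search commit messages) and `-S` (search for commits that added or removed a string in the code) are regular `git log` options.

```shell
# Throwaway repository, so this is safe to run anywhere
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "def read_stream(): pass" > stream.py
git add stream.py
git commit -q -m "Add stream reader"

echo "def read_stream(): validate()" > stream.py
git commit -q -a -m "Fix InvalidDataStreamError on empty input" \
  -m "Empty streams skipped validation and raised the error later."

# Find commits whose message mentions the exception
git log --oneline --grep="InvalidDataStreamError"

# Find commits whose diff added or removed the given string
git log --oneline -S"validate()"
```

Both searches should point at the second commit, one through its message and the other through its diff.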
### Make it exciting to read
I sometimes find myself going through random commit messages of a project, just to see what the development is like,
and explore what kinds of changes are being introduced. Even more often, I look there to quickly see what was
changed, to bring myself up to date with the project.
When doing this, I'm always super thankful to people who took the time to for example include the debug process of how
they figured out X was an issue, or where they explain some strange behavior that you might not expect to be happening.
These kinds of commits make the history a fun place to go and read, and it allows you to teach someone something about
the language, the project, or programming in general, making everyone in your team a bit smarter!
### Follow the proper message structure
Git commits should be written in a very specific way. There are a few rules to follow:
- **Separate the subject/title from body with a blank line** (Especially useful when looking at `git log --oneline`,
as without the blank line, lines below are considered as parts of the same paragraph, and shown together)
- **Limit the subject line to 50 characters** (Not a hard limit, but try not going that much longer. This limit ensures
readability, and forces the author to think about the most concise way to explain what's going on. Note: If you're
having trouble summarizing, you might be committing too much at once)
- **Capitalize the subject line**
- **Don't end the subject line with a period**
- **Use imperative mood in subject** (Imperative mood means "written as if giving a command/instruction" i.e.: "Add
support for X", not "I added support for X" or "Support for X was added", as a rule of thumb, a subject message
should be able to complete the sentence: "If implemented, this commit will ...")
- **Wrap body at 72 characters** (We usually use `git log` to print out the commits into the terminal, but its output
isn't wrapped, and going over the terminal's width can cause a pretty messy output. The recommended maximum width for
terminal text output is 80 characters, but git tools can often add indents, so 72 characters is a pretty sensible
maximum)
- **Mention the "what" and the "why", but not the "how"** (A commit message shouldn't contain implementation details,
if people want to see those, they should look at the changed code diff directly)
If you want to, you can consider using markdown in your commit message, as most other programmers will understand it as
it's a commonly used format, and it's a great way to bring in some more style, improving readability. Keep in mind though
that `git log` prints the message as plain text, and sites like GitHub generally do the same in their commit views, so
stick to markdown that still reads well unrendered.
For example:
```markdown
Summarize changes in around 50 characters or less

More detailed explanatory text, if necessary. Wrap it to about 72
characters or so. In some contexts, the first line is treated as the
subject of the commit and the rest of the text as the body. The
blank line separating the summary from the body is critical (unless
you omit the body entirely); various tools like `log`, `shortlog`
and `rebase` can get confused if you run the two together.

Explain the problem that this commit is solving. Focus on why you
are making this change as opposed to how (the code explains that).
Are there side effects or other unintuitive consequences of this
change? Here's the place to explain them.

Further paragraphs come after blank lines.

- Bullet points are okay, too

- Typically a hyphen or asterisk is used for the bullet, preceded
  by a single space, with blank lines in between, but conventions
  vary here

If you use an issue tracker, put references to them at the bottom,
like this:

Resolves: #123
See also: #456, #789
```
```
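If you'd like a machine to nag you about the mechanical parts of these rules, a rough sketch of a checker could look like the one below. The function name `check_commit_message` is my own invention, and the script only covers the rules that can be checked automatically (subject length, capitalization, trailing period, blank second line, body width); whether the message explains the "why" is still up to you.

```shell
# check_commit_message FILE
# Prints one line per problem found and returns non-zero if there were any.
# (Hypothetical helper, not something git ships with.)
check_commit_message() {
  awk '
    NR == 1 && length($0) > 50 { print "subject is longer than 50 characters"; bad = 1 }
    NR == 1 && /\.$/           { print "subject ends with a period"; bad = 1 }
    NR == 1 && /^[a-z]/        { print "subject is not capitalized"; bad = 1 }
    NR == 2 && length($0) > 0  { print "second line is not blank"; bad = 1 }
    NR > 2 && length($0) > 72  { print "line " NR " is longer than 72 characters"; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

# Try it on a well-formed message stored in a temporary file
msg=$(mktemp)
printf '%s\n' \
  'Fix crash when parsing empty config' \
  '' \
  'The parser assumed at least one section existed.' > "$msg"
check_commit_message "$msg" && echo "message looks fine"
```

Since git passes the path of the message file to the `commit-msg` hook as its first argument, a function like this could plausibly be called from such a hook, but I'd treat that as an optional extra rather than a requirement.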
## Make "atomic" commits
_Atomic: of or forming a single irreducible unit or component in a larger system._
The term "atomic commit" means that the commit is only representing a single change, that can't be further reduced into
multiple commits, i.e. this commit only handles a single change. Ideally, it should be possible to sum up the changes
that a good commit makes in a single sentence.
That said, the irreducibility should only apply to the change itself, obviously, making a commit for every line of code
wouldn't be very clean. Having a commit only change a small amount of code isn't what makes it atomic. While the commit
certainly can be small, it can just as well be a commit that's changing thousands of lines. (That said, you should have
some really good justification for it if you're actually making commits that big.)
The important thing is that the commit is only responsible for addressing a single change. A counter-example would be
a commit that adds a new feature, but also fixes a bug you found while implementing this feature, and also improves the
formatting of some other function, that you encountered along the way. With atomic commits, all of these actions would
get their own standalone commits, as they're unrelated to each other, and describe several different changes.
But making atomic commits isn't just about splitting things up to only represent single changes, indeed, while they
should only represent the smallest possible change, it should also be a "complete" change. This means that a commit
responsible for changing how some function works in order to improve performance should ideally also update the
documentation, make the necessary adjustments to unit-tests so they still pass, and update all of the references to
this updated function to work properly after this change.
So an atomic commit is a commit representing a single small (ideally an irreducible) change, that's fully implemented
and integrates well with the rest of the codebase.
### Partial adds
Many people tend to always simply use `git add -A` (or `git add .`), to stage all of the changes they made, and then
create a commit with it all.
In an ideal world, where you only made the changes you needed to make for this single atomic commit, this would work
pretty well, and while sometimes this is the case, in most cases, you will likely have say fixed some bug you found
alongside, or a typo you noticed, etc.
When that happens, you should know that you can instead make a partial add, and only stage the changes that belong into
the commit you're about to make. The simple case is when you have some unrelated changes, but they're all in different
files, and don't affect this commit. In that case, you can use `git add /path/to/file`, to only stage those files that
you need, leaving the unrelated ones alone.
But this is rarely the case, instead, you usually have a single file, that now contains both a new feature, and some
unrelated quick bugfix. In that case, you can use the `-p`/`--patch` flag: `git add -p /path/to/file`. This will let you
interactively go over every "hunk" (a chunk of code, with changes close to each other), and decide on whether to accept
it (hence staging it), split it into more chunks, skip it, or even modify it in your editor, allowing you to remove the
intertwined code for the bugfix from the code for your feature that you're committing now.
You can then make the feature commit, that only contains the changes related to it, and then create another commit, that
only contains the bugfix related changes.
This git feature has slowly become one of my favorite tools, and I use it almost every time I need to commit something,
as it also allows me to quickly review the changes I'm making, before they make it into a commit, so it can certainly be
worth using, even if you know you want to commit the entire file.
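Here's a small sketch of that workflow, using a throwaway repository and a made-up `numbers.txt` file. Normally you'd answer the `git add -p` prompts by hand; in this example the answers (`y` to stage the first hunk, `n` to skip the second) are piped in, purely so that it can run unattended.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A 20-line file, committed as the starting point
seq 1 20 > numbers.txt
git add numbers.txt
git commit -q -m "Add numbers file"

# Two unrelated edits, far enough apart that git sees them as two hunks
awk 'NR == 2 { print "feature"; next } NR == 18 { print "bugfix"; next } { print }' \
  numbers.txt > numbers.tmp && mv numbers.tmp numbers.txt

# Stage only the first hunk ("y"), leave the second one alone ("n")
printf 'y\nn\n' | git add -p numbers.txt > /dev/null

git commit -q -m "Add feature"   # contains only the first hunk
git add numbers.txt
git commit -q -m "Fix bug"       # contains the remaining hunk

git log --oneline
```

The result is two commits that each contain only their own change, even though both edits were sitting in the same file at the same time.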
## Stop making fixing commits
A very common occurrence I see in a ton of different projects is people making sequences of commits that go like:
- Fix bug X
- Actually fix bug X
- Fix typo in variable name
- Sort imports
- Follow lint rules
- Run auto-formatter
While people can obviously mess up sometimes, and just not get something right on the first try, a fixing commit like
this is actually not the only way to deal with that.
Instead of making a new commit, you can actually just amend the original. To do this, we can use `git commit --amend`,
which will add your staged changes into the previous commit, even allowing you to change the message of that
old commit.
Not only that, if you've already made another commit, but now found something that needs changing in the commit before
that, you can use interactive rebase with `git rebase -i HEAD~3`, allowing you to change the last 3 commits, or even
completely remove some of those commits.
For more on history rewriting, I'd recommend checking the [official
documentation](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
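As a quick illustration of the amend case, here's a sketch in a throwaway repository (the file and its contents are made up). The `--no-edit` flag keeps the original commit message as it is; leave it out if you also want to reword the message.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "print('helo')" > app.py
git add app.py
git commit -q -m "Add greeting script"

# Typo noticed right after committing: fix it and fold it into the same commit
echo "print('hello')" > app.py
git add app.py
git commit -q --amend --no-edit

git log --oneline   # still a single commit, now with the corrected file
```

The interactive rebase isn't shown here, since it opens an editor with the list of commits and is best tried out by hand.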
### Force pushing
{{< notice warning >}}
Changing history is a great tool to clean up after yourself, but it works best with local changes, i.e. with changes you
haven't yet pushed.
Even though changing already pushed history is possible, it requires a "force push" (`git push --force`). These kinds of
pushes are something you need to be very careful about, as someone might have already pulled the changes, which you then
overwrote with your force push. Now, they might've done some work from the point at which they pulled, but then they
find out that this point is actually gone from the history, and they can't push their changes back. So now, they'll need
to undo their changes, pull the force pushed branch, and carry the work over, which can be very annoying.
{{< /notice >}}
My recommendation to avoid force pushing is to reduce the number of (regular) pushes you do. If your changes
are only local, rewriting history is easy, and won't break anyone else's workflow, but the moment you push, the changes
are public, and anyone might've pulled them already.
This especially applies when you're pushing directly to master/main branch, or other shared branch which multiple people
are working with. If this is your personal branch (like a feature branch you're responsible for), force-pushing there is
generally ok, though you might still have people using your branch since they wanted to try out a feature early, or
review the changes from their editor. So even with personal branches, it's not always safe to force-push.
My rule of thumb is to avoid pushing until the feature is fully complete, as that allows you to change anything during
the development. Perhaps some change you made no longer makes sense, because you realized you won't actually be using it
in the way you anticipated, or you found a bug with it later on. You can now simply rewrite your local history, and
rather than making a fixing commit, it'd be as if the bug was never there.
Once you do finally decide to push, it's a good practice to run any auto-formatters and linters, and perhaps even
unit-tests. You can also take a quick peek at `git log`, to make sure you didn't make any typos. Then, only if all of
those local checks pass should you actually push your version.
{{< notice tip >}}
If you do need to force-push, try to at least do it as quickly as possible. The more time that has passed since your
normal push, the more likely it is that someone has already cloned/pulled those changes. If you force-push within just
a few seconds after pushing, it's not very likely that someone has pulled already, and so you won't break anyone's
version.
{{< /notice >}}
## Benefits
Alright, now that we've seen some of the best practices for making new commits, let's explore the benefits that we can
actually gain by following these.
### A generally improved development workflow
I can confidently say, that in my experience, learning to make good git commits made me a much better programmer
overall. That might sound surprising, but it's really true.
The reason for this is that making good commits, that only tackle one issue at a time naturally helps you to think about
how to split your problem up into several smaller "atomic" problems, and make commits addressing that single part, after
which you move to another. This is actually one of the well-known approaches to problem-solving, called the "divide and
conquer" method, because you divide your problem into really small, trivially simple chunks, which you solve one by one.
Learning and getting used to doing this just makes you better at problem solving in general, and while git commits
certainly aren't the only way to get yourself to think like this, it's honestly one of the simplest ones, and you become
good at git while at it!
### Finding a tricky bug
Imagine you've just come up with a new feature that you're really eager to implement for your project. So, the moment
you think of how to do it, you start working on it. Then, after a good bit of work, you're finally done, entirely. You now
make a commit, with all of the changes.
However, you now realize that as you pushed your commit to your repo, the automated CI workflows start to fail on some
unit-tests. Turns out you didn't think of some edge-case, some part of your solution is suddenly affecting something
completely unrelated. As you attempt to fix it, more and more other issues arise, and you don't really even know where
to start. You have this big single diff for the entire feature, but you have no idea where in that is the bug.
Figuring it out takes at best a lot of mental effort, analyzing and keeping track of all of the changes at once, or at
worst, you'll spend a lot of time doing this, but you'll just keep getting lost in your own code, until you finally just
give up, and start over, this time only doing small changes at a time, and running the unit-tests for each one as you go.
#### Same scenario, but with atomic commits
Now, let's consider the same scenario, but this time, you're following the best git principles, and so you're splitting
the problem up and making atomic commits for each of the necessary changes, that will together make up the feature.
Once you're done, you decide to push all of those commits, and see the CI fail. However this time, you have a much
easier time finding where that pesky bug hides. Why? Because this time, you can just check out one of those commits you
divided your bigger task into, and run the tests there. If it fails, you can run the tests in the commit before that.
You can just repeat this until you find the exact commit that caused these failures.
At this point, you know exactly which change caused this, because the commit you discovered was pretty small, it only
changed a few dozen lines and introduced a very specific behavior, in which after looking at it for a while, you find
that there's indeed a completely unexpected fault, which you only found out because you knew exactly where to look.
#### Git bisect
This scenario is actually very common and can come up a lot while developing, because of that, git actually has an
amazing tool that can make this process even easier! This tool is called `git bisect`.
Essentially, you can give git bisect a specific start commit, where you know everything worked as it should've, and an
end commit, where you know the fault exists somewhere. Git will automatically check out the commits in between in the
most optimal way (binary search), and all you have to do is then check whether the issue exists in the checked out
commit, or not. If it does, you tell bisect that this commit is still faulty, or if not, you say it's good.
Since bisect is essentially a binary search, it won't take too many attempts to figure out exactly which commit is the
faulty one, essentially automating the process above. Better yet, if the bug can be detected by
simply running some script/command (perhaps the unit test suite), you can actually just specify that command when using
git bisect, and it'll do all of the work for you, running that command on each of those checkouts, and depending on
its exit code, marking the commit as good if the command passed, or as faulty if not.
So, even if the test suite takes a while, you can actually just have git find the bug for you, while you take a break
and make a nice cup of coffee.
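To show what that looks like in practice, here's a sketch with a throwaway repository of ten tiny commits, where the fifth one introduces a "bug" (just a marker line in a made-up `history.txt` file). The `grep` command stands in for a real test suite: it exits with 0 on good commits and non-zero on bad ones, which is what `git bisect run` expects.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Ten small commits; the fifth one sneaks in the regression
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "change $i" >> history.txt
  if [ "$i" -eq 5 ]; then
    echo "BUG" >> history.txt   # stand-in for a real bug
  fi
  git add history.txt
  git commit -q -m "Apply change $i"
done

first=$(git rev-list --max-parents=0 HEAD)   # the very first commit, known good
git bisect start HEAD "$first" > /dev/null   # bad commit first, then the good one

# Let git drive the search, using the command's exit code as the verdict
git bisect run sh -c '! grep -q BUG history.txt' > /dev/null

culprit=$(git rev-parse refs/bisect/bad)     # the first bad commit bisect found
git log -1 --format=%s "$culprit"            # prints "Apply change 5"
git bisect reset > /dev/null 2>&1            # return to where we started
```

With ten commits this only takes a handful of checkouts, and the number of steps grows very slowly as the history gets longer.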
### Git blame
Git blame is a tool that allows you to look at a file, and see exactly which lines were committed by who, and in which
commit. This can be very useful if you just want to check what that line was added there for. If it's a part of a larger
spanning commit, you can then check the diff of that commit, to see why that line was relevant, with the context of the
rest of the changes done.
Having good commit history and using atomic commits makes doing this a great and easy experience, as you're not very
likely to find that commit to be addressing 10 different issues at once, without providing any real description in the
commit message, as to why, and perhaps not even as to what it's doing. With commits like those, git blame becomes almost
useless, but if you do follow these best practices, it can be a great tool for understanding why anything in the code is
where it is, without needing to check the documentation, if there even is any.
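A small sketch of that lookup, again in a throwaway repository with a made-up `settings.ini` file: `git blame -L 2,2` limits the output to line 2, and the `--porcelain` variant is used here only because its first field is the full commit hash, which makes it easy to hand over to `git show`.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'timeout = 30\nretries = 3\n' > settings.ini
git add settings.ini
git commit -q -m "Add default settings"

printf 'timeout = 30\nretries = 5\n' > settings.ini
git commit -q -a -m "Raise retry count to 5" \
  -m "Three retries were not enough on flaky mobile connections."

# Which commit last touched line 2?
git blame -L 2,2 settings.ini

# Grab that commit's hash and read its message to learn why
sha=$(git blame -L 2,2 --porcelain settings.ini | head -n 1 | cut -d ' ' -f 1)
git show --no-patch --format='%s%n%n%b' "$sha"
```

Because the commit only did one thing and said why, the answer to "why is this value 5?" is right there in the message.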
### Cherry picking
Cherry picking is the process of taking a commit (or multiple commits), and carrying them over (essentially
copying/transferring them) to another branch, or just another point. So for example, you might have a feature branch, in
which you fixed a bug that also affects the current release. Instead of checking out the release branch, and re-doing
the changes there, you can actually use cherry-picking to carry the commit from the feature branch into the release
branch. This will mean any changes made in that commit will be applied, fixing the bug in release branch and allowing
you to make a release.
However, if the commit that fixed this issue wasn't atomic, and it also contained fixes for tons of other things, or
worse off, includes logic for additional features, you can't just carry it over like this, as you'd be introducing other
things into the release branch which aren't supposed to be there (yet). So instead, you'd have to make the changes in
the branch yourself, and create another commit, which is simply slower.
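Here's a sketch of the happy path, where the bugfix lives in its own atomic commit. The branch names `release` and `feature` and the files are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial release"
git branch release                  # stays at the released state

git checkout -q -b feature
echo "half-finished feature" > feature.txt
git add feature.txt
git commit -q -m "Start work on new feature"

echo "v1 with fix" > app.txt
git commit -q -a -m "Fix crash on startup"
fix=$(git rev-parse HEAD)

# Carry only the bugfix commit over to the release branch
git checkout -q release
git cherry-pick "$fix" > /dev/null

git log --oneline   # the initial commit plus the cherry-picked fix
ls                  # app.txt only, the unfinished feature stayed behind
```

Had the fix and the feature work been squeezed into one commit, there would be nothing clean to pick here.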
### Pull request reviews
When someone else is reviewing your pull request, having clean commits can be incredibly helpful to the reviewer, as
they can go through the individual commits instead of reviewing all of the changes at once by looking at the full diff
compared to the branch you're merging to. This alone can greatly reduce the mental overhead of having to keep track of
all of the added/changed code, and knowing how it interacts with the rest of the changes.
Atomic commits then allow for the reviewer to understand each and every atomic change you made, one by one, which is
much easier to grasp. So even if when put together, the code is pretty complex, in these atomic chunks, it's actually
pretty easy to see what's going on, and why. This is especially the case if these commits include great descriptions of
what it is they're addressing exactly.
This then doesn't just apply for pull-requests, this kind of workflow can actually be useful to anyone looking over some
code in a file. You could use git blame to find out the commit, and follow the parent commits up, allowing you to see
the individual changes as they were done one by one, which again, is then easier to understand, and allows you to then
realize what the whole file is about much quicker.
### Easy reverts
Sometimes, we might realize that a change that we made a while ago should not actually have been made, but the change
was already pushed and there's a lot of commits after it. That means at this point, we can't simply rewrite the history,
and we will need to push a commit that undoes that change.
The great advantage of atomic commits is that they should include the entire change, along with documentation it
introduces, tests, etc. in a single piece, a single commit. Because of that, assuming there weren't any commits that
built upon this change later on, we can use git's amazing `git revert` command.
This will create a new commit that undoes everything another specified commit did, making it very easy to revert some
specific change, while leaving everything else alone. This is much faster and easier than having to look at what the
original commit changed line by line, and change it back ourselves, and while this isn't something you'll use all that
often, when you do get a chance to use it, it's really nice and can be a good time saver.
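As a sketch, assuming a throwaway repository where a made-up "experimental feature" was committed together with its docs and then followed by unrelated work:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "core" > core.txt
git add core.txt
git commit -q -m "Add core module"

# One atomic commit: the feature together with its documentation
echo "experimental" > feature.txt
echo "feature docs" > docs.txt
git add feature.txt docs.txt
git commit -q -m "Add experimental feature"
unwanted=$(git rev-parse HEAD)

echo "more core" >> core.txt
git commit -q -a -m "Extend core module"

# Undo the feature with a new commit; the later work stays untouched
git revert --no-edit "$unwanted" > /dev/null

git log --oneline
```

The history now has four commits, the last of which removes both the feature and its docs in one go, while the later change to `core.txt` is still there.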
## Conclusion
Git is something programmers use every day, so learning how to use it properly is invaluable. There are a lot of rules I
mentioned here, and of course, you probably won't be able to just start doing all of them at once. But I would encourage
you to at least stop for a while before every commit you're about to make, and think of whether you really need to stage
all of the files, or if you should do a partial add, and make multiple commits instead, and also take a while to think
of a good commit message.
For motivation, here's a quick recap of the most important benefits a good git workflow gives you:
- Your development workflow becomes easier by allowing you to find issues a lot quicker
- You can also help your team or whoever ends up reading your commits understand what's going on and bring them up to date with the project
- You will be able to quickly find out who committed something and why
- Your overall programming skills will improve, because you'll get used to dividing up your problems naturally