Add post about making great commits

2026-06-20 20:53:06 +00:00 · 2023-04-17 15:38:01 +02:00 · 2023-04-17 15:38:01 +02:00 · 52fdfc5c2a
commit 52fdfc5c2a
parent 43615f816c
1 changed files with 417 additions and 0 deletions
--- a/content/posts/great-commits.md
+++ b/content/posts/great-commits.md
@@ -0,0 +1,417 @@
+---
+title: Making great commits
+date: 2023-04-17
+tags: [programming, git]
+sources:
+  - <https://chris.beams.io/posts/git-commit/>
+  - <https://dhwthompson.com/2019/my-favourite-git-commit>
+  - <https://dev.to/samuelfaure/how-atomic-git-commits-dramatically-increased-my-productivity-and-will-increase-yours-too-4a84>
+  - <https://thoughtbot.com/blog/5-useful-tips-for-a-better-commit-message>
+  - <https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html>
+---
+
+A well-structured git log is key to a project's maintainability: it gives future maintainers insight into when and why
+things were done. And yet, so many people pay very little attention to how their commits are structured.
+
+The problem isn't necessarily that they don't even attempt to write good commit messages; it's that the commit they
+made isn't actually easy to write a commit message for.
+
+Another, perhaps even bigger issue is that a lot of people don't know there's any reason to care about their git
+history, because they simply don't see a benefit in it. The problem with this argument is that these people have
+simply never explored git enough, and therefore aren't familiar with the benefits they could gain.
+
+So, in this post, I'll try to explain both what benefits you can get, and how to make your commits clean, easy to
+read, and easy to find in the git history later on.
+
+## Commit message
+
+The purpose of every commit is always to simply represent some change that was made in the source code.
+
+The commit message should then describe this change. What many people get wrong, however, is that they just state
+**what** was changed, without explaining **why** it was changed. There is always a reason why a change is made, and
+while the contents of the commit (the actual changes made in the code, i.e. the diff) can tell you what was done, the
+only way to figure out why it was done is through the commit message.
+
+Therefore, when thinking of a good commit message, you should always ask yourself not just "What does this commit
+change?", but also, and perhaps more importantly, ask "Why is this change necessary?" and "What does this change
+achieve?".
+
+Knowing why something was added can then be incredibly beneficial for someone looking at `git blame`, which allows you
+to find the commit that was responsible for adding/modifying any particular line. In the vast majority of cases, when
+you look at git blame, you're not interested in what that single line of code is doing, but rather why it's even there.
+
+Without having this information in the commit itself, you'd likely have to go look for the actual pull request that
+added that commit, and read its description, which might not even contain that reason anyway.
+
+### Commit isn't just the first line
+
+A huge number of people are used to committing changes with a simple `git commit -m "My message"`, and while this is
+perfectly fine in many cases, sometimes you just need more space to describe what a change truly achieves.
+
+Surprisingly, many people don't even know that they can make a commit that has more in its message than just the
+title/first line, which then leads to poorly documented changes, because a single line sometimes simply isn't enough.
+To create a commit with a longer message, you can simply run `git commit` without the `-m` argument. This should
+open your configured text editor, allowing you to write out the message in multiple lines.
+
+{{< notice tip >}}
+I'd actually recommend making the simple `git commit` the default way you make new commits, since it invites you to
+write more about it, by just seeing that you have that space available. We usually don't even know what exactly we'll
+write in our new commit message before getting to typing it out, and knowing you have that extra space if you need it
+will naturally lead to using it, even if you didn't know you needed it ahead of time.
+{{< /notice >}}
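+
+When no editor can open (in a script, for instance), the same subject-plus-body structure can be fed to git through
+`-F -`, which reads the message from standard input. Here's a minimal sketch of that; the throwaway repository, the
+identity, the file name and the message itself are all made up for illustration:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+echo "retries = 3" > settings.ini
+git add settings.ini
+
+# Subject, blank line, then the body. A plain `git commit` would let you
+# type exactly this into your editor; `-F -` reads it from stdin instead.
+git commit -q -F - <<'EOF'
+Retry failed uploads three times
+
+Uploads over flaky connections used to fail on the first timeout,
+forcing people to restart the whole transfer by hand.
+EOF
+
+# %s is the subject line, %b is everything after the blank line.
+subject=$(git log -1 --format=%s)
+body=$(git log -1 --format=%b)
+printf 'subject: %s\n\n%s\n' "$subject" "$body"
+```
+
+The `%s` and `%b` placeholders print the subject and the body separately, which shows that git really does treat the
+first line as its own thing.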
+
+That said, not every commit requires both a subject and a body. Sometimes a single line is fine, especially when the
+change is so simple that no further context is necessary, and including some would just waste the reader's time. For
+example:
+
+```markdown
+Fix typo in README
+```
+
+In this case, there's no need for anything extra. Some people like to include what the typo was, but if you want to know
+that, you can use `git show` or `git diff`, or `git log --patch`, showing you the actual changes made to the code, so
+this information isn't necessary either. So, while in some cases, having extra context can be very valuable, you also
+shouldn't overdo it.
+
+### Make commits searchable
+
+It can be very beneficial to include some keywords that people could then easily find this commit by, when searching
+for changes in the codebase. As an example, you can include the name of an exception, such as `InvalidDataStreamError`,
+if your commit addresses a bug that causes this exception.
+
+You can then add an explanation of why this error was getting raised, and why your change fixed that. With that, anyone
+who found your commit by searching for this exception can immediately find out what this exception is, why it was
+getting raised and what to do to fix it.
+
+This is especially useful with internal APIs, whether it's custom exceptions, or just names of functions or classes.
+People don't search the commit history very often, but if you do encounter a change that you think someone might
+search for at some point, it's worth making that as easy for them as you can.
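+
+To see what that buys you, here's a small sketch of such a search. The repository, file names and commit messages are
+invented for the demo; only `InvalidDataStreamError` comes from the example above:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+echo "stream v1" > stream.txt
+git add stream.txt
+git commit -q -m "Add stream reader"
+
+echo "stream v2" > stream.txt
+git commit -q -a -F - <<'EOF'
+Fix InvalidDataStreamError on empty input
+
+The reader raised InvalidDataStreamError when the input was empty,
+because it tried to parse a header that was never sent.
+EOF
+
+echo "docs" > README
+git add README
+git commit -q -m "Add README"
+
+# --grep matches against commit messages (subject and body).
+found=$(git log --grep="InvalidDataStreamError" --format=%s)
+echo "$found"
+```
+
+Note that `--grep` only looks at commit messages. `git log -S"InvalidDataStreamError"` would instead list the commits
+whose diff adds or removes that string in the code itself.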
+
+### Make it exciting to read
+
+I sometimes find myself going through random commit messages of a project, just to see what the development is like,
+and explore what kinds of changes are being introduced. Even more often, I look there to quickly see what was
+changed, to bring myself up to date with the project.
+
+When doing this, I'm always super thankful to people who took the time to, for example, include the debug process of
+how they figured out X was an issue, or explain some strange behavior that you might not expect to be happening.
+
+These kinds of commits make the history a fun place to go and read, and they allow you to teach someone something
+about the language, the project, or programming in general, making everyone in your team a bit smarter!
+
+### Follow the proper message structure
+
+Git commits should be written in a very specific way. There are a few rules to follow:
+
+- **Separate the subject/title from body with a blank line** (Especially useful when looking at `git log --oneline`,
+  as without the blank line, lines below are considered as parts of the same paragraph, and shown together)
+- **Limit the subject line to 50 characters** (Not a hard limit, but try not going that much longer. This limit ensures
+  readability, and forces the author to think about the most concise way to explain what's going on. Note: If you're
+  having trouble summarizing, you might be committing too much at once)
+- **Capitalize the subject line**
+- **Don't end the subject line with a period**
+- **Use imperative mood in the subject line** (Imperative mood means "written as if giving a command/instruction",
+  i.e. "Add support for X", not "I added support for X" or "Support for X was added". As a rule of thumb, a subject
+  line should be able to complete the sentence: "If implemented, this commit will ...")
+- **Wrap body at 72 characters** (We usually use `git log` to print out the commits into the terminal, but its output
+  isn't wrapped, and going over the terminal's width can cause a pretty messy output. The recommended maximum width for
+  terminal text output is 80 characters, but git tools can often add indents, so 72 characters is a pretty sensible
+  maximum)
+- **Mention the "what" and the "why", but not the "how"** (A commit message shouldn't contain implementation details;
+  if people want to see those, they should look at the changed code diff directly)
+
+If you want to, you can consider using markdown in your commit message, as most other programmers will understand it as
+it's a commonly used format, and it's a great way to bring in some more style, improving readability. Keep it light
+though: `git log` shows the message as plain text, and as far as I know, so does GitHub's commit view, so the markdown
+should still read well unrendered.
+
+For example:
+
+```markdown
+Summarize changes in around 50 characters or less
+
+More detailed explanatory text, if necessary. Wrap it to about 72
+characters or so. In some contexts, the first line is treated as the
+subject of the commit and the rest of the text as the body. The
+blank line separating the summary from the body is critical (unless
+you omit the body entirely); various tools like `log`, `shortlog`
+and `rebase` can get confused if you run the two together.
+
+Explain the problem that this commit is solving. Focus on why you
+are making this change as opposed to how (the code explains that).
+Are there side effects or other unintuitive consequences of this
+change? Here's the place to explain them.
+
+Further paragraphs come after blank lines.
+
+- Bullet points are okay, too
+
+- Typically a hyphen or asterisk is used for the bullet, preceded
+  by a single space, with blank lines in between, but conventions
+  vary here
+
+If you use an issue tracker, put references to them at the bottom,
+like this:
+
+Resolves: #123
+See also: #456, #789
+```
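+
+Most of the rules above are mechanical enough to check with a script. Here's a rough sketch of such a check; the
+`check_message` function and the two sample files are my own invention, and it deliberately skips the imperative mood
+rule, since that one needs a human:
+
+```shell
+set -eu
+
+# check_message FILE: prints one line per broken rule, fails if there were any.
+check_message() {
+    awk '
+        NR == 1 {
+            if (length($0) > 50) { print "subject is longer than 50 characters"; bad = 1 }
+            if ($0 !~ /^[A-Z]/)  { print "subject is not capitalized"; bad = 1 }
+            if ($0 ~ /\.$/)      { print "subject ends with a period"; bad = 1 }
+        }
+        NR == 2 && $0 != ""       { print "no blank line after the subject"; bad = 1 }
+        NR > 2 && length($0) > 72 { print "line " NR " is longer than 72 characters"; bad = 1 }
+        END { exit bad + 0 }
+    ' "$1"
+}
+
+tmp=$(mktemp -d)
+trap 'rm -rf "$tmp"' EXIT
+printf 'Add support for X\n\nExplain why X is needed.\n' > "$tmp/good.txt"
+printf 'added support for X.\nno blank line here\n' > "$tmp/bad.txt"
+
+check_message "$tmp/good.txt" && echo "good.txt: ok"
+check_message "$tmp/bad.txt" || echo "bad.txt: rejected"
+```
+
+Something like this could probably be adapted into a `commit-msg` hook, though a real hook would also have to deal
+with the comment lines git puts into the message template.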
+
+## Make "atomic" commits
+
+_Atomic: of or forming a single irreducible unit or component in a larger system._
+
+The term "atomic commit" means that the commit represents only a single change, one that can't be further reduced into
+multiple commits. Ideally, it should be possible to sum up the changes that a good commit makes in a single sentence.
+
+That said, the irreducibility should only apply to the change itself. Obviously, making a commit for every line of code
+wouldn't be very clean. Having a commit only change a small amount of code isn't what makes it atomic. While the commit
+certainly can be small, it can just as well be a commit that's changing thousands of lines. (That said, you should have
+some really good justification for it if you're actually making commits that big.)
+
+The important thing is that the commit is only responsible for addressing a single change. A counter-example would be
+a commit that adds a new feature, but also fixes a bug you found while implementing this feature, and also improves the
+formatting of some other function, that you encountered along the way. With atomic commits, all of these actions would
+get their own standalone commits, as they're unrelated to each other, and describe several different changes.
+
+But making atomic commits isn't just about splitting things up to only represent single changes. Indeed, while a commit
+should only represent the smallest possible change, it should also be a "complete" change. This means that a commit
+responsible for changing how some function works in order to improve performance should ideally also update the
+documentation, make the necessary adjustments to unit-tests so they still pass, and update all of the references to
+this updated function to work properly after this change.
+
+So an atomic commit is a commit representing a single small (ideally an irreducible) change, that's fully implemented
+and integrates well with the rest of the codebase.
+
+### Partial adds
+
+Many people tend to always simply use `git add -A` (or `git add .`), to stage all of the changes they made, and then
+create a commit with it all.
+
+In an ideal world, where you only made the changes you needed to make for this single atomic commit, this would work
+pretty well. While sometimes this is the case, more often you will have, say, fixed some bug you found along the way,
+or a typo you noticed, etc.
+
+When that happens, you should know that you can instead make a partial add, and only stage the changes that belong in
+the commit you're about to make. The simple case is when you have some unrelated changes, but they're all in different
+files, and don't affect this commit. In that case, you can use `git add /path/to/file`, to only stage those files that
+you need, leaving the unrelated ones alone.
+
+But this is rarely the case. Instead, you usually have a single file that now contains both a new feature and some
+unrelated quick bugfix. In that case, you can use the `-p`/`--patch` flag: `git add -p /path/to/file`. This will let you
+interactively go over every "hunk" (a chunk of code, with changes close to each other), and decide whether to accept
+it (hence staging it), split it into smaller hunks, skip it, or even modify it in your editor, allowing you to separate
+the intertwined code for the bugfix from the code for the feature that you're committing now.
+
+You can then make the feature commit, that only contains the changes related to it, and then create another commit, that
+only contains the bugfix related changes.
+
+This git feature has slowly become one of my favorite tools, and I use it almost every time I need to commit something,
+as it also allows me to quickly review the changes I'm making, before they make it into a commit, so it can certainly be
+worth using, even if you know you want to commit the entire file.
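+
+Normally you'd answer the `git add -p` prompts by hand, but the answers can also be piped in, which makes it possible
+to sketch the whole flow as a script. Everything here (the `notes.txt` file, its contents and the commit messages) is
+made up for the demo:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+# A 20-line file, so edits at the top and the bottom end up in separate hunks.
+seq 1 20 > notes.txt
+git add notes.txt
+git commit -q -m "Add notes"
+
+# Two unrelated edits in one file: line 1 is the "feature", line 20 the "bugfix".
+sed -e '1s/.*/feature/' -e '20s/.*/bugfix/' notes.txt > notes.tmp
+mv notes.tmp notes.txt
+
+# Answer the prompts from stdin: "y" stages the first hunk, "n" skips the second.
+printf 'y\nn\n' | git add -p notes.txt >/dev/null
+git commit -q -m "Add feature line"
+feature_first=$(git show HEAD:notes.txt | head -n 1)
+feature_last=$(git show HEAD:notes.txt | tail -n 1)
+
+# The skipped hunk is still in the working tree, ready for its own commit.
+git commit -q -a -m "Fix last line"
+git log --oneline
+```
+
+After the first commit, the committed file has the new first line but still the old last line, so the bugfix really
+did stay out of the feature commit.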
+
+## Stop making fixing commits
+
+A very common occurrence I see in a ton of different projects is people making sequences of commits that go like:
+
+- Fix bug X
+- Actually fix bug X
+- Fix typo in variable name
+- Sort imports
+- Follow lint rules
+- Run auto-formatter
+
+While people can obviously mess up sometimes, and just not get something right on the first try, a fixing commit like
+this is actually not the only way to deal with that.
+
+Instead of making a new commit, you can actually just amend the original. To do this, you can use
+`git commit --amend`, which will add your staged changes into the previous commit, even allowing you to change the
+message of that old commit.
+
+Not only that, if you've already made another commit, but now found something that needs changing in the commit before
+that, you can use interactive rebase with `git rebase -i HEAD~3`, allowing you to change the last 3 commits, or even
+completely remove some of those commits.
+
+For more on history rewriting, I'd recommend checking the [official
+documentation](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
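+
+Both approaches can be sketched in a throwaway repository. For the rebase part I'm using `git commit --fixup` together
+with `--autosquash`, which isn't covered above: it marks a commit as a fix for an older one, so that the interactive
+rebase folds it in without any manual editing of the todo list. File names and messages are invented:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+echo "v1" > app.txt
+git add app.txt
+git commit -q -m "Add app"
+echo "fix" > fix.txt
+git add fix.txt
+git commit -q -m "Fix bug X"
+
+# The fix was incomplete: fold the correction into the last commit,
+# keeping its message, instead of adding "Actually fix bug X".
+echo "fix, complete" > fix.txt
+git commit -q -a --amend --no-edit
+target=$(git rev-parse HEAD)
+
+echo "docs" > README
+git add README
+git commit -q -m "Add README"
+
+# Another problem in "Fix bug X", which is no longer the latest commit.
+echo "fix, really complete" > fix.txt
+git commit -q -a --fixup="$target"
+
+# GIT_SEQUENCE_EDITOR=true accepts the todo list that --autosquash prepared.
+GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3 >/dev/null
+git log --oneline
+```
+
+The log ends up with just three commits, and "Fix bug X" contains the final version of the fix, as if it had been
+right on the first try.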
+
+### Force pushing
+
+{{< notice warning >}}
+Changing history is a great tool to clean up after yourself, but it works best with local changes, i.e. with changes
+you haven't yet pushed.
+
+Even though changing already pushed history is possible, it requires a "force push" (`git push --force`). These kinds of
+pushes are something you need to be very careful about, as someone might have already pulled the changes, which you then
+overwrote with your force push. Now, they might've done some work from the point at which they pulled, but then they
+find out that this point is actually gone from the history, and they can't push their changes back. So now, they'll need
+to undo their changes, pull the force-pushed branch, and carry the work over, which can be very annoying.
+{{< /notice >}}
+
+My recommendation to avoid force pushing is to reduce the number of (regular) pushes you do in the first place. If your
+changes are only local, rewriting history is easy, and won't break anyone else's workflow, but the moment you push, the
+changes are public, and anyone might've pulled them already.
+
+This especially applies when you're pushing directly to the master/main branch, or another shared branch that multiple
+people are working with. If this is your personal branch (like a feature branch you're responsible for), force-pushing
+there is generally ok, though you might still have people using your branch since they wanted to try out a feature
+early, or review the changes from their editor. So even with personal branches, it's not always safe to force-push.
+
+My rule of thumb is to avoid pushing until the feature is fully complete, as that allows you to change anything during
+the development. Perhaps some change you made no longer makes sense, because you realized you won't actually be using it
+in the way you anticipated, or you found a bug with it later on. You can now simply rewrite your local history, and
+rather than making a fixing commit, it'd be as if the bug was never there.
+
+Once you do finally decide to push, it's a good practice to run any auto-formatters and linters, and perhaps even
+unit-tests. You can also take a quick peek at `git log`, to make sure you didn't make any typos. Then, only if all of
+those local checks passed should you actually push your version.
+
+{{< notice tip >}}
+If you do need to force-push, try to at least do it as quickly as possible. The more time that has passed since your
+normal push, the more likely it is that someone has already cloned/pulled those changes. If you force-push within just
+a few seconds after pushing, it's not very likely that someone has pulled already, and so you won't break anyone's
+version.
+{{< /notice >}}
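+
+One safety net that I haven't mentioned yet is `git push --force-with-lease`. It only overwrites the remote branch if
+it's still where you last saw it, so it refuses to clobber work that someone pushed in the meantime. Here's a sketch
+of that, with a local bare repository standing in for the shared remote; the directory names, identities and files
+are all hypothetical:
+
+```shell
+set -eu
+work=$(mktemp -d)
+trap 'rm -rf "$work"' EXIT
+cd "$work"
+
+# A bare repository plays the role of the shared remote.
+git init -q --bare remote.git
+git clone -q remote.git mine 2>/dev/null
+cd mine
+git config user.name "Me"
+git config user.email "me@example.com"
+echo "a" > a.txt
+git add a.txt
+git commit -q -m "Add a"
+git push -q origin HEAD
+
+# A teammate clones the branch and pushes new work on top of it.
+git clone -q "$work/remote.git" "$work/theirs"
+cd "$work/theirs"
+git config user.name "Teammate"
+git config user.email "teammate@example.com"
+echo "b" > b.txt
+git add b.txt
+git commit -q -m "Add b"
+git push -q origin HEAD
+
+# Meanwhile, I rewrite my commit without fetching first.
+cd "$work/mine"
+git commit -q --amend -m "Add file a"
+
+# The lease only holds if the remote branch is still where I last saw it.
+if git push -q --force-with-lease origin HEAD 2>/dev/null; then
+    result="overwritten"
+else
+    result="rejected"
+fi
+remote_tip=$(git --git-dir="$work/remote.git" log -1 --format=%s)
+echo "push was $result, remote tip is: $remote_tip"
+```
+
+A plain `git push --force` at the same point would have silently thrown the teammate's commit away.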
+
+## Benefits
+
+Alright, now that we've seen some of the best practices for making new commits, let's explore the benefits that we can
+actually gain by following these.
+
+### A generally improved development workflow
+
+I can confidently say, that in my experience, learning to make good git commits made me a much better programmer
+overall. That might sound surprising, but it's really true.
+
+The reason for this is that making good commits that only tackle one issue at a time naturally helps you to think about
+how to split your problem up into several smaller "atomic" problems, and make commits addressing each single part,
+after which you move to another. This is actually a very well-known approach to problem-solving, called the "divide and
+conquer" method, because you divide your problem into really small, trivially simple chunks, which you solve one by one.
+
+Learning and getting used to doing this just makes you better at problem solving in general, and while git commits
+certainly aren't the only way to get yourself to think like this, it's honestly one of the simplest ones, and you become
+good at git while at it!
+
+### Finding a tricky bug
+
+Imagine you've just come up with a new feature that you're really eager to implement for your project. So, the moment
+you think of how to do it, you start working on it. Then, after a good bit of work, you're finally done, entirely. You
+now make a commit, with all of the changes.
+
+However, you now realize that as you pushed your commit to your repo, the automated CI workflows started to fail on some
+unit-tests. Turns out you didn't think of some edge-case, and some part of your solution is suddenly affecting something
+completely unrelated. As you attempt to fix it, more and more other issues arise, and you don't really even know where
+to start. You have this big single diff for the entire feature, but you have no idea where in it the bug is.
+
+Figuring it out takes at best a lot of mental effort, analyzing and keeping track of all of the changes at once, or at
+worst, you'll spend a lot of time doing this, but you'll just keep getting lost in your own code, until you finally just
+give up, and start over, this time only making small changes at a time, and running the unit-tests for each one as you
+go.
+
+#### Same scenario, but with atomic commits
+
+Now, let's consider the same scenario, but this time, you're following the best git principles, and so you're splitting
+the problem up and making atomic commits for each of the necessary changes that will together make up the feature.
+
+Once you're done, you decide to push all of those commits, and see the CI fail. However, this time, you have a much
+easier time finding where that pesky bug hides. Why? Because this time, you can just check out one of those commits you
+divided your bigger task into, and run the tests there. If they fail, you can run the tests in the commit before that.
+You can just repeat this until you find the exact commit that caused these failures.
+
+At this point, you know exactly which change caused this. Because the commit you discovered was pretty small (it only
+changed a few dozen lines and introduced a very specific behavior), after looking at it for a while, you find that
+there's indeed a completely unexpected fault, which you only found because you knew exactly where to look.
+
+#### Git bisect
+
+This scenario is actually very common and can come up a lot while developing. Because of that, git actually has an
+amazing tool that can make this process even easier! This tool is called `git bisect`.
+
+Essentially, you can give git bisect a specific start commit, where you know everything worked as it should've, and an
+end commit, where you know the fault exists somewhere. Git will automatically check out the commits in between in the
+most optimal way (binary search), and all you have to do is then check whether the issue exists in the checked out
+commit, or not. If it does, you tell bisect that this commit is still faulty, or if not, you say it's good.
+
+Since bisect is essentially a binary search, it won't take too many attempts to figure out exactly which commit is the
+faulty one, essentially automating the process above. Better yet, if the bug can be detected by simply running some
+script/command (perhaps the unit test suite), you can actually just specify that command when using git bisect, and
+it'll do all of the work for you, running that command on each of those checkouts, and depending on its exit code,
+marking the commit as good if the command passed, or as faulty if not.
+
+So, even if the test suite takes a while, you can actually just have git find the bug for you, while you take a break
+and make a nice cup of coffee.
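+
+Here's what that automated search can look like on a toy repository. The eight "Step" commits, the `value.txt` file
+and the `grep` check standing in for a test suite are all made up; the bug is planted in the fifth commit:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+# Eight small commits; the fifth one quietly breaks value.txt.
+echo "ok" > value.txt
+for i in 1 2 3 4 5 6 7 8; do
+    echo "step $i" >> history.txt
+    if [ "$i" -eq 5 ]; then
+        echo "broken" > value.txt
+    fi
+    git add .
+    git commit -q -m "Step $i"
+done
+
+# Start with a known-bad commit (HEAD) and a known-good one (the first commit).
+good=$(git rev-list --max-parents=0 HEAD)
+git bisect start HEAD "$good" >/dev/null
+
+# The "test": exit code 0 marks a commit as good, grep's failure code 1 as bad.
+git bisect run grep -q '^ok$' value.txt >/dev/null
+
+# While the bisect is still active, refs/bisect/bad points at the first bad commit.
+culprit=$(git log -1 --format=%s refs/bisect/bad)
+git bisect reset >/dev/null 2>&1
+echo "first bad commit: $culprit"
+```
+
+With eight commits, bisect only needs a couple of checkouts to land on "Step 5", and the same command keeps working
+when the history is hundreds of commits long.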
+
+### Git blame
+
+Git blame is a tool that allows you to look at a file, and see exactly which lines were committed by whom, and in which
+commit. This can be very useful if you just want to check what that line was added there for. If it's a part of a larger
+spanning commit, you can then check the diff of that commit, to see why that line was relevant, with the context of the
+rest of the changes done.
+
+Having good commit history and using atomic commits makes doing this a great and easy experience, as you're not very
+likely to find that commit to be addressing 10 different issues at once, without providing any real description in the
+commit message, as to why, and perhaps not even as to what it's doing. With commits like those, git blame becomes almost
+useless, but if you do follow these best practices, it can be a great tool for understanding why anything in the code is
+where it is, without needing to check the documentation, if there even is any.
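+
+As a small illustration, this is roughly how you'd go from a suspicious line to the reason it exists. The
+`config.ini` file and both commits are hypothetical:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+printf 'timeout = 30\nretries = 1\n' > config.ini
+git add config.ini
+git commit -q -m "Add default config"
+
+# Each -m becomes its own paragraph: the first is the subject, the second the body.
+printf 'timeout = 30\nretries = 3\n' > config.ini
+git commit -q -a -m "Retry uploads three times" \
+    -m "One retry was not enough on flaky mobile connections."
+
+# Which commit last touched line 2? With --porcelain, the first field of the
+# first output line is the full commit hash.
+sha=$(git blame -L 2,2 --porcelain config.ini | head -n 1 | cut -d ' ' -f 1)
+
+# Show that commit's message without its diff.
+git show -s --format='%s%n%n%b' "$sha"
+```
+
+The blame output alone only says which commit changed `retries`; it's the commit body that says why.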
+
+### Cherry picking
+
+Cherry picking is the process of taking a commit (or multiple commits), and carrying them over (essentially
+copying/transferring them) to another branch, or just another point. So for example, you might have a feature branch, in
+which you fixed a bug that also affects the current release. Instead of checking out the release branch, and re-doing
+the changes there, you can actually use cherry-picking to carry the commit from the feature branch into the release
+branch. This will mean any changes made in that commit will be applied, fixing the bug in release branch and allowing
+you to make a release.
+
+However, if the commit that fixed this issue wasn't atomic, and it also contained fixes for tons of other things, or
+worse, included logic for additional features, you can't just carry it over like this, as you'd be introducing other
+things into the release branch which aren't supposed to be there (yet). So instead, you'd have to make the changes in
+the branch yourself, and create another commit, which is simply slower.
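+
+The happy path, where the fix is its own atomic commit, can be sketched like this. The `release` and `feature` branch
+names, the files and the messages are assumptions made for the demo:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+echo "v1" > app.txt
+git add app.txt
+git commit -q -m "Initial release"
+git branch release
+
+# On the feature branch: unfinished feature work, plus a separate bugfix commit.
+git checkout -q -b feature
+echo "wip" > feature.txt
+git add feature.txt
+git commit -q -m "Start feature"
+echo "v1 fixed" > app.txt
+git commit -q -a -m "Fix crash on startup"
+fix=$(git rev-parse HEAD)
+
+# Carry only the bugfix over to the release branch.
+git checkout -q release
+git cherry-pick "$fix" >/dev/null
+git log --oneline
+```
+
+The release branch now has the fix in `app.txt`, while the half-finished `feature.txt` never shows up there.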
+
+### Pull request reviews
+
+When someone else is reviewing your pull request, having clean commits can be incredibly helpful to the reviewer, as
+they can go through the individual commits instead of reviewing all of the changes at once by looking at the full diff
+compared to the branch you're merging to. This alone can greatly reduce the mental overhead of having to keep track of
+all of the added/changed code, and knowing how it interacts with the rest of the changes.
+
+Atomic commits then allow for the reviewer to understand each and every atomic change you made, one by one, which is
+much easier to grasp. So even if when put together, the code is pretty complex, in these atomic chunks, it's actually
+pretty easy to see what's going on, and why. This is especially the case if these commits include great descriptions of
+what it is they're addressing exactly.
+
+This doesn't just apply to pull requests. This kind of workflow can actually be useful to anyone looking over some
+code in a file. You could use git blame to find the commit, and follow the parent commits up, allowing you to see
+the individual changes as they were made one by one, which again is easier to understand, and allows you to grasp what
+the whole file is about much quicker.
+
+### Easy reverts
+
+Sometimes, we might realize that a change that we made a while ago should not actually have been made, but the change
+was already pushed and there are a lot of commits after it. That means at this point, we can't simply rewrite the
+history, and we will need to push a commit that undoes that change.
+
+The great advantage of atomic commits is that they should include the entire change, along with documentation it
+introduces, tests, etc. in a single piece, a single commit. Because of that, assuming there weren't any commits that
+built upon this change later on, we can use git's amazing `git revert` command.
+
+This will create a new commit that undoes everything another specified commit did, making it very easy to revert some
+specific change, while leaving everything else alone. This is much faster and easier than having to look at what the
+original commit changed line by line, and change it back ourselves, and while this isn't something you'll use all that
+often, when you do get a chance to use it, it's really nice and can be a good time saver.
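+
+A minimal sketch of such a revert, assuming a hypothetical "cache layer" commit that brought its own docs along and
+that nothing later depends on:
+
+```shell
+set -eu
+repo=$(mktemp -d)
+trap 'rm -rf "$repo"' EXIT
+cd "$repo"
+git init -q
+git config user.name "Example Author"
+git config user.email "author@example.com"
+
+echo "base" > app.txt
+git add app.txt
+git commit -q -m "Add app"
+
+# The atomic commit we'll regret later: the code and its docs together.
+echo "experimental" > cache.txt
+echo "cache docs" > cache.md
+git add cache.txt cache.md
+git commit -q -m "Add cache layer"
+bad=$(git rev-parse HEAD)
+
+# Unrelated work lands on top of it.
+echo "more" >> app.txt
+git commit -q -a -m "Extend app"
+
+# Undo exactly that one commit with a new commit; --no-edit keeps the
+# generated message instead of opening an editor.
+git revert --no-edit "$bad" >/dev/null
+git log --oneline
+```
+
+Both cache files disappear in one go, while the later change to `app.txt` stays untouched and the history keeps a
+record of what was undone.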
+
+## Conclusion
+
+Git is something programmers use every day, so learning how to use it properly is invaluable. There are a lot of rules I
+mentioned here, and of course, you probably won't be able to just start doing all of them at once. But I would encourage
+you to at least stop for a while before every commit you're about to make, and think of whether you really need to stage
+all of the files, or if you should do a partial add, and make multiple commits instead, and also take a while to think
+of a good commit message.
+
+For motivation, here's a quick recap of the most important benefits a good git workflow gives you:
+
+- Your development workflow becomes easier by allowing you to find issues a lot quicker
+- You can also help your team or whoever ends up reading your commits understand what's going on and bring them up to date with the project
+- You will be able to quickly find out who committed something and why
+- Your overall programming skills will improve, because you'll get used to dividing up your problems naturally