diff --git a/content/posts/great-commits.md b/content/posts/great-commits.md new file mode 100644 index 0000000..82c8a20 --- /dev/null +++ b/content/posts/great-commits.md @@ -0,0 +1,417 @@ +--- +title: Making great commits +date: 2023-04-17 +tags: [programming, git] +sources: + - + - + - + - + - +--- + +A well-structured git log is key to project's maintainability; it provides insight into when and why things were done, +for future maintainers of the project, ... and yet, so many people pay very little attention to how their commits are +structured. + +The problem isn't necessarily that they don't even attempt to write good commit messages, it's that the commit they +made is not actually easy to compose a commit message for. + +Another, perhaps even bigger issue is that a lot of people don't even know that there's a reason to care about their +git history, because they simply don't see a benefit in it. The problem with this argument is that these people have +simple never explored git enough, and therefore aren't even familiar with the benefits they could gain. + +So then, in this post, I'll try to explain both what are the benefits that you can get, and how to make your commits +clean and easy to read and find in git history later on. + +## Commit message + +The purpose of every commit is always to simply represent some change that was made in the source code. + +The commit message should then describe this change, however what many people get wrong is that they just state +**what** was changed, without explaining **why** it was changed. There is always a reason for why a change is made, and +while the contents of the commit (being the actual changes made in the code - diff) can tell you what was done, the +only way to figure out why it was done, is through the commit message. + +Therefore, when thinking of a good commit message, you should always ask yourself not just "What does this commit +change?", but also, and perhaps more importantly, ask "Why is this change necessary?" and "What does this change +achieve?". + +Knowing why something was added can then be incredibly beneficial for someone looking at `git blame`, which allows you +to find out the commit that was responsible for adding/modifying any particular line. In vast majority of cases, when +you look at git blame, you're not interested in what that single line of code is doing, but rather why it's even there. + +Without having this information in the commit itself, you'd likely have to go look for the actual pull request that +added that commit, and read it's description, which might not even contain that reason anyway. + +### Commit isn't just the first line + +A huge amount of people are used to committing changes with a simple `git commit -m "My message"`, and while this is +enough and it's perfectly in many cases, sometimes you just need more space to describe what a change truly achieves. + +Surprisingly, many people don't even know that they can make a commit that has more in it's message than just the +title/first line, which then leads to poorly documented changes, because single line sometimes simply isn't enough. To +create a commit with a bigger commit message, you can simply run `git commit` without the `-m` argument. This should +open your terminal text editor, allowing you to write out the message in multiple lines. + +{{< notice tip >}} +I'd actually recommend making the simple `git commit` the default way you make new commits, since it invites you to +write more about it, by just seeing that you have that space available. We usually don't even know what exactly we'll +write in our new commit message before getting to typing it out, and knowing you have that extra space if you need it +will naturally lead to using it, even if you didn't know you needed it ahead of time. +{{< /notice >}} + +That said, not every commit requires both a subject and a body, sometimes a single line is fine, especially when the +change is so simple that no further context is necessary, and including some would just waste the readers time. For +example: + +```markdown +Fix typo in README +``` + +In this case, there's no need for anything extra. Some people like to include what the typo was, but if you want to know +that, you can use `git show` or `git diff`, or `git log --patch`, showing you the actual changes made to the code, so +this information isn't necessary either. So, while in some cases, having extra context can be very valuable, you also +shouldn't overdo it. + +### Make commits searchable + +It can be very beneficial to include some keywords that people could then easily find this commit by, when searching +for changes in the codebase. As an example, you can include the name of an exception, such as `InvalidDataStreamError`, +if your commit addresses a bug that causes this exception. + +You can then add an explanation on why this error was getting raised, and why your change fixed that. With that, anyone +who found your commit by searching for this exception can immediately find out what this exception is, why was it +getting raised and what to do to fix it. + +This is especially useful with internal API, whether it's custom exceptions, or just functions or names of classes. +People don't search the commit history very often, but if you do encounter a case where you think someone might perform +a search for at some point, it's worth it to make it as easy for them as you can. + +### Make it exciting to read + +I sometimes find myself going through random commit messages of a project, just to see what is the development like, +and explore what are the kinds of changes being introduced. Even more often, I look there to quickly see what was +changed, to bring myself up to date with the project. + +When doing this, I'm always super thankful to people who took the time to for example include the debug process of how +they figured out X was an issue, or where they explain some strange behavior that you might not expect to be happening. + +These kinds of commits make the history a fun place to go and read, and it allows you to teach someone something about +the language, the project, or programming in general, making everyone in your team a bit smarter! + +### Follow the proper message structure + +Git commits should be written in a very specific way. There's a few rules to follow: + +- **Separate the subject/title from body with a blank line** (Especially useful when looking at `git log --oneline`, + as without the blank line, lines below are considered as parts of the same paragraph, and shown together) +- **Limit the subject line to 50 characters** (Not a hard limit, but try not going that much longer. This limit ensures + readability, and forces the author to think about the most concise way to explain what's going on. Note: If you're + having trouble summarizing, you might be committing too much at once) +- **Capitalize the subject line** +- **Don't end the subject line with a period** +- \*Use imperative mood in subject\*\* (Imperative mood means "written as if giving a command/instruction" i.e.: "Add + support for X", not "I added support for X" or "Support for X was added", as a rule of thumb, a subject message + should be able to complete the sentence: "If implemented, this commit will ...") +- **Wrap body at 72 characters** (We usually use `git log` to print out the commits into the terminal, but it's output + isn't wrapped, and going over the terminals width can cause a pretty messy output. The recommended maximum width for + terminal text output is 80 characters, but git tools can often add indents, so 72 characters is a pretty sensible + maximum) +- **Mention the "what" and the "why", but not the "how"** (A commit message shouldn't contain implementation details, + if people want to see those, whey should look at the changed code diff directly) + +If you want to, you can consider using markdown in your commit message, as most other programmers will understand it as +it's a commonly used format, and it's a great way to bring in some more style, improving readability. In fact, if you +view the commit from a site like GitHub, it will even render the markdown properly for you. + +For example: + +```markdown +Summarize changes in around 50 characters or less + +More detailed explanatory text, if necessary. Wrap it to about 72 +characters or so. In some contexts, the first line is treated as the +subject of the commit and the rest of the text as the body. The +blank line separating the summary from the body is critical (unless +you omit the body entirely); various tools like `log`, `shortlog` +and `rebase` can get confused if you run the two together. + +Explain the problem that this commit is solving. Focus on why you +are making this change as opposed to how (the code explains that). +Are there side effects or other unintuitive consequences of this +change? Here's the place to explain them. + +Further paragraphs come after blank lines. + +- Bullet points are okay, too + +- Typically a hyphen or asterisk is used for the bullet, preceded + by a single space, with blank lines in between, but conventions + vary here + +If you use an issue tracker, put references to them at the bottom, +like this: + +Resolves: #123 +See also: #456, #789 +``` + +## Make "atomic" commits + +_Atomic: of or forming a single irreducible unit or component in a larger system._ + +The term "atomic commit" means that the commit is only representing a single change, that can't be further reduced into +multiple commits, i.e. this commit only handles a single change. Ideally, it should be possible to sum up the changes +that a good commit makes in a single sentence. + +That said, the irreducibility should only apply to the change itself, obviously, making a commit for every line of code +wouldn't be very clean. Having a commit only change a small amount of code isn't what makes it atomic. While the commit +certainly can be small, it can just as well be a commit that's changing thousands of lines. (That said, you should have +some really good justification for it if you're actually making commits that big.) + +The important thing is that the commit is only responsible for addressing a single change. A counter-example would be +a commit that adds a new feature, but also fixes a bug you found while implementing this feature, and also improves the +formatting of some other function, that you encountered along the way. With atomic commits, all of these actions would +get their own standalone commits, as they're unrelated to each other, and describe several different changes. + +But making atomic commits aren't just about splitting thins up to only represent single changes, indeed, while they +should only represent the smallest possible change, it should also be a "complete" change. This means that a commit +responsible for changing how some function works in order to improve performance should ideally also update the +documentation, make the necessary adjustments to unit-tests so they still pass, and update all of the references to +this updated function to work properly after this change. + +So an atomic commit is a commit representing a single small (ideally an irreducible) change, that's fully implemented +and integrates well with the rest of the codebase. + +### Partial adds + +Many people tend to always simply use `git add -A` (or `git add .`), to stage all of the changes they made, and then +create a commit with it all. + +In an ideal world, where you only made the changes you needed to make for this single atomic commit, this would work +pretty well, and while sometimes this is the case, in most cases, you will likely have say fixed some bug you found +alongside, or a typo you noticed, etc. + +When that happens, you should know that you can instead make a partial add, and only stage the changes that belong into +the commit you're about to make. The simple case is when you have some unrelated changes, but they're all in different +files, and don't affect this commit. In that case, you can use `git add /path/to/file`, to only stage those files that +you need, leaving the unrelated ones alone. + +But this is rarely the case, instead, you usually have a single file, that now contains both a new feature, and some +unrelated quick bugfix. In that case, you can use the `-p`/`--patch` flag: `git add -p /path/to/file`. This will let you +interactively go over every "hunk" (a chunk of code, with changes close to each other), and decide on whether to accept +it (hence staging it), split it into more chunks, skip it, or even modify it in your editor, allowing you to remove the +intertwined code for the bugfix from the code for your feature that you're committing now. + +You can then make the feature commit, that only contains the changes related to it, and then create another commit, that +only contains the bugfix related changes. + +This git feature has slowly became one of my favorite tools, and I use it almost every time I need to commit something, +as it also allows me to quickly review the changes I'm making, before they make it into a commit, so it can certainly be +worth using, even if you know you want to commit the entire file. + +## Stop making fixing commits + +A very common occurrence I see in a ton of different projects is people making sequences of commits that go like: + +- Fix bug X +- Actually fix bug X +- Fix typo in variable name +- Sort imports +- Follow lint rules +- Run auto-formatter + +While people can obviously mess up sometimes, and just not get something right on the first try, a fixing commit like +this is actually not the only way to solve this happening. + +Instead of making a new commit, you can actually just amend the original. To do this, we can use the `git commit +--amned`, which will add your staged changes into the previous commit, even allowing you to change the message of that +old commit. + +Not only that, if you've already made another commit, but now found something that needs changing in the commit before +that, you can use interactive rebase with `git rebase -i HEAD~3`, allowing you to change the last 3 commits, or even +completely remove some of those commits. + +For more on history rewriting, I'd recommend checking the [official +documentation](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History) + +### Force pushing + +{{< notice warning >}} +Changing history is a great tool to clean up after yourself, it works best with local changes, i.e. with changes you +haven't yet pushed. + +Even though changing already pushed history is possible, it requires a "force push" (`git push --force`). These kinds of +pushes are something you need to be very careful about, as someone might have already pulled the changes, which you then +overwritten with your force push. Now, they might've done some work from the point at which they pulled, but then they +find out that this point is actually gone from the history, and they can't push their changes back. So now, they'll need +to undo their changes, pull the force pushed branch, and carry the work over, which can be very annoying. +{{< /notice >}} + +My recommendation to avoid force pushing is to reduce the amount of (regular) pushes you do completely. If your changes +are only local, rewriting history is easy, and won't break anyone else's workflow, but the moment you push, the changes +are public, and anyone might've pulled them already. + +This especially applies when you're pushing directly to master/main branch, or other shared branch which multiple people +are working with. If this is your personal branch (like a feature branch you're responsible for), force-pushing there is +generally ok, though you might still have people using your branch since they wanted to try out a feature early, or +review the changes from their editor. So even with personal branches, it's not always safe to force-push. + +My rule of thumb is to avoid pushing until the feature is fully complete, as that allows you to change anything during +the development. Perhaps some change you made no longer makes sense, because you realized you won't actually be using it +in the way you anticipated, or you found a bug with it later on. You can now simply rewrite your local history, and +rather than making a fixing commit, it'd be as if the bug was never there. + +Once you do finally decide to push, it's a good practice to run any auto-formatters and linters, and perhaps even +unit-tests. You can also take a quick peek at `git log`, to make sure you didn't make any typos. Then, only if all of +those local toolings passed should you actually push your version. + +{{< notice tip >}} +If you do need to force-push, try to at least do it as quickly as possible. The more time that has passed since your +normal push, the more likely it is that someone have already clonned/pulled those changes. If you force-push within just +a few seconds after pushing, it's not very likely that someone has pulled already, and so you won't break anyone's +version. +{{< /notice >}} + +## Benefits + +Alright, now that we've seen some of the best practices for making new commits, let's explore the benefits that we can +actually gain by following these. + +### A generally improved development workflow + +I can confidently say, that in my experience, learning to make good git commits made me a much better programmer +overall. That might sound surprising, but it's really true. + +The reason for this is that making good commits, that only tackle one issue at a time naturally helps you to think about +how to split your problem up into several smaller "atomic" problems, and make commits addressing that single part, after +which you move to another. This is actually one of very well known approaches to problem-solving, called "divide and +conquer" method, because you divide your problem into really small, trivially simple chunks, which you solve one by one. + +Learning and getting used to doing this just makes you better at problem solving in general, and while git commits +certainly aren't the only way to get yourself to think like this, it's honestly one of the simplest ones, and you become +good at git while at it! + +### Finding a tricky bug + +Imagine you've just came up with a new feature that you're really eager to implement for your project. So, the moment +you think of how to do it, you start working on it. Then, a good bit of work, you're finally done, entirely. You now +make a commit, with all of the changes. + +However, you now realize that as you pushed your commit to your repo, the automated CI workflows start to fail on some +unit-tests. Turns out you didn't think of some edge-case, some part of your solution is suddenly affecting something +completely unrelated. As you attempt to fix it, more and more other issues arise, and you don't really even know where +to start. You have this big single diff for the entire feature, but you have no idea where in that is the bug. + +Figuring it out takes at best a lot of mental effort, analyzing and keeping track with all of the changes at once, or at +worst, you'll spend a lot of time doing this, but you'll just keep getting lost in your own code, until you finally just +give up, and start over. This time, only doing small changes at a time, and running the unit-tests +for each one as you go. + +#### Same scenario, but with atomic commits + +Now, let's consider the same scenario, but this time, you're following the best git principles, and so you're splitting +the problem up and making atomic commits for each of necessary changes, that will together make up the feature. + +Once you're done, you decide to push all of those commits, and see the CI fail. However this time, you have a much +eaiser time finding where that pesky bug hides. Why? Because this time, you can just checkout one of those commits you +divided your bigger task into, and run the tests there. If it fails, you can run the tests in the commit before that. +You can just repeat this until you find the exact commit that caused these failures. + +At this point, you know exactly which change caused this, because the commit you discovered was pretty small, it only +changed a few dozen lines and introduced a very specific behavior, in which after looking at it for a while, you find +that there's indeed a completely unexpected fault, which you only found out because you knew exactly where to look. + +#### Git bisect + +This scenario is actually very common and can come up a lot while developing, because of that, git actually has an +amazing tool that can make this process even easier! This tool is called `git bisect`. + +Essentially, you can give git bisect a specific start commit, where you know everything worked as it should've, and an +end commit, where you know the fault exists somewhere. Git will automatically check out the commits in between in the +most optimal way (binary search), and all you have to do is then check whether the issue exists in the checked out +commit, or not. If it does, you tell bisect that this commit is still faulty, or if not, you say it's good. + +Since bisect is essentially a binary search, it won't take too many attempts to figure out exactly which commit is the +faulty one, essentially automating the process above. Better yet, if the task of finding the bug can be uncovered by +simply running some script/command (perhaps the unit tests suite), you can actually just specify that command when using +git bisect, and it'll do all of the work for you, running that command on each of those check outs, and depending on +it's exit code, if the command passed, marking the commit as good, or if not, marking it as faulty. + +So, even if the test suite takes a while, you can actually just have git find the bug for you, while you take a break +and make a nice cup of coffee. + +### Git blame + +Git blame is a tool that allows you to look at a file, and see exactly which lines were committed by who, and in which +commit. This can be very useful if you just want to check what that line was added there for. If it's a part of a larger +spanning commit, you can then check the diff of that commit, to see why that line was relevant, with the context of the +rest of the changes done. + +Having good commit history and using atomic commits makes doing this a great and easy experience, as you're not very +likely to find that commit to be addressing 10 different issues at once, without providing any real description in the +commit message, as to why, and perhaps not even as to what it's doing. With commits like those, git blame becomes almost +useless, but if you do follow these best practices, it can be a great tool for understanding why anything in the code is +where it is, without needing to check the documentation, if there even is any. + +### Cherry picking + +Cherry picking is the process of taking a commit (or multiple commits), and carrying them over (essentially +copying/transferring them) to another branch, or just another point. So for example, you might have a feature branch, in +which you fixed a bug that also affects the current release. Instead of checking out the release branch, and re-doing +the changes there, you can actually use cherry-picking to carry the commit from the feature branch into the release +branch. This will mean any changes made in that commit will be applied, fixing the bug in release branch and allowing +you to make a release. + +However, if the commit that fixed this issue wasn't atomic, and it also contained fixes for tons of other things, or +worse off, includes logic for additional features, you can't just carry it over like this, as you'd be introducing other +things into the release branch which aren't supposed to be there (yet). So instead, you'd have to make the changes in +the branch yourself, and create another commit, which is simply slower. + +### Pull request reviews + +When someone else is reviewing your pull request, having clean commits can be incredibly helpful to the reviewer, as +they can go through the individual commits instead of reviewing all of the changes at once by looking at the full diff +compared to the branch you're merging to. This alone can greatly reduce the mental overhead of having to keep track of +all of the added/changed code, and knowing how it interacts with the rest of the changes. + +Atomic commits then allow for the reviewer to understand each and every atomic change you made, one by one, which is +much easier to grasp. So even if when put together, the code is pretty complex, in these atomic chunks, it's actually +pretty easy to see what's going on, and why. This is especially the case if these commits include great descriptions of +what it is they're addressing exactly. + +This then doesn't just apply for pull-requests, this kind of workflow can actually be useful to anyone looking over some +code in a file. You could use git blame to find out the commit, and follow the parent commits up, allowing you to see +the individual changes as they were done one by one, which again, is then easier to understand, and allows you to then +realize what the whole file is about much quicker. + +### Easy reverts + +Sometimes, we might realize that a change that we made a while ago should not actually have been made, but the change +was already pushed and there's a lot of commits after it. That means at this point, we can't simply rewrite the history, +and we will need to push a commit that undoes that change. + +The great advantage of atomic commits is that they should include the entire change, along with documentation it +introduces, tests, etc. in a single piece, a single commit. Because of that, assuming there weren't any commits that +built upon this change later on, we can use git's amazing `git revert` command. + +This will create a new commit that undoes everything another specified commit did, making it very easy to revert some +specific change, while leaving everything else alone. This is much faster and easier than having to look at what the +original commit changed line by line, and change it back ourselves, and while this isn't something you'll use all that +often, when you do get a chance to use it, it's really nice and can be a good time saver. + +## Conclusion + +Git is something programmers use every day, learning how to do so properly is invaluable. There's a lot of rules I +mentioned here, and of course, you probably won't be able to just start doing all of them at once. But I would encourage +you to at least stop for a while before every commit you're about to make, and think of whether you really need to stage +all of the files, or if you should do a partial add, and make multiple commits instead, and also take a while to think +of a good commit message. + +For motivation, here's a quick recap of the most important benefits a good git workflow gives you: + +- Your development workflow becomes easier by allowing you to find issues a lot quicker +- You can also help your team or whoever ends up reading your commits understand what's going on and bring them up to date with the project +- You will be able to quickly find out who committed something and why +- Your overall programming skills will improve, because you'll get used to dividing up your problems naturally