mirror of
https://github.com/ItsDrike/itsdrike.com.git
synced 2025-01-23 12:04:35 +00:00
Add btrfs post
This commit is contained in:
parent
863b170334
commit
903123081b
204
content/posts/btrfs.md
---
title: Jumping on the BTRFS hype wagon
date: 2024-01-27
tags: [linux]
sources:
- <https://wiki.archlinux.org/title/btrfs>
- <https://docs.kernel.org/filesystems/btrfs.html>
- <https://en.wikipedia.org/wiki/Btrfs>
- <https://www.thegeekdiary.com/features-of-the-btrfs-filesystem/>
---

After a long time of constantly hearing about the BTRFS filesystem, I decided to make the jump, leaving EXT4 behind.
And I have to say, I couldn't be happier.

For those unaware, BTRFS is a B-tree based filesystem, which you can use as an alternative to EXT4, with some really
cool new features, which I'll go over in this post. In many ways, it is similar to ZFS, but it is meant for personal
use, rather than being enterprise focused, and unlike with ZFS, there aren't any licensing controversies accompanying
it.

## Subvolumes

The first thing to mention (and for me probably the most important thing) about BTRFS is its ability to create
subvolumes.

Subvolumes are sort of like partitions. However, a partition is defined on the device level, in a partition table with
concrete start and end sectors, so it occupies a very specific, well, "partition" of the drive. Subvolumes are instead
a feature of the filesystem, allowing you to split it up into individual portions that all live on a single BTRFS
partition.

### Dynamic space

This is really cool, because these subvolumes act almost like plain folders within the BTRFS root, which means they
don't need to have a size specified. They all simply share the single partition, and every subvolume sees the same
amount of free space as is available on the entire partition.

You can therefore really easily create separate subvolumes for your home (`/home`) and root (`/`), mount them
individually, and pretty much treat them like regular partitions, but without having to decide how to divide your disk
space between them during installation.

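Here's a minimal sketch of what "mounting them individually" can look like. It builds `/etc/fstab` lines that mount
subvolumes of one BTRFS partition through the `subvol=` mount option. The subvolume names (`@`, `@home`), the
placeholder UUID and the `fstab_line` helper are all assumptions made up for illustration.

```python
def fstab_line(fs_uuid, subvol, mountpoint, extra_opts=()):
    """Build one /etc/fstab line mounting a BTRFS subvolume (hypothetical helper)."""
    options = ",".join((f"subvol={subvol}", *extra_opts))
    # Fields: device, mount point, filesystem type, options, dump, fsck pass.
    return f"UUID={fs_uuid} {mountpoint} btrfs {options} 0 0"


# Placeholder UUID and subvolume names, assumed for illustration only.
FS_UUID = "00000000-0000-0000-0000-000000000000"
LAYOUT = {"@": "/", "@home": "/home"}

for subvol, mountpoint in LAYOUT.items():
    # Both lines reference the same partition; only the subvolume differs.
    print(fstab_line(FS_UUID, subvol, mountpoint, ("noatime",)))
```

Note that no sizes appear anywhere: both entries point at the same partition and share its free space.
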
For me, this is a HUGE benefit, because I'm a major proponent of the split architecture, where your home and root (or
at least root and some kind of persistent data partition) are separate. This is because it allows you to wipe only one
of them when reinstalling, without the need to copy over potentially hundreds of gigabytes of data from a backup.

With EXT4, it was always annoying to have to decide how much space to allocate to each partition. I knew I could use
any extra space for my data, but if I ever ran out of space on my root partition, it would pretty much mean having to
reinstall and re-partition, as there's no easy way to grow a partition into space that's already taken by the one right
after it.

This setup finally changed that, and I can keep my data separate and persistent across installations, without having to
compromise on space for my root partition.

{{< notice tip >}}
While subvolumes share space dynamically by default, it is possible to cap the maximum size a subvolume can reach if
you need to (BTRFS does this through quota groups). This can be useful to prevent something from growing unpredictably
in size. The limit can easily be raised later if needed, making it far superior to regular partitions, which would need
to be moved or recreated.
{{< /notice >}}

### Automatic compression

Another great feature of BTRFS is transparent compression, which you can tune differently for different parts of the
filesystem. BTRFS lets you pick from several compression algorithms (`zlib`, `lzo` and `zstd`), but the most common
one, which you'll probably want to use too, is `zstd`. You can then pick a compression level (1-15 for `zstd`), which
controls how aggressive the compression will be. Keep in mind that the `compress` mount option currently applies to the
whole filesystem (the options of the first mounted subvolume win), so per-subvolume settings go through the
`compression` property set with `btrfs property set` rather than through separate mount options.

This is really nice, because you can set up a separate subvolume for your cache or static data files and compress it
heavily, which costs some CPU time and slows down reads and writes of the data stored there, but greatly reduces the
space it takes up on disk. Meanwhile your root subvolume, which gets written to a lot, can stay at a low compression
level (or have compression disabled entirely), and will therefore be much quicker.

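To get a feel for the level trade-off, here's a small sketch that compresses the same data at a few levels and reports
size and time. It uses Python's built-in `zlib` purely as a stand-in (BTRFS does its compression in the kernel, and the
numbers for `zstd` will differ), and the payload is made-up, highly repetitive data.

```python
import time
import zlib


def compress_stats(data, level):
    """Return (compressed size in bytes, seconds taken) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: compression has to be lossless.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


# Made-up, highly repetitive payload standing in for compressible cache data.
payload = b"pacman package cache entry\n" * 50_000

for level in (1, 6, 9):
    size, seconds = compress_stats(payload, level)
    print(f"level {level}: {len(payload)} -> {size} bytes in {seconds:.4f}s")
```

On real data, higher levels generally trade more CPU time for a smaller result, which is exactly the knob you'd turn
differently for a cache than for a busy root.
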
Because cache files usually aren't accessed that often and take up a lot of space (for example the pacman package
cache, which keeps the older versions of packages you installed, to allow easy reverting), I find it really worth it to
have a highly compressed subvolume mounted on `/var/cache`. Additionally, I also use a pretty high compression level on
my data subvolume, though since it does contain videos, I don't use the highest level there, so that I can watch them
seamlessly without disk buffering.

## Snapshots

Another really cool feature that BTRFS has is the ability to take instant snapshots of a subvolume, which make for
great backups. This is possible because the created snapshot is essentially just a link pointing to the current state
of the subvolume it targets, so the only thing that happens on the filesystem side is basically the creation of that
link. There's no copying done anywhere!

### Technical explanation

You can basically think of these as hard-links pointing to the subvolume itself. BTRFS is a **copy-on-write**
filesystem: rather than modifying the affected blocks in place (a single file can take up a lot of physical blocks),
which is what EXT4 would do, it writes the new data for that file into new blocks and then updates the file metadata,
telling it to use these new blocks from now on.

This is really nice, because when writing to the disk, you're gonna be writing a whole block at a time anyway, so
instead of overwriting the existing old block, BTRFS uses a new one, leaving the old one behind. So if a file is
composed of a lot of blocks, only the blocks that actually changed get copied, and BTRFS records that this part of the
file now lives in some other physical blocks.

(Not updating the original location also eliminates the risk of a partial update or data corruption during a power
failure.)

That means you can create hard-links that point to the original blocks, rather than the original files, and since BTRFS
won't change those original blocks, but will instead write the changes to new blocks, these hard-links remain
unaffected by any new changes.

In an EXT4 filesystem, if you have a hard-link, it is always linked to the inode representing the hard-linked file.
Whenever the file is updated, it's the blocks referenced by that inode that get updated, meaning the change shows up
both in the original (system) file and in the hard-link pointing to it.

This kind of hard-link behavior is also possible on BTRFS, however you can also link in a way that doesn't update, and
instead just holds on to the original blocks. As the real system keeps changing, it's pretty much only the deltas/diffs
that get stored, so the backup only takes up as much space as the changes made since it was taken.

Once a snapshot is deleted, the old blocks that aren't used by the primary volume anymore are allowed to be
overwritten, hence gaining that space back.

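The idea above can be sketched with a toy model. This is not how BTRFS is implemented internally (there are no B-trees
or extents here); the `CowVolume` class and all its names are invented for illustration. It only shows the principle: a
snapshot copies metadata, and a write allocates a new block instead of touching the old one.

```python
class CowVolume:
    """Toy copy-on-write volume: files map to lists of block ids in a shared pool."""

    def __init__(self, blocks):
        self.blocks = blocks  # shared pool: block id -> data
        self.files = {}       # metadata: file name -> list of block ids

    def _new_block(self, data):
        block_id = max(self.blocks, default=-1) + 1
        self.blocks[block_id] = data
        return block_id

    def write_file(self, name, chunks):
        self.files[name] = [self._new_block(chunk) for chunk in chunks]

    def update_chunk(self, name, index, data):
        # Copy-on-write: never overwrite the old block. Allocate a new one
        # and repoint only this volume's metadata at it.
        block_ids = list(self.files[name])
        block_ids[index] = self._new_block(data)
        self.files[name] = block_ids

    def snapshot(self):
        # A snapshot copies only the metadata (block id lists), never the data.
        snap = CowVolume(self.blocks)
        snap.files = {name: list(ids) for name, ids in self.files.items()}
        return snap

    def read_file(self, name):
        return b"".join(self.blocks[i] for i in self.files[name])


pool = {}
live = CowVolume(pool)
live.write_file("notes.txt", [b"AAAA", b"BBBB", b"CCCC"])
snap = live.snapshot()
live.update_chunk("notes.txt", 1, b"XXXX")

print(live.read_file("notes.txt"))  # the live volume sees the change
print(snap.read_file("notes.txt"))  # the snapshot still sees the old data
print(len(pool))                    # only one extra block was allocated
```

After the update, the pool holds four blocks: the three original ones, still referenced by the snapshot, plus the one
new block for the changed part of the file.
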
### Backups at no cost

Because of the way BTRFS handles snapshots, we can make backups which are essentially just the size of a single link,
and are instant. They only get expensive as the original subvolume gets updated. This means it's really beneficial to
set up an auto-snapshot routine with auto-rotation, and to take a lot of snapshots. For example, this is mine:

- 8 hourly snapshots (taken using cron; once we reach the 9th snapshot, the oldest one is deleted)
- 4 quarter-hourly snapshots (taken using cron every 15 minutes, except on the full hour, as that's covered by hourly)
- 8 daily snapshots (taken by anacron, every day)
- 4 weekly snapshots (taken by anacron)
- 3 monthly snapshots (taken by anacron)

Notice just how many hourly and quarter-hourly snapshots I'm able to take. I literally make a snapshot of my system
every 15 minutes, and I don't even notice! It comes at no performance cost, with no high CPU usage from files being
copied over. It's all perfectly seamless.

To achieve this, I made a bash script that handles the snapshot taking and auto-rotation, which I'm just calling from
cron/anacron. If you're interested, you can find it in my
[dotfiles](https://github.com/ItsDrike/dotfiles/blob/main/root/usr/local/bin/btrfs-backup).

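The rotation logic itself is simple enough to sketch in a few lines. The snippet below is not the linked bash script,
just a hypothetical Python illustration of the same idea: take a new read-only snapshot, then delete the oldest ones
beyond the retention count. The timestamp naming scheme and the paths are assumptions, and the commands are only built,
not executed.

```python
from datetime import datetime


def plan_rotation(existing, keep):
    """Return the snapshot names to delete so that only the `keep` newest remain.

    Names are assumed to be sortable timestamps such as "2024-01-27_1200".
    """
    ordered = sorted(existing)  # oldest first
    excess = len(ordered) - keep
    return ordered[:excess] if excess > 0 else []


def snapshot_commands(source, dest_dir, existing, keep, now):
    """Build (but do not run) the btrfs commands for one snapshot-and-rotate step."""
    new_name = now.strftime("%Y-%m-%d_%H%M")
    commands = [["btrfs", "subvolume", "snapshot", "-r", source, f"{dest_dir}/{new_name}"]]
    for old in plan_rotation(existing + [new_name], keep):
        commands.append(["btrfs", "subvolume", "delete", f"{dest_dir}/{old}"])
    return commands


# Hypothetical paths and existing snapshots, for illustration only.
for command in snapshot_commands(
    "/", "/.snapshots/hourly", ["2024-01-27_1000", "2024-01-27_1100"], keep=2, now=datetime(2024, 1, 27, 12, 0)
):
    print(" ".join(command))
```

Running the built commands (for example through `subprocess.run`) would need root privileges, which is why this sketch
stops at printing them.
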
The only backups that I do see some actual space cost from are the monthly ones, which do get out of sync eventually,
and so a lot of files are indeed different there.

### Stupid simple restore

Snapshots of subvolumes are themselves just subvolumes, so if you need to restore a snapshot, all you need to do is
change which subvolume gets used as your main one, switching it to that backup, and done, you've restored from a
snapshot.

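One common way to do that switch is to rename the current subvolume out of the way and create a writable snapshot of
the backup under the original name. The sketch below only builds those commands. It assumes the top-level BTRFS volume
is mounted at `/mnt`, and the subvolume and snapshot names are hypothetical.

```python
def restore_plan(top_level, live, snapshot):
    """Commands (not executed here) that swap a live subvolume for a snapshot."""
    return [
        # Keep the current state around under a new name instead of deleting it.
        ["mv", f"{top_level}/{live}", f"{top_level}/{live}.old"],
        # A writable snapshot of the backup takes over the original name.
        ["btrfs", "subvolume", "snapshot", f"{top_level}/{snapshot}", f"{top_level}/{live}"],
    ]


# Hypothetical names: "@" as the live root, one hourly snapshot as the source.
for command in restore_plan("/mnt", "@", "snapshots/hourly/2024-01-27_1200"):
    print(" ".join(command))
```

Since mounts refer to the subvolume by name here, the restored state would typically take effect on the next boot or
remount.
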
This is super cool, because you can for example take some snapshots during your installation, and if you want to
reinstall, you can just revert back to those.

Another really useful thing this allows is taking a snapshot before installing some big app that you really only need
to use once, and then reverting back, making sure that no residual files from that app are left behind. Having
quarter-hourly snapshots is especially useful here, since you may install something you think you'll be using for a
long time, only to realize it's actually not all that good.

{{< notice warning >}}

With all this great talk about snapshots, you may think that once set up, you'll never need to do those tedious full
system backups ever again. Well, that's not true. While snapshots really are amazing, remember that they still all live
on a single partition on a single drive. If this drive were to fail, all of your data on it, including the snapshots,
might get corrupted.

For that reason, it is very important that you don't just blindly replace your full backup strategy with snapshots.

{{< /notice >}}

## Multiple system versions

Another amazing thing you can set up is automatically created boot entries for every snapshot, allowing you to boot
into an older version of your system completely seamlessly.

All you need to do to make this work is change the kernel arguments to define a different subvolume as your root,
specifically the subvolume containing the snapshot you want to boot into.

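As a rough sketch, the kernel arguments in question are `root=` plus `rootflags=subvol=...`. The helper below
generates that part of the command line for a list of snapshots. The UUID and subvolume paths are placeholders, and how
the result gets written into an actual boot entry depends on your bootloader, which is left out here.

```python
def kernel_options(root_uuid, subvol):
    """Kernel arguments selecting which BTRFS subvolume is mounted as root."""
    return f"root=UUID={root_uuid} rootflags=subvol={subvol}"


# Placeholder UUID and hypothetical snapshot subvolume paths.
ROOT_UUID = "00000000-0000-0000-0000-000000000000"
SNAPSHOTS = ["snapshots/daily/2024-01-26", "snapshots/daily/2024-01-27"]

for subvol in SNAPSHOTS:
    # One boot entry per snapshot would carry one of these option strings.
    print(kernel_options(ROOT_UUID, subvol))
```
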
Not only is this really useful for getting into an older system version by booting into snapshots, you can actually
use non-snapshot subvolumes too. That means you could easily keep multiple distributions on a single BTRFS partition
(with the free space shared dynamically between all of them).

## Built-in RAID

BTRFS also has built-in support for the RAID-0, RAID-1 and RAID-10 levels.

The mirrored levels ensure that every block exists in multiple copies. With RAID-1, for example, BTRFS just stores two
copies of everything on two different devices.

Unlike a simple `mdadm` software RAID, BTRFS supports self-healing redundant arrays and online balancing. BTRFS
maintains CRCs for all metadata and data, so everything is checksummed to protect the integrity of data against
corruption. With a RAID-1 or RAID-10 configuration, if the checksum fails on the first read, the data is pulled from
another copy.

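Here's a toy model of that self-healing read, just to show the principle. The `MirroredBlock` class is invented for
illustration, and it uses `zlib.crc32` as a stand-in checksum, which is an assumption: it is not the exact checksum
algorithm BTRFS uses on disk.

```python
import zlib


class MirroredBlock:
    """Toy RAID-1 block: two copies of the data plus a separately stored checksum."""

    def __init__(self, data):
        self.copies = [bytearray(data), bytearray(data)]
        self.checksum = zlib.crc32(data)

    def read(self):
        for copy in self.copies:
            if zlib.crc32(bytes(copy)) == self.checksum:
                good = bytes(copy)
                # Self-healing: rewrite every copy from the verified one.
                self.copies = [bytearray(good), bytearray(good)]
                return good
        raise OSError("all copies failed checksum verification")


block = MirroredBlock(b"important data")
block.copies[0][0] ^= 0xFF  # simulate silent corruption on the first device

print(block.read())             # served from the second, intact copy
print(bytes(block.copies[0]))   # the corrupted copy has been repaired
```

Without the checksum, a plain mirror would have no way of knowing which of the two differing copies is the good one.
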
## Great SSD performance

Another benefit of BTRFS is its automatic detection of solid state drives (SSDs). If an SSD is detected, BTRFS turns
off all optimizations for rotational media (i.e. optimizations that reduce seeking by storing related data close
together on spinning drives, which isn't important with SSDs). Alongside that, there's also TRIM support, which tells
the SSD which blocks are no longer needed and are available to be written over.

These improve reading/writing speed, since the extra work that's only useful for spinning drives is skipped, and TRIM
support can also extend your SSD's lifespan.

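The detection relies on a flag the kernel exposes for every block device in
`/sys/block/<device>/queue/rotational` (`1` for spinning disks, `0` otherwise). Here's a small sketch that reads the
same flag. To stay runnable anywhere, the demo builds a fake sysfs tree in a temporary directory; the device names and
the `make_fake_sysfs` helper are made up for illustration.

```python
import tempfile
from pathlib import Path


def is_rotational(device, sysfs_root="/sys/block"):
    """True if the kernel reports `device` (e.g. "sda") as a spinning disk."""
    flag = Path(sysfs_root, device, "queue", "rotational").read_text().strip()
    return flag == "1"


def make_fake_sysfs(root, devices):
    """Create a stand-in for /sys/block with the given rotational flags (demo only)."""
    for name, flag in devices.items():
        queue_dir = Path(root, name, "queue")
        queue_dir.mkdir(parents=True)
        (queue_dir / "rotational").write_text(f"{flag}\n")


with tempfile.TemporaryDirectory() as fake_root:
    make_fake_sysfs(fake_root, {"sda": 1, "nvme0n1": 0})
    for dev in ("sda", "nvme0n1"):
        kind = "spinning disk" if is_rotational(dev, fake_root) else "SSD"
        print(f"{dev}: {kind}")
```

On a real Linux system you'd call `is_rotational("sda")` with the default root to check an actual drive.
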
## Efficient storage for small files

All Linux filesystems address storage in blocks. These blocks have some pre-defined size, say 4 KiB. That means
storing a file that's smaller than this size results in the block not being fully utilized. Using a smaller block size
isn't a good option either, because it means having to store more metadata (there are more blocks to keep track of).

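To put numbers on that, here's a quick sketch of the arithmetic for a filesystem that stores every file in whole
blocks. The 4 KiB block size matches the example above, and the file sizes are arbitrary.

```python
def allocated_bytes(file_size, block_size=4096):
    """Space a file occupies when it has to be stored in whole blocks."""
    blocks_needed = (file_size + block_size - 1) // block_size  # round up
    return blocks_needed * block_size


def slack(file_size, block_size=4096):
    """Bytes allocated to the file but not holding any of its data."""
    return allocated_bytes(file_size, block_size) - file_size


for size in (100, 4096, 4097):
    print(f"{size} byte file: {allocated_bytes(size)} bytes allocated, {slack(size)} wasted")
```

A 100 byte file wastes 3996 of its 4096 allocated bytes, which adds up quickly when there are thousands of tiny files.
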
On BTRFS, for very small files, the data will actually be stored in the metadata, without taking up any of the data
blocks!