I’ve been tinkering away at a backup system since 2022-02-15. At that time, my backup “system” was a mess. I had an external hard drive with a loose collection of backup files scattered over the past few years. I am not sure what is in any of the backups. How do I restore any of this? Who knows? I didn’t document anything when I backed this stuff up.
google-drive-backup.zip
lightbox-backup-2021-07-06
loopland-backup-2020-10-16
meg-photo-backup-2020-07-05
officebox-backup-2021-07-16
pgadey-website-syncthing-backup-2021-03-27
skatolo.backup-2019-11.tar.gz
work-syncthing-backup-2021-03-27
zenbook-backup-2020-03.tar.gz
zenbook-backup-2021-07-07
So, I’m writing up some documentation about how my current system works. This write-up has two purposes: to help me clarify my thinking about the system, and to explain to the reader how to setup such a system. Setting up this backup system, and writing all my own tooling around it, was very educational and I encourage any Linux beginner to undertake a similar project.
The whole backup system is based on ZFS, a file system developed by Sun Microsystems in the early 2000s. Many people have called it “the last word in filesystems”. It has a lot of amazing features, but my backup system only uses two of them: snapshots, and zfs send/receive over ssh to move snapshots between computers.
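Roughly, those two features look like this in isolation. This is only a sketch: the pool name tank and the host backupbox are placeholders, not anything in my setup.
# take a cheap, read-only snapshot of a dataset at a point in time
zfs snapshot tank@2022-09-08
# replicate that snapshot to another machine over ssh
zfs send tank@2022-09-08 | ssh backupbox zfs receive -F backuppool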
This is how backups move around my backup system.
There are three types of backups that I store: archival, offsite, and offline. The archival copies of data are stored on my main desktop computer officebox, which is located at work. The archival record gets updated every fifteen minutes. I’ve only ever used it to restore files that get deleted accidentally while working.
To protect against a disk failure (or worse) on officebox, I have offsite backups that are stored on a computer oldbox in the basement of my apartment. These get updated on a weekly basis.
Finally, to protect against both officebox and oldbox failing, I have offline backups that I keep with my family on the East Coast. These will get updated on an annual basis.
There are six major components to the system: automatic snapshots on officebox, offsite backups to oldbox, offline backups on an external drive, weekly scrubs of the datapools, e-mail notifications about the scrubs, and rsync archives of my remote servers.
This might sound somewhat sophisticated, but it is actually a bunch of hack-y bash scripts. It relies heavily on cron, even though cron has been deprecated since pre-historic times.
Soren Bjorstad has a nice post on Getting Your Filesystem Hierarchy Less Wrong. Really thinking through my file hierarchy and deciding what needed to be backed up and archived was a whole task in itself.
officebox has a datapool mounted to /archive/. (The pool is called internalpool, which is a weird name; it was originally used to distinguish it from the external hard drive, whose pool was called archivepool.) It contains the following:
officebox:/archive/
├── backup-log.txt
├── Family
├── Google
├── Hugo
├── Music
├── Personal
├── Phones
├── Pictures
├── PROCESS
├── Servers
├── Trips
├── Video
└── Work
oldbox has a similar hierarchy mounted to /external/. (Despite the name, /external/ on oldbox is an internal hard drive; it should really be called /archive/ to match officebox.) It contains:
oldbox:/external/
├── Dotfiles
├── Family
├── Hugo
├── Music
├── Personal
├── Phones
├── Pictures
├── Servers
├── Trips
├── Video
└── Work
It is worth noting that Dotfiles is present on oldbox but not on officebox. For the details of why this is, see how I got Locked Out of My Dotfiles and other highlights in the Blooper Reel of False Starts.
The automatic snapshots are done on officebox by zfs-auto-snapshot. The different snapshot frequencies are run as cron jobs:
/etc/cron.d/zfs-auto-snapshot
/etc/cron.daily/zfs-auto-snapshot
/etc/cron.hourly/zfs-auto-snapshot
/etc/cron.monthly/zfs-auto-snapshot
/etc/cron.weekly/zfs-auto-snapshot
A typical example looks something like the following script for daily backups:
#!/bin/sh
# Only call zfs-auto-snapshot if it's available
which zfs-auto-snapshot > /dev/null || exit 0
exec zfs-auto-snapshot --quiet --syslog --label=daily --keep=31 //
The trailing double slash means that zfs-auto-snapshot will snapshot all zfs datapools. The --keep=31 flag means that it will keep a rolling list of thirty-one snapshots. In principle one could ask for more granularity.
This script produces backup snapshots named like the following:
/archive/.zfs/snapshot/zfs-auto-snap_daily-2022-09-04-1136
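Restoring an accidentally deleted file is just a copy out of the hidden .zfs/snapshot/ directory. Something like the following would do it; the file Work/notes.txt is a made-up example.
# see which snapshots are available
zfs list -t snapshot -o name,creation -s creation internalpool
# copy the file back out of a snapshot into the live dataset
cp /archive/.zfs/snapshot/zfs-auto-snap_daily-2022-09-04-1136/Work/notes.txt /archive/Work/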
It turns out that snapshotting all zfs datapools made some things difficult.
For example, I wanted to plug in an external harddrive with a ZFS datapool on it and use it without adding to the external’s snapshot chronology.
So, I had to switch zfs-auto-snapshot to only snapshot internalpool. Now, the daily backup zfs-auto-snapshot script looks like this:
#!/bin/sh
# Only call zfs-auto-snapshot if it's available
which zfs-auto-snapshot > /dev/null || exit 0
# create a rotating daily backup
zfs-auto-snapshot --quiet --syslog --label=daily --keep=31 internalpool
# create the common ancestor snapshot for offsite and offline backups
zfs-auto-snapshot --quiet --syslog --prefix=offsite --label=daily internalpool
This produces an additional series of daily backups intended to synchronize with the offsite backup. They are named as follows:
internalpool@offsite_daily-2022-11-24-0736
This insight about different retention policies and common ancestors came from this Stackexchange post: https://unix.stackexchange.com/questions/289127/zfs-send-receive-with-rolling-snapshots
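To see the two snapshot series side by side, you can filter a zfs list by the two naming schemes. This is just an illustrative check, not one of the backup scripts:
# rolling dailies (pruned at 31) alongside the unpruned offsite_daily chain
zfs list -t snapshot -o name,creation -s creation internalpool | grep -E 'zfs-auto-snap_daily|offsite_daily'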
The offsite backups are managed by a script; the ts- hostnames in it are the machines’ tailscale addresses:
#!/bin/bash
# run this script on the offsite backup box
# local and remote are a little weird here: LOCAL has the "archival" copy of the data, REMOTE has the "offsite" backup.
LOCAL="ts-officebox"
REMOTE="ts-oldbox"
OffsiteBackupPool="offsitepool"
CommonSnapshotPrefix="offsite_daily"
MostRecentLocal=$(ssh $LOCAL zfs list -t snapshot -o name -s creation | grep $CommonSnapshotPrefix | tail -n1)
# Get the snapshot name of the most recent remote snapshot by stripping the pool name before the @ symbol.
# For example: "offsitepool@offsite_daily-2022-10-31-0817" --> "offsite_daily-2022-10-31-0817"
MostRecentRemote=$(ssh $REMOTE zfs list -t snapshot -o name -s creation | grep $CommonSnapshotPrefix | tail -n1 | cut -d'@' -f 2)
ssh $LOCAL "zfs send --verbose -I $MostRecentRemote $MostRecentLocal" | pv | sudo zfs receive -F $OffsiteBackupPool
This script performs a differential backup of the archival dataset to the offsite backup. First, it looks at the most recent snapshot of the archival dataset intended to be sent for offsite backup. It then looks for the most recent snapshot in the offsite dataset. The script is supposed to be run on the computer storing the offsite backup. It asks the computer storing the archival copy of the data to send all intermediary snapshots needed to fill in the gap between the archival and offsite datasets.
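Since the offsite copy gets updated weekly, a small wrapper in cron.weekly on oldbox is enough to drive it. This is only a sketch: the script path and log file below are hypothetical, so adjust them to wherever the script actually lives.
#!/bin/sh
# /etc/cron.weekly/offsite-backup (hypothetical wrapper)
/home/pgadey/bin/offsite-backup.sh >> /var/log/offsite-backup.log 2>&1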
The offline backups are managed by the following script:
#!/bin/bash
OfflineDevice="/dev/disk/by-uuid/17826005912122203015";
LocalPool="offsitepool";
OfflinePool="basementpool";
CommonSnapshotPrefix="offsite_daily";
MostRecentDaily=$(zfs list -t snapshot -o name -s creation $LocalPool | grep $CommonSnapshotPrefix | tail -n1) ;
MostRecentDailySnapshot=$(echo "$MostRecentDaily" | cut -d'@' -f 2);
echo "Most recent local daily snapshot: $MostRecentDaily";
echo "Importing all pools on offline storage: $OfflineDevice.";
zpool import -a -d $OfflineDevice;
echo "Checking on the status of $OfflinePool:";
zpool status $OfflinePool;
# Get the snapshot name of the most recent offline snapshot by stripping the pool name before the @ symbol.
# For example: "offsitepool@offsite_daily-2022-10-31-0817" --> "offsite_daily-2022-10-31-0817"
MostRecentOffline=$(zfs list -t snapshot -o name -s creation $OfflinePool | grep $CommonSnapshotPrefix | tail -n1 | cut -d'@' -f 2);
echo "Most recent offline daily snapshot: $MostRecentOffline"
if [ "$MostRecentDailySnapshot" = "$MostRecentOffline" ]; then
echo "The most recent snapshots match: $MostRecentDailySnapshot.";
else
echo -e "The most recent snapshots do not match:\n\t Daily: $MostRecentDailySnapshot \n\t Offline: $MostRecentOffline";
fi
# Show a dry-run of the proposed backup
zfs send --dryrun --verbose -I $MostRecentOffline $MostRecentDaily
echo -e "And now the moment we've all been waiting for ... \n";
echo "zfs send --verbose -I $MostRecentOffline $MostRecentDaily | pv | zfs receive -F $OfflinePool";
read -p "Do you want to run the command? y[n]" answerBackup
answerBackup=${answerBackup:-"N"};
case $answerBackup in
[Yy]* )
echo "Yes! Let's do this thing." ;
zfs send --verbose -I $MostRecentOffline $MostRecentDaily | pv | zfs receive -F $OfflinePool;
answerBackup="y" ;;
[Nn]* )
echo "No! It doesn't seem right." ;
answerBackup="n" ;;
esac
echo "Exporting $OfflinePool.";
zpool export $OfflinePool ;
To make sure that things are running correctly, I need some way for the system to notify me regularly that everything is okay. To verify the integrity of the backups, and make sure that things are still running smoothly, I have a cron job that scrubs the datapool: the script /etc/cron.weekly/zfs-scrub-and-email is run weekly to start a scrub of the datapool which contains all the snapshots. It does three things: it checks the status of the pool, e-mails me to let me know that the scrub has started, and makes a short note in backup-log.txt.
The way that it e-mails me is particularly hacky. It uses my account on ctrl-c.club to send an e-mail, because getting my office computer to send e-mail automatically is next to impossible.
#!/bin/bash
HOSTNAME=$(hostname -s)
POOL="internalpool"
EMAIL="parkerglynnadey@gmail.com"
# check the current status
STATUS=$(/usr/sbin/zpool status $POOL)
# use ctrl-c.club to send the mail for you
ssh pgadey@ctrl-c.club -i /home/pgadey/.ssh/id_rsa -x "echo -e \"Subject: $HOSTNAME: scrub STARTED on $POOL\n\n$STATUS\" | sendmail -t $EMAIL"
# scrub the pool
/usr/sbin/zpool scrub $POOL
# make a log entry
echo "$(date --iso-8601=seconds) : started scrub of $POOL on $HOSTNAME." >> /archive/backup-log.txt
Scrubs can take a variable amount of time, and so there needs to be some mechanism for notifying me that the scrub finished.
zed, the ZFS Event Daemon, automatically runs scripts which match /etc/zfs/zed.d/scrub_finish-* whenever a scrub finishes. By suitably modifying the script above and placing it at /etc/zfs/zed.d/scrub_finish-notify-by-email.sh, I am able to get notifications when scrubs finish. This script also sends me the tail of backup-log.txt.
#!/bin/bash
HOSTNAME=$(hostname -s)
POOL="internalpool"
EMAIL="parkerglynnadey@gmail.com"
# check the current status of the zpool
STATUS=$(/usr/sbin/zpool status $POOL)
# append a bit of backup-log.txt for good measure
STATUS="$STATUS \n \n backup-log.txt: \n $(tail -n 5 /archive/backup-log.txt)"
# hackily use ctrl-c.club to send the mail for you
ssh pgadey@ctrl-c.club -i /home/pgadey/.ssh/id_rsa -x "echo -e \"Subject: $HOSTNAME: scrub FINISHED on $POOL\n\n$STATUS\" | sendmail -t $EMAIL"
echo "$(date --iso-8601=seconds) : scrub finished on $POOL on $HOSTNAME." >> /archive/backup-log.txt
To keep backups of my home directories on a couple of remote servers, I have a script /etc/cron.weekly/archive-servers.sh that runs rsync in archive mode. This script populates the directory /archive/Servers/.
#!/bin/bash
# this backup will be performed by root@officebox
# so, use --rsh to set up ssh to act like pgadey@officebox
RSH="ssh -F /home/pgadey/.ssh/config -i /home/pgadey/.ssh/id_rsa"
# archive pgadey.ca
rsync --archive --verbose --compress \
--rsh="$RSH" \
pgadey@cloudbox:/home/pgadey \
/archive/Servers/pgadey.ca
# etc. etc. for various servers
echo "$(date --iso-8601=seconds) : Servers (pgadey.ca, etc.) archived." >> /archive/backup-log.txt
This backup strategy is rather hands-on, and there are elements that I only do once in a long while. For example, I only update the offline storage about once per year, and so I usually can’t remember how to use the setup when it comes time. Here are the instructions that I’ve left for myself.
It is helpful to have lots of copies of your backups. These are the steps that I take to set up a new external hard drive and get a copy of the data onto it.
If you’ve got a fresh hard drive, right out of the box, follow these steps. Check that the power on the external hard drive enclosure is turned off. Put the hard drive into the bay. Turn the enclosure on.
# create a blank partition table
sudo gparted
# Find the name of the new device.
# Gparted --> Devices --> etc.
# Click through:
# Device --> Create Partition Table
# Select: "new partition table: msdos"
# create a new unformatted partition
# Click through:
# Partition --> New
# Select: "filesystem: cleared"
# Find the new device and note down its UUID
blkid
# some representative values
POOL="minipool";
UUID="/dev/disk/by-uuid/4075467478855155972";
# create a new pool on the drive
# the flag is needed to force overwriting the existing ext4 partition
sudo zpool create -f $POOL /dev/disk/by-uuid/$UUID
sudo chown -R pgadey:pgadey /$POOL
LocalPool="offsitepool";
OfflinePool="basementpool";
# Find the earliest "offsite_daily" snapshot
zfs list -t snapshot -o name,creation -s creation internalpool | grep offsite_daily | head
# send it to the new drive
# (this operation will take a long while )
sudo zfs send internalpool@offsite_daily-2023-03-28-0738 | pv | sudo zfs receive -F $POOL
# Perform an offline-backup.sh run
# (see details below)
Once the pool is exported, I put a little label on the physical drive. If I ever need to access the data again, the label has everything I might need to get up and running quickly.
Parker Glynn-Adey
https://pgadey.ca
POOL="minipool"
UUID="4075467478855155972"
minipool@offsite_daily-2023-03-28-0738
About once a year, I update the offline backup.
To update the offline backup:
First, plug in the external hard drive while the machine is running.
Otherwise, the zfs filesystem on an external hard drive will mess up the boot process.
Second, check the headers of the offline-backup.sh script to make sure you have the right UUIDs and drives.
Third, run the following commands.
(You can skip the zpool status
on either side of the backup, but it’s nice to check.)
sudo zpool status
sudo /home/pgadey/bin/offline-backup.sh
sudo zpool status
The command offline-backup.sh will export the drive for you once the backup is complete. This means that you can unplug the external hard drive, and return it to storage.
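If you want to double-check before unplugging, a quick look at the imported pools confirms that the export went through. This is just an optional sanity check, not part of the script:
# after a successful export, basementpool should no longer appear here
sudo zpool list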
At one point, I had a backup setup similar to the system described above.
However, it involved an external drive in a harddrive bay on my desk.
It looked pretty rad; I enjoyed seeing the exposed hard drive.
Every fifteen minutes, zfs-auto-snapshot would make a snapshot and I would hear the drive click into action. This was a nice reminder that the system was working.
So, for a couple weeks the drive sat there happily clicking every fifteen minutes.
I only noticed that the external hard drive wasn’t storing any data when I needed to recover a backup. It had been mounted to the wrong mountpoint for weeks, and nothing had been written to it!
During another iteration of this backup system, I had my dotfiles stored in the archival datapool.
ZFS had issues with mounting an external drive at startup, so I had to boot up the system with no dotfiles (and no notes!) and mount the external manually.
This was a huge hassle, and prompted me to separate out my dotfiles from the archival datapool.
Eventually, I switched away from keeping the archival datapool on an external harddrive to avoid these mount issues at boot time.
stow Behaves Weird with This Setup
It turns out that stow does not like complicated symbolic link situations. Right now, I have ~/Dotfiles symlinked to /archive/Dotfiles/ and this messes stow up. The fix is to clearly lay out how the source “dir” and target are related to each other.
stow --dir=/archive/Dotfiles --target=/home/pgadey --verbose=2 PACKAGE
A helpful script for automating this is:
cd /archive/Dotfiles
for name in */;
do stow --dir=/archive/Dotfiles --target=/home/pgadey --verbose=2 "$name";
done;
While I was writing up this post, I decided to check /archive/backup-log.txt to see how things were looking.
It turned out that the script for archiving servers had not run in months.
A bit of digging turned up the fact that the permissions were set wrong on the script, and it never ran.
This prompted me to include a bit of the log in every notification of a completed scrub.
The zfs-auto-snapshot setup that I use maintains a rolling list of thirty-one daily snapshots, plus an unlimited number of offsite_daily snapshots for the sake of offline and offsite backups.
In principle, zfs is so efficient that it wouldn’t use up much more disk space to keep an unlimited number of daily snapshots.
However, I find that that amount of granularity is confusing.
Whenever I’ve needed to look something up in the backups, I usually remember which month it was but have only the foggiest notion of which day it was.
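If you ever want to check what the snapshots are actually costing in disk space, zfs can report per-snapshot usage. This is just a diagnostic one-liner, not part of the backup scripts:
# show the space used by each snapshot of the archive dataset
zfs list -t snapshot -o name,used,referenced -s creation internalpool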
I think that it’s important to be aware of how the backup system is running,
and not have blind faith that it will work on its own.
Ultimately, this backup system is a bunch of bash scripts that I hacked together. It is not enterprise-grade software by any means. So, I like to check in from time to time and see what is happening.
The weekly e-mails from the scrubs are just enough notification to make sure that I don’t forget what is going on.
Keeping a human in the loop is a concept from AI research that I think fits with this system.
I think that it is worth keeping a record of how and when one uses their backups. This gives a sense of the actual use cases for restoring data.
Some entries from my own record:
I went back to zfs-auto-snap_weekly-2023-02-28-1244 and restored most of what I’d mangled.
Another time, I searched the snapshots with ls /archive/.zfs/snapshot/*2023-03*/Hugo/pgadey/content/seminar.md and eventually found one with the desired links and re-instated it (/archive/.zfs/snapshot/zfs-auto-snap_hourly-2023-04-28-1717/).
When I updated the offline backup, sudo zpool import found the disk and mounted the pool just fine. After that, running sudo /home/pgadey/bin/offline-backup.sh worked without issue.
Less happily, syncthing wasn’t syncing my data for a while and I had no idea. This left me in an awkward spot when I had to work from home for a week before classes started.
Published: Sep 8, 2022
Last Modified: Aug 23, 2024
Thanks for reading! If you have any comments or questions about the content, please let me know. Anyone can contact me by email.