Investigating Backup Solutions: Tarsnap vs. Restic and B2
In my previous post I talked about the various changes I’ve made to my home server, among them the ongoing switch from Tarsnap to Restic+B2 for backups. As part of this effort I’ve decided to evaluate both solutions in more detail, and I’m going to record the results of that research here. I’ll run some rough tests to determine how fast, how efficient (with regard to compression and deduplication), and how easy to use each solution is.
Tarsnap and Restic are both chunk-based incremental backup tools. You give them a file or directory to back up, they save that data elsewhere, and they try to deduplicate the data, even within individual files.
Tarsnap is both a service and the open source client software for it. It’s all written by one person, Dr. Colin Percival, who’s a member of the FreeBSD Security Team and seems to be widely trusted across the internet to write secure, reliable software. Because you can only use the official Tarsnap service, you’re tied to their servers and to S3, where all Tarsnap data is stored.
Restic, by contrast, is open source client software with many contributors. Instead of being tied to a single backup service, you can use many different backends, and those backends don’t have to support a special protocol. For instance, S3 and Backblaze B2 are both supported directly through their native APIs. I’ve chosen Backblaze because it’s well known and cheaper than S3.
Borg Backup is similar to, and perhaps better known than, the tools I’m evaluating. I’ve chosen not to look at it because, while there are a number of storage services that offer Borg support, it requires a special protocol. That is, I couldn’t use a general-purpose object storage service like S3 or B2.
I’m not evaluating the encryption provided by either solution.
Initialization and Basic Usage
Both are very easy to initialize.
For Tarsnap, once you have an account with some balance in it, you have to generate a tarsnap.key file that identifies your machine. A machine in this case is just a repository with data you can back up.
$ tarsnap-keygen --keyfile tarsnap.key --user firstname.lastname@example.org --machine test
The key file, as well as other required settings like the cache directory, can be passed on the command line when invoking Tarsnap. I’m going to create a config file with the recommended settings instead.
cachedir /usr/local/tarsnap-cache
keyfile /media/shared/bu_test/tarsnap.key
print-stats
checkpoint-bytes 1G
The files I’m backing up are on a slower HDD, but the cache points to a different, much faster SSD. Restic will be set up the same way, so this will not affect the results.
Tarsnap is designed to accept command-line arguments in the same way as tar. So -cvf <name> means create (c), verbosely (v), an “archive” with the given name (f), which in Tarsnap’s case is the name of the backup. For the purposes of this test I’m going to pass in the Tarsnap config explicitly, but I could just as easily move the config to one of the well-known locations Tarsnap reads by default.
To demonstrate, I’m going to back up and restore a Hello World file.[1]
$ echo 'Hello World!' >hello.txt
$ tarsnap --configfile tarsnap.config -cvf hello-world ./hello.txt
a ./hello.txt
                                       Total size  Compressed size
All archives                                 3581             1974
  (unique data)                              3581             1974
This archive                                 3581             1974
New data                                     3581             1974
$ rm hello.txt
$ tarsnap --configfile tarsnap.config -xvf hello-world
x ./hello.txt
$ cat hello.txt
Hello World!
For Restic, I’m first going to create a bucket on Backblaze’s website with Lifecycle Settings set to keep only the last version of a file. Then I need to generate credentials. Everything in Restic is encrypted with a password, for which I’m using a randomly generated hex value.
This is more setup than Tarsnap, but once it’s done I can put these values into a file and pass them to Restic as environment variables.
export B2_ACCOUNT_ID=0000000000000000000000001
export B2_ACCOUNT_KEY=000000000000+000000000000/00000
export RESTIC_REPOSITORY="b2:test-restic-backup719"
export RESTIC_PASSWORD='00000000000000000000000000000000'
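Since the four variables have to be present for every Restic command, it can help to fail early when one is missing. Below is a minimal sketch, assuming the exports above live in a file called restic.env (a hypothetical name) and using a hypothetical helper, load_restic_env, that sources the file and checks each variable.

```shell
#!/usr/bin/env bash
# load_restic_env is a hypothetical helper; "restic.env" is an assumed file name.
# It sources the file (if present) and checks that every variable this B2 setup
# needs is non-empty, so a typo fails before restic is ever run.
load_restic_env() {
  local env_file=${1:-restic.env}
  if [ -f "$env_file" ]; then
    # shellcheck source=/dev/null
    . "$env_file"
  fi
  local var missing=0
  for var in B2_ACCOUNT_ID B2_ACCOUNT_KEY RESTIC_REPOSITORY RESTIC_PASSWORD; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use would be `load_restic_env restic.env && restic snapshots`, so Restic only runs once the environment is complete.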
Additionally, unlike Tarsnap, I have to initialize the object store.
$ restic init
created restic repository fd1ef589e2 at b2:test-restic-backup719
Please note that knowledge of your password is required to access the repository.
Losing your password means that your data is irrecoverably lost.
With all that set up, the backup and restore process is very similar. One notable difference is that everything in Restic is an object referenced by a “Storage ID”, a SHA-256 hash. In the example below I copy and paste the (abbreviated) ID to restore the snapshot (in Tarsnap this would be called an archive). In the next test I’ll demonstrate Restic’s tag feature, which lets you give names to snapshots.
$ echo 'Hello World!' >hello.txt
$ restic backup -v hello.txt
open repository
repository fd1ef589 opened (repository version 2) successfully, password is correct
created new cache in /home/eric/.cache/restic
lock repository
no parent snapshot found, will read all files
load index files
start scan on [hello.txt]
start backup on [hello.txt]
scan finished in 1.080s: 1 files, 13 B
Files:           1 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Data Blobs:      1 new
Tree Blobs:      1 new
Added to the repository: 502 B (489 B stored)
processed 1 files, 13 B in 0:02
snapshot 1ea367a7 saved
$ rm hello.txt
$ restic restore -v --target . 1ea367a7
repository fd1ef589 opened (repository version 2) successfully, password is correct
restoring <Snapshot 1ea367a7 of [/media/shared/bu_test/hello.txt] at 2023-01-02 14:35:30.285350062 -0500 EST by eric@patrick> to .
$ cat hello.txt
Hello World!
Backing up Videos
As a first test I’m going to download about 1.3GB of US WWII training videos and back them up with each tool. In the next test I’m going to restore them. This is a good baseline, since compressed video is unlikely to compress further or offer any opportunities for deduplication.
These videos all come from searching Archive.org.
$ wget -i videos.txt
$ ls -lh
total 1.3G
-rw-r--r--. 1 eric eric 88M Mar 25 2020 '25804 Nazi Tanks And How To Destroy Them.mp4'
-rw-r--r--. 1 eric eric 182M Jan 27 2019 33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
-rw-r--r--. 1 eric eric 90M Jan 19 2022 51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4
-rw-r--r--. 1 eric eric 131M Jul 26 2017 '77324 Know Your Enemy German Equipment.mp4'
-rw-r--r--. 1 eric eric 174M Dec 22 2021 '79444 Hand To Hand Combat parts 1_2.mp4'
-rw-r--r--. 1 eric eric 205M Jan 29 2019 85704+Personal+Hygiene.mp4
-rw-r--r--. 1 eric eric 45M Dec 24 2021 85834+Fighting+Men+How+To+Get+Killed.mp4
-rw-r--r--. 1 eric eric 123M Jul 28 2018 '87084 Normandy Invasion.mp4'
-rw-r--r--. 1 eric eric 98M Sep 23 2020 '87614 Damage Control Part 2.mp4'
-rw-r--r--. 1 eric eric 108M Sep 30 2020 '87634 US Navy Primary Flight Training Taxing And Take Offs.mp4'
-rw-r--r--. 1 eric eric 1.2K Jan 2 14:57 videos.txt
Backing up with Tarsnap.[2]
$ time tarsnap --configfile ../tarsnap.config -cvf wwi_training_videos .
a .
a ./videos.txt
a ./87634 US Navy Primary Flight Training Taxing And Take Offs.mp4
a ./25804 Nazi Tanks And How To Destroy Them.mp4
a ./77324 Know Your Enemy German Equipment.mp4
a ./87084 Normandy Invasion.mp4
a ./85834+Fighting+Men+How+To+Get+Killed.mp4
a ./87614 Damage Control Part 2.mp4
a ./79444 Hand To Hand Combat parts 1_2.mp4
a ./85704+Personal+Hygiene.mp4
tarsnap: Creating checkpoint... done.
a ./33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
a ./51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4
                                       Total size  Compressed size
All archives                           1299985235       1287737710
  (unique data)                        1299605036       1287653709
This archive                           1299982465       1287737139
New data                               1299602266       1287653138
real    6m2.652s
user    1m42.956s
sys     0m2.012s
Backing up with Restic.
$ time restic backup -v --tag wwi_training_videos .
open repository
repository fd1ef589 opened (repository version 2) successfully, password is correct
lock repository
no parent snapshot found, will read all files
load index files
start scan on [.]
start backup on [.]
scan finished in 0.912s: 11 files, 1.210 GiB
Files:          11 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Data Blobs:    824 new
Tree Blobs:      1 new
Added to the repository: 1.210 GiB (1.206 GiB stored)
processed 11 files, 1.210 GiB in 2:20
snapshot 5f8b4a34 saved
real    2m23.394s
user    0m39.760s
sys     0m4.496s
My initial conclusion is that, while Restic is faster at backing up, both speeds are more than good enough for my needs. Tarsnap’s documentation suggests that it automatically replicates backed-up data across different regions, whereas my Backblaze data exists in only a single region. If so, that extra safety is worth the slower uploads.
Both, as expected, take roughly 1.3GB to store the videos, which makes sense since video is unlikely to compress much further. Tarsnap gives the total compressed size as 1287737710 bytes, or 1.29GB, whereas the 1.210GiB given by Restic is 1.30GB.[3]
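Restic reports binary units (GiB) while Tarsnap reports raw bytes, so comparing them takes a conversion. A small sketch using awk for the floating-point math; gib_to_gb and bytes_to_gb are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing Restic's and Tarsnap's size reports.
# 1 GiB = 2^30 bytes (binary); 1 GB = 10^9 bytes (decimal).
gib_to_gb() {
  awk -v gib="$1" 'BEGIN { printf "%.2f\n", gib * 2^30 / 1e9 }'
}
bytes_to_gb() {
  awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / 1e9 }'
}
gib_to_gb 1.210          # Restic's reported size for the videos
bytes_to_gb 1287737710   # Tarsnap's compressed size for all archives
```

This prints 1.30 and 1.29. The same conversion turns Restic’s later 6.295GiB figure into 6.76GB.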
Before restoring the videos, I’m going to record their checksums so I can make sure they’re restored correctly. I’ll show this step explicitly here; in later tests I’ll run the same check behind the scenes.
$ md5sum video/* | tee videos_md5.txt
a999d3901f85cd606546a0cce653cfc8  video/25804 Nazi Tanks And How To Destroy Them.mp4
a639246a18a0a0822c964f042fe83d76  video/33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
4e22545327f5aa4160d951e5e4bec624  video/51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4
fb838d4ef0a81a5f99442889e5f109bd  video/77324 Know Your Enemy German Equipment.mp4
fd03ca39ba81ec9bbcd2e859fc958e70  video/79444 Hand To Hand Combat parts 1_2.mp4
a456501b8174ca5fef221ec2e6227328  video/85704+Personal+Hygiene.mp4
70bd52b85a94affb7679d60690144570  video/85834+Fighting+Men+How+To+Get+Killed.mp4
4dc80fb1452d52a39d20d9479732dd9e  video/87084 Normandy Invasion.mp4
959e05203e6ad8e0fbc5ec8cff7370e6  video/87614 Damage Control Part 2.mp4
e61699e91127c3b6b95c2c95687f30f1  video/87634 US Navy Primary Flight Training Taxing And Take Offs.mp4
ee66d4039cbfa77caeef2f2570c2ec95  video/videos.txt
$ rm *
Now, to “extract” the archive.
$ time tarsnap --configfile ../tarsnap.config -xvf wwi_training_videos
x ./
x ./videos.txt
x ./87634 US Navy Primary Flight Training Taxing And Take Offs.mp4
x ./25804 Nazi Tanks And How To Destroy Them.mp4
x ./77324 Know Your Enemy German Equipment.mp4
x ./87084 Normandy Invasion.mp4
x ./85834+Fighting+Men+How+To+Get+Killed.mp4
x ./87614 Damage Control Part 2.mp4
x ./79444 Hand To Hand Combat parts 1_2.mp4
x ./85704+Personal+Hygiene.mp4
x ./33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
x ./51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4
real    19m17.966s
user    1m45.293s
sys     0m11.627s
$ diff videos_md5.txt <(md5sum video/*) && echo Files Matched
Files Matched
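The diff-based check works; GNU coreutils’ md5sum can also verify a saved manifest directly with --check, which re-reads every listed file and reports any that are missing or changed. A minimal sketch, with verify_restore as a hypothetical helper name and the manifest assumed to be checked from the same directory it was created in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify restored files against a manifest produced earlier
# by `md5sum FILES > manifest`. --check re-hashes each listed path; --quiet
# suppresses the per-file "OK" lines so only failures are printed.
verify_restore() {
  local manifest=$1
  if md5sum --check --quiet "$manifest"; then
    echo "Files Matched"
  else
    echo "Checksum mismatch" >&2
    return 1
  fi
}
```

With the manifest above, `verify_restore videos_md5.txt` does the same job as the diff without generating a second checksum list.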
Before restoring Restic’s copy, let’s first look up the snapshot’s ID by its tag.
$ time restic snapshots --tag wwi_training_videos
repository fd1ef589 opened (repository version 2) successfully, password is correct
ID        Time                 Host        Tags                 Paths
-------------------------------------------------------------------------------------------
5f8b4a34  2023-01-02 15:23:48  patrick     wwi_training_videos  /media/shared/bu_test/video
-------------------------------------------------------------------------------------------
1 snapshots
real    0m2.983s
user    0m0.627s
sys     0m0.085s
Then, the restore.
$ rm *
$ time restic restore -v --target . 5f8b4a34
repository fd1ef589 opened (repository version 2) successfully, password is correct
restoring <Snapshot 5f8b4a34 of [/media/shared/bu_test/video] at 2023-01-02 15:23:48.514426462 -0500 EST by eric@patrick> to .
real    0m39.491s
user    0m21.337s
sys     0m6.963s
$ diff videos_md5.txt <(md5sum video/*) && echo Files Matched
Files Matched
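Copying snapshot IDs by hand gets tedious. Restic also accepts latest as a snapshot ID, and on restore a --tag filter narrows which snapshot latest refers to, so the lookup and the restore can be combined. A sketch; restore_latest_tagged is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: restore the newest snapshot carrying a given tag.
# "latest" plus --tag tells restic to pick the most recent matching snapshot,
# so no ID needs to be copied from `restic snapshots`.
restore_latest_tagged() {
  local tag=$1 target=${2:-.}
  restic restore latest --tag "$tag" --target "$target"
}
```

For the videos, `restore_latest_tagged wwi_training_videos .` would restore the same snapshot as the command above.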
My conclusions here are stronger. Tarsnap is not only a full 30 times slower, but its rate implies that restoring even a small disk could take hours. For uploads, Tarsnap has the excuse that its backups are probably more reliable, but it’s not clear to me how that would make restores take so much longer.
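To put the timings on a common scale, the wall-clock (real) times can be converted into throughput. A sketch that parses the XmY.YYYs format printed by time; the helper names are hypothetical, and it uses Tarsnap’s reported archive size (1299982465 bytes) for both tools as an approximation, since they moved the same files.

```shell
#!/usr/bin/env bash
# Convert a `time`-style duration such as "6m2.652s" to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, part, "m")       # part[1] = minutes, part[2] = e.g. "2.652s"
    sub(/s$/, "", part[2])    # drop the trailing "s"
    printf "%.3f\n", part[1] * 60 + part[2]
  }'
}
# Decimal MB/s for a byte count moved in a given duration.
throughput_mbs() {
  awk -v bytes="$1" -v secs="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", bytes / secs / 1e6 }'
}
# How many times longer duration $1 took than duration $2.
slowdown() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}
bytes=1299982465
echo "backup:  tarsnap $(throughput_mbs "$bytes" 6m2.652s) MB/s, restic $(throughput_mbs "$bytes" 2m23.394s) MB/s"
echo "restore: tarsnap $(throughput_mbs "$bytes" 19m17.966s) MB/s, restic $(throughput_mbs "$bytes" 0m39.491s) MB/s"
echo "restore slowdown: $(slowdown 19m17.966s 0m39.491s)x"
```

With these inputs it reports about 3.6 vs 9.1 MB/s for backup, 1.1 vs 32.9 MB/s for restore, and a 29.3x restore slowdown, in line with the roughly 30x figure above.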
A less important note: regardless of the command, Restic seems to have a near-constant startup time of a couple of seconds, even for simple commands like listing snapshots. My guess is that this is caused by the non-ideal protocol for communicating with B2, and possibly by Go being slow.
Backing Up and Restoring Filesystem Dumps
I have three filesystem dumps taken within an hour of each other. The first two differ only in logs, while the last has some additional changes totaling under 1MB. In short, these files contain almost exactly the same data.
$ ls -lh *.xfsdump
-rw-r--r--. 1 eric eric 12G Dec 30 14:07 2022-12-28_15-40-53.xfsdump
-rw-r--r--. 1 eric eric 12G Dec 30 14:11 2022-12-28_15-57-39.xfsdump
-rw-r--r--. 1 eric eric 12G Dec 30 14:16 2022-12-28_16-12-44.xfsdump
These were generated by the xfsdump utility; I don’t know its format or how well it fares being incrementally backed up. To give a sense of how compressible they are, I’ve compressed all three into a single xz-compressed tar archive.
$ tar -cJf all_xfsdumps.tar.xz *.xfsdump
$ ls -lh all_xfsdumps.tar.xz
-rw-r--r--. 1 eric eric 16G Dec 30 16:54 all_xfsdumps.tar.xz
I’ll back these up one at a time,[4] then restore the last one (16:12). Since showing the full commands and outputs would get repetitive, I’m only going to show the timing results.
| Service | Backup 15:40 | Backup 15:57 | Backup 16:12 | Restore 16:12 |
|---|---|---|---|---|
Restic is faster in every case, but the restore time is particularly notable. That’s a 50x difference: almost 3 hours for Tarsnap to restore a measly 12GB. Extrapolating, a 1TB hard drive whose data couldn’t be compressed would take 9.7 days to restore.
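The extrapolation is a linear scaling: at a constant restore rate, time grows in proportion to data size. A sketch with a hypothetical helper; since the restore time is given only as “almost 3 hours”, the example uses 3 hours as an upper bound and 2.8 hours as an illustrative value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: linearly scale a measured restore time to a larger size.
# $1 = measured size (GB), $2 = measured time (hours), $3 = target size (GB).
# Prints the projected time in days, assuming the restore rate stays constant.
extrapolate_days() {
  awk -v size="$1" -v hours="$2" -v target="$3" \
    'BEGIN { printf "%.1f\n", target / size * hours / 24 }'
}
extrapolate_days 12 3 1000     # 3-hour upper bound
extrapolate_days 12 2.8 1000   # "almost 3 hours"
```

This prints 10.4 and 9.7, so the 9.7-day figure corresponds to a 12GB restore of roughly 2.8 hours.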
As a curiosity, let’s look at how each tool did at compressing the archives.[5] The only compression to be found is in the three disk images, which are mostly the same. For reference, recall that tar -cJf compressed the three files to 16GB, a ratio of 2.25 (36GB / 16GB).
$ tarsnap --configfile tarsnap.config --print-stats -f 2022-12-28_16-12-44_xfsdump -f 2022-12-28_15-40-53_xfsdump -f 2022-12-28_15-57-39_xfsdump
                                       Total size  Compressed size
All archives                          36910640691      21689339506
  (unique data)                       12502260818       7832228513
2022-12-28_16-12-44_xfsdump           11871217344       6801465399
  (unique data)                         115988023         67828904
2022-12-28_15-40-53_xfsdump           11868948776       6799338774
  (unique data)                         112927367         68826211
2022-12-28_15-57-39_xfsdump           11870489336       6800797623
  (unique data)                          92878595         58701545
Doing some subtraction, each of the xfsdumps has roughly 6.7GB of non-unique data and roughly 58-69MB of unique data. By my rough math, that gives
6.7 (shared) + 0.069 + 0.068 + 0.058 ≈ 6.9GB of total space to store 36GB, a compression ratio of about 5.22.
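The same estimate can be computed from the raw --print-stats columns instead of rounded figures. A sketch following the approximation in the text: shared data is taken as one archive’s compressed size minus its unique compressed size, then every archive’s unique compressed size is added. dedup_estimate is a hypothetical name, and the 16:12 archive is used as the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough dedup estimate from Tarsnap --print-stats numbers.
# $1 = total uncompressed bytes of the archives being compared
# $2 = compressed bytes of a reference archive
# remaining args = unique compressed bytes of each archive (reference first)
dedup_estimate() {
  local total=$1 ref=$2
  shift 2
  awk -v total="$total" -v ref="$ref" -v uniq="$*" 'BEGIN {
    n = split(uniq, u, " ")
    stored = ref - u[1]              # data shared with the other archives
    for (i = 1; i <= n; i++) stored += u[i]
    printf "%.2f GB stored, ratio %.2f\n", stored / 1e9, total / stored
  }'
}
# Total size is the sum of the three archives' "Total size" column.
dedup_estimate 35610655456 6801465399 67828904 68826211 58701545
```

With exact byte counts this prints about 6.93GB stored and a ratio of about 5.14; the gap from 5.22 comes from rounding in the hand calculation.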
As for Restic,
$ restic stats --mode=raw-data fc381f78 88b07b3f 3cb72f15
repository fd1ef589 opened (repository version 2) successfully, password is correct
scanning...
Stats in raw-data mode:
     Snapshots processed:  3
        Total Blob Count:  13380
              Total Size:  6.295 GiB
Note that 6.295GiB = 6.76GB, which works out to roughly the same compression ratio as Tarsnap. All in all, both tools are good at compression.
The last test is pruning, or deleting old backups. I’m going to keep things simple and delete all snapshots/archives except the last one, the 16:12 disk image.
$ time tarsnap --configfile ../tarsnap.config -dv -f wwi_training_videos -f hello-world -f 2022-12-28_15-40-53_xfsdump -f 2022-12-28_15-57-39_xfsdump
Deleting archive "wwi_training_videos"
                                       Total size  Compressed size
All archives                          35610659037      20401603770
  (unique data)                       11202659363       6544576778
wwi_training_videos                    1299981654       1287735736
Deleted data                           1299601455       1287651735
Deleting archive "hello-world"
                                       Total size  Compressed size
All archives                          35610655456      20401601796
  (unique data)                       11202655782       6544574804
hello-world                                  3581             1974
Deleted data                                 3581             1974
Deleting archive "2022-12-28_15-40-53_xfsdump"
                                       Total size  Compressed size
All archives                          23741706680      13602263022
  (unique data)                       11089728415       6475748593
2022-12-28_15-40-53_xfsdump           11868948776       6799338774
Deleted data                            112927367         68826211
Deleting archive "2022-12-28_15-57-39_xfsdump"
                                       Total size  Compressed size
All archives                          11871217344       6801465399
  (unique data)                       10968340960       6403712363
2022-12-28_15-57-39_xfsdump           11870489336       6800797623
Deleted data                            121387455         72036230
real    0m50.110s
user    0m5.140s
sys     0m0.363s
$ time restic forget --prune 1ea367a7 5f8b4a34 fc381f78 88b07b3f
repository fd1ef589 opened (repository version 2) successfully, password is correct
[0:00] 100.00%  4 / 4 files deleted
4 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 1 snapshots
[0:00] 100.00%  1 / 1 snapshots
searching used packs...
collecting packs for deletion and repacking
[0:00] 100.00%  467 / 467 packs processed
to repack:             0 blobs / 0 B
this removes:          0 blobs / 0 B
to delete:           894 blobs / 1.271 GiB
total prune:         894 blobs / 1.271 GiB
remaining:         13313 blobs / 6.230 GiB
unused size after prune: 166.255 MiB (2.61% of remaining size)
rebuilding index
[0:00] 100.00%  385 / 385 packs processed
deleting obsolete index files
[0:00] 100.00%  6 / 6 files deleted
removing 82 old packs
[0:07] 100.00%  82 / 82 files deleted
done
real    0m13.132s
user    0m1.570s
sys     0m0.315s
I mentioned in the previous post that a big impetus for the switch was the cost difference. Letting 100GB sit for a year costs $300 on Tarsnap and $6 on Backblaze.[6] That hasn’t changed, and I would likely still switch even if Restic were worse in other ways. Still, I think there are a few other considerations to look at.
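The cost comparison is simple multiplication: size times months times price per GB-month. A sketch with a hypothetical helper; the rates are the ones implied by the $300 and $6 figures above ($0.25 and $0.005 per GB-month), not quoted from a price list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: storage cost = size (GB) x months x price per GB-month.
# Rates used below are those implied by the text's figures, not official prices.
storage_cost() {
  awk -v gb="$1" -v months="$2" -v rate="$3" \
    'BEGIN { printf "%.2f\n", gb * months * rate }'
}
storage_cost 100 12 0.25    # Tarsnap
storage_cost 100 12 0.005   # Backblaze B2
```

This prints 300.00 and 6.00. Dividing the other way, $5 a month at $0.25 per GB-month covers 20GB, which is where the 20GB figure in the last footnote comes from.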
One thing I ran into while using Restic is locking.
Both Tarsnap and Restic require some sort of transaction system to protect data from concurrent writes.
Tarsnap has a transaction system built on top of S3, but I’ve never had to care about this since the details are handled by Tarsnap’s own servers.
Restic creates a directory on B2 called locks, containing JSON files that describe what’s being locked, how, and by whom.
I’ve run into multiple situations where Restic’s locks went out of sync, usually because I exited in the middle of a backup or restore, and I had to clear them manually.
This becomes uncomfortable if you can’t be sure other Restic operations aren’t happening at the same time, and speaks to the advantage of having a server between you and the storage provider.
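Restic does ship commands for inspecting and clearing locks: restic list locks prints the IDs of lock files in the repository, and restic unlock removes the locks it considers stale, while restic unlock --remove-all removes every lock, which is exactly the risky case when other operations might be running. A cautious sketch; clear_stale_locks is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show any lock files, then ask restic to remove stale ones.
# Plain `restic unlock` leaves locks it does not consider stale in place;
# `--remove-all` would delete those too, so it is deliberately not used here.
clear_stale_locks() {
  local locks
  # --no-lock avoids adding a lock of our own while inspecting.
  locks=$(restic list locks --no-lock) || return 1
  if [ -n "$locks" ]; then
    echo "Found locks:"
    echo "$locks"
    restic unlock
  fi
}
```

Printing the lock IDs first leaves a record of what was there before anything is removed.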
Ergonomically I much prefer Restic’s interface.
Tarsnap seems to expect you to put some other tool in front of it, since it attaches little metadata to your backups and doesn’t provide any tools to prune old ones.
Otherwise it’s up to the user to choose descriptive archive names that record when each backup was created and where it was created from.
Restic, by contrast, doesn’t require you to name anything and provides a very flexible restic forget command.
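As an illustration of that flexibility, here are two retention policies built on real restic forget flags, wrapped in hypothetical functions. One detail matters for reproducing the manual cleanup above: by default forget groups snapshots by host and paths, and each snapshot here has a different path, so --keep-last 1 alone would keep all of them; grouping by host only keeps just the newest one. The counts in the rolling policy are arbitrary examples.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around `restic forget`.
# keep_latest_only: keep only the newest snapshot per host (ignoring paths),
# then prune data no remaining snapshot references.
keep_latest_only() {
  restic forget --keep-last 1 --group-by host --prune "$@"
}
# rolling_retention: an illustrative daily/weekly/monthly policy.
rolling_retention() {
  restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune "$@"
}
```

Passing --dry-run, as in `rolling_retention --dry-run`, previews which snapshots would be removed before committing to a policy.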
To give an example, here’s what Restic shows when listing snapshots.
$ restic snapshots
repository fd1ef589 opened (repository version 2) successfully, password is correct
ID        Time                 Host        Tags                 Paths
---------------------------------------------------------------------------------------------------------------------------
1ea367a7  2023-01-02 14:35:30  patrick                          /media/shared/bu_test/hello.txt
5f8b4a34  2023-01-02 15:23:48  patrick     wwi_training_videos  /media/shared/bu_test/video
fc381f78  2023-01-02 17:46:30  patrick                          /media/shared/bu_test/diskimage/2022-12-28_15-40-53.xfsdump
88b07b3f  2023-01-02 18:04:26  patrick                          /media/shared/bu_test/diskimage/2022-12-28_15-57-39.xfsdump
3cb72f15  2023-01-02 18:09:41  patrick                          /media/shared/bu_test/diskimage/2022-12-28_16-12-44.xfsdump
---------------------------------------------------------------------------------------------------------------------------
Asking Tarsnap to list archives gives just their names, and not in any particular order.
$ tarsnap --configfile ../tarsnap.config --list-archives
wwi_training_videos
2022-12-28_16-12-44_xfsdump
hello-world
2022-12-28_15-40-53_xfsdump
2022-12-28_15-57-39_xfsdump
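Since Tarsnap leaves ordering and retention to the user, one workaround is to embed a sortable timestamp in each archive name, as the xfsdump archives here already do, and let standard tools do the sorting. A sketch with hypothetical helper names; it assumes names of the form YYYY-MM-DD_HH-MM-SS_label, so plain lexical sorting is chronological.

```shell
#!/usr/bin/env bash
# Build a name like 2022-12-28_16-12-44_xfsdump from the current time.
archive_name() {
  printf '%s_%s\n' "$(date +%Y-%m-%d_%H-%M-%S)" "$1"
}
# Read archive names on stdin; print those ending in _LABEL, except the newest KEEP.
# Works because the zero-padded timestamp prefix sorts lexically in time order.
archives_to_prune() {
  local keep=$1 label=$2
  awk -v suffix="_$label" \
    'length($0) > length(suffix) && substr($0, length($0) - length(suffix) + 1) == suffix' |
    LC_ALL=C sort -r |
    tail -n +"$((keep + 1))"
}
```

Fed the list above, `archives_to_prune 1 xfsdump` prints the 15:57 and 15:40 archives, whose names could then be passed to tarsnap -d -f one at a time.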
Tarsnap is more minimalist in its output, even with the verbose flag. When backing up a large file there’s no indication of progress, except for the 1GB checkpoints I set in the configuration. Restic shows and updates the percentage complete on backup, but not on restore.
Other features I can’t find an equivalent of in Tarsnap are restic find and restic dump, which let you find and print individual files from snapshots. Tarsnap does, however, allow restoring only a specific file or directory from within an archive.
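A sketch of how those two commands fit together, wrapped in a hypothetical helper: restic find searches every snapshot for a file name or pattern, and restic dump prints a file’s contents from a given snapshot (or latest) to standard output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list which snapshots contain a file, then print the copy
# from one snapshot. `restic dump` needs the file's full path inside the snapshot.
show_backed_up_file() {
  local pattern=$1 path=$2 snapshot=${3:-latest}
  restic find "$pattern"
  restic dump "$snapshot" "$path"
}
```

For the earlier test, `show_backed_up_file hello.txt /media/shared/bu_test/hello.txt 1ea367a7` would locate the file and print its contents without restoring anything to disk.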
Most of this analysis doesn’t matter, because the cost difference is so stark. Restic is going to serve as the “1” in my 3-2-1 backup scheme, so a single region is fine. What I’m hoping to gain from the lower price is the ability to cast a wider net without worrying about the bill.
That said, even if cost weren’t a factor, the restore times are genuinely concerning. Even knowing I’d eventually get my data back, I would be very stressed if a restore took hours or days. And there’s a good chance I’d need multiple restore attempts, for example if I need the data before I can buy a replacement for a failed drive.
[1] I’m running tarsnap 1.0.40 and restic 0.14.0. A speed test has my upload and download speeds exceeding 300Mbps.
[2] Ignore my mistake in naming these backups: the videos are from WWII, not WWI.
[3] 1 GB = 10^9 bytes; 1 GiB = 2^30 bytes.
[4] Tarsnap insists that the user pass --snaptime for filesystem snapshots to prevent race conditions, but this isn’t relevant here.
[5] I’m treating deduplication and compression as the same thing in this article.
[6] There’s an example on Tarsnap’s homepage where “Alice pays less than $5/month” for “several terabytes of non-unique data”. From what I can tell this requires Tarsnap to compress the data down to 20GB, which seems unlikely in my situation.