Investigating Backup Solutions: Tarsnap vs. Restic and B2

Published on January 4, 2023

In my previous post I talked about the various changes I’ve made to my home server, among which was the ongoing switch from Tarsnap to Restic+B2 for backups. As part of this effort I’ve decided to evaluate both solutions in more detail and record the results here. I’ll run some rough tests to determine how fast, how efficient (with regard to compression and deduplication), and how easy to use each solution is.

Tarsnap and Restic are both chunk-based incremental backup tools. You give them a file or directory to back up and they save that data elsewhere, deduplicating it even within individual files.

Tarsnap is a service as well as the open source client software. It’s all written by one person, Dr. Colin Percival, who’s a member of the FreeBSD Security Team and seems to be very well trusted across the internet to write secure, reliable software. Because you can only use the official Tarsnap service, you’re tied to their servers and to S3, where all Tarsnap data is stored.

Restic, by contrast, is an open source client software with many contributors. Instead of being tied to a single backup service you can use many different backends, and those backends don’t have to support a special protocol. For instance, S3 and Backblaze B2 are both supported directly through their native APIs. I’ve chosen Backblaze because they’re well-known and cheaper than S3.

Borg Backup is similar to, and perhaps better known than, the tools I’m evaluating. I’ve chosen not to look at it because, while a number of storage services offer Borg support, it requires a special protocol. That is, I couldn’t use a general-purpose object storage service like S3 or B2.

I’m not evaluating the encryption provided by either solution.

Initialization and Basic Usage

Both are very easy to initialize.

For Tarsnap, once you have an account with some balance in it, you have to create a tarsnap.key file that identifies your machine. A machine in this case is just a repository you can back data up to.

$ tarsnap-keygen --keyfile tarsnap.key --user eric@ericroberts.dev --machine test

The key file, as well as some other required information like the cache directory, can be passed to the CLI when invoking it. I’m going to create a config file with the recommended settings instead.

cachedir /usr/local/tarsnap-cache
keyfile /media/shared/bu_test/tarsnap.key
print-stats
checkpoint-bytes 1G

The files I’m going to be backing up are on a slower HDD, but the cache here points to a different disk, a much faster SSD. Restic will be set up the same way, so this will not affect the results.

Tarsnap is designed to accept CLI arguments in a way analogous to tar. So -cvf <name> means “create, verbosely, an archive of the given name”, which in Tarsnap’s case is the name of the backup. For the purposes of this test I’m going to pass in the Tarsnap config explicitly, but I could just as easily move the config to a well-known location like /etc/tarsnap.conf.

To demonstrate, I’m going to back up and restore Hello World.[1]

$ echo 'Hello World!' >hello.txt

$ tarsnap --configfile tarsnap.config -cvf hello-world ./hello.txt
a ./hello.txt
                                       Total size  Compressed size
All archives                                 3581             1974
  (unique data)                              3581             1974
This archive                                 3581             1974
New data                                     3581             1974

$ rm hello.txt

$ tarsnap --configfile tarsnap.config -xvf hello-world
x ./hello.txt

$ cat hello.txt 
Hello World!

In Restic’s case, I’m first going to create a bucket on Backblaze’s website with Lifecycle Settings set to keep only the last version of a file. Then I need to generate credentials. Everything in Restic is encrypted with a password, for which I’m using a randomly generated hex value.

This is more setup than Tarsnap, but once I’m done I can put these into a file to be passed to Restic as environment variables.

export B2_ACCOUNT_ID=0000000000000000000000001
export B2_ACCOUNT_KEY=000000000000+000000000000/00000
export RESTIC_REPOSITORY="b2:test-restic-backup719"
export RESTIC_PASSWORD='00000000000000000000000000000000'

Additionally, unlike Tarsnap, I have to initialize the object store.

$ restic init
created restic repository fd1ef589e2 at b2:test-restic-backup719

Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.

With all that set up, the backup and restore process is very similar. One notable difference is that everything in Restic is an object that’s referenced by a “Storage ID”, or SHA-256 hash. In the example below I copy and paste the hash to restore the snapshot (in Tarsnap this would be called an archive). In the following test I’ll demonstrate Restic’s tag feature, which lets you give names to snapshots.

$ echo 'Hello World!' >hello.txt

$ restic backup -v hello.txt
open repository
repository fd1ef589 opened (repository version 2) successfully, password is correct
created new cache in /home/eric/.cache/restic
lock repository
no parent snapshot found, will read all files
load index files
start scan on [hello.txt]
start backup on [hello.txt]
scan finished in 1.080s: 1 files, 13 B

Files:           1 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Data Blobs:      1 new
Tree Blobs:      1 new
Added to the repository: 502 B (489 B stored)

processed 1 files, 13 B in 0:02
snapshot 1ea367a7 saved

$ rm hello.txt

$ restic restore -v --target . 1ea367a7
repository fd1ef589 opened (repository version 2) successfully, password is correct
restoring <Snapshot 1ea367a7 of [/media/shared/bu_test/hello.txt] at 2023-01-02 14:35:30.285350062 -0500 EST by eric@patrick> to .

$ cat hello.txt 
Hello World!

Backing up Videos

As a first test I’m going to download about 1.3GB of US WWII training videos and back them up to each service. In the next test I’m going to restore them. This is a good baseline since already-compressed video is unlikely to compress further or offer any opportunities for de-duplication.

These videos all come from searching Archive.org.

$ wget -i videos.txt
$ ls -lh
total 1.3G
-rw-r--r--. 1 eric eric  88M Mar 25  2020 '25804 Nazi Tanks And How To Destroy Them.mp4'
-rw-r--r--. 1 eric eric 182M Jan 27  2019  33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
-rw-r--r--. 1 eric eric  90M Jan 19  2022  51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4
-rw-r--r--. 1 eric eric 131M Jul 26  2017 '77324 Know Your Enemy German Equipment.mp4'
-rw-r--r--. 1 eric eric 174M Dec 22  2021 '79444 Hand To Hand Combat parts 1_2.mp4'
-rw-r--r--. 1 eric eric 205M Jan 29  2019  85704+Personal+Hygiene.mp4
-rw-r--r--. 1 eric eric  45M Dec 24  2021  85834+Fighting+Men+How+To+Get+Killed.mp4
-rw-r--r--. 1 eric eric 123M Jul 28  2018 '87084 Normandy Invasion.mp4'
-rw-r--r--. 1 eric eric  98M Sep 23  2020 '87614 Damage Control Part 2.mp4'
-rw-r--r--. 1 eric eric 108M Sep 30  2020 '87634 US Navy Primary Flight Training Taxing And Take Offs.mp4'
-rw-r--r--. 1 eric eric 1.2K Jan  2 14:57  videos.txt

Backing up with Tarsnap.[2]

$ time tarsnap --configfile ../tarsnap.config -cvf wwi_training_videos .
a .
a ./videos.txt
a ./87634 US Navy Primary Flight Training Taxing And Take Offs.mp4
a ./25804 Nazi Tanks And How To Destroy Them.mp4
a ./77324 Know Your Enemy German Equipment.mp4
a ./87084 Normandy Invasion.mp4
a ./85834+Fighting+Men+How+To+Get+Killed.mp4
a ./87614 Damage Control Part 2.mp4
a ./79444 Hand To Hand Combat parts 1_2.mp4
a ./85704+Personal+Hygiene.mp4
tarsnap: Creating checkpoint... done.
a ./33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
a ./51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4
                                       Total size  Compressed size
All archives                           1299985235       1287737710
  (unique data)                        1299605036       1287653709
This archive                           1299982465       1287737139
New data                               1299602266       1287653138

real    6m2.652s
user    1m42.956s
sys     0m2.012s

Backing up with Restic.

$ time restic backup -v --tag wwi_training_videos .
open repository
repository fd1ef589 opened (repository version 2) successfully, password is correct
lock repository
no parent snapshot found, will read all files
load index files
start scan on [.]
start backup on [.]
scan finished in 0.912s: 11 files, 1.210 GiB

Files:          11 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Data Blobs:    824 new
Tree Blobs:      1 new
Added to the repository: 1.210 GiB (1.206 GiB stored)

processed 11 files, 1.210 GiB in 2:20
snapshot 5f8b4a34 saved

real    2m23.394s
user    0m39.760s
sys     0m4.496s

My initial conclusion is that, while Restic is faster at backing up, both are more than fast enough for my needs. Tarsnap’s documentation suggests that it automatically replicates data across different regions on backup, whereas my Backblaze data exists only in a single region. If that’s the case, the extra safety is worth the slower uploads.

Both, as expected, take roughly 1.3GB to back up the videos. This makes sense since it’s unlikely to get much compression out of them. Tarsnap gives the total stored size as 1287737710 bytes, or 1.29GB, whereas the 1.210GiB given by Restic is 1.30GB.[3]
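The GiB-to-GB conversion above is easy to get backwards, so here is a minimal Python sketch of it. The helper names are my own for illustration; the numbers are copied from the tool outputs above.

```python
GB = 10**9    # decimal gigabyte, the unit used for Tarsnap's byte counts here
GIB = 2**30   # binary gibibyte, the unit Restic prints

def bytes_to_gb(n_bytes: int) -> float:
    """Convert a raw byte count to decimal gigabytes."""
    return n_bytes / GB

def gib_to_gb(gib: float) -> float:
    """Convert a size in GiB to decimal gigabytes."""
    return gib * GIB / GB

tarsnap_gb = bytes_to_gb(1287737710)  # Tarsnap "All archives", compressed
restic_gb = gib_to_gb(1.210)          # Restic "Added to the repository"

print(f"Tarsnap: {tarsnap_gb:.2f} GB")
print(f"Restic:  {restic_gb:.2f} GB")
```

Both values land within about 1% of each other, which is what you’d expect for data that doesn’t compress.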

Restoring Videos

Prior to restoring the videos, I’m going to grab the checksums so I can ensure they get restored properly. I’ll show this explicitly here but I’ll check these behind the scenes when I don’t show it.

$ md5sum video/* | tee videos_md5.txt
a999d3901f85cd606546a0cce653cfc8  video/25804 Nazi Tanks And How To Destroy Them.mp4
a639246a18a0a0822c964f042fe83d76  video/33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
4e22545327f5aa4160d951e5e4bec624  video/51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4
fb838d4ef0a81a5f99442889e5f109bd  video/77324 Know Your Enemy German Equipment.mp4
fd03ca39ba81ec9bbcd2e859fc958e70  video/79444 Hand To Hand Combat parts 1_2.mp4
a456501b8174ca5fef221ec2e6227328  video/85704+Personal+Hygiene.mp4
70bd52b85a94affb7679d60690144570  video/85834+Fighting+Men+How+To+Get+Killed.mp4
4dc80fb1452d52a39d20d9479732dd9e  video/87084 Normandy Invasion.mp4
959e05203e6ad8e0fbc5ec8cff7370e6  video/87614 Damage Control Part 2.mp4
e61699e91127c3b6b95c2c95687f30f1  video/87634 US Navy Primary Flight Training Taxing And Take Offs.mp4
ee66d4039cbfa77caeef2f2570c2ec95  video/videos.txt
$ rm *

Now, to “extract” the archive.

$ time tarsnap --configfile ../tarsnap.config -xvf wwi_training_videos
x ./
x ./videos.txt
x ./87634 US Navy Primary Flight Training Taxing And Take Offs.mp4
x ./25804 Nazi Tanks And How To Destroy Them.mp4
x ./77324 Know Your Enemy German Equipment.mp4
x ./87084 Normandy Invasion.mp4
x ./85834+Fighting+Men+How+To+Get+Killed.mp4
x ./87614 Damage Control Part 2.mp4
x ./79444 Hand To Hand Combat parts 1_2.mp4
x ./85704+Personal+Hygiene.mp4
x ./33064+US+Navy+This+Is+It+Reel+1_vwr.mp4
x ./51924+US+Navy+Primary+Flight+Training+Attitides+Of+Flight+Pt+1.mp4

real    19m17.966s
user    1m45.293s
sys     0m11.627s

$ diff videos_md5.txt <(md5sum video/*) && echo Files Matched
Files Matched

Before restoring Restic’s copy, let’s first grab the snapshot’s ID by its tag.

$ time restic snapshots --tag wwi_training_videos
repository fd1ef589 opened (repository version 2) successfully, password is correct
ID        Time                 Host        Tags                 Paths
-------------------------------------------------------------------------------------------
5f8b4a34  2023-01-02 15:23:48  patrick     wwi_training_videos  /media/shared/bu_test/video
-------------------------------------------------------------------------------------------
1 snapshots

real    0m2.983s
user    0m0.627s
sys     0m0.085s

Then, the restore.

$ rm *
$ time restic restore -v --target . 5f8b4a34
repository fd1ef589 opened (repository version 2) successfully, password is correct
restoring <Snapshot 5f8b4a34 of [/media/shared/bu_test/video] at 2023-01-02 15:23:48.514426462 -0500 EST by eric@patrick> to .

real    0m39.491s
user    0m21.337s
sys     0m6.963s
$ diff videos_md5.txt <(md5sum video/*) && echo Files Matched
Files Matched

My conclusions here are a bit stronger. Tarsnap is not only nearly 30 times slower to restore, but the times imply that restoring even a small disk could take hours. For uploads Tarsnap has the excuse that its backups are probably more reliable, but it’s not clear to me how that would make restores take so much longer.
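To show where the “30 times” figure comes from, here is a rough Python sketch that parses the real times printed by the shell’s time keyword and compares them. The function names are hypothetical, and the byte count is the “This archive” total size Tarsnap reported during the backup, which I’m assuming is close enough to what both tools had to transfer.

```python
import re

def parse_time(value: str) -> float:
    """Parse a shell `time` value such as '19m17.966s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

def megabits_per_second(n_bytes: int, seconds: float) -> float:
    """Effective throughput in Mbps (10**6 bits per second)."""
    return n_bytes * 8 / seconds / 1e6

VIDEO_BYTES = 1299982465  # Tarsnap's "This archive" total size (assumed payload)

tarsnap_restore = parse_time("19m17.966s")
restic_restore = parse_time("0m39.491s")
slowdown = tarsnap_restore / restic_restore

print(f"Tarsnap restore is {slowdown:.1f}x slower than Restic")
print(f"Tarsnap: {megabits_per_second(VIDEO_BYTES, tarsnap_restore):.1f} Mbps")
print(f"Restic:  {megabits_per_second(VIDEO_BYTES, restic_restore):.1f} Mbps")
```

The throughput numbers are useful context next to the 300Mbps link speed mentioned in the first footnote.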

A less important note is that Restic, regardless of the command, seems to have a near-constant startup time of a couple of seconds. This is true even for simple commands like listing snapshots. My guess is that it’s caused by the non-ideal protocol for communicating with B2, and possibly by Go being slow.

Backing Up and Restoring Filesystem Dumps

I have three filesystem dumps taken within the same hour of each other. The first two only differ in logs, while the last has some additional changes that total under 1MB. In short, these files contain almost exactly the same data.

$ ls -lh *.xfsdump
-rw-r--r--. 1 eric eric 12G Dec 30 14:07 2022-12-28_15-40-53.xfsdump
-rw-r--r--. 1 eric eric 12G Dec 30 14:11 2022-12-28_15-57-39.xfsdump
-rw-r--r--. 1 eric eric 12G Dec 30 14:16 2022-12-28_16-12-44.xfsdump

These were generated by the xfsdump utility. I don’t know its format or how well it fares being incrementally backed up. To give a sense of how compressible the dumps are, I’ve compressed them into a .tar.xz archive.

$ tar -cJf all_xfsdumps.tar.xz *.xfsdump
$ ls -lh all_xfsdumps.tar.xz 
-rw-r--r--. 1 eric eric 16G Dec 30 16:54 all_xfsdumps.tar.xz

These will be backed up one at a time,[4] then in the next test I’ll restore the snapshot. Since showing full commands and outputs would get repetitive, I’m only going to show the timing result.

Service   Backup 15:40   Backup 15:57   Backup 16:12   Restore 16:12
Tarsnap   29m16.161s     2m28.794s      2m31.499s      167m42.291s
Restic    11m24.162s     1m39.458s      1m36.178s      3m15.206s

Restic is faster in every case, but the restore time is particularly notable. That’s a 50x difference, with Tarsnap taking almost 3 hours to restore a measly 12GB. Extrapolating linearly, a 1TB hard drive that couldn’t be compressed would take 9.7 days to restore.
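The extrapolation is simple enough to sketch in Python. This assumes restore time scales linearly with data size, which is a simplification on my part and not something either tool promises; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def extrapolate_days(measured_bytes: float, measured_seconds: float,
                     target_bytes: float) -> float:
    """Scale a measured restore time linearly to a larger data size."""
    return measured_seconds * (target_bytes / measured_bytes) / SECONDS_PER_DAY

# Restore times for the 16:12 dump, taken from the table above.
tarsnap_seconds = 167 * 60 + 42.291
restic_seconds = 3 * 60 + 15.206

# Treat the dump as 12GB and the target disk as 1TB (decimal units).
tarsnap_days = extrapolate_days(12e9, tarsnap_seconds, 1e12)
restic_days = extrapolate_days(12e9, restic_seconds, 1e12)

print(f"Tarsnap: {tarsnap_days:.1f} days")
print(f"Restic:  {restic_days * 24:.1f} hours")
```

The same linear scaling puts the Restic restore at a matter of hours instead of days.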

Compression

As a curiosity, let’s look at how each tool did with compressing the archives.[5] The only compression to be found is in the three disk images, which are mostly the same. For reference, recall that tar -cJf compressed the three files to 16GB, which is a ratio of 2.25.

$ tarsnap --configfile tarsnap.config --print-stats -f 2022-12-28_16-12-44_xfsdump -f 2022-12-28_15-40-53_xfsdump -f 2022-12-28_15-57-39_xfsdump
                                       Total size  Compressed size
All archives                          36910640691      21689339506
  (unique data)                       12502260818       7832228513
2022-12-28_16-12-44_xfsdump           11871217344       6801465399
  (unique data)                         115988023         67828904
2022-12-28_15-40-53_xfsdump           11868948776       6799338774
  (unique data)                         112927367         68826211
2022-12-28_15-57-39_xfsdump           11870489336       6800797623
  (unique data)                          92878595         58701545

If you do some subtraction you find that each of the xfsdumps has roughly 6.7GB of non-unique data and roughly 58-69MB of unique data (both compressed). Using my rough math this gives 6.7 (shared) + 0.069 + 0.068 + 0.058 = 6.9GB of total space to store 36GB, or a 5.22 compression ratio.
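Here is the same rough math as a Python sketch, using the exact byte counts from the --print-stats output above. Like the prose, it assumes the non-unique portion is the same data in all three archives, which the stats don’t strictly prove. The variable names are mine.

```python
# name: (total size, compressed size, unique compressed size), in bytes
archives = {
    "2022-12-28_16-12-44_xfsdump": (11871217344, 6801465399, 67828904),
    "2022-12-28_15-40-53_xfsdump": (11868948776, 6799338774, 68826211),
    "2022-12-28_15-57-39_xfsdump": (11870489336, 6800797623, 58701545),
}

# Shared (non-unique) compressed data per archive: compressed minus unique.
shared = [comp - uniq for _, comp, uniq in archives.values()]
mean_shared = sum(shared) / len(shared)

# Assumption: the shared data is stored once, plus each archive's unique data.
unique_total = sum(uniq for _, _, uniq in archives.values())
stored = mean_shared + unique_total
logical = sum(total for total, _, _ in archives.values())

print(f"stored:  {stored / 1e9:.2f} GB")
print(f"logical: {logical / 1e9:.2f} GB")
print(f"ratio:   {logical / stored:.2f}")
```

With unrounded inputs the ratio comes out slightly lower than the 5.22 in the text, mostly because the logical size is a bit under 36GB.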

As for Restic,

$ restic stats --mode=raw-data fc381f78 88b07b3f 3cb72f15
repository fd1ef589 opened (repository version 2) successfully, password is correct
scanning...
Stats in raw-data mode:
Snapshots processed:   3
   Total Blob Count:   13380
         Total Size:   6.295 GiB

Note that 6.295GiB = 6.76GB, which implies roughly the same compression ratio as Tarsnap. In all, both tools do well at compression.
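For a like-for-like number, the Restic ratio can be sketched the same way in Python. Restic’s stats output above doesn’t include the logical input size, so this borrows the sum of the three Tarsnap “Total size” figures as an approximation of it; that substitution is my assumption.

```python
GIB = 2**30

# Sum of the three per-archive "Total size" values from Tarsnap's stats,
# used here as an approximation of the logical input size (an assumption).
LOGICAL_BYTES = 11871217344 + 11868948776 + 11870489336

restic_stored_bytes = 6.295 * GIB  # "Total Size" from restic stats --mode=raw-data
restic_ratio = LOGICAL_BYTES / restic_stored_bytes

print(f"Restic stored: {restic_stored_bytes / 1e9:.2f} GB")
print(f"Restic ratio:  {restic_ratio:.2f}")
```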

Pruning

The last test will be pruning, or deleting old backups. I’m going to keep things simple and delete all snapshots/archives except the last one, that is the 16:12 disk image.

For Tarsnap,

$ time tarsnap --configfile ../tarsnap.config -dv -f wwi_training_videos -f hello-world -f 2022-12-28_15-40-53_xfsdump -f 2022-12-28_15-57-39_xfsdump
Deleting archive "wwi_training_videos"
                                       Total size  Compressed size
All archives                          35610659037      20401603770
  (unique data)                       11202659363       6544576778
wwi_training_videos                    1299981654       1287735736
Deleted data                           1299601455       1287651735
Deleting archive "hello-world"
                                       Total size  Compressed size
All archives                          35610655456      20401601796
  (unique data)                       11202655782       6544574804
hello-world                                  3581             1974
Deleted data                                 3581             1974
Deleting archive "2022-12-28_15-40-53_xfsdump"
                                       Total size  Compressed size
All archives                          23741706680      13602263022
  (unique data)                       11089728415       6475748593
2022-12-28_15-40-53_xfsdump           11868948776       6799338774
Deleted data                            112927367         68826211
Deleting archive "2022-12-28_15-57-39_xfsdump"
                                       Total size  Compressed size
All archives                          11871217344       6801465399
  (unique data)                       10968340960       6403712363
2022-12-28_15-57-39_xfsdump           11870489336       6800797623
Deleted data                            121387455         72036230

real    0m50.110s
user    0m5.140s
sys     0m0.363s

For Restic,

$ time restic forget --prune 1ea367a7 5f8b4a34 fc381f78 88b07b3f
repository fd1ef589 opened (repository version 2) successfully, password is correct
[0:00] 100.00%  4 / 4 files deleted
4 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 1 snapshots
[0:00] 100.00%  1 / 1 snapshots
searching used packs...
collecting packs for deletion and repacking
[0:00] 100.00%  467 / 467 packs processed

to repack:             0 blobs / 0 B
this removes:          0 blobs / 0 B
to delete:           894 blobs / 1.271 GiB
total prune:         894 blobs / 1.271 GiB
remaining:         13313 blobs / 6.230 GiB
unused size after prune: 166.255 MiB (2.61% of remaining size)

rebuilding index
[0:00] 100.00%  385 / 385 packs processed
deleting obsolete index files
[0:00] 100.00%  6 / 6 files deleted
removing 82 old packs
[0:07] 100.00%  82 / 82 files deleted
done

real    0m13.132s
user    0m1.570s
sys     0m0.315s

Other Considerations

I mentioned in the previous post that a big impetus to switch was the cost difference. Letting 100GB sit for 1 year costs $300 on Tarsnap and $6 on Backblaze.[6] This hasn’t changed, and I would likely still switch even if Restic were worse in other ways. Still, I think there are a few other considerations to look at.
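The cost comparison works out as follows in a small Python sketch. The per-GB prices are assumptions on my part, chosen because they reproduce the $300 and $6 figures: $0.25 per GB-month for Tarsnap and $0.005 per GB-month for B2. It covers storage only and ignores bandwidth and transaction fees.

```python
# Assumed storage prices in USD per GB-month; check current pricing pages.
TARSNAP_USD_PER_GB_MONTH = 0.25
B2_USD_PER_GB_MONTH = 0.005

def yearly_storage_cost(stored_gb: float, usd_per_gb_month: float) -> float:
    """Cost of keeping `stored_gb` at rest for 12 months, storage only."""
    return stored_gb * usd_per_gb_month * 12

def gb_within_budget(usd_per_month: float, usd_per_gb_month: float) -> float:
    """How many GB can sit in storage for a given monthly budget."""
    return usd_per_month / usd_per_gb_month

tarsnap_year = yearly_storage_cost(100, TARSNAP_USD_PER_GB_MONTH)
b2_year = yearly_storage_cost(100, B2_USD_PER_GB_MONTH)

print(f"Tarsnap:   ${tarsnap_year:.2f} per year")
print(f"Backblaze: ${b2_year:.2f} per year")
print(f"$5/month on Tarsnap stores {gb_within_budget(5, TARSNAP_USD_PER_GB_MONTH):.0f} GB")
```

The last line is the arithmetic behind the 20GB figure in the final footnote.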

One thing I came across in using Restic is locks. Both Tarsnap and Restic require some sort of transaction system to protect data from concurrent writes. Tarsnap has a transaction system built on top of S3, but I’ve never had to care about this since the details are handled by Tarsnap’s own servers. Restic creates a directory on B2 called locks with JSON files describing what’s being locked, how, and by whom. I’ve run into multiple situations where Restic’s locks went out of sync, usually because I exited in the middle of a backup or restore, and I had to manually clear them with restic unlock. This becomes uncomfortable if you can’t be sure other Restic operations aren’t happening at the same time, and it speaks to the advantage of having a server between you and the storage provider.

Ergonomically I much prefer Restic’s interface. Tarsnap seems to expect that you’re using some tool on top of it, since it doesn’t attach much metadata to your backups and doesn’t provide any tools to prune old backups. Otherwise it’s up to the user to choose descriptive archive names so you know when and where each backup was created. Restic, by contrast, doesn’t require you to name anything and provides a very flexible restic forget command.
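As an illustration of the kind of wrapper Tarsnap seems to expect, here is a minimal Python sketch that builds timestamped archive names and picks which ones to delete under a “keep the newest N” policy. The naming scheme copies the timestamp-prefix layout of my xfsdump archives; the function names and the policy are hypothetical, not something Tarsnap provides.

```python
from datetime import datetime

STAMP = "%Y-%m-%d_%H-%M-%S"  # timestamp prefix, e.g. 2022-12-28_15-40-53
STAMP_LEN = 19               # length of a formatted timestamp

def archive_name(label: str, when: datetime) -> str:
    """Build an archive name whose prefix records when it was created."""
    return f"{when.strftime(STAMP)}_{label}"

def archive_time(name: str) -> datetime:
    """Recover the creation time from a name built by archive_name()."""
    return datetime.strptime(name[:STAMP_LEN], STAMP)

def archives_to_delete(names: list[str], keep_last: int) -> list[str]:
    """Return every archive except the `keep_last` newest, oldest first."""
    ordered = sorted(names, key=archive_time)
    return ordered[: max(len(ordered) - max(keep_last, 0), 0)]

names = [
    "2022-12-28_16-12-44_xfsdump",
    "2022-12-28_15-40-53_xfsdump",
    "2022-12-28_15-57-39_xfsdump",
]
stale = archives_to_delete(names, keep_last=1)
print(stale)
```

Each name in the result could then be passed to tarsnap -d with its own -f flag, as in the pruning test. Names without a timestamp prefix, such as hello-world, would raise a ValueError here and need separate handling.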

To give an example, here’s what Restic shows when listing snapshots.

$ restic snapshots
repository fd1ef589 opened (repository version 2) successfully, password is correct
ID        Time                 Host        Tags                 Paths
---------------------------------------------------------------------------------------------------------------------------
1ea367a7  2023-01-02 14:35:30  patrick                          /media/shared/bu_test/hello.txt
5f8b4a34  2023-01-02 15:23:48  patrick     wwi_training_videos  /media/shared/bu_test/video
fc381f78  2023-01-02 17:46:30  patrick                          /media/shared/bu_test/diskimage/2022-12-28_15-40-53.xfsdump
88b07b3f  2023-01-02 18:04:26  patrick                          /media/shared/bu_test/diskimage/2022-12-28_15-57-39.xfsdump
3cb72f15  2023-01-02 18:09:41  patrick                          /media/shared/bu_test/diskimage/2022-12-28_16-12-44.xfsdump
---------------------------------------------------------------------------------------------------------------------------

Asking Tarsnap to list archives gives just their names, and they’re not even in any particular order.

$ tarsnap --configfile ../tarsnap.config --list-archives
wwi_training_videos
2022-12-28_16-12-44_xfsdump
hello-world
2022-12-28_15-40-53_xfsdump
2022-12-28_15-57-39_xfsdump

Tarsnap is more minimalist in its output, even with the verbose flag. When backing up a large file there’s no indication, except for the 1GB checkpoints I set in the configuration, of progress. Restic shows and updates the percentage complete on backup, but not restore.

Another feature I can’t find an equivalent of in Tarsnap is restic find and restic cat, which let you find and print individual files. Tarsnap does allow restoring only a certain file or directory from within an archive.

Closing Remarks

Most of this analysis doesn’t matter because the cost difference is so stark. Restic is going to serve as the “1” in my 3-2-1 backup scheme, so the single region is fine. What I’m hoping to gain from the lower cost is the ability to cast a wider net without worrying about the bill.

That said, even if cost weren’t a factor, the restore times are genuinely concerning. Even knowing I’ll eventually get my data, I would be very stressed if it took hours or days to restore. And there’s a good chance I’ll need multiple attempts at restoring, for example if I need the data before I can buy a replacement for a failed drive.


  1. I’m running tarsnap 1.0.40 and restic 0.14.0. A speed test has my upload and download exceeding 300Mbps.

  2. Ignore my mistake in tagging these videos wwi instead of wwii.

  3. 1GB = 10**9 bytes, 1GiB = 2**30 bytes.

  4. Tarsnap insists the user use --snaptime for filesystem snapshots to prevent race conditions, but this isn’t relevant here.

  5. I’m treating de-duplication and compression as the same thing in this article.

  6. There’s an example on Tarsnap’s homepage where “Alice pays less than $5/month” for “several terabytes of non-unique data”. From what I can tell this requires Tarsnap to compress the data down to 20GB, which seems unlikely in my situation.