Published by bp99 on Mon May 24 08:05:03 2021


How I back-up my computers with duplicity and Backblaze B2

One of the first things you should do after installing a new server or workstation is to set up a backup solution. Really, do not skip this. This is one of those things you do not notice normally, but once you get in trouble, you will really wish you had that backup.

The setup

There are many ways to back up stuff. For starters, there are dump(8) and restore(8). I do not have a lot of experience with these, but it is probably a good idea to get to know them at some point. They are the de facto backup tools on UNIX-like systems.

What I ended up using and liking is duplicity. Mostly because it supports encryption with GPG and uploading to Backblaze B2 buckets out of the box.

I have a little script called backup under /root/bin/ which I call from /etc/daily.local. It could be cron as well though. I just prefer to ‘extend’ daily(8) and the like unless I need fine-tuned timing.

The script reads a file /etc/backup_paths, which is simply a list of paths to be backed up. It goes over each line of this file and calls duplicity with the appropriate parameters and environment variables to back up the location and upload it to my B2 bucket. I have a separate bucket for each server/workstation.
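As an illustration, such a file could look like this (the paths here are made up; as the script below shows, blank lines and lines starting with ‘#’ are skipped):

```
# /etc/backup_paths -- one absolute path per line
/etc
/var/mail
/home/bp99
```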

Prerequisites

We need python3 with pip, duplicity, and the b2sdk Python library. On OpenBSD, do

# pkg_add python3 py3-pip duplicity
# python3 -m pip install b2sdk

You will also need gpg2 in case you do not already have it. It is not in OpenBSD base; install it with pkg_add gnupg.

A little fix

Unfortunately, at the time of writing, there is a little bug in the latest available duplicity in OpenBSD’s ports tree. Namely, it uses a function in b2sdk that has since been refactored.

You can see this by running the script and immediately getting

# /root/bin/backup
Traceback (innermost last):
  File "/usr/local/bin/duplicity", line 82, in <module>
    with_tempdir(main)
  File "/usr/local/bin/duplicity", line 68, in with_tempdir
    fn()
  File "/usr/local/lib/python3.8/site-packages/duplicity/dup_main.py", line 1518, in main
    action = commandline.ProcessCommandLine(sys.argv[1:])
  File "/usr/local/lib/python3.8/site-packages/duplicity/commandline.py", line 1190, in ProcessCommandLine
    backup, local_pathname = set_backend(args[0], args[1])
  File "/usr/local/lib/python3.8/site-packages/duplicity/commandline.py", line 1061, in set_backend
    config.backend = backend.get_backend(bend)
  File "/usr/local/lib/python3.8/site-packages/duplicity/backend.py", line 223, in get_backend
    obj = get_backend_object(url_string)
  File "/usr/local/lib/python3.8/site-packages/duplicity/backend.py", line 209, in get_backend_object
    return factory(pu)
  File "/usr/local/lib/python3.8/site-packages/duplicity/backends/b2backend.py", line 103, in __init__
    (self.path, bucket_name, self.service.account_info.get_minimum_part_size()), log.INFO)
AttributeError: 'InMemoryAccountInfo' object has no attribute 'get_minimum_part_size'

That get_minimum_part_size at the end is what is giving us trouble. You can find the reason with a quick search online, but it also works to check out the b2-sdk-python repository and look around there:

b2-sdk-python $ git prettylog -S get_minimum_part_size --source
...
| * 5ca59a2 - review fix and test upgrade   (3 weeks ago) <mpnowacki-reef>
...
| * | 12c23d1 - refactored AccountInfo tests to single file using pytest   (4 weeks ago) <mpnowacki-reef>
|/ /
... 
| * f58b7fa - minimum_part_size refactored to recommended_part_size and added support for absolute_minimum_part_size   (4 weeks ago) <mpnowacki-reef>
|/
...
| | * | | | | | 3427cfa - Complex file creation scheme using multiple upload/copy sources   (1 year, 3 months ago) <Michal Zukowski>
...
| * 186ba5b - Copy tests, change deps   (2 years ago) <Pawel Polewicz>
...
| * 28a12f5 - test_console_tool suite moved to CLI package   (2 years, 3 months ago) <Dmitry Paramonov>
...
| * | 660477e - Add InMemoryAccountInfo   (4 years, 8 months ago) <Pawel Polewicz>
...
| * | 34ec4bc - Remove old account_info.py   (4 years, 11 months ago) <Pawel Polewicz>
...
| | * | 2f38539 - Move AccountInfo and related classes to their own directory   (4 years, 11 months ago) <Pawel Polewicz>
...
| * 1f28486 - Remove dependency on portalocker by using sqlite.   (5 years ago) <Brian Beach>
...
| * cb578c0 - Add --threads option to upload_file.   (5 years ago) <Brian Beach>
...
| | | * 48dcd15 - Complete unit tests for StoredAccountInfo.   (5 years ago) <Brian Beach>
...
| * ce96617 - Factor out AccountInfo classes into account_info.py   (5 years ago) <Brian Beach>
...
| * e9881a6 - Factor out Bucket class into bucket.py.   (5 years ago) <Brian Beach>
...
| * 1808501 - First ConsoleTool test case.   (5 years ago) <Brian Beach>
...
| | | * | 7eac0f1 - Add a unit function to figure how to break up a large file into parts.   (5 years ago) <Brian Beach>
| | | * | fd376ff - Fix a couple bugs that the integration tests found.  And yapf.   (5 years ago) <Brian Beach>
| | | * | c58c437 - Add decision point to decide which files should be uploaded in parts.   (5 years ago) <Brian Beach>

(prettylog is a git alias of mine; this works with a plain git log as well, it just will not be as pretty)

Commit f58b7fa literally says ‘minimum_part_size refactored to recommended_part_size.’

(By the way, perhaps it would have been easier to first look in CHANGELOG.md as it contains the same information, but it is sometimes good to overcomplicate stuff to learn new things I guess.)

So an ugly hack to fix our problem is to edit /usr/local/lib/python3.8/site-packages/duplicity/backends/b2backend.py and change the occurrences of minimum_part_size to recommended_part_size.
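The edit boils down to a global rename, which sed can do. The sketch below applies it to a scratch stand-in file so it can be tried safely; on the real system you would point sed at the b2backend.py path above instead:

```shell
# Create a scratch file standing in for b2backend.py, then apply the rename.
printf 'size = self.service.account_info.get_minimum_part_size()\n' > /tmp/b2demo.py
sed -i.bak 's/minimum_part_size/recommended_part_size/g' /tmp/b2demo.py
cat /tmp/b2demo.py
```

The -i.bak keeps the original around as /tmp/b2demo.py.bak, which is a good idea before hacking at files under /usr/local.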

I would not necessarily recommend this method as b2backend.py is managed by pkg_add and you should not hack modifications into its code. A better solution would be to downgrade b2sdk to a version which still co-operates with duplicity, such as 1.7.0.

# python3 -m pip install b2sdk==1.7.0

The backup script

The script is very simple and I wrote it a long time ago. Here it goes:

#!/bin/ksh -e
#
# The list of backup locations is specified in a file with the following
# format:
#       /path/to/first/dir
#       /path/to/2nd/dir
#       ...
# This file is supplied by the BACKUP_PATHS environment variable.
# If unset, this script shall try /etc/backup_paths
#
# Another environment variable used is PASSPHRASE that must be set to
# the desired GnuPG passphrase. Obviously, take care not to expose this
# value.
#
# Since the script is made for BackBlaze B2, three more things must be
# supplied:
# - an app key ID (APPKEYID)
# - an app key (APPKEY)
# - a bucket name (BUCKET)
# Again, make sure to keep at least APPKEY confidential.
#
# The script has a verbose mode that can be enabled by passing the
# switch -v.

export PASSPHRASE=your_secret_gpg_passphrase
export APPKEYID=xx123456789123456789abcde
export APPKEY=j3ij19ksJjSkdsn92819120sndas938
export BUCKET=cauliflower-backup

duplicity=/usr/local/bin/duplicity
duplicity_flags='--allow-source-mismatch --gpg-binary gpg2'

function err { >&2 printf 'error: %s\n' "$*"; exit 1; }

function backup
{
        [[ -n $1 ]] || err missing location argument
        loc="$1"

        if pkill gpg-agent && [[ -n $verb ]]; then
                print killed previous gpg-agent process
        fi
        [[ -n $verb ]] && printf 'now backing up: %s\n' "$loc"
        $duplicity $duplicity_flags \
            "$loc" "b2://$APPKEYID:$APPKEY@$BUCKET/$loc"
}


BACKUP_PATHS=${BACKUP_PATHS:-/etc/backup_paths}
[[ -f $BACKUP_PATHS ]] || err cannot open BACKUP_PATHS -- $BACKUP_PATHS

[[ -n $PASSPHRASE ]] || err PASSPHRASE must be set

for b2var in APPKEYID APPKEY BUCKET; do
        eval "[[ -n \$$b2var ]]" || err $b2var must be set
done


unset verb
usage='vh'
while getopts $usage optchar; do
        case $optchar in
        v) verb=yes;;
        h) err usage: $0 '[-v]';;
        esac
done

while read -r loc; do
        case $loc in
        '' | \#*) continue;;
        esac
        backup "$loc"
done <"$BACKUP_PATHS"

There is really no need to break the script down as it is extremely simple. The things to note are the environment variables defined at the beginning of the file. These are sensitive, so be sure to keep the script secure, i.e. owned by root with mode 0700 (readable and executable by root only; the execute bit is needed because /etc/daily.local runs the script as a command).
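Concretely, the lockdown amounts to a chown and a chmod. Demonstrated here on a scratch file so it runs unprivileged; on the real system you would run the same chmod as root on /root/bin/backup:

```shell
# Stand-in for /root/bin/backup; mktemp creates it mode 0600.
script=$(mktemp)
# 0700: read/write/execute for the owner (root on the real system), nothing
# for anyone else.
chmod 0700 "$script"
ls -l "$script" | cut -c1-10   # prints -rwx------
```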

Running the script

This is a snippet from the verbose output:

killed previous gpg-agent process
now backing up: /var/mail
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Mon May 24 08:05:58 2021
--------------[ Backup Statistics ]--------------
StartTime 1621839894.13 (Mon May 24 09:04:54 2021)
EndTime 1621839894.17 (Mon May 24 09:04:54 2021)
ElapsedTime 0.04 (0.04 seconds)
SourceFiles 3
SourceFileSize 512 (512 bytes)
NewFiles 0
NewFileSize 0 (0 bytes)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 0
RawDeltaSize 0 (0 bytes)
TotalDestinationSizeChange 111 (111 bytes)
Errors 0
-------------------------------------------------

I just append the line /root/bin/backup to /etc/daily.local, which makes the backup script run every day. Since daily(8) mails its output to root, you can even get these nice stats in an email sent to you daily (which is what I get).

Restoring

Well, I have not had to do this so far.

To be fair, you should always test your backup solution to make sure restoration actually works. I will have to do that another day though, perhaps in another post ☺
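For the record, a restore with this setup would look something like the following. This is an untested sketch: the environment variables are the same ones the backup script exports, and the target directory is made up.

```shell
# Restore the latest backup of /var/mail into a scratch directory.
# PASSPHRASE, APPKEYID, APPKEY and BUCKET as in the backup script.
export PASSPHRASE=your_secret_gpg_passphrase
duplicity restore --gpg-binary gpg2 \
    "b2://$APPKEYID:$APPKEY@$BUCKET//var/mail" /tmp/restored-mail
```

duplicity(1) also takes --file-to-restore to pull out a single file and --time to restore the state from a given point in time.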