This tutorial walks you through setting up automated, encrypted backups to Amazon’s S3 using duplicity. It was written as a follow-up to Using Duplicity and Amazon S3 – Notes and Examples, in order to organize all the necessary information into a simple tutorial.
We’ll start by creating a simple wrapper for duplicity:
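#! /usr/bin/python
# Wrapper that runs duplicity with the AWS credentials and passphrase set.
import sys
import os

duplicity_bin = '/usr/bin/duplicity'

env = {
    'AWS_ACCESS_KEY_ID': 'PUT YOUR KEY ID HERE',
    'AWS_SECRET_ACCESS_KEY': 'PUT YOUR SECRET ACCESS KEY HERE',
    'PASSPHRASE': 'PUT ENCRYPTION PASSPHRASE',
}
# Variables already set in the calling environment take precedence.
env.update(os.environ)

# Replace this process with duplicity, passing our arguments through.
os.execve(duplicity_bin, sys.argv, env)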
Save this as duplicity-wrapper.py and chmod 0500 it so that only you can read and execute it.
Note: You’ll want to write down the passphrase and store it in a safe location (preferably in two separate locations). That way, in case you need to restore the backups, you won’t have useless encrypted files.
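Since the wrapper holds both your AWS credentials and the passphrase, it may be worth double-checking that the chmod step actually took effect. The following is a minimal sketch, not part of the original setup: the helper name is_private_executable is made up for illustration, and it assumes a POSIX system where the permission bits mean what chmod says they mean.

```python
import os
import stat
import tempfile


def is_private_executable(path):
    """Return True if the owner can read and execute `path` and nobody else has any access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # No permission bits at all for group or others.
    no_group_other = (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    # The owner must be able to read and execute the script.
    owner_rx = (mode & stat.S_IRUSR) != 0 and (mode & stat.S_IXUSR) != 0
    return no_group_other and owner_rx


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than the real wrapper.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "duplicity-wrapper.py")
        with open(demo, "w") as handle:
            handle.write("#! /usr/bin/python\n")
        os.chmod(demo, 0o500)
        print(is_private_executable(demo))
        os.chmod(demo, 0o644)
        print(is_private_executable(demo))
```

To check the real file, call the function with the path where you saved duplicity-wrapper.py.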
Now edit your crontab and add a line like the following:
10 1 * * 0 /path/to/duplicity-wrapper.py /path/to/folder/ s3+http://bucket-name/somefolder >> ~/log/backups.log 2>&1
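This will create a weekly backup of /path/to/folder, run every Sunday at 01:10. If no earlier backup exists, duplicity makes a full backup; otherwise it makes an incremental one. The backup will be encrypted with whatever passphrase you’ve given in duplicity-wrapper.py. The output of the backup process will be appended to ~/log/backups.log.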
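If you end up with several folders to back up, it can help to generate the crontab lines instead of typing them by hand. Here is a small sketch; the function crontab_line is hypothetical and simply formats a string. It uses the ">> file 2>&1" form of redirection, which plain /bin/sh understands (cron commonly runs commands with /bin/sh rather than bash).

```python
def crontab_line(minute, hour, weekday, command, log_path):
    """Format a crontab entry that appends stdout and stderr to log_path."""
    # Fields: minute, hour, day of month, month, day of week (0 = Sunday).
    schedule = "{} {} * * {}".format(minute, hour, weekday)
    # ">> file 2>&1" is the POSIX sh way to append both output streams.
    return "{} {} >> {} 2>&1".format(schedule, command, log_path)


print(crontab_line(
    10, 1, 0,
    "/path/to/duplicity-wrapper.py /path/to/folder/ s3+http://bucket-name/somefolder",
    "~/log/backups.log",
))
```

The printed line can be pasted into the crontab as-is.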
You should also periodically run
/path/to/duplicity-wrapper.py full /path/to/folder/ s3+http://bucket-name/somefolder
to force a new full backup, so that restores don’t depend on an ever-growing chain of incremental backups. You might also want to check on your backups from time to time:
/path/to/duplicity-wrapper.py collection-status s3+http://bucket-name/somefolder
/path/to/duplicity-wrapper.py verify s3+http://bucket-name/somefolder /path/to/folder/
The first command reports the status of the backups stored in the bucket, and the second verifies them.
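If you want to run these checks from a script (for example from another cron job), something along these lines might do. It is only a sketch: the function names are invented for this example, and it assumes that duplicity signals a problem through a non-zero exit status.

```python
import subprocess


def build_check_commands(wrapper, backup_url, folder):
    """Return the two check commands from the text as argument lists."""
    return [
        [wrapper, "collection-status", backup_url],
        [wrapper, "verify", backup_url, folder],
    ]


def run_checks(commands):
    """Run each command in order; return the first one that fails, or None."""
    for argv in commands:
        # A non-zero exit status is treated as a failed check.
        if subprocess.run(argv).returncode != 0:
            return argv
    return None
```

Calling run_checks(build_check_commands("/path/to/duplicity-wrapper.py", "s3+http://bucket-name/somefolder", "/path/to/folder/")) returns None when both commands succeed, so anything else is worth logging or mailing to yourself.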
And last but not least, in case you ever need the backups, you can restore them using:
/path/to/duplicity-wrapper.py restore s3+http://bucket-name/somefolder /path/to/folder/
Security Considerations
As I know some people will comment on saving the encryption passphrase in plain text in a file, I will explain my reasoning. I use the above encryption in order to secure my files in case of data leakage from Amazon S3. In order to read my backups, or silently tamper with them, someone would have to get the passphrase from my machine. While this isn’t impossible, I would say it’s unlikely. Furthermore, anyone with enough access to read files from my computer doesn’t need the backups; they can access the files directly.
I’ve given some thought to making the backups more secure, but it seems you always have to compromise on either automation or incremental backups. But, as I wrote, the current solution seems to me strong enough given the circumstances. Nonetheless, if you’ve got a better solution, it would be nice to hear.
I’d be interested to hear alternatives also.
If I remember correctly, getting your Amazon credentials also means acquiring a broad set of abilities, including:
– deleting your important stuff
– costing you a lot of money by using lots of storage or making lots of API calls (remember that Amazon does not allow you to set a cap on them)
You’re exactly right regarding Amazon. I tend to do DVD backups of my important files to complement the S3 backups, but I don’t do them as often.
Regarding the alternatives, I’m beginning to think that it is possible. It would require using a public key to encrypt the backups, while keeping a local cache of the signatures unencrypted. I think duplicity already has such a cache and even uses it, so it would probably just be a matter of checking that duplicity is willing to work given only the public key (or else hacking it to act that way).
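As a rough sketch of the public-key idea above: duplicity has an --encrypt-key option that takes a GnuPG key ID, so the backup command could be assembled as shown here. The function name and the key ID in the usage note are placeholders, and whether incremental backups then keep working with only the public key on the machine is exactly the open question raised above; this only shows how the command line might be built.

```python
def build_public_key_backup(source, backup_url, key_id,
                            duplicity_bin="/usr/bin/duplicity"):
    """Build a duplicity command that encrypts to a GnuPG public key."""
    return [
        duplicity_bin,
        # Encrypt to this key rather than with the symmetric PASSPHRASE.
        "--encrypt-key", key_id,
        source,
        backup_url,
    ]
```

For example, build_public_key_backup("/path/to/folder/", "s3+http://bucket-name/somefolder", "ABCD1234") yields an argument list that can be handed to subprocess.run or os.execve, with ABCD1234 replaced by your own key ID.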