Automate backups to S3 buckets

Photo by Tyler on Unsplash

Many web projects store user uploads, such as images and PDF documents, in the filesystem on the VPS (Virtual Private Server).

This article will teach you how to use duplicity-backup-s3 to back up your web projects to an S3-compatible storage bucket provided by Linode.

Setup

Installing dependencies

Start by installing the needed dependencies on your VPS. Since duplicity-backup-s3 is a Python library built on top of duplicity, you must first install duplicity together with the python3-boto package, which duplicity uses to connect to your S3 bucket.

sudo apt install duplicity python3-boto
sudo python3 -m pip install duplicity-backup-s3
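
To confirm that both tools were installed correctly, you can ask each for its version and usage information (the exact output varies between versions):

duplicity --version
duplicity_backup_s3 --help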

Config

Create config files for each directory you want to back up.

To simplify management, it's best to create a config file for each project directory you want to back up. Since the root user will run the backup scripts, all config files will be stored in a directory called duplicity in the root user's home directory. Start by creating that directory.

sudo mkdir /root/duplicity
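
Since these config files will contain your S3 credentials, it's wise to make the directory accessible to root only:

sudo chmod 700 /root/duplicity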

Next, create an object storage bucket on Linode and an access key with both read and write permissions for that bucket. Take note of the access key and secret before proceeding.

NB! It's critical that the backups are accessible only to someone with an access key or access to your Linode dashboard. Do not allow reading or enumeration of your backups without an access key.

I recommend using a bucket dedicated to backups to keep them simple to secure.
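
If you have s3cmd installed, you can optionally confirm that your new key can list the bucket; this sketch assumes the Frankfurt endpoint used later in this article:

s3cmd ls s3://<bucket-name> \
  --access_key=<access-key> \
  --secret_key=<secret-key> \
  --host=eu-central-1.linodeobjects.com \
  --host-bucket="%(bucket)s.eu-central-1.linodeobjects.com"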

Once you have a bucket and an access key, you can create specific config files for each directory you want to back up.

sudo vim /root/duplicity/<site-name>.yaml
<site-name>.yaml
aws:
  AWS_ACCESS_KEY_ID: access_token
  AWS_SECRET_ACCESS_KEY: secret_key
backuproot: <site-path>
excludes:
  - <site-path>/vendor
  - <site-path>/storage
  - <site-path>/node_modules
full_if_older_than: 7D
log-path: /var/log/duplicity_backup/
remote:
  endpoint: s3://eu-central-1.linodeobjects.com
  bucket: <bucket-name>
  path: <bucket-path>
volsize: 512

In the template above (which is based on a typical PHP project), replace <site-path> with the absolute path to your project root, <bucket-name> with the name of your object storage bucket, and <bucket-path> with the path inside the bucket where you want to store the backup.

If you use a data center other than Frankfurt, Germany, you must change the endpoint address accordingly.
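
For example, if your bucket lives in Linode's Newark data center instead, the endpoint would be:

endpoint: s3://us-east-1.linodeobjects.com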

The backup is recursive, meaning it will include all sub-directories of the backuproot (<site-path>). If you want to exclude other specific directories, change the list of excludes in the config.

This config will perform a full backup every seven days and incremental backups in between. Adjust this to your needs.
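
For example, to only take a full backup once a month, you would set:

full_if_older_than: 30D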

To test your configuration, run this command as root, replacing <site-name> with the name of your config file:

duplicity_backup_s3 incr --config /root/duplicity/<site-name>.yaml
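
Remember that a backup is only as good as your ability to restore it. Since duplicity-backup-s3 wraps duplicity, you can restore with duplicity itself. A minimal sketch, assuming the bucket details from the config above and that your setup does not require a GPG passphrase; /tmp/restore-test is just an example target:

export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-key>
duplicity restore s3://eu-central-1.linodeobjects.com/<bucket-name>/<bucket-path> /tmp/restore-test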

Automate backup

Automate the (boring and) important stuff.

If your configuration ran successfully, you can automate the backup process with a Python script that runs through all the config files in the duplicity directory.

sudo vim /root/backup.py
backup.py
import os
import subprocess

# All backup configs live in /root/duplicity
os.chdir("/root")
config_files = [f for f in os.listdir("duplicity") if f.endswith(".yaml")]

for file in config_files:
    # Create an incremental backup (duplicity-backup-s3 makes a full
    # backup instead when the last one is older than full_if_older_than)
    subprocess.run(['duplicity_backup_s3', 'incr', f'--config=./duplicity/{file}'])

    # Delete backups older than seven days
    subprocess.run(['duplicity_backup_s3', 'remove', '--older-than', '7D', f'--config=./duplicity/{file}'])
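
Before scheduling the script, run it once manually to confirm that it processes every config file without errors:

sudo python3 /root/backup.py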

Once the script is created, you can add it to the root user's crontab.

sudo crontab -e
root crontab
...
# m h  dom mon dow   command
30 1 * * * python3 /root/backup.py

This makes the script run at 01:30 each night.
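
If you want a record of each run, you can redirect the script's output to a log file of your choosing (the path here is just an example):

30 1 * * * python3 /root/backup.py >> /var/log/backup-duplicity.log 2>&1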

Congratulations, you now have a simple, but effective backup of your files. 🎉