Linux: incremental backup using rsync on btrfs with snapshots

My production data is backed up by Bacula. That works well for data that is "online" all the time, but it is not so convenient for my notebook and media PC. Until now I was using a simple rsync script to "copy" everything important to a USB drive or iSCSI LUN. It worked fine but had one catch: there was no way to retrieve older versions of files.

I wanted to be able to retrieve older versions of files as well, so I combined my rsync backup with btrfs and its snapshot feature. Here is the result:

Before first backup

I am using an iSCSI LUN as the backup target. Skip the iSCSI parts if you are using a USB drive.

iSCSI configuration

It is important to password-protect the target; otherwise any device on the local network can overwrite it. I also highly recommend enabling the header and data digests, as there is a possibility of data corruption without them.

The configuration of the iSCSI initiator is stored in /etc/iscsi/iscsid.conf. The following lines are mandatory:

node.session.auth.authmethod = CHAP
node.session.auth.username = <user>
node.session.auth.password = <pass>

node.conn[0].iscsi.HeaderDigest = CRC32C
node.conn[0].iscsi.DataDigest = CRC32C

All other options can be left at their defaults.
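
The same settings can also be applied per target once the node record exists (i.e. after the discovery step in the next section), without editing the global file. A minimal sketch using iscsiadm's update operation, with the target name from my setup:

# iscsiadm -m node -T iqn.2000-01.com.synology:narnia-nas.lukas \
    -o update -n node.session.auth.authmethod -v CHAP
# iscsiadm -m node -T iqn.2000-01.com.synology:narnia-nas.lukas \
    -o update -n node.session.auth.username -v <user>
# iscsiadm -m node -T iqn.2000-01.com.synology:narnia-nas.lukas \
    -o update -n node.session.auth.password -v <pass>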

iSCSI discovery and login

It is necessary to discover the iSCSI target and log in to it before the system can access the drive. This can be done with the following commands (replace "narnia-nas" and the target name with the names used by your storage):

# iscsiadm -m discovery -t st -p narnia-nas
10.x.x.x:3260,1 iqn.2000-01.com.synology:narnia-nas.lukas

# iscsiadm -m node -T iqn.2000-01.com.synology:narnia-nas.lukas --login
Logging in to [iface: default, target: iqn.2000-01.com.synology:narnia-nas.lukas, portal: 10.x.x.x,3260] (multiple)
Login to [iface: default, target: iqn.2000-01.com.synology:narnia-nas.lukas, portal: 10.x.x.x,3260] successful.

A new disk should appear once the kernel detects it (a snippet from the dmesg command):

scsi host4: iSCSI Initiator over TCP/IP
scsi 4:0:0:0: Direct-Access     SYNOLOGY iSCSI Storage    4.0  PQ: 0 ANSI: 5
sd 4:0:0:0: Attached scsi generic sg2 type 0
sd 4:0:0:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 43 00 10 08
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 4:0:0:0: [sdb] Attached SCSI disk
# lsscsi | grep SYN
[4:0:0:0]    disk    SYNOLOGY iSCSI Storage    4.0   /dev/sdb 
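
The kernel device name (/dev/sdb here) is not guaranteed to stay the same across logins, so it can be safer to locate the disk via the persistent symlinks maintained by udev; assuming a standard udev setup:

# ls -l /dev/disk/by-path/ | grep iscsi

The symlink names encode the portal, target IQN and LUN, and point at whatever /dev/sdX node the disk currently has.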

File system

I prefer disk labels, as the device file may change over time. Make sure you select the correct device:

# mkfs.btrfs -L lukas-backup /dev/sdb
btrfs-progs v4.0
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM (100.00GiB) ...
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
Turning ON incompat feature 'skinny-metadata': reduced-size metadata extent refs
fs created label lukas-backup on /dev/sdb
	nodesize 16384 leafsize 16384 sectorsize 4096 size 100.00GiB

Note: TRIM on an iSCSI LUN is a feature of DSM 6.0 running on selected Synology NAS models. It may also be available on NAS devices from other vendors.
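
On a LUN that supports it, freed blocks can also be handed back to the NAS manually at any time; a minimal sketch, assuming the file system is already mounted:

# fstrim -v /mnt/backup

Mounting the file system with the discard option achieves the same thing continuously, at the cost of extra I/O on every delete.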

Mounting the FS for the first time

The file system will be mounted by the script, so it is not necessary for the system itself to mount it during boot. The entry in /etc/fstab should look like this:

# backup
LABEL=lukas-backup	/mnt/backup	btrfs	noauto				0 0

Btrfs can only be managed while it is mounted, so it is necessary to mount it first. The chmod makes sure nothing can write to the mount point while the backup fs is not mounted:

mkdir -p /mnt/backup
chmod  0 /mnt/backup
mount    /mnt/backup
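
A quick read-only check that the label-based mount really picked up the iSCSI disk:

# findmnt /mnt/backup
# btrfs filesystem show /mnt/backup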

Subvolume & quota

Subvolume and snapshot sizes are tracked internally via quota groups. In order to see which snapshot or subvolume takes up how much space, it is necessary to enable quotas at the file system level:

btrfs quota enable /mnt/backup

Note: enable quotas before placing data on the fs, otherwise the initial scan will take some time.

I prefer to use a separate subvolume for each backup. With this configuration it is possible to use one file system as the target for several different backups:

# btrfs sub create /mnt/backup/lukas
Create subvolume '/mnt/backup/lukas'
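
If quotas were enabled only after data had already been written, btrfs scans the existing extents in the background; a sketch for checking on (or restarting) that scan with the rescan subcommand:

# btrfs quota rescan -s /mnt/backup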

Umount & logoff

My script handles the iSCSI session and the mounting of the file system itself, so it is necessary to unmount the file system and disconnect from the NAS first in order to give it a "clean" start.

# umount -v /mnt/backup/
umount: /mnt/backup/ unmounted
# iscsiadm -m node -T iqn.2000-01.com.synology:narnia-nas.lukas --logout
Logging out of session [sid: 2, target: iqn.2000-01.com.synology:narnia-nas.lukas, portal: 10.x.x.x,3260]
Logout of [sid: 2, target: iqn.2000-01.com.synology:narnia-nas.lukas, portal: 10.x.x.x,3260] successful.

Backup script

This script doesn't try to be smart. It simply executes commands and exits immediately if a failure occurs. If the failure occurs while rsync is running, the snapshot is marked as "incomplete".

Note: don't use this script as-is. It was created to fulfill my needs, and it may be necessary to modify it before you can use it in your environment.

#!/bin/bash

echo
echo "To verify using \"--checksum\" execute:"
echo "${0} verify"
echo
sleep 5

if [ "${1}" == "verify" ]
then
        echo "... verify mode selected"
        opt="--checksum"
else
        echo "... normal mode selected"
        opt=""
fi

sleep 5
echo
echo

echo "Discovering narnia-nas ..."
iscsiadm -m discovery -t st -p narnia-nas || exit 1
echo

echo "Connecting to narnia-nas ..."
iscsiadm -m node -T iqn.2000-01.com.synology:narnia-nas.lukas --login || exit 2
echo

echo "Sleeping ... kernel need some time to detect new devices"
sleep 10
echo

echo "Mounting backup fs ..."
mount -v /mnt/backup || exit 3
echo

echo "Copying data ..."
# output of following commands is saved along with backup
(  echo; echo "lsusb:" ; lsusb;
   echo; echo "lsscsi:"; lsscsi;
   echo; echo "lshw:"  ; lshw -short;
   echo; echo "date:"  ; date;
   echo; echo "dpkg -l"; dpkg -l;
   echo; echo "# EOF"  ;
) > /_lastbackup_iscsi
echo

# copy data to backup location
rsync ${opt} --archive --delete --delete-excluded --human-readable --stats --progress \
 --exclude=/cdrom --exclude=/dev --exclude=/media --exclude=/mnt --exclude=/proc --exclude=/run \
 --exclude=/sys --exclude=/tmp \
 --exclude=/btrfs \
 --exclude=/root/hekate-certificates \
 --exclude=/home/lukas/.cache/google-chrome \
 --exclude=/home/lukas/.cache/mozilla/firefox \
 --exclude=/home/lukas/.cache/thumbnails \
 --exclude=/data/VMs/no-backup \
 --exclude=/data/swap \
 / /mnt/backup/lukas/

RC=$?

echo
echo "Done with rc: ${RC}"
echo

echo "Flushing file system buffers ..."
time sync
btrfs filesystem sync /mnt/backup
time sync
echo

echo "Creating snapshot of backup ..."
if [ "${RC}" -eq 0 ]
then
	btrfs sub snap -r /mnt/backup/lukas "/mnt/backup/lukas_$(LANG=C date +%Y-%m-%d_%s)"                  || exit 4
else
	btrfs sub snap -r /mnt/backup/lukas "/mnt/backup/lukas_$(LANG=C date +%Y-%m-%d_%s)_incomplete_${RC}" || exit 5
fi
echo

echo "Hit enter to continue ... "
read

echo "Umounting backup filesystem ..."
umount -v /mnt/backup || exit 6
echo

echo "Disconecting from narnia-nas ..."
iscsiadm -m node -T iqn.2000-01.com.synology:narnia-nas.lukas --logout || exit 7
echo

echo "Done :o)"
echo

# EOF

md5sum: e075be22b429a2be4b3dbf2fbb467ab9

Useful btrfs commands

"filesystem df"

Btrfs uses a different data layout than file systems like ext{2,3,4}. Data and metadata are organized into pools, and the pools are allocated from the raw device. There are several different pools and each has its own utilization, so it is not easy to provide the usual numbers for the df command. If you run out of space, or you are unsure what is causing the disk utilization, review the output of the following command:

# btrfs filesystem df /mnt/backup/

Data, single: total=79.01GiB, used=78.66GiB

System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B

Metadata, DUP: total=1.00GiB, used=440.55MiB
Metadata, single: total=8.00MiB, used=0.00B

GlobalReserve, single: total=160.00MiB, used=0.00B

Note1: I added the empty lines to the output in order to visually separate the pools.

Note2: Data -> user data, System -> structures related to the "super block", Metadata -> metadata :o) , GlobalReserve -> FS reserve to prevent deadlocks.
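
Newer versions of btrfs-progs also offer a combined view that folds the pool allocation into an overall free-space estimate; assuming your version already ships it:

# btrfs filesystem usage /mnt/backup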

How to identify space consumers

Files and blocks of files can be shared between volumes (and snapshots). Standard Linux tools don't understand the underlying data layout, so it is not easy to find space consumers with them. It is necessary to use the native btrfs commands in order to get accurate numbers. Example of usage:

  • The first step is to get the IDs of the subvolumes:
# btrfs sub list /mnt/backup/
ID 257 gen 44 top level 5 path lukas
ID 337 gen 30 top level 5 path lukas_2016-09-11_1473624103
ID 345 gen 35 top level 5 path lukas_2016-09-11_1473624272
ID 349 gen 39 top level 5 path lukas_2016-09-11_1473624385
ID 350 gen 42 top level 5 path lukas_2016-09-11_1473624546
  • The second step is to list the quota groups:
# btrfs qgroup show /mnt/backup/
qgroupid         rfer         excl 
--------         ----         ---- 
0/5          16.00KiB     16.00KiB 
0/257        78.41GiB     42.92MiB 
0/337        78.84GiB      7.37GiB 
0/345        78.42GiB     48.53MiB 
0/349        78.42GiB     46.27MiB 
0/350        78.43GiB     37.96MiB 

The first column is the quota group matching the subvolume ID (a snapshot is also a subvolume). The second (rfer) is the amount of data reachable through the subvolume. The third (excl) shows the data stored exclusively in this group; this is the space that can be recovered by removing the subvolume / snapshot.

As an example: by removing "0/337" I can get back 7.37GiB of space.
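
This list is also the way back to older file versions: read-only snapshots behave like ordinary directories, so a previous version can simply be copied out with standard tools. A sketch using one of the snapshots above (the restored file is just an example):

# cp -a /mnt/backup/lukas_2016-09-11_1473624103/etc/fstab /tmp/fstab.old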

Subvolume (snapshot) remove

Snapshots and subvolumes are removed in the same way:

# btrfs sub delete /mnt/backup/lukas_2016-09-11_1473624103/
Delete subvolume (no-commit): '/mnt/backup/lukas_2016-09-11_1473624103'

It may take some time until btrfs releases all the data used by the snapshot or subvolume.
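
Snapshots accumulate over time, and with this setup they have to be pruned by hand. A minimal sketch that keeps the five newest snapshots and removes the rest; it relies on the naming scheme of the backup script above (date plus epoch seconds), which makes a lexical sort chronological:

#!/bin/bash

# delete all but the 5 newest snapshots; the live subvolume
# /mnt/backup/lukas itself does not match the glob
ls -d /mnt/backup/lukas_* | sort | head -n -5 | while read -r snap
do
        btrfs sub delete "${snap}"
done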

File system extend

I am using an iSCSI LUN, so I can extend it. Here is how to also extend the btrfs on it:

# btrfs filesystem resize max /mnt/backup/
Resize '/mnt/backup/' of 'max'

This command extends the file system to the maximum size of the underlying device (for example after a LUN resize). On a multi-device file system, individual devices can be addressed as <devid>:max instead of plain max.
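
If the LUN is grown while the iSCSI session is open, the kernel may still report the old size; rescanning the session before the resize should take care of that (a sketch, assuming an active session):

# iscsiadm -m session --rescan
# lsblk /dev/sdb
# btrfs filesystem resize max /mnt/backup/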