rsnapshot HOWTO

2004-01-20

Revision History
Revision 1.0.0    2005-01-31    NR
    Updated for rsnapshot 1.2.0
Revision 0.9.7    2005-01-17    NR
    Spelling corrections submitted by Nicolas Kaiser
Revision 0.9.6    2004-12-13    NR
    Misc. updates
Revision 0.9.5    2004-07-10    NR
    Relicensed document under GPL, instead of FDL
Revision 0.9.4    2004-07-02    NR
    Added description of proper crontab time settings
Revision 0.9.3    2004-06-11    NR
    Misc. updates
Revision 0.9.2    2004-05-16    NR
    Updated --link-dest info
Revision 0.9.1    2004-01-20    NR
    Added --link-dest info
Revision 0.9      2004-01-10    NR
    First draft

Abstract

rsnapshot is a filesystem backup utility based on rsync. Using rsnapshot, it is possible to take snapshots of your filesystems at different points in time. Using hard links, rsnapshot creates the illusion of multiple full backups, while only taking up the space of one full backup plus differences. When coupled with ssh, it is possible to take snapshots of remote filesystems as well. This document is a tutorial in the installation and configuration of rsnapshot.



rsnapshot is written in Perl, and depends on rsync. OpenSSH, GNU cp, GNU du, and the BSD logger program are also recommended, but not required. All of these should be present on most Linux systems. rsnapshot is written with the lowest common denominator in mind: at minimum, it requires only Perl 5.004 and rsync. As a result, it works on pretty much any UNIX-like system you care to throw at it. It has been successfully tested with Perl 5.004 through 5.8.2, on Debian, Red Hat, Fedora, Solaris, Mac OS X, FreeBSD, OpenBSD, NetBSD, and IRIX.
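
A quick way to see which of the programs named above are on your PATH is sketched below. This is only an illustration: check_tool is a hypothetical helper name, not something rsnapshot provides, and the check looks for programs by name only, so it cannot tell GNU cp or GNU du apart from other implementations.

```shell
#!/bin/sh
# Report whether a program is on the PATH.
# check_tool is a hypothetical helper, not part of rsnapshot.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

missing_required=0

# Required by rsnapshot, according to the text above.
for tool in perl rsync; do
    check_tool "$tool" || missing_required=1
done

# Recommended, but not required.
for tool in ssh cp du logger; do
    check_tool "$tool" || true
done

if [ "$missing_required" -eq 0 ]; then
    echo "All required tools are present."
else
    echo "Install the missing required tools before continuing."
fi
```

The script only reports what it finds. It does not check the Perl version.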

The latest version of the program and this document can always be found at http://www.rsnapshot.org/.

I originally used Mike Rubel's shell scripts to do rsync snapshots a while back. These worked very well, but there were a number of things that I wanted to improve upon. I had to write two shell scripts that were customized for my server. If I wanted to change the number of intervals stored, or the parts of the filesystem that were archived, that meant manually editing these shell scripts. If I wanted to install them on a different server with a different configuration, this meant manually editing the scripts for the new server, and hoping the logic and the sequence of operations were correct. Also, I was doing all the backups locally, on a single machine, on a single hard drive (just to protect from dumb user mistakes like deleting files). Nevertheless, I continued on with this system for a while, and it did work very well.

Several months later, the IDE controller on my web server failed horribly (when I typed /sbin/shutdown, it said the command was not found). I was then faced with what was in the back of my mind all along: I had not been making regular remote backups of my server, and the local backups were of no use to me since the entire drive was corrupted. The reason I had only been making sporadic, partial remote backups is that they weren't automatic and effortless. Of course, this was no one's fault but my own, but I got frustrated enough to write a tool that would make automated remote snapshots so easy that I wouldn't ever have to worry about them again. This goal has long been reached, but work on rsnapshot continues as people submit patches and request features, and as ways are found to improve the program.

This section will walk you through the installation of rsnapshot, step by step. This is not the only way to do it, but it is a way that works and that is well documented. Feel free to improvise if you know what you're doing.

This guide assumes you are installing rsnapshot 1.2.0 for the first time. If you are upgrading from an earlier version, please read the INSTALL file that comes with the source distribution instead.

In this example, we will be using the /.snapshots/ directory to hold the filesystem snapshots. This is referred to as the “snapshot root”. Feel free to put this anywhere you have lots of free disk space. However, the examples in this document assume you have not changed this parameter, so you will have to substitute this in your commands if you put it somewhere else.

Also, please note that fields in the config file are separated by tabs, not spaces. This makes it easier to specify file paths that contain spaces.
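
Because tabs and spaces look alike in most editors, a small check can help catch lines that were typed with spaces by mistake. The sketch below is only an illustration: it builds a throwaway sample file (the entries in it are made up for the demonstration) and lists every non-blank, non-comment line that contains no tab at all. To use it on a real config, point conf at your own file instead.

```shell
#!/bin/sh
# Write a small sample config to a temporary file (entries are illustrative).
conf=$(mktemp)
printf 'snapshot_root\t/.snapshots/\n' > "$conf"
printf '# comments and blank lines are ignored by this check\n' >> "$conf"
printf 'backup\t/etc/\tlocalhost/\n' >> "$conf"
printf 'backup /home/ localhost/\n' >> "$conf"    # wrong: spaces, no tabs

# A literal tab character, for use as a grep pattern.
tab=$(printf '\t')

# Keep non-comment, non-blank lines (with line numbers), then keep only
# those that contain no tab anywhere.
bad_lines=$(grep -n -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$conf" | grep -v "$tab")

echo "Lines without a tab separator:"
echo "$bad_lines"
rm -f "$conf"
```

A line that mixes tabs and spaces would pass this simple test, so treat it as a first check rather than a full validation.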

Please note that the destination paths specified here are based on the assumption that the --relative flag is being passed to rsync via the rsync_long_args parameter. If you are installing for the first time, this is the default setting. If you upgraded from a previous version, please read the INSTALL file that came with the source distribution for more information.

This is the section where you tell rsnapshot what files you actually want to back up. You put a “backup” parameter first, followed by the full path to the directory or network path you're backing up. The third column is the relative path you want to back up to inside the snapshot root. Let's look at an example:

backup      /etc/      localhost/

In this example, backup tells us it's a backup point. /etc/ is the full path to the directory we want to take snapshots of, and localhost/ is a directory inside the snapshot_root we're going to put them in. Using the word localhost as the destination directory is just a convention. You might also choose to use the server's fully qualified domain name instead of localhost. If you are taking snapshots of several machines on one dedicated backup server, it's a good idea to use their various hostnames as directories to keep track of which files came from which server.

In addition to full paths on the local filesystem, you can also back up remote systems using rsync over ssh. If you have ssh installed and enabled (via the cmd_ssh parameter), you can specify a path like:

backup      root@example.com:/etc/     example.com/

This behaves fundamentally the same way, but you must take a few extra things into account.

With the backup_script parameter, the second column is the full path to an executable backup script, and the third column is the local path you want to store its output in (just like with the "backup" parameter). For example:

backup_script      /usr/local/bin/backup_pgsql.sh       localhost/postgres/

In this example, rsnapshot will run the script /usr/local/bin/backup_pgsql.sh in a temp directory, then sync the results into the localhost/postgres/ directory under the snapshot root. You can find the backup_pgsql.sh example script in the utils/ directory of the source distribution. Feel free to modify it for your system.

Your backup script simply needs to dump out the contents of whatever it does into its current working directory. It can create as many files and/or directories as necessary, but it should not put its files in any pre-determined path. The reason for this is that rsnapshot creates a temp directory, changes to that directory, runs the backup script, and then syncs the contents of the temp directory to the local path you specified in the third column. A typical backup script would be one that archives the contents of a database. It might look like this:

#!/bin/sh

/usr/bin/mysqldump -uroot mydatabase > mydatabase.sql
/bin/chmod 644 mydatabase.sql

There are several example scripts in the utils/ directory of the rsnapshot source distribution to give you more ideas.
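
To make the temp-directory flow described above more concrete, here is a rough simulation of it. This is not rsnapshot's actual code: every path is a temporary stand-in, backup_example.sh is a made-up script, and a plain cp stands in for the sync step.

```shell
#!/bin/sh
# Illustration only: mimic the temp-directory flow described above.
# All paths are temporary stand-ins, and cp stands in for the sync step.
work=$(mktemp -d)
dest="$work/snapshot_root/hourly.0/localhost/example/"
mkdir -p "$dest"

# A stand-in backup script that writes into its current working directory.
cat > "$work/backup_example.sh" <<'EOF'
#!/bin/sh
echo "dump of example data" > example.sql
chmod 644 example.sql
EOF

# 1. Create a temp directory, 2. change into it, 3. run the script there.
tmp=$(mktemp -d)
( cd "$tmp" && sh "$work/backup_example.sh" )

# 4. Copy the results to the destination, then discard the temp directory.
cp -p "$tmp"/* "$dest"
rm -rf "$tmp"

synced=$(ls "$dest")
echo "Files in destination: $synced"
rm -rf "$work"
```

Note that the stand-in script never mentions the destination path. It only writes to its working directory, which is why the same script can be reused with any destination.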

Make sure the destination path you specify is unique. The backup script will completely overwrite anything in the destination path, so if you tried to specify the same destination twice, you would be left with only the files from the last script. Fortunately, rsnapshot will try to prevent you from doing this when it reads the config file.

Please remember that these backup scripts will be invoked as the user running rsnapshot. In our example, this is root. Make sure your backup scripts are owned by root, and not writable by anyone else. If you fail to do this, anyone with write access to these backup scripts will be able to put commands in them that will be run as the root user. If they are malicious, they could take over your server.
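
One way to spot scripts that are writable by someone other than their owner is sketched below. The helper name is_loosely_writable is hypothetical, and the demonstration runs on a temporary file rather than a real backup script. It checks only the permission bits, so you would still need to confirm separately that the file is owned by root.

```shell
#!/bin/sh
# Succeed if the file is writable by its group or by others.
# is_loosely_writable is a hypothetical helper name.
is_loosely_writable() {
    [ -n "$(find "$1" \( -perm -020 -o -perm -002 \) -print)" ]
}

# Demonstrate on a temporary file standing in for a backup script.
script=$(mktemp)

chmod 775 "$script"    # group-writable: not acceptable for a root script
if is_loosely_writable "$script"; then before="unsafe"; else before="safe"; fi

chmod 700 "$script"    # owner only: read, write, execute
if is_loosely_writable "$script"; then after="unsafe"; else after="safe"; fi

echo "mode 775: $before"
echo "mode 700: $after"
rm -f "$script"
```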

We have a snapshot root under which all backups are stored. By default, this is the directory /.snapshots/. Within this directory, other directories are created for the various intervals that have been defined. In the beginning it will be empty, but once rsnapshot has been running for a week, it should look something like this:

[root@localhost]# ls -l /.snapshots/
drwxr-xr-x    7 root     root         4096 Dec 28 00:00 daily.0
drwxr-xr-x    7 root     root         4096 Dec 27 00:00 daily.1
drwxr-xr-x    7 root     root         4096 Dec 26 00:00 daily.2
drwxr-xr-x    7 root     root         4096 Dec 25 00:00 daily.3
drwxr-xr-x    7 root     root         4096 Dec 24 00:00 daily.4
drwxr-xr-x    7 root     root         4096 Dec 23 00:00 daily.5
drwxr-xr-x    7 root     root         4096 Dec 22 00:00 daily.6
drwxr-xr-x    7 root     root         4096 Dec 29 00:00 hourly.0
drwxr-xr-x    7 root     root         4096 Dec 28 20:00 hourly.1
drwxr-xr-x    7 root     root         4096 Dec 28 16:00 hourly.2
drwxr-xr-x    7 root     root         4096 Dec 28 12:00 hourly.3
drwxr-xr-x    7 root     root         4096 Dec 28 08:00 hourly.4
drwxr-xr-x    7 root     root         4096 Dec 28 04:00 hourly.5

Inside each of these directories is a “full” backup of that point in time. The destination directory paths you specified under the backup and backup_script parameters get stuck directly under these directories. In the example:

backup          /etc/           localhost/

The /etc/ directory will initially get backed up into /.snapshots/hourly.0/localhost/etc/
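
The way this path is assembled can be sketched in a few lines of shell. The variable names are made up for the illustration, and the sketch assumes the --relative behavior described earlier, where the full source path is recreated under the destination directory.

```shell
#!/bin/sh
# Values taken from the example in the text; variable names are illustrative.
snapshot_root="/.snapshots/"
interval="hourly"
source_path="/etc/"
dest_dir="localhost/"

# With --relative, the full source path is recreated under the destination,
# so strip the leading slash from the source and append it.
final_path="${snapshot_root}${interval}.0/${dest_dir}${source_path#/}"
echo "$final_path"
```

This prints /.snapshots/hourly.0/localhost/etc/, matching the path given above.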

Each subsequent time rsnapshot is run with the hourly command, it will rotate the hourly.X directories, and then “copy” the contents of the hourly.0 directory (using hard links) into hourly.1.

When rsnapshot daily is run, it will rotate all the daily.X directories, then copy the contents of hourly.5 into daily.0.
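
The hourly rotation described above can be simulated safely in a scratch directory. This is a simplified sketch of the sequence of steps, not rsnapshot's own implementation. It assumes a cp that supports the -a and -l options (GNU cp does), and it uses a made-up motd file as sample data.

```shell
#!/bin/sh
# Simulation in a temporary directory; nothing here touches real snapshots.
# Assumes a cp that supports -a and -l (GNU cp does).
root=$(mktemp -d)
keep=6                                   # hourly.0 through hourly.5

# Two existing snapshots with sample data.
mkdir -p "$root/hourly.0/localhost/etc" "$root/hourly.1/localhost/etc"
echo "current" > "$root/hourly.0/localhost/etc/motd"
echo "older" > "$root/hourly.1/localhost/etc/motd"

# Drop the oldest snapshot, if present.
rm -rf "$root/hourly.$((keep - 1))"

# Shift hourly.1 .. hourly.(keep-2) up by one, highest number first.
i=$((keep - 2))
while [ "$i" -ge 1 ]; do
    if [ -d "$root/hourly.$i" ]; then
        mv "$root/hourly.$i" "$root/hourly.$((i + 1))"
    fi
    i=$((i - 1))
done

# "Copy" hourly.0 into hourly.1 using hard links instead of file data.
cp -al "$root/hourly.0" "$root/hourly.1"

rotated=$(cat "$root/hourly.2/localhost/etc/motd")
linked=$(cat "$root/hourly.1/localhost/etc/motd")
ls "$root"
rm -rf "$root"
```

After the run, the old hourly.1 has become hourly.2, and the new hourly.1 holds hard links to the files in hourly.0. Shifting from the highest number down matters: going the other way would move a directory onto a name that is still in use.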

hourly.0 will always contain the most recent snapshot, and daily.6 will always contain a snapshot from a week ago. Unless the files change between snapshots, the “full” backups are really just multiple hard links to the same files. Thus, if your /etc/passwd file doesn't change in a week, hourly.0/localhost/etc/passwd and daily.6/localhost/etc/passwd will literally be the same exact file. This is how rsnapshot can be so efficient on space. If the file changes at any point, the next backup will unlink the hard link in hourly.0, and replace it with a brand new file. This will now take double the disk space it did before, but it is still considerably less than it would be to have full unique copies of this file 13 times over.
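
You can see this sharing for yourself by comparing inode numbers. The sketch below uses a scratch directory and a one-line stand-in for the passwd file; inode_of is a hypothetical helper name. Two names with the same inode number are the same file on disk.

```shell
#!/bin/sh
# Print the inode number of a file (the first field of `ls -i`).
# inode_of is a hypothetical helper name.
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

dir=$(mktemp -d)
mkdir "$dir/daily.6" "$dir/hourly.0"
echo "root:x:0:0" > "$dir/daily.6/passwd"
ln "$dir/daily.6/passwd" "$dir/hourly.0/passwd"    # second name, same data

if [ "$(inode_of "$dir/daily.6/passwd")" = "$(inode_of "$dir/hourly.0/passwd")" ]; then
    unchanged="same file"
else
    unchanged="different files"
fi

# Simulate a change: unlink the name in hourly.0 and write a new file.
rm "$dir/hourly.0/passwd"
echo "root:x:0:0:changed" > "$dir/hourly.0/passwd"

if [ "$(inode_of "$dir/daily.6/passwd")" = "$(inode_of "$dir/hourly.0/passwd")" ]; then
    changed="same file"
else
    changed="different files"
fi

old_copy=$(cat "$dir/daily.6/passwd")
echo "before the change: $unchanged"
echo "after the change:  $changed"
echo "daily.6 still holds: $old_copy"
rm -rf "$dir"
```

The important detail is that the changed file is written as a new file rather than edited in place. Editing in place would alter every snapshot that links to it.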

Remember that if you are using different intervals than the ones in this example, the first interval listed is the one that gets updates directly from the main filesystem. All subsequently listed intervals pull from the previous intervals. For example, if you had weekly, monthly, and yearly intervals defined (in that order), the weekly ones would get updated directly from the filesystem, the monthly ones would get updated from weekly, and the yearly ones would get updated from monthly.

When rsnapshot is first run, it will create the snapshot_root directory (/.snapshots/ by default). It assigns this directory the permissions 700, and for good reason. The snapshot root will probably contain files owned by all sorts of users on your system. If any of these files are writable (and of course some of them will be), the users will still have write access to their files. Thus, if they can see the snapshots directly, they can modify them, and the integrity of the snapshots cannot be guaranteed.

For example, if a user had write permission to the backups and accidentally ran rm -rf /, they would delete all their files in their home directory and all the files they owned in the backups!
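
If you want to confirm what mode 700 looks like on disk, the following sketch creates a stand-in directory in a temporary location (not the real snapshot root) and prints its permission string.

```shell
#!/bin/sh
# Create a stand-in snapshot root in a temporary location with mode 700.
base=$(mktemp -d)
mkdir -m 700 "$base/.snapshots"

# The first ten characters of `ls -ld` show the type and permission bits.
perms=$(ls -ld "$base/.snapshots" | cut -c1-10)
echo "$perms"
rm -rf "$base"
```

A result of drwx------ means only the owner can list, enter, or write to the directory. Running the same ls -ld command on your real snapshot root should show the same string.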

If users need to be able to pull their own backups, you will need to do a little extra work up front (but probably less work in the long run). The best way to do this seems to be creating a container directory for the snapshot root with 700 permissions, giving the snapshot root directory 755 permissions, and mounting the snapshot root for the users read-only. This can be done over NFS and Samba, to name two possibilities. Let's explore how to do this using NFS on a single machine:

Set the snapshot_root variable in /etc/rsnapshot.conf equal to /.private/.snapshots/

snapshot_root       /.private/.snapshots/

Create the container directory:

mkdir /.private/

Create the real snapshot root:

mkdir /.private/.snapshots/

Create the read-only snapshot root mount point:

mkdir /.snapshots/

Set the proper permissions on these new directories:

chmod 0700 /.private/
chmod 0755 /.private/.snapshots/
chmod 0755 /.snapshots/

In /etc/exports, add /.private/.snapshots/ as a read-only NFS export:

/.private/.snapshots/  127.0.0.1(ro,no_root_squash)

In /etc/fstab, mount /.private/.snapshots/ read-only under /.snapshots/:

localhost:/.private/.snapshots/   /.snapshots/   nfs    ro   0 0

You should now restart your NFS daemon.

Now mount the read-only snapshot root:

mount /.snapshots/

To test this, go into the /.snapshots/ directory as root. It is set to read-only, so even root shouldn't be able to write to it. As root, try:

touch /.snapshots/testfile

This should fail, citing insufficient permissions. This is what you want. It means that your users won't be able to mess with the snapshots either.

Now, all your users have to do to recover old files is go into the /.snapshots directory, select the interval they want, and browse through the filesystem until they find the files they are looking for. They can't modify anything in here because NFS will prevent them, but they can copy anything that they had read permission for in the first place. All the regular filesystem permissions are still at work, but the read-only NFS mount prevents any writes from happening.

Please note that some NFS configurations may prevent you from accessing files that are owned by root and set to only be readable by root. In this situation, you may wish to pull backups for root from the "real" snapshot root, and let non-privileged users pull from the read-only NFS mount.