IMPORTANT INFORMATION!
The original creator and maintainer of this web site Tony Sanderson died in June 2006. This page is being maintained in his memory by others.
The script and associated items on this page are now being maintained by mybrid~at~datarebels~dot~com, so please direct all enquiries there.

A BACKUP STRATEGY FOR LINUX VIA CD OR DVD

Backup version 4.31 (Feb 26, 2005)

Jan 2006 note: This script uses the standard unix archiving program cpio. Version 2.6.7 of cpio as supplied with Fedora Core 4 for X86_64 had some sort of checksum bug. If you're using that release of Linux, upgrade your cpio to 2.6.8 or better. (Thanks to Richard Kline for passing this on.)


For a Linux server which has no high-capacity tape drive fitted but which does have access to a CD or DVD burner (either on-board or via another networked machine), and plenty of spare disc for some large temporary files, a useful and convenient alternative is to use multiple CDs or DVDs instead.

Although originally written for use with CDs, as of version 4.20 (and fsplit version 2.2), backup will now handle DVDs (using the -s size parameter). So wherever you see the abbreviation CD, just read that as CD or DVD, as appropriate.

Backup is a shell script that makes use of standard unix utilities such as find, sed, and cpio, plus a small file-splitter called fsplit (#) (source and Linux x86 executable included) to create a cpio backup archive of your system. The archive is created on your hard drive in the form of a series of files which can then be burned onto separate CDs or DVDs.

The default fragment size is still currently set to around 640 Mb so the pieces can fit onto normal CDs, but as of version 4.20, this can be changed via the -s option to suit common DVDs (eg: -s 4.7). The chunk files are automatically named by backup by appending 000, 001 etc onto a date-derived file name base.

The script also provides for full or partial recovery from the CD or DVD set, plus the listing of archive contents in long or short format.

(#) fsplit is just a basic little file splitter that I wrote for this project (although I occasionally use it for other things now, such as chopping the ends off MPEG files when a video grab has gone too far). The supplied binary is an x86 Linux executable. For other unix platforms, just stick fsplit.c in an empty directory and type "make fsplit" and Bob's your auntie. fsplit -h gives usage (but you don't need to know anything about that if you're only using it for this backup script).

Note 1: fsplit must be at least version 2.2 if you intend to burn DVDs. Earlier versions of fsplit (as included up to backup version 4.0) max out at 2Gb, which I didn't realise until I modified the script to allow for DVD-sized chunks. Your Unix/Linux file system also needs to be able to handle > 2Gb files, of course. In the case of Linux, this basically means ext2 or better.

My DVD burner is actually on an adjacent Windoze XP PC, so once the chunks have been created on the Linux server, I just FTP them over to that machine to burn them. Of course, if you have a burner on the Linux server itself, you can just create them locally - using (eg) CdRoast.

Once the (Joliet format) CDs or DVDs have been created, they can then be read at any time on the Linux server's drive.

backup_4.31.tar.gz includes backup version 4.31 (the main script), fsplit version 2.2 (x86 linux executable used by backup), fsplit.c version 2.2 (the fsplit source), and a sample bex file (the optional list of dir/file backup exclusions).

Extracting everything

To extract the archive, type (under Linux, which comes with GNU tar)
tar xvzf backup_4.31.tar.gz

or (under Unix, if you're only using their standard distributed tar)
gunzip backup_4.31.tar.gz
tar xvf backup_4.31.tar

Now move the backup script and the little fsplit program to a directory in your PATH (such as /usr/local/bin).
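
For example (assuming you unpacked the tarball in the current directory and want everything in /usr/local/bin - adjust the destination to taste):

%<root> cp backup fsplit /usr/local/bin
%<root> chmod 755 /usr/local/bin/backup /usr/local/bin/fsplit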

Decide on a backup "working directory"

Now decide on the backup working directory you'll be using for this first backup exercise. You can change the default by altering the BKUP_DIR= line in the script, or override it at runtime via -D (handy when running backups of different systems). It needs to be somewhere in your filesystem with plenty of room, whether that's on a real physical disc or on a network mount. Anyway, for the sake of this discussion, let's assume that you are using /usr/local/backups.

This backup working directory is where the script reads and writes its files by default. It's where it will try to create your actual backup-archive pieces, any list and log files, and (by default) where it expects to find "bex" - the backup exclusions file (unless you specify a different location and/or name via -B). It must be on a mount point with enough room to temporarily hold your entire backup set.
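
Before the first run, it's worth creating the directory and eyeballing the free space on that mount (the path is just the example used throughout this page):

%<root> mkdir -p /usr/local/backups
%<root> df -k /usr/local/backups

The available space should comfortably exceed the size of whatever you're about to archive (or around 0.6 of that if you plan to use compression - see the CAVEATS further down).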

Incidentally, if the exclusions file can't be found by backup, it will be ignored, and a message to that effect will appear in Log.backup.

Usage parameters

Typing backup -h prints a screen-full of help as a bit of a memory jogger.

The script is slightly unusual in that simply typing backup (with no args) doesn't provide a one-line brief help message - doing this will actually start a backup, creating default CD-sized chunks (see below).


Full backup example

Typing backup "as is" (ie: with no parameters) will create (in /usr/local/backups) an archive of your whole system (from root down) via a set of 600+Mb archive chunks to suit CD-Rs, looking something like this:

-rw-r--r--    1 root     root     665845760 Aug 11 00:49 200208111625.cpio.000
-rw-r--r--    1 root     root     665845760 Aug 11 00:54 200208111625.cpio.001
-rw-r--r--    1 root     root     665845760 Aug 11 00:59 200208111625.cpio.002
-rw-r--r--    1 root     root     665845760 Aug 11 01:04 200208111625.cpio.003
-rw-r--r--    1 root     root     199999488 Aug 11 01:06 200208111625.lastcpio.004
-rw-r--r--    1 root     root          162 Aug 11 17:37 Log.backup
-rw-r--r--    1 root     root           37 Aug 11 00:40 find_errors
-rw-r--r--    1 root     root           15 Aug 11 16:45 errors
-rw-r--r--    1 root     root      3424867 Aug 11 00:40 list

If you use compression (-g or -b), the string .gz or .bz2 will also be included in your chunk names (*), so (eg) the chunk called 200208111625.cpio.000 would be 200208111625.cpio.gz.000.

Note: There is a known bug with the creation of compressed archive sets via backup version 4.20 - see the version 4.30 entry in Changes and Updates for details and a workaround if you've created any compressed backups using that version.

The Log.backup file seen in the above file set (introduced in version 3.40) just keeps a few lines of info relating to each run, viz:

Date run=Wed Jan 29 15:22:41 EST 2003
Command=/usr/local/bin/backup
Wkg dir=/usr/local/backups
Back up=/
Archive=/usr/local/backups/200301291522.cpio.*
/usr/local/backups/bex backup-exclusions found and read

This is an accumulating log, ie: every backup run leaves a record in it.

I only felt the need for such a log recently. I'd decided it was about time to make another delta backup, and found an existing archive from the previous month sitting in my backup directory. I then felt somehow obliged to stuff around for several minutes to work out what it was before starting the fresh backup. So the obvious work-around was just to log all future runs.

The find_errors file is the error output of the find command, and errors is the error output from cpio. It's worth browsing through them after running backup for the first time on a new system, at least. On Linux, find_errors will typically just contain one or two harmless complaints about access to the /proc directory. This occurs even though /proc is in the default exclusions list, because find still traverses every directory - the filtering doesn't happen until the following sed command in the pipe. So find_errors may look something like:
find: ./proc/5/fd: Permission denied

which is harmless (unless of course you intended to include a floppy disc in the backup).
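
For the curious, the archive-creation stage boils down to a pipeline along these lines. This is only a simplified sketch - in particular, feeding bex to sed via -f is an assumption - so read the script itself for the real details:

BKUP_DIR=/usr/local/backups
cd /
# find lists everything from the backup root; sed (driven by the expressions
# in bex) deletes the excluded paths; the survivors become the "list" file
find . -print 2>$BKUP_DIR/find_errors | sed -f $BKUP_DIR/bex > $BKUP_DIR/list
# cpio then reads that list and writes the archive stream, which backup
# chops into CD/DVD-sized chunks using fsplit
cpio -o < $BKUP_DIR/list 2>$BKUP_DIR/errors > $BKUP_DIR/archive.cpio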

The cpio errors file, errors, should only contain a final block-count figure, such as:
5760242 blocks

(As with many unix commands, I suspect that the only reason cpio puts the block count out via standard error like this is that there's nowhere else to put it! Using standard out would be even less appropriate, since cpio is often used as a component of a unix pipe.)

The first time you use the script (or after you make changes to it), you should also have a quick browse through list to make sure there are no surprises. That is - make sure that the most obvious directories (such as /etc, /usr/lib ...) are all present and accounted for. And do the reverse check - that none of the directories which should have been excluded are present.

list will just be a 'directory relative' list looking something like:

./usr
./usr/local
./usr/local/lost+found
./usr/local/bin
./usr/local/bin/hd
./usr/local/bin/dig
./usr/local/bin/wh
./usr/local/bin/host
./usr/local/bin/slogin
./usr/local/bin/webalizer

... and so on ...

If you want to know how many files and directories have been included, use the wc command, eg:

<art-root>: % wc list
wc: list:216581: Invalid or incomplete multibyte or wide character
 446744  447041 29208297 list

This told me that list contains 446744 lines, which is the total number of directories and files included. Not terribly important, but interesting nonetheless. Complaints from wc such as the one about line 216581 are worth looking into. When I opened list and had a look at this line, I found a file called Tr?ja.JPG in our June 18, 2004 issue of Friday humour. I checked this out by going to the actual directory and typing ls -b Tr*, which showed that its actual name was Tr\363ja.JPG, ie: Tr followed by octal 363 and then ja.JPG! Rather odd, but we'd obviously received it under that name and it was still accessible, so I shrugged my shoulders and pressed on - I wasn't about to repeat the whole backup for the sake of one inconsequential file name.
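
If you want to eyeball a suspect line yourself, sed will print it by number (the line number here is just the one from the example above):

%<root> sed -n '216581p' list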

If all looks well, you can then copy each of your chunks onto CDs or DVDs as appropriate (1 per disc).

I also copy list and backup onto the first CD or DVD for later convenience. This ensures that I have the list of the files that are available with the particular CD set, and (in the case of a full-scale disaster) the script for getting it back!

So my first CD typically looks like:

-rw-r--r--    1 root     root     665845760 Aug 11 00:49 200208111625.cpio.000
-rw-r--r--    1 root     root      3424867 Aug 11 00:40 list
-rwxr-xr-x    1 root     root        10761 Apr  4 23:33 backup*

or (in the case of DVDs, using -s 4.7), it looks like:

-rw-r--r--    1 root     root     4606000000 Nov  2 00:49 200411021144.cpio.000
-rw-r--r--    1 root     root      27276438 Nov  2 00:40 list
-rwxr-xr-x    1 root     root        10761 Nov  1 23:33 backup*
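
If you're burning on the Linux box itself and prefer the command line to CdRoast, something along these lines should do the job - treat the device name as an assumption (older cdrecord versions want a SCSI-style dev=bus,target,lun instead):

%<root> mkisofs -J -R -o /tmp/backup-cd1.iso 200208111625.cpio.000 list backup
%<root> cdrecord -v dev=/dev/cdrom /tmp/backup-cd1.iso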

If you do forget to include list on the CD or DVD, it can be regenerated later using backup -t (or backup -T). And if you forget to include backup - well, you'll just have to grab another copy from here (assuming Bluehaze still exists), or do your recovery manually as described in backup version 2.7. Note, though, that this older method assumes that all the archives have been copied back from the CDs onto your hard disc, which can potentially get ugly space-wise.


Making intermediate (delta) backups

Once a full backup has been made, many subsequent backups may then be done as deltas. So, for example, if you last did a full backup 4 weeks (28 days) ago, you could now cover yourself with a delta via:

%<root> backup -n 29

This will pick up only those files that have been touched in the last 29 days (I usually add +1 or so to play safe).
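
Under the hood, -n presumably just ends up as find's -mtime test - an assumption about the internals, mentioned only to make the day-counting concrete:

%<root> find . -mtime -29 -print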

You can do a number of deltas over a period of weeks or months, and provided you always specify a number of 'days' for the delta that at least covers the period of time since your last full backup, you can then stage a recovery via the original full backup followed by a second, minor 'refresh' recovery using your latest delta.

The reason for doing deltas is of course that - whereas your full backup may have consumed a number of CDs or DVDs - subsequent deltas will only require a single disc until such time as you add (and/or change) a total of 630Mb worth of files. And that will often be several months.

Anyway, once you reach that point (ie: where a delta needs more than 1 CD or DVD to hold it), it may be a good time to do another full backup!

Deltas - CD or not CD?

If you don't feel inclined to burn delta archives onto CD or DVD all the time, another option is to run backup to create the deltas but then, instead of burning to CD/DVD, just copy or move the resulting cpio archive(s) onto another machine for safe-keeping. After all, the probability of two machines losing a disc on the same day is pretty close to zero! That way, you can avoid building up a large pile of 'delta' CDs in your cupboard.
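
For example, to squirrel a freshly-made delta away on another box (the host name and chunk name are just placeholders):

%<root> scp /usr/local/backups/200302261030.cpio.000 otherbox:/var/backups/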

To recover from such a file (instead of from a CD or DVD), just include the file name(s) as an extra parameter when you run backup -r. For example:

backup -r 200208111625.cpio*

Giving such a file parameter tells backup to recover from this file (set) instead of from a CD (set). You need to be using at least version 3.30 of the script, though (earlier versions assume that recovery is always from a CD).


General usage comments

The implicit assumption with this script is that entire trees will be backed up (root, by default) except for any you've specified via one of the sed REs (regular expressions) in the bex file. BTW, you would almost certainly want to edit bex to suit your own system. Directions and suggestions for doing this are included in the default version.
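
To give the flavour, bex entries are sed commands that delete unwanted paths from the find output - something along these lines (illustrative guesses only; the sample bex supplied in the tarball documents the exact form it expects):

/\/nobackup\//d
/^\.\/proc/d
/^\.\/usr\/share/d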

Irrespective of any such exclusions, with 9Gb (*) lying around in my /usr/local tree, I generally find that I need to do some tidying up before I run the script. Probably not a bad thing in itself (although having a $70,000 high-speed tape jukebox like we have at the work QTH would certainly make life much easier :-)

But if I just run backup from a cold start, I often find that it wants to create 9 or 10 CDs (*), and that just seems plain silly. As of backup version 4, you can now use compression (via -g (gzip) or -b (bzip2)) to reduce the number of CDs required - although be warned that (a) version 4.20 has a bug with this in terms of file-naming (see Changes and Updates version 4.30 for more details), (b) this can slow things down considerably, and (c) it will only squash things to around 60% on such a large and diverse archive (see the compression comments in CAVEATS and COMMENTS further down). But just being picky about what gets included can be very useful, and this is where the "tidy up" comes in.

(*) Even now that I'm burning onto 4.7Gb DVDs (2 years after writing the above paragraphs), my main server has grown considerably and now has around 35Gb of system and application data to back up. So this sort of careful pruning is still very useful!

What I do before running backup is to go into each of /usr, /usr/local, and /usr/local/src  (to name the three main offenders) and run the du command as follows:

%<root> du -sk * > DUSAGE

I can then whizz around looking into each of these DUSAGE files to see where most of the megabyte mass lies, and shuffle directories around appropriately. For example, in my /usr/local/src  tree, I may have 500Mb worth of directories of externally sourced stuff that can quite happily be moved down into /usr/local/src/nobackup. That way, the script will skip them, because the string /nobackup/ is in my RE exclusions list.
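
A numeric sort makes the worst offenders jump out straight away, and the shuffling itself is just a mkdir and a mv (the package name below is a placeholder):

%<root> sort -rn DUSAGE | head -20
%<root> mkdir -p /usr/local/src/nobackup
%<root> mv /usr/local/src/some-big-package /usr/local/src/nobackup/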

Also, I might find some new, big fat directories in the Linux system area that deserve to be added to the backup exclusion list. Three that I did add after installing Mandrake 7.2 were /usr/share, /usr/src, and /usr/doc. This eliminated about 700Mb of stuff, and will have no particular effect on my recovery process when and if I need it (I can retrieve those off the Linux installation discs).

Of course, you may spend an hour doing this sort of thing and then run backup only to find that you're still dissatisfied with the result (as in still having too many chunks). The only option is to iterate - go around again and either (a) move still more stuff into one of the excluded directories (such as a nobackup directory), or (b) [where the directories themselves can't be moved because they're in the system area] add more exclusion directories to the bex file.

Doubtless this all sounds rather messy, but unless you're prepared to fork out the necessary dollars for a high-speed tape drive that will back up 20+ Gb for you, you really have no choice. The good thing about this system (if any) is that the real effort (as described above) is at backup time. Recovery (when you finally do need it) is relatively straightforward.


Verifying your first run

Okay - so you've made a backup and burned it onto a set of CD/DVDs, but if you haven't used the script before, a little voice in the back of your head should be whispering "How do I really know that these fragments can be reliably pieced back together for a recovery if and/or when the evil day comes ...?"

Well, there are a number of ways of verifying a set, varying from a quick and dirty 5 minute method to the more exhaustive.

Go to the backup directory to run these tests unless you've modified the script to point the log files somewhere else:

Quick and dirty

Just type:

%<root> backup -t

and insert your CDs as requested. If this runs to completion with no complaints and generates the log file (called rlist - ie: the recovery list), your archive set is indeed capable of being strung together at any time in the future and used for a recovery, because cpio must have read all your fragments from beginning to end to generate rlist.

The first few times you use the script, you may also want to verify the number of files which have been backed up in your CD set. You can do that by comparing the original list with the recovery list, using something like:

%<root> wc list rlist

This prints the line, word and byte count of each file (plus the totals).

Ignoring the totals line (which is irrelevant), all you really need to know is that column 1 (the line count = the number of files) is the same for both files.

A more accurate method using diff

Unfortunately, doing a diff between the input and output logs is a little messy, because (a) the filename lines are in slightly different formats, and (b) cpio logs its output in a slightly random order, depending on file links etc. But if we sort them and add the leading "./" to each line in rlist before we run diff, the comparison will work, as in:

sort list > list-sorted
sed -e s'/^/.\//' -e s'/\.\/\.$/./' rlist | sort > rlist-sorted
diff list-sorted rlist-sorted

(Just copy and paste the above block into a shell window if you want to try it.) It assumes, of course, that list and rlist correspond to the same archive.

If diff is silent with this comparison, it means that every file mentioned in your source list ( list ) is in the archive.


And for the ultimate warm feeling ...

The first one or two times you use backup, I'd also strongly recommend that you do a full restore from the archive set into an empty spare directory somewhere. This will verify that everything will indeed work for a full-scale recovery but without actually replacing anything in the original area. Check your spare disc space first via df -k to make sure that you have got enough room to hold such a full restore.

Now get your CD backup set ready, cd to a convenient empty directory, and type backup -r and start feeding in CDs. Satisfy yourself that things run to completion with no complaints. When the whole process has finished without errors (and don't forget to check rerrors as well), type du -sk . in the same directory to get at least a rough idea of how much was restored. (This command will also take a couple of minutes to run.) The size should be approximately the same as the sum total of all your archive pieces.

Now drop down into one or two of your favourite hunting grounds in this recovered tree. Does it all look correct? Do the owners and permissions look about right?

This is the absolute "sledge-hammer" test, and I'd suggest you do it the first time you try the script. That way, you'll know that your data really is safe and recoverable.

If you have the time, you might even like to experiment with the -p flag for restoring just a single file or a single directory. Delete the restored tree in the above example, and now try bringing back a single file or a single directory. (This is described in more detail under the heading "Method 3" below). But as with the full-restore example, I suggest you spend a little time trying this after you first start using the script. Don't leave it until a user rushes up to you in a complete panic!

As a side issue which is doubtless out of place in here (because it's just common sense): It's good practice to keep at least one CD set off-site to protect against (eg) theft or fire. Don't make this your latest backup set though - you just never know when you'll need those at short notice. Keep at least one "slightly older version" off-site. After all, they're only meant to be your last-gasp reserve in case of a total disaster such as NY World Trade Centre.


Types of recovery procedures

Before we discuss the various types of file recovery, there are a couple of points that you need to keep in mind:

Files are archived with path (directory) information rather than just as simple file names and these paths will always be re-created as the files are pulled out, and moreover

Because these paths are all relative as opposed to absolute, they are re-created under the current working directory in which you do the restore.

So, for example, assuming backup was run without using -d (so it backed up from the root directory down), a file /usr/local/bin/hd which is recovered using backup -r whilst you're in (for example) the directory /usr/local/tmp will end up being restored as /usr/local/tmp/usr/local/bin/hd.

On the other hand, if you did the recovery by running backup -r from the root directory, it would be restored into its original spot, ie: into /usr/local/bin/hd.

In other words, whatever path existed between the target file and the directory from which backup originally ran (root, by default) is recorded as part of the filename and will be recreated (if necessary) during the restore. Furthermore, it will always be created relative to the directory in which you run that backup -r (restore) command.

You can see the form of these relative-path filenames by reading rlist. (Run backup -t if you don't have rlist sitting there already.) You'll see something like:

usr
usr/local
usr/local/lost+found
usr/local/bin
usr/local/bin/hd
usr/local/bin/dig
usr/local/bin/wh
usr/local/bin/host
usr/local/bin/slogin
usr/local/bin/webalizer

This tells you that cpio will always re-create the file hd in the form of usr/local/bin/hd - relative to whatever directory you're in at the time. This is how cpio (and tar) always recover their files - period! Okay, now on to some examples.


To retrieve one or more files, the basic procedure is to cd into a suitable empty directory and type

%<root> backup -r

and just follow the instructions. Backup then prompts you for each CD. As such, this will dump the entire archive into that directory. That may or may not be exactly what you want, so now we'll look at some possible permutations.


Full direct restore

Overall, the archive may be used for recovery via a number of slightly different methods, depending on what you need on the day.

Method 1 - if a complete recovery is needed (after a disc trash or similar disaster), you would typically:

(a) Do a fresh, minimal install of the same version of Linux off the original installation disc(s), then
(b) type (as root) the following commands:

%<root> cd /
%<root> backup -fr

This does a full, forced recovery.

If you also have a delta backup which is newer than the full backup set, repeat the above process using that CD (or file) as well.

As soon as this recovery completes, reboot. (Yep - that's all there is to it - full recoveries are straightforward)


Restoring selected files or directories (quick and dirty)

Method 2 (brute force but very simple and reliable): cd into an empty temporary directory and extract the entire archive into it. For example:

%<root> cd /usr/local/tmp
%<root> backup -r

All directories and files will then be pulled out and installed under this temporary directory with a dir structure which mirrors the original system tree under / - but transplanted.

If you also have a delta backup which is newer than the full backup set, repeat the above process using that CD (or file) as well.

You can then fetch individual files out of the usr/local/tmp tree and cp or mv them at your leisure.

You can also refresh entire directory trees. Assuming you'd pulled out a whole archive into /usr/local/tmp as just described, let's suppose you'd now like to automatically refresh (as necessary) all files in the real (live) /etc/sysconfig tree.

Just do this:

%<root> cd /usr/local/tmp/etc/sysconfig
%<root> find . | cpio -pdmv /etc/sysconfig 2>errors

This uses cpio's pass through mode (-p) to update all files and directories in the /etc/sysconfig tree which NEED updating from the recovered ones in /usr/local/tmp/etc/sysconfig. I've done this sort of thing quite often. You can use the same method to recover any directory tree.

Remember - because we unpacked the archive into /usr/local/tmp, everything in that directory will mirror what used to be in the root directory on the day the backup was made. So (eg) the old /etc directory will now appear as /usr/local/tmp/etc, the old /usr as /usr/local/tmp/usr, and so on.

Incidentally - if you do update just about anything in or below /etc as per the above example, you may need to reboot before going too much further. Linux keeps all its startup stuff in and below /etc, and a warm reboot is the quickest way of picking up any configuration data that may have been modified.


Restoring selected files or directories (more elegantly)

Method 3 - retrieve only selected files or directories from the archive using cpio's somewhat macabre "pattern matching" syntax, as in:

%<root> backup -r -p 'some/dir/level.c'

Provided that you're using the GNU cpio (which is the default for Linux), this intuitive kind of file-matching string specification for recovering one file will work fine.

By the way, don't include a leading path-slash. The backup script uses find and cpio in such a way that file locations are recorded by cpio in relative form, to provide the most flexible retrieval. So specifying a name using a leading path slash in a pattern will always fail.
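
In other words (re-using the hd example from earlier):

%<root> backup -r -p '/usr/local/bin/hd'     # leading slash - matches nothing
%<root> backup -r -p 'usr/local/bin/hd'      # relative form - matches the archived name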

As a more concrete example, you may want to refresh a user's directory called /home/fiona/c-files because one or more files have been accidentally deleted. Assuming that the original backup was done from root (the default), you could do this:

Method 3-A (User has just accidentally deleted one or more files)
%<root> cd /
%<root> backup -r -p 'home/fiona/c-files/*'

If you also have a delta backup which is newer than the full backup set, repeat the above process using that CD (or file) as well.

Note that since we haven't used the -f (force) flag in the above example, files will only be restored if they're missing from the directory. (Those on the backup CD set are unlikely to have more recent dates than those on disc.) If you do need to ensure that all files are restored back into the directory, maybe pull them into an empty temporary directory first by doing something like:

Method 3-B (User confused and initially wants to check previous versions)
%<root> cd /home/fiona/c-files
%<root> mkdir temp
%<root> cd temp
%<root> backup -r -p 'home/fiona/c-files/*'

and then copy (or move) them as required.

Or you could, of course, just force things via the "f" flag, as in:

Method 3-C (Forced restore of all backed up files in this tree)
%<root> cd /
%<root> backup -fr -p 'home/fiona/c-files/*'

but keep in mind that this will replace every file in her tree even where the archived files are older. Method 3-A or 3-B is the more common approach.


For non-GNU versions of cpio, such as one finds by default on commercial versions of Unix (Solaris, HP-UX, Irix, etc), the pattern matching syntax is unfortunately even more macabre. The corresponding man entries claim (as does GNU cpio) that it follows the normal shell pattern-matching rules, but certainly with Solaris this doesn't seem to be the case.

In particular, a forward-slash won't match itself, and an asterisk must be used instead, as in:

%<root> backup -r -p 'home*fiona*c-files*'

In theory, this leaves open the possibility of retrieving other files, since the asterisk will match anything.

In practice though, it's really just an irritation. All that happens is that one day you forget the need for asterisks and inadvertently type in real slashes instead, and you end up wasting half an hour retrieving nothing at all.

At which point you may decide to go and install the GNU cpio instead. Or you may just do as I do, and use method 1 or 2 (the brute force approach) and just ignore these pattern-matching retrieval options altogether!


CAVEATS and COMMENTS
The usual comments re my "legal irresponsibility" apply if you use any of this.
Version 2.6.7 of cpio as supplied with Fedora Core 4 for X86_64 had a checksum bug. If you're using that release of Linux, upgrade your cpio to 2.6.8 or better. (Thanks to Richard Kline for passing that on.)
If the filesystem portion being archived is X megabytes in size, you must have around X megabytes (*) of spare disc space to use backup. You'll also need the same amount to stage a recovery if you want to use Method 2 above.
(*) Or approx 0.6X if using compression (-g or -b)
Compression via gzip (-g) seems to work much faster than bzip2 (-b) for large archives, but bzip2 seems to achieve a somewhat better compression ratio. For example, in terms of speed, bzip2 took about 8 hours to compress 5Gb on my 366MHz 64Mb Celeron, whereas gzip did the same job in around 90 minutes. Recovery on gzipped archives is also faster (by almost 2X), so in my case I currently prefer gzip even though it costs me one more CD (ie: 5 instead of 4).
The best arrangement is to have a suitable CD burner on the same machine which has created the archive pieces. Personally, I'm not that fortunate yet - my burner's on a Windoze box. If you do have to move the archive pieces over to another machine to burn the CDs and you use FTP (*), just make quite sure you use binary mode to do it! A quick and easy way of verifying this is to look at the chunk sizes on the destination system and ensure that they're identical to the originals.
(*) I now tend to use the FTP that comes as part of the Windoze SSH Secure Shell program. It's so easy - if you're already logged into a remote machine, all you have to do is click on the FTP button on the toolbar and it fires up a 2nd window for FTP with no extra logging in required. Available from SSH Communications Security Corp (free for non-commercial or academic use). Because their FTP is tunnelled through SSH, it's also secure, so you can use it for external connections with no worries about having your password sniffed.
Remember particularly that files are archived in the form of relative pathnames, so they will always be restored relative to your current working directory. So cd to an empty directory somewhere before you start, or (if you intend a full, direct emergency recovery) cd into "/" before you "unload" the archive. Pulling out, for example, the entire /usr tree into the /etc tree just because you happened to be in /etc when you started a recovery can be very messy to clean up (and yes - I did do this on a really bad day many years ago).
The delta option (-n [days]) should really have an automated variant. Working out the number of days for -n by hand is a pain, and the system should be doing it. After all, all we usually want from a delta is to back up all changed files since the last completed, full backup. If that were available, it could then be run regularly via cron (a rough cron sketch appears just after this list).
Why use cpio instead of tar for all this? Well, I was "brought up" on cpio (System V Unix from Nat Semi), and I think it's more flexible in its recovery than tar. And on Linux, at least (thanks to its GNU heritage), the man entry for tar (*) contains the usual infuriating GNU disclaimer to the effect that " The GNU folks, in general, abhor man pages, and create info documents instead. The maintainer of tar falls into this category. This man page is neither complete, nor current." Well, okay guys (?!), but I prefer doco that's current, easy to print, and easy to navigate through - and on unix, that means a man entry, a plain text README, or some HTML. And since cpio comes with a proper man entry, it's another small reason I prefer it over (GNU) tar.
Actually - since Redhat 9.0 at least, the GNU tar man entry has improved. It actually has one small paragraph at the top, the options are now included, and the sentence about GNU "abhoring man entries" has been deleted. So tar may almost be worth a look soon ...
Protect your CD set carefully after you burn them. If you damage one, you won't be able to read any that follow it. (Obvious enough, I suppose, but it should be said for completeness.) And try to keep at least one semi-recent set off-site to guard against theft and/or fire.
I've also provided a page re backing up Windoze PCs using a vaguely similar method.
Q: What is worse than not making a backup?
A: Making a backup that doesn't work!

After making a backup using any new system, always invest the time to do a trial recovery, even if it's just a single file or directory being pulled into a spare directory somewhere. Never trust any backup program (including this one) until you've seen it work full circle.
I once saw a department make backups for over a year (every Friday night) before discovering on the critical day that they were all useless. They were most disconcerted by this discovery.
(Granted - the guy that set it all up for them was a dickhead that I later had to sack anyway - but it can still happen to any of us. Don't let it happen to you - you must do at least one trial recovery before you trust any new backup system.)
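
Regarding the delta-automation grumble above, a crude stop-gap (not a feature of the script itself, and it assumes backup is happy to run unattended) is to pick a -n value comfortably larger than your full-backup interval and let cron run it - you still have to burn or copy the resulting chunks yourself:

# /etc/crontab entry: every Sunday at 02:30, create a delta covering the
# last 35 days (adjust -n so it always exceeds the age of your last full backup)
30 2 * * 0   root   /usr/local/bin/backup -n 35 >> /var/log/backup-delta.log 2>&1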


Miscellaneous thoughts ...

The above may all sound somewhat drawn out and involved to the Unix "newbie", but in reality, it is no more than a series of very simple steps - plus 3 recovery alternatives, depending on your needs. It provides a backup scheme which (with the exception of the little backup script, fsplit, and your CD-burner software) is all plumbed up from standard bits and pieces of good old Unix.

On the other hand, comprehensive high-speed tape-based backup systems are not cheap. A reasonably good one will set you back around $AU30,000, and you can spend 2 to 5 times that amount if you want the real up-market equipment. And these systems are not just costly to purchase - they have a steep learning curve as well. So there is still no easy road.

I've used one or two commercial tape-based systems over the years, of course, and to be honest, they make me nervous. You're completely at the mercy of the company who designed and manufactured the tape unit, and the company who produced the backup/recovery software. Open systems, on the other hand, give me a warm, cosy feeling because (a) I can see what they're doing, and (b) I can go look up the 'man' entries for the different bits and get a full understanding. (Hell - with Linux, I can even get out the damn source and change it if I like ... :-)

And no - this system probably won't impress your average Windoze-soaked MS Certified IT manager. (But who cares ... in my experience, they usually can't recover files for their users anyway :-)


Other CD backup options

 CDBKUP is a suite of perl programs by John-Paul Gignac and is also worth a look. Features full or incremental backups, standard GNU tarballs, support for multi-session CDs, and can also split large backups between multiple CDs.

  The April 2002 edition of Sys Admin magazine featured a variety of backup scripts for unix. If you don't have this issue in your collection, it may be worth the effort of back-ordering it via their back issues web page. Has a very interesting CD backup script (back2cd) by Bryan Smith, plus lots of other interesting articles, such as Recovering deleted files in Linux.

 Mondorescue comes highly recommended by Lachlan (the other half of this site). See Lachlan's work-server page here. Lots of stuff re experiences with various CD and DVD drives with Linux, plus some interesting grabs off the Mondo development list, etc. A much more sophisticated and comprehensive system than backup.


Changes and updates

Version 4.31, 26-Feb-2005
Vincent Lussenburg discovered a problem when using backup on a system with no CD drive fitted. Basically, the script goes troppo due to an infinite loop between "ejectcd()", "clean()" and "error_exit()" under these conditions. In particular - if "clean" is called, it calls "ejectcd", but when the latter fails because the machine has no CD drive, "error_exit" is called, which then calls "clean" and we're back where we started.
Solution: Change the "ejectcd" code so that, on error, instead of calling "error_exit", it does its own error handling and clean-up.

Change #2: Back in Sep 2003 (!), Lee Parmeter pointed out that attempting to run backup with BKUP_DIR on an SMB mount fails because FIFOs aren't supported on Windoze file systems. Have thus changed the FIFO definition from $BKUP_DIR/$FI to $TMP/$FI to guarantee that it's always on a real filesystem, as per Lee's own mod.

For changes and updates for all previous versions, click here.


Conditions for use

This package is in the public domain.


Previous versions

 Backup version 4.30 (current from Jan 1, 2005 to Feb 25, 2005)

 Backup version 4.20 (current from Nov 15, 2004 to Dec 31, 2004)

 Backup version 4.0 (current from Jan 30, 2003 to Nov 14, 2004)


Vaguely related ravings on this site

The Micro$oft/Intel TCPA/Palladium project is bad news for all computer users, and something we all need to watch very carefully. It could mean the end of Linux and the end of lots of other great things that we currently use.
A few thoughts re Unix versus M$ Windoze (written after yet another Micro$soft server rebuild)

  To Bluehaze humour archives if you now feel the need to lighten up (backup systems are pretty boring, after all :-)


Last revised: Wed 18-Jan-2006 (mention the Fedora Core 4 X86_64 cpio bug)
By Tony Sanderson (Bluehaze) and a cast of thousands (well, 6 or 7 anyway ...)