Adding hardlink support to rsync. Currently the tar XferMethod correctly saves/restores hardlinks, but rsync does not. Rsync 2.6.2 has greatly improved the efficiency of saving hardlink information, so File::RsyncP and BackupPC::Xfer::RsyncFileIO need the corresponding changes. Hardlink support is necessary for doing a bare-metal restore of a *nix system.
Adding more complete utf8 support. BackupPC should use utf8 as the native charset for storing file names, and the CGI script should emit utf8 so that the file names can be rendered correctly. Additional configuration parameters should allow you to specify the client Xfer charset (ie: the charset the XferMethod uses to deliver file names). BackupPC should encode/decode between this charset and utf8 when doing a backup/restore. That way BackupPC can store all file names in utf8 no matter what charset the XferMethod uses to deliver them. Second, the CGI charset should be configurable (default utf8) and the CGI script BackupPC_Admin should encode the utf8 file names in the desired output charset. Finally, the charset used to deliver file names when restoring individual files should also be configurable (again, default utf8), and BackupPC_Admin should encode the file names in that charset. That should allow the ``Save As'' dialog in IE to default to the correct file name.
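The conversion itself is simple; the work is in plumbing it through the XferMethods. A minimal sketch (in Python for illustration; BackupPC itself is Perl, where the Encode module plays this role, and the function names here are hypothetical):

```python
# Hypothetical helpers showing the charset conversion described above.

def to_native(name_bytes, client_charset):
    """Convert a file name as delivered by the XferMethod into the
    native utf8 form used for storage."""
    return name_bytes.decode(client_charset).encode("utf-8")

def from_native(utf8_bytes, client_charset):
    """Convert a stored utf8 file name back to the client charset,
    eg: when restoring individual files."""
    return utf8_bytes.decode("utf-8").encode(client_charset)

# A latin-1 client delivers b"r\xe9sum\xe9.txt"; stored form is utf8:
stored = to_native(b"r\xe9sum\xe9.txt", "latin-1")
```

The restore path is just the inverse call with the (possibly different) configured restore charset.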
Adding a trip wire feature for notification when files below certain directories change. For example, if you are backing up a DMZ machine, you could request that you get sent email if any files below /bin, /sbin or /usr change.
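One plausible implementation is to compare the content digests of the two most recent backups for each watched directory. A rough sketch (Python for illustration; the function and data layout are hypothetical, not part of BackupPC):

```python
def tripwire_changes(prev, curr, watch_dirs):
    """Report files below any watched directory that changed between two
    backups.  prev and curr map file path -> content digest (eg: md5)."""
    def watched(path):
        return any(path == d or path.startswith(d + "/") for d in watch_dirs)
    changed = {p for p, digest in curr.items()
               if watched(p) and prev.get(p) != digest}
    # a file deleted from a watched directory is worth reporting too
    changed |= {p for p in prev if watched(p) and p not in curr}
    return sorted(changed)
```

The resulting list would feed the existing email notification machinery.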
Allow editing of config parameters via the CGI interface. Users should have permission to edit a subset of the parameters for their clients. Additionally, allow an optional self-service capability so that users can sign up and setup their own clients with no need for IT support.
Host summary improvement requests from Carl Soderstrom:
It would be really nice to be able to sort the hosts by a particular column, eg: click on a column heading, and the table gets sorted and displayed in ascending order based on that column. Click it again, and it's sorted in descending order. This would be really handy for seeing which hosts haven't been backed up in a while.
Include a column for backup time, so I can look at the host summary, and get some idea how long this backup will take, without having to look into the individual host's summary.
Some indication of real BackupPC load, eg: average number of backups running each 24 hours, perhaps plotted in 1 hour buckets. Also show BackupPC_link and BackupPC_nightly.
Total disk space used. I know this is in the 'Status' screen, but that's the only thing I really look at on that screen. I spend most of my time looking at the Host Summary, so I'd like to have that data there as well.
From Les Mikesell: how about making the host summary change colors when the backups are old for any reason? I let the server disk overfill again and it is starting to send some warning emails but the host summary page is still mostly green even though the backups are all 4-6 days old. A 'last success' timestamp column would be a nice touch since you have to scan both the full and incremental age columns to tell now. And I don't consider logging 'disk too full' to be a success, even though the program is doing what it is supposed to do.
From Sam Przyswa: only save a new partial if it has more files than the previous partial. Currently a new partial always replaces the previous partial.
From Wayne Scott: add links on the LOG viewer to see the next and previous LOG file.
Add backend SQL support for various BackupPC metadata, including configuration parameters, client lists, and backup and restore information. At installation time the backend data engine will be specified (eg: MySQL, ascii text etc).
Because of file name mangling (adding ``f'' to start of each file name) and with pending utf8 changes, BackupPC is not able to store files whose file name length is 255 bytes or greater. The format of the attrib file should be extended so that very long file names (ie: >= 255) are abbreviated on the file system, but the full name is stored in the attrib file. This could also be used to eliminate the leading ``f'', since that is there to avoid collisions with the attrib file.
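A sketch of the abbreviation scheme (Python for illustration; the digest choice and length cutoff are assumptions, and the second return value represents the full name to be stored in the attrib file):

```python
import hashlib

MAX_NAME = 254  # stay below the common 255-byte file system limit

def mangle(name):
    """Mangled on-disk name: leading ``f'' plus an abbreviation when the
    result would be too long.  Returns (on_disk_name, full_name), where
    full_name is None unless it must be stored in the attrib file."""
    mangled = "f" + name
    if len(mangled.encode("utf-8")) <= MAX_NAME:
        return mangled, None
    # keep a recognizable prefix; a digest suffix keeps the name unique
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    prefix = mangled.encode("utf-8")[: MAX_NAME - 33].decode("utf-8", "ignore")
    return prefix + "_" + digest, name
```

On restore, the attrib entry (when present) overrides the abbreviated on-disk name.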
Adding support for FTP (via Net::FTP) and/or wget (for HTTP and FTP) as additional XferMethods. One question with ftp is whether you can support incrementals. It should be possible: for example, you could use Net::FTP->ls to list each directory and get the file sizes and mtimes, then only back up the files whose size or mtime has changed. You would also need the ls() function to recurse directories, and the code would need to be robust to the different formats returned by ls() on different clients.
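Once the recursive listing is in hand, the incremental decision itself is a simple comparison against the previous backup's listing. A sketch (Python for illustration; the listing format is an assumption):

```python
def ftp_incremental(prev, curr):
    """Decide which files to fetch for an incremental: any file whose
    size or mtime differs from the previous backup, or that is new.
    prev and curr map path -> (size, mtime), as gathered from recursive
    directory listings.  Deleted files fall out of the comparison the
    other way: paths in prev but not in curr."""
    return sorted(p for p, attrs in curr.items() if prev.get(p) != attrs)
```

This is the same size/mtime heuristic tar-based incrementals rely on, so it inherits the same blind spot: in-place edits that preserve both attributes go undetected.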
For wget there would be a new module called BackupPC::Xfer::Wget that uses wget. Wget can do both http and ftp. Certainly backing up a web site via ftp is better than http, especially when there is active content and not just static pages. But the benefit of supporting http is that you could use it to backup config pages of network hardware (eg: routers etc). So if a router fails you have a copy of the config screens and settings. (And the future tripwire feature on the todo list could tell you if someone messed with the router settings.) Probably the only choice with wget is to fetch all the files (either via ftp or http) into a temporary directory, then run tar on that directory and pipe it into BackupPC_tarExtract.
The advantage of using wget is you get both http and ftp. The disadvantage is that you can't support incrementals with wget, but you can with Net::FTP. Also people will find wget harder to configure and run.
Replacing smbclient with the perl module FileSys::SmbClient. This gives much more direct control of the smb transfer, allowing incrementals to depend on any attribute change (eg: exist, mtime, file size, uid, gid), and better support for include and exclude. Currently smbclient incrementals only depend upon mtime, so deleted files or renamed files are not detected. FileSys::SmbClient would also allow resuming of incomplete full backups in the same manner as rsync will.
Support --listed-incremental or --incremental for tar, so that incrementals will depend upon any attribute change (eg: exist, mtime, file size, uid, gid), rather than just mtime. This will allow tar to be as capable as FileSys::SmbClient and rsync. This is low priority since rsync is really the preferred method.
In addition to allowing a specific backup (incremental or full) to be started from the CGI interface, also allow a ``regular'' backup to be started. This would behave just like a regular background backup and determine whether a full, incremental or nothing should be done. The equivalent BackupPC_serverMesg command would allow scripts to schedule regular backups, in addition to the current ability to force a full or incremental. (Suggested by Mike Trisko.)
Provide a script/utility that allows a file tree to be linked into the pool. Helpful if you restore or copy a backup and want to pool it. Suggested by Les Mikesell.
For rsync (and smb when FileSys::SmbClient is supported, and tar when --listed-incremental is supported) support multi-level incrementals. In fact, since incrementals will now be more ``accurate'', you could choose to never do full backups (except the first time), or at a minimum do them infrequently: each incremental would depend upon the last, giving a continuous chain of differential backups.
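Selecting the reference backup for a level-N incremental is the key scheduling step. A sketch of one plausible rule (Python for illustration; the rule itself is an assumption, not BackupPC's current behavior):

```python
def reference_backup(backups, level):
    """Pick the reference for a level-``level'' incremental: the most
    recent backup whose level is strictly lower.  backups is a list of
    (backup_number, level) in chronological order; level 0 is a full."""
    for num, lvl in reversed(backups):
        if lvl < level:
            return num
    raise ValueError("no suitable reference backup; a full is needed")
```

With this rule a level-2 incremental chains off the latest level-1 (or level-0) backup, so restoring a backup walks the chain back to the last full.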
Allow diffs between two different backed up files to be displayed. The history feature in 2.1.0 does show files that are the same between backups. Most often we would like to just take the diff of the same host, path and file between different backups, but it would be nice to generalize it to any pair of files (ie: different hosts, backups, paths etc). But I'm not sure how the user interface would look.
More speculative: Storing binary file deltas (in fact, reverse deltas) for files that have the same name as a previous backup, but that aren't already in the pool. This will save storage for things like mailbox files or documents that change slightly between backups. Running some benchmarks on a large pool suggests that the potential savings are around 15-20%, which isn't spectacular, and likely not worth the implementation effort. The program xdelta (v1) on SourceForge (see http://sourceforge.net/projects/xdelta) uses an rsync algorithm for doing efficient binary file deltas. Rather than using an external program, File::RsyncP would eventually get the necessary delta generation code from rsync.
Comments and suggestions are welcome.