I decided to benchmark rzip against bzip for my backup needs. The benchmark was performed on an 89M tar archive of a directory which I regularly back up using my Amazon S3 backup script. The directory contains mostly LaTeX, PDF and Open Office files, so you may get very different results if you test it on other kinds of files.
I ran both rzip and bzip with their default settings. bzip, the program I currently use for my backups, managed to compress the tar file to 43.5M in 1:04.286 (min:sec). On the other hand, rzip compressed the tar into a mere 39M and did it in only 32.844 seconds.
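For anyone who wants to repeat this kind of measurement on their own files, here is a minimal Python sketch of a timing helper. It is not the procedure used for the numbers above, only one way to do it. The name `timed_run` is made up for this example, and the demo line times a no-op Python process as a stand-in for a real compressor.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd (a list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return time.perf_counter() - start


# Stand-in command so the sketch runs anywhere. To time a real compressor,
# pass something like ["rzip", "copy-of-backup.tar"] instead (hypothetical
# file name), and work on a copy, since rzip replaces its input by default.
elapsed = timed_run([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed:.3f} s")
```

Wall-clock time is what matters for a nightly backup, which is why the sketch does not try to separate user and system CPU time.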
Weighing the results by compressed size and processing time alone, rzip beats bzip by far. It compresses better (39M versus 43.5M, roughly 10% smaller) and does so about twice as fast.
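The comparison can be checked with a few lines of Python. This is only the arithmetic on the figures quoted above; the variable names are made up for the example.

```python
def to_seconds(minutes, seconds):
    """Convert a min:sec reading to seconds."""
    return minutes * 60 + seconds


original_mb = 89.0                              # size of the tar archive
bzip_mb, bzip_s = 43.5, to_seconds(1, 4.286)    # bzip result: size and time
rzip_mb, rzip_s = 39.0, to_seconds(0, 32.844)   # rzip result: size and time

# Fraction by which the rzip output is smaller than the bzip output.
size_saving = 1 - rzip_mb / bzip_mb
# How many times faster rzip finished.
speedup = bzip_s / rzip_s

print(f"bzip output: {bzip_mb / original_mb:.1%} of the original")
print(f"rzip output: {rzip_mb / original_mb:.1%} of the original")
print(f"rzip output is {size_saving:.1%} smaller than bzip's")
print(f"rzip was {speedup:.2f}x faster")
```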
One must note that rzip was much more memory intensive, so it isn’t suited for environments which are low on available memory. Another disadvantage of rzip is that it can’t operate on stdin/stdout, so one must first physically create the file on the disk before compressing it.
As I don’t have any memory shortage on my machine (especially during the night when I make my backups), and I can live with creating a temporary tar file, I plan to switch my backup script over to rzip. The new script will probably work very similarly to the current one, except that it will produce an rzip-compressed archive, which will help me cut down the cost of backups.
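Because rzip cannot read from a pipe, the tar and compress steps have to be separate. The following Python sketch shows that two-step flow. It is an illustration, not the actual backup script: `make_tar` and `rzip_file` are hypothetical helper names, the upload to S3 is left out, and it assumes rzip is on the PATH and, with default settings, writes `<input>.rz` and removes the input file.

```python
import subprocess
import tarfile
from pathlib import Path


def make_tar(src_dir, tar_path):
    """Write an uncompressed tar of src_dir to tar_path and return the path."""
    src = Path(src_dir)
    with tarfile.open(str(tar_path), "w") as tar:
        # Store entries under the directory's own name, not its full path.
        tar.add(str(src), arcname=src.name)
    return Path(tar_path)


def rzip_file(tar_path):
    """Compress tar_path with rzip and return the expected output path.

    Assumption: rzip run with defaults replaces the input with <input>.rz,
    so the temporary tar needs no separate cleanup afterwards.
    """
    subprocess.run(["rzip", str(tar_path)], check=True)
    return Path(str(tar_path) + ".rz")
```

A call such as `rzip_file(make_tar("documents", "/tmp/backup.tar"))` (both paths made up here) would leave `/tmp/backup.tar.rz` ready for upload.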
I tested rzip on a directory of 3186 files of saved webpages, mostly
news articles, in the .mht format. One .mht file for each webpage
containing html, pictures, js/css-files etc. Since the same pictures
are used on many pages, rzip should be ideal for this task.
I first tar’ed the directory into webpages.tar resulting in a 1077 MB file.
gzip reduced the size to 515 MB in 1m 19s
bzip2 to 511 MB in 3m 59s
rzip to just 247 MB in 2m 19s
Using the -9 option (best compression) rzip returned 236 MB in 2m
37s. It did, however, use nearly one gigabyte of memory at its peak.
Not a problem on new powerful computers, but still worth mentioning.
I suspect rzip would do even better with smart sorting of the files
inside the .tar-file.
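
One way to try the sorting idea is to build the tar with an explicit
member order. The Python sketch below is untested against rzip and the
helper names are made up. It orders files by extension and then by
name, which is only one guess at "smart"; for a directory holding
nothing but .mht files the extension part does nothing, and a different
key (passed as the key argument) would be needed.

```python
import os
import tarfile


def by_extension_then_name(path):
    """Sort key: group by file extension, then order by file name."""
    base = os.path.basename(path).lower()
    return (os.path.splitext(base)[1], base, path)


def sorted_members(src_dir, key=by_extension_then_name):
    """Return every file path under src_dir, ordered by the given key."""
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths, key=key)


def make_sorted_tar(src_dir, tar_path, key=by_extension_then_name):
    """Write an uncompressed tar whose members follow the sorted order."""
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted_members(src_dir, key):
            tar.add(path, arcname=os.path.relpath(path, src_dir))
```

Comparing the rzip output size for a sorted and an unsorted tar of the
same directory would show whether the ordering makes any difference.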