Difference between revisions of "Software: GZIP vs. BZIP2 vs. XZ - performance"
Lukas Dzunko (talk | contribs) (→Decompression time and memory usage) |
Revision as of 13:00, 28 November 2013
Attention: this page is a work in progress.
I recently took part in a discussion on G+ about the best possible compression method for the Linux kernel. The discussion was later extended to user-space compression as well. I think it is interesting to see how the various compression methods and levels behave on different types of files.
Input data
For the test I selected the following files:
- DVD.iso - ISO image containing an MPEG-2 stream (DVD-Video) and JPEG files (pictures)
- fs.bin - ext4 file system containing "linux.tar" and "random.bin"
- linux.tar - tarball archive of Linux kernel sources + objects and final kernel / module images
- random.bin - file containing data from /dev/urandom
- zero.bin - file containing only 'zero' data (read from /dev/zero)
- sql.dump - text dump of my PostgreSQL database (backup catalog, SQL commands)
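The two synthetic inputs are easy to reproduce. The following is only a sketch of one way to create them, not necessarily how the original files were made: `SIZE_MB` and `OUT_DIR` are hypothetical variables introduced here, and the default of 1 MiB is far smaller than the 1 GiB and 4 GiB files used in this test.

```shell
#!/bin/sh
# Sketch: create small stand-ins for random.bin and zero.bin.
# SIZE_MB and OUT_DIR are assumptions for illustration only.
SIZE_MB="${SIZE_MB:-1}"
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# Random data: effectively incompressible
dd if=/dev/urandom of="${OUT_DIR}/random.bin" bs=1048576 count="${SIZE_MB}" 2>/dev/null

# Zero bytes only: the most compressible input possible
dd if=/dev/zero of="${OUT_DIR}/zero.bin" bs=1048576 count="${SIZE_MB}" 2>/dev/null

ls -l "${OUT_DIR}"
```

Setting `SIZE_MB=1024` would give a 1 GiB file like the random.bin used here.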
As preparation I ran the following loop:

    for a in DVD.iso fs.bin linux.tar random.bin zero.bin sql.dump
    do
        for b in 1 6 9
        do
            cat ${a} | gzip -${b} > ${a}.${b}.gz
            cat ${a} | bzip2 -${b} > ${a}.${b}.bz2
            cat ${a} | xz -${b} > ${a}.${b}.xz
        done
    done
Test methodology
The test was executed on an "Intel(R) Atom(TM) CPU 330 @ 1.60GHz". The system was running in dual-core mode with HT enabled (SMP). There should be no significant difference compared to using one core and "UP" code, as compression and decompression are done in a single thread. The system was configured with 3 GB of usable RAM and without CPU frequency scaling. At the time of the test the system was otherwise idle. Sequential disk read speed is 80 MB/sec, so it should not affect the results. I used /dev/null as the target for compression and decompression to prevent possible problems with concurrent I/O and cache entries.
Result
Size after compression
The table contains the sizes reported by the stat and ls -lh commands (smaller is better):
Method | DVD.iso | fs.bin | linux.tar | random.bin | zero.bin | sql.dump |
---|---|---|---|---|---|---|
source | 6189107200 (5,8G) | 4294967296 (4,0G) | 760012800 (725M) | 1073741824 (1,0G) | 4294967296 (4,0G) | 739279146 (706M) |
gzip -1 | 5887330412 (5,5G) | 1307236772 (1,3G) | 222544172 (213M) | 1073924290 (1,1G) | 18734949 (18M) | 294669689 (282M) |
gzip -6 | 5879295258 (5,5G) | 1265502809 (1,2G) | 189164983 (181M) | 1073915726 (1,1G) | 4168175 (4,0M) | 257863244 (246M) |
gzip -9 | 5878183653 (5,5G) | 1263912039 (1,2G) | 187578775 (179M) | 1073915726 (1,1G) | 4168175 (4,0M) | 255110366 (244M) |
bzip2 -1 | 5845697940 (5,5G) | 1259732950 (1,2G) | 177327051 (170M) | 1082371295 (1,1G) | 26041 (26K) | 234198972 (224M) |
bzip2 -6 | 5485927519 (5,2G) | 1235652239 (1,2G) | 156321978 (150M) | 1079336646 (1,1G) | 4491 (4,4K) | 225847372 (216M) |
bzip2 -9 | 5430387273 (5,1G) | 1231448849 (1,2G) | 152999062 (146M) | 1078496689 (1,1G) | 3023 (3,0K) | 224565320 (215M) |
xz -1 | 5383272868 (5,1G) | 1227513964 (1,2G) | 153319596 (147M) | 1073795128 (1,1G) | 624848 (611K) | 235411304 (225M) |
xz -6 | 5305999740 (5,0G) | 1188389560 (1,2G) | 114173192 (109M) | 1073795048 (1,1G) | 624848 (611K) | 175777564 (168M) |
xz -9 | 5264433664 (5,0G) | 1174081380 (1,1G) | 99830680 (96M) | 1073795048 (1,1G) | 624848 (611K) | 100893196 (97M) |
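The raw byte counts are easier to compare as ratios. The helper below is a small sketch, not part of the original test: `ratio` is a hypothetical function name, and the numbers passed to it are copied from the table above.

```shell
#!/bin/sh
# ratio COMPRESSED ORIGINAL
# Prints the compressed size as a percentage of the original size.
ratio() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.2f\n", 100 * c / o }'
}

# linux.tar: xz -9 size versus source size
ratio 99830680 760012800

# sql.dump: gzip -6 size versus source size
ratio 257863244 739279146

# random.bin: gzip -1 size versus source size (slightly above 100)
ratio 1073924290 1073741824
```

A value above 100 means the compressed file is larger than the input, which is what happens with random.bin for every method tested.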
Compression time and memory usage
    for a in DVD.iso fs.bin linux.tar random.bin zero.bin sql.dump
    do
        for b in 1 6 9
        do
            cat ${a} | /usr/bin/time -v -o ${a}.${b}.gz.c.txt gzip -${b} > /dev/null
            cat ${a} | /usr/bin/time -v -o ${a}.${b}.bz2.c.txt bzip2 -${b} > /dev/null
            cat ${a} | /usr/bin/time -v -o ${a}.${b}.xz.c.txt xz -${b} > /dev/null
        done
    done

Method | DVD.iso | fs.bin | linux.tar | random.bin | zero.bin |
---|---|---|---|---|---|
source | - | - | - | - | - |
gzip -1 | - | - | - | - | - |
gzip -6 | - | - | - | - | - |
gzip -9 | - | - | - | - | - |
bzip2 -1 | - | - | - | - | - |
bzip2 -6 | - | - | - | - | - |
bzip2 -9 | - | - | - | - | - |
xz -1 | - | - | - | - | - |
xz -6 | - | - | - | - | - |
xz -9 | - | - | - | - | - |
Decompression time and memory usage
For the decompression test I used the following loop:

    for a in DVD.iso fs.bin linux.tar random.bin zero.bin sql.dump
    do
        for b in 1 6 9
        do
            cat ${a}.${b}.gz | /usr/bin/time -v -o ${a}.${b}.gz.d.txt gzip -d > /dev/null
            cat ${a}.${b}.bz2 | /usr/bin/time -v -o ${a}.${b}.bz2.d.txt bzip2 -d > /dev/null
            cat ${a}.${b}.xz | /usr/bin/time -v -o ${a}.${b}.xz.d.txt xz -d > /dev/null
        done
    done

The table contains the "wall clock" time (i.e. total time of execution) and, in parentheses, the "maximum resident set size":
Method | DVD.iso | fs.bin | linux.tar | random.bin | zero.bin | sql.dump |
---|---|---|---|---|---|---|
gzip -1 | 03:19.63 (3184) | 00:53.82 (3152) | 00:15.20 (3168) | 00:25.38 (3120) | 00:25.46 (3104) | 00:17.75 (3152) |
gzip -6 | 03:11.92 (3152) | 01:01.34 (3168) | 00:13.51 (3152) | 00:23.63 (3136) | 00:40.41 (3088) | 00:16.21 (3152) |
gzip -9 | 03:11.00 (3152) | 01:01.08 (3168) | 00:13.17 (3168) | 00:23.75 (3120) | 00:40.41 (3088) | 00:16.31 (3136) |
bzip2 -1 | 27:22.65 (3584) | 06:38.13 (3600) | 01:15.41 (3600) | 05:05.59 (3584) | 00:42.02 (3584) | 01:28.91 (3600) |
bzip2 -6 | 35:12.75 (12048) | 08:51.09 (12048) | 01:40.37 (12032) | 06:45.97 (12032) | 00:42.78 (10992) | 02:07.83 (12048) |
bzip2 -9 | 35:49.54 (16272) | 09:00.72 (16256) | 01:44.57 (16272) | 06:53.12 (16272) | 00:42.96 (16272) | 02:12.46 (16256) |
xz -1 | 18:58.10 (8128) | 01:26.62 (8128) | 00:33.49 (8128) | 00:13.17 (7968) | 00:51.64 (8080) | 00:58.49 (8128) |
xz -6 | 18:53.31 (36784) | 01:14.53 (36816) | 00:28.21 (36816) | 00:12.86 (36624) | 00:51.59 (36752) | 00:50.62 (36800) |
xz -9 | 18:47.33 (266176) | 01:11.99 (266192) | 00:25.69 (266176) | 00:13.15 (265984) | 00:51.58 (266144) | 00:33.14 (266176) |
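Note: Wall clock time is in the format mm:ss.ss (minutes, seconds and hundredths of a second). The value in parentheses is the maximum resident set size, reported in kbytes.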
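Both values in each cell come from the report files written by `/usr/bin/time -v -o`. The sketch below shows one way to pull them out, assuming the GNU time report format with its "Elapsed (wall clock) time" and "Maximum resident set size (kbytes)" lines. `time_summary` is a hypothetical helper name, not something used in the original test.

```shell
#!/bin/sh
# time_summary FILE
# Prints "<wall clock> (<max RSS in kbytes>)" from a GNU time -v report.
time_summary() {
    awk -F': ' '
        /Elapsed \(wall clock\) time/ { wall = $NF }
        /Maximum resident set size/   { rss  = $NF }
        END { printf "%s (%s)\n", wall, rss }
    ' "$1"
}

# Summarize every decompression report in the current directory, if any exist
for f in *.d.txt
do
    [ -f "$f" ] || continue
    printf '%s: ' "$f"
    time_summary "$f"
done
```

GNU time prints the minutes without a leading zero, so the output may need padding to match the table layout.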