.tar.gz, gzip and pigz
We are probably Doing It Wrong™ but here is an interesting time-saver we worked out today.
When we get a relations file we run remdups4 on it and send the results to an uncompressed file. We have been compressing the resulting file with gzip, but the [URL="http://zlib.net/pigz/"]pigz[/URL] [URL="https://packages.debian.org/jessie/pigz"]package[/URL] is [U]much[/U] faster: in the test below it took 0:31 of wall-clock time against 2:29 for gzip, running at 549% CPU against 98%, and its output was only about 1.2 MB larger. The test below uses only a small (3 GB) file. With a larger file it really saves time!
[CODE]$ ls -l
total 6000512
-rw-r--r-- 1 m m 3072256000 Nov  1 16:02 gzip.tar
-rw-r--r-- 1 m m 3072256000 Nov  1 16:03 pigz.tar
$ md5sum *
4cd6e722fb2489a4d0a6fc9cba9758f6  gzip.tar
4cd6e722fb2489a4d0a6fc9cba9758f6  pigz.tar
$ \time -v pigz -9 pigz.tar
	Command being timed: "pigz -9 pigz.tar"
	User time (seconds): 167.58
	System time (seconds): 3.55
	Percent of CPU this job got: 549%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.11
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 8992
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 3
	Minor (reclaiming a frame) page faults: 1900
	Voluntary context switches: 57420
	Involuntary context switches: 59306
	Swaps: 0
	File system inputs: 32
	File system outputs: 4378632
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
$ \time -v gzip -9 gzip.tar
	Command being timed: "gzip -9 gzip.tar"
	User time (seconds): 141.16
	System time (seconds): 6.12
	Percent of CPU this job got: 98%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:29.53
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1788
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 150
	Voluntary context switches: 63
	Involuntary context switches: 801
	Swaps: 0
	File system inputs: 0
	File system outputs: 4376224
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
$ ls -l
total 4377436
-rw-r--r-- 1 m m 2240623152 Nov  1 16:02 gzip.tar.gz
-rw-r--r-- 1 m m 2241855831 Nov  1 16:03 pigz.tar.gz
$ gzip -d *gz
$ ls -l
total 6000512
-rw-r--r-- 1 m m 3072256000 Nov  1 16:02 gzip.tar
-rw-r--r-- 1 m m 3072256000 Nov  1 16:03 pigz.tar
$ md5sum *
4cd6e722fb2489a4d0a6fc9cba9758f6  gzip.tar
4cd6e722fb2489a4d0a6fc9cba9758f6  pigz.tar[/CODE] |
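The comparison above could be wrapped in a small helper so that pigz is used when it is installed and gzip otherwise. This is only a minimal sketch: the function name `compress_keep` is made up for illustration, and it assumes a POSIX shell with `gzip` and `cmp` available. It writes to a separate `.gz` file instead of compressing in place, then decompresses the result and compares it with the original, in the same spirit as the md5sum check above.

```shell
# compress_keep FILE [LEVEL]
# Hypothetical helper (not part of gzip or pigz).
# Writes FILE.gz next to FILE, leaves FILE untouched, and returns
# success only if the compressed copy decompresses back to FILE.
compress_keep() {
    src=$1
    level=${2:-9}          # default to -9, as in the timing test

    # Prefer the multi-threaded pigz when it is on the PATH.
    if command -v pigz >/dev/null 2>&1; then
        tool=pigz
    else
        tool=gzip
    fi

    # -c writes the compressed stream to stdout, so the source is kept.
    "$tool" "-$level" -c "$src" > "$src.gz" &&
        # Round-trip check: decompress to a pipe and compare byte for byte.
        gzip -dc "$src.gz" | cmp -s - "$src"
}
```

For example, `compress_keep msieve.dat` would leave `msieve.dat` in place and write `msieve.dat.gz`; the uncompressed file should only be deleted once the function has returned success.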
Have you tried 7zip? It is fast and produces small files.
|
Msieve reads gzip-compressed files directly (via a [URL="https://en.wikipedia.org/wiki/Zlib"]library routine[/URL]), which saves a lot of disk space. (For personal use we prefer bzip.)
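As a rough shell analogy for reading a gzip file without unpacking it to disk (this is not how Msieve itself calls zlib, just the same idea on the command line), the compressed stream can be decompressed into a pipe and handed to other tools. The function names `peek_gz`, `count_lines_gz` and `check_gz` are hypothetical and exist only for this sketch.

```shell
# peek_gz FILE [N]
# Print the first N lines (default 5) of a gzip file.
# gzip -dc decompresses to stdout, so nothing is written to disk.
peek_gz() {
    gzip -dc "$1" | head -n "${2:-5}"
}

# count_lines_gz FILE
# Count the lines in a gzip file, again through a pipe.
# tr strips the padding some wc implementations add.
count_lines_gz() {
    gzip -dc "$1" | wc -l | tr -d ' '
}

# check_gz FILE
# gzip -t tests the integrity of the compressed file and
# returns non-zero if the data is damaged or not gzip at all.
check_gz() {
    gzip -t "$1"
}
```

Something like `peek_gz msieve.dat.gz 3` shows the first three lines, and `check_gz msieve.dat.gz` gives an early warning if the file has been corrupted.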
|
We don't like compressed files because we cannot peek inside, and they become unreadable if something goes wrong (like some bytes changing, as opposed to uncompressed files, where you can still recover most of the relations if a file gets corrupted). Storage space is cheap and ramdisk (software, not physical) is fast.
|
[QUOTE=LaurV;414569]We don't like compressed files because we cannot peek inside, and they become unreadable if something goes wrong (like some bytes changing, as opposed to uncompressed files, where you can still recover most of the relations if a file gets corrupted). Storage space is cheap and ramdisk (software, not physical) is fast.[/QUOTE]
But storage space is slow when you're talking about files that are multiple GB in size (as msieve relations are), even on an SSD. I wasn't aware until a few days ago that msieve can read gzip'd relations directly, and I will certainly be using that in the future. :smile: I wasted a good chunk of time and storage space unzipping, copying, and deleting msieve.dat/msieve.dat.gz files. |