#1 |
|
"Mike"
Aug 2002
2⁵×257 Posts |
We are probably Doing It Wrong™ but here is an interesting time-saver we worked out today.
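When we get a relations file we run remdups4 on it and send the results to an uncompressed file. We had been compressing that file with gzip, but pigz, which writes the same gzip format using several CPU cores at once, is much faster. In the test below, on a file of about 3 GB, pigz -9 finished in 31 seconds of wall-clock time versus 2 minutes 30 seconds for gzip -9, and its output was only about 1.2 MB larger. The test file is small by our standards; with a larger file the saving really adds up!
Code: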
$ ls -l
total 6000512
-rw-r--r-- 1 m m 3072256000 Nov 1 16:02 gzip.tar
-rw-r--r-- 1 m m 3072256000 Nov 1 16:03 pigz.tar
$ md5sum *
4cd6e722fb2489a4d0a6fc9cba9758f6 gzip.tar
4cd6e722fb2489a4d0a6fc9cba9758f6 pigz.tar
$ \time -v pigz -9 pigz.tar
Command being timed: "pigz -9 pigz.tar"
User time (seconds): 167.58
System time (seconds): 3.55
Percent of CPU this job got: 549%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.11
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 8992
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3
Minor (reclaiming a frame) page faults: 1900
Voluntary context switches: 57420
Involuntary context switches: 59306
Swaps: 0
File system inputs: 32
File system outputs: 4378632
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$ \time -v gzip -9 gzip.tar
Command being timed: "gzip -9 gzip.tar"
User time (seconds): 141.16
System time (seconds): 6.12
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:29.53
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1788
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 150
Voluntary context switches: 63
Involuntary context switches: 801
Swaps: 0
File system inputs: 0
File system outputs: 4376224
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$ ls -l
total 4377436
-rw-r--r-- 1 m m 2240623152 Nov 1 16:02 gzip.tar.gz
-rw-r--r-- 1 m m 2241855831 Nov 1 16:03 pigz.tar.gz
$ gzip -d *gz
$ ls -l
total 6000512
-rw-r--r-- 1 m m 3072256000 Nov 1 16:02 gzip.tar
-rw-r--r-- 1 m m 3072256000 Nov 1 16:03 pigz.tar
$ md5sum *
4cd6e722fb2489a4d0a6fc9cba9758f6 gzip.tar
4cd6e722fb2489a4d0a6fc9cba9758f6 pigz.tar
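The switch described in this post can be wrapped in a small helper for scripts that may run on machines without pigz. This is only a sketch under assumptions: `compress_rels` is a made-up name, the sample file is generated on the spot rather than being real remdups4 output, and the fallback to gzip is an addition for illustration, not part of the workflow described here. It repeats the md5sum round-trip check from the session above.

```shell
#!/bin/sh
# Compress a file with pigz when it is installed, otherwise fall back to gzip.
# Both tools write the gzip format, so either one can decompress the result.
compress_rels() {
    # $1: file to compress in place (becomes "$1.gz"); hypothetical helper name
    if command -v pigz >/dev/null 2>&1; then
        pigz -9 "$1"
    else
        gzip -9 "$1"
    fi
}

# Demonstration on a throwaway file (made-up data, not real relations).
workdir=$(mktemp -d)
seq 1 200000 > "$workdir/sample.dat"
before=$(md5sum < "$workdir/sample.dat")

compress_rels "$workdir/sample.dat"

# Decompress to stdout and compare checksums with the original.
after=$(gzip -dc "$workdir/sample.dat.gz" | md5sum)
if [ "$before" = "$after" ]; then
    echo "round trip OK"
else
    echo "round trip FAILED" >&2
fi
rm -rf "$workdir"
```

pigz uses all available cores by default; `-p N` limits it to N threads if the machine is busy with other work.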
|
|
|
|
|
|
#2 |
|
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
263A₁₆ Posts |
Have you tried 7zip? It is fast and makes some small files.
|
|
|
|
|
|
#3 |
|
"Mike"
Aug 2002
2⁵·257 Posts |
Msieve reads gzip-compressed files directly (via a library routine), which saves a lot of disk space. (For personal use we prefer bzip.)
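For one's own scripts, the same idea of never writing an uncompressed copy can be sketched with a pipe, since `gzip -dc` sends the decompressed data to standard output. This is not how msieve does it internally (it uses a library routine, as noted above); `count_lines_gz` and the sample file are made up for illustration.

```shell
#!/bin/sh
# Count the lines in a gzip-compressed file without unpacking it to disk.
count_lines_gz() {
    # $1: path to a .gz file; hypothetical helper name
    # tr strips the padding some wc implementations add around the number
    gzip -dc "$1" | wc -l | tr -d ' '
}

workdir=$(mktemp -d)
# Stand-in data: three made-up lines, not real relations.
printf 'line one\nline two\nline three\n' | gzip > "$workdir/sample.dat.gz"

n=$(count_lines_gz "$workdir/sample.dat.gz")
echo "lines: $n"
rm -rf "$workdir"
```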
|
|
|
|
|
|
#4 |
|
Romulan Interpreter
Jun 2011
Thailand
22613₈ Posts |
We don't like compressed files because we cannot peek inside them, and they become unreadable if something goes wrong, such as a few bytes changing. With an uncompressed file you can still recover most of the relations after the file is damaged. Storage space is cheap, and a ramdisk (software, not physical) is fast.
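For anyone who does keep gzip'd files, two standard commands soften the first objection a little: `gzip -dc` piped into `head` shows the start of a file without unpacking it, and `gzip -t` checks a file's integrity. The sketch below uses made-up sample data and a deliberately truncated copy to simulate damage. It only shows that damage is detected; it does not recover anything, so the recovery concern above still stands.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Stand-in data: made-up lines, not real relations.
seq 1 1000 | gzip > "$workdir/sample.dat.gz"

# Peek at the first lines without writing an uncompressed copy.
first=$(gzip -dc "$workdir/sample.dat.gz" | head -n 3)
echo "$first"

# gzip -t decompresses in memory and verifies the checksum; status 0 means intact.
if gzip -t "$workdir/sample.dat.gz" 2>/dev/null; then
    intact=yes
else
    intact=no
fi

# A copy cut off after 100 bytes simulates damage; gzip -t should now fail.
head -c 100 "$workdir/sample.dat.gz" > "$workdir/broken.dat.gz"
if gzip -t "$workdir/broken.dat.gz" 2>/dev/null; then
    broken_detected=no
else
    broken_detected=yes
fi
echo "intact file ok: $intact, damage detected: $broken_detected"
rm -rf "$workdir"
```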
Last fiddled with by LaurV on 2015-11-02 at 05:08 |
|
|
|
|
|
#5 | |
|
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17×251 Posts |
I wasn't aware until a few days ago that msieve can read gzip'd relations directly, and I will certainly be using that in the future. I wasted a good chunk of time and storage space unzipping, copying, and deleting msieve.dat/msieve.dat.gz files.
Last fiddled with by Mini-Geek on 2015-11-02 at 23:47 |