mersenneforum.org  

2015-11-01, 23:00   #1
Xyzzy
 
 
"Mike"
Aug 2002

1F5C₁₆ Posts
.tar.gz, gzip and pigz

We are probably Doing It Wrong™ but here is an interesting time-saver we worked out today.

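When we get a relations file, we run remdups4 on it (to remove duplicate relations) and write the result to an uncompressed file. We had been compressing that file with gzip, but pigz, a parallel implementation of gzip, is much faster: in the test below it took 31 seconds of wall-clock time versus 2 minutes 29 seconds for gzip, and the two compressed files differ in size by less than 0.1%. The test file is only about 3 GB, which is small by our standards; with a larger file it really saves time!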
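The compression step described above can be sketched as a small shell function. This is a minimal sketch, assuming a POSIX shell with gzip installed; pigz is used only if it is on the PATH, and the name `compress_rels` is hypothetical. Like gzip, pigz replaces the input file with FILE.gz by default and writes ordinary gzip format, so the result can be read by anything that reads gzip.

```shell
#!/bin/sh
# compress_rels FILE   (hypothetical helper name)
# Compress FILE in place to FILE.gz at the highest level (-9).
# Prefer pigz, which spreads the work over all available cores; fall
# back to single-threaded gzip when pigz is not installed. Both tools
# remove FILE after writing FILE.gz, and both produce ordinary gzip data.
compress_rels() {
  if command -v pigz >/dev/null 2>&1; then
    pigz -9 "$1"
  else
    gzip -9 "$1"
  fi
}

# Example (hypothetical file name):
# compress_rels rels_nodups.dat    # leaves rels_nodups.dat.gz
```

The fallback keeps scripts portable to machines without pigz, at the cost of the speed-up measured above.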

Code:
$ ls -l 
total 6000512 
-rw-r--r-- 1 m m 3072256000 Nov  1 16:02 gzip.tar 
-rw-r--r-- 1 m m 3072256000 Nov  1 16:03 pigz.tar 
 
$ md5sum * 
4cd6e722fb2489a4d0a6fc9cba9758f6  gzip.tar 
4cd6e722fb2489a4d0a6fc9cba9758f6  pigz.tar 
 
$ \time -v pigz -9 pigz.tar  
    Command being timed: "pigz -9 pigz.tar" 
    User time (seconds): 167.58 
    System time (seconds): 3.55 
    Percent of CPU this job got: 549% 
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.11 
    Average shared text size (kbytes): 0 
    Average unshared data size (kbytes): 0 
    Average stack size (kbytes): 0 
    Average total size (kbytes): 0 
    Maximum resident set size (kbytes): 8992 
    Average resident set size (kbytes): 0 
    Major (requiring I/O) page faults: 3 
    Minor (reclaiming a frame) page faults: 1900 
    Voluntary context switches: 57420 
    Involuntary context switches: 59306 
    Swaps: 0 
    File system inputs: 32 
    File system outputs: 4378632 
    Socket messages sent: 0 
    Socket messages received: 0 
    Signals delivered: 0 
    Page size (bytes): 4096 
    Exit status: 0 
     
$ \time -v gzip -9 gzip.tar  
    Command being timed: "gzip -9 gzip.tar" 
    User time (seconds): 141.16 
    System time (seconds): 6.12 
    Percent of CPU this job got: 98% 
    Elapsed (wall clock) time (h:mm:ss or m:ss): 2:29.53 
    Average shared text size (kbytes): 0 
    Average unshared data size (kbytes): 0 
    Average stack size (kbytes): 0 
    Average total size (kbytes): 0 
    Maximum resident set size (kbytes): 1788 
    Average resident set size (kbytes): 0 
    Major (requiring I/O) page faults: 0 
    Minor (reclaiming a frame) page faults: 150 
    Voluntary context switches: 63 
    Involuntary context switches: 801 
    Swaps: 0 
    File system inputs: 0 
    File system outputs: 4376224 
    Socket messages sent: 0 
    Socket messages received: 0 
    Signals delivered: 0 
    Page size (bytes): 4096 
    Exit status: 0 
     
$ ls -l 
total 4377436 
-rw-r--r-- 1 m m 2240623152 Nov  1 16:02 gzip.tar.gz 
-rw-r--r-- 1 m m 2241855831 Nov  1 16:03 pigz.tar.gz 
 
$ gzip -d *gz 
 
$ ls -l 
total 6000512 
-rw-r--r-- 1 m m 3072256000 Nov  1 16:02 gzip.tar 
-rw-r--r-- 1 m m 3072256000 Nov  1 16:03 pigz.tar 
 
$ md5sum * 
4cd6e722fb2489a4d0a6fc9cba9758f6  gzip.tar 
4cd6e722fb2489a4d0a6fc9cba9758f6  pigz.tar
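The transcript confirms the round trip by decompressing back to disk and comparing md5sums. The same check can be done without writing a second multi-GB file by streaming the decompressed data straight into a comparison. A minimal sketch, assuming gzip and cmp are available and the uncompressed original is still at hand; `verify_gz` is a hypothetical name, and it swaps md5sum for a byte-for-byte cmp.

```shell
#!/bin/sh
# verify_gz ORIGINAL COMPRESSED   (hypothetical helper name)
# Exit status 0 only if COMPRESSED passes gzip's own integrity test
# (which checks the stored CRC-32 and length) and decompresses to
# exactly the same bytes as ORIGINAL.
verify_gz() {
  gzip -t "$2" 2>/dev/null || return 1
  # Stream the decompressed data into cmp; "-" makes cmp read stdin.
  gzip -dc "$2" | cmp -s - "$1"
}

# Example (hypothetical file name; -c keeps rels.dat for the comparison):
# gzip -9 -c rels.dat > rels.dat.gz && verify_gz rels.dat rels.dat.gz && echo OK
```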
2015-11-01, 23:21   #2
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2506₁₆ Posts

Have you tried 7-Zip? It is fast and produces small files.
2015-11-01, 23:42   #3
Xyzzy
 
 
"Mike"
Aug 2002

2²×3²×223 Posts

Msieve can read gzip-compressed files directly (via a library routine), which saves a lot of disk space. (For personal use we prefer bzip2.)
2015-11-02, 05:06   #4
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2²×2,341 Posts

We don't like compressed files because we cannot peek inside them, and they become unreadable if something goes wrong (for example, if some bytes get changed). With an uncompressed file, by contrast, you can still recover most of the relations if the file is damaged. Storage space is cheap, and a RAM disk (software, not physical) is fast.

Last fiddled with by LaurV on 2015-11-02 at 05:08
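Both concerns can be partly addressed with standard tools. A minimal sketch, assuming GNU gzip, head and sed; the helper names are hypothetical. Peeking inside needs no decompressed copy (the `zcat`, `zless` and `zgrep` wrappers that ship with gzip do the same job). For a truncated .gz, gzip still writes out what it decoded before reaching the end of the input, so the relations up to that point can be salvaged. For bytes corrupted in the middle of the file, recovery is much less reliable and data after the damage is generally lost, which is the trade-off described above.

```shell
#!/bin/sh
# peek_rels FILE.gz [N]   (hypothetical helper name)
# Print the first N lines (default 10) of a gzip-compressed relations
# file without writing a decompressed copy to disk. gzip's error output
# (e.g. when head stops reading early) is discarded.
peek_rels() {
  gzip -dc "$1" 2>/dev/null | head -n "${2:-10}"
}

# salvage_rels DAMAGED.gz OUT   (hypothetical helper name)
# Write to OUT whatever gzip manages to decode from a truncated .gz,
# then drop the final line, which may have been cut off mid-relation.
salvage_rels() {
  gzip -dc "$1" 2>/dev/null | sed '$d' > "$2"
}

# Examples (hypothetical file names):
# peek_rels msieve.dat.gz 5
# salvage_rels broken.dat.gz recovered.dat
```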
2015-11-02, 23:47   #5
Mini-Geek
Account Deleted
 
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17×251 Posts

Quote:
Originally Posted by LaurV
We don't like compressed files because we cannot peek inside them, and they become unreadable if something goes wrong (for example, if some bytes get changed). With an uncompressed file, by contrast, you can still recover most of the relations if the file is damaged. Storage space is cheap, and a RAM disk (software, not physical) is fast.
But storage is slow when you're dealing with files that are several GB in size (as msieve relation files are), even on an SSD.

I wasn't aware until a few days ago that msieve can read gzipped relations directly, and I will certainly be using that from now on. I wasted a good chunk of time and storage space unzipping, copying, and deleting msieve.dat/msieve.dat.gz files.

Last fiddled with by Mini-Geek on 2015-11-02 at 23:47
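One way to avoid unzip/copy/delete cycles when new relations arrive: the gzip format allows several compressed members to be stored back to back, and `gzip -d` decompresses them as one continuous stream, so new relations can be appended to an existing .gz without decompressing it. A minimal sketch with a hypothetical helper name. If msieve's library routine is zlib's `gzread` (an assumption here), the zlib documentation says it continues through concatenated gzip streams, but it is worth testing a given msieve build on a small file first.

```shell
#!/bin/sh
# append_rels NEW.dat ARCHIVE.gz   (hypothetical helper name)
# Compress NEW.dat and append it to ARCHIVE.gz as an additional gzip
# member. ARCHIVE.gz is never decompressed; NEW.dat is left in place.
# pigz -9 -c can be used instead of gzip here; its output is also gzip.
append_rels() {
  gzip -9 -c "$1" >> "$2"
}

# Example (hypothetical file names):
# append_rels new_rels.dat msieve.dat.gz
# gzip -dc msieve.dat.gz | wc -l    # counts lines across all members
```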

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.