mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)
-   -   On relations file format (https://www.mersenneforum.org/showthread.php?t=17082)

jrk 2012-08-19 00:05

[QUOTE=Batalov;308455]A radical version of the same idea: keep only a and b values in the file.[/QUOTE]
Even more radical: Keep the lattice basis vectors for each special-q root and you can write each relation in lattice coordinates to save even more space.
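
To make the lattice-coordinate idea concrete, here is a minimal Python sketch. It assumes the special-q lattice is the set of (a, b) with a ≡ r·b (mod q), reduces the basis (q, 0), (r, 1) with plain Lagrange-Gauss reduction, and rewrites (a, b) as integer coordinates (i, j) in that basis. The function names, the sample q and r, and the unweighted reduction are illustrative assumptions, not anything taken from an actual siever; a real siever may reduce with a skewness weighting, and a decoder would have to reproduce exactly the same basis to recover (a, b).

```python
def norm2(w):
    """Squared Euclidean length of a 2-vector."""
    return w[0] * w[0] + w[1] * w[1]


def reduce_basis(q, r):
    """Lagrange-Gauss reduction of the lattice {(a, b) : a = r*b (mod q)}.

    Starts from the obvious basis (q, 0), (r, 1) and returns two short
    basis vectors (shorter one first). Unweighted: no skewness is applied.
    """
    u, v = (q, 0), (r, 1)
    if norm2(u) < norm2(v):
        u, v = v, u
    while True:
        dot = u[0] * v[0] + u[1] * v[1]
        nv = norm2(v)
        m = (2 * dot + nv) // (2 * nv)  # nearest integer to dot / nv
        u = (u[0] - m * v[0], u[1] - m * v[1])
        if norm2(u) >= norm2(v):
            return v, u
        u, v = v, u


def to_lattice(a, b, e0, e1):
    """Solve (a, b) = i*e0 + j*e1 for integers (i, j) by Cramer's rule."""
    det = e0[0] * e1[1] - e1[0] * e0[1]
    i_num = a * e1[1] - b * e1[0]
    j_num = e0[0] * b - e0[1] * a
    if i_num % det or j_num % det:
        raise ValueError("(a, b) is not in this special-q lattice")
    return i_num // det, j_num // det


def from_lattice(i, j, e0, e1):
    """Inverse of to_lattice: rebuild (a, b) from lattice coordinates."""
    return i * e0[0] + j * e1[0], i * e0[1] + j * e1[1]


# Hypothetical special-q and root, chosen only for the demonstration.
q, r = 1000003, 123456
e0, e1 = reduce_basis(q, r)
a, b = from_lattice(-37, 1201, e0, e1)
print((a, b), "->", to_lattice(a, b, e0, e1))
```

The saving comes from (i, j) being bounded by the sieve region rather than by the size of a and b, at the cost of storing the basis (or q and r) once per special-q.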

Dubslow 2012-08-19 01:02

Here's a somewhat facetious reason to use base 36: Now we can go hunting for funny words spelled out by the factors. :smile:

Dubslow 2012-08-19 07:19

Hmm, less promising than I thought.
[code]bill@Gravemind:~/yafu/rsals∰∂ wc -l nfs.dat
93193867 nfs.dat
bill@Gravemind:~/yafu/rsals∰∂ wc -l rels36
91974259 rels36
bill@Gravemind:~/yafu/rsals∰∂ wc -c nfs.dat
11683340634 nfs.dat
bill@Gravemind:~/yafu/rsals∰∂ wc -c rels36
10045856817 rels36
bill@Gravemind:~/yafu/rsals∰∂ head -n 2 nfs.dat
353566,832085:5200a0f,af8df,20bc63,2912c0b,3114f59,1087,751:34dd2953,18cf85c1,1aa0f1,1159,3,709,1360f1f
1337012,1463419:199c69b1,1ce65701,5bc9,96fd,3ab829,d,29,17f,5ab,13:a47a68b,2ee792d,dea7,3d444f,3,5,11b,11,11,13,1360f1f
bill@Gravemind:~/yafu/rsals∰∂ head -n 2 rels36
353566,832085:1F6Z2N,FEU7,19ZDV,PN3T7,UN3H5,39J,1G1:EO1JKJ,6VTR5T,11EK1,3FD,3,1E1,C3J1B
1337012,1463419:73TK1D,80O6IP,I4P,TTP,2AHBD,D,15,AN,14B,J:2UOL1N,TA5V1,17ZB,2E24V,3,5,7V,H,H,J,C3J1B
bill@Gravemind:~/yafu/rsals∰∂ remdups4 nfs.dat
Counting rels in nfs.dat. This might take a few minutes for large rel files.
Found 93193103 unique, 5 duplicate (0.0% of total), and 107 bad relations.
Largest dimension used: 362 of 930
Average dimension used: 284.4 of 930
bill@Gravemind:~/yafu/rsals∰∂ bzip2 nfs.dat
bill@Gravemind:~/yafu/rsals∰∂ wc -c nfs.dat.bz2
5130787706 nfs.dat.bz2
bill@Gravemind:~/yafu/rsals∰∂ bzip2 rels36
bill@Gravemind:~/yafu/rsals∰∂ wc -c rels36.bz2
5035796294 rels36.bz2[/code]
(Don't be fooled by the transcript: the conversion took ~2 hrs, and each bzip2 run took around an hour.)
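
For reference, the hex-to-base-36 rewrite can be sketched in a few lines of Python. This is not the converter used above (that one is not shown in the thread); it is a minimal version that assumes each line has the shape `a,b:rational factors:algebraic factors`, leaves a and b in decimal, and re-encodes every factor with digits 0-9 then A-Z. The function names are made up for the example, and malformed lines are not handled.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n):
    """Encode a non-negative integer in base 36 (digits 0-9, then A-Z)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, d = divmod(n, 36)
        out.append(DIGITS[d])
    return "".join(reversed(out))


def convert_side(side):
    """Re-encode a comma-separated list of hex factors; empty fields stay empty."""
    return ",".join(to_base36(int(f, 16)) if f else "" for f in side.split(","))


def convert_relation(line):
    """Convert one 'a,b:hex,...:hex,...' relation line to base-36 factors."""
    ab, rational, algebraic = line.strip().split(":")
    return f"{ab}:{convert_side(rational)}:{convert_side(algebraic)}"


# First relation from the `head -n 2 nfs.dat` output above.
hex_line = ("353566,832085:5200a0f,af8df,20bc63,2912c0b,3114f59,1087,751:"
            "34dd2953,18cf85c1,1aa0f1,1159,3,709,1360f1f")
print(convert_relation(hex_line))
```

Run on the first line of nfs.dat, this reproduces the first line of rels36 shown in the transcript.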

Uncompressed 36/hex ratio: 85.98%
Compressed 36/hex ratio: 98.15%
36 compressed/uncompressed ratio: 50.13%
hex compressed/uncompressed ratio: 43.92%
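
The four percentages follow directly from the `wc -c` byte counts in the transcript. A short Python check, using only those numbers (the variable names are just for this sketch):

```python
# Byte counts copied from the wc -c output above.
hex_raw = 11683340634   # nfs.dat
b36_raw = 10045856817   # rels36
hex_bz2 = 5130787706    # nfs.dat.bz2
b36_bz2 = 5035796294    # rels36.bz2


def pct(num, den):
    """Return num/den as a percentage."""
    return 100.0 * num / den


ratios = {
    "uncompressed 36/hex": pct(b36_raw, hex_raw),
    "compressed 36/hex": pct(b36_bz2, hex_bz2),
    "36 compressed/uncompressed": pct(b36_bz2, b36_raw),
    "hex compressed/uncompressed": pct(hex_bz2, hex_raw),
}
for name, value in ratios.items():
    print(f"{name}: {value:.2f}%")
```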

So it seems I was right about one thing: adding the extra characters made the compression less effective.

Batalov 2012-08-19 07:31

Sounds about right. This is because if gzip does its job right, then the gzipped size is the "raw information content" size.
See, the hex file has more air and the base-36 file has less (in both cases, there are unused bits as well as abundant commas), but gzip squeezes the air out and the bit stream remains.

Now, bzip2 and 7zip will compress better still (at the expense of much longer time) and the resulting compressed sizes may be closer still. What you did is a small pre-compression. Information (or essentially random data) is incompressible. The only way to make compressed files really smaller is to give up some information and pay back later (pay back with time) to restore the sacrificed information.
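
The "squeezing air out" point can be tried on a small synthetic example. The Python sketch below writes the same random integers once in hex and once in base 36, then compresses both with zlib (the algorithm behind gzip). The data is an assumption: uniformly random 28-bit values, not real relations, which contain many small primes and will behave somewhat differently. The names are made up for the illustration.

```python
import random
import zlib

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n, base):
    """Encode a non-negative integer in the given base (2..36)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(DIGITS[d])
    return "".join(reversed(out))


def sizes(values, base):
    """Return (raw bytes, zlib-compressed bytes) for a comma-separated encoding."""
    text = ",".join(to_base(v, base) for v in values).encode("ascii")
    return len(text), len(zlib.compress(text, 9))


rng = random.Random(0)  # fixed seed so the run is repeatable
values = [rng.getrandbits(28) for _ in range(50_000)]

raw16, z16 = sizes(values, 16)
raw36, z36 = sizes(values, 36)
raw_ratio = raw36 / raw16  # how much smaller base 36 is before compression
z_ratio = z36 / z16        # the same comparison after compression

print(f"raw 36/hex:        {raw_ratio:.3f}")
print(f"compressed 36/hex: {z_ratio:.3f}")
```

Both encodings carry the same numbers, so after compression the sizes should be much closer to each other than before, which is the same pattern as the 85.98% versus 98.15% figures above.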

fivemack 2012-08-20 10:05

[QUOTE=Batalov;308443]Tom, have you thought about using 16e siever and a much lesser sieving area? At q~268M, the 16e siever is only 10% slower and produces [STRIKE]3x[/STRIKE]* >2x more relations (so the sieving area would be way shorter). It does take 1.5G of memory compared to 1.1G for the 15e. [/QUOTE]

Yes, I ran 0.1% of the job with both 15e and 16e; in my tests 16e was significantly slower (10% on this project is more than a CPU-year!) to get to 400 million relations, and since it also uses significantly more memory than 15e I thought 15e would be a better choice.


All times are UTC.