[QUOTE=Batalov;308455]A radical version of the same idea: keep only a and b values in the file.[/QUOTE]
Even more radical: keep the lattice basis vectors for each special-q root, and you can write each relation in lattice coordinates to save even more space.
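For anyone following along, the (a,b) → (i,j) mapping is just a 2×2 change of basis, so a relation can be stored as the (usually much smaller) lattice coordinates and reconstructed from the per-special-q basis. A minimal sketch of the idea — the function names and the sample basis are mine, not from any actual siever:

```python
# Sketch: storing lattice-sieve relations as (i, j) lattice coordinates
# instead of (a, b). The basis (a0, b0), (a1, b1) would be the reduced
# lattice basis for a given special-q root; values here are made up.

def ij_to_ab(i, j, a0, b0, a1, b1):
    """(a, b) = i*(a0, b0) + j*(a1, b1)."""
    return i * a0 + j * a1, i * b0 + j * b1

def ab_to_ij(a, b, a0, b0, a1, b1):
    """Invert the 2x2 linear map; (a, b) must lie on the lattice."""
    det = a0 * b1 - a1 * b0
    i, ri = divmod(a * b1 - b * a1, det)
    j, rj = divmod(b * a0 - a * b0, det)
    assert ri == 0 and rj == 0, "(a, b) not on this lattice"
    return i, j

# Round-trip with an illustrative basis:
basis = (1009, 3, -17, 995)
a, b = ij_to_ab(42, -7, *basis)
print((a, b))                       # -> (42497, -6839)
print(ab_to_ij(a, b, *basis))       # -> (42, -7)
```

The space saving comes from (i, j) fitting in the sieve-region range rather than the full (a, b) range, at the cost of storing one basis per special-q and doing the multiply on read-back.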
Here's a somewhat facetious reason to use base 36: Now we can go hunting for funny words spelled out by the factors. :smile:
Hmm, less promising than I thought.
[code]bill@Gravemind:~/yafu/rsals$ wc -l nfs.dat
93193867 nfs.dat
bill@Gravemind:~/yafu/rsals$ wc -l rels36
91974259 rels36
bill@Gravemind:~/yafu/rsals$ wc -c nfs.dat
11683340634 nfs.dat
bill@Gravemind:~/yafu/rsals$ wc -c rels36
10045856817 rels36
bill@Gravemind:~/yafu/rsals$ head -n 2 nfs.dat
353566,832085:5200a0f,af8df,20bc63,2912c0b,3114f59,1087,751:34dd2953,18cf85c1,1aa0f1,1159,3,709,1360f1f
1337012,1463419:199c69b1,1ce65701,5bc9,96fd,3ab829,d,29,17f,5ab,13:a47a68b,2ee792d,dea7,3d444f,3,5,11b,11,11,13,1360f1f
bill@Gravemind:~/yafu/rsals$ head -n 2 rels36
353566,832085:1F6Z2N,FEU7,19ZDV,PN3T7,UN3H5,39J,1G1:EO1JKJ,6VTR5T,11EK1,3FD,3,1E1,C3J1B
1337012,1463419:73TK1D,80O6IP,I4P,TTP,2AHBD,D,15,AN,14B,J:2UOL1N,TA5V1,17ZB,2E24V,3,5,7V,H,H,J,C3J1B
bill@Gravemind:~/yafu/rsals$ remdups4 nfs.dat
Counting rels in nfs.dat. This might take a few minutes for large rel files.
Found 93193103 unique, 5 duplicate (0.0% of total), and 107 bad relations.
Largest dimension used: 362 of 930
Average dimension used: 284.4 of 930
bill@Gravemind:~/yafu/rsals$ bzip2 nfs.dat
bill@Gravemind:~/yafu/rsals$ wc -c nfs.dat.bz2
5130787706 nfs.dat.bz2
bill@Gravemind:~/yafu/rsals$ bzip2 rels36
bill@Gravemind:~/yafu/rsals$ wc -c rels36.bz2
5035796294 rels36.bz2[/code]
(Don't be fooled: the conversion took ~2 hrs, and each bzip2 run took around an hour.)

Uncompressed 36/hex ratio: 85.98%
Compressed 36/hex ratio: 98.15%
Base-36 compressed/uncompressed ratio: 50.13%
Hex compressed/uncompressed ratio: 43.92%

So it seems I was right about one thing: adding the extra characters made the compression less effective.
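The re-encoding itself is straightforward: the a,b pair stays decimal and each hex factor field is re-encoded in base 36. A sketch (not the script actually used) that reproduces the sample lines above:

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # 0-9, A-Z

def to_base36(n):
    """Encode a non-negative integer in base 36, uppercase as in rels36."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def convert_line(line):
    """Re-encode one nfs.dat relation: a,b stay decimal, factors hex -> base 36."""
    ab, rat, alg = line.strip().split(":")
    enc = lambda fields: ",".join(to_base36(int(f, 16)) for f in fields.split(","))
    return ":".join([ab, enc(rat), enc(alg)])

# First factor of the first sample relation:
print(to_base36(0x5200A0F))  # -> 1F6Z2N
```

The gain per field is modest — log(16)/log(36) ≈ 0.774 digits per hex digit — which lines up with the observed 85.98% uncompressed ratio once the unchanged commas, colons, and decimal a,b are accounted for.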
Sounds about right. If gzip does its job right, then the gzipped size approximates the "raw information content" size.
See, the hex file has more air and the base-36 file has less (in both cases there are unused bits as well as abundant commas), but gzip squeezes the air out and the bit stream remains. bzip2 and 7zip will compress better still (at the expense of much longer run times), and the resulting compressed sizes may be closer still. What you did is a small pre-compression. Information (i.e., essentially random data) is incompressible. The only way to make the compressed files really smaller is to give up some information and pay back later (pay with time) to restore the sacrificed information.
[QUOTE=Batalov;308443]Tom, have you thought about using 16e siever and a much lesser sieving area? At q~268M, the 16e siever is only 10% slower and produces [STRIKE]3x[/STRIKE]* >2x more relations (so the sieving area would be way shorter). It does take 1.5G of memory compared to 1.1G for the 15e. [/QUOTE]
Yes, I ran 0.1% of the job with both 15e and 16e; in my tests 16e was significantly slower (10% on this project is more than a CPU-year!) at getting to 400 million relations, and since it also uses significantly more memory than 15e, I thought 15e would be the better choice.