10^263-1 sieving
Here is a nice hefty project to keep your CPUs bushy-tailed and glossy-coated into the Spring. The polynomial file is
[code]
n: 276397778616959975917054244686348356245407334346331193552844312837949697033718836316510648552112872462175754206758265936948213056626234970255319654022011994534451487805497984097946614373924022283714690178871836462083
c6: 1
c0: -10
Y1: -1
Y0: 100000000000000000000000000000000000000000000
type: snfs
skew: 1.47
rlambda: 2.6
alambda: 2.6
lpbr: 31
lpba: 31
mfbr: 62
mfba: 62
alim: 125000000
rlim: 100000000
[/code]
with parameters selected to minimise sieving time by sampling 10kQ ranges at Q=75, 100, 125 and 150M and curve-fitting. Start sieving at Q=25M on both sides (using gnfs-lasieve4I15e), and we'll run until we get a matrix of reasonable size, by which point I should have a computer of more-than-reasonable size on which to run the matrix. I guess we'll have to get to 125M or so, but the drop-off in speed and yield with increasing Q is quite perceptible:

[code]
150M .. 150M+1000
A total yield: 987, q=150001039 (0.80463 sec/rel)
R total yield: 1187, q=150001039 (0.68066 sec/rel)

25M .. 25M+1000
A total yield: 2580, q=25001077 (0.43460 sec/rel)
R total yield: 1799, q=25001029 (0.44244 sec/rel)
[/code]
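As a rough sanity check on those figures, one can average the rational-side per-relation cost at the two sampled endpoints and scale it to the relation count the job ultimately needed. This is back-of-envelope arithmetic only, not the actual curve fit used for parameter selection; the ~285M raw relation figure is taken from the incremental analysis below.

```shell
#!/bin/sh
# Back-of-envelope only: average the rational-side sec/rel at the two
# sampled endpoints (0.44244 at Q=25M, 0.68066 at Q=150M) and scale to
# ~285M raw relations.  Purely illustrative; the real parameter choice
# used a proper curve fit over Q=75-150M.
est=$(awk 'BEGIN {
    avg  = (0.44244 + 0.68066) / 2   # crude linear average, sec/rel
    rels = 285e6                     # raw relations, per the analysis
    printf "%.1f", avg * rels / (86400 * 365)
}')
echo "$est CPU-years"
```

A handful of CPU-years, which is consistent with the work being spread across many contributors' cores below.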
[B]Reservations[/B]
Xyzzy 15M-20M (algebraic side) (done 04Feb)
JF 15M-20M (rational side) (done 30Jan)
fivemack 20M-25M (both sides) (done 06Feb)
Xyzzy 25M-30M (both sides) (done 27Jan)
batalov 30M-35M (both sides) (done by Xyzzy and batalov ~20 Jan)
JF 35M-40M (both sides) (done 03Jan)
andi47 40M-42M (algebraic side only) (done 04Feb)
bsquared 40M-42M (rational side only) (done 30Jan)
JF 42M-50M (both sides) (done 05Jan)
fivemack 50M-60M (both sides) (done 27Jan)
Syd 60M-65M (both sides) (done 30Jan)
batalov 65M-70M (both sides) (done 06Feb)
JF 70M-75M (both sides) (done 14Jan)
JF 75M-85M (both sides) (done 19Jan)
fivemack 85M-90M (both sides) (done 03Feb)
JF 90M-100M (both sides) (done 22Jan)
fivemack 100M-102M (both sides) (done 16Feb)
Xyzzy 102M-106M (both sides) (done 22Feb)
fivemack 106M-110M (both sides) (done 23Feb)
smh 110M-111M (both sides)
bsquared 111M-120M (both sides) (done 20Feb)
fivemack 120M-12xM (just killing time ...)

[B]Relation stock[/B]
A 15-121.3
R 15-121.3

[B]Incremental analysis[/B]
2015 14/01/2009 (50MQ): 63712027 relations, 54883524 unique
0030 23/01/2009 (102MQ): 136304978 relations, 104394776 unique
2230 06/02/2009 (170MQ): 233613937 relations, 143698189 unique
2040 27/02/2009 (206.6MQ): 284849410 relations, 171805386 unique

Sat Feb 28 08:20:51 2009  matrix is 15785588 x 15785836 (4555.7 MB) with weight 1111879653 (70.44/col)
Sat Feb 28 08:20:51 2009  sparse part has weight 1036401865 (65.65/col) |
Looks good, but maybe add [B]skew: 1.47[/B] (= 10^(1/6) ≈ 1.468)
With and without this skew, I got --
[CODE]
-a 15e
total yield: 2456, q=25001029 (0.41596 sec/rel) (skew=1.47)
total yield: 2387, q=25001029 (0.42721 sec/rel) (skew=1)
-r 15e
total yield: 1837, q=25001029 (0.41020 sec/rel) (skew=1.47)
total yield: 1768, q=25001029 (0.45177 sec/rel) (skew=1)
-a 14e (my favorite blooper! but I ran it, so will show these as well)
total yield: 1129, q=25001029 (0.36877 sec/rel) (skew=1.47)
total yield: 1118, q=25001029 (0.37191 sec/rel) (skew=1)
-r 14e
total yield: 888, q=25001029 (0.35605 sec/rel) (skew=1.47)
total yield: 882, q=25001029 (0.37367 sec/rel) (skew=1)
[/CODE]
(This didn't match some of your numbers, though. Oh, I see what happened: your runs were not all stopped at q=25001029.) Anyway, the difference is small, but skew=1.47 seems a bit better than 1.00. Would you like to bench it too? Your benchmark is more comprehensive.

I'll take 30-35M, both sides. |
Good point about the skew, I'm happy to believe your benchmark without running it myself.
|
We were thinking today, which is usually a dangerous thing.
We wanted to run a range on a core and have it survive a system crash, process crash or a shutdown with minimal loss. We also didn't want to have to watch over it too much. So we came up with this very simple script:

[FONT=Courier New]for i in $(seq 30000000 10000 30990000); do ./gnfs-lasieve4I15e -r poly -f ${i} -c 10000 -o ${i}r && chmod 400 ${i}r; done[/FONT]

Basically, for each range of a million iterations, we create a folder. We reserved 30-35M so we used "30r", "31r", "32r", "33r" and "34r". We can run 4 ranges at a time on a quad core, so "34r" is going to have to wait for a while. We put a copy of gnfs-lasieve4I15e and a copy of the poly in each folder. We want all the work from 30,000,000 to 30,999,999 to stay in the "30" folder, and so on and so forth.

Instead of doing all 1,000,000 iterations in one go, we break it up into 10,000-iteration chunks. We suppose you could use different-sized chunks depending on your hardware and needs. There does seem to be a minute or so of lag when you start a range before you see any results, so if you made the chunks too small you might lose a lot of CPU time. After each chunk is finished, the results file is set to read-only. The "&&" only allows this file to be set read-only if the previous sieving command was successful.

In the event of a crash or whatever, you just look at your folder and find the highest file, which should be writable, and then restart the script at that level, which is a matter of editing only 2 characters in the script. (The command will be in your history buffer so this is trivial.) As a sanity check you can ensure all the lower results files are read-only. The script will, of course, overwrite the file that was not completed so there is no need to delete it. In the worst-case scenario, you lose 9,999 iterations, or something like that.

We run mprime in the background to catch stray cycles, and if a process fails mprime picks up the slack so there is no lost CPU time.
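The restart point described above can also be found automatically instead of by editing the script. A minimal sketch under the post's conventions (result files named <Q>r in the current folder, finished chunks set read-only by the && chmod 400; the 30000000/30990000 bounds are the "30" folder's values):

```shell
#!/bin/sh
# Sketch: resume the chunked sieving loop after a crash, using the
# read-only marking from the original script to find where to restart.
STEP=10000
resume_q() {
    # Print the Q value at which to restart sieving in this folder.
    last=$(ls 2>/dev/null | grep -E '^[0-9]+r$' | sort -n | tail -1)
    if [ -z "$last" ]; then
        echo 30000000                    # nothing done yet: range start
    elif [ -w "$last" ]; then
        echo "${last%r}"                 # still writable: chunk incomplete, redo it
    else
        echo $(( ${last%r} + STEP ))     # resume after the last finished chunk
    fi
}
# Re-launch the original loop from the computed point (a no-op here
# unless the siever binary is actually present in this folder):
if [ -x ./gnfs-lasieve4I15e ]; then
    for i in $(seq "$(resume_q)" $STEP 30990000); do
        ./gnfs-lasieve4I15e -r poly -f "$i" -c 10000 -o "${i}r" && chmod 400 "${i}r"
    done
fi
```

With this, the worst-case loss after a crash is still one chunk, but no hand-editing or eyeballing of file permissions is needed.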
We've played with "PauseWhenRunning=gnfs" in prime.txt, which works, but if one core goes down mprime will not kick in. If you run mprime without "PauseWhenRunning" and you do not "nice" your gnfs work, mprime will use ~5-6% out of 400% on a quad-core box. One possible solution might be to rename each gnfs-lasieve4I15e binary to something unique and somehow have "PauseWhenRunning" deal with partial failures, but right now it just gives us a headache thinking about it. We'll deal with the 5-6% hit.

We append "r" to the file name to indicate we are working the R side. We have 2 spare quad-core boxes, so we are running the R stuff on one box and the A stuff on the other. The A stuff has "a" appended to the file name. Our folders over there are "30a", "31a", "32a", "33a" and "34a". The changes in the script for the A side amount to changing 3 characters, all from "r" to "a". If the changes are not painfully obvious you probably should not be in the same room with sharp objects.

We use "screen" to keep track of all this. None of our Linux boxes have a GUI installed, and none have a monitor, keyboard or mouse. We use SSH to log in remotely.

We know there are probably much more elegant solutions out there, but this is a start. We suspect there are people out there lurking, like we do, who appreciate having simple stuff explained. (The fact that this took us all day is a confirmation of our third-grade education level.) If there is a more optimal way to script things or divide the work up amongst cores, please let us know. What would be really cool would be to have a master server issue out work units to the cores in your network and track them.

In the end, when a range is done, you just "cat" all the results files together and redirect them to one big file. When all the ranges are done you "cat" all the big files and redirect them to a final file, which can then have "bzip2 -9" applied to it.
One interesting benefit is that you can look at the file modification time, relative to that of the file that came before it, to get an idea of how long each chunk takes to compute. Another benefit is that you can tell which chunk you are working on by looking at "top", rather than looking in each window or folder.

In other news, we are getting excellent iteration times compared to the last work we did. We are now happily using Linux rather than Windows. The Linux binary is very fast and much easier to use, now that we have access to the tools we are used to using. :spot: |
That is a VERY good start. No kidding!
I use a bit less granularity, say "-c 50000" chunks. Don't forget that with the latest binary (this reminds me to build and repost it, the Opteron one) you will be able to finish failed chunks by adding [B]-R[/B] (you cannot use it on a gzipped file; gunzip it first).

P.S. I wonder how many people know that there's a [B]-z[/B] option in the siever?! That'll save you half the disk space. It was always there! Note: libz is not linked; the output stream is simply piped into a [I]gzip[/I] child process, so you need a [I]gzip[/I] binary in the [I]$path[/I]. The chunks may then be concatenated into 1M chunks to be ftp'd (gzipped files allow that), but for even better compression you can also gunzip them, cat them together and bzip2 -9 the result. Surely, you can do that on the fly using pipes, too. |
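The concatenate-and-recompress pipeline described above can be sketched as follows; the two chunk files here are demo stand-ins for real [B]-z[/B] siever output, and all filenames are illustrative:

```shell
#!/bin/sh
cd "$(mktemp -d)"
# Demo stand-ins for two gzipped chunk files (real ones would come
# from the siever's -z option):
printf 'relation-1\n' | gzip > 30000000r.gz
printf 'relation-2\n' | gzip > 30010000r.gz

# Concatenated gzip members form a single valid gzip stream, so chunk
# files can be merged without recompressing:
cat 30000000r.gz 30010000r.gz > 30M.gz

# For a better final ratio, decompress on the fly and recompress with
# bzip2 -9, all through pipes (no intermediate uncompressed file):
gzip -dc 30M.gz | bzip2 -9 > 30M.bz2
```

The `cat` step works because the gzip format explicitly allows multiple members per file; decompressors emit the concatenation of all members.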
[quote]That'll save you half the disk space.[/quote]We're not going to sweat disk space.
[code]
$ df -h /dev/sda1
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             463G  2.8G  437G   1% /
[/code]
We remember what it was like using a cassette tape for mass storage. We'd actually use around 700MiB for the system, but we accidentally installed the basic system and the desktop. This will be addressed soon. (We don't bother with a swap partition or swap file either.)

Interesting note: When you have this much space to spare you can format EXT3 with "largefile4". It really doesn't make much of a difference, except you probably will never have to worry about fragmentation, since each file is allocated, at the minimum, 4MiB. And you won't run out of inodes, either.

PS: Here is what "top" looks like. Note the ability to track the siever progress.
[code]
top - 01:08:35 up 4 days,  7:22,  7 users,  load average: 4.00, 4.00, 4.00
Tasks: 118 total,   6 running, 112 sleeping,   0 stopped,   0 zombie
Cpu0  : 99.0%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   5987960k total,  1647444k used,  4340516k free,    83656k buffers
Swap:        0k total,        0k used,        0k free,   274212k cached

  PID USER  PR NI VIRT  RES SHR S %CPU %MEM   TIME+  P COMMAND
17911 m     20  0 637m 280m 472 R  100  4.8 29:35.51 3 ./gnfs-lasieve4I15e -r poly -f 32020000 -c 10000 -o 32020000r
17933 m     20  0 637m 280m 472 R  100  4.8 27:57.22 1 ./gnfs-lasieve4I15e -r poly -f 33020000 -c 10000 -o 33020000r
18029 m     20  0 637m 280m 472 R  100  4.8 20:00.72 0 ./gnfs-lasieve4I15e -r poly -f 31020000 -c 10000 -o 31020000r
18057 m     20  0 637m 280m 472 R  100  4.8 17:48.94 2 ./gnfs-lasieve4I15e -r poly -f 30020000 -c 10000 -o 30020000r
[/code] |
[QUOTE=Xyzzy;156310]Interesting note: When you have this much space to spare you can format EXT3 with "largefile4". It really doesn't make much of a difference, except you probably will never have to worry about fragmentation since each file is allocated, at the minimum, 4MiB. And you won't run out of inodes, either.[/QUOTE]Thanks for posting this tip. I've learned something new!
Paul |
I'll take 50-55M, both sides.
Being new and all, is there anything I should know about setting rlim, alim and such? |
JF: would you mind taking 35-40 instead? The yield seems distinctly better for lower Q, and I'd prefer (perhaps only for aesthetic reasons) a contiguous range of sieving rather than lots of gaps to fill at the end.
rlim and alim: if you're sieving -r, make a copy of the .poly file and set rlim to the smallest Q that that run is using, if you're sieving -a do the same with alim. |
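The per-range poly edit described above is easy to script. A minimal sketch, in which the two-line poly file is a stand-in for the real one from the first post and the 50M value matches the 50-55M reservation:

```shell
#!/bin/sh
cd "$(mktemp -d)"
# Stand-in for the project .poly file (only the lines that matter here):
printf 'rlim: 100000000\nalim: 125000000\n' > poly

# Sieving -r from Q=50M: make a copy of the poly with rlim dropped to
# the smallest Q the run will use, leaving alim untouched.
Q0=50000000
sed "s/^rlim:.*/rlim: $Q0/" poly > poly.50M
grep '^rlim:' poly.50M
```

For an -a range you would edit alim the same way and point the siever at the copied file, keeping the original poly pristine.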
I will take 40-42 (A side)
I will start sieving on Jan 7th or 8th, but I want to grab one of the low ranges, which (presumably?) take less memory due to the lower alim, as long as these ranges are available. (Please correct me if I'm wrong about memory usage.) (These huge GNFS jobs are close to the highest memory usage at which the siever can run (almost) invisibly on my office box, i.e. without slowing down other programs too much.) |
[quote=fivemack;156351]JF: would you mind taking 35-40 instead? The yield seems distinctly better for lower Q, and I'd prefer (perhaps only for aesthetic reasons) a contiguous range of sieving rather than lots of gaps to fill at the end.
rlim and alim: if you're sieving -r, make a copy of the .poly file and set rlim to the smallest Q that that run is using, if you're sieving -a do the same with alim.[/quote] Ok, 35-40 it is. |