Seeing people solve problems that were unimaginably huge to me even a little while ago has been completely toxic to my humility.
I finally got a chance to adapt most of the assembly code and ran a medium-size test. Solving a 212k matrix takes 11.25 minutes with 128-bit vectors and 7.9 minutes with 64-bit vectors. I guess it still needs work.
[QUOTE=bdodson;131687]Speaking of which, I'm past the halfway point on sieving for 10m257, ...
Suppose ... we'll run out of disk soon. Guess two msieves, one on each number would get me up to 8 threads, if the disk space holds out...-Bruce[/QUOTE]

I managed to whine over a sufficiently long period that our IT people gave in and let me have 60Gb on /home. That, together with some useful over-sieving, brought the filter time for 10,257- down to 10 hours (from 44 hours, with very slow disk writes/reads, for 3,536+). This is a harder number (difficulty 257), but sieving was over 160M (40M-200M) rather than 130M (40M-170M). Also, Greg's .poly dropped the factorbases from 80M to 70M, so I ended up with more unique relns on a smaller factorbase (didn't get a timing comparison on the sieving; lots of other-user effects, condor "evictions"). The new matrix ends up at

[code]
saving the first 48 matrix rows for later
matrix is 13938310 x 13938558 (3804.8 MB) with weight 937778319 (67.28/col)
sparse part has weight 858020543 (61.56/col)
[/code]

so 290K fewer rows/cols, and at 61.56 -vs- 61.45 on the sparse part, just a bit of extra density. As I was observing in the post before the Wiedemann interjection, this gets me very close to eight threads, as in

[code]
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
15341 bad0      25   0 4664m 4.5g  484 R  400  4.1  269:18.79 msieve
 7310 bad0      25   0 4766m 4.6g  476 R  373  4.2 13468:40   msieve
[/code]

or

[code]
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
15341 bad0      25   0 4664m 4.5g  484 R  357  4.1  255:36.68 msieve
 7310 bad0      25   0 4766m 4.6g  476 S  354  4.2 13456:43   msieve
16246 bad0      25   0  494m 209m  608 R  100  0.2  109:59.31 gnfs-lasieve4I1
16248 bad0      25   0  497m 210m  556 R  100  0.2  109:50.76 gnfs-lasieve4I1
[/code]

-Bruce (10,257+ c208 is going rather quickly, especially since I took Richard's suggestion and cut the ranges down to 10K (from 20K, from 50K ...): just over 3 hrs per quadcore core, just under on the Opterons --- fewer condor evictions (or maybe fewer other users, hard to keep track). Maybe I'll hit 12 threads for a bit, if the current 10p257 rate holds.
After 30 console minutes I'm seeing 249 dim/min -vs- 354 for the larger matrix. I'll try an update after the weekend (when I'm next at my office console) if the rate picks up (initial timings were also slower on the c252).
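For anyone wanting to double-check msieve's per-column figures, they are just total weight divided by the number of columns; a quick check of the numbers in the matrix report above, in plain Python:

```python
# Recompute the average column weights reported for the 10,257- matrix:
# "weight 937778319 (67.28/col)" and "sparse ... 858020543 (61.56/col)".
cols = 13938558
total_weight = 937778319
sparse_weight = 858020543

print(round(total_weight / cols, 2))   # 67.28
print(round(sparse_weight / cols, 2))  # 61.56
```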
[QUOTE=bdodson;132176]
-Bruce (10,257+ c208 is going rather quickly; ... Maybe I'll hit 12 threads for a bit, if the current 10p257 rate holds. After 30 console minutes I'm seeing 249 dim/min -vs- 354 for the larger matrix, I'll try an update after the weekend (when I'm next at my office console) if the rate picks up ...[/QUOTE]

Weekend consoles show +8.8% (to 40.8%) on 3,536+ and +10.2% (to 11.2%) on 10,257-, so the matrix for the harder number is quicker --- over-sieving wins again. On 10,257+ I collected the special-q's from a range of width 133M (just 27M left to go) over six days, sustained on 300+ cpus. Looks like I'll have to wait a couple of days for the last 27M to restart (my jobs get lowest priority here; users with fewer cpu-hours get their jobs scheduled first; can hardly complain that I'm not getting enough!).

-Bruce
[QUOTE=bdodson;132176] As I was observing on the post before the
Wiedemann interjection, this gets me very close to eight threads, as in ... [/QUOTE]

Ah, whilst digging around for some sort of "which core is used" info, I found "top -H" to show all threads, and got:

[code]
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 7314 bad0      25   0 4766m 4.6g  476 R  100  4.2  6081:16 msieve
23721 bad0      25   0 4661m 4.5g  476 R  100  4.1  3221:59 msieve
23726 bad0      25   0 4661m 4.5g  476 R  100  4.1  3231:59 msieve
 7310 bad0      25   0 4766m 4.6g  476 R  100  4.2  8890:38 msieve
 7313 bad0      25   0 4766m 4.6g  476 R  100  4.2  5867:24 msieve
 7315 bad0      25   0 4766m 4.6g  476 R  100  4.2  5965:16 msieve
23691 bad0      25   0 4661m 4.5g  476 R  100  4.1  4027:32 msieve
23717 bad0      25   0 4661m 4.5g  476 R  100  4.1  3119:01 msieve
[/code]

Jason's not kidding when he mentions long-term threads. Also, I was muttering (elsewhere) about maybe the threads getting in each other's way (as in "why not all 4 threads for both matrices on two cores of a single quad?"). But that was a mis-reading of the checkpoint info --- both matrices are getting checkpoints every four hours, so I was only seeing every 2nd or 3rd. This is with no sieving, but another user on five cores.

[QUOTE]I'll hit 12 threads for a bit, if the current 10p257 rate holds.[/QUOTE]

I'm just waiting for that last range of length 27M, which was taking under 24 hours on 300+ cpus. Both matrices will be running for another two weeks, so a 3rd 4-thread msieve is almost certain, not too far off.

-Bruce
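For readers who want the same per-thread view non-interactively: "top -H" is interactive, but on Linux, ps can produce a scriptable snapshot (a generic sketch, not msieve-specific; it lists whatever threads are running):

```shell
# List threads (LWPs) sorted by per-thread CPU use; each msieve worker
# thread would show up as its own row, much like "top -H".
ps -eLo pid,tid,pcpu,comm --sort=-pcpu | head -n 10
```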
hello,
I'd like to report a strange behaviour of the polynomial selection: with the 1.34 Windows binaries, a test with a c97 ran smoothly, but with a c126 the polynomial selection ran for 4 days (!!) instead of 8 hours, and a Ctrl-C didn't stop the search even after 4 hours! The problem is that nothing is written to disk during the search, so all the work was lost. Another 24h try gave the same problem. The partial-space-search option doesn't work and starts from scratch, with a message. All the large coefficients seem to be truncated, ending in "0000000". It was just a test, but I'd like the best polynomials to be saved to a file, like the .cand.all in the GGNFS distribution, at least to survive an abrupt interruption and to allow comparisons with GGNFS. Thanks Jason. Regards. Philippe
Msieve 1.35
1 Attachment(s)
Jason,
Sorry for not writing for so long; I spent two months in the mountains. It was great!!! And now to the main thing: I have started testing version 1.35. Interestingly, I tried the RSA100 example, and the first thing I found is that it does not work. So as not to be unsubstantiated, the files are attached.
Philippe: none of that functionality (limiting the search, interrupting early, etc.) has been implemented. The next version will include a lot of changes and improvements to the polynomial selection, though you shouldn't expect big differences from the pol5 code.

Sergey: there are too many relations for the filtering to handle; it should work if you delete maybe 1 million relations.
1 Attachment(s)
[quote=jasonp;132348]Philippe: none of that functionality (limiting the search, interrupting early, etc.) has been implemented. The next version will include a lot of changes and improvements to the polynomial selection, though you shouldn't expect big differences from the pol5 code.

Sergey: there are too many relations for the filtering to handle; it should work if you delete maybe 1 million relations.[/quote]

Jason, here is one more example of msieve 1.35 at work, on RSA154, in the file ggnfs_rsa154.txt. And here is what version 1.32 produced (with identical input data):

[code]
Sun Apr 27 05:11:35 2008 commencing linear algebra
Sun Apr 27 05:11:36 2008 read 4027634 cycles
Sun Apr 27 05:11:58 2008 cycles contain 9816816 unique relations
Sun Apr 27 05:15:10 2008 read 9816816 relations
Sun Apr 27 05:15:27 2008 using 32 quadratic characters above 536870658
Sun Apr 27 05:21:41 2008 read 4027634 cycles
Sun Apr 27 05:21:44 2008 matrix is 4024942 x 4027634 with weight 376537877 (avg 93.49/col)
Sun Apr 27 05:23:08 2008 filtering completed in 3 passes
Sun Apr 27 05:23:10 2008 matrix is 3991550 x 3991748 with weight 374270753 (avg 93.76/col)
Sun Apr 27 05:24:20 2008 read 3991748 cycles
Sun Apr 27 05:24:24 2008 matrix is 3991550 x 3991748 with weight 374270753 (avg 93.76/col)
Sun Apr 27 05:24:24 2008 saving the first 48 matrix rows for later
Sun Apr 27 05:24:26 2008 matrix is 3991502 x 3991748 with weight 288791612 (avg 72.35/col)
Sun Apr 27 05:24:26 2008 matrix includes 64 packed rows
Sun Apr 27 05:24:26 2008 using block size 65536 for processor cache size 4096 kB
Sun Apr 27 05:25:37 2008 commencing Lanczos iteration (4 threads)
Mon Apr 28 12:18:17 2008 lanczos halted after 63129 iterations (dim = 3991496)
Mon Apr 28 12:18:27 2008 recovered 39 nontrivial dependencies
Mon Apr 28 12:18:27 2008 elapsed time 31:06:55
Mon Apr 28 12:18:27 2008
Mon Apr 28 12:18:27 2008
Mon Apr 28 12:18:27 2008 Msieve v. 1.32
Mon Apr 28 12:18:27 2008 random seeds: c74788e7 cfb8cdd3
Mon Apr 28 12:18:27 2008 factoring 8715559889770423424310203883632167048898008133505110868539099203591147550598330383715037180377356985067654557901317213606892162668124627648279771812893257 (154 digits)
Mon Apr 28 12:18:29 2008 searching for 15-digit factors
Mon Apr 28 12:18:30 2008 commencing number field sieve (154-digit input)
Mon Apr 28 12:18:30 2008 R0: -272943704267283933914006624529
Mon Apr 28 12:18:30 2008 R1: 289440566749750559
Mon Apr 28 12:18:30 2008 A0: -1479120383524660035081694220384272
Mon Apr 28 12:18:30 2008 A1: -79121815855669782650267611608
Mon Apr 28 12:18:30 2008 A2: 4929773863590481595325727
Mon Apr 28 12:18:30 2008 A3: -1300115266100187588
Mon Apr 28 12:18:30 2008 A4: -46749049796676
Mon Apr 28 12:18:30 2008 A5: 5753520
Mon Apr 28 12:18:30 2008 size score = 7.262495e-16, Murphy alpha = -5.718998, combined = 4.886474e-15
Mon Apr 28 12:18:30 2008
Mon Apr 28 12:18:30 2008 commencing square root phase
Mon Apr 28 12:18:30 2008 reading relations for dependency 1
Mon Apr 28 12:18:31 2008 read 1995313 cycles
Mon Apr 28 12:18:41 2008 cycles contain 6061614 unique relations
Mon Apr 28 12:21:34 2008 read 6061614 relations
Mon Apr 28 12:22:25 2008 multiplying 9138262 relations
Mon Apr 28 13:37:26 2008 multiply complete, coefficients have about 544.94 million bits
Mon Apr 28 13:37:54 2008 initial square root is modulo 77479
Mon Apr 28 15:17:01 2008 reading relations for dependency 2
Mon Apr 28 15:17:02 2008 read 1995838 cycles
Mon Apr 28 15:17:12 2008 cycles contain 6063340 unique relations
Mon Apr 28 15:20:06 2008 read 6063340 relations
Mon Apr 28 15:20:56 2008 multiplying 9140822 relations
Mon Apr 28 16:36:09 2008 multiply complete, coefficients have about 545.11 million bits
Mon Apr 28 16:36:34 2008 initial square root is modulo 77731
Mon Apr 28 18:15:42 2008 prp77 factor: 76648678625485733611695846561820047569238811477621174717293370669816009242343
Mon Apr 28 18:15:42 2008 prp78 factor: 113707894853030046972514394561611812429142647991789076119120797742828164069199
Mon Apr 28 18:15:42 2008 elapsed time 05:57:15
[/code]
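As an aside on the v1.32 log above: block Lanczos with a blocking factor of 64 is expected to need roughly n/(64 - 0.76) iterations for an n-dimensional matrix (the 0.76 being the average rank deficiency per block in Montgomery's analysis), which lines up with the reported iteration count:

```python
# Expected block Lanczos iteration count for the 3991496-dimensional
# matrix with blocking factor 64; the log above reports 63129 iterations.
n = 3991496
expected = n / (64 - 0.76)
print(round(expected))  # within a few dozen of the reported 63129
```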
[QUOTE=jasonp;132348]Philippe: none of that functionality (limiting the search, interrupting early, etc.) has been implemented. The next version will include a lot of changes and improvements to the polynomial selection, though you shouldn't expect big differences from the pol5 code.

Sergey: there are too many relations for the filtering to handle; it should work if you delete maybe 1 million relations.[/QUOTE]

Thanks Jason! My goal was to compare the efficiency of your implementation (v1.35) with GGNFS, and there is no problem since it was just for testing purposes. Thanks for this work on the polynomial-selection part. But is it normal that for a c97 the planned poly-search time was respected, while for a c126 the search seems to run forever (>96h) instead of the planned 8 hours? I think this is a bug, and that is the reason for my report. Regards, Philippe.

PS: the number is 120589172157011794277137847608483215552179876319127843442720258721624648233382859532388590243405874352824255989606819236818821
[QUOTE=Phil MjX;132369]
But is it normal that for a c97 the planned poly-search time was respected, while for a c126 the search seems to run forever (>96h) instead of the planned 8 hours? [/QUOTE]

Right now the code sets the limits on the leading polynomial coefficient and searches the entire range, regardless of how long that takes. Polynomial selection happens in two stages, and most of the runtime is packed into the first stage, which I have not optimized at all (in fact I've removed the assembly code that the GGNFS version uses, so it's between 2.5x and 4x slower than the GGNFS version). I'm planning the next release in a week or two, and some of the features you ask about will be implemented by then.

Sergey: I will think about how your RSA100 example could be made to work. Also, the RSA154 job has a huge amount of oversieving; you can probably succeed with 20-25% fewer relations. The filtering in v1.35 is much more effective than in v1.32 when there are many excess relations, and the matrix weight is much lower than v1.32 produced. In fact, the matrix stopped only 80% of the way to completion, so it was probably not dense enough to solve correctly. Try this: in gnfs/filter/merge.c, change TARGET_DENSITY to 75.0 and rerun the filtering.
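The change Jason suggests is a one-line source edit before rebuilding msieve. A sketch of the edit, demonstrated here on a mock copy of the file (the 70.0 starting value is a placeholder, not necessarily msieve's actual default):

```shell
# Mimic the suggested edit on a mock gnfs/filter/merge.c, then verify it.
mkdir -p demo/gnfs/filter
printf '#define TARGET_DENSITY 70.0\n' > demo/gnfs/filter/merge.c
sed -i.bak 's/^#define TARGET_DENSITY.*/#define TARGET_DENSITY 75.0/' \
    demo/gnfs/filter/merge.c
grep TARGET_DENSITY demo/gnfs/filter/merge.c
```

After the real edit, rebuild msieve and rerun the filtering step.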
[QUOTE=bdodson;132176] ... this gets me very close to eight threads, as in
[code]
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
15341 bad0      25   0 4664m 4.5g  484 R  400  4.1  269:18.79 msieve
 7310 bad0      25   0 4766m 4.6g  476 R  373  4.2 13468:40   msieve
[/code] [/QUOTE]

Neither -t 12 nor -t 8 ran well; most of the extra threads were showing "status=Sleeping" rather than Running. But -t 6 does seem to offer an improvement over -t 4, which looks like

[code]
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
10129 bad0      25   0 5326m 5.2g  476 S  591  4.7 65:45.73 msieve
10123 bad0      25   0 4661m 4.5g  476 S  398  4.1 65:02.85 msieve
10105 bad0      18   0 4766m 4.6g  476 S  310  4.2 57:28.75 msieve
[/code]

Uhm, that's 14 threads: four each on the first two matrices, six on the new one, 10,257+. A monster, by the way, 15.479M^2, despite having sieved the same region. Switching from x^6-10 to x^6+10 seems to have been a radical alteration in the roots; even the ratings of the polynomials look different:

size score = 1.839597e-12, Murphy alpha = 1.635261, combined = 1.152956e-12 for 257+

and

size score = 2.036089e-12, Murphy alpha = 1.321650, combined = 1.395729e-12 for 257-

despite the same "difficulty". The previous matrix was 13.938M^2, so the same region seems to have somewhat over-sieved 257-, but not so much for 257+. Lots fewer relns; fewer unique relns.

-Bruce (starting sieving on 3,547-, supposed to be the most "difficult" Childers/Dodson number so far, and a larger sieve region --- we're not taking chances, and this time oversieving on purpose ...)