#12

Tribal Bullet
Oct 2004
3,541 Posts
'Stage 1' means finding some combination of constant and linear terms in the rational polynomial that minimizes the size of the third-largest algebraic coefficient, given a fixed value of the leading algebraic coefficient.

'Stage 2' means taking the resulting polynomial, attempting to minimize the average size of sieve values via multivariable optimization, then running a sieve to optimize the number of algebraic roots modulo the first few small primes, and finally attempting to fix up the now-worse size of the resulting polynomial so that the Murphy score is maximized. In practice the size optimization is fast, and the root sieve is what takes forever. For degree-6 polynomials I've written much faster code that tries to home in on the best candidates as soon as possible, without the exhaustive search that takes many hours even for degree 5. Eventually I'll modify the degree-5 root sieve to use the same code, so that stage 2 will only take a few minutes.
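For readers new to NFS polynomial selection, the role of a fixed leading coefficient can be illustrated with the classical base-m construction. This is only a toy sketch in Python (msieve's actual stage 1 searches (a5, p, m) triplets far more cleverly; the function name and numbers here are illustrative, not from msieve):

```python
# Toy base-m polynomial construction (illustrative only; NOT msieve's
# stage-1 algorithm). Given N and a chosen leading coefficient c5, pick
# m near (N/c5)^(1/5) and greedily expand N in base m.
def base_m_poly(N, c5, degree=5):
    # choose m so that c5 * m^degree is close to N
    m = round((N / c5) ** (1.0 / degree))
    coeffs = []
    r = N
    for i in range(degree, -1, -1):
        c = r // m**i          # greedy digit for this power of m
        coeffs.append(c)
        r -= c * m**i
    return m, coeffs[::-1]     # coeffs[i] multiplies m^i

N = 10**30 + 7
m, coeffs = base_m_poly(N, c5=240)
# the expansion reconstructs N exactly, and the leading
# coefficient comes out at (or one below) the requested c5
assert sum(c * m**i for i, c in enumerate(coeffs)) == N
```

Real stage 1 quality is about how small the high-order coefficients can simultaneously be made, which is what the (a5, p, m) search buys over this naive expansion.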
#13

Jun 2005
lehigh.edu
2^10 Posts
Quote:
Excellent, thanks. The stage 2 computation for the stage 1 run with c5=240 just passed 24:00 hrs of cputime in 26:15 hrs walltime. With the two boards running (well, one running; the other waiting, I guess), I have Code:
grep c5 msieve.dat.p | uniq -c | tail -20
1 c5: 134640
1 c5: 240
2 c5: 134640
5 c5: 240
2 c5: 134640
68 c5: 240
1 c5: 137520
3 c5: 240
1 c5: 138780
17 c5: 240
3 c5: 138420
39 c5: 240
15 c5: 136980
65 c5: 240
6 c5: 139500
44 c5: 240
4 c5: 139500
1 c5: 240
1 c5: 139500
23 c5: 240
all with the same fixed c5? On the side issue, 2L2370 is sieving (via Serge's polyn) with 3LP. That's ggnfs, of course. -Bruce
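Since the pipeline uses uniq -c without a sort, it counts runs of consecutive identical lines, which is exactly why the counts alternate between the two c5 values as the two jobs interleave their writes to the same file. A Python equivalent of that run-counting (the input data here is made up, mirroring a few of the counts above):

```python
from itertools import groupby

# uniq -c with no preceding sort counts *runs* of equal consecutive
# lines, not totals; interleaved writers therefore alternate in the output
lines = ["c5: 240"] * 5 + ["c5: 134640"] * 2 + ["c5: 240"] * 68
runs = [(len(list(group)), key) for key, group in groupby(lines)]
print(runs)  # [(5, 'c5: 240'), (2, 'c5: 134640'), (68, 'c5: 240')]
```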
#14

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3^6×13 Posts
You mean ggnfs standard (albeit 64-bit asm opt.) sievers.
But snfs! An octic, this one is. -Yoda.
#15

May 2008
3·5·73 Posts
Thanks Bruce
Quote:
Things are further worsened when you allow stage 2 to run, since that will stall the GPU as well. Msieve has options for running either stage 1 (-np1) or stage 2 (-np2) or both (-np). In case you have only limited access to the cards, you can try running only stage 1 first, and then collect those results to run stage 2 on another machine. If the CPU code in stage 1 is limiting the card's throughput too much, there are a couple of things to try, but I wanted to see how it was working first. In any case it seems like you're getting some decent polys. Quote:
Some work is done on the CPU to prepare pieces of the things to test, and the rest of stage 1 takes place on the GPU. The GPUs are effective because there are potentially millions of millions of such things to test (for huge problems), and GPUs can easily burn through them in massively parallel fashion. Quote:
|
|||
|
|
|
|
|
#16 | ||||
|
Jun 2005
lehigh.edu
102410 Posts |
Quote:
Quote:
Code:
(./msieve -g 1 -l msieveg1c.log -v -np 1,60000 ) > seg1c.err &
some of that output above, as in Quote:
Then with msieve.dat.p having the stage2 output? In any case, -np1 would put stage1 results somewhere, and -np2 knows (or can be instructed) where to look, and gives output like what -np puts in msieve.dat.p. Seems like, at least for small c5's like 240 with _lots_ of poly/saves, I'd clearly be better off using -np1 to keep the two cards as busy as possible, unless -np2 is too labor intensive. Incidentally, c5=240 finished at last (and -g1 has gone on to c5 = 3480), somewhere short of 34 hrs. Hmm, the counts are Code:
44 c5: 126480
47 c5: 192780
63 c5: 120480
64 c5: 130680
69 c5: 3480 [running -g1]
93 c5: 193800
139 c5: 123660
151 c5: 142560 [running -g0]
340 c5: 100020
354 c5: 240
Since I have six cores, I could leave two for stage1 with -np1 and run stage2's on three other cores, with one idle. I could recruit a few i7 cores if/when the stage2's start to get behind. The partial timing on c5 = 240 above was just 2:15 hrs out of 26:15 hrs that wasn't spent on the CPU. Anyone have thoughts on running -np1/-np2's, and why I oughtn't to be splitting the stage2's off of the cards? Uhm. It's the massively parallel stage1 searching that's key, if I'm reading correctly? -Bruce
PS --- Ah, I have one of these Code:
integrator failed 9.915485e-23 1.421272e-24
error: size score computation failed
Quote:
#17

Tribal Bullet
Oct 2004
3,541 Posts
Running with -np1 produces <datfile_name>.m which contains (a5,p,m) triplets, one per line. Running with -np2 reads the triplets from the .m file and performs stage 2 directly.
Using -np1 instead of -np should be good for getting full use out of the Tesla system. Regarding the integration error, that sometimes occurs in stage 2 despite my best efforts to squash it. The current polynomial is aborted but nothing else will go wrong. It really happens much less often than it used to :) Last fiddled with by jasonp on 2010-09-09 at 04:32
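Based only on the format described here and the sample .m lines Bruce posts later in the thread (three whitespace-separated decimal integers per line), a minimal hedged reader for the (a5, p, m) triplets might look like:

```python
# Minimal parser for msieve's -np1 output file (<datfile>.m): one
# (a5, p, m) triplet per line as whitespace-separated decimal integers.
# A sketch inferred from the sample lines in this thread, not msieve code.
def read_m_file(lines):
    triplets = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        a5, p, m = (int(f) for f in fields)
        triplets.append((a5, p, m))
    return triplets

sample = ["3960 4974563463948961229 20551368431556631921447664108221817"]
print(read_m_file(sample))
```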
#18

Jun 2005
lehigh.edu
2^10 Posts
Quote:
from 3481 (to get c5 = 4200). Guess I'll see whether there's too much extra overhead; how busy the stage2 cores get. On the effect of the size of c5, the above count of polyn with c5 = 100020 may indicate that most of 1-to-200000 (at least) is subject to having the cpu dominate access to the gpu, but it's way-far from uniformly an issue (or the effects are far from uniform). Here's the current ps on the two jobs Code:
83 Sep07 1-12:40:13 ./msieve -g 1 -l msieveg1c.log -v -np 1,60000
49 Sep05 1-20:59:09 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
--- with start times
722 Sep 5 12:09 msieveg0b.log
716 Sep 7 11:46 msieveg1c.log
the range above 122400 with 45 cpuhrs running 48 hrs walltime longer. Meanwhile, the new 9th place polyn looks like Code:
# norm 2.409745e-17 alpha -8.294924 e 1.150e-13 rroots 5
skew: 240344763.51
c0: -33857744015657762955279336799101810160171516095
c1: 568095717364628257207324473587558254827
c2: 5370986853453609897446590284537
c3: -31520799754011630854563
c4: -111275857954258
c5: 142560 <--
Y0: -10036454263612531871181703557730172
Y1: 9612481887476029967 <--
---
save 1.922823e-17 -7.6157 236407529.45 9.925516e-14 rroots 5
save 1.913994e-17 -7.6051 239084241.94 9.892358e-14 rroots 5
save 2.409745e-17 -8.2949 240344763.51 1.149655e-13 rroots 5 <--
---------
poly 20 p 3007669913 q 3194744269 coeff 9608736217600478597
poly 35 p 3007999279 q 3194416139 coeff 9608801442937963781
poly 15 p 3007850933 q 3195797299 coeff 9612481887476029967 <--
save 2.011369e-17 -7.1812 111032758.69 9.918854e-14 rroots 3
save 1.993533e-17 -7.5877 174155104.86 1.005867e-13 rroots 3
That was quick. After four minutes Code:
[bad0@blaze1 BDCD]$ more msieve.dat.m
3960 4974563463948961229 20551368431556631921447664108221817
4200 5050414601974904801 20310935367557069124730545641782930
4020 5010349792144868483 20489651495598794872124773875195322
msieve.dat.p) is that I'm going to want to split off the first few hours to run -np2 on one of the other cores, but -g1 is still going to be writing to msieve.dat.m. Does that mean picking off head -10000 to another directory and copying msieve_gpu and its *ptx's; then a different directory yet for (head -20000 ...) | tail -10000? Guess that will work ... ----- And well. Here's the stderr version Code:
searching leading coefficients from 3481 to 60000
using GPU 1 (Tesla C2050)
deadline: 800 seconds per coefficient
coeff 3540-5880
2052640438 2257904481
2257904482 2483694930
-------
2052640438-2093693246 2393378749-2438536838
poly 7 p 2072497891 q 2400274319 coeff 4974563463948961229
poly 11 p 2087170079 q 2419742719 coeff 5050414601974904801
poly 8 p 2087960177 q 2399638579 coeff 5010349792144868483
...
Code:
wc -l msieve.dat.m = 83
and already a whopping
00:03:20 ./msieve_gpu -g 1 -l msieveg1d.log -v -np1 3481,60000
Last fiddled with by bdodson on 2010-09-09 at 12:33 Reason: early report on -np1; fiddling
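The head/tail bookkeeping described above can be sketched in Python; the filenames, chunk sizes, and chunkN.m naming here are hypothetical, and a real setup would still need the msieve binary and its *.ptx files copied into each working directory, as noted:

```python
# Sketch: carve the first n_chunks * chunk_size lines of a (possibly
# still-growing) msieve.dat.m into fixed-size chunk files, so that each
# chunk can feed a separate -np2 run in its own directory.
# File names and sizes are illustrative, not msieve conventions.
from itertools import islice

def split_chunks(src_path, chunk_size=10000, n_chunks=2):
    with open(src_path) as src:
        for i in range(n_chunks):
            chunk = list(islice(src, chunk_size))  # next chunk_size lines
            if not chunk:
                break  # file shorter than requested; stop cleanly
            with open(f"chunk{i + 1}.m", "w") as out:
                out.writelines(chunk)

# usage (hypothetical): split_chunks("msieve.dat.m", 10000, 2)
```

Because the file is read sequentially, lines already handed to a chunk are never re-read, which is the same guarantee the head/tail pipelines give.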
#19

Jun 2005
lehigh.edu
2^10 Posts
Quote:
Code:
08:07 00:03:37 ./msieve_gpu -g 1 -l msieveg1d.log -v -np1 3481,60000
Sep05 1-22:00:44 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
and then
08:07 00:15:03 ./msieve_gpu -g 1 -l msieveg1d.log -v -np1 3481,60000
Sep05 1-23:11:46 ./msieve -g 0 -l msieveg0b.log -v -np 122401,180000
spent 11min35sec [and the previous post ought to have reported 3min of 20min; sigh]. There's already a line-count of Code:
452 msieve.dat.m
through -np2 [in a different directory, as above] before thinking about restarting g0. If I'm reading correctly, starting from 3481 [rather than closer to the c5 = 4200 of the previous run] ought to have re-searched some of the same range; but I'm not seeing duplicate m's ... maybe I ought to wait for -np2. The stdout of the previous search 1-4200 reported eight integration errors; but I'm hoping my union rep won't count ignoring these against my record as a calculus instructor (such as it is). To complete the -np and -np1 comparison, g0 isn't waiting in vain; in fact there's a new largest flair Code:
70 c5: 3480
93 c5: 193800
139 c5: 123660
332 c5: 142560
340 c5: 100020
354 c5: 240
778 c5: 145080
new reports from c5=145080 ... no, make that _five_ Code:
save 2.409745e-17 -8.2949 240344763.51 1.149655e-13 rroots 5
save 2.555559e-17 -8.4629 182438594.86 1.180259e-13 rroots 3
save 2.528513e-17 -8.2019 144727708.13 1.162457e-13 rroots 5
save 2.485098e-17 -8.3967 189368595.86 1.160380e-13 rroots 3
save 2.549955e-17 -8.3652 172479067.87 1.177023e-13 rroots 5
save 2.543863e-17 -8.3850 177755667.18 1.176994e-13 rroots 5
the top 3, so far ... 834-and-counting. -Bruce
PS --- So the Nvidia GeForce on my home pc from hp (via circuit city, r.i.p.) is busy refreshing my 1024-x-768 pixels, and that's like 1024*768-worth of massively parallel, gpu-wise? Uhm, with an SE2-or-so added in?
Last fiddled with by bdodson on 2010-09-09 at 14:42 Reason: just muttering
#20

Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2^3·3·5·7^2 Posts
@jasonp From memory you have said previously that multiple runs of poly selection don't have that much crossover. Would this be useful now, as the yield in stage 1 reduces when searching higher?
We had plenty of room to search back then because GPUs weren't so good, but it looks to me that Bruce is suffering a big reduction in the number of good new polynomials found since the first low run.
#21

Tribal Bullet
Oct 2004
3,541 Posts
For large enough inputs, the stage 1 search space is split into two halves, and each is divided into several tens of pieces. A single run will pick one piece from each half, then when the search on that combo is finished the code will move to the next batch of algebraic leading coefficients. The upshot is that for the largest problems there are ~2500 possible pairs to choose from, only one of which is searched.
I would only be concerned if work was being duplicated. The hit rate is expected to be smaller once leading algebraic coefficients become 'too large', but you can always search the same ranges again if you want to go fishing for new hits in old neighborhoods. |
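A toy illustration of the combinatorics described above, assuming 50 pieces per half (the actual piece counts in msieve may differ; the ~2500 figure comes from the post itself):

```python
from itertools import product

# two halves of the stage-1 search space, each cut into roughly 50
# pieces; a single run picks one piece from each half at a time
pieces_per_half = 50
pairs = list(product(range(pieces_per_half), repeat=2))
print(len(pairs))  # 2500 candidate pairs; one run searches only one pair
```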
#22

Jun 2005
lehigh.edu
2^10 Posts
Quote:
from a low run: Code:
coeff 120060-122400
2838617022 3122478724
3122478725 3434726597
-------
2895389362-2952161702 3372277021-3434726595
...
R0: -10379984941552728453972444081504096
R1: 9877593410632453061
...
A5: 120480
skew 76104896.21, size 2.678e-17, alpha -6.876, combined = 1.233e-13 rroots = 5
on the lowest range Code:
searching leading coefficients from 3481 to 60000
using GPU 1 (Tesla C2050)
deadline: 800 seconds per coefficient
coeff 3540-5880
2052640438 2257904481
2257904482 2483694930
-------
2052640438-2093693246 2393378749-2438536838
for stage2. On the quality of the more recent best polyn, the flair I reported this morning is still running Code:
63 c5: 120480 <--- best polyn, 9th largest number of c5's
...
139 c5: 123660
332 c5: 142560
340 c5: 100020
354 c5: 240
1495 c5: 145080
Code:
save 2.543863e-17 -8.3850 177755667.18 1.176994e-13 rroots 5
---
save 2.504032e-17 -8.2935 177422337.42 1.157586e-13 rroots 5
save 2.626962e-17 -8.5424 193002660.75 1.204407e-13 rroots 5
save 2.503218e-17 -8.3819 191450596.57 1.165153e-13 rroots 5
save 2.645394e-17 -8.5587 195833030.04 1.207571e-13 rroots 5
save 2.680206e-17 -8.5782 199960907.66 1.212357e-13 rroots 5
Code:
# norm 2.490282e-17 alpha -7.085114 e 1.182e-13 rroots 3
# norm 2.626962e-17 alpha -8.542450 e 1.204e-13 rroots 5
# norm 2.645394e-17 alpha -8.558684 e 1.208e-13 rroots 5 (from 1.20757)
# norm 2.680206e-17 alpha -8.578218 e 1.212e-13 rroots 5
# norm 2.750309e-17 alpha -8.147588 e 1.217e-13 rroots 5
# norm 2.677511e-17 alpha -6.875759 e 1.233e-13 rroots 5
On searching all of the small to medium-small ranges, I haven't yet been able to touch most of 1-to-120000. The longest range that I've covered continuously is 120001-145080; with 145081-180000 untouched and 180001-200000 done. @Jason: are these out in the "large enough" range in which the splitting into two halves and picking a single pair has already started? -Bruce
Last fiddled with by jasonp on 2010-09-09 at 21:52 Reason: Yes, definitely
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Tweaking polynomial search for C197 | fivemack | Msieve | 38 | 2011-07-08 08:12 |
| 109!+1 polynomial search | fivemack | Factoring | 122 | 2009-02-24 07:03 |
| 5^421-1 polynomial search | fivemack | Factoring | 61 | 2008-07-21 11:16 |
| 6^383+1 by GNFS (polynomial search; now complete) | fivemack | Factoring | 20 | 2007-12-26 10:36 |
| GNFS polynomial search tools | JHansen | Factoring | 0 | 2004-11-07 12:15 |