#34
Tribal Bullet
Oct 2004
3,541 Posts
Quote:
R_1 + j*P_1^2 = R_2 + k*P_2^2 and both j and k are smaller than a fairly stringent bound. When such a pair is found then P_1 * P_2 generates the leading rational coefficient Y1 of a GNFS polynomial, and the above equality generates a 'small' correction to the 'optimal' value of Y0, such that Y1 and Y0 satisfy a modular condition that is required for GNFS polynomials.
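The collision condition above is easy to demonstrate numerically. The following is an editorial sketch, not msieve code; the primes, residues, and bound are made-up toy values:

```python
# Toy demonstration of the stage-1 collision condition (editorial sketch,
# not msieve code).  We look for small j, k with
#     R1 + j*P1**2 == R2 + k*P2**2
# When such a pair is found, Y1 = P1*P2, and the common progression value
# is the small correction applied to the optimal Y0.

def find_collision(P1, R1, P2, R2, bound):
    """Brute force: for each j < bound, test whether the matching k is a
    nonnegative integer below the bound."""
    for j in range(bound):
        value = R1 + j * P1 * P1
        k, rem = divmod(value - R2, P2 * P2)
        if rem == 0 and 0 <= k < bound:
            return j, k, value
    return None

# Made-up toy primes and residues, constructed so that (j, k) = (7, 3) works:
P1, P2 = 101, 103
R1 = 5
R2 = R1 + 7 * P1 * P1 - 3 * P2 * P2
print(find_collision(P1, R1, P2, R2, 10))   # (7, 3, 71412)
print(P1 * P2)                              # Y1 = 10403
```

In the real search the primes are on the order of 1e5, so only a tiny fraction of (P1, P2) pairs produce a j, k below the bound.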
#35
May 2008
3·5·73 Posts
Quote:
#36
Jun 2005
lehigh.edu
2¹⁰ Posts
Quote:
reports (Y1 or the p, q with p*q = Y1, or ...), or does the "generating" of Y1 leave the initial match behind?

@jrk: "hundreds" of combinations in parallel? That doesn't _sound_ "massively parallel" (someone else's quotes).

On the "early abort" of the root sieve, a reboot has me with two -np2's running at once, with
Code:
    841 stage2b/msieve.dat.p:c5: 186960
   1434 stage2c/msieve.dat.p:c5: 199080
   7315 stage2b/msieve.dat.p:c5: 182040
  77391 stage2b/msieve.dat.p:c5: 186732
 127743 stage2b/msieve.dat.p:c5: 183600
 893383 stage2b/msieve.dat.p:c5: 189924
1265386 stage2c/msieve.dat.p:c5: 202020
has the best of
Code:
# norm 1.542442e-18 alpha -8.336848 e 2.092e-14 rroots 3
# norm 1.545278e-18 alpha -8.254718 e 2.124e-14 rroots 3
# norm 1.557593e-18 alpha -8.052209 e 2.125e-14 rroots 3
# norm 1.726763e-18 alpha -7.413974 e 2.192e-14 rroots 5
# norm 1.665776e-18 alpha -8.499358 e 2.237e-14 rroots 3
Code:
# norm 1.440151e-18 alpha -8.162720 e 1.995e-14 rroots 1
# norm 1.447408e-18 alpha -8.420328 e 2.023e-14 rroots 3
# norm 1.449678e-18 alpha -8.415766 e 2.026e-14 rroots 3
# norm 1.528283e-18 alpha -8.316211 e 2.071e-14 rroots 3
# norm 1.542442e-18 alpha -8.336848 e 2.092e-14 rroots 3
two 2.3's in the first five hours, and another two in the next five hours; early promise before the new 2.4. Before the reboot hit, this flare with c5 = 182280 filled in 9 of the cumulative top10, along with the one earlier 2.4 mentioned elsewhere.

So, on a side issue, I suppose 37 hours on that flare is most likely representative of what to expect from another 2 days or so; but is it possible to restart this root sieve without redoing the first 37 hrs? Not worth that; it seems unlikely to produce that 2.6-2.7 that's supposed to be lurking out there. By contrast, the root sieve for c5 = 202020 has every prospect of running another 2-3 days without ever hitting 2.2. I'm inclined to start a 3rd -np2 from the stage1 hit just after 202020, and kill 202020 if/when it continues running without prospect. At least, that's consistent with what I was seeing with Arjen's version of the initial Montgomery-Murphy code, c. rsa140/155. Maybe c5 = 189924 might hit a productive range somewhat further out, but a few std deviations up from where 202020 is won't help ...

-Bruce

---

Hmm, that's 10 stage1 hits with c5=202020, distinct Y1's; these are all from the first one ...

Last fiddled with by bdodson on 2010-09-25 at 12:05
#37
Jun 2005
lehigh.edu
1024₁₀ Posts
Quote:
Code:
pair of (R_1, j, P_1) and (R_2, k, P_2) triples such that
R_1 + j*P_1^2 = R_2 + k*P_2^2
remarkable factorization of the earlier post
Quote:
(anyone?), P_1 and P_2 seem quite likely to be the factors in "poly 12 p = ...", namely P_1 = 106019 and P_2 = 108947. To remove any doubt: this rather non-random-looking factorization isn't a coincidence; here are five more from the tail of this morning's output
Code:
> ifactor(12013826899) = (198127) (60637)
> ifactor(12066899171) = (59699) (202129)
> ifactor(12064312489) = (228677) (52757)
> ifactor(12064312489) = (228677) (52757)
> ifactor(11988147403) = (264949) (45247)
and the "q" is manufactured after the match in the arithmetic progressions mod P_1^2 and P_2^2, shifted by R_1 and R_2.

Uhm. So we're looking at a collection of arithmetic progression values mod p^2, with p a prime between 5.2e4 and 2.7e5, so p^2 in 2.5e9-to-7.3e11 ... with each modulus p^2 having shifts R_1, R_2, and going out some length in the progressions. Now that's a massive search space, plenty of room for whatever stringent conditions are useful.

Sorry jrk; but the distribution of factorization types among random numbers of the size of Y1 isn't where the hits on matches among those progression values occur, as Jason describes. We already have "p" = P_1*P_2 from the match, and then search and/or manufacture the cofactor "q". (And remark that the rational poly (Y1)x+Y0, also written as px+m in the triple (c5,p,m), is a different use of the symbol "p".) So is this a simultaneous search, for both a progression value match and a cofactor q; or is it sequential, with each P_1*P_2 having a q-search? And what part of the search is uniquely suited to our GPU?

I was considerably less successful in my "early abort" proposal, with the best hit from the less likely (by my hypothesis ...) flare. Straight-up, both flares are still running (in that root sieve part of stage2)
Code:
grep c5 stage2b/msieve.dat.p | uniq -c | tail -4
192 c5: 60492
432 c5: 187404
274 c5: 61488
3158265 c5: 189924
---
grep c5 stage2c/msieve.dat.p | uniq -c | tail -4
200 c5: 67260
116 c5: 199080
455 c5: 202800
3414673 c5: 202020
Code:
grep norm stage2b/msieve.dat.p | sort -gk7 | tail
...
# norm 1.545278e-18 alpha -8.254718 e 2.124e-14 rroots 3
# norm 1.557593e-18 alpha -8.052209 e 2.125e-14 rroots 3
# norm 1.726763e-18 alpha -7.413974 e 2.192e-14 rroots 5
# norm 1.665776e-18 alpha -8.499358 e 2.237e-14 rroots 3
and today's
grep norm stage2c/msieve.dat.p | sort -gk7 | tail
...
# norm 1.528283e-18 alpha -8.316211 e 2.071e-14 rroots 3
# norm 1.498165e-18 alpha -8.655833 e 2.078e-14 rroots 3
# norm 1.503053e-18 alpha -8.665323 e 2.082e-14 rroots 3
# norm 1.542442e-18 alpha -8.336848 e 2.092e-14 rroots 3
# norm 1.760565e-18 alpha -9.180202 e 2.310e-14 rroots 3
out of nowhere, combined with the 1.7 norm (the other norms from that top10 are below 1.5). I did split off the rest of the two msieve.dat.m files (each with 2 days of reports from the two fermi cards), and found no other flares; so clearly there wasn't any payoff from cutting off that c5 = 202020 with Y1 = 160701245257519162799 (the only Y1 of the 10 with this c5 that has a top10 flare).

Uhm, that's < 200000 -vs- over 1M from yesterday's
Code:
    841 stage2b/msieve.dat.p:c5: 186960
   1434 stage2c/msieve.dat.p:c5: 199080
   7315 stage2b/msieve.dat.p:c5: 182040
  77391 stage2b/msieve.dat.p:c5: 186732
 127743 stage2b/msieve.dat.p:c5: 183600
 893383 stage2b/msieve.dat.p:c5: 189924
1265386 stage2c/msieve.dat.p:c5: 202020
Quote:
top10 for c5 = 182280 all above 2.372, three of which are above 2.4, up to 2.492 to go with the prior best 2.469. Ah, well. That's the relevant std deviation (as it turns out ...). These were the first two -np1's for the c187, with 2745 stage1 hits, two substantial flares, but neither of them was useful Code:
  17845 c5: 26220
  23588 c5: 145800  <--- prior best
  70307 c5: 149880
 661123 c5: 147420
1840174 c5: 35280
relevant, after all
Code:
# norm 1.595271e-18 alpha -8.876493 e 2.163e-14 rroots 3
# norm 1.715896e-18 alpha -7.689817 e 2.191e-14 rroots 5
# norm 1.634327e-18 alpha -7.040722 e 2.211e-14 rroots 3
# norm 1.996707e-18 alpha -8.511119 e 2.469e-14 rroots 3
stage2 reports with c5 = 145800) were relevant to the 2.469 and that was
Code:
grep norm stage2b/msieve-g01a.dat.p | head -60000 | sort -gk7 | tail
...
# norm 1.507029e-18 alpha -7.806310 e 2.062e-14 rroots 3
# norm 1.601214e-18 alpha -7.849728 e 2.144e-14 rroots 5
# norm 1.715896e-18 alpha -7.689817 e 2.191e-14 rroots 5
# norm 1.634327e-18 alpha -7.040722 e 2.211e-14 rroots 3
# norm 1.996707e-18 alpha -8.511119 e 2.469e-14 rroots 3
std deviation to matter; and "large flares", with no evidence yet that a large variation like the above could occur among millions of reports for that (c5, Y1) pair all well below the range of interest. Hmm, black swan range; they're sufficiently unlikely that they may occur wherever they want. Any chance that adjusting Serge's parameters for c185 and/or c190 (at least locally, for fermi's) would improve the rate of burning through these gigantic flares, without too much of a reduction in the chances of seeing one of these elusive black swans?

-Bruce
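As an aside, the two-prime pattern Bruce spots in those ifactor outputs is easy to check mechanically. This is an editorial, stdlib-only sketch (not part of the thread); trial division is plenty fast at Y1's size:

```python
# Verify one of the quoted factorizations: each Y1 splits into exactly
# two primes of comparable size (both near sqrt(Y1)), as expected if
# Y1 = P1*P2 by construction rather than by chance.

def smallest_factor_split(n):
    """Return (d, n // d) for the smallest odd divisor d > 1 of n, or
    None if n is prime.  Assumes n is odd (the quoted Y1 values all are)."""
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    return None

# One of the ifactor lines quoted in the post above:
print(smallest_factor_split(12013826899))   # (60637, 198127)
```

Since the smaller factor is itself prime, the first divisor found is the full split, matching Maple's `ifactor(12013826899) = (198127) (60637)`.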
#38
Tribal Bullet
Oct 2004
3541₁₀ Posts
Quote:
Quote:
The factorization of P1 and P2 is arbitrary, as long as P1 and P2 are coprime. You're correct that there are more combinations possible than can conveniently be searched, so in practice you need to settle on a subset of factorizations that make the all-against-all search more convenient. jrk has done a lot of work manufacturing P1 and P2 of the correct size in large batches.

We need m = M0 + i, and i = R1 + j*P1^2 = R2 + k*P2^2 for small j and k. There are two ways to find j and k so that the correction i is small enough. You can break up the admissible range of i into small blocks and use a hashtable within each block to find a P1 and P2 that collide; that's convenient because the inner loops are driven by memory access and pointer chasing, rather than modular arithmetic. The other way is to go through each (P1,P2) pair and find the first j in the P1 progression that lies on the P2 progression; the first such j is (R2-R1)*inv(P1^2) mod P2^2. If the j is small enough then you have a hit. There are P1*P2 such tests, but each involves only modular arithmetic and no memory, and the runtime is independent of the bound on j and k.

It is the second formulation that the GPU code uses; all those execution units on a GPU make the second formulation run faster than the first in practice.

Last fiddled with by jasonp on 2010-09-27 at 13:33
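To make the second formulation concrete, here is an editorial sketch of the inverse-based test for a single (P1, P2) pair. The numbers and the sign/modulus conventions are mine (chosen so the identity checks out); msieve's actual indexing may differ:

```python
# Editorial sketch (not msieve code): instead of hashing blocks of i,
# solve directly for the first progression index that collides, using a
# modular inverse.  One cheap arithmetic-only test per (P1, P2) pair.

def first_hit(P1, R1, P2, R2, bound):
    m1, m2 = P1 * P1, P2 * P2
    # smallest j >= 0 with R1 + j*m1 ≡ R2 (mod m2); pow(x, -1, m) needs
    # Python 3.8+ for the modular inverse
    j = (R2 - R1) * pow(m1, -1, m2) % m2
    if j >= bound:
        return None                      # no hit for this pair
    i = R1 + j * m1                      # the common progression value
    k, rem = divmod(i - R2, m2)
    if rem or not (0 <= k < bound):
        return None
    return j, k, i

# Toy values (same construction idea as the thread, much smaller numbers):
P1, P2, R1 = 101, 103, 5
R2 = R1 + 7 * P1 * P1 - 3 * P2 * P2
print(first_hit(P1, R1, P2, R2, 10))     # (7, 3, 71412)
```

The appeal for a GPU is visible even in the sketch: the test is branch-light, memory-free modular arithmetic, so thousands of pairs can be tested in lockstep.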
#39
Jun 2005
lehigh.edu
10000000000₂ Posts
Quote:
a description of the actual hardware Code:
The six core Intel Xeon (X5650 Westmere 2.66 GHz) CPU based server, with 24 GB of DDR3 memory, comes with 2 nVidia “Fermi” Tesla C2050 GPUs. Each one of those GPUs has 3 GB of GDDR5 memory and supports single and double precision floating point operations (with a Peak ...) across 448 cores. ...
#40
Tribal Bullet
Oct 2004
3,541 Posts
Yes; my medium-end GPU at home does an 8000 x 800 all-against-all search, across 40 A5 simultaneously, in a single call. The 8000*800 individual tests are handled 112 at a time because the card has 112 cores. Bigger cards handle bigger batches; I think a G200 series card searches blocks of size 40000 x 2000 in a single call.
By the way, if you're still doing any poly searches, can you try replacing the PTX files you have with the precomputed PTX files in the 'fermi' subdirectory of the CUDA download on sourceforge? No recompile is necessary, and you should hopefully get faster search times.
#41
May 2008
3·5·73 Posts
Quote:
Quote:
What the GPU search does is find combinations of one element from p and one element from q which satisfy the conditions that Jason mentioned. This is an all-against-all search which eventually tries all the possible combinations of p and q (or runs until time runs out).

What I was hinting at earlier is that the reason the GPUs are so useful is that it takes billions of tests (at numbers this size), on average, to find a combination satisfying the restriction on the size of the j,k terms Jason described. So you have to be able to burn up a lot of combinations in a short time, which we can do very effectively on a GPU.
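The "billions of tests" figure follows from a simple back-of-envelope estimate: for one (p, q) pair the solved-for progression index is essentially a uniform residue mod p^2, so the chance that it lands under a bound B is roughly B/p^2. The sizes below are illustrative guesses for this editorial sketch, not msieve's actual parameters:

```python
# Rough odds of a collision in the all-against-all search.
# Assumed, illustrative sizes: primes near 1e5 (moduli near 1e10) and a
# bound of about 10 on the progression index j.
p = 10**5          # size of one prime factor of Y1 (assumption)
bound = 10         # admissible bound on the index j (assumption)

hit_probability = bound / (p * p)
pairs_per_hit = 1 / hit_probability
print(f"~{pairs_per_hit:.0e} pairs tested per expected hit")
```

With these numbers that is on the order of a billion pairs per hit, which is why burning through huge batches on a GPU pays off.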
#42
Jun 2005
lehigh.edu
2¹⁰ Posts
Quote:
of 1.47 (which I can tell because the logs now report 1.48!), the PTX's were new/different from what I got the first time, with the tarfile from before the official release. I wasn't ever able to locate the "fermi" subdirectory (after some effort/searching files from the tarfile extract, &etc.), but they're written along with the binary. Unless there's been a more recent update of PTX's, since Sept 18th?

I'm very much still running the c187. The two flares I was grumbling about ran past 10M stage2 reports before I killed them; still with just one polyn above 2.3 (from the one I didn't expect) and the 2.2 (from the one that looked somewhat better, somewhat sooner) --- that was 10M+ stage2 root-sieve reports for each of the two (c5, m, p). I did go back and re-run the most productive flare (9 of the 10 best), which had gotten rebooted at 4M stage2 reports. It's now past 17M (counting the 4M from the first attempt, re-run), and still generating new top5 poly (all with a single c5 = 182280 and just one of several Y1's for that c5).

I've now run all of 1-to-348348 once, and re-ran the most productive range at/above 120001. I did find 26 duplicate (c5, m, p)'s, but haven't yet tracked down the C/D, E/F ranges. Not worth removing, really; hardly any extra time in -np2. Hard to tell what proportion 26-of-??? is, as the two runs stopped at different places, the stage1 reports in the msieve.dat.m's are mixed from -g0 and -g1, and the 2nd pass uses multiples of 12 rather than multiples of 60. Observing that, I'm re-running 1-to-20000 with the new binary, despite that being one of the least productive ranges. I do expect to spend some time above 1000001, but haven't gotten there yet. The new binary is very fast in 1-to-1000; and instead of an initial C/D and then a second, I have
Code:
searching leading coefficients from 1 to 20000
using GPU 0 (Tesla C2050)
deadline: 800 seconds per coefficient
coeff 12-552 5824496281 6406945909 6406945910 7047640501
------- 5940986205-5999231167 6727293205-6791362664
------- 5294996619-5347946585 7329546119-7400022523
------- 5150587614-5198723946 7907452644-7984976689
------- 4769872981-4813633283 9209856609-9295133059
------- 4097555608-4137337701 9943234082-10037038177
coeff 564-1128 6519456975 7171402672 7171402673 7888542940
------- 7041013527-7106208096 7171402673-7243116699
------- 6400921388-6460189178 8598511804-8677397233
------- 5872899251-5926779060 9371589014-9458362986
------- 5338999323-5387980968 9640588333-9736039702
coeff 1140-1872 6904347527 7594782279 7594782280 8354260508
------- 7042434477-7111477952 8050469212-8126417034
------- 6276679569-6339446364 8688430929-8771973534
------- 5934315227-5991375950 9373480292-9465377157
coeff 1896-2652 7186234313 7904857744 7904857745 8695343519
------- 7473683685-7545546028 8537246361-8616294938
it will get back to two soon. The 10 largest flares are
Code:
  289608 c5: 63480
  354056 c5: 327600
  515681 c5: 127512
  661123 c5: 147420
  706200 c5: 134568
 1840174 c5: 35280
 5597766 c5: 50160  (reboot)
10555575 c5: 189924
10578461 c5: 202020
17379623 c5: 182280  (2nd pass, after 1st pass to 4065305)
c5=182280 and that one Y1 --- is still from
Code:
23588 c5: 145800  <--- prior best
finished; and I haven't seen any evidence at all that there will be a finite number of stage2 reports in these four largest flares; and _lots_ of evidence (5M, resp. 10M (x2)) that 3-of-the-4 will never produce a single poly of interest. I'm typically running both boards for around two days, then send a TERM, move the .m reports and restart from just after where the previous run stopped. Counts of stage1 reports from c. 4 days of card-time (2 days * 2 cards) are the last few from
Code:
3851 msieve-ini.dat.m
2745 msieve-g01a.dat.m
2586 msieve-g01b.dat.m
 773 msieve-g01c.dat.m  (reboot)
2203 msieve-g01d.dat.m
2447 msieve-g01f.dat.m
3216 msieve-g01g.dat.m
3025 msieve-g01h.dat.m
3163 msieve-g01i.dat.m
3283 msieve-g01j.dat.m
3776 msieve-g01k.dat.m
TMI, for sure, but a large number of these (c5,m,p) reports spend just moments in stage2, with a vast majority of the stage2 cputime in the largest flares. For a better idea of what a mid-range flare looks like, there's
Code:
tail -30 | head -5
15403 c5: 279300
16704 c5: 177840
17451 c5: 143640
17845 c5: 26220
18118 c5: 58200
and tail -50 | head -5
 2821 c5: 129600
 2879 c5: 157080
 3056 c5: 305448
 3220 c5: 123420
 3378 c5: 157920
computing with ecm and then sieving RSA120, Arjen was using Bellcore's 16K-processor MasPar. There's also the matrix for snfs512, aka F9 (before I started), which I believe I recall used 25K processors? More recently we're hearing of 100000's of bluegene cell processors, which sounds massive. I'm very happy with the few 100's on the fermi cards; thanks for the clarification.

-Bruce
#43
May 2008
3·5·73 Posts
Quote:
Quote:
The optimal size of p*q depends on the size of a_d. The larger the p,q, the bigger the ranges are, and bigger ranges take longer to finish than smaller ranges (more things to test). There is a time limit for each batch of a_d, so what you are noticing is that more smaller ranges are done for small a_d, yet fewer bigger ranges are done for large a_d. The amount of work that is actually getting done in either case is about the same.
#44
Jun 2005
lehigh.edu
2¹⁰ Posts
Quote:
"millions", as I recalled) and 10M; and am currently running 30M (re-reading the "tens of millions" here). On the rates of stage1 hits, I'm seeing Code:
91.5 hits/hour from 10M
122 hits/hour from 30M
going larger. I had both boards running 2M and 10M into a single msieve.dat.m, and have a vague recollection that 2M gave better polyn, with way smaller norms from 10M that the alphas weren't able to make up. I'm just now running -np2 on stage1 hits from a single board in 10M, and expect to get a clear read once my current single-board run from 30M has run for 2-3 days in -np1. Unless the run from 30M gives an unexpectedly good polyn (in which case, I might consider 50M?), I'm thinking about going back to sample 1M, 3M, 5M.

Meanwhile, the other board is running Serge's new c176 gnfs, but hasn't yet gotten anything near the top10 range from the previous gnfs176 (as in the title of this subthread). Looks like three more days of sieving on 5L815 (with ggnfs/16e), during which we hope to see a usable polyn for the new 3, 610+ gnfs176.

-Bruce

PS - Ah, and the 2nd dependency from 3, 605- gnfs176 has just reported p53*p123, a somewhat undertested number from the extension.

Last fiddled with by bdodson on 2010-10-16 at 13:50 Reason: breaking news/this-just-in
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Tweaking polynomial search for C197 | fivemack | Msieve | 38 | 2011-07-08 08:12 |
| 109!+1 polynomial search | fivemack | Factoring | 122 | 2009-02-24 07:03 |
| 5^421-1 polynomial search | fivemack | Factoring | 61 | 2008-07-21 11:16 |
| 6^383+1 by GNFS (polynomial search; now complete) | fivemack | Factoring | 20 | 2007-12-26 10:36 |
| GNFS polynomial search tools | JHansen | Factoring | 0 | 2004-11-07 12:15 |