mersenneforum.org > Factoring Projects > Msieve
2010-09-23, 17:50   #34
jasonp

Quote:
Originally Posted by bdodson
tacked on to fill out the time for this range of algebraic c5's, with
reports p, q -> Y1 = p*q disjoint from the initial search space. I can
see that this member of the C-to-D range is in fact coprime to the
member of the E-to-F range; and wonder whether that's a condition
on the search? This is where the massively parallel gpu search is
occurring (if I'm reading correctly), and worth hearing some more details.
Coprimality is guaranteed when the sets are constructed. The GPU portion of the search starts with the C/D and E/F sets, each of which contains many arithmetic progressions of the form R_i + j * P_i^2, where R_i < P_i^2. The task is to perform an all-against-all comparison between the two sets, looking for a pair of (R_1, j, P_1) and (R_2, k, P_2) triples such that

R_1 + j*P_1^2 = R_2 + k*P_2^2

and both j and k are smaller than a fairly stringent bound. When such a pair is found, P_1 * P_2 generates the leading rational coefficient Y1 of a GNFS polynomial, and the above equality generates a 'small' correction to the 'optimal' value of Y0, such that Y1 and Y0 satisfy a modular condition that is required for GNFS polynomials.
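A toy sketch of this all-against-all search, in Python with hypothetical small primes and bounds (the real code works with P_i around 1e5 and runs on a GPU): for every pair of progressions, scan small j and check whether the common value also lands on the second progression with a small k.

```python
from itertools import product

def find_collisions(set1, set2, bound):
    """Brute-force all-against-all search over two sets of arithmetic
    progressions R + j*P^2 (with R < P^2, as in the construction above).
    Returns (R1, j, P1, R2, k, P2) tuples with
    R1 + j*P1^2 == R2 + k*P2^2 and both j, k below `bound`."""
    hits = []
    for (R1, P1), (R2, P2) in product(set1, set2):
        m1, m2 = P1 * P1, P2 * P2
        for j in range(bound):
            v = R1 + j * m1
            # v lies on the second progression iff v ≡ R2 (mod P2^2);
            # since R2 < P2^2 and v >= 0, the index k is then >= 0
            if v % m2 == R2 and (v - R2) // m2 < bound:
                hits.append((R1, j, P1, R2, (v - R2) // m2, P2))
    return hits

# hypothetical toy example: 10 + 10*7^2 == 16 + 4*11^2 == 500
print(find_collisions([(10, 7)], [(16, 11)], 20))
# → [(10, 10, 7, 16, 4, 11)]
```

This is the naive O(|set1|·|set2|·bound) version; the two practical formulations jasonp describes later in the thread replace the inner scan with hashing or with a single modular inversion.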
2010-09-23, 19:09   #35
jrk

Quote:
Originally Posted by jasonp
When such a pair is found then P_1 * P_2 generates the leading rational coefficient Y1 of a GNFS polynomial and the above equality generates a 'small' correction to the 'optimal' value of Y0, such that Y1 and Y0 satisfy a modular condition that is required for GNFS polynomials.
Note also that the 'small' correction mentioned above leads to a correspondingly small A_[d-2] coefficient in the expanded polynomial, which is the whole purpose of doing this fancy search. At numbers this large, only about 1 in a billion combinations of Y1's factors meets the requirements Jason mentions. So you need to burn through a lot of combinations to find many hits, and GPUs are excellent at accelerating this search because they can cheaply burn through hundreds of these tests in parallel.
2010-09-25, 11:58   #36
bdodson

Quote:
Originally Posted by jasonp
...looking for a pair of (R_1, j, P_1) and (R_2, k, P_2) triples such that

R_1 + j*P_1^2 = R_2 + k*P_2^2

and both j and k are smaller than a fairly stringent bound. When such a pair is found then P_1 * P_2 generates the leading rational coefficient Y1 of a GNFS polynomial and the above equality generates a 'small' correction to the 'optimal' value of Y0, such that Y1 and Y0 satisfy a modular condition that is required for GNFS polynomials.
Thanks, that's clear. Are P_1, P_2 or P_1*P_2 visible in the stage1
reports (Y1 or the p, q with p*q = Y1, or ...), or does the "generating"
of Y1 leave the initial match behind?

@jrk: "hundreds" of combinations in parallel? That doesn't _sound_
"massively parallel" (someone else's quotes).

On the "early abort" of the root sieve, a reboot has me with two -np2's
running at once, with
Code:
    841 stage2b/msieve.dat.p:c5:  186960
   1434 stage2c/msieve.dat.p:c5:  199080
   7315 stage2b/msieve.dat.p:c5:  182040
  77391 stage2b/msieve.dat.p:c5:  186732
 127743 stage2b/msieve.dat.p:c5:  183600
 893383 stage2b/msieve.dat.p:c5:  189924
1265386 stage2c/msieve.dat.p:c5:  202020
Some 15 hours of cputime (=walltime), of which the c5 = 189924 flare
has the best of
Code:
# norm 1.542442e-18 alpha -8.336848 e 2.092e-14 rroots 3
# norm 1.545278e-18 alpha -8.254718 e 2.124e-14 rroots 3
# norm 1.557593e-18 alpha -8.052209 e 2.125e-14 rroots 3
# norm 1.726763e-18 alpha -7.413974 e 2.192e-14 rroots 5
# norm 1.665776e-18 alpha -8.499358 e 2.237e-14 rroots 3
while the best that 1.2 million stage2 hits with c5 = 202020 have managed is
Code:
# norm 1.440151e-18 alpha -8.162720 e 1.995e-14 rroots 1
# norm 1.447408e-18 alpha -8.420328 e 2.023e-14 rroots 3
# norm 1.449678e-18 alpha -8.415766 e 2.026e-14 rroots 3
# norm 1.528283e-18 alpha -8.316211 e 2.071e-14 rroots 3
# norm 1.542442e-18 alpha -8.336848 e 2.092e-14 rroots 3
By contrast, the flare that gave the new best had already found
two 2.3's in the first five hours, and another two in the next five
hours; early promise before the new 2.4. Before the reboot hit,
this flare with c5 = 182280 filled in 9 of the cumulative top10, with
the one earlier 2.4 mentioned elsewhere.

So on a side issue, I suppose 37 hours on that flare is most likely
representative of what to expect from another 2 days or so; but
is it possible to restart this root sieve without redoing the first 37 hours?
Not worth that; it seems unlikely to produce that 2.6-2.7 that's supposed
to be lurking out there.

By contrast, the root sieve for c5=202020 has every prospect to keep
running another 2-3 days without ever hitting 2.2. I'm inclined to start
a 3rd -np2 from the stage1 hit just after 202020, and kill 202020 if/when
it continues running without prospect. At least, that's consistent with
what I was seeing with Arjen's version of the initial Montgomery-Murphy
code, c. rsa140/155. Maybe c5= 189924 might hit a productive range
somewhat further out, but a few std deviations up from where 202020
is won't help ... -Bruce

---
Hmm, that's 10 stage1 hits with c5=202020, distinct Y1's; these
are all from the first one ...

2010-09-26, 14:52   #37
bdodson
GPU reports, and "shows what I know ..."

Quote:
Originally Posted by bdodson
Thanks, that's clear. Are P_1, P_2 or P_1*P_2 visible in the stage1
reports (Y1 or the p, q with p*q = Y1, or ...), ...
these being Jason's
Code:
pair of (R_1, j, P_1) and (R_2, k, P_2) triples such that

R_1 + j*P_1^2 = R_2 + k*P_2^2
There was in fact an obvious location for P_1 and P_2 in the
remarkable factorization of the earlier post
Quote:
Originally Posted by bd
a stage1 hit was

"poly 12 p 11550451993 q 12350885161 coeff 142658306123186575873"

in which the triple (a5, p, m) with rational polyn px + m == Y1*x +Y0

has coeff Y1 = 11550451993 * 12350885161,

where 11550451993 = (106019) * (108947) and
12350885161 = (23^2) * (97) * (313) * (769).
Short of locating the place in the code where the searching is done
(anyone?), P_1 and P_2 seem quite likely to be the factors in
"poly 12 p = ..." namely P_1 = 106019 and P_2 = 108947. To remove
any doubt that this rather non-random-looking factorization is a
coincidence, here are five more from the tail of this morning's output
Code:
> ifactor(12013826899) =    (198127)  (60637)

> ifactor(12066899171) =     (59699)  (202129)

> ifactor(12064312489)  =   (228677)  (52757)

> ifactor(12064312489)  =   (228677)  (52757)

> ifactor(11988147403)  =  (264949)  (45247)
where those factors are in the leading rational coeff Y1 = p*q;
and the "q" is manufactured after the match in the arithmetic
progressions mod P_1^2 and P_2^2, shifted by R_1 and R_2.
Uhm. So we're looking at a collection of arithmetic progression
values mod p^2 with p a prime between 5.2e4 and 2.7e5; so
p^2 in 2.5e9-to-7.3e10 ... with each modulus having p^2 shifts
R_1, R_2, and going out some length in the progressions. Now
that's a massive search space, plenty of room for whatever
stringent conditions are useful.
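The numbers in the quoted stage 1 report can be checked directly; a few lines of Python, using only the values quoted above, confirm that P_1*P_2 rebuilds "p", that the reported coeff is p*q, and that p and q are coprime:

```python
from math import gcd

# values quoted from the "poly 12 ..." stage 1 report above
p = 106019 * 108947             # the two ~1e5 prime factors
q = 23**2 * 97 * 313 * 769      # the manufactured cofactor
Y1 = p * q                      # leading rational coefficient

assert p == 11550451993
assert q == 12350885161
assert Y1 == 142658306123186575873   # "coeff" in the report
assert gcd(p, q) == 1                # coprime, as guaranteed by construction
```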

Sorry jrk; but the distribution of factorization types among
random numbers of the size of Y1 isn't where the hits on matches
among those progression values occur, as Jason describes. We
already have "p" = P_1*P_2 from the match, and then search for and/or
manufacture the cofactor "q". (And remark that the rational poly
(Y1)x+Y0, also written as px+m in the triple (c5,p,m), is a different
use of the symbol "p".) So is this a simultaneous search, for both
a progression value match and a cofactor q; or is it sequential,
with each P_1*P_2 having a q-search? And what part of the search
is uniquely suited to our GPU?

I was considerably less successful in my "early abort" proposal;
with the best hit coming from the less likely (by my hypothesis ...) flare.
Straight-up, both flares are still running (in that root sieve part of
stage2)
Code:
 grep c5 stage2b/msieve.dat.p | uniq -c | tail -4

    192 c5:  60492
    432 c5:  187404
    274 c5:  61488
3158265 c5:  189924
       ---

grep c5 stage2c/msieve.dat.p | uniq -c | tail -4

    200 c5:  67260
    116 c5:  199080
    455 c5:  202800
3414673 c5:  202020
with yesterday's winner showing
Code:
 grep norm stage2b/msieve.dat.p | sort -gk7 | tail 
     ...
# norm 1.545278e-18 alpha -8.254718 e 2.124e-14 rroots 3
# norm 1.557593e-18 alpha -8.052209 e 2.125e-14 rroots 3
# norm 1.726763e-18 alpha -7.413974 e 2.192e-14 rroots 5
# norm 1.665776e-18 alpha -8.499358 e 2.237e-14 rroots 3

and today's

grep norm stage2c/msieve.dat.p | sort -gk7 | tail
      ...
# norm 1.528283e-18 alpha -8.316211 e 2.071e-14 rroots 3
# norm 1.498165e-18 alpha -8.655833 e 2.078e-14 rroots 3
# norm 1.503053e-18 alpha -8.665323 e 2.082e-14 rroots 3
# norm 1.542442e-18 alpha -8.336848 e 2.092e-14 rroots 3
# norm 1.760565e-18 alpha -9.180202 e 2.310e-14 rroots 3
Looks like one of those exceptional alpha -9.180202 from
out of nowhere, combined with the 1.7 norm (the other norms from
that top10 below 1.5). I did split off the rest of the two msieve.dat.m
files (each with 2 days of reports from the two fermi cards), and
found no other flares; so clearly there wasn't any payoff from
cutting off that c5 = 202020 with Y1 = 160701245257519162799
(the only Y1 of the 10 with this c5 that has a top10 flare)

Uhm, that's < 200000 -vs- over 1M from yesterday's
Code:
    841 stage2b/msieve.dat.p:c5:  186960
   1434 stage2c/msieve.dat.p:c5:  199080
   7315 stage2b/msieve.dat.p:c5:  182040
  77391 stage2b/msieve.dat.p:c5:  186732
 127743 stage2b/msieve.dat.p:c5:  183600
 893383 stage2b/msieve.dat.p:c5:  189924
1265386 stage2c/msieve.dat.p:c5:  202020
And, for the record
Quote:
Originally Posted by bd
By contrast, the root sieve for c5=202020 has every prospect to keep
running another 2-3 days without ever hitting 2.2. ...

[and]

Maybe c5= 189924 might hit a productive range
somewhat further out, but a few std deviations up from where 202020
is won't help ... -Bruce
Both wrong. Not that locating another 2.31 is any help, with the
top10 for c5 = 182280 all above 2.372, three of which are above 2.4,
up to 2.492 to go with the prior best 2.469. Ah, well. That's the
relevant std deviation (as it turns out ...). These were the first two
-np1's for the c187, with 2745 stage1 hits, two substantial flares, but
neither of them was useful
Code:
  17845 c5:  26220
  23588 c5:  145800   <--- prior best
  70307 c5:  149880
 661123 c5:  147420
1840174 c5:  35280
Hmm; so never mind, the std deviation from the large flares wasn't
relevant, after all
Code:
# norm 1.595271e-18 alpha -8.876493 e 2.163e-14 rroots 3
# norm 1.715896e-18 alpha -7.689817 e 2.191e-14 rroots 5
# norm 1.634327e-18 alpha -7.040722 e 2.211e-14 rroots 3
# norm 1.996707e-18 alpha -8.511119 e 2.469e-14 rroots 3
is from 26.67M stage2 hits, but only the first 60K (to include all
stage2 reports with c5 = 145800) were relevant to the 2.469 and
that was
Code:
grep norm stage2b/msieve-g01a.dat.p | head -60000 | sort -gk7 | tail 
    ...
# norm 1.507029e-18 alpha -7.806310 e 2.062e-14 rroots 3
# norm 1.601214e-18 alpha -7.849728 e 2.144e-14 rroots 5
# norm 1.715896e-18 alpha -7.689817 e 2.191e-14 rroots 5
# norm 1.634327e-18 alpha -7.040722 e 2.211e-14 rroots 3
# norm 1.996707e-18 alpha -8.511119 e 2.469e-14 rroots 3
So that's "short flares", where there aren't enough hits for the
std deviation to matter; and "large flares", with no evidence yet
that a large variation like the above could occur among millions
of reports for that (c5, Y1) pair all well below the range of interest.
Hmm, black swan range; they're sufficiently unlikely that they
may occur wherever they want.

Any chance that adjusting Serge's parameters for c185 and/or
c190 (at least locally, for fermis) would improve the rate of burning
through these gigantic flares, without too much of a reduction in
the chances of seeing one of these elusive black swans? -Bruce
2010-09-27, 13:30   #38
jasonp

Quote:
Originally Posted by bdodson
these being Jason's
Code:
pair of (R_1, j, P_1) and (R_2, k, P_2) triples such that

R_1 + j*P_1^2 = R_2 + k*P_2^2
There was in fact an obvious location for P_1 and P_2 in the
remarkable factorization of the earlier post

Short of locating the place in the code where the searching is done
(anyone?), P_1 and P_2 seem quite likely to be the factors in
"poly 12 p = ..." namely P_1 = 106019 and P_2 = 108947.
Yes, that is how P1 and P2 are chosen; one has all the small factors and the other has all the large factors (though either can also be prime).
Quote:
Sorry jrk; but the distribution of factorization types among
random numbers of the size of Y1 isn't where the hits on matches
among those progression values occur, as Jason describes. We
already have "p" = P_1*P_2 from the match, and then search for and/or
manufacture the cofactor "q". (And remark that the rational poly
(Y1)x+Y0, also written as px+m in the triple (c5,p,m), is a different
use of the symbol "p".) So is this a simultaneous search, for both
a progression value match and a cofactor q; or is it sequential,
with each P_1*P_2 having a q-search? And what part of the search
is uniquely suited to our GPU?
To add some detail, we need a (c5,p,m) triple such that c3 in the resulting algebraic poly is small, and also p^2 divides N - m^5 (the modular condition I alluded to). The requirement that c3 is small means m is close to M0 (= (N/c5)^(1/5)). With p = P1 * P2, the modular condition effectively means that the R1 in R1+j*P1^2 is a 5th root of N modulo P1^2.
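A toy illustration of that modular condition in Python, with a hypothetical small prime P = 7, an arbitrary N, and the c5 = 1 simplification (brute force only works at toy sizes; the real code computes the roots algebraically): any admissible residue R modulo P^2 satisfies R^5 ≡ N (mod P^2), so P^2 divides N - m^5 for every m ≡ R (mod P^2) in the progression.

```python
def fifth_roots(N, P):
    """Brute-force the 5th roots of N modulo P^2 (toy sizes only)."""
    mod = P * P
    return [r for r in range(mod) if pow(r, 5, mod) == N % mod]

# hypothetical example: P = 7, so we work modulo 49
N, P = 12345, 7
for R in fifth_roots(N, P):
    # any member m of the progression R + j*P^2 then gives P^2 | N - m^5
    m = R + 3 * P * P          # arbitrary member of the progression
    assert (N - m**5) % (P * P) == 0
```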

The factorization of P1 and P2 is arbitrary, as long as P1 and P2 are coprime. You're correct that there are more combinations possible than can conveniently be searched, so in practice you need to settle on a subset of factorizations that make the all-against-all search more convenient. jrk has done a lot of work manufacturing P1 and P2 of the correct size in large batches.

We need m = M0 + i, and i = R1 + j*P1^2 = R2 + k*P2^2 for small j and k. There are two ways to find j and k so that the correction i is small enough. You can break up the admissible range of i into small blocks and use a hashtable within each block to find a P1 and P2 that collide; that's convenient because the inner loops are driven by memory access and pointer chasing, rather than modular arithmetic. The other way is to go through each (P1,P2) pair and find the first value of the P1 progression that lies on the P2 progression; the index of that value is j = (R2-R1)*inv(P1^2) mod P2^2. If that j (and the corresponding k) is small enough then you have a hit. There is one such test per (P1,P2) pair, but each involves only modular arithmetic and no memory traffic, and the runtime is independent of the bound on j and k. It is the second formulation that the GPU code uses; all those execution units on a GPU make it run faster than the first in practice.

2010-10-05, 18:07   #39
bdodson

Quote:
Originally Posted by jasonp
...
To add some detail, we need a (c5,p,m) triple such that c3 in the resulting algebraic poly is small, and also p^2 divides N - m^5 (the modular condition I alluded to). ...

The factorization of P1 and P2 is arbitrary, as long as P1 and P2 are coprime. You're correct that there are more combinations possible than can conveniently be searched, so in practice you need to settle on a subset of factorizations that make the all-against-all search more convenient. jrk has done a lot of work manufacturing P1 and P2 of the correct size in large batches.

We need m = M0 + i, and i = R1 + j*P1^2 = R2 + k*P2^2 for small j and k. There are two ways to find j and k so that the correction i is small enough. You can ... The other way is to go through each (P1,P2) pair and find the first value of the P1 progression that lies on the P2 progression; the index of that value is j = (R2-R1)*inv(P1^2) mod P2^2. If that j (and the corresponding k) is small enough then you have a hit. There is one such test per (P1,P2) pair, but each involves only modular arithmetic and no memory traffic, and the runtime is independent of the bound on j and k. It is the second formulation that the GPU code uses; all those execution units on a GPU make it run faster than the first in practice.
Ah, execution units. The HPC person who collected vendor quotes posted
a description of the actual hardware
Code:
The six core Intel Xeon (X5650 Westmere 2.66 GHz) CPU based server, 
with 24 GB of DDR3 memory, comes with 2 nVidia “Fermi” Tesla C2050 GPUs.

Each one of those GPUs has 3 GB of GDDR5 memory and supports single 
and double precision floating point operations (with a Peak ...) across 
448 cores. ...
Are these the "execution units" in question? -Bruce
2010-10-06, 17:33   #40
jasonp

Quote:
Originally Posted by bdodson
Are these the "execution units" in question? -Bruce
Yes; my medium-end GPU at home does an 8000 x 800 all-against-all search, across 40 A5 simultaneously, in a single call. The 8000*800 individual tests are handled 112 at a time because the card has 112 cores. Bigger cards handle bigger batches; I think a G200 series card searches blocks of size 40000 x 2000 in a single call.

By the way, if you're still doing any poly searches, can you try replacing the PTX files you have with the precomputed PTX files in the 'fermi' subdirectory of the CUDA download on sourceforge? No recompile is necessary, and you should hopefully get faster search times.
2010-10-06, 21:09   #41
jrk

Quote:
Originally Posted by bdodson
@jrk: "hundreds" of combinations in parallel? That doesn't _sound_
"massively parallel" (someone else's quotes).
One test for each thread, of which there are hundreds on the biggest cards. Personally I'd call having hundreds of threads running on a single card "massive," though your definition might vary. :)

Quote:
Originally Posted by bdodson
Sorry jrk; but the distribution of factorization types among
random numbers of the size of Y1 isn't where the hits on matches
among those progression values occur, as Jason describes. We
already have "p" = P_1*P_2 from the match, and then search for and/or
manufacture the cofactor "q". (And remark that the rational poly
(Y1)x+Y0, also written as px+m in the triple (c5,p,m), is a different
use of the symbol "p".) So is this a simultaneous search, for both
a progression value match and a cofactor q; or is it sequential,
with each P_1*P_2 having a q-search? And what part of the search
is uniquely suited to our GPU?
There's some confusion of terms here, I'm afraid. You're right that the term "p" is sometimes used to refer to the Y1 cofactor. But in the context of the stage1 search used by msieve, "p" refers to just one of the groups of factors used to build Y1; the other group is called "q," so that Y1 = p*q. The factorizations of elements of p and q are mostly irrelevant, so long as the factors of p are never factors of q and vice versa. The way that p and q are generated guarantees that they are coprime.

What the GPU search does is find combinations of one element from p and one element from q which satisfy the conditions that Jason mentioned. This is an all-against-all search which eventually tries all the possible combinations of p and q (or until time runs out).

What I was hinting at earlier is that the reason the GPUs are so useful is because it takes billions of tests (at numbers this size) on average to find a combination satisfying the restriction on the size of the j,k terms Jason described. So you have to be able to burn up a lot of combinations in a short time, which we can do very effectively on a GPU.
2010-10-07, 03:47   #42
bdodson

Quote:
Originally Posted by jasonp
Yes; my medium-end GPU at home does an 8000 x 800 all-against-all search, across 40 A5 simultaneously, in a single call. The 8000*800 individual tests are handled 112 at a time because the card has 112 cores. Bigger cards handle bigger batches; I think a G200 series card searches blocks of size 40000 x 2000 in a single call.

By the way, if you're still doing any poly searches, can you try replacing the PTX files you have with the precomputed PTX files in the 'fermi' subdirectory of the CUDA download on sourceforge? No recompile is necessary, and you should hopefully get faster search times.
Thanks!! When I downloaded the tarfile after the "official" release
of 1.47 (which I can tell because the logs now report 1.48!), the PTX's
were new/different from what I got the first time, with the tarfile
from before the official release. I wasn't ever able to locate the "fermi"
subdirectory (after some effort searching the files from the tarfile extract,
etc.), but they're written along with the binary. Unless there's been
a more recent update of the PTX's, since Sept 18th?

I'm very much still running the c187. The two flares I was grumbling
about ran past 10M stage2 reports before I killed them; still with just one
polyn above 2.3 (from the one I didn't expect) and the 2.2 (from the one
that looked somewhat better, somewhat sooner) --- that was 10M+
stage2 root-sieve reports for each of the two (c5, m, p). I did go back
and re-run the most productive flare (9 of the 10 best), which had gotten
rebooted at 4M stage2 reports. It's now past 17M (counting the 4M from
the first attempt, re-run), and still generating new top5 poly (all with
a single c5 = 182280 and just one of several Y1's for that c5).

I've now run all of 1-to-348348 once, and re-ran the most productive
range at/above 120001. I did find 26 duplicate (c5, m, p)'s, but haven't
yet tracked down the C/D, E/F ranges. Not worth removing, really;
hardly any extra time in -np2. Hard to tell what proportion 26-of-???,
as the two runs stopped at different places, the stage1 reports in the
msieve.dat.m's are mixed from -g0 and -g1; and the 2nd pass uses
multiples of 12 rather than multiples of 60. Observing that, I'm re-running
1-to-20000 with the new binary; despite that being one of the least
productive ranges. I do expect to spend some time above 1000001,
but haven't gotten there yet. The new binary is very fast in 1-to-1000;
and instead of an initial C/D and then a second, I have
Code:
searching leading coefficients from 1 to 20000
using GPU 0 (Tesla C2050)
deadline: 800 seconds per coefficient
coeff 12-552 5824496281 6406945909 6406945910 7047640501
------- 5940986205-5999231167 6727293205-6791362664
------- 5294996619-5347946585 7329546119-7400022523
------- 5150587614-5198723946 7907452644-7984976689
------- 4769872981-4813633283 9209856609-9295133059
------- 4097555608-4137337701 9943234082-10037038177
coeff 564-1128 6519456975 7171402672 7171402673 7888542940
------- 7041013527-7106208096 7171402673-7243116699
------- 6400921388-6460189178 8598511804-8677397233
------- 5872899251-5926779060 9371589014-9458362986
------- 5338999323-5387980968 9640588333-9736039702
coeff 1140-1872 6904347527 7594782279 7594782280 8354260508
------- 7042434477-7111477952 8050469212-8126417034
------- 6276679569-6339446364 8688430929-8771973534
------- 5934315227-5991375950 9373480292-9465377157
coeff 1896-2652 7186234313 7904857744 7904857745 8695343519
------- 7473683685-7545546028 8537246361-8616294938
Uhm, that's five ranges; then four; then three --- looks like
it will get back to two soon.

The 10 largest flares are
Code:
 289608 c5:  63480
 354056 c5:  327600
 515681 c5:  127512
 661123 c5:  147420
 706200 c5:  134568
1840174 c5:  35280
5597766 c5:  50160  (reboot)
10555575 c5:  189924
10578461 c5:  202020
17379623 c5:  182280  (2nd pass, after 1st pass to 4065305)
where the 2nd best poly --- the only one of the top10 that doesn't have
c5=182280 and that one Y1 --- is still from
Code:
23588 c5:  145800   <--- prior best
Only the smallest one with more than 1000000 stage2 reports actually
finished; and I haven't seen any evidence at all that there will be a
finite number of stage2 reports in these four largest flares; and _lots_
of evidence (5M, resp. 10M(x2)) that 3-of-the-4 will never produce a
single poly of interest. I'm typically running both boards for around
two days, then send a TERM, move the .m reports and restart from
just after where the previous run stopped. Counts of stage1 reports
from c. 4 days of card-time (2 days*2cards) are the last few from
Code:
  3851 msieve-ini.dat.m
  2745 msieve-g01a.dat.m
  2586 msieve-g01b.dat.m
   773 msieve-g01c.dat.m  (reboot)
  2203 msieve-g01d.dat.m
  2447 msieve-g01f.dat.m
  3216 msieve-g01g.dat.m
  3025 msieve-g01h.dat.m
  3163 msieve-g01i.dat.m
  3283 msieve-g01j.dat.m
  3776 msieve-g01k.dat.m
where g01e ran in 3 chunks, after hitting the 5.597M flare.
TMI, for sure, but a large number of these (c5,m,p) reports
spend just moments in stage2; with a vast majority of the
stage2 cputime in the largest flares. For a better idea of
what a mid-range flare looks like, there's
Code:
 tail -30 | head -5

  15403 c5:  279300
  16704 c5:  177840
  17451 c5:  143640
  17845 c5:  26220
  18118 c5:  58200

and

 tail -50 | head -5

   2821 c5:  129600
   2879 c5:  157080
   3056 c5:  305448
   3220 c5:  123420
   3378 c5:  157920
@jrk, I'll postpone a careful reading for later, but when I started
computing with ecm and then sieving RSA120, Arjen was using
Bellcore's 16K-processor MasPar. There's also the matrix for snfs512,
aka F9 (before I started), which I believe I recall using 25K processors?
More recently we're hearing of 100000's of BlueGene cell processors;
which sounds massive. I'm very happy with the few 100's on the
fermi cards; thanks for the clarification. -Bruce
2010-10-07, 05:13   #43
jrk

Quote:
Originally Posted by bdodson
and the 2nd pass uses
multiples of 12 rather than multiples of 60. Observing that, I'm re-running
1-to-20000 with the new binary;
Blame me for that one. I decided it would be a good idea to identify the best a_d coefficients by keeping only the ones that are made of small primes & prime powers. But that meant throwing away a lot of the a_d at the larger ranges, so the multiplier was lowered from 60 to 12 to increase the number again. The idea is that the presence of many small factors in a_d should improve the root properties of the resulting polynomials.
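A guess at that kind of filter, in Python (the smoothness bound B and the exact rule are my assumptions for illustration, not the actual msieve criterion): keep the multiples of 12 whose prime factors are all small.

```python
def is_smooth(n, B):
    """True if every prime factor of n is <= B (trial division; toy sizes)."""
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
        d += 1
    return n <= B

def admissible_ad(lo, hi, B):
    """Multiples of 12 in [lo, hi] built only from primes <= B
    (hypothetical stand-in for the a_d selection rule described above)."""
    return [a for a in range(12, hi + 1, 12) if a >= lo and is_smooth(a, B)]

print(admissible_ad(1, 100, 5))
# → [12, 24, 36, 48, 60, 72, 96]  (84 = 2^2 * 3 * 7 is dropped)
```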

Quote:
Originally Posted by bdodson
despite that being one of the least
productive ranges. I do expect to spend some time above 1000001,
but haven't gotten there yet.
For this c187, if you try some a_d in the tens of millions, you should see your stage1 hit rate increase to a few times what it is under 10000. The result is that you get many more chances to find a great size score, although the skew will be smaller and the average root score might take a dip. Give it a try and see how it works.

Quote:
Originally Posted by bdodson
Uhm, that's five ranges; then four; then three --- looks like
it will get back to two soon.
The optimal size of p*q depends on the size of a_d. The larger the p,q, the bigger the ranges are, and bigger ranges take longer to finish than smaller ranges (more things to test). There is a time limit for each batch of a_d, so what you are noticing is that more, smaller ranges are done for small a_d, but fewer, bigger ranges for large a_d. The amount of work that actually gets done in either case is about the same.
2010-10-16, 13:40   #44
bdodson

Quote:
Originally Posted by jrk
...
For this c187, if you try some a_d in the tens of millions, you should see your stage1 hit rate increase to a few times faster than it is under 10000. The result is that you get many more chances to find a great size score, although the skew will be smaller and the average root score might take a dip. Give it a try and see how it works....
I've tried 2M (from when I was thinking of a previous post as trying
"millions", as I recalled) and 10M; and am currently running 30M (re-reading
the "tens of millions" here). On the rates of stage1 hits, I'm seeing
Code:
91.5 hits/hour from 10M

122 hits/hour from 30M
so even way out here there's an observable increase in the rate from
going larger. I had both boards running 2M and 10M into a single
msieve.dat.m, and have a vague recollection that 2M gave better polyns;
10M had way smaller norms, but alphas that weren't able to make up
the difference. I'm just now running -np2 on stage1 hits from a single
board in 10M, and expect to get a clear read once my current single-board
run from 30M has run for 2-3 days in -np1. Unless the run from 30M gives
an unexpectedly good polyn (in which case, I might consider 50M?) I'm
thinking about going back to sample 1M, 3M, 5M.

Meanwhile, the other board is running Serge's new c176 gnfs, but hasn't
yet gotten anything near the top10 range from the previous gnfs176 (as
in the title of this subthread). Looks like three more days of sieving on
5L815 (with ggnfs/16e), during which we hope to see a usable polyn for the
new 3, 610+ gnfs176. -Bruce

PS - Ah, and the 2nd dependency from 3, 605- gnfs176 has just reported
p53*p123, a somewhat undertested number from the extension.
