mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)
-   -   ECM Efforts (https://www.mersenneforum.org/showthread.php?t=3998)

R.D. Silverman 2005-04-13 14:05

ECM Efforts
 
Hello to All;

I am thankful for the help people have given in applying ECM efforts to
the 2+ table.

I am in the process of doing 2,737+ via NFS and will then do 2,749+.
Finishing these will take me about 3 months.

Before I go further, I ask that some additional ECM effort be expended in
particular on the following numbers: (to at least the 45 digit level)

2,791+, 2,951+, 2,969+, 2,993+, 2,1322M, 2,1334L, 2,1342L.

Thanks in advance for any help.

Xyzzy 2005-04-14 04:36

I'd be willing to do some of this work if you could simplify it a bit... As it is now, I would have to refer to the various tables and try to determine how far each one has been done, and in the process hope that the work isn't being duplicated...

Is it possible for you to just ask for x curves for a particular number at a particular depth? I could set up something like that in a matter of minutes and walk away...

One reason I have been doing M1061 work is because it is so simple, and I have very little free time to devote to figuring out other work... Well that, and the fact I have no clue what I am doing... Having a dedicated thread for M1061 makes it so easy to coordinate work...

If you want to correspond via email about this work or other work you need done feel free to email me at xyzzyATmersenneforumDOTorg or send me a PM...

Or maybe setting up this work is a lot simpler than I have been doing it? I'd be willing to learn if there is a better way...

R.D. Silverman 2005-04-14 13:38

[quote=Xyzzy]I'd be willing to do some of this work if you could simplify it a bit... As it is now, I would have to refer to the various tables and try to determine how far each one has been done, and in the process hope that the work isn't being duplicated...

Is it possible for you to just ask for x curves for a particular number at a particular depth? I could set up something like that in a matter of minutes and walk away...

One reason I have been doing M1061 work is because it is so simple, and I have very little free time to devote to figuring out other work... Well that, and the fact I have no clue what I am doing... Having a dedicated thread for M1061 makes it so easy to coordinate work...

If you want to correspond via email about this work or other work you need done feel free to email me at xyzzyATmersenneforumDOTorg or send me a PM...

Or maybe setting up this work is a lot simpler than I have been doing it? I'd be willing to learn if there is a better way...[/quote]

I must be having a stupidity attack because I don't understand the request
to 'simplify'. I don't keep track of exact curve counts; George Woltman
has a web page that gives them. They are just a click away. It is trivial to
determine how many more are required to finish ECM to a given level. Or
have I misunderstood you?

akruppa 2005-04-14 14:35

I believe Xyzzy is asking for step-by-step instructions on how to set up the programs to attack the numbers you requested, as he is not familiar with the process.

Xyzzy, I suggest doing the numbers entirely in Prime95 because it's somewhat easier than resuming residues with GMP-ECM. The Aurifeuillians (the ones with L or M) are better done with GMP-ECM.

So, to do 1000 curves on 2,791+, you should
1. stop Prime95
2. add the line
ECM=791,11000000,0,1000,0,0,1
to the worktodo.ini file
3. make sure you have the lowp.txt file from [url]http://www.mersenne.org/ecm.htm[/url] in your Prime95 directory
4. start Prime95 again

If the ECM= line is first in worktodo.ini, Prime95 should say that it's starting ECM on P791.

The other numbers (2,951+, 2,969+, 2,993+) work the same, except you'd replace the 791 in the "ECM=" line accordingly.
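For instance, a worktodo.ini queueing 1000 curves on each of the four 2+ numbers would contain:

ECM=791,11000000,0,1000,0,0,1
ECM=951,11000000,0,1000,0,0,1
ECM=969,11000000,0,1000,0,0,1
ECM=993,11000000,0,1000,0,0,1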

Alex

R.D. Silverman 2005-04-14 14:40

[QUOTE=akruppa]I believe Xyzzy is asking for step-by-step instructions on how to set up the programs to attack the numbers you requested, as he is not familiar with the process.

Xyzzy, I suggest doing the numbers entirely in Prime95 because it's somewhat easier than resuming residues with GMP-ECM. The Aurifeuillians (the ones with L or M) are better done with GMP-ECM.

So, to do 1000 curves on 2,791+, you should
1. stop Prime95
2. add the line
ECM=791,11000000,0,1000,0,0,1
to the worktodo.ini file
3. make sure you have the lowp.txt file from [url]http://www.mersenne.org/ecm.htm[/url] in your Prime95 directory
4. start Prime95 again

If the ECM= line is first in worktodo.ini, Prime95 should say that it's starting ECM on P791.

The other numbers (2,951+, 2,969+, 2,993+) work the same, except you'd replace the 791 in the "ECM=" line accordingly.

Alex[/QUOTE]


Ah! I am not familiar with the process either, as I use just GMP-ECM;

I just put the number(s) in ASCII, in a file, and type

ecm b1_limit < file > outputfile. I may sometimes modify the step2 limit as
well. Of course, this is called repeatedly by a script.
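(A concrete invocation in that style, with placeholder file names and the optional step-2 limit spelled out as the second argument, might be:

ecm 11000000 1100000000 < composites.txt > ecm_out.txt

where 11000000 is B1 and 1100000000 is the step-2 limit B2.)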

smh 2005-04-14 15:00

[QUOTE=R.D. Silverman]Ah! I am not familiar with the process either, as I use just GMP-ECM;

I just put the number(s) in ASCII, in a file, and type

ecm b1_limit < file > outputfile. I may sometimes modify the step2 limit as
well. Of course, this is called repeatedly by a script.[/QUOTE]

There is a -C option to tell gmp-ecm how many curves to run

akruppa 2005-04-14 15:17

...but the -c must be lowercase
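For example, to run 1000 curves at B1=11M in one invocation (file names are just placeholders):

ecm -c 1000 11000000 < composites.txt > ecm_out.txt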

Alex

Xyzzy 2005-04-14 15:19

I wasn't clear enough...

:smile:

I know how to set up the work... I just want someone to tell me what work to do...

For example, say we are working on x... I need someone to simply say:

"I need someone to run 1000 curves on x at y digits depth and report back here!"

And then I can say:

"Okay, I'll do it, and I'll report back soon!"

Setting up the work is easy... Deciding what work needs to be done is what confuses me...

Or we could keep track like we do in the M1061 thread...

Coordinating work in this thread would only take a few minutes and sure would make it easier for people like me... If this is unreasonable please let me know... I don't want to complicate things...

I know Bob has indicated in the past that perhaps we should spend less time on M1061 and more time on other stuff... I'm willing to do so, if someone (Bob?) can make it easy and fun, like it is in the M1061 thread... The only reason I continue to work on M1061 is because the way we keep track of work is fun and simple...

I bet it would only take a few minutes and I bet more people would participate...

:bow:

akruppa 2005-04-14 17:37

Ah, so it was me who misunderstood...

Ok, to complete the p45 level, we need

8089 more curves at B1=11M for P791,
10420 more curves at B1=11M for P951,
4357 more curves at B1=11M for P969,
10420 more curves at B1=11M for P993.

So, grab a bunch of curves while they're fresh! :smile:

Alex

Xyzzy 2005-04-14 18:32

Excellent! I'll start on 2,791+ then...

:cat:

hlaiho 2005-04-14 19:08

[QUOTE=Xyzzy]Excellent! I'll start on 2,791+ then...

:cat:[/QUOTE]

If you're starting to do ECM work with Prime95 on the number 2^791+1, it's recommended to run the number 2^1582-1 instead, because it is faster that way. (2^791-1 is already completely factored, so any new factor found must divide 2^791+1.)

In the worktodo.ini file the 7th value must then be 0.
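(Following the format of Alex's ECM= line above, the entry would presumably become:

ECM=1582,11000000,0,1000,0,0,0

i.e. the exponent changed to 1582 and the 7th value set to 0.)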

Good luck!

Heikki

akruppa 2005-04-14 19:11

Not with Prime95 version 24. It now supports 2^k and 3*2^k FFT lengths on both SSE2 and non-SSE2 machines for 2^n+1 numbers. So it is much better to work only on 2^791+1 now.

Alex

Xyzzy 2005-04-14 19:46

I'm using GMP-ECM... It takes 3.5 minutes per curve...

akruppa 2005-04-14 19:51

Which cpu type are you using?

flava 2005-04-14 22:05

I'll give a try to 2,951+

Xyzzy 2005-04-15 00:13

[QUOTE=akruppa]Which cpu type are you using?[/QUOTE]A 2.4GHz AMD64...

geoff 2005-04-15 04:00

I will do some work on 2,1342L together with 2,1342M (working on 2^1342+1 should make good use of Prime95's length 64 FFT), but I will do 50 digit curves because 2,1342M has already been tested to 45 digits.

flava 2005-04-19 11:08

Done 1000 curves (45 digit) for 2,951+
I'll do at least 1000 more.

MrHappy 2005-04-19 16:14

600 curves done for 969. Reported nowhere but here.

[edit: typo corrected]

Xyzzy 2005-04-22 23:15

1 Attachment(s)
For 2,791+

3184 curves using B1=11e6 & B2=25577181640...

akruppa 2005-04-23 02:49

The required number of curves I posted above was for B2=100*B1; when using the GMP-ECM default B2, the numbers are lower. In terms of fractions, the required work is now

P791: 0.07 (319 GMP-ECM curves or 736 with Prime95)
P951: 0.98 (4512 or 10420)
P969: 0.41 (1887 or 4357)
P993: 0.98 (4512 or 10420)
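(Dividing those curve counts by the fractions puts a complete t45 at roughly 4600 GMP-ECM curves with the default B2, or roughly 10600 Prime95 curves with B2=100*B1, at B1=11M.)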

Alex

Xyzzy 2005-04-23 13:03

[QUOTE=akruppa]P791: 0.07 (319 GMP-ECM curves or 736 with Prime95)[/QUOTE]I'll finish this up if nobody else is working on it...

Pi Rho 2005-04-25 20:16

I've completed ~400 curves with B1=11MM and B2=110MM for 2,791+ using Prime95, although not at 3.5 minutes per curve (more like 23 min. per curve with the Pentium 3 I'm using). I can go to the 736, which will take another 5 days or so at this rate, unless Xyzzy is done with them already.

flava 2005-04-25 21:41

Done another 1000 curves (45 digit) for 2,951+, no factor
It makes a total of 2000 curves. I hope it helps.

Xyzzy 2005-04-25 21:41

[QUOTE=Pi Rho]I can go to the 736, which will take another 5 days or so at this rate, unless Xyzzy is done with them already.[/QUOTE]Go ahead... I am having trouble getting the client to work right now...

:cat:

Pi Rho 2005-05-02 20:41

[Mon May 02 08:55:02 2005]
2^791+1 completed 843 ECM curves, B1=11000000, B2=1100000000

That should finish up 2,791+, at least to 45 digits. I'll pass this along to George.

Mike

geoff 2005-05-09 01:17

Bob, you can cross this one off your list: 2,1342L c170 = p48.p123
p48 = 312689963186011191785543941405234534125118895633

I found it after 507 curves with B1=43e6 B2=11e10. (mprime 24.11 + gmp-ecm 6.0.1). I will work on 2,1334L now.

Geoff.

dsouza123 2005-05-11 23:28

2^993+1 with Prime95
completed 120 ECM curves, B1=11000000, B2=1100000000

R.D. Silverman 2005-05-12 01:57

[QUOTE=geoff]Bob, you can cross this one off your list: 2,1342L c170 = p48.p123
p48 = 312689963186011191785543941405234534125118895633

I found it after 507 curves with B1=43e6 B2=11e10. (mprime 24.11 + gmp-ecm 6.0.1). I will work on 2,1334L now.

Geoff.[/QUOTE]

Nice! This certainly was easier with ECM......

trilliwig 2005-05-12 10:21

Done 1200 curves on 2,969+C164 with GMP-ECM, B1=11e6, B2=default.

MrHappy 2005-05-15 08:44

Another 800 curves on 969, B1=11000000

MrHappy 2005-05-21 09:33

600 more curves on 969.
That's it for now.

dsouza123 2005-05-21 16:46

2^993+1 with Prime95 10 ECM curves, B1=11000000, B2=1100000000

2^993+1 with Prime95 _2 ECM curves, B1=44000000, B2=4290000000

trilliwig 2005-05-22 10:13

Another 670 curves on 2,969+C164 with GMP-ECM, B1=11e6, B2=default. Um, I think I'll switch numbers now.

akruppa 2005-05-22 10:29

[QUOTE=flava]Done another 1000 curves (45 digit) for 2,951+, no factor
It makes a total of 2000 curves. I hope it helps.[/QUOTE]

flava, which B2 value did you use?

Alex

R.D. Silverman 2005-05-23 12:11

[QUOTE=trilliwig]Another 670 curves on 2,969+C164 with GMP-ECM, B1=11e6, B2=default. Um, I think I'll switch numbers now.[/QUOTE]


2,969+ has been completed to the 45 digit level. I will inform George.
Bruce Dodson just finished off 2,1193+. :banana:

I have finished sieving 2,737+ and will start 2,749+ later today.

xilman 2005-05-23 12:21

[QUOTE=R.D. Silverman]2,969+ has been completed to the 45 digit level. I will inform George.
Bruce Dodson just finished off 2,1193+. :banana:

I have finished sieving 2,737+ and will start 2,749+ later today.[/QUOTE]
NFSNET should switch to 2,751+ today or so. Wacky is in charge of the timing of the changeover but current stats suggest to me that we are about finished with 11,212+


I've been doing 6,786L.c135 by GNFS on my home box. Sieving finished 2 days ago and the matrix started a few hours ago. It should take about 52 hours to run on my 1.3GHz Athlon. The matrix is 1.3M square and has 48 set bits per row.


Paul

flava 2005-05-23 18:46

[QUOTE=akruppa]flava, which B2 value did you use?

Alex[/QUOTE]

[Tue Apr 19 04:27:00 2005]
2^951+1 completed 1000 ECM curves, B1=11000000, B2=1100000000
[Mon Apr 25 09:54:13 2005]
2^951+1 completed 1000 ECM curves, B1=11000000, B2=1100000000

Mystwalker 2005-05-23 20:19

Keller and I are currently doing curves for P951 - I expect to be complete with 45 digits within a week or two.

So far, I did ~6% of the total curves needed. Around 4,000 Prime95 curves (or 2,000 GMP-ECM curves) are left...

bdodson 2005-06-13 15:58

P951: c182 = p47*c135
 
[QUOTE=R.D. Silverman]Hello to All;

I am thankful for the help people have given in applying ECM efforts to
the 2+ table.
....

Before I go further, I ask that some additional ECM effort be expended in
particular on the following numbers: (to at least the 45 digit level)

2,791+, 2,951+, 2,969+, 2,993+, 2,1322M, 2,1334L, 2,1342L.

[/QUOTE]

I'm replying to an email from Bob about running the
first four of the above. I was able to report having already
completed a t45 on all four. After some further discussion,
I offered the view that P791 was already tested sufficiently
to be sieved, but the other three could do with some further
testing. The reason being that my t45 test mostly consisted
of c. 600 ecm6 curves with B1=110M on these, while the
2,7xx+ had 2500 ecm5 curves with B1=43M (which seemed
less likely to miss smallish prime factors).
After setting a curve count in B1=110M curves, I ended
up running B1=260M curves instead, and found the p47 on the
160th curve. Anyone know how much of a t45 has been run/
reported here on the other two, P969 and P993?
And a belated Greetings! to the forum.
B. Dodson (p57 from M997 c301 using prime95;
p59 and p66 with gmp-ecm6, not to mention
the p62 that wasn't a record!)
-----------------------

p47 = 40740225338758415715380694619267076907012248419
GMP-ECM 6.0 [powered by GMP 4.1.90] [ECM]

Using B1=260000000, B2=2317234062121, polynomial Dickson(30), sigma=2953010074
Step 1 took 2347474ms
Step 2 took 806954ms

1804.125 Opteron x10cpu, with Torbjorn's x86_64 gmp development
code using AMD 64-bit assembler

rogue 2005-06-13 16:04

[QUOTE=bdodson]Anyone know how much of a t45 has been run/
reported here on the other two, P969 and P993?[/QUOTE]

Hi Bruce. I believe that what you are looking for can be found here:

[url]http://www.mersenne.org/ecmp.htm[/url]

dsouza123 2005-06-13 20:38

2^993+1 with Prime95 __2 ECM curves, B1=11000000, B2=1100000000

2^993+1 with Prime95 250 ECM curves, B1=44000000, B2=4290000000

bdodson 2005-06-14 03:57

on to the 2LM's?
 
[QUOTE=rogue]Hi Bruce. I believe that what you are looking for can be found here:

[url]http://www.mersenne.org/ecmp.htm[/url][/QUOTE]

Ah, you'd think that I would have looked there first; it's just that there
didn't seem to be any recent activity on this thread. But now that
I have the current counts, the three remaining Pn's on Bob's list
have already been over-run. Montgomery reports that these
have snfs difficulty 194, 199 and 204 (with the one that factored
at 190; figures that the hard ones would be left), which is likely
the reason Bob's chosen them. The effort spent on ecm pretesting
is way past the optimal percentage of the sieving time, so perhaps
Bob agrees that he's willing to risk sieving without further ecm, freeing
up the curves for something more likely to have a factor in ecm range?

If so, do we know where testing on the other three,

> 2,1322M, 2,1334L, 2,1342L

stands? Incidentally, an update on this morning's post: Sean Irvine
has added the new c135 to his gnfs queue, so we may reliably expect
factors in the near future.
Bruce

rogue 2005-06-14 12:47

[QUOTE=bdodson]
If so, do we know where testing on the other three,

> 2,1322M, 2,1334L, 2,1342L

stands?[/QUOTE]

Both have enough curves at B1=11e6. 1334L has 40 curves at B1=43e6 and 630 at 11e7.

2,1342L had this factor submitted recently 312689963186011191785543941405234534125118895633. I believe that the cofactor 312119007097134742328978209700769058421902458121821582805367454109670158835039663246756558186935608062761438029523087241093 is prime.

Mystwalker 2005-06-14 13:26

According to Dario Alpern's applet, it is.

R.D. Silverman 2005-06-14 13:54

[QUOTE=bdodson]Ah, you'd think that I would have looked there first; just there
didn't seem to be any recent activity on this thread. But now that
I have the current counts, the three remaining Pn's on Bob's list
have already been over-run. Montgomery reports that these
have snfs difficulty 194, 199 and 204 (with the one that factored
at 190; figures that the hard ones would be left), which is likely
the reason Bob's chosen them. The effort spent on ecm pretesting
is way past the optimal percentage of the sieving time, so perhaps
Bob agrees that he's willing to risk sieving without further ecm, freeing
up the curves for something more likely to have a factor in ecm range?

If so, do we know where testing on the other three,

> 2,1322M, 2,1334L, 2,1342L

stands? Incidentally, an update on this morning's post: Sean Irvine
has added the new c135 to his gnfs queue, so we may reliably expect
factors in the near future.
Bruce[/QUOTE]


Hi,

I agree that we have run enough trials on the 2+ numbers. I am about
70% done with 2,749+. 2,791+ is queued. I will then do 2,969+ and
2,993+ before doing the LM's.

I only have 3 PC's..... (and one is part time)

The recent (and excellent!) work by Bruce still shows that there are
p45 to p50 factors remaining in the base 2 tables (he just found 3 more,
finishing 2,1091- and 2,943+ and finding a p49 of 2,1157+) :banana:

I wish I had his resources......

bdodson 2005-06-15 07:21

two 2LM first holes
 
[QUOTE=rogue]Both have enough curves at B1=11e6. 1334L has 40 curves at B1=43e6 and 630 at 11e7.

2,1342L had this factor submitted recently 312689963186011191785543941405234534125118895633. I believe that the cofactor 312119007097134742328978209700769058421902458121821582805367454109670158835039663246756558186935608062761438029523087241093 is prime.[/QUOTE]

Ah, these are the 2LM's with smallest exponent; 2,1322M C175 and
2,1334M C186. And those curve counts sound rather familiar, since
they're my curves? I was wondering about curves run by someone else,
perhaps in response to this thread. Also, the C175 "curves at B1=11e6"
were 2500 curves at B1=43e6, for c. 1.75-times a test to p45. By
contast, the 40 curves at B1=43M and 630 at 11e7 was my equivalent of
"curves to B1=11e6," rather than an addition to a previous test to p45.
Or do you know of someone else (possibly a group of someones) having
run a complete test to p45 on either of these numbers? The previous
2+ counts should be OK, since neither you nor Bob mentioned translating
my curve counts and reporting them to George?

I was happy to share curve counts for other people running curves to
know which limits to set, but ambiguities leading to double counts aren't
good --- in particular, for deciding whether numbers are ready for sieving.

Bruce

rogue 2005-06-15 12:44

[QUOTE=bdodson]2,1334M C186. And those curve counts sound rather familiar, since
they're my curves? I was wondering about curves run by someone else,
perhaps in response to this thread. Also, the C175 "curves at B1=11e6"
were 2500 curves at B1=43e6, for c. 1.75-times a test to p45. By
contrast, the 40 curves at B1=43M and 630 at 11e7 were my equivalent of
"curves to B1=11e6," rather than an addition to a previous test to p45.
Or do you know of someone else (possibly a group of someones) having
run a complete test to p45 on either of these numbers? The previous
2+ counts should be OK, since neither you nor Bob mentioned translating
my curve counts and reporting them to George?[/QUOTE]

Yes, those are your curve counts. You indicated to me (via PM) that all composites under 176 digits had enough curves at 11e6. Did I misinterpret your statement? I don't have the PM in front of me, so I don't recall your exact wording.

Paul Zimmermann's site doesn't indicate that additional curves were run. I'm unaware of anyone else running curves for those numbers. If they are, then they haven't reported them in the ECM Status forum. I have not sent any curve counts to George. When garo returns from vacation, we will try to get the curve counts (for all composites) straightened out and get the info to George and Paul.

geoff 2005-06-15 16:25

[QUOTE=bdodson]Ah, these are the 2LM's with smallest exponent; 2,1322M C175 and 2,1334M C186. And those curve counts sound rather familiar, since they're my curves? I was wondering about curves run by someone else, perhaps in response to this thread.[/QUOTE]

On 2,1334L I have done 1763 curves at B1=11e6 and 202 curves at B1=43e6 that don't show in the c120-355 curve counts yet.

edit: Also, on 2,1342M (now third hole) I have done 600 curves at B1=43e6.

R.D. Silverman 2005-06-15 16:34

[QUOTE=bdodson]Ah, these are the 2LM's with smallest exponent; 2,1322M C175 and
2,1334M C186. And those curve counts sound rather familiar, since
they're my curves? I was wondering about curves run by someone else,
perhaps in response to this thread. Also, the C175 "curves at B1=11e6"
were 2500 curves at B1=43e6, for c. 1.75-times a test to p45. By
contrast, the 40 curves at B1=43M and 630 at 11e7 were my equivalent of
"curves to B1=11e6," rather than an addition to a previous test to p45.
Or do you know of someone else (possibly a group of someones) having
run a complete test to p45 on either of these numbers? The previous
2+ counts should be OK, since neither you nor Bob mentioned translating
my curve counts and reporting them to George?

I was happy to share curve counts for other people running curves to
know which limits to set, but ambiguities leading to double counts aren't
good --- in particular, for deciding whether numbers are ready for sieving.

Bruce[/QUOTE]


I did translate the curve counts (from your email) and sent them to George.
He has already added them into his tables.

Prime95 2005-06-15 16:44

[QUOTE=R.D. Silverman]I did translate the curve counts (from your email) and sent them to George.
He has already added them into his tables.[/QUOTE]

Based on that email I added 6000 curves at 44 million to the tables (and marked 11,000,000 done). So any curves at 44,000,000 in excess of 6,000 was done by other users.

bdodson 2005-06-16 17:38

counting confusion(s)
 
[QUOTE=rogue]Yes, those are your curve counts. You indicated to me (via PM) that all composites under 176 digits had enough curves at 11e6. Did I misinterpret your statement? I don't have the PM in front of me, so I don't recall your exact wording.

Paul Zimmermann's site doesn't indicate that additional curves were run. I'm unaware of anyone else running curves for those numbers. If they are, then they haven't reported them in the ECM Status forum. I have not sent any curve counts to George. When garo returns from vacation, we will try to get the curve counts (for all composites) straightened out and get the info to George and Paul.

There is an ambiguity about reporting 11e6 as being done, based on
translations of curves run with larger limits. For the c192, my email reported
the 40 curves with B1=43M and c. 600 curves with B1=110M (all ecm6
curves). My intention would have been to have 11e6 marked as "no
additional curves needed" --- "not needed", for short. But when I
read "DONE" in the 11e6 column of ecmp.htm, I took that to mean that
someone actually ran 10,600 curves with b1=11M (or some translation
of that, to curves with B2=100*B1), in addition to the curves I ran.

George's post (if I'm now reading correctly), says P969 had (an equivalent
of) 10753 - 6000 = 4753 B1=44M total curves (including, perhaps, some
combination of 11M curves and 44M curves), with the "Done" in the
11e6 column to be ignored. I suppose you and garo had better follow
that format as well! Likewise, P993 has 8155 - 6000 = 2155. Since
CWI/Montgomery are sieving numbers of this difficulty after c. 25% of
a test for p50's, which is c. 5000 b1=44M with b2=100*b1 curves, the
ecm pretesting conclusion on these two seems to be OK. (For a better
margin, 33% would have been c. 6500, and we're also past that.)
[Since Bob has mis-reported my counts on these numbers (see below),
P969 is OK, but P993 may still be a bit short.]

This reduces the unknowns here to how Bob arrived at 6000 B1=44M
curves (and a few minor issues such as why he neglected to wait for
my reply when he said "I'll translate these counts and send them to
George, OK?"). For readers aside from Mark and Bob, the 6000's added
at various locations in the Mn and Pn tables are supposed to mean, first,
that the remaining cofactor is between c136 and c195. The actual curves
and limits depend upon 3 subdivisions (five if one wishes to be picky
and/or precise), c136-c145, c146-c175 and c176-c195; but the executive
summary is that all of these had enough curves run to mark 11e6 as
"Done". If that means 10600*11e6, then a factor of (11e6/44e6) applied
to 10600 would suggest 2650*44e6. Ooops. So Bob OUGHT to be
reporting 6000 curves on just one of the five divisions, c146-c155. For
c136-c145 and c176-c195, "Done" in 11e6 and 2650 in 44e6 is the
correct use of George's system, and for c156-c175 "Done and 4690".
(Bob's preferred factor of 2 for translation double counts the B2 difference
between prime95 and ecm6, since my report to him already included a factor
for that difference.) That's for base-2. Since the tables for larger
bases are using ecm6 curves, seems like keeping the original curves
(except for ecm5-to-ecm6 translation) is better.

About Paul's counts from the ecmnet page, only the c136-c145's have
been reported, since I was still working on c146-c175 and c176-c196
until recently.

Bruce (if you'd like to be really misled, ask Bob about how well
block Lanczos on snfs/gnfs matrices parallelizes! See my colleague
Xilman's recent post for references. Or perhaps Bob's revised his
views since the RSA200 matrix? Sorry, this is off-thread.)

bdodson 2005-06-16 21:35

edit?
 
[QUOTE=bdodson]There is an ambiguity about reporting 11e6 as being done, based on
translations of curves run with larger limits. For the c192, my email reported
the 40 curves with B1=43M and c. 600 curves with B1=110M (all ecm6
curves). My intention would have been to have 11e6 marked as "no
additional curves needed" --- "not needed", for short. But when I
read "DONE" in the 11e6 column of ecmp.htm, I took that to mean that
someone actually ran 10,600 curves with b1=11M (or some translation
of that, to curves with B2=100*B1), in addition to the curves I ran.

George's post (if I'm now reading correctly), says P969 had (an equivalent
of) 10753 - 6000 = 4753 B1=44M total curves (including, perhaps, some
combination of 11M curves and 44M curves), with the "Done" in the
11e6 column to be ignored. I suppose you and garo had better follow
that format as well! Likewise, P993 has 8155 - 6000 = 2155. Since
CWI/Montgomery are sieving numbers of this difficulty after c. 25% of
a test for p50's, which is c. 5000 b1=44M with b2=100*b1 curves, the
ecm pretesting conclusion on these two seems to be OK. (For a better
margin, 33% would have been c. 6500, and we're also past that.)
[[This assumes that P969 and P993 were among the numbers Bob asked
George to add 6000 to, on my behalf. But they shouldn't have been.
Corrections could be made using other assumptions, but the point at
the moment is that I'm still not able to determine what the correct count
should be, or how my curves were recorded. ]]

This reduces the unknowns here to how Bob arrived at 6000 B1=44M
curves (and a few minor issues such as why he neglected to wait for
my reply when he said "I'll translate these counts and send them to
George, OK?"). For readers aside from Mark and Bob, the 6000's added
at various locations in the Mn and Pn tables are supposed to mean, first,
that the remaining cofactor is between c136 and c195. The actual curves
and limits depend upon 3 subdivisions (five if one wishes to be picky
and/or precise), c136-c145, c146-c175 and c176-c195; but the executive
summary is that all of these had enough curves run to mark 11e6 as
"Done". If that means 10600*11e6, then a factor of (11e6/44e6) applied
to 10600 would suggest 2650*44e6. Ooops. So Bob OUGHT to be
reporting 6000 curves on just one of the five divisions, c146-c155. For
c136-c145 and c176-c195, "Done" in 11e6 and 2650 in 44e6 is the
correct use of George's system, and for c156-c175 "Done and 4690".
(Bob's preferred factor of 2 for translation double counts the B2 difference
between prime95 and ecm6, since my report to him already included a factor
for that difference.) That's for base-2. Since the tables for larger
bases are using ecm6 curves, seems like keeping the original curves
(except for ecm5-to-ecm6 translation) is better.

About Paul's counts from the ecmnet page, only the c136-c145's have
been reported, since I was still working on c146-c175 and c176-c196
until recently.

Bruce (if you'd like to be really misled, ask Bob about how well
block Lanczos on snfs/gnfs matrices parallelizes! See my colleague
Xilman's recent post for references. Or perhaps Bob's revised his
views since the RSA200 matrix? Sorry, this is off-thread.)[/QUOTE]

The second paragraph in my original post ought to end with the above
comment in brackets, [[...]], rather than what's in the brackets [...]
there.

Bruce

R.D. Silverman 2005-06-17 11:36

[QUOTE=bdodson]


<snip>


This reduces the unknowns here to how Bob arrived at 6000 B1=44M
curves (and a few minor issues such as why he neglected to wait for
my reply when he said "I'll translate these counts and send them to
George, OK?").

Bruce (if you'd like to be really misled, ask Bob about how well
block Lanczos on snfs/gnfs matrices parallelizes! See my colleague
Xilman's recent post for references. Or perhaps Bob's revised his
views since the RSA200 matrix? Sorry, this is off-thread.)[/QUOTE]


You reported 3000 ecm6 curves in your email. It has been agreed
that 1 ecm5/ecm6 curve ~ 2 prime95 curves. That's where the
6000 comes from.

As for your implications that I mislead people about parallel block Lanczos,

(1) Both CWI and NFSNET reported low (less than 50%) per processor
utiliization and CWI has stated in public that communication costs dominate
the computation. Peter Montgomery has presented graphs showing how
the per-processor utiliization decreases (somewhat dramatically) as the
number of processors increase.

My public comments have simply stated that B-L does not parallelize well.
I stick by that comment and it is backed by data presented by others.

(2) And it is quite unprofessional of you to accuse me of misleading others in
the manner you did.

bdodson 2005-06-17 15:03

four of ?? numbers agreed; rest inaccurate/wrong
 
[QUOTE=Prime95]Based on that email I added 6000 curves at 44 million to the tables (and marked 11,000,000 done). So any curves at 44,000,000 in excess of 6,000 was done by other users.[/QUOTE]

Since Bob doesn't appear to have read my email, his report to you (submitted
without my agreement or approval) doesn't seem to me to be a positive
contribution to the counts reported on your pages ecmm.htm and ecmp.htm.
There are no numbers on which I ran 3000 ecm6 curves, and the 3000 ecm5
curves with B1=43M that I did run apply only to Cunningham composites
between 145- and 155-digits. There were, at that time, eight base-2
numbers, 2^n +/- 1 for n < 1200 in this range. Four have subsequently been
factored. So adding 6000 curves should at most have been applied to M1173
and P760, P820 and P1044.

If you were to email me any other numbers in Bob's email, I'd be happy to
supply a round number using Bob's factor of 2 that should replace the bogus
6000 value (if there were any numbers other than the above four in the
email you received). As I pointed out to Bob in the email he didn't wait
for (saying among other things that I'd be happier reporting my counts
myself), some care needs to be taken to be sure that a number in a given
range was actually in one of my input files. Of numbers that were run,
5000 would fit numbers from 156- to 175-digits, which is perhaps not so
serious a correction. Of numbers with 136- to 144-digits that I ran, Sean
Irvine's factorization of M949 finished the last number from the ecmm and
ecmp ranges (although there are some 20 numbers for other bases that
remain from the list of numbers run, not yet reserved by anyone). All of
these curves were run on a cluster of 1.1 GHz Pentium 3s.

The more recent Opteron curve counts are well below 6000 curves with
B1=44M, perhaps closer to 2650 curves, rounded to 2500, but with 11e6
(t45) marked as Done. Most numbers from 176- to 195-digits were run this
far (but not farther yet), the only exceptions being other people's recent
new composite cofactors. My recent base-2 factorizations were found by
applying just 165 ecm6 curves with B1=110M, which has finished on all of
c196-c384. I'm still working in this c196- range, so including counts seems
premature, but 3e6 (t40) is Done on the complete Cunningham list.

Bruce

ewmayer 2005-06-17 17:09

[QUOTE=R.D. Silverman]Both CWI and NFSNET reported low (less than 50%) per processor utiliization and CWI has stated in public that communication costs dominate the computation. Peter Montgomery has presented graphs showing how the per-processor utilization decreases (somewhat dramatically) as the number of processors increase.[/QUOTE]

Bob, I'd be very interested in seeing some version (even a simple tabular one) of these data. Did Peter try different processor types, or were all the runs on the same kind of hardware? (I seem to recall CWI using a big SGI-based system for their most memory-intensive work, but perhaps they have other multi-CPU systems as well).

R.D. Silverman 2005-06-17 17:39

[QUOTE=ewmayer]Bob, I'd be very interested in seeing some version (even a simple tabular one) of these data. Did Peter try different processor types, or were all the runs on the same kind of hardware? (I seem to recall CWI using a big SGI-based system for their most memory-intensive work, but perhaps they have other multi-CPU systems as well).[/QUOTE]

Peter presented his information at the annual RSA crypto conference
back in 2000. I used to have a soft copy of his slides, but it got left
on my PC when I was laid off at RSA.

I do have a hard copy of his slides. His hardware was 16 dual CPU pentium II's
on a 100Mb/sec Ethernet. (and a second set of numbers on a 1.28Gb/sec
Myrinet)

RSA should still have a copy plus
perhaps a cassette tape recording of his talk.

With 1 processor the time per iteration was 7 sec. With 8 processors,
it dropped to 4 seconds on Ethernet and 2 sec on Myrinet.

-- Note that even on the faster network an 8-fold increase in processors
only reduced the run time about 3.5-fold.


With another 4 processors (12 total) it dropped to 3+ sec on Ethernet and
~1.6 sec on Myrinet. With another 4, it dropped to 3 sec on Ethernet
and 1.5 sec on Myrinet. This CLEARLY shows a sharp reduction in
per-processor utilization as the number of processors increases.
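Restating those timings as a table (the 1-processor baseline applies to both networks):

procs    Ethernet    Myrinet
1        7 sec       7 sec
8        4 sec       2 sec
12       3+ sec      ~1.6 sec
16       3 sec       1.5 sec

On Myrinet, 16 processors thus give a 7/1.5, i.e. about 4.7-fold, speedup: under 30% per-processor utilization.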

If you send me your snail mail address, I will Xerox my hard copy and send it to you.


I have stated in public that "parallel efforts have met with some, but not
large success".

Paul Leyland reported to me that the cluster he used (~30 procs on a gigabit
network) was getting only about 50% per processor utilization. You can
get exact numbers from Paul.

What justifies Bruce's accusation that I have misrepresented the situation?
His off topic remark was totally unjustified.

ewmayer 2005-06-17 20:17

Thanks for the numbers, Bob - no need to send me the gory details, I was interested in roughly how fast the per-CPU utilization dropped off with increasing #CPUs and your data answer that question in as much detail as I desire. That's interesting about the 50% utilization point (a rather arbitrary but nonetheless useful threshold for deciding whether throwing more CPUs at a problem is likely to be worthwhile) occurring with as few as 8 CPUs even on some systems with GB/sec networking - I had always been under the impression that large-scale linear algebra of most kinds (e.g. Gaussian elimination, QR for finding of eigenvalues, and such) could be parallelized extremely effectively - this is clearly a major counterexample.

Note that my question was strictly related to the block-Lanczos parallelization problem (first I'd heard of this) and unrelated to Bruce's subjective remarks, although I agree the latter types of issues are the kind that should be broached in private - last thing we need is another public flame war.

jasonp 2005-06-18 18:55

[QUOTE=ewmayer]I had always been under the impression that large-scale linear algebra of most kinds (e.g. Gaussian elimination, QR for finding of eigenvalues, and such) could be parallelized extremely effectively - this is clearly a major counterexample.[/QUOTE]
The enormous performance numbers are typically for dense systems where there's much more computation than communication. Linear systems derived from factorization algorithms are always sparse nowadays, and the most stressful computations they do are matrix-vector products whose computation-to-communication ratio is even lower.

Sparse linear system solvers for floating-point problems reorder the rows and columns of the matrix to cluster blocks of nonzero elements together, then switch to dense matrix routines as much as possible when actually performing Gaussian elimination. The algorithms they use look to me to be more advanced than the typical NFS filtering phase.

So I wonder: NFS filtering tries to produce a matrix with as few nonzero elements as possible. Is it feasible to modify the filtering phase not to optimize the matrix weight but the position of the weights? I.e. the matrix produced is more dense than usual but also more clustered, so that the matrix multiplications that block lanczos uses spend more time in cache. If things can be packed tightly enough, perhaps an implementation can divide the matrix into sparse blocks and dense blocks stored as bit-vectors, to lower the real storage used.

Or perhaps this is already done, and I just didn't know it.
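To make the memory-traffic pattern concrete, here is a minimal sketch in C of the core operation under discussion: a sparse GF(2) matrix times a block of 64 vectors, packed one 64-bit word per column. The names and the CSR-style layout are illustrative only, not taken from any actual NFS code:

#include <stdint.h>
#include <stddef.h>

/* y = A*x over GF(2): x and y each hold a block of 64 vectors,
   packed as one uint64_t per matrix column (x) or row (y).
   A is stored CSR-style: row_start has nrows+1 entries, and
   col_idx lists the column of every nonzero entry. */
void sparse_block_matvec(size_t nrows, const size_t *row_start,
                         const uint32_t *col_idx,
                         const uint64_t *x, uint64_t *y)
{
    for (size_t i = 0; i < nrows; i++) {
        uint64_t acc = 0;
        /* Over GF(2), multiply-and-add collapses to XORing the
           packed words of the columns where row i has a 1 bit. */
        for (size_t k = row_start[i]; k < row_start[i + 1]; k++)
            acc ^= x[col_idx[k]]; /* irregular gather; this is the
                                     cache- and network-unfriendly step */
        y[i] = acc;
    }
}

Clustering the nonzeros as suggested above would make successive col_idx values fall near each other, so these gathers stay in cache; distributing the matrix across machines turns them into the inter-processor traffic discussed elsewhere in this thread.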

jasonp

bdodson 2005-06-19 04:04

more data (off topic)
 
[QUOTE=R.D. Silverman]Peter presented his information at the annual RSA crypto conference
back in 2000. I used to have a soft copy of his slides, but it got left
on my PC when I was laid off at RSA.

I do have a hard copy of his slides. His hardware was 16 dual CPU pentium II's
on a 100Mb/sec Ethernet. (and a second set of numbers on a 1.28Gb/sec
Myrinet)

... This CLEARLY shows a sharp reduction in
per-processor utilization as the number of processors increases.

If you send me your snail mail address, I will Xerox my hard copy and send it to you.

I have stated in public that "parallel efforts have met with some, but not
large success".

Paul Leyland reported to me that the cluster he used (~30 procs on a gigabit
network) was getting only about 50% per processor utilization. You can
get exact numbers from Paul.

What justifies Bruce's accusation that I have misrepresented the situation?
His off topic remark was totally unjustified.[/QUOTE]

I'll admit to being inappropriately annoyed at Bob's enthusiasm for sharing
his data and view on one of the computational issues related to the security
of RSA, the matrix step in the number field sieve. In fact, when Bob sent
me a factoring paper last year, I checked to see whether he'd revised his
views since the last I heard, and when I found him repeating the above,
found myself unable to take anything else in the paper seriously.

I don't dispute Bob's quote of CWI to the effect that communication
dominates the computation. But Meyer's report of recalling CWI using
an sgi system is closer to my point. I don't need lost files from RSA Data,
we have an early post from Peter to the Mersenne list, Sept 7,2001 about
the factorization of M727. I'd try digging it out, but for the fact that I
keep a copy on my own web page, and here's what Peter's post says:

> SARA, a Dutch High-Performance Computing Center in Amsterdam,
> had been down August 13-20, to enlarge its network (teras)
> of shared memory 500 MHz MIPS R14000 processors.
> Peter Montgomery had written a parallel MPI-based version of
> his Block Lanczos linear algebra code earlier in 1999-2001.
> Happily, this code functioned well on the new hardware configuration,
> ....
> The largest matrix run on teras, for 2^727 - 1, was 3847967 x 3862821.
> This took 99153 real-time seconds (about 27.5 hours) using 25 processors.
> The estimated single-processor time is 1200000 seconds (330 hours)
> for that same matrix on the same hardware, so the job got almost 50%
> utilization of the 25 processors.

This was before everyone was running the last Mn without factor, M1061,
and this M727 was on George's list. Arjen Lenstra and I spent 7 months
sieving it before sending the matrix to CWI, so I have a rather good
recollection of the processor utilization. This was before I factored one of
the others, with ecm/prime95, M997 = c301 = p57*prime_cof, before Franke
took on running M809 by snfs (one of the others), and before M971 factored
by ecm/prime95, leaving just the last M1061.

Likewise, I don't dispute Bob's conversation with Leyland, but I did wonder,
when I saw a private conversation referenced in Bob's paper, whether he
asked Paul about having the above stand as his summary of the NFSNET
experience running large matrices on the MS cluster? I've worked with
Paul on RSA130, RSA140, RSA155=RSA512 (and you won't find Bob listed
as a co-author with us on the papers), and every time I spoke with Paul
about running large matrices his view was that he'd be able to run anything
that came along, and in the unlikely event of one being too large to run on
his cluster he'd get access to a larger one (that would be, one with more
processors, if anyone wonders).

Our most recent conversation on this topic was at the CWI/Utrecht
workshop, where we heard the state-of-the-art described by Franke.
At that time, the three largest snfs/gnfs matrices were the M809 matrix
(referred to above), which was done on CWI's sgi system and the larger,
harder matrices needed for RSA160 and RSA576, done at Bonn. What's
happened since? Both Franke records were broken using yet another block
Lanczos at NTT, by Aoki, first a 248-digit snfs using 16 pentium4's for the
matrix, then c176 gnfs using 36 pentium4's. Of course, the latter was just
before Franke reclaimed the gnfs, breaking RSA200 using Block Wiedemann
on a cluster of 80 Opterons --- he skipped RSA640, just as he'd said he
would in conversation at Utrecht.

So we have Bob's CWI data before their sgi use, and his use of Paul's
comment that I'd bet came early in NFSNET, before their best computations;
and then five years of steady improvement in parallel matrix computation.
Meanwhile, Bob's still quoting the data from five years ago, as if they had
been the final word on the subject. Now that's what I'd call unprofessional.
I'll apologize to the forum for baiting Bob by characterizing his view as
being misleading.

Bruce (So Bob, would you mind sending me a copy of the version of
my ecm curve counts that you sent to George? Email would be
best, but I'll go with whatever you prefer.)

R.D. Silverman 2005-06-19 18:42

[QUOTE=bdodson]

"for that same matrix on the same hardware, so the job got almost 50%
> utilization of the 25 processors."



[/QUOTE]


'Almost 50%'. This is still a sharp under-utilization of the processors.

I do not dispute that people have done larger matrices. However, I have
not seen any of the various people involved publish data on per-processor
utilization (of their various implementations) as the number of processors
changes.

Since Peter presented his data, computers have gotten faster, as have
networks. But I have seen no new graphs showing speed-up as a function of
the number of processors.

If someone presents data that shows *other* than almost hyperbolic
drop-off in processor utilization as the number of processors increases I
will change my view.

We can speed sieving by as much as we want. Of course we would reach
a limit as the number of processors approached the number of sieve lines,
but speedup remains linear even up to using 10's of thousands of processors.

However, for *loosely coupled* networks of workstations, the inter-
processor latencies cause the computation-to-communication time ratios
to be low. Until this improves, network implementations of Block Lanczos
and Block Wiedemann simply *will not scale*.

And for the record, I have considerable experience in implementing
linear algebra routines on parallel machines (e.g. old Intel hypercube,
BBN Butterfly, Symult S2010, CM2, CM5, Intel Paragon). The big
problem is not bandwidth, but LATENCY. And socket latching mechanisms
for network implementations put a lower bound on latency. The mechanisms
by which operating systems send out data through sockets is too slow.
I'd like to see data using a modern, tightly coupled (dedicated fast network
without sockets) parallel computer. One with latencies in the 10 microsecond,
rather than millisecond range.

No one denies that we have done progressively larger matrices. But we
also have bigger and faster machines and networks.

I still want to see data that shows that network parallel BL and BW can
*scale*, before I will change my position. When someone shows (say)
greater than 50% utilization on a *large* network (say 256 machines) I
will also change my position.

As for your not taking my paper seriously because of my views on this
subject, I find that rather silly, since my paper does not deal with this
subject. You say in essence "I don't agree with Bob regarding his position
on the linear algebra, so I won't take his mathematics on optimization seriously".

xilman 2005-06-20 08:41

[QUOTE=R.D. Silverman]Peter presented his information at the annual RSA crypto conference
back in 2000. I used to have a soft copy of his slides, but it got left
on my PC when I was laid off at RSA.
...

I have stated in public that "parallel efforts have met with some, but not
large success".

Paul Leyland reported to me that the cluster he used (~30 procs on a gigabit
network) was getting only about 50% per processor utilization. You can
get exact numbers from Paul.[/QUOTE]
You have higher standards than many people working in the field.

I don't think anyone believes that linear algebra on sparse matrices is anywhere near as trivially parallelizable as sieving. Nonetheless, clusters have greatly increased the size of matrices which can be processed on reasonable hardware.

There are two main reasons, I believe. The first is that for a given number of machine cycles, it is much cheaper to have a dozen or few commodity machines than a single supercomputer with the same performance. The other, which historically has been much the same as the first, though the situation is changing, is that commodity machines are generally limited by their physical address space to a couple of gigabytes of RAM or so, and holding large matrices can require much more storage than that. Distributing the matrix over a number of 2G machines gets around the problem to some extent.

You may be able to get exact numbers from me, in principle, but I don't have them to hand. Experiments performed by a number of people, including Peter Montgomery and myself, suggest that the effective processing power scales roughly as the square root of the number of cpus, at least for a small number (<100 perhaps) of machines connected by a network. The situation may be different on shared memory machines but we haven't had access to such to be able to carry out meaningful experiments. I have noticed that a dual proc shared memory machine is noticeably faster than sqrt(2) times the speed of the same box using only one cpu.
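(For scale: under that square-root rule, 25 cpus would deliver about sqrt(25) = 5 times single-cpu throughput, i.e. 20% per-processor utilization; the nearly 50% reported for the 25-processor M727 job quoted earlier did somewhat better.)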


Paul

bdodson 2005-06-20 16:26

proc. use over varying machines
 
[QUOTE=R.D. Silverman]'Almost 50%'. This is still a sharp under-utilization of the processors.

I do not dispute that people have done larger matrices. However, I have
not seen any of the various people involved publish data on per-processor
utilization (of their various implementations) as the number of processors
changes.

Since Peter presented his data, computers have gotten faster, as have
networks. But I have seen no new graphs showing speed-up as a function of
the number of processors.

If someone presents data that shows *other* than almost hyperbolic
drop-off in processor utilization as the number of processors increases I
will change my view.
...

And for the record, I have considerable experience in implementing
linear algebra routines on parallel machines (e.g. old Intel hypercube,
BBN Butterfly, Symult S2010, CM2, CM5, Intel Paragon). The big
problem is not bandwidth, but LATENCY. And socket latching mechanisms
for network implementations put a lower bound on latency. The mechanisms
by which operating systems send out data through sockets is too slow.
I'd like to see data using a modern, tightly coupled (dedicated fast network
without sockets) parallel computer. One with latencies in the 10 microsecond,
rather than millisecond range.

No one denies that we have done progressively larger matrices. But we
also have bigger and faster machines and networks.

I still want to see data that shows that network parallel BL and BW can
*scale*, before I will change my position. When someone shows (say)
greater than 50% utilization on a *large* network (say 256 machines) I
will also change my position.

As for your not taking my paper seriously because of my views on this
subject, I find that rather silly, since my paper does not deal with this
subject. You say in essence "I don't agree with Bob regarding his position
on the linear algebra, so I won't take his mathematics on optimization seriously".[/QUOTE]

OK, thanks for a coherent description of your position on this topic. (By
comparison to what I've seen on other threads, this hardly counts as
being off-topic.) I'm currently using an expensive loosely coupled cluster
(two actually, if you count the one with dual P3's on the nodes) to run
ECM, which completely wastes the most expensive parts, the communication
interconnects between nodes. That's 0.0% use of the data connects.
Even 25% processor use, with hyperbolic decay, would still be better use
of the parallel processing resource. (The reasons I'm able to use the time
for this purpose are that I run at absolutely bottom priority, and that the Opteron
cluster PBS scheduling software isn't up yet -- the cluster hasn't quite
opened to the public for truly parallel jobs.)

About the CWI machine on which almost all of their parallel jobs were
run (of course an SMP machine, with a cputime budget that doesn't
encourage pursuit of topics only of academic interest), the utilization you're
objecting to ("only 50%") is again at the very start of their use of the
machine for parallel jobs --- we're still talking 2001. The largest matrix run
there that I know of was still on the same machine, but if I recall correctly,
was already done at the Lenstra retirement (H.W.) from Berkeley, March 2003
[before Utrecht]; namely the matrix for M809. The part of the sieving done
by Franke et al. used larger large-primes than the CWI siever (and both
RSA160 and RSA576 used larger bounds yet), so the matrix was a virtual
monster, 10M^2 rows-times-cols. This is usually referred to as the
244-digit snfs record, the one Aoki's 248-digit snfs broke. So there's our
tiny little M727 matrix, 3.8M^2, that used 25 cpus in Sept 2001. Then
something like a year-and-a-half later, this monster matrix on the same
machine (with 512 cpus, IIRC). I don't know how many cpus were used, or
what the processor utilization was; and I gather no one else here does
either. But it's the performance of that computation we ought to be
discussing, anything earlier is outdated (by orders of magnitude); and
unless Peter's inclined to put checkmarks on Bob's graph, none of us
actually knows how scaling went (perhaps it's proprietary, even), and in
the absence of the data we ought to be clear that we don't know. Bob
is entitled to assert that "based on early data, on loosely coupled hardware,
the CWI parallel Lanczos didn't scale well"; but we shouldn't be led by
that (I'm being descriptive, not pejorative here) to conclude anything about
the SMP version (which is the one that matters, historically: early
experiments -vs- production runs) a year-and-a-half after the very first
substantial run blows away anything we'd seen before (25 cpus!!!, after,
for years, experts assuring that the matrix step was inherently un-
parallelizable, as a nontrivial component of the security of RSA). It was
Peter's program, but my M727 matrix (towards George's problem, now reduced
to just M1061); and I'd appreciate Bob adding the M727-matrix and the
M809-matrix to his description of the data (in the wish-we-had-data
column for M809).

That's two points: (1) people do far worse things, utilization-wise, than
pushing parallel programs out past their sweet-spot (0% vs 25% use);
and (2) the CWI/sgi computation Mayer ought to be thinking about visibly
is in a completely different range than the data supporting poor cpu
utilization that we're being referred to.

But the factoring records I was recalling above have a more salient point,
namely that subsequent records are rarely set on the same soft/hardware
(I mean "RSA130 to RSA155", with RSA140 being a warm-up for RSA155=
RSA512). We had snfs248 an entirely different matrix than the snfs244
matrix (on NTT's pc-cluster vs CWI's sgi/smp); and RSA576 -to- c176
-to- RSA200 for gnfs (heavy pc at Bonn, lighter pc at NTT, Opteron at
Bonn). Look at the variety of hardware on which parallel Lanczos/W. has
been acceptably run (in the view of the person using the software and
the person authorizing the hardware). We also had the 2nd c155 on a block
of 4 alphas. So that's CWI/sgi, the alphas, MSR, Bonn/Intel-pc, NTT/pc
and Bonn/Opteron. (Admittedly, the alpha version was never scaled-up,
but if we're using experience instead of data, my bet's on Torbjorn.)
That's a very wide collection, and the variety itself says something about
how well the parallelization works. My point here is that it's not how adding
cpus on a fixed machine works that matters, but how the new version runs
on the next new machine. The one we use for snfs1024 and the one for
RSA768 and for RSA1024. Does anyone believe those three matrices will
be done on the same machine?

Now for the record, I'm a Calculus instructor, and the TA's probably better
at explaining the Maple implementations. I do have c. five years of using
the CWI suite of programs (with special attention to the filter, and getting
matrices small enough to run on the old single cpu sgi/Lanczos). Our new
sgi/linux/IA64 that's replacing the old Origin3800/Irix/Mips won't be using
those, so I'll probably be looking for a new siever soon. In the meantime,
I have my attention fairly well occupied feeding input files into Torbjorn's
x86-64 Opteron binary of gmp-ecm6; with keeping track of curve counts
a crucial part of making best use of that resource -- it matters more to
me than it does to you. (A new 2L p44 today, c241 = p44*c197.) I've
also given considerable attention to the changing views of the security
of RSA, especially to the matrix step that we believe to be reduced to a
problem that's minor by comparison to sieving (that's Shamir-Lenstra-
Leyland-Dodson, estimating RSA1024; cf. Bob's view, above). Do results
in optimization depend upon making new judgements based on unexpected
new results (like Peter's crack of our M727-matrix was)? Maybe, maybe
not. To me, insisting that Lanczos doesn't parallelize is something like
insisting that the sky is orange or the earth flat --- I could take agnosticism;
the data needed isn't available, so we don't know how well it scales on
a fixed machine -- but to take the most recent data-point, shifting sieving
effort over to matrix effort was a winning strategy for RSA200, and even
that matrix fell to parallelization. That means those of us building small
matrices (Aoki's for example) have a very long way to go before running up
against parallelization limits in the matrix step of snfs/gnfs. Lanczos (well,
Wiedemann) parallelizes well enough -- at least, when considered over
changing machines, as is what's needed in practice.

Bruce (Thanks to Bob for the curve count email.)

R.D. Silverman 2005-06-24 13:41

[QUOTE=bdodson]OK, thanks for a coherent description of your position on this topic. (By
comparison to what I've seen on other threads, this hardly counts as
being off-topic.)

<snip>


[/QUOTE]

Thanks for your comments.

I would like to add one observation that, while somewhat "qualitative",
supports my view.

If one observes the reported times for factorizations over the last few years
one can see that the *Elapsed* time ratio between the linear algebra and
sieving has increased. Some efforts have reported an LA-elapsed-time to sieving-elapsed-time ratio of over 0.25.

This *suggests* that improvements in parallel LA have not kept pace with improvements in sieving.

We have thrown hundreds of machines at the sieving step. I'd like to see
a successful LA step using a similar number of machines.

