ECM Efforts
Hello to All;
I am thankful for the help people have given in applying ECM efforts to the 2+ table. I am in the process of doing 2,737+ via NFS and will then do 2,749+. Finishing these will take me about 3 months. Before I go further, I ask that some additional ECM effort be expended in particular on the following numbers: (to at least the 45 digit level) 2,791+, 2,951+, 2,969+, 2,993+, 2,1322M, 2,1334L, 2,1342L. Thanks in advance for any help. |
I'd be willing to do some of this work if you could simplify it a bit... As it is now, I would have to refer to the various tables and try to determine how far each one has been done, and in the process hope that the work isn't being duplicated...
Is it possible for you to just ask for x curves for a particular number at a particular depth? I could set up something like that in a matter of minutes and walk away... One reason I have been doing M1061 work is because it is so simple, and I have very little free time to devote to figuring out other work... Well that, and the fact I have no clue what I am doing... Having a dedicated thread for M1061 makes it so easy to coordinate work... If you want to correspond via email about this work or other work you need done feel free to email me at xyzzyATmersenneforumDOTorg or send me a PM... Or maybe setting up this work is a lot simpler than I have been doing it? I'd be willing to learn if there is a better way... |
[quote=Xyzzy]I'd be willing to do some of this work if you could simplify it a bit... As it is now, I would have to refer to the various tables and try to determine how far each one has been done, and in the process hope that the work isn't being duplicated...
Is it possible for you to just ask for x curves for a particular number at a particular depth? I could set up something like that in a matter of minutes and walk away... One reason I have been doing M1061 work is because it is so simple, and I have very little free time to devote to figuring out other work... Well that, and the fact I have no clue what I am doing... Having a dedicated thread for M1061 makes it so easy to coordinate work... If you want to correspond via email about this work or other work you need done feel free to email me at xyzzyATmersenneforumDOTorg or send me a PM... Or maybe setting up this work is a lot simpler than I have been doing it? I'd be willing to learn if there is a better way...[/quote] I must be having a stupidity attack because I don't understand the request to 'simplify'. I don't keep track of exact curve counts; George Woltman has a web page that gives them. They are just a click away. It is trivial to determine how many more are required to finish ECM to a given level. Or have I mis-understood you? |
I believe Xyzzy is asking for step-by-step instructions how to set up the programs so they attack the numbers you requested as he is not familiar with the process.
Xyzzy, I suggest doing the numbers entirely in Prime95 because it's somewhat easier than resuming residues with GMP-ECM. The Aurifeuillians (the ones with L or M) are better done with GMP-ECM. So, to do 1000 curves on 2,791+, you should
1. stop Prime95
2. add the line ECM=791,11000000,0,1000,0,0,1 to the worktodo.ini file
3. make sure you have the lowp.txt file from [url]http://www.mersenne.org/ecm.htm[/url] in your Prime95 directory
4. start Prime95 again
If the ECM= line is first in worktodo.ini, Prime95 should say that it's starting ECM on P791. The other numbers (2,951+, 2,969+, 2,993+) work the same, except you'd replace the 791 in the "ECM=" line accordingly. Alex |
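For anyone queuing several of these assignments at once, here is a minimal Python sketch that writes the corresponding ECM= lines, following the field layout in Alex's example above; per the later posts, the final field appears to select 2^n+1 rather than 2^n-1. The exponent list and curve count are only placeholder examples, and Prime95 should be stopped before worktodo.ini is edited.
[CODE]
# Sketch: append Prime95 ECM assignments for several 2,n+ candidates to
# worktodo.ini, using the ECM= field layout quoted in the post above.
exponents = [791, 951, 969, 993]   # the 2,n+ numbers requested in this thread
b1 = 11000000                      # B1 for the 45-digit level
curves = 1000                      # curves per assignment (example value)

with open("worktodo.ini", "a") as f:
    for n in exponents:
        # trailing ",0,0,1" copied from the example line above; the last field
        # is reportedly 1 for 2^n+1 and 0 for 2^n-1
        f.write("ECM={0},{1},0,{2},0,0,1\n".format(n, b1, curves))
[/CODE]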
[QUOTE=akruppa]I believe Xyzzy is asking for step-by-step instructions how to set up the programs so they attack the numbers you requested as he is not familiar with the process.
Xyzzy, I suggest doing the numbers entirely in Prime95 because it's somewhat easier than resuming residues with GMP-ECM. The Aurifeullians (the ones with L or M) are better done with GMP-ECM. So, to do 1000 curves on 2,791+, you should 1. stop Prime95 2. add the line ECM=791,11000000,0,1000,0,0,1 to the worktodo.ini file 3. make sure you have the lowp.txt file from [url]http://www.mersenne.org/ecm.htm[/url] in your Prime95 directory 4. start Prime95 again If the ECM= line is first in worktodo.ini, Prime95 should say that it's starting ECM on P791. The other numbers (2,951+, 2,969+, 2,993+) work the same, except you'd replace the 791 in the "ECM=" line accordingly. Alex[/QUOTE] Ah! I am not familiar with the process either, as I use just GMP-ECM; I just put the number(s) in ASCII, in a file, and type ecm b1_limit < file > outputfile. I may sometimes modify the step2 limit as well. Of course, this is called repeatedly by a script. |
[QUOTE=R.D. Silverman]Ah! I am not familiar with the process either, as I use just GMP-ECM;
I just put the number(s) in ASCII, in a file, and type ecm b1_limit < file > outputfile. I may sometimes modify the step2 limit as well. Of course, this is called repeatedly by a script.[/QUOTE] There is a -C option to tell gmp-ecm how many curves to run |
...but the -c must be lowercase
Alex |
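For the GMP-ECM route Bob describes, a rough Python driver along the same lines, using the lowercase -c option; it assumes a gmp-ecm binary named ecm is on the PATH, and the file names are placeholders.
[CODE]
# Sketch: run gmp-ecm on every composite listed in numbers.txt, a fixed number
# of curves each, appending the output to ecm.out.  This is a single pass; a
# wrapper loop (or cron job) can call it repeatedly, as described above.
import subprocess

B1 = "11e6"       # step-1 bound for the 45-digit level
CURVES = "100"    # curves per number per pass (lowercase -c, as noted above)

with open("numbers.txt") as nums, open("ecm.out", "a") as out:
    # gmp-ecm signals "factor found" through its exit status, so the return
    # code is deliberately not treated as an error here.
    subprocess.run(["ecm", "-c", CURVES, B1], stdin=nums, stdout=out)
[/CODE]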
I wasn't clear enough...
:smile: I know how to set up the work... I just want someone to tell me what work to do... For example, say we are working on x... I need someone to simply say: "I need someone to run 1000 curves on x at y digits depth and report back here!" And then I can say: "Okay, I'll do it, and I'll report back soon!" Setting up the work is easy... Deciding what work needs to be done is what confuses me... Or we could keep track like we do in the M1061 thread... Coordinating work in this thread would only take a few minutes and sure would make it easier for people like me... If this is unreasonable please let me know... I don't want to complicate things... I know Bob has indicated in the past that perhaps we should spend less time on M1061 and more time on other stuff... I'm willing to do so, if someone (Bob?) can make it easy and fun, like it is in the M1061 thread... The only reason I continue to work on M1061 is because the way we keep track of work is fun and simple... I bet it would only take a few minutes and I bet more people would participate... :bow: |
Ah, so it was me who misunderstood...
Ok, to complete the p45 level, we need 8089 more curves at B1=11M for P791, 10420 more curves at B1=11M for P951, 4357 more curves at B1=11M for P969, 10420 more curves at B1=11M for P993. So, grab a bunch of curves while they're fresh! :smile: Alex |
Excellent! I'll start on 2,791+ then...
:cat: |
[QUOTE=Xyzzy]Excellent! I'll start on 2,791+ then...
:cat:[/QUOTE] If you're starting to do ECM work with Prime95 on the number 2^791+1, it's recommended to do the number 2^1582-1 instead, because it's faster that way. (2^791-1 is already completely factored.) In the worktodo.ini file, the 7th value must be 0. Good luck! Heikki |
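The suggestion rests on the algebraic identity 2^1582-1 = (2^791-1)(2^791+1), so any factor found while working on 2^1582-1 belongs to one of those two pieces; a one-line sanity check of that identity:
[CODE]
# Sanity check of the identity behind the suggestion above:
# x^2 - 1 = (x - 1)(x + 1) with x = 2^791.
assert (2**1582 - 1) == (2**791 - 1) * (2**791 + 1)
[/CODE]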
Not with Prime95 version 24. It now supports 2^k and 3*2^k FFT lengths on both SSE2 and non-SSE2 machines for 2^n+1 numbers. So it is much better to work only on 2^791+1 now.
Alex |
I'm using GMP-ECM... It takes 3.5 minutes per curve...
|
Which cpu type are you using?
|
I'll give 2,951+ a try
|
[QUOTE=akruppa]Which cpu type are you using?[/QUOTE]A 2.4GHz AMD64...
|
I will do some work on 2,1342L together with 2,1342M (working on 2^1342+1 should make good use of Prime95's length 64 FFT), but I will do 50 digit curves because 2,1342M has already been tested to 45 digits.
|
Done 1000 curves (45 digit) for 2,951+
I'll do at least 1000 more. |
600 curves done for 969. Reported nowhere but here.
[edit: typo corrected] |
For 2,791+
3184 curves using B1=11e6 & B2=25577181640... |
The required number of curves I posted above was for B2=100*B1; when using the GMP-ECM default B2, the numbers are lower. In terms of fractions, the required work is now
P791: 0.07 (319 GMP-ECM curves or 736 with Prime95)
P951: 0.98 (4512 or 10420)
P969: 0.41 (1887 or 4357)
P993: 0.98 (4512 or 10420)
Alex |
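As a rough illustration of the bookkeeping behind these fractions: the totals below, about 10,600 Prime95-style curves or about 4,600 GMP-ECM default-B2 curves for a full t45 at B1=11M, are only inferred from the figures quoted in this thread and are not authoritative.
[CODE]
# Sketch of the t45 bookkeeping implied above: for each number, the remaining
# fraction and the two equivalent curve counts.  Totals are rough figures
# inferred from this thread, not official tables.
T45_PRIME95 = 10600   # B1=11e6 with B2=100*B1
T45_GMPECM = 4600     # B1=11e6 with the (larger) GMP-ECM default B2

for name, frac in [("P791", 0.07), ("P951", 0.98), ("P969", 0.41), ("P993", 0.98)]:
    print("%s: ~%d GMP-ECM curves or ~%d Prime95 curves left"
          % (name, round(frac * T45_GMPECM), round(frac * T45_PRIME95)))
[/CODE]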
[QUOTE=akruppa]P791: 0.07 (319 GMP-ECM curves or 736 with Prime95)[/QUOTE]I'll finish this up if nobody else is working on it...
|
I've completed ~400 curves with B1=11MM and B2=110MM for 2,791+ using Prime95, although not at 3.5 minutes per curve (more like 23 min. per curve with the Pentium 3 I'm using). I can go to the 736, which will take another 5 days or so at this rate, unless Xyzzy is done with them already.
|
Done another 1000 curves (45 digit) for 2,951+, no factor
It makes a total of 2000 curves. I hope it helps. |
[QUOTE=Pi Rho]I can go to the 736, which will take another 5 days or so at this rate, unless Xyzzy is done with them already.[/QUOTE]Go ahead... I am having trouble getting the client to work right now...
:cat: |
[Mon May 02 08:55:02 2005]
2^791+1 completed 843 ECM curves, B1=11000000, B2=1100000000
That should finish up 2,791+, at least to 45 digits. I'll pass this along to George. Mike |
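Several people are reporting Prime95 results lines like the one above; a small Python sketch for tallying them out of results.txt, assuming exactly the "completed N ECM curves" format shown in this thread, may save some hand-counting.
[CODE]
# Sketch: total up "completed N ECM curves" lines from a Prime95 results.txt,
# grouped by number and B1, so a single summary can be reported.
import re
from collections import defaultdict

pattern = re.compile(r"(\S+) completed (\d+) ECM curves, B1=(\d+), B2=(\d+)")
totals = defaultdict(int)

with open("results.txt") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            number, curves, b1, _b2 = m.groups()
            totals[(number, int(b1))] += int(curves)

for (number, b1), n in sorted(totals.items()):
    print("%s: %d curves at B1=%d" % (number, n, b1))
[/CODE]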
Bob, you can cross this one off your list: 2,1342L c170 = p48.p123
p48 = 312689963186011191785543941405234534125118895633 I found it after 507 curves with B1=43e6 B2=11e10. (mprime 24.11 + gmp-ecm 6.0.1). I will work on 2,1334L now. Geoff. |
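For anyone who wants to double-check a reported factor like this one, a short sketch; it assumes the usual Cunningham convention that for 2,4k+2 the L part is 2^(2k+1) - 2^(k+1) + 1, so 2,1342L would be 2^671 - 2^336 + 1.
[CODE]
# Sketch: check that the reported p48 divides 2,1342L, taking the usual
# Aurifeuillian split 2^(4k+2)+1 = L*M with L = 2^(2k+1) - 2^(k+1) + 1.
p48 = 312689963186011191785543941405234534125118895633
L1342 = 2**671 - 2**336 + 1   # 2,1342L under that convention (k = 335)
print("p48 divides 2,1342L:", L1342 % p48 == 0)
[/CODE]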
2^993+1 with Prime95
completed 120 ECM curves, B1=11000000, B2=1100000000 |
[QUOTE=geoff]Bob, you can cross this one off your list: 2,1342L c170 = p48.p123
p48 = 312689963186011191785543941405234534125118895633 I found it after 507 curves with B1=43e6 B2=11e10. (mprime 24.11 + gmp-ecm 6.0.1). I will work on 2,1334L now. Geoff.[/QUOTE] Nice! This certainly was easier with ECM...... |
Done 1200 curves on 2,969+C164 with GMP-ECM, B1=11e6, B2=default.
|
Another 800 curves on 969, B1=11000000
|
600 more curves on 969.
That's it for now. |
2^993+1 with Prime95 10 ECM curves, B1=11000000, B2=1100000000
2^993+1 with Prime95 _2 ECM curves, B1=44000000, B2=4290000000 |
Another 670 curves on 2,969+C164 with GMP-ECM, B1=11e6, B2=default. Um, I think I'll switch numbers now.
|
[QUOTE=flava]Done another 1000 curves (45 digit) for 2,951+, no factor
It makes a total of 2000 curves. I hope it helps.[/QUOTE] flava, which B2 value did you use? Alex |
[QUOTE=trilliwig]Another 670 curves on 2,969+C164 with GMP-ECM, B1=11e6, B2=default. Um, I think I'll switch numbers now.[/QUOTE]
2,969+ has been completed to the 45 digit level. I will inform George. Bruce Dodson just finished off 2,1193+. :banana: I have finished sieving 2,737+ and will start 2,749+ later today. |
[QUOTE=R.D. Silverman]2,969+ has been completed to the 45 digit level. I will inform George.
Bruce Dodson just finished off 2,1193+. :banana: I have finished sieving 2,737+ and will start 2,749+ later today.[/QUOTE] NFSNET should switch to 2,751+ today or so. Wacky is in charge of the timing of the changeover but current stats suggest to me that we are about finished with 11,212+. I've been doing 6,786L.c135 by GNFS on my home box. Sieving finished 2 days ago and the matrix started a few hours ago. It should take about 52 hours to run on my 1.3GHz Athlon. The matrix is 1.3M square and has 48 set bits per row. Paul |
[QUOTE=akruppa]flava, which B2 value did you use?
Alex[/QUOTE] [Tue Apr 19 04:27:00 2005] 2^951+1 completed 1000 ECM curves, B1=11000000, B2=1100000000 [Mon Apr 25 09:54:13 2005] 2^951+1 completed 1000 ECM curves, B1=11000000, B2=1100000000 |
Keller and I are currently doing curves for P951 - I expect to be complete with 45 digits within a week or two.
So far, I did ~6% of the total curves needed. Around 4,000 Prime95 curves (or about 2,000 gmp-ecm curves) are left... |
P951: c182 = p47*c135
[QUOTE=R.D. Silverman]Hello to All;
I am thankful for the help people have given in applying ECM efforts to the 2+ table. .... Before I go further, I ask that some additional ECM effort be expended in particular on the following numbers: (to at least the 45 digit level) 2,791+, 2,951+, 2,969+, 2,993+, 2,1322M, 2,1334L, 2,1342L. [/QUOTE] I'm replying to an email from Bob about running the first four of the above. I was able to report having already completed a t45 on all four. After some further discussion, I offered the view that P791 was already tested sufficiently to be sieved, but the other three could do with some further testing. The reason being that my t45 test mostly consisted of c. 600 ecm6 curves with B1=110M on these, while the 2,7xx+ had 2500 ecm5 curves with B1=43M (which seemed less likely to miss smallish prime factors). After setting a curve count in B1=110M curves, I ended up running B1=260M curves instead, and found the p47 on the 160th curve. Anyone know how much of a t45 has been run/reported here on the other two, P969 and P993? And a belated Greetings! to the forum. B. Dodson (p57 from M997 c301 using prime95; p59 and p66 with gmp-ecm6, not to mention the p62 that wasn't a record!)
-----------------------
p47 = 40740225338758415715380694619267076907012248419
GMP-ECM 6.0 [powered by GMP 4.1.90] [ECM]
Using B1=260000000, B2=2317234062121, polynomial Dickson(30), sigma=2953010074
Step 1 took 2347474ms
Step 2 took 806954ms
1804.125 Opteron x10cpu, with Torbjorn's x86_64 gmp development code using AMD 64-bit assembler |
[QUOTE=bdodson]Anyone know how much of a t45 has been run/
reported here on the other two, P969 and P993?[/QUOTE] Hi Bruce. I believe that what you are looking for can be found here: [url]http://www.mersenne.org/ecmp.htm[/url] |
2^993+1 with Prime95 __2 ECM curves, B1=11000000, B2=1100000000
2^993+1 with Prime95 250 ECM curves, B1=44000000, B2=4290000000 |
on to the 2LM's?
[QUOTE=rogue]Hi Bruce. I believe that what you are looking for can be found here:
[url]http://www.mersenne.org/ecmp.htm[/url][/QUOTE] Ah, you'd think that I would have looked there first; it's just that there didn't seem to be any recent activity on this thread. But now that I have the current counts, the three remaining Pn's on Bob's list have already been over-run. Montgomery reports that these have snfs difficulty 194, 199 and 204 (with the one that factored at 190; figures that the hard ones would be left), which is likely the reason Bob's chosen them. The effort spent on ecm pretesting is way past the optimal percentage of the sieving time, so perhaps Bob agrees that he's willing to risk sieving without further ecm, freeing up the curves for something more likely to have a factor in ecm range? If so, do we know where testing on the other three, > 2,1322M, 2,1334L, 2,1342L stands? Incidentally, as an update on this morning's post, Sean Irvine has added the new c135 to his gnfs queue, so we may reliably expect factors in the near future. Bruce |
[QUOTE=bdodson]
If so, do we know where testing on the other three, > 2,1322M, 2,1334L, 2,1342L stands?[/QUOTE] Both have enough curves at B1=11e6. 1334L has 40 curves at B1=43e6 and 630 at 11e7. 2,1342L had this factor submitted recently 312689963186011191785543941405234534125118895633. I believe that the cofactor 312119007097134742328978209700769058421902458121821582805367454109670158835039663246756558186935608062761438029523087241093 is prime. |
According to Dario Alpern's applet, it is.
|
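The applet check can also be scripted; a sketch using sympy's isprime (assuming sympy is available) on the 123-digit cofactor quoted above.
[CODE]
# Sketch: probable-primality check of the quoted 123-digit cofactor of 2,1342L.
from sympy import isprime

c123 = int(
    "312119007097134742328978209700769058421902458121821582805367454109"
    "670158835039663246756558186935608062761438029523087241093"
)
print("cofactor is (probably) prime:", isprime(c123))
[/CODE]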
[QUOTE=bdodson]Ah, you'd think that I would have looked there first; just there
didn't seem to be any recent activity on this thread. But now that I have the current counts, the three remaining Pn's on Bob's list have already been over-run. Montgomery reports that these have snfs difficulty 194; 199 and 204 (with the one that factored at 190; figures that the hard ones would be left), which is likely the reason Bob's chosen them. The effort spent on ecm pretesting is way past the optimal percentage of the sieving time, so perhaps Bob agrees that he's willing to risk sieving without further ecm, freeing up the curves for something more likely to have a factor in ecm range? If so, do we know where testing on the other three, > 2,1322M, 2,1334L, 2,1342L stands? Incidently, an update on this morning's post, Sean Irving has added the new c135 to his gnfs queue, so we may reliably expect factors in the near future. Bruce[/QUOTE] Hi, I agree that we have run enough trials on the 2+ numbers. I am about 70% done with 2,749+. 2,791+ is queued. I will then do 2,969+ and 2,993+ before doing the LM's. I only have 3 PC's..... (and one is part time) The recent (and excellent!) work by Bruce still shows that there are p45 to p50 factors remaining in the base 2 tables (he just found 3 more, finishing 2,1091- and 2,943+ and finding a p49 of 2,1157+) :banana: I wish I had his resources...... |
two 2LM first holes
[QUOTE=rogue]Both have enough curves at B1=11e6. 1334L has 40 curves at B1=43e6 and 630 at 11e7.
2,1342L had this factor submitted recently 312689963186011191785543941405234534125118895633. I believe that the cofactor 312119007097134742328978209700769058421902458121821582805367454109670158835039663246756558186935608062761438029523087241093 is prime.[/QUOTE] Ah, these are the 2LM's with smallest exponent; 2,1322M C175 and 2,1334M C186. And those curve counts sound rather familiar, since they're my curves? I was wondering about curves run by someone else, perhaps in response to this thread. Also, the C175 "curves at B1=11e6" were 2500 curves at B1=43e6, for c. 1.75-times a test to p45. By contast, the 40 curves at B1=43M and 630 at 11e7 was my equivalent of "curves to B1=11e6," rather than an addition to a previous test to p45. Or do you know of someone else (possibly a group of someones) having run a complete test to p45 on either of these numbers? The previous 2+ counts should be OK, since neither you nor Bob mentioned translating my curve counts and reporting them to George? I was happy to share curve counts for other people running curves to know which limits to set, but ambiguities leading to double counts aren't good --- in particular, for deciding whether numbers are ready for sieving. Bruce |
[QUOTE=bdodson]2,1334M C186. And those curve counts sound rather familiar, since
they're my curves? I was wondering about curves run by someone else, perhaps in response to this thread. Also, the C175 "curves at B1=11e6" were 2500 curves at B1=43e6, for c. 1.75-times a test to p45. By contast, the 40 curves at B1=43M and 630 at 11e7 was my equivalent of "curves to B1=11e6," rather than an addition to a previous test to p45. Or do you know of someone else (possibly a group of someones) having run a complete test to p45 on either of these numbers? The previous 2+ counts should be OK, since neither you nor Bob mentioned translating my curve counts and reporting them to George?[/QUOTE] Yes, those are your curve counts. You indicated to me (via PM) that all composites under 176 digits had enough curves at 11e6. Did I misinterpret your statement? I don't have the PM in front of me, so I don't recall your exact wording. Paul Zimmermann's site doesn't indicate that additional curves were run. I'm unaware of anyone else running curves for those numbers. If they are, then they haven't reported them in the ECM Status forum. I have not sent any curve counts to George. When garo returns from vacation, we will try to get the curve counts (for all composites) straightened out and get the info to George and Paul. |
[QUOTE=bdodson]Ah, these are the 2LM's with smallest exponent; 2,1322M C175 and 2,1334M C186. And those curve counts sound rather familiar, since they're my curves? I was wondering about curves run by someone else, perhaps in response to this thread.[/QUOTE]
On 2,1334L I have done 1763 curves at B1=11e6 and 202 curves at B1=43e6 that don't show in the c120-355 curve counts yet. edit: Also, on 2,1342M (now third hole) I have done 600 curves at B1=43e6. |
[QUOTE=bdodson]Ah, these are the 2LM's with smallest exponent; 2,1322M C175 and
2,1334M C186. And those curve counts sound rather familiar, since they're my curves? I was wondering about curves run by someone else, perhaps in response to this thread. Also, the C175 "curves at B1=11e6" were 2500 curves at B1=43e6, for c. 1.75-times a test to p45. By contast, the 40 curves at B1=43M and 630 at 11e7 was my equivalent of "curves to B1=11e6," rather than an addition to a previous test to p45. Or do you know of someone else (possibly a group of someones) having run a complete test to p45 on either of these numbers? The previous 2+ counts should be OK, since neither you nor Bob mentioned translating my curve counts and reporting them to George? I was happy to share curve counts for other people running curves to know which limits to set, but ambiguities leading to double counts aren't good --- in particular, for deciding whether numbers are ready for sieving. Bruce[/QUOTE] I did translate the curve counts (from your email) and sent them to George. He has already added them into his tables. |
[QUOTE=R.D. Silverman]I did translate the curve counts (from your email) and sent them to George.
He has already added them into his tables.[/QUOTE] Based on that email I added 6000 curves at 44 million to the tables (and marked 11,000,000 done). So any curves at 44,000,000 in excess of 6,000 was done by other users. |
counting confusion(s)
[QUOTE=rogue]Yes, those are your curve counts. You indicated to me (via PM) that all composites under 176 digits had enough curves at 11e6. Did I misinterpret your statement? I don't have the PM in front of me, so I don't recall your exact wording.
Paul Zimmerman's site doesn't indicate that additional curves were run. I'm unaware of anyone else running curves for those numbers. If they are, then they haven't reported them in the ECM Status forum. I have not sent any curve counts to George. When garo returns from vacation, we will try to get the curve counts (for all composites) straightened out and get the info to George and Paul.[/QUOTE] There is an ambiguity about reporting 11e6 as being done, based on translations of curves run with larger limits. For the c192, my email reported the 40 curves with B1=43M and c. 600 curves with B1=110M (all ecm6 curves). My intention would have been to have 11e6 marked as "no additional curves needed" --- "not needed", for short. But when I read "DONE" in the 11e6 column of ecmp.htm, I took that to mean that someone actually ran 10,600 curves with b1=11M (or some translation of that, to curves with B2=100*B1), in addition to the curves I ran. George's post (if I'm now reading correctly), says P969 had (an equivalent of) 10753 - 6000 = 4753 B1=44M total curves (including, perhaps, some combination of 11M curves and 44M curves), with the "Done" in the 11e6 column to be ignored. I suppose you and garo had better follow that format as well! Likewise, P993 has 8155 - 6000 = 2155. Since CWI/Montgomery are sieving numbers of this difficulty after c. 25% of a test for p50's, which is c. 5000 b1=44M with b2=100*b1 curves, the ecm pretesting conclusion on these two seems to be OK. (For a better margin, 33% would have been c. 6500, and we're also past that.) [Since Bob has mis-reported my counts on these numbers (see below), P969 is OK, but P993 may still be a bit short.] This reduces the unknowns here to how Bob arrived at 6000 B1=44M curves (and a few minor issues such as why he neglected to wait for my reply when he said "I'll translate these counts and send them to George, OK?"). For readers aside from Mark and Bob, the 6000's added at various locations in the Mn and Pn tables are supposed to mean, first, that the remaining cofactor is between c136 and c195. The actual curves and limits depend upon 3 subdivisions (five if one wishes to be picky and/or precise), c136-c145, c146-c175 and c176-c195; but the executive summary is that all of these had enough curves run to mark 11e6 as "Done". If that means 10600*11e6, then a factor of (11e6/44e6) applied to 10600 would suggest 2650*44e6. Ooops. So Bob OUGHT to be reporting 6000 curves on just one of the five divisions, c146-c155. For c136-c145 and c176-c195, "Done" in 11e6 and 2650 in 44e6 is the correct use of George's system, and for c156-c175 "Done and 4690". (Bob's prefered factor of 2 for translation double counts the B2 difference between prime95 and ecm6, since my report to him already included a factor for that difference.) That's for base-2. Since the tables for larger bases are using ecm6 curves, seems like keeping the original curves (except for ecm5-to-ecm6 translation) is better. About Paul's counts from the ecmnet page, only the c136-c145's have been reported, since I was still working on c146-c175 and c176-c196 until recently. Bruce (if you'd like to be really mislead, ask Bob about how well block Lanczos on snfs/gnfs matrices parallelizes! See my colleague Xilman's recent post for references. Or perhaps Bob's revised his views since the RSA200 matrix? Sorry, this is off-thread.) |
edit?
[QUOTE=bdodson]There is an ambiguity about reporting 11e6 as being done, based on
translations of curves run with larger limits. For the c192, my email reported the 40 curves with B1=43M and c. 600 curves with B1=110M (all ecm6 curves). My intention would have been to have 11e6 marked as "no additional curves needed" --- "not needed", for short. But when I read "DONE" in the 11e6 column of ecmp.htm, I took that to mean that someone actually ran 10,600 curves with b1=11M (or some translation of that, to curves with B2=100*B1), in addition to the curves I ran. George's post (if I'm now reading correctly), says P969 had (an equivalent of) 10753 - 6000 = 4753 B1=44M total curves (including, perhaps, some combination of 11M curves and 44M curves), with the "Done" in the 11e6 column to be ignored. I suppose you and garo had better follow that format as well! Likewise, P993 has 8155 - 6000 = 2155. Since CWI/Montgomery are sieving numbers of this difficulty after c. 25% of a test for p50's, which is c. 5000 b1=44M with b2=100*b1 curves, the ecm pretesting conclusion on these two seems to be OK. (For a better margin, 33% would have been c. 6500, and we're also past that.) [[This assumes that P969 and P993 were among the numbers Bob asked George to add 6000 to, on my behalf. But they shouldn't have been. Corrections could be made using other assumptions, but the point at the moment is that I'm still not able to determine what the correct count should be, or how my curves were recorded. ]] This reduces the unknowns here to how Bob arrived at 6000 B1=44M curves (and a few minor issues such as why he neglected to wait for my reply when he said "I'll translate these counts and send them to George, OK?"). For readers aside from Mark and Bob, the 6000's added at various locations in the Mn and Pn tables are supposed to mean, first, that the remaining cofactor is between c136 and c195. The actual curves and limits depend upon 3 subdivisions (five if one wishes to be picky and/or precise), c136-c145, c146-c175 and c176-c195; but the executive summary is that all of these had enough curves run to mark 11e6 as "Done". If that means 10600*11e6, then a factor of (11e6/44e6) applied to 10600 would suggest 2650*44e6. Ooops. So Bob OUGHT to be reporting 6000 curves on just one of the five divisions, c146-c155. For c136-c145 and c176-c195, "Done" in 11e6 and 2650 in 44e6 is the correct use of George's system, and for c156-c175 "Done and 4690". (Bob's prefered factor of 2 for translation double counts the B2 difference between prime95 and ecm6, since my report to him already included a factor for that difference.) That's for base-2. Since the tables for larger bases are using ecm6 curves, seems like keeping the original curves (except for ecm5-to-ecm6 translation) is better. About Paul's counts from the ecmnet page, only the c136-c145's have been reported, since I was still working on c146-c175 and c176-c196 until recently. Bruce (if you'd like to be really mislead, ask Bob about how well block Lanczos on snfs/gnfs matrices parallelizes! See my colleague Xilman's recent post for references. Or perhaps Bob's revised his views since the RSA200 matrix? Sorry, this is off-thread.)[/QUOTE] The second paragraph in my original post ought to end with the above comment in brackets, [[...]], rather than what's in the brackets [...] there. Bruce |
[QUOTE=bdodson]
<snip> This reduces the unknowns here to how Bob arrived at 6000 B1=44M curves (and a few minor issues such as why he neglected to wait for my reply when he said "I'll translate these counts and send them to George, OK?"). Bruce (if you'd like to be really mislead, ask Bob about how well block Lanczos on snfs/gnfs matrices parallelizes! See my colleague Xilman's recent post for references. Or perhaps Bob's revised his views since the RSA200 matrix? Sorry, this is off-thread.)[/QUOTE] You reported 3000 ecm6 curves in your email. It has been agreed that 1 ecm5/ecm6 curve ~ 2 prime95 curves. That's where the 6000 comes from. As for your implications that I mislead people about parallel block Lanczos:
(1) Both CWI and NFSNET reported low (less than 50%) per-processor utilization, and CWI has stated in public that communication costs dominate the computation. Peter Montgomery has presented graphs showing how the per-processor utilization decreases (somewhat dramatically) as the number of processors increases. My public comments have simply stated that B-L does not parallelize well. I stick by that comment and it is backed by data presented by others.
(2) And it is quite unprofessional of you to accuse me of misleading others in the manner you did. |
four of ?? numbers agreed; rest inaccurate/wrong
[QUOTE=Prime95]Based on that email I added 6000 curves at 44 million to the tables (and marked 11,000,000 done). So any curves at 44,000,000 in excess of 6,000 was done by other users.[/QUOTE]
Since Bob doesn't appear to have read my email, his report to you (submitted without my agreement or approval) doesn't seem to me to be a positive contribution to the counts reported on your pages ecmm.htm and ecmp.htm. There are no numbers on which I ran 3000 ecm6 curves, and the 3000 ecm5 curves with B1=43M that I did run apply only to Cunningham composites between 145- and 155-digits. There were, at that time, eight base-2 numbers, 2^n +/- 1 for n < 1200 in this range. Four have subsequently been factored. So adding 6000 curves should at most have been applied to M1173 and P760, P820 and P1044. If you were to email me any other numbers in Bob's email, I'd be happy to supply a round number using Bob's factor of 2 that should replace the bogus 6000 value (if there were any numbers other than the above four in the email you received). As I pointed out to Bob in the email he didn't wait for (saying among other things that I'd be happier reporting my counts myself), some care needs to be taken to be sure that a number in a given range was actually in one of my input files. Of numbers that were run, 5000 would fit numbers from 156- to 175-digits, which is perhaps not so serious a correction. Of numbers with 136- to 144-digits that I ran, Sean Irvine's factorization of M949 finished the last number from the ecmm and ecmp ranges (although there are some 20 numbers for other bases that remain from the list of numbers run, not yet reserved by anyone). All of these curves were run on a cluster of 1.1 GHz Pentium 3's. The more recent Opteron curve counts are well below 6000 curves with B1=44M, perhaps closer to 2650 curves, rounded to 2500, but with 11e6 (t45) marked as Done. Most numbers from 176- to 195-digits were run this far (but not farther yet), the only exceptions being other people's recent new composite cofactors. My recent base-2 factorizations were found by applying just 165 ecm6 curves with B1=110M, which has finished on all of c196-c384. I'm still working in this c196- range, so including counts seems premature, but 3e6 (t40) is Done on the complete Cunningham list. Bruce |
[QUOTE=R.D. Silverman]Both CWI and NFSNET reported low (less than 50%) per processor utiliization and CWI has stated in public that communication costs dominate the computation. Peter Montgomery has presented graphs showing how the per-processor utilization decreases (somewhat dramatically) as the number of processors increase.[/QUOTE]
Bob, I'd be very interested in seeing some version (even a simple tabular one) of these data. Did Peter try different processor types, or were all the runs on the same kind of hardware? (I seem to recall CWI using a big SGI-based system for their most memory-intensive work, but perhaps they have other multi-CPU systems as well). |
[QUOTE=ewmayer]Bob, I'd be very interested in seeing some version (even a simple tabular one) of these data. Did Peter try different processor types, or were all the runs on the same kind of hardware? (I seem to recall CWI using a big SGI-based system for their most memory-intensive work, but perhaps they have other multi-CPU systems as well).[/QUOTE]
Peter presented his information at the annual RSA crypto conference back in 2000. I used to have a soft copy of his slides, but it got left on my PC when I was laid off at RSA. I do have a hard copy of his slides. His hardware was 16 dual CPU pentium II's on a 100Mb/sec Ethernet. (and a second set of numbers on a 1.28Gb/sec Myrinet) RSA should still have a copy plus perhaps a cassette tape recording of his talk. With 1 processor the time per iteration was 7 sec. With 8 processors, it dropped to 4 seconds on Ethernet and 2 sec on Myrinet. -- Note that even on the faster network an 8-fold increase in processors only reduced run time fourfold. With another 4 processors (12 total) it dropped to 3+ sec on Ethernet and ~1.6 sec on Myrinet. With another 4, it dropped to 3 sec on Ethernet and 1.5 sec on Myrinet. This CLEARLY shows a sharp reduction in per-processor utilization as the number of processors increases. If you send me your snail mail address, I will Xerox my hard copy and send it to you. I have stated in public that "parallel efforts have met with some, but not large success". Paul Leyland reported to me that the cluster he used (~30 procs on a gigabit network) was getting only about 50% per processor utilization. You can get exact numbers from Paul. What justifies Bruce's accusation that I have misrepresented the situation? His off topic remark was totally unjustified. |
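For readers who want the implied per-processor utilization spelled out, it follows directly from those timings as t1/(N*tN); a quick sketch using the 1-, 8- and 16-processor figures quoted above.
[CODE]
# Sketch: per-processor utilization implied by the iteration times quoted above
# (7 s on one CPU; 8- and 16-CPU times on Ethernet and Myrinet).
t1 = 7.0
timings = {8: (4.0, 2.0), 16: (3.0, 1.5)}   # CPUs: (Ethernet s, Myrinet s)

for n, (eth, myr) in sorted(timings.items()):
    print("%2d CPUs: Ethernet %4.1f%%, Myrinet %4.1f%%"
          % (n, 100 * t1 / (n * eth), 100 * t1 / (n * myr)))
[/CODE]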
Thanks for the numbers, Bob - no need to send me the gory details, I was interested in roughly how fast the per-CPU utilization dropped off with increasing #CPUs and your data answer that question in as much detail as I desire. That's interesting about the 50% utilization point (a rather arbitrary but nonetheless useful threshold for deciding whether throwing more CPUs at a problem is likely to be worthwhile) occurring with as few as 8 CPUs even on some systems with GB/sec networking - I had always been under the impression that large-scale linear algebra of most kinds (e.g. Gaussian elimination, QR for finding of eigenvalues, and such) could be parallelized extremely effectively - this is clearly a major counterexample.
Note that my question was strictly related to the block-Lanczos parallelization problem (first I'd heard of this) and unrelated to Bruce's subjective remarks, although I agree the latter types of issues are the kind that should be broached in private - last thing we need is another public flame war. |
[QUOTE=ewmayer]I had always been under the impression that large-scale linear algebra of most kinds (e.g. Gaussian elimination, QR for finding of eigenvalues, and such) could be parallelized extremely effectively - this is clearly a major counterexample.[/QUOTE]
The enormous performance numbers are typically for dense systems where there's much more computation than communication. Linear systems derived from factorization algorithms are always sparse nowadays, and the most stressful computations they do are matrix-vector products whose computation-to-communication ratio is even lower. Sparse linear system solvers for floating-point problems reorder the rows and columns of the matrix to cluster blocks of nonzero elements together, then switch to dense matrix routines as much as possible when actually performing Gaussian elimination. The algorithms they use look to me to be more advanced than the typical NFS filtering phase. So I wonder: NFS filtering tries to produce a matrix with as few nonzero elements as possible. Is it feasible to modify the filtering phase not to optimize the matrix weight but the position of the weights? I.e. the matrix produced is more dense than usual but also more clustered, so that the matrix multiplications that block lanczos uses spend more time in cache. If things can be packed tightly enough, perhaps an implementation can divide the matrix into sparse blocks and dense blocks stored as bit-vectors, to lower the real storage used. Or perhaps this is already done, and I just didn't know it. jasonp |
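For readers unfamiliar with the kernel being discussed, here is a toy sketch of the sparse GF(2) matrix times vector-block product that dominates block Lanczos; the matrix is stored as per-row lists of column indices and a block of 64 vectors is packed into one machine word per coordinate. This is purely illustrative, not anyone's production code.
[CODE]
# Toy sketch of the core block-Lanczos kernel: y = A*x over GF(2), where x is a
# block of 64 vectors packed as one 64-bit word per coordinate, and A is sparse
# (each row stored as a list of nonzero column indices).  Each output word is
# the XOR of the input words selected by that row -- almost no arithmetic, lots
# of memory traffic, which is why communication dominates when A is split
# across nodes.
def xor_all(words):
    acc = 0
    for w in words:
        acc ^= w
    return acc

def spmv_gf2(rows, x):
    return [xor_all(x[c] for c in cols) for cols in rows]

# Tiny example: the 3x3 matrix [[1,0,1],[0,1,0],[1,1,0]] acting on a block x.
rows = [[0, 2], [1], [0, 1]]
x = [0b1010, 0b0110, 0b0001]   # three coordinates, 4 of the 64 vectors shown
print(spmv_gf2(rows, x))       # prints [11, 6, 12], i.e. [0b1011, 0b0110, 0b1100]
[/CODE]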
more data (off topic)
[QUOTE=R.D. Silverman]Peter presented his information at the annual RSA crypto conference
back in 2000. I used to have a soft copy of his slides, but it got left on my PC when I was laid off at RSA. I do have a hard copy of his slides. His hardware was 16 dual CPU pentium II's on a 100Mb/sec Ethernet. (and a second set of numbers on a 1.28Gb/sec Myrinet) ... This CLEARLY shows a sharp reduction in per-processor utilization as the number of processors increases. If you send me your snail mail address, I will Xerox my hard copy and send it to you. I have stated in public that "parallel efforts have met with some, but not large success". Paul Leyland reported to me that the cluster he used (~30 procs on a gigabit network) was getting only about 50% per processor utilization. You can get exact numbers from Paul. What justifies Bruce's accusation that I have misrepresented the situation? His off topic remark was totally unjustified.[/QUOTE] I'll admit to being inappropriately annoyed at Bob's enthusiasm for sharing his data and view on one of the computational issues related to the security of RSA, the matrix step in the number field sieve. In fact, when Bob sent me a factoring paper last year, I checked to see whether he'd revised his views since the last I heard, and when I found him repeating the above, found myself unable to take anything else in the paper seriously. I don't dispute Bob's quote of CWI to the effect that communication dominates the computation. But Mayer's report of recalling CWI using an sgi system is closer to my point. I don't need lost files from RSA Data; we have an early post from Peter to the Mersenne list, Sept 7, 2001, about the factorization of M727. I'd try digging it out, but for the fact that I keep a copy on my own web page, and here's what Peter's post says:
> SARA, a Dutch High-Performance Computing Center in Amsterdam,
> had been down August 13-20, to enlarge its network (teras)
> of shared memory 500 MHz MIPS R14000 processors.
> Peter Montgomery had written a parallel MPI-based version of
> his Block Lanczos linear algebra code earlier in 1999-2001.
> Happily, this code functioned well on the new hardware configuration,
> ....
> The largest matrix run on teras, for 2^727 - 1, was 3847967 x 3862821.
> This took 99153 real-time seconds (about 27.5 hours) using 25 processors.
> The estimated single-processor time is 1200000 seconds (330 hours)
> for that same matrix on the same hardware, so the job got almost 50%
> utilization of the 25 processors.
This was before everyone was running the last Mn without factor, M1061, and this M727 was on George's list. Arjen Lenstra and I spent 7 months sieving it before sending the matrix to CWI, so I have a rather good recollection of the processor utilization. This was before I factored one of the others, with ecm/prime95, M997 = c301 = p57*prime_cof, before Franke took on running M809 by snfs (one of the others), and before M971 factored by ecm/prime95, leaving just the last M1061. Likewise, I don't dispute Bob's conversation with Leyland, but I did wonder, when I saw a private conversation referenced in Bob's paper, whether he asked Paul about having the above stand as his summary of the NFSNET experience running large matrices on the MS cluster?
I've worked with Paul on RSA130, RSA140, RSA155=RSA512 (and you won't find Bob listed as a co-author with us on the papers), and every time I spoke with Paul about running large matrices his view was that he'd be able to run anything that came along, and in the unlikely event of one being too large to run on his cluster he'd get access to a larger one (that would be, one with more processors, if anyone wonders). Our most recent conversation on this topic was at the CWI/Utrecht workshop, where we heard the state-of-the-art described by Franke. At that time, the three largest snfs/gnfs matrices were the M809 matrix (referred to above), which was done on CWI's sgi system and the larger, harder matrices needed for RSA160 and RSA576, done at Bonn. What's happened since? Both Franke records were broken using yet another block Lanczos at NTT, by Aoki, first a 248-bit snfs using 16 pentium4's for the matrix, then c176 gnfs using 36 pentium4's. Of course, the latter was just before Franke reclaimed the gnfs, breaking RSA200 using Block Wiedemann on a cluster of 80 Opterons --- he skipped RSA640, just as he'd said he would in conversation at Utrecht. So we have Bob's CWI data before their sgi use, and his use of Paul's comment that I'd bet came early in NFSNET, before their best computations; and then five years of steady improvement in parallel matrix computation. Meanwhile, Bob's still quoting the data from five years ago, as if they had been the final word on the subject. Now that's what I'd call unprofessional. I'll apologize to the forum for baiting Bob by characterizing his view as being misleading. Bruce (So Bob, would you mind sending me a copy of the version of my ecm curve counts that you sent to George? Email would be best, but I'll go with whatever you prefer.) |
[QUOTE=bdodson]
"for that same matrix on the same hardware, so the job got almost 50% > utilization of the 25 processors." [/QUOTE] 'Almost 50%'. This is still a sharp under-utilization of the processors. I do not dispute that people have done larger matrices. However, I have not seen any of the various people involved publish data on per-processor utilization (of their various implementations) as the number of processors changes. Since Peter presented his data, computers have gotten faster, as have networks. But I have seen no new graphs showing speed-up as a function of the number of processors. If someone presents data that shows *other* than almost hyperbolic drop-off in processor utilization as the number of processors increases I will change my view. We can speed sieving by as much as we want. Of course we would reach a limit as the number of processors approached the number of sieve lines, but speedup remains linear even up to using 10's of thousands of processors. However, for *loosely coupled* networks of workstations, the inter-processor latencies cause the computation-to-communication time ratios to be low. Until this improves, network implementations of Block Lanczos and Block Wiedemann simply *will not scale*. And for the record, I have considerable experience in implementing linear algebra routines on parallel machines (e.g. old Intel hypercube, BBN Butterfly, Symult S2010, CM2, CM5, Intel Paragon). The big problem is not bandwidth, but LATENCY. And socket latching mechanisms for network implementations put a lower bound on latency. The mechanisms by which operating systems send out data through sockets are too slow. I'd like to see data using a modern, tightly coupled (dedicated fast network without sockets) parallel computer. One with latencies in the 10 microsecond, rather than millisecond range. No one denies that we have done progressively larger matrices. But we also have bigger and faster machines and networks. I still want to see data that shows that network parallel BL and BW can *scale*, before I will change my position. When someone shows (say) greater than 50% utilization on a *large* network (say 256 machines) I will also change my position. As for your not taking my paper seriously because of my views on this subject, I find that rather silly, since my paper does not deal with this subject. You say in essence "I don't agree with Bob regarding his position on the linear algebra, so I won't take his mathematics on optimization seriously". |
[QUOTE=R.D. Silverman]Peter presented his information at the annual RSA crypto conference
back in 2000. I used to have a soft copy of his slides, but it got left on my PC when I was laid off at RSA. ... I have stated in public that "parallel efforts have met with some, but not large success". Paul Leyland reported to me that the cluster he used (~30 procs on a gigabit network) was getting only about 50% per processor utilization. You can get exact numbers from Paul.[/QUOTE] You have higher standards than many people working in the field. I don't think anyone believes that linear algebra on sparse matrices is anywhere near as trivially parallelizable as sieving. Nonetheless, clusters have greatly increased the size of matrices which can be processed on reasonable hardware. There are two main reasons, I believe. The first is that for a given number of machine cycles, it is much cheaper to have a dozen or few commodity machines than a single supercomputer with the same performance. The other, which historically has been much the same as the first, though the situation is changing, is that commodity machines are generally limited by their physical address space to a couple of gigabytes of RAM or so, and holding large matrices can require much more storage than that. Distributing the matrix over a number of 2G machines gets around the problem to some extent. You may be able to get exact numbers from me, in principle, but I don't have them to hand. Experiments performed by a number of people, including Peter Montgomery and myself, suggest that the effective processing power scales roughly as the square root of the number of cpus, at least for a small number (<100 perhaps) of machines connected by a network. The situation may be different on shared memory machines but we haven't had access to such to be able to carry out meaningful experiments. I have noticed that a dual proc shared memory machine is noticeably faster than sqrt(2) times the speed of the same box using only one cpu. Paul |
proc. use over varying machines
[QUOTE=R.D. Silverman]'Almost 50%'. This is still a sharp under-utilization of the processors.
I do not dispute that people have done larger matrices. However, I have not seen any of the various people involved publish data on per-processor utilization (of their various implementations) as the number of processors changes. Since Peter prsented his data, computers have gotten faster, as have networks. But I have seen no new graphs showing speed-up as a function of the number of processors. If someone presents data that shows *other* than almost hyperbolic drop-off in processor utilization as the number of processors increases I will change my view. ... And for the record, I have considerable experience in implementing linear algebra routines on parallel machines (e.g. old Intel hypercube, BBN Butterfly, Symult S2010, CM2, CM5, Intel Paragon). The big problem is not bandwidth, but LATENCY. And socket latching mechanisms for network implementations put a lower bound on latency. The mechanisms by which operating systems send out data through sockets is too slow. I'd like to see data using a modern, tightly coupled (dedicated fast network without sockets) parallel computer. One with latencies in the 10 microsecond, rather than millisecond range. Noone denies that we have done progressively larger matrices. But we also have bigger and faster machines and networks. I still want to see data that shows that network parallel BL and BW can *scale*, before I will change my position. When someone shows (say) greater than 50% utilization on a *large* network (say 256 machines) I will also change my position. As for your not taking my paper seriously because of my views on this subject, I find that rather silly, since my paper does not deal with this subject. You say in essence "I don't agree with Bob regarding his position on the linear algebra, so I won't take his mathematics on optimization seriously".[/QUOTE] OK, thanks for a coherent description of your position on this topic. (By comparison to what I've seen on other threads, this hardly counts as being off-topic.) I'm currently using an expensive loosely coupled cluster (two actually, if you count the one with dual P3's on the nodes) to run ECM, which completely wastes the most expensive parts, the communication interconnects between nodes. That's 0.0% use of the data connects. Even 25% processor use, with hyperbolic decay, would still be better use of the parallel processing resource. (The reasons I'm able to use the time for this purpose is that I run at absolutely bottom priority, and the Opteron cluster PBS scheduling software isn't up yet -- the cluster hasn't quite opened to the public for truely parallel jobs.) About the CWI machine on which almost all of their parallel jobs were run (of course a SMP machine; with a cputime budget that doesn't encourage pursuit of topics only of academic interest), the utilization you're objecting to ("only 50%") is again at the very start of their use of the machine for parallel jobs --- we're still talking 2001. The largest matrix run there that I know of was still on the same machine, but if I recall correctly, was already done at the Lenstra retirement (H.W.) from Berkeley, March 2003 [before Utrecht]; namely the matrix for M809. The part of the sieving done by Franke, et. al. used larger large-primes than the CWI siever (and both RSA160 and RSA576 used larger bounds yet), so the matrix was a virtual monster, 10M^2 rows-times-cols. This is usually referred to as the 244-digit snfs record, the one Aoki's 248-digit snfs broke. 
So there's our tiny little M727 matrix, 3.8M^2, that used 27 cpus in Sept 2001. Then something like a year-and-a-half later, this monster matrix on the same machine (with 512 cpus, IIRC). I don't know how many cpus were used, or what the processor utilization was; and I gather no one else here does either. But it's the performance of that computation we ought to be discussing, anything earlier is outdated (by orders of magnitude); and unless Peter's inclined to put checkmarks on Bob's graph, none of us actually knows how scaling went (perhaps it's proprietary, even), and in the absence of the data we ought to be clear that we don't know. Bob is entitled to assert that "based on early data, on loosely coupled hardware, the CWI parallel Lanczos didn't scale well"; but we shouldn't be led by that (I'm being descriptive, not pejorative here) to conclude anything about the SMP version (which is the one that matters, historically: early experiments -vs- production runs) a year-and-a-half after the very first substantial run blows away anything we'd seen before (25 cpus!!!, after, for years, experts assuring that the matrix step was inherently un-parallelizable, as a nontrivial component of the security of RSA). It was Peter's program, but my M727 matrix (towards George's problem, now reduced to just M1061); and I'd appreciate Bob adding the M727-matrix and the M809-matrix to his description of the data (in the wish-we-had-data column for M809). That's two points: (1) people do far worse things, utilization-wise, than pushing parallel programs out past their sweet-spot (0% vs 25% use); and (2) the CWI/sgi computation Mayer ought to be thinking about visibly is in a completely different range than the data supporting poor cpu utilization that we're being referred to. But the factoring records I was recalling above have a more salient point, namely that subsequent records are rarely set on the same soft/hardware (I mean "RSA130 to RSA155", with RSA140 being a warm-up for RSA155=RSA512). We had snfs248, an entirely different matrix than the snfs244 matrix (on NTT's pc-cluster vs CWI's sgi/smp); and RSA576 -to- c176 -to- RSA200 for gnfs (heavy pc at Bonn, lighter pc at NTT, Opteron at Bonn). Look at the variety of hardware on which parallel Lanczos/W. has been acceptably run (in the view of the person using the software and the person authorizing the hardware). We also had the 2nd c155 on a block of 4 alphas. So that's CWI/sgi, the alphas, MSR, Bonn/Intel-pc, NTT/pc and Bonn/Opteron. (Admittedly, the alpha version was never scaled-up, but if we're using experience instead of data, my bet's on Torbjorn.) That's a very wide collection, and the variety itself says something about how well the parallelization works. My point here is that it's not how adding cpus on a fixed machine works that matters, but how the new version runs on the next new machine. The one we use for snfs1024 and the one for RSA768 and for RSA1024. Does anyone believe those three matrices will be done on the same machine? Now for the record, I'm a Calculus instructor, and the TA's probably better at explaining the Maple implementations. I do have c. five years of using the CWI suite of programs (with special attention to the filter, and getting matrices small enough to run on the old single cpu sgi/Lanczos). Our new sgi/linux/IA64 that's replacing the old Origin3800/Irix/Mips won't be using those, so I'll probably be looking for a new siever soon.
In the meantime, I have my attention fairly well occupied feeding input files into Torbjorn's x86-64 Opteron binary of gmp-ecm6; with keeping track of curve counts a crucial part of making best use of that resource -- it matters more to me than it does to you. (A new 2L p44 today, c241 = p44*c197.) I've also given considerable attention to the changing views of the security of RSA, especially to the matrix step that we believe to be reduced to a problem that's minor by comparison to sieving (that's Shamir-Lenstra-Leyland-Dodson, estimating RSA1024; cf. Bob's view, above). Do results in optimization depend upon making new judgements based on unexpected new results (like Peter's crack of our M727-matrix was)? Maybe, maybe not. To me, insisting that Lanczos doesn't parallelize is something like insisting that the sky is orange or the earth flat --- I could take agnosticism; the data needed isn't available, so we don't know how well it scales on a fixed machine -- but to take the most recent data-point, shifting sieving effort over to matrix effort was a winning strategy for RSA200, and even that matrix fell to parallelization. That means those of us building small matrices (Aoki's for example) have a very long way to go before running up against parallelization limits in the matrix step of snfs/gnfs. Lanczos (well, Wiedemann) parallelizes well enough -- at least, when considered over changing machines, as is what's needed in practice. Bruce (Thanks to Bob for the curve count email.) |
[QUOTE=bdodson]OK, thanks for a coherent description of your position on this topic. (By
comparison to what I've seen on other threads, this hardly counts as being off-topic.) <snip> [/QUOTE] Thanks for your comments. I would like to add one observation that, while somewhat "qualitative", supports my view. If one observes the reported times for factorizations over the last few years one can see that the *Elapsed* time ratio between the linear algebra and sieving has increased. Some efforts have reported LA Elapsed Time / Sieve Elapsed Time ratios of over .25. This *suggests* that improvements in parallel LA have not kept pace with improvements in sieving. We have thrown hundreds of machines at the sieving step. I'd like to see a successful LA step using a similar number of machines. |