mersenneforum.org How to use prime95 for stage 1 & GMP-ECM for stage 2

2015-04-27, 15:42   #45
Serpentine Vermin Jar

Jul 2014

5·677 Posts

Quote:
 Originally Posted by lycorn Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best shot as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM with a large number of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...
Over the weekend I had a few machines running Prime95 doing just stage 1 at B1=29e8 for M1277. I spit out 240 curves between the machines using P95 just for the stage 1.

Now I took those 240 results and between these same systems I can run 10 at a time without causing memory issues (I'm assuming 20 GB per instance). Each instance will process 24 of the results. In detail, it's 4 machines (2x 6-core Xeons each). 3 of the 4 had enough extra RAM to run 2 at once, and one of them was only using 50 GB out of 144 GB, so I'm giving it 4 to run at once. Too bad that machine has slightly slower CPUs (2.53 GHz instead of 3.47 GHz... oh well).

I just slightly tweaked the process I put together for having gmp-ecm do everything... mostly I didn't want to click around a bunch to set affinity and priority, and I only had to modify it slightly to grab a particular results file.

Anyway, those are all going now. I can't remember what it was last time, but I think it was taking something like 90 minutes or so for it to run stage 2 on each result. I guess in 36 hours I should have all 240 curves finished with stage 2.

BTW, I just let gmp-ecm figure out B2 (it's using B2=105101237217912).
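
As a rough sanity check of the numbers in this post, here is a minimal Python sketch. The function names are hypothetical, the 20 GB per instance figure is the poster's own assumption, and the 90 minutes per curve is the recalled estimate rather than a measurement.

```python
import math

def stage2_wall_hours(curves, instances, minutes_per_curve):
    """Wall-clock hours when curves are split evenly across parallel instances."""
    curves_per_instance = math.ceil(curves / instances)
    return curves_per_instance * minutes_per_curve / 60

def max_instances(free_ram_gb, gb_per_instance=20):
    """Number of stage 2 instances that fit in free RAM (20 GB each is assumed)."""
    return int(free_ram_gb // gb_per_instance)

# 240 stage 1 results, 10 GMP-ECM instances, roughly 90 minutes per stage 2
print(stage2_wall_hours(curves=240, instances=10, minutes_per_curve=90))

# The machine using 50 GB of its 144 GB has 94 GB free
print(max_instances(144 - 50))
```

With these inputs the estimate comes out at the 36 hours mentioned above, and the 94 GB of free RAM fits the 4 instances given to that machine.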

2015-04-27, 15:47   #46
Serpentine Vermin Jar

Jul 2014

5×677 Posts

Quote:
 Originally Posted by Madpoo Anyway, those are all going now. I can't remember what it was last time, but I think it was taking something like 90 minutes or so for it to run stage 2 on each result. I guess in 36 hours I should have all 240 curves finished with stage 2.
It occurred to me that my previous tests were just running one per machine. Doing 2 (or 4) on the same system may create memory contention well beyond what I saw before and slow down the individual threads. I guess I'll find out.

I guess I could try setting the affinity on multiple threads to different NUMA nodes, but I don't even know if that would help. When the process started up it didn't have an affinity so the Windows scheduler (which is NUMA aware) probably didn't know to try and give it memory on those channels?

One more reason the affinity setting *should* be baked into the EXE so it's defined at launch.

2015-04-27, 18:15   #47
Madpoo
Serpentine Vermin Jar

Jul 2014

5·677 Posts

Quote:
 Originally Posted by Madpoo It occurred to me that my previous tests were just running one per machine. Doing 2 (or 4) on the same system may create memory contention well beyond what I saw before and slow down the individual threads. I guess I'll find out.
It seems like it might be just fine. On the faster 3.47 GHz systems it's still taking about 94 minutes to do stage 2 on each one, which is about the same as before when only one was running (this one has 2 going). I did manually change the affinity so each instance was on its own NUMA node, which may have been a good idea.

On the slower 2.53 GHz systems, which I didn't benchmark previously, it's doing each stage 2 in about 128 minutes.

2.53 GHz / 3.47 GHz = 0.73
94 minutes / 128 minutes = 0.73

So yeah, seems like it scales pretty evenly with CPU speed, everything else being equal (similar servers, similar mem speed, same Xeon class CPU, etc.)

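
The clock-speed comparison in this post can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes stage 2 time is inversely proportional to clock speed, which is the hypothesis being checked and not a general rule.

```python
fast_ghz, slow_ghz = 3.47, 2.53   # clock speeds quoted in the post
fast_min, slow_min = 94, 128      # measured stage 2 minutes per curve

clock_ratio = slow_ghz / fast_ghz
time_ratio = fast_min / slow_min

# If time scaled exactly with 1/clock, the slow machine would need:
predicted_slow_min = fast_min * fast_ghz / slow_ghz

print(f"clock ratio {clock_ratio:.2f}, time ratio {time_ratio:.2f}")
print(f"predicted {predicted_slow_min:.1f} min vs measured {slow_min} min")
```

Both ratios round to 0.73, and the predicted time for the slower machine lands within about a minute of the measured 128 minutes.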
2015-04-28, 00:16   #48
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

Well, that's what I call a powerhouse! Those 240 curves will end up counting as probably more than 1000 at the default/current bounds set in the PrimeNet server.
Have you measured the time each curve is taking on S1? According to some posts in this thread, the ratio S1:S2 should be ~ 1:0.7. So if S2 is taking 94 minutes and S1 more than 94/0.7 = 134 minutes, you could try larger B2 values until you attain that ratio.

2015-04-28, 00:34   #49
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

Quote:
 Originally Posted by R.D. Silverman I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers? If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would be very fast.... I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).
I've timed some P95 and GMP-ECM runs, and in general P95 S1 is faster than GMP-ECM's. I still have just a few data points, so the results aren't yet that meaningful. I will try to find some time to run some more, and also to compare different B1 values for the same exponent size. I will then post some more robust conclusions.
It's already becoming apparent, though, that for smaller exponents the difference is not as big as for larger ones, in line with what you wrote in your post.

2015-04-28, 03:30   #50
Serpentine Vermin Jar

Jul 2014

5×677 Posts

Quote:
 Originally Posted by lycorn Well, that's what I call a powerhouse! Those 240 curves will end up counting as probably more than 1000 at the default/current bounds set in the PrimeNet server. Have you measured the time each curve is taking on S1? According to some posts in this thread, the ratio S1:S2 should be ~ 1:0.7. So if S2 is taking 94 minutes and S1 more than 94/0.7 = 134 minutes, you could try larger B2 values until you attain that ratio.
Pppphhhbbth.

The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95.

Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?
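
The "around 4 hours" figure follows directly from the 1:0.7 ratio quoted above. A minimal sketch of that arithmetic, assuming the 1:0.7 target from the quoted post (the function name and the `s2_per_s1` parameter are hypothetical, and other target ratios can be passed in):

```python
def target_stage2_minutes(stage1_minutes, s2_per_s1=0.7):
    """Stage 2 time that would give an S1:S2 ratio of 1:s2_per_s1."""
    return stage1_minutes * s2_per_s1

stage1 = 5 * 60 + 45   # 345 minutes of stage 1 in Prime95 (from the post)
stage2 = 128           # minutes of stage 2 in GMP-ECM on the same server

current_ratio = stage2 / stage1
target = target_stage2_minutes(stage1)

print(f"current S1:S2 = 1:{current_ratio:.2f}")
print(f"target stage 2 time = {target / 60:.1f} h")
```

The current run sits near 1:0.37, so stage 2 would need to take roughly twice as long to reach 1:0.7.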

2015-04-28, 04:14   #51
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

3³·11·19 Posts

Quote:
 Originally Posted by Madpoo Pppphhhbbth. The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95. Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?
Using the -B2scale 4 flag when calling stage 2 GMP-ECM will multiply B2 by four, which will double the memory requirement and double the stage 2 time. If your server can handle that memory load, that will be more efficient (though only 5% or so more efficient than your current default settings; not an "OMG must do!" issue).

Edit: It's less clear that increasing B2 without increasing memory (which increases the number of steps to finish stage 2, a parameter GMP-ECM calls "k") will prove more efficient. If you are memory-limited to this current stage 2 footprint, it may be that only a small increase in B2 is worthwhile. B2 increases in large steps, corresponding to a unit change in k-value. If your current test uses k=3, the next B2 would be 1/3rd bigger and k=4 for same memory footprint. GMP-ECM by default uses k values 2 through 6, followed by a doubling of memory and reset to k=2 for a bigger more efficient work-chunk. If you set maxmem={number too small for default k choice}, the program will stick to the smaller work-chunk-size, increasing k beyond 6. This is usually less efficient, but experimentation is required (depends on individual machine specs).

Last fiddled with by VBCurtis on 2015-04-28 at 04:23 Reason: added detail about k parameter
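
The two scaling rules in this post can be put into a small Python sketch. Both functions are hypothetical helpers, not part of GMP-ECM. The square-root interpolation in the first one is an assumption: the post only gives the single data point that B2 x4 doubles memory and time. The second assumes B2 is proportional to k at a fixed memory footprint, as the k=3 to k=4 example implies.

```python
import math

def b2scale_effect(scale):
    """Assumed rule: memory and stage 2 time both grow like sqrt(B2 scale).

    Matches the post's data point (scale 4 -> memory x2, time x2);
    other values are an extrapolation, not a measured result.
    """
    factor = math.sqrt(scale)
    return factor, factor

def b2_growth_fixed_memory(k_from, k_to):
    """Relative B2 when only k changes, assuming B2 is proportional to k."""
    return k_to / k_from

mem_factor, time_factor = b2scale_effect(4)
print(f"-B2scale 4: memory x{mem_factor:.1f}, stage 2 time x{time_factor:.1f}")

growth = b2_growth_fixed_memory(3, 4)
print(f"k=3 -> k=4 at fixed memory: B2 x{growth:.2f}")
```

Going from k=3 to k=4 gives a B2 about one third bigger, which is the step size described above.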

2015-04-28, 06:08   #52
lorgix

Sep 2010
Scandinavia

3·5·41 Posts

I'm pretty sure B2 should be increased, even with a memory constraint.

2015-04-28, 10:44   #53
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

2²×1,877 Posts

Quote:
 Originally Posted by Madpoo Pppphhhbbth. The server that was taking 128 minutes in stage 2 with gmp-ecm was taking 5 hours, 45 minutes in stage 1 with P95. Sounds like you're saying that I should really be goosing up B2 until it takes around 4 hours in stage 2?
If people would ever bother to READ my joint paper with Sam Wagstaff, they would learn that
optimal ECM performance is obtained when one spends the same amount of TIME in step 1
and step 2.

2015-04-28, 12:09   #54
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

According to posts #16, 22 and 32 of this thread, written by someone who apparently has read your papers, the ratio is 1:0.7, hence my observation. I lack the math background to fully understand what's involved, so I trusted what seemed to come from a reliable source. I'm obviously happy to be corrected by someone as qualified in the subject as yourself.

2015-04-28, 12:33   #55
ATH
Einyen

Dec 2003
Denmark

19·181 Posts

Quote:
 Originally Posted by R.D. Silverman I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers? If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would be very fast.... I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).
Prime95 and GMPECM running on 1 core each, stage1 only:

Code:
Number   B1     Prime95    GMP-ECM
M1277    110M   16 min     27 min
M1277    44M    6.5 min    9.5 min
M2137    44M    8.5 min    29 min
M10061   44M    34.5 min   277 min

Last fiddled with by ATH on 2015-04-28 at 12:33
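
To make the trend in these timings easier to see, here is a short Python sketch that computes how many times slower GMP-ECM is than Prime95 for each row. The timings are copied from the table above; the dictionary layout and variable names are just for illustration.

```python
# (number, B1) -> (Prime95 minutes, GMP-ECM minutes), stage 1 only, 1 core each
timings = {
    ("M1277", "110M"): (16.0, 27.0),
    ("M1277", "44M"): (6.5, 9.5),
    ("M2137", "44M"): (8.5, 29.0),
    ("M10061", "44M"): (34.5, 277.0),
}

# Ratio > 1 means Prime95 is faster by that factor
speedups = {key: gmp / p95 for key, (p95, gmp) in timings.items()}

for (number, b1), ratio in speedups.items():
    print(f"{number:7s} B1={b1:5s} Prime95 is {ratio:.2f}x faster")
```

The ratio grows from about 1.5x at M1277 to about 8x at M10061, consistent with the observation that the gap widens for larger exponents.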

All times are UTC.