mersenneforum.org How to use prime95 for stage 1 & GMP-ECM for stage 2

2015-04-22, 18:38   #35
lorgix

Sep 2010
Scandinavia

3×5×41 Posts

> log.txt is how I log results. -n sets low priority. Optimal bounds depend on many different things. Luckily, the efficiency curve is pretty flat around the optimal. Adjusting bounds based on experience is not hard.
2015-04-22, 19:45   #36
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5639₁₀ Posts

Quote:
 Originally Posted by Madpoo and considering that experienced users know about optimizing things, could it automatically pick optimal bounds for the type of work? Ideally it'd be cool if Prime95 used the faster code that GMP-ECM apparently uses so there's not the back and forth shuffle in the first place but I know George is busy. His list of "nice to have" changes in Prime95 is probably long already.
How would we tell GMP-ECM what the type of work is, such that it could make a decision about optimal bounds? In present use, we supply it an input number and a B1 chosen to most quickly find factors of a certain size. We choose B1 based on how much previous factoring effort was done. To get GMP-ECM to do that, we would have to somehow have the input file include all previous factoring work, while also adding quite a bit of code to determine what B1 now makes sense for the memory available, size of composite, and size of hoped-for-factor. That would be pretty complicated, though comes fairly naturally to users of the program after some experience. YAFU and ecm.py both automate some of these choices- you can tell YAFU what digit-level of ECM has already been run, and to what digit level you wish to go and how many cores to use for ECM, and it automagically chooses B1/B2 bounds and fires up the proper number of ecm.exe processes.

I agree that it would be quite nice for ECM work overall for Prime95 to be able to call ecm directly for stage 2! Alas, a very small use case compared to the overall project.
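
As a rough illustration of the "what digit level has been done, what comes next" bookkeeping described above, here is a minimal Python sketch. The B1 values are the ones commonly quoted from the table in the GMP-ECM README; treat them as starting points rather than tuned optima. The function name `next_b1` and the table layout are made up for this example and are not part of YAFU, ecm.py or GMP-ECM.

```python
# B1 per target factor size in decimal digits, as commonly quoted from the
# GMP-ECM README. Assumption: starting points only, not tuned for any machine.
B1_BY_DIGITS = {
    35: 1_000_000,
    40: 3_000_000,
    45: 11_000_000,
    50: 43_000_000,
    55: 110_000_000,
    60: 260_000_000,
    65: 850_000_000,
    70: 2_900_000_000,
}


def next_b1(digits_done):
    """Return (target_digits, B1) for the first tabulated level above digits_done."""
    for digits in sorted(B1_BY_DIGITS):
        if digits > digits_done:
            return digits, B1_BY_DIGITS[digits]
    raise ValueError("no tabulated level above %d digits" % digits_done)


if __name__ == "__main__":
    # Previous effort covered the 65-digit level, so move on to the next one.
    print(next_b1(65))
```

Note that the 70-digit entry (29e8) is the same B1 used for the M1277 curves later in this thread.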

2015-04-23, 13:49   #37
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

@Madpoo:

The following considerations assume you will run Stage 1 on Prime95 and Stage 2 on GMP-ECM.

Make a copy of the Prime95 directory, and configure the program for a single worker. You'll be running Stage 1 from there.

Insert the line GmpEcmHook=1 in prime.txt.

Run P95 with B1=B2=the applicable default B1 bound for the given exponent (for 1277, B1=8e08). You may later adjust this value based on experience.

Run as many curves as you wish.

Upon finishing, the results.txt file from P95 will contain a bunch of Stage 1 residues, one from each curve.

Copy the results.txt file to the GMP-ECM home directory, open a cmd prompt and run ecm -v -resume results.txt -save 8e08-8e08. You may as well keep both executables and associated files in the same directory. It will save you the hassle of moving files around.

GMP-ECM will sweep through the residues file, running Stage 2 for each one, and recording the residues in the file chosen for saving (GMP-ECM creates the file, you just supply the name).

Upon finishing, the save file will contain the Stage 2 residues. To report the results to the server, contact GW - he may get you sorted out.

To set the affinity, I use the task manager. It's really not a must to set the affinity, but it keeps ecm from stealing resources from the lower priority workers - I do it because when running ecm on one core and P95 on the remaining ones, if I don't stick ecm to the idle core, the CPU usage of the P95 workers decreases.

I appreciate there's a difference between running this combo of programs at home, on a single desktop computer, and in a datacenter environment, where the ability to automate tasks is a must. But anyway, if you want to give it a shot, that's how I do it. As there is some manual work involved, you may wish to do long runs - jobs that take a couple of days on each Stage - to reduce the manual overhead.

Try using much larger bounds, as suggested by VBCurtis, and see what it gives in terms of running time and memory usage. It would be very nice to put all that memory to good use! See the Readme file that comes with the GMP-ECM package for more info. HTH
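
For anyone who wants to script the hand-off, the Stage 2 step above could be wrapped roughly like this. It is only a sketch: with its defaults it reproduces the command exactly as quoted in the post, and the `extra` parameter is a hypothetical hook for any further arguments (for example bounds) that your GMP-ECM build expects. Check the GMP-ECM Readme before relying on it. The helper name `stage2_resume_command` is invented for this example.

```python
import shlex


def stage2_resume_command(residue_file="results.txt", save_name="8e08-8e08",
                          extra=(), ecm="ecm"):
    """Build the argv for feeding Prime95 stage 1 residues to GMP-ECM.

    -v, -resume and -save are the options used in the post. `extra` is an
    assumption: a place for additional arguments, not something the post shows.
    """
    return [ecm, "-v", "-resume", residue_file, "-save", save_name, *extra]


if __name__ == "__main__":
    argv = stage2_resume_command()
    # Print the command line that would be run.
    print(shlex.join(argv))
    # To actually launch it: subprocess.run(argv, check=True)
```

Building the command as a list (rather than one string) avoids quoting problems if the residue file lives in a path with spaces.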
2015-04-23, 16:33   #38
Serpentine Vermin Jar

Jul 2014

5·677 Posts

Quote:
 Originally Posted by lycorn @Madpoo: The following considerations assume you will run Stage 1 on Prime95 and Stage 2 on GMP-ECM. [...]
Thanks, that's good advice. That's pretty much what I ended up doing, although I thought the -save option wouldn't save the output it shows... maybe it doesn't exactly but it would save the important stuff.

It's probably worth setting the affinity anyway even if you don't also have Prime95 running, just because Windows will switch you around to different cores at whim and you'll lose any benefit of the core caching. There'd be a shared L3 cache which won't matter but the L1/L2 caching could have a benefit. And if you have a multi-socket system then switching between NUMA nodes would be detrimental.

I started a run last night with the output of some Prime95 stage 1 results. I have a file of 24 curves of M1277 with B1=29e8.

Feeding those into GMP-ECM tells me that stage 2 will use an estimated 17GB, although I see ecm.exe using nearly 19 GB currently. Fortunately this dev system is only using 110 of its 144 GB right now (including that 19 GB of ECM). I guess I could run another one. :)

Here's the output of one of the stage 2 runs... maybe you can help figure out if everything looks okay. I was a little confused about the input number being 0x1FFF... but it does say it's doing "special division for factor of 2^1277-1"

One of the main takeaways there is stage 2 took 5754 seconds (let's call it 96 minutes). I guess that's okay for a 29e8 B1 for stage 2? The CPU is one of the cores of an X5690 @ 3.47 GHz. It's done 6 of the 24 so far and that timing is pretty consistent. Longest one was 5815 seconds, not that much off from the 5754 of the quickest one.

Code:
GMP-ECM 6.4.4 [configured with MPIR 2.6.0] [ECM]
Resuming ECM residue saved with Prime95
Input number is 0x1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (385 digits)
Using special division for factor of 2^1277-1
Using B1=2900000000-2900000000, B2=105101237217912, polynomial Dickson(30), sigma=3389447693745215
dF=2097152, k=2, d=23130030, d2=13, i0=113
Expected number of curves to find a factor of n digits:
35	40	45	50	55	60	65	70	75	80
10	28	89	309	1175	4842	21459	102212	513971	2730842
Step 1 took 15ms
Using 44 small primes for NTT
Estimated memory usage: 17G
Initializing tables of differences for F took 8703ms
Computing roots of F took 687282ms
Building F from its roots took 623860ms
Computing 1/F took 229328ms
Initializing table of differences for G took 2421ms
Computing roots of G took 758954ms
Building G from its roots took 619859ms
Computing roots of G took 777922ms
Building G from its roots took 621860ms
Computing G * H took 120859ms
Reducing  G * H mod F took 123829ms
Computing polyeval(F,G) took 1162766ms
Computing product of all F(g_i) took 2953ms
Step 2 took 5753906ms
Expected time to find a factor of n digits:
35	40	45	50	55	60	65	70	75	80
16.51h	1.89d	5.92d	20.59d	78.24d	322.47d	3.92y	18.65y	93.78y	498.26y
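
The "expected time" row can be roughly cross-checked from the other numbers in this output: it is approximately the expected number of curves times the time per curve. The sketch below uses only figures printed above; the variable and function names are made up for the example. Small differences from ecm's own figures are to be expected because the printed curve counts are rounded, and since Step 1 was resumed (15 ms), the Prime95 stage 1 time is presumably not included in these estimates.

```python
# Figures copied from the ecm output above.
STEP1_MS = 15            # "Step 1 took 15ms" (residue resumed from Prime95)
STEP2_MS = 5_753_906     # "Step 2 took 5753906ms"
EXPECTED_CURVES = {35: 10, 40: 28, 45: 89, 50: 309, 55: 1175, 60: 4842, 65: 21459}


def expected_seconds(digits):
    """Expected wall time to find a factor of `digits` digits, in seconds."""
    return EXPECTED_CURVES[digits] * (STEP1_MS + STEP2_MS) / 1000.0


minutes_per_curve = STEP2_MS / 1000 / 60
hours_35 = expected_seconds(35) / 3600
years_65 = expected_seconds(65) / (365 * 86400)

if __name__ == "__main__":
    print(f"stage 2 per curve: {minutes_per_curve:.1f} min")
    print(f"35 digits: {hours_35:.2f} h (ecm printed 16.51h)")
    print(f"65 digits: {years_65:.2f} y (ecm printed 3.92y)")
```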

2015-04-23, 17:26   #39
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

1,571 Posts

I think it looks just fine. Pretty much what I would expect.

The "weird" string of F's is actually your input number (2^1277-1) written in hexadecimal; in decimal it is 385 digits long.

The memory use is consistent with the values I get on my system. Using B1=8e08, the estimated mem usage is 4049 MB, but during the run the usage fluctuates as the computation progresses, and may reach 4600 MB. So estimating 17GB and using up to 19 seems OK.

The times also seem reasonable, compared with the ones on my system (much better than mine, actually...).

All in all, I think you're doing pretty well. Hope you'll find a factor soon... I'm not making fun, just being optimistic. It's good to get help from dream machines like the ones you use.

One last note: Comparing the number of curves estimated by P95 alone to find a 65-digit factor (360,000) with the number estimated by the combo you're using (21,459) shows how much more effective your setting is, even noting that each of "your" curves takes longer to run.

Last fiddled with by lycorn on 2015-04-23 at 17:31
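
A quick way to convince yourself about the input number is to let Python print it. 1277 = 4×319 + 1, so in hexadecimal 2^1277-1 is a single 1 followed by 319 F's, and the "(385 digits)" in the ecm output counts decimal digits. This is just a sanity check, not part of the ECM workflow.

```python
n = 2**1277 - 1

hex_digits = format(n, "X")   # hexadecimal, without the 0x prefix
decimal_digits = len(str(n))  # number of decimal digits

if __name__ == "__main__":
    print(hex_digits[:8] + "...", len(hex_digits), "hex digits")
    print(decimal_digits, "decimal digits")
    print(n.bit_length(), "bits, all set")
```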
2015-04-23, 18:15   #40
Serpentine Vermin Jar

Jul 2014

3385₁₀ Posts

Quote:
 Originally Posted by lycorn I think it looks just fine. Pretty much what I would expect. The "weird" string of F's is actually your input number (2^1277-1) written in hexadecimal; in decimal it is 385 digits long.
Doh! I should have known that. All the FF's should have tipped me that it was the 2^x-1 at work.

Quote:
 Originally Posted by lycorn The memory use is consistent with the values I get on my system. Using B1=8e08, the estimated mem usage is 4049 MB, but during the run the usage fluctuates as the computation progresses, and may reach 4600 MB. So estimating 17GB and using up to 19 seems OK. The times also seem reasonable, compared with the ones on my system (much better than mine, actually...). All in all, I think you're doing pretty well. Hope you'll find a factor soon... I'm not making fun, just being optimistic. It's good to get help from dream machines like the ones you use. One last note: Comparing the number of curves estimated by P95 alone to find a 65-digit factor (360,000) with the number estimated by the combo you're using (21,459) shows how much more effective your setting is, even noting that each of "your" curves takes longer to run.
Thanks for the feedback. It's helpful to know I'm not doing something totally stupid beforehand in case I do throw some resources at this here and there.

I'm still doing triple-checks on self-verified LL runs and I figure this might be another fun mini-project once that's out of the way.

2015-04-23, 19:43   #41
lorgix

Sep 2010
Scandinavia

3×5×41 Posts

Quote:
 Originally Posted by Madpoo Thanks for the feedback. It's helpful to know I'm not doing something totally stupid beforehand in case I do throw some resources at this here and there. I'm still doing triple-checks on self-verified LL runs and I figure this might be another fun mini-project once that's out of the way.
See post #16 in this thread for how to optimize the bounds. You will arrive at a higher B2.
Use -maxmem to limit RAM usage if you want to.

2015-04-25, 17:57   #42
Serpentine Vermin Jar

Jul 2014

5·677 Posts

Quote:
 Originally Posted by Madpoo If I have a system with 12 cores, it's very easy to have Prime95 manage all the work with a nice, tidy worktodo file, and I can set the affinity so each worker is using its own core.
I managed to script out the finer points of running multiple "ecm.exe" processes at once.

Notably, on the 12-core (2x6 core) system I'm testing on, I wanted to run 12 instances of ECM at once, each with affinity to a specific core, running in "Idle" priority, and logging its output to its own file.

The PowerShell command to set affinity/priority on a running process would have failed if there were multiple processes with the same name. Solution: make copies of "ecm.exe" named "ecm1.exe" through "ecm12.exe". Done.

To launch ECM itself, I kick it off in its own command window:
start /min cmd /c ecm%corenum%.exe -v -c %curves% -inp %infile% %b1% > %infile%.out

In the batch file I set corenum, curves, b1 and infile to whatever (the "infile" might simply be named "1277" and contains "2^1277-1")

It kicks off a command console in its own minimized window, running that specifically named exe file.
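
The same launch plan can be sketched outside of a batch file. The Python below only builds the per-core command lines and log names; it does not start anything. The options -v, -c and -inp are the ones used in the batch line above. The function name `plan_jobs` and the per-core log naming are assumptions for the example (the batch line writes to %infile%.out, so it relies on infile differing per instance to keep logs separate).

```python
def plan_jobs(n_cores, curves, infile, b1):
    """Return one job description per core: the ecm argv and a log file name.

    Assumes copies of ecm.exe named ecm1.exe .. ecmN.exe exist, as in the post.
    """
    jobs = []
    for corenum in range(1, n_cores + 1):
        jobs.append({
            "corenum": corenum,
            "argv": [f"ecm{corenum}.exe", "-v", "-c", str(curves),
                     "-inp", infile, str(b1)],
            # Hypothetical naming so that every instance gets its own log.
            "log": f"{infile}.core{corenum}.out",
        })
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(12, 100, "1277", "29e8"):
        print(job["corenum"], " ".join(job["argv"]), ">", job["log"])
    # To launch a job for real, something like:
    #   subprocess.Popen(job["argv"], stdout=open(job["log"], "w"))
```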

Then I have to pause about a second (do this however... I use the command line replacement "TCMD" from JPSoft which has a "delay" command). That gives the exe time to start up before the next step.

That next step is to run a simple Powershell set of commands:
PowerShell "$Process = Get-Process ecm%corenum%; $Process.ProcessorAffinity = %mask%; $Process.PriorityClass = 'Idle'"

For that to work you would need to set the %mask% variable to the affinity bitmask that pins the "corenum" instance to one specific logical CPU.

For my Windows system with hyperthreading enabled, cpus 1 and 2 are the physical and HT of one core, 3 and 4 are the next, etc.

So if corenum=1 then I'd want the mask to be 0x1, corenum=2 would be 0x4, corenum=3 is 0x10, etc. Going back to my preference for TCMD as a command replacement, it's easy to get the decimal mask (which PowerShell can use) with this little thing:

(just does a shift left of 0x1 by 2×(corenum-1) bits; the factor of 2 is there because I skip over the HT cores).

Or you can do some "if %corenum%==5 set mask=256" things to keep it in the realm of "cmd.exe" compatible.
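
For reference, the mask arithmetic described above fits in a couple of lines of Python. It assumes the layout from the post: hyperthreading on, logical CPUs numbered so that each physical core owns two consecutive ones, and corenum counted from 1. The function name `affinity_mask` is made up for the example.

```python
def affinity_mask(corenum, threads_per_core=2):
    """Bitmask selecting the first logical CPU of physical core `corenum` (1-based)."""
    if corenum < 1:
        raise ValueError("corenum starts at 1")
    # Skip threads_per_core logical CPUs for every physical core before this one.
    return 1 << (threads_per_core * (corenum - 1))


if __name__ == "__main__":
    for corenum in (1, 2, 3, 5):
        mask = affinity_mask(corenum)
        print(corenum, mask, hex(mask))
```

corenum=5 gives 256, matching the "if %corenum%==5 set mask=256" example; with hyperthreading off you would pass threads_per_core=1.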

It's a little Rube Goldberg'ish but it works.

For doing stage 2 work where Prime95 did the stage 1, I could work that in as well but because of the memory usage I could probably only run 1 or maybe 2 on a machine at once anyway, and at that point it's fine to just manually set the affinity/priority as I feed it a list of a couple hundred stage 1 curves to finish and leave it alone.

2015-04-26, 01:14   #43
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

623₁₆ Posts

Quote:
 Originally Posted by Madpoo For doing stage 2 work where Prime95 did the stage 1, I could work that in as well but because of the memory usage I could probably only run 1 or maybe 2 on a machine at once anyway, and at that point it's fine to just manually set the affinity/priority as I feed it a list of a couple hundred stage 1 curves to finish and leave it alone.
Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best shot as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM with a large number of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...

2015-04-27, 14:01   #44
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

2²·1,877 Posts

Quote:
 Originally Posted by lycorn Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best shot as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM with a large number of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...
I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers?
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would
be very fast....

I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).
