mersenneforum.org  

2015-04-22, 18:15   #34
Madpoo
Serpentine Vermin Jar
Jul 2014
5×677 Posts

As a somewhat "fly on the wall" observer so far, I'm just dipping my toes into the GMP-ECM thing.

I have to admit, it's not very user friendly, and I mean that with all due respect, nothing towards how the actual code itself works.

It's just that, let's say I have several servers with many spare GBs of RAM available; maybe I want to throw some work its way.

If I have a system with 12 cores, it's very easy to have Prime95 manage all the work with a nice, tidy worktodo file, and I can set the affinity so each worker is using its own core.

So what it sounds like is that I should use this nice ecosystem for stage 1, and then get into hack mode to pipe all that into GMP-ECM (albeit running on Windows... Linux may be much nicer). I had a heckuva time getting just one instance of "ecm.exe" pinned to a single core. I can do a "start /affinity <hex mask> ecm.exe ..." from the command line, and that's about as close as I could get to truly automating the affinity. Otherwise I was launching it and then using task manager to set the affinity after the fact, manually. No thanks. The catch is that there are some things you can't do when "START"ing with affinity, like pipe stdin to the process you're launching or redirect its output. You would kind of need one of the 3rd party (or Microsoft) apps to change the affinity of a running process, and be able to script it.

Oh, and bear in mind, in my case I could be running 12-20 instances at once if I had the memory available.

Then there's the question of output from the program. If I use the start /affinity, it launches in a new console window and when the program finishes, it closes it. Unless I had the foresight to capture all of the output by piping it somewhere, if it did manage to find anything it'd be lost. And while I found the -inp option to read the work in from a file that contains something like "2^1277-1" I had no luck finding a switch that would output any text, just the large save file. And when using "start" it's not always easy to redirect console output to a file with any certainty (you may wind up redirecting the output of your "start" command and not the program you're actually starting).

See the dilemma? I appreciate that GMP-ECM is faster, but for a simple fella like myself it would take far too much effort to actually use it for more than just tinkering. Then there's the issue of getting some kind of result out of it that could be fed into Primenet. In theory it's not that hard to output a text line similar to what mfaktc does. All the manual result page does is parse the text for the relevant info, and as George said, ECM results are accepted on the honor system.

Lest this come across as mere complaining, I'll be more specific in what would be nice to see. Can GMP-ECM, at least the Win64 compiled version, do these things:
  • a switch to output text to a file instead of (or as well as) the screen
  • a switch to set CPU affinity for the process
  • and considering that experienced users know about optimizing things, could it automatically pick optimal bounds for the type of work?

Ideally it'd be cool if Prime95 used the faster code that GMP-ECM apparently uses so there's not the back and forth shuffle in the first place but I know George is busy. His list of "nice to have" changes in Prime95 is probably long already.

2015-04-22, 18:38   #35
lorgix
Sep 2010
Scandinavia
3×5×41 Posts

<factorme.txt >> log.txt
is how I log results.
-n sets low priority.
Optimal bounds depend on many different things. Luckily, the efficiency curve is pretty flat around the optimal.
Adjusting bounds based on experience is not hard.
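
For anyone who would rather script that redirection than type it, here is a minimal Python sketch of the same idea. The helper name `run_logged` is made up for illustration, and the `ecm` command line in the usage comment is an assumption pieced together from the posts above, not a tested invocation. Unlike the plain `>>` above, this version also sends error output to the log.

```python
import subprocess

def run_logged(cmd, input_path, log_path):
    """Run cmd with stdin read from input_path and stdout/stderr appended
    to log_path, like `cmd < input_path >> log_path 2>&1` in a shell."""
    with open(input_path, "rb") as fin, open(log_path, "ab") as flog:
        # Append mode keeps earlier results, so nothing is lost between runs.
        result = subprocess.run(cmd, stdin=fin, stdout=flog,
                                stderr=subprocess.STDOUT)
    # The exit code is handed back unchecked; the caller decides what it means.
    return result.returncode

# Hypothetical usage (the B1 value is only an example):
# run_logged(["ecm", "-n", "8e8"], "factorme.txt", "log.txt")
```

Because the log is opened by the script itself, the output ends up in the file no matter which console window the process runs in.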

2015-04-22, 19:45   #36
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
5639₁₀ Posts

Quote:
Originally Posted by Madpoo
and considering that experienced users know about optimizing things, could it automatically pick optimal bounds for the type of work?
Ideally it'd be cool if Prime95 used the faster code that GMP-ECM apparently uses so there's not the back and forth shuffle in the first place but I know George is busy. His list of "nice to have" changes in Prime95 is probably long already.
How would we tell GMP-ECM what the type of work is, such that it could make a decision about optimal bounds? In present use, we supply it an input number and a B1 chosen to most quickly find factors of a certain size. We choose B1 based on how much previous factoring effort was done. To get GMP-ECM to do that, we would have to somehow have the input file include all previous factoring work, while also adding quite a bit of code to determine what B1 now makes sense for the memory available, size of composite, and size of hoped-for-factor. That would be pretty complicated, though comes fairly naturally to users of the program after some experience. YAFU and ecm.py both automate some of these choices- you can tell YAFU what digit-level of ECM has already been run, and to what digit level you wish to go and how many cores to use for ECM, and it automagically chooses B1/B2 bounds and fires up the proper number of ecm.exe processes.
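
To make the "digit level" idea concrete, here is a small Python sketch of the kind of lookup such a wrapper might do. It is not how YAFU or ecm.py actually work internally. The table holds the B1 values commonly quoted per factor size (assumed here to match the table in the GMP-ECM README; check the copy shipped with your version), and `next_b1` is a hypothetical helper name.

```python
# B1 values commonly quoted for each target factor size (decimal digits).
# Assumption: these follow the table in the GMP-ECM README; verify them
# against your own copy before relying on them.
DEFAULT_B1 = {
    35: 1_000_000,
    40: 3_000_000,
    45: 11_000_000,
    50: 43_000_000,
    55: 110_000_000,
    60: 260_000_000,
    65: 850_000_000,
    70: 2_900_000_000,
}

def next_b1(digits_done):
    """Return (target_digits, B1) for the first level above digits_done."""
    for digits in sorted(DEFAULT_B1):
        if digits > digits_done:
            return digits, DEFAULT_B1[digits]
    raise ValueError("no tabulated level above %d digits" % digits_done)

print(next_b1(65))  # level to try once the 65-digit work is considered done
```

A real tool would also have to weigh memory, the size of the composite and partial progress within a level, which is the complicated part described above.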

I agree that it would be quite nice for ECM work overall for Prime95 to be able to call ecm directly for stage 2! Alas, a very small use case compared to the overall project.

2015-04-23, 13:49   #37
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts

@Madpoo:

The following considerations assume you will run Stage 1 on Prime95 and Stage 2 on GMP-ECM.

Make a copy of the Prime95 directory, and configure the program for a single worker. You'll be running Stage 1 from there.
Insert the line GmpEcmHook=1 in prime.txt.
Run P95 with B1=B2=the applicable default B1 bound for the given exponent (for 1277, B1=8e08). You may later adjust this value based on experience. Run as many curves as you wish.
Upon finishing, the results.txt file from P95 will contain a bunch of Stage 1 residues, one from each curve.
Copy the results.txt file to the GMP-ECM home directory, open a cmd prompt and run ecm -v -resume results.txt -save <choose a file name> 8e08-8e08. You may as well keep both executables and associated files in the same directory. It will save you the hassle of moving files around.
GMP-ECM will sweep through the residues file, running Stage 2 for each one, and recording the residues in the file chosen for saving (GMP-ECM creates the file, you just supply the name).
Upon finishing, the save file will contain the Stage 2 residues.
To report the results to the server, contact GW - he may get you sorted out.
To set the affinity, I use the task manager. It's not strictly necessary to set the affinity, but it keeps ecm from stealing resources from the lower priority workers - I do it because when running ecm on one core and P95 on the remaining ones, if I don't pin ecm to the idle core, the CPU usage of the P95 workers decreases.
I appreciate there's a difference between running this combo of programs at home, on a single desktop computer, and in a datacenter environment, where the ability to automate tasks is a must. But anyway, if you want to give it a shot, that's how I do it. As there is some manual work involved, you may wish to do long runs - jobs that take a couple of days on each Stage - to reduce the manual overhead. Try using much larger bounds, as suggested by VBCurtis, and see what it gives in terms of running time and memory usage. It would be very nice to put all that memory to good use!
See the Readme file that comes with the GMP-ECM package for more info.
HTH
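
If the Stage 2 step is going to be scripted, the command line from the steps above can be assembled like this. It is a rough sketch only: `resume_command` is a made-up helper name, the flags are exactly the ones quoted in the post, and the save file name in the usage comment is just a placeholder.

```python
def resume_command(residues, save_file, b1="8e08", exe="ecm"):
    """Build the argument list for the Stage 2 step described above."""
    # -resume reads Prime95's Stage 1 residues; -save names the output file.
    # The final "B1-B1" argument repeats the Stage 1 bound, as in the post.
    return [exe, "-v", "-resume", residues, "-save", save_file,
            "%s-%s" % (b1, b1)]

print(" ".join(resume_command("results.txt", "stage2.sav")))

# Hypothetical usage, once ecm is on the PATH:
# import subprocess
# subprocess.run(resume_command("results.txt", "stage2.sav"))
```

Keeping the command as a list rather than one string avoids quoting problems if a file name contains spaces.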

2015-04-23, 16:33   #38
Madpoo
Serpentine Vermin Jar
Jul 2014
5·677 Posts

Thanks, that's good advice. That's pretty much what I ended up doing, although I thought the -save option wouldn't save the output it shows... maybe it doesn't exactly but it would save the important stuff.

It's probably worth setting the affinity anyway even if you don't also have Prime95 running, just because Windows will switch you around to different cores at whim and you'll lose any benefit of the core caching. There'd be a shared L3 cache which won't matter but the L1/L2 caching could have a benefit. And if you have a multi-socket system then switching between NUMA nodes would be detrimental.

I started a run last night with the output of some Prime95 stage 1 results. I have a file of 24 curves of M1277 with B1=29e8.

Feeding those into GMP-ECM tells me that stage 2 will use an estimated 17GB, although I see ecm.exe using nearly 19 GB currently. Fortunately this dev system is only using 110 of its 144 GB right now (including that 19 GB of ECM). I guess I could run another one. :)
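
The "could I run another one" arithmetic is just integer division on the free memory. A tiny Python sketch with the numbers from this post (the function name is made up for illustration, and it assumes each extra instance peaks at the observed 19 GB rather than the estimated 17 GB):

```python
def extra_instances(total_gb, used_gb, peak_per_instance_gb):
    """How many more instances fit in the memory that is still free."""
    free_gb = total_gb - used_gb
    return max(0, free_gb // peak_per_instance_gb)

# 144 GB installed, 110 GB in use, 19 GB observed peak per ecm.exe
print(extra_instances(144, 110, 19))  # 34 GB free -> 1
```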

Here's the output of one of the stage 2 runs... maybe you can help figure out if everything looks okay. I was a little confused about the input number being 0x1FFF... but it does say it's doing "special division for factor of 2^1277-1"

One of the main takeaways there is stage 2 took 5754 seconds (let's call it 96 minutes). I guess that's okay for a 29e8 B1 for stage 2? The CPU is one of the cores of an X5690 @ 3.47 GHz. It's done 6 of the 24 so far and that timing is pretty consistent. Longest one was 5815 seconds, not that much off from the 5754 of the quickest one.

Code:
GMP-ECM 6.4.4 [configured with MPIR 2.6.0] [ECM]
Resuming ECM residue saved with Prime95 
Input number is 0x1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (385 digits)
Using special division for factor of 2^1277-1
Using B1=2900000000-2900000000, B2=105101237217912, polynomial Dickson(30), sigma=3389447693745215
dF=2097152, k=2, d=23130030, d2=13, i0=113
Expected number of curves to find a factor of n digits:
35	40	45	50	55	60	65	70	75	80
10	28	89	309	1175	4842	21459	102212	513971	2730842
Step 1 took 15ms
Using 44 small primes for NTT
Estimated memory usage: 17G
Initializing tables of differences for F took 8703ms
Computing roots of F took 687282ms
Building F from its roots took 623860ms
Computing 1/F took 229328ms
Initializing table of differences for G took 2421ms
Computing roots of G took 758954ms
Building G from its roots took 619859ms
Computing roots of G took 777922ms
Building G from its roots took 621860ms
Computing G * H took 120859ms
Reducing  G * H mod F took 123829ms
Computing polyeval(F,G) took 1162766ms
Computing product of all F(g_i) took 2953ms
Step 2 took 5753906ms
Expected time to find a factor of n digits:
35	40	45	50	55	60	65	70	75	80
16.51h	1.89d	5.92d	20.59d	78.24d	322.47d	3.92y	18.65y	93.78y	498.26y
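
As a sanity check on that last table, the "expected time" row appears to be simply the expected number of curves multiplied by the time one curve took in this run. A short Python sketch using the figures from the log above (the names are illustrative, and the curve counts are the rounded values as printed):

```python
STEP1_MS = 15        # "Step 1 took 15ms" (only the resume, not Prime95's work)
STEP2_MS = 5753906   # "Step 2 took 5753906ms"

# Expected curve counts copied from the log, keyed by factor size in digits.
CURVES = {35: 10, 40: 28, 45: 89, 50: 309, 55: 1175,
          60: 4842, 65: 21459, 70: 102212, 75: 513971, 80: 2730842}

def expected_days(digits):
    """Expected time in days: curves needed times time per curve."""
    seconds = CURVES[digits] * (STEP1_MS + STEP2_MS) / 1000.0
    return seconds / 86400.0

for digits in (45, 55, 65):
    print(digits, round(expected_days(digits), 2))
```

The 65-digit line comes out near 1429 days, about 3.9 years, in line with the 3.92y in the log. The small differences at other sizes come from the curve counts being printed as rounded integers. Note that the time Prime95 spent on stage 1 is not part of this estimate.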

2015-04-23, 17:26   #39
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts

I think it looks just fine. Pretty much what I would expect.
The "weird" string of F's is actually your input number (2^1277-1) written in hexadecimal (it is all 1s in binary); in decimal it is 385 digits long.
The memory use is consistent with the values I get in my system. Using B1=8e08, the estimated mem usage is 4049 MB, but during the run the usage fluctuates along the progress of the computation, and may reach 4600 MB. So estimating 17GB and using up to 19 seems OK.
The times also seem reasonable, compared with the ones on my system (much better than mine, actually...).
All in all, I think you're doing pretty well. Hope you'll find a factor soon... I'm not making fun, just being optimistic. It's good to get help from dream machines like the ones you use.

One last note: Comparing the number of curves estimated by P95 alone to find a 65-digit factor (360,000) with the number estimated by the combo you're using (21,459) shows how much more effective your setting is, even noting that each of "your" curves takes longer to run.
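
The shape of that "Input number is 0x1FFF..." line is easy to confirm with a few lines of Python, since the 0x prefix marks hexadecimal and a Mersenne number is all 1 bits:

```python
n = 2**1277 - 1

hex_digits = format(n, "x")
# 1277 = 4*319 + 1, so the hex form is a single "1" followed by 319 "f"s.
print(hex_digits[0], set(hex_digits[1:]), len(hex_digits))
print(len(str(n)))      # decimal digits: 385, as GMP-ECM reports
print(n.bit_length())   # binary digits: 1277, all of them 1
```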

Last fiddled with by lycorn on 2015-04-23 at 17:31

2015-04-23, 18:15   #40
Madpoo
Serpentine Vermin Jar
Jul 2014
3385₁₀ Posts

Quote:
Originally Posted by lycorn View Post
I think it looks just fine. Pretty much what I would expect.
The "weird" string of F's is actually your input number (2^1277-1) written in hexadecimal (it is all 1s in binary); in decimal it is 385 digits long.
Doh! I should have known that. All the FF's should have tipped me that it was the 2^x-1 at work.

Quote:
Originally Posted by lycorn View Post
The memory use is consistent with the values I get in my system. Using B1=8e08, the estimated mem usage is 4049 MB, but during the run the usage fluctuates along the progress of the computation, and may reach 4600 MB. So estimating 17GB and using up to 19 seems OK.
The times also seem reasonable, compared with the ones on my system (much better than mine, actually...).
All in all, I think you're doing pretty well. Hope you'll find a factor soon... I'm not making fun, just being optimistic. It's good to get help from dream machines like the ones you use.

One last note: Comparing the number of curves estimated by P95 alone to find a 65-digit factor (360,000) with the number estimated by the combo you're using (21,459) shows how much more effective your setting is, even noting that each of "your" curves takes longer to run.
Thanks for the feedback. It's helpful to know I'm not doing something totally stupid beforehand in case I do throw some resources at this here and there.

I'm still doing triple-checks on self-verified LL runs and I figure this might be another fun mini-project once that's out of the way.

2015-04-23, 19:43   #41
lorgix
Sep 2010
Scandinavia
3×5×41 Posts

Quote:
Originally Posted by Madpoo View Post
Thanks for the feedback. It's helpful to know I'm not doing something totally stupid beforehand in case I do throw some resources at this here and there.

I'm still doing triple-checks on self-verified LL runs and I figure this might be another fun mini-project once that's out of the way.
See post #16 in this thread for how to optimize the bounds. You will arrive at a higher B2.
Use -maxmem to limit RAM usage if you want to.

2015-04-25, 17:57   #42
Madpoo
Serpentine Vermin Jar
Jul 2014
5·677 Posts

Quote:
Originally Posted by Madpoo View Post
If I have a system with 12 cores, it's very easy to have Prime95 manage all the work with a nice, tidy worktodo file, and I can set the affinity so each worker is using its own core.
I managed to script out the finer points of running multiple "ecm.exe" processes at once.

Notably, on the 12-core (2x6 core) system I'm testing around on, I wanted to run 12 instances of ECM at once, each with affinity to a specific core, running in "Idle" priority, and logging its output to its own file.

The powershell command to set affinity/priority on a running process would have failed if there were multiple processes with the same name. Solution: make copies of "ecm.exe" named "ecm1.exe" through "ecm12.exe". Done.

To launch ECM itself, I kick it off in its own command window:
start /min cmd /c ecm%corenum%.exe -v -c %curves% -inp %infile% %b1% > %infile%.out

In the batch file I set corenum, curves, b1 and infile to whatever (the "infile" might simply be named "1277" and contains "2^1277-1")

It kicks off a command console in its own minimized window, running that specifically named exe file.

Then I have to pause about a second (do this however... I use the command line replacement "TCMD" from JPSoft which has a "delay" command). That gives the exe time to start up before the next step.

That next step is to run a simple Powershell set of commands:
PowerShell "$Process = Get-Process ecm%corenum%; $Process.ProcessorAffinity=%mask%; $Process.PriorityClass = 'Idle'"

For that to work you would need to set the %mask% variable to the affinity bit mask that pins instance "corenum" to one specific logical CPU.

For my Windows system with hyperthreading enabled, cpus 1 and 2 are the physical and HT of one core, 3 and 4 are the next, etc.

So if corenum=1 then I'd want the mask to be 0x1, corenum=2 would be 0x4, corenum=3 is 0x10, etc. Going back to my preference for TCMD as a command replacement, it's easy to get the decimal mask (which powershell can use) with this little thing:
set mask=%@eval[1 shl 2*(%corenum%-1)]

(just does a shift left of 0x1 by 2*(corenum-1) bits; the factor of 2 is there because I skip over the HT cores).

Or you can do some "if %corenum%==5 set mask=256" things to keep it in the realm of "cmd.exe" compatible.
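
For anyone without TCMD, the same mask arithmetic is easy to check in Python. This is only a sketch of the calculation described above; `affinity_mask` is a made-up name, and it assumes the layout from this post, where each physical core shows up as two adjacent logical CPUs.

```python
def affinity_mask(corenum, threads_per_core=2):
    """Bit mask selecting the first logical CPU of physical core corenum.

    corenum is 1-based, matching the ecm1.exe ... ecm12.exe naming.
    """
    if corenum < 1:
        raise ValueError("corenum starts at 1")
    return 1 << (threads_per_core * (corenum - 1))

for core in (1, 2, 3, 5):
    # Decimal is what the PowerShell line takes; hex is easier to read.
    print(core, affinity_mask(core), hex(affinity_mask(core)))
```

With hyperthreading disabled, passing threads_per_core=1 gives the plain one-bit-per-core masks.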

It's a little Rube Goldberg'ish but it works.

For doing stage 2 work where Prime95 did the stage 1, I could work that in as well but because of the memory usage I could probably only run 1 or maybe 2 on a machine at once anyway, and at that point it's fine to just manually set the affinity/priority as I feed it a list of a couple hundred stage 1 curves to finish and leave it alone.
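
Pulling the pieces of this post together, the per-instance settings could be generated up front rather than set by hand. The following Python sketch only builds the plan and launches nothing. The function name and the per-core log file naming are assumptions added for illustration, while the ecm flags are the ones used in the batch line above.

```python
def instance_plan(cores, curves, b1, infile):
    """Return one dict per instance with the pieces the batch file needs."""
    plan = []
    for core in range(1, cores + 1):
        exe = "ecm%d.exe" % core            # uniquely named copy of ecm.exe
        plan.append({
            "exe": exe,
            "args": [exe, "-v", "-c", str(curves), "-inp", infile, str(b1)],
            # Assumed naming so every instance logs to a different file.
            "log": "%s.%d.out" % (infile, core),
            "mask": 1 << (2 * (core - 1)),  # skip the hyperthread siblings
        })
    return plan

for item in instance_plan(12, 24, "29e8", "1277")[:3]:
    print(item["exe"], item["mask"], item["log"])
```

Each entry could then be handed to whatever starts the process and applies the affinity and priority, whether that is the batch file plus PowerShell approach above or something else.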

2015-04-26, 01:14   #43
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
623₁₆ Posts

Quote:
Originally Posted by Madpoo View Post

For doing stage 2 work where Prime95 did the stage 1, I could work that in as well but because of the memory usage I could probably only run 1 or maybe 2 on a machine at once anyway, and at that point it's fine to just manually set the affinity/priority as I feed it a list of a couple hundred stage 1 curves to finish and leave it alone.
Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best shot as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM with a large number of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...

2015-04-27, 14:01   #44
R.D. Silverman
"Bob Silverman"
Nov 2003
North of Boston
2²·1,877 Posts

Quote:
Originally Posted by lycorn View Post
Doing Stage 1 with Prime95, at least for these very small exponents, is definitely the best shot as Prime95 is a lot faster than GMP-ECM for S1. Feeding GMP-ECM with a large number of P95 S1 curves and forgetting it for a while renders the overhead negligible. Well, sort of...
I am curious. How much faster is P95 than GMP-ECM for S1 for Mersenne/Wagstaff numbers?
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would
be very fast....

I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5).
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
