#34 |
Serpentine Vermin Jar
Jul 2014
5×677 Posts |
As a somewhat "fly on the wall" observer so far, I'm just dipping my toes into the GMP-ECM thing.
I have to admit, it's not very user friendly, and I mean that with all due respect; it's no comment on how the actual code itself works. Let's say I have several servers with many spare GBs of RAM available, and I want to throw some work its way. If I have a system with 12 cores, it's very easy to have Prime95 manage all the work with a nice, tidy worktodo file, and I can set the affinity so each worker is using its own core. So what it sounds like is that I should use this nice ecosystem for stage 1, and then get into hack mode to pipe all that into GMP-ECM (albeit running on Windows... Linux may be much nicer).

I had a heckuva time getting just one instance of "ecm.exe" to affine to a single core. I can do a "start /affinity <hex mask> ecm.exe ..." from the command line, and that's about as close as I could get to truly automating the affinity. Otherwise I was launching it and then using Task Manager to set the affinity after the fact, manually. No thanks. Some things you can't do when "START"ing with affinity, like pipe stdin to the process you're launching or redirect its output. You would need one of the 3rd party (or Microsoft) apps that can change the affinity of a running process, and be able to script it. Oh, and bear in mind, in my case I could be running 12-20 instances at once if I had the memory available.

Then there's the question of output from the program. If I use start /affinity, it launches in a new console window, and when the program finishes, the window closes. Unless I had the foresight to capture all of the output by piping it somewhere, anything it managed to find would be lost. And while I found the -inp option to read the work in from a file that contains something like "2^1277-1", I had no luck finding a switch that would write the text output to a file, just the large save file. And when using "start" it's not always easy to redirect console output to a file with any certainty (you may wind up redirecting the output of your "start" command and not the program you're actually starting). See the dilemma?

I appreciate that GMP-ECM is faster, but for a simple fella like myself it would take far too much effort to actually use it for more than just tinkering. Then there's the issue of getting some kind of result out of it that could be fed into Primenet. In theory it's not that hard to output a text line similar to what mfaktc does. All the manual result page does is parse the text for the relevant info, and as George said, ECM results are accepted on the honor system.

Lest this come across as mere complaining, I'll be more specific about what would be nice to see. Can GMP-ECM, at least the Win64 compiled version, do these things:
Ideally it'd be cool if Prime95 used the faster code that GMP-ECM apparently uses so there's not the back and forth shuffle in the first place but I know George is busy. His list of "nice to have" changes in Prime95 is probably long already. ![]() |
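One way around the redirection dilemma described in this post is to let a small wrapper script own the file handles instead of relying on "start". The following is only a sketch under that assumption: `run_logged` is a hypothetical helper name, and it does not set affinity or priority, it only guarantees that the child reads its input from a file and that everything it prints is appended to a log.

```python
import subprocess


def run_logged(cmd, input_path, log_path):
    """Run cmd with stdin taken from input_path, appending all output to log_path.

    cmd is an argv-style list. stderr is merged into stdout so that nothing
    the program prints is lost when its console window closes.
    """
    with open(input_path, "rb") as stdin_file, open(log_path, "ab") as log_file:
        completed = subprocess.run(
            cmd,
            stdin=stdin_file,
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
    return completed.returncode
```

A call such as `run_logged(["ecm.exe", "-v", "8e8"], "1277.txt", "1277.log")` would be the intended use; the file names and the bound are placeholders, not values recommended in the thread.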
#35 |
Sep 2010
Scandinavia
3×5×41 Posts |
<factorme.txt >> log.txt
is how I log results. -n sets low priority. Optimal bounds depend on many different things. Luckily, the efficiency curve is pretty flat around the optimal. Adjusting bounds based on experience is not hard. |
#36 | |
"Curtis"
Feb 2005
Riverside, CA
5639₁₀ Posts |
Quote:
I agree that it would be quite nice for ECM work overall for Prime95 to be able to call ecm directly for stage 2! Alas, a very small use case compared to the overall project. |
#37 |
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts |
@Madpoo:
The following considerations assume you will run Stage 1 on Prime95 and Stage 2 on GMP-ECM.

1. Make a copy of the Prime95 directory, and configure the program for a single worker. You'll be running Stage 1 from there.
2. Insert the line GmpEcmHook=1 in prime.txt.
3. Run P95 with B1=B2=the applicable default B1 bound for the given exponent (for 1277, B1=8e08). You may later adjust this value based on experience. Run as many curves as you wish.
4. Upon finishing, the results.txt file from P95 will contain a bunch of Stage 1 residues, one from each curve. Copy the results.txt file to the GMP-ECM home directory, open a cmd prompt and run ecm -v -resume results.txt -save <choose a file name> 8e08-8e08. You may as well keep both executables and associated files in the same directory; it will save you the hassle of moving files around.
5. GMP-ECM will sweep through the residues file, running Stage 2 for each one, and recording the residues in the file chosen for saving (GMP-ECM creates the file, you just supply the name). Upon finishing, the save file will contain the Stage 2 residues.
6. To report the results to the server, contact GW - he may get you sorted out.

To set the affinity, I use the Task Manager. It's really not a must to set the affinity, but it keeps ecm from stealing resources from the lower priority workers. I do it because when running ecm on one core and P95 on the remaining ones, if I don't stick ecm to the idle core, the CPU usage of the P95 workers decreases.

I appreciate there's a difference between running this combo of programs at home, on a single desktop computer, and in a datacenter environment, where the ability to automate tasks is a must. But anyway, if you want to give it a shot, here's how I do it. As there is some manual work involved, you may wish to do long runs - jobs that take a couple of days on each Stage - to reduce the manual overhead.
Try using much larger bounds, as suggested by VBCurtis, and see what it gives in terms of running time and memory usage. It would be very nice to put all that memory to good use! See the Readme file that comes with the GMP-ECM package for more info. HTH |
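For anyone scripting the hand-off described in this post, the Stage 2 command line can be assembled programmatically. This is a minimal sketch: `stage2_command` is a hypothetical helper, and the flags (-v, -resume, -save) and the "B1-B1" argument form are copied from the command quoted in the post rather than checked against the GMP-ECM documentation.

```python
def stage2_command(residue_file, save_file, b1, verbose=True, exe="ecm"):
    """Build the argv list for running Stage 2 on Prime95 Stage 1 residues.

    b1 is passed as a string (for example "8e08") and emitted as "b1-b1",
    mirroring the command line quoted in the post.
    """
    args = [exe]
    if verbose:
        args.append("-v")
    args += ["-resume", residue_file, "-save", save_file, f"{b1}-{b1}"]
    return args
```

The returned list can be handed directly to `subprocess.run`, which avoids quoting problems with file names that contain spaces.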
#38 | |
Serpentine Vermin Jar
Jul 2014
5·677 Posts |
Quote:
It's probably worth setting the affinity anyway even if you don't also have Prime95 running, just because Windows will switch you around to different cores at whim and you'll lose any benefit of the per-core caches. The shared L3 cache won't be affected, but the L1/L2 caches could make a difference. And if you have a multi-socket system, then switching between NUMA nodes would be detrimental.

I started a run last night with the output of some Prime95 stage 1 results. I have a file of 24 curves of M1277 with B1=29e8. Feeding those into GMP-ECM tells me that stage 2 will use an estimated 17GB, although I see ecm.exe using nearly 19 GB currently. Fortunately this dev system is only using 110 of its 144 GB right now (including that 19 GB of ECM). I guess I could run another one. :)

Here's the output of one of the stage 2 runs... maybe you can help figure out if everything looks okay. I was a little confused about the input number being 0x1FFF... but it does say it's doing "special division for factor of 2^1277-1".

One of the main takeaways there is that stage 2 took 5754 seconds (let's call it 96 minutes). I guess that's okay for a 29e8 B1 for stage 2? The CPU is one of the cores of an X5690 @ 3.47 GHz. It's done 6 of the 24 so far and that timing is pretty consistent. The longest one was 5815 seconds, not that far off the 5754 of the quickest one.

Code:
GMP-ECM 6.4.4 [configured with MPIR 2.6.0] [ECM] Resuming ECM residue saved with Prime95 Input number is 0x1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (385 digits) Using special division for factor of 2^1277-1 Using B1=2900000000-2900000000, B2=105101237217912, polynomial Dickson(30), sigma=3389447693745215 dF=2097152, k=2, d=23130030, d2=13, i0=113 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 10 28 89 309 1175 4842 21459 102212 513971 2730842 Step 1 took 15ms Using 44 small primes for NTT Estimated memory usage: 17G Initializing tables of differences for F took 8703ms Computing roots of F took 687282ms Building F from its roots took 623860ms Computing 1/F took 229328ms Initializing table of differences for G took 2421ms Computing roots of G took 758954ms Building G from its roots took 619859ms Computing roots of G took 777922ms Building G from its roots took 621860ms Computing G * H took 120859ms Reducing G * H mod F took 123829ms Computing polyeval(F,G) took 1162766ms Computing product of all F(g_i) took 2953ms Step 2 took 5753906ms Expected time to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 16.51h 1.89d 5.92d 20.59d 78.24d 322.47d 3.92y 18.65y 93.78y 498.26y |
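With 24 curves going into one log, pulling the timings out by hand gets tedious. The sketch below shows one way to extract them, assuming the log contains lines shaped like the "Step 2 took ...ms" and "Estimated memory usage: ..." lines in the output above; the function names are made up for illustration.

```python
import re

STEP2_RE = re.compile(r"Step 2 took (\d+)ms")
MEMORY_RE = re.compile(r"Estimated memory usage: (\S+)")


def step2_seconds(log_text):
    """Return the stage 2 time of every curve found in the log, in seconds."""
    return [int(ms) / 1000.0 for ms in STEP2_RE.findall(log_text)]


def estimated_memory(log_text):
    """Return the first reported memory estimate (e.g. '17G'), or None."""
    match = MEMORY_RE.search(log_text)
    return match.group(1) if match else None
```

Applied to the run above, the single stage 2 time of 5753906 ms comes out as about 5754 seconds, or just under 96 minutes, which matches the figure quoted in the post.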
#39 |
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts |
I think it looks just fine. Pretty much what I would expect.
The "weird" string of F's is actually your input number (2^1277-1) written in hexadecimal; in decimal it is 385 digits long. The memory use is consistent with the values I get on my system. Using B1=8e08, the estimated mem usage is 4049 MB, but during the run the usage fluctuates as the computation progresses, and may reach 4600 MB. So estimating 17GB and using up to 19 seems OK. The times also seem reasonable compared with the ones on my system (much better than mine, actually...).

All in all, I think you're doing pretty well. Hope you'll find a factor soon... I'm not making fun, just being optimistic.

One last note: Comparing the number of curves estimated by P95 alone to find a 65-digit factor (360,000) with the number estimated by the combo you're using (21,459) shows how much more effective your setting is, even noting that each of "your" curves takes longer to run.

Last fiddled with by lycorn on 2015-04-23 at 17:31
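The comparison in the last note can be put into numbers. The sketch below assumes that the "expected time" line in the GMP-ECM output is simply the expected curve count multiplied by the measured time per curve; that reading is an inference from the posted figures, not something stated in the thread.

```python
SECONDS_PER_DAY = 86400.0


def expected_days(curves, seconds_per_curve):
    """Total expected wall time, in days, for a given number of curves."""
    return curves * seconds_per_curve / SECONDS_PER_DAY


# Figures quoted in the thread for a 65-digit factor of M1277.
curves_p95_alone = 360_000     # Prime95 alone
curves_combo = 21_459          # Prime95 stage 1 + GMP-ECM stage 2, B1=29e8
seconds_per_curve = 5753.906   # stage 2 time from the posted output

curve_ratio = curves_p95_alone / curves_combo
combo_years = expected_days(curves_combo, seconds_per_curve) / 365.0
```

The ratio comes out a little under 17, and the expected time at roughly 3.9 years on one core, in line with the 3.92y column of the posted output. The ratio alone overstates the gain, since it ignores the Stage 1 time spent in Prime95 and the longer per-curve time.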
#40 | ||
Serpentine Vermin Jar
Jul 2014
3385₁₀ Posts |
Quote:
Quote:
I'm still doing triple-checks on self-verified LL runs and I figure this might be another fun mini-project once that's out of the way. |
#41 | |
Sep 2010
Scandinavia
3×5×41 Posts |
Quote:
Use -maxmem to limit RAM usage if you want to. |
#42 | |
Serpentine Vermin Jar
Jul 2014
5·677 Posts |
Quote:
Notably, on the 12-core (2x6 core) system I'm testing around on, I wanted to run 12 instances of ECM at once, each with affinity to a specific core, running in "Idle" priority, and logging its output to its own file. The PowerShell command to set affinity/priority on a running process would have failed if there were multiple processes with the same name. Solution: make copies of "ecm.exe" named "ecm1.exe" through "ecm12.exe". Done.

To launch ECM itself, I kick it off in its own command window:

start /min cmd /c ecm%corenum%.exe -v -c %curves% -inp %infile% %b1% > %infile%.out

In the batch file I set corenum, curves, b1 and infile to whatever (the "infile" might simply be named "1277" and contain "2^1277-1"). It kicks off a command console in its own minimized window, running that specifically named exe file. Then I have to pause about a second (do this however you like... I use the command line replacement "TCMD" from JPSoft, which has a "delay" command). That gives the exe time to start up before the next step.

That next step is to run a simple set of PowerShell commands:

PowerShell "$Process = Get-Process ecm%corenum%; $Process.ProcessorAffinity=%mask%; $Process.PriorityClass = 'Idle'"

For that to work you need to set the %mask% variable to the affinity bit mask that puts "corenum" on a specific core. On my Windows system with hyperthreading enabled, cpus 1 and 2 are the physical and HT halves of one core, 3 and 4 are the next, etc. So if corenum=1 then I'd want the mask to be 0x1, corenum=2 would be 0x4, corenum=3 is 0x10, etc. Going back to my preference for TCMD as a command replacement, it's easy to get the decimal mask (which PowerShell can use) with this little thing:

set mask=%@eval[1 shl 2*(%corenum%-1)]

(It shifts 0x1 left by twice (corenum-1), the factor of two being there because I skip over the HT cores.) Or you can do some "if %corenum%==5 set mask=256" things to keep it in the realm of "cmd.exe" compatible. It's a little Rube Goldberg'ish but it works.
For doing stage 2 work where Prime95 did the stage 1, I could work that in as well but because of the memory usage I could probably only run 1 or maybe 2 on a machine at once anyway, and at that point it's fine to just manually set the affinity/priority as I feed it a list of a couple hundred stage 1 curves to finish and leave it alone. |
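The mask arithmetic in this post is easy to get wrong by hand, so here is the same rule written out as a small sketch. It assumes, as the post does, that each physical core exposes two logical CPUs and that only the first of each pair is used; `affinity_mask` and `powershell_command` are hypothetical helper names, and the PowerShell text is the one-liner quoted above with the variables filled in.

```python
def affinity_mask(corenum, threads_per_core=2):
    """Bit mask selecting the first logical CPU of physical core corenum (1-based)."""
    if corenum < 1:
        raise ValueError("corenum is 1-based")
    return 1 << (threads_per_core * (corenum - 1))


def powershell_command(corenum, threads_per_core=2):
    """Fill in the quoted PowerShell one-liner for the copy named ecm<corenum>.exe."""
    mask = affinity_mask(corenum, threads_per_core)
    return (
        f'PowerShell "$Process = Get-Process ecm{corenum}; '
        f"$Process.ProcessorAffinity={mask}; "
        f"$Process.PriorityClass = 'Idle'\""
    )
```

For cores 1, 2, 3 and 5 this gives decimal masks 1, 4, 16 and 256, matching the "if %corenum%==5 set mask=256" shortcut mentioned in the post.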
#43 | |
"GIMFS"
Sep 2002
Oeiras, Portugal
623₁₆ Posts |
Quote:
#44 | |
"Bob Silverman"
Nov 2003
North of Boston
2²·1,877 Posts |
Quote:
If one turns on the fast modular reduction for 2^n-1 within GMP-ECM, I would think that it would be very fast.... I agree that P95 would/should be faster for large exponents (e.g. exponents greater than say 10^5). |
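For readers unfamiliar with why reduction modulo 2^n-1 can be made fast: since 2^n is congruent to 1 modulo 2^n-1, the high bits of a number can simply be folded onto the low bits with a shift and an add, with no division. The sketch below illustrates that idea only; it is not GMP-ECM's implementation, and `mod_mersenne` is a made-up name.

```python
def mod_mersenne(x, n):
    """Reduce a non-negative integer x modulo 2**n - 1 using shifts and adds."""
    if x < 0:
        raise ValueError("x must be non-negative")
    m = (1 << n) - 1
    # Write x = q * 2**n + r. Because 2**n is 1 modulo m, x is congruent
    # to q + r, which is strictly smaller than x whenever q >= 1.
    while x > m:
        x = (x & m) + (x >> n)
    # The loop can stop at x == m, which represents 0 modulo m.
    return 0 if x == m else x
```

For a product of two n-bit residues, one or two folds are enough, which is why the special form is so much cheaper than a general modular reduction.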
Thread | Thread Starter | Forum | Replies | Last Post |
GMP-ECM & Prime95 Stage 1 Files | Gordon | GMP-ECM | 3 | 2016-01-08 12:44 |
Stage 1 with mprime/prime95, stage 2 with GMP-ECM | D. B. Staple | Factoring | 2 | 2007-12-14 00:21 |
Need help to run stage 1 and stage 2 separately | jasong | GMP-ECM | 9 | 2007-10-25 22:32 |
P4 Prescott - 31 Stage Pipeline ? Bad news for Prime95? | Angular | Hardware | 18 | 2004-11-15 07:04 |
Stage 1 and stage 2 tests missing | Matthias C. Noc | PrimeNet | 5 | 2004-08-25 15:42 |